MySQL to PostgreSQL Migration Tool
This tool migrates MySQL schema and data to PostgreSQL. It creates missing tables/columns in PostgreSQL from the MySQL schema and then copies data efficiently in batches.
- Schema sync: creates missing tables/columns in PostgreSQL using the MySQL schema
- Large-data support: keyset/limit pagination and batch inserts
- Resume support: tracks the last processed primary key per table in `migration_state.json`
- Safe bulk-load options: optional truncate-before-load, optional trigger disable/enable
- Type mapping and coercion (e.g., `TINYINT(1)` → `BOOLEAN`, integers to timestamps where needed)
- Duplicate-safe inserts via `ON CONFLICT DO NOTHING`
- Sequence reset for `SERIAL`/`IDENTITY` columns
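The keyset/limit pagination above can be sketched as follows. This is an illustrative sketch, not code from `migrate_db.py`; `iter_keyset_batches` and `fetch_page` are hypothetical names.

```python
def iter_keyset_batches(fetch_page, batch_size=10000, start_after=0):
    """Yield lists of rows, paging by the last seen primary key.

    fetch_page(last_id, limit) is expected to run something like:
        SELECT * FROM t WHERE id > %s ORDER BY id LIMIT %s
    against MySQL and return rows whose first column is the primary key.
    """
    last_id = start_after
    while True:
        rows = fetch_page(last_id, batch_size)
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume point: highest id seen so far
```

Because each page is bounded by `id > last_id` rather than `OFFSET`, throughput stays roughly constant regardless of table size, and `last_id` doubles as the per-table resume point.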
- Python 3.x
- `psql` and MySQL clients (optional but helpful for verification)
- `migrate_db.py`: main migration script
- `set_env.sh`: your local DB credentials (do not commit)
- `README.md`: this guide
- `migration_state.json`: auto-generated resume file (safe to delete)
- `connect_db.sh` (optional): helper to diff schemas with `mysqldump`/`pg_dump`
- Go to the project directory:

```bash
cd /mysql-to-postgresql-migration/migration-db
```

- Provide database credentials in `set_env.sh`:

```bash
#!/bin/bash
export MYSQL_HOST='localhost'
export MYSQL_USER='root'
export MYSQL_PASSWORD='12345678'
export MYSQL_DATABASE='dc_live_2024'
export DATABASE_URI='postgresql://USER:PASS@HOST:PORT/DBNAME'
```

- Install Python dependencies:

```bash
python3 -m pip install --user --upgrade mysql-connector-python psycopg2-binary --break-system-packages
```
Basic run (schema sync + full data migration):

```bash
source ./set_env.sh
python3 -u migrate_db.py
```
Recommended high-performance run for large datasets:

```bash
source ./set_env.sh
BATCH_SIZE=10000 \
PRESERVE_IDS=true \
TRUNCATE_BEFORE_LOAD=true \
DISABLE_TRIGGERS=true \
PRINT_PROGRESS_EVERY=100000 \
python3 -u migrate_db.py
```
Migrate only specific tables:

```bash
source ./set_env.sh
ONLY_TABLES=client,casestudy,casestudy_client_rel python3 -u migrate_db.py
```

Skip specific tables:

```bash
source ./set_env.sh
SKIP_TABLES=queue,log_table python3 -u migrate_db.py
```
- `BATCH_SIZE` (default: 10000): rows per batch
- `PRESERVE_IDS` (default: true): keep MySQL IDs when possible
- `TRUNCATE_BEFORE_LOAD` (default: false): truncate target tables first
- `DISABLE_TRIGGERS` (default: false): sets `session_replication_role=replica` during load
- `PRINT_PROGRESS_EVERY` (default: 100000): progress log frequency
- `ONLY_TABLES`: comma-separated whitelist of tables to migrate
- `SKIP_TABLES`: comma-separated list of tables to skip
- `RESUME_FILE` (default: `migration_state.json`): resume state file path
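Reading these variables with the defaults listed above might look like the sketch below. The helper names (`load_config`, `should_migrate`) are illustrative, not the script's actual API.

```python
import os

def _bool(name, default):
    # Treat "true"/"1"/"yes" (any case) as true; everything else as false.
    val = os.environ.get(name)
    if val is None:
        return default
    return val.strip().lower() in ("true", "1", "yes")

def _tables(name):
    # Comma-separated table list -> set; empty/unset -> empty set.
    raw = os.environ.get(name, "")
    return {t.strip() for t in raw.split(",") if t.strip()}

def load_config():
    return {
        "batch_size": int(os.environ.get("BATCH_SIZE", "10000")),
        "preserve_ids": _bool("PRESERVE_IDS", True),
        "truncate_before_load": _bool("TRUNCATE_BEFORE_LOAD", False),
        "disable_triggers": _bool("DISABLE_TRIGGERS", False),
        "print_progress_every": int(os.environ.get("PRINT_PROGRESS_EVERY", "100000")),
        "only_tables": _tables("ONLY_TABLES"),
        "skip_tables": _tables("SKIP_TABLES"),
        "resume_file": os.environ.get("RESUME_FILE", "migration_state.json"),
    }

def should_migrate(table, cfg):
    # ONLY_TABLES (when set) acts as a whitelist; SKIP_TABLES always excludes.
    if cfg["only_tables"] and table not in cfg["only_tables"]:
        return False
    return table not in cfg["skip_tables"]
```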
`migration_state.json` is created automatically and lets you resume large migrations. If you set `TRUNCATE_BEFORE_LOAD=true`, the resume state for each truncated table is reset automatically.
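A minimal sketch of the resume mechanism, assuming the state file maps table names to the last migrated primary key (the actual layout of `migration_state.json` may differ):

```python
import json
import os

def load_state(path="migration_state.json"):
    """Return {table_name: last_migrated_pk}; empty dict if no state file yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)

def save_state(state, path="migration_state.json"):
    # Write to a temp file and rename, so a crash mid-write
    # cannot corrupt the resume file.
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)
```

On restart, the keyset query for each table would then begin at `state.get(table, 0)` instead of 0, skipping already-migrated rows.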
Check counts for a few tables:

```bash
psql "$DATABASE_URI" -At -c "select 'casestudy', count(*) from casestudy union all select 'client', count(*) from client;"
```

Check/reset a sequence if needed (usually automatic):

```bash
# Show current max(id) and the sequence's last value
psql "$DATABASE_URI" -At -c "select max(id) from activity_log; select last_value from public.activity_log_id_seq;"

# Manually reset (if required)
psql "$DATABASE_URI" -c "select setval('public.activity_log_id_seq'::regclass, (select coalesce(max(id),0) from activity_log), true);"
```
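The automatic reset could be generated per table with PostgreSQL's `pg_get_serial_sequence`, which resolves the sequence behind a `SERIAL`/`IDENTITY` column regardless of its name. A sketch (`reset_sequence_sql` is a hypothetical helper, not necessarily how `migrate_db.py` does it):

```python
def reset_sequence_sql(table, pk="id"):
    """Build the setval() statement that aligns a table's sequence with max(pk)."""
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"(SELECT COALESCE(MAX({pk}), 0) FROM {table}), true);"
    )
```

Running the returned statement for each migrated table ensures the next inserted row gets an id above anything copied from MySQL.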
- Unused files removed: the old Node script (`migrate.js`) and editor/venv folders were deleted from this folder. The Python script is the single source of truth. `connect_db.sh` is kept as an optional helper to diff schemas using `mysqldump`/`pg_dump`.
Recommended cleanup commands (safe):

```bash
cd /mysql-to-postgresql-migration/migration-db
rm -f migration_state.json   # optional: reset resume state
# rm -f connect_db.sh        # optional: if you don't use the schema diff helper
```
If this folder is under git, add to `.gitignore`:

```
set_env.sh
migration_state.json
```
- Ensure `set_env.sh` is correct and both databases are reachable
- Managed PostgreSQL may not allow `DISABLE_TRIGGERS=true`; if you see permission errors, set it to `false`
- If types clash (e.g., integer to timestamp), the script applies safe coercions; adjust data manually if needed for custom cases
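The type mapping and coercion can be illustrated as below. The `TYPE_MAP` entries and helper names are illustrative assumptions, not the script's actual mapping table.

```python
from datetime import datetime, timezone

# Illustrative MySQL -> PostgreSQL type map (not migrate_db.py's actual table).
TYPE_MAP = {
    "tinyint(1)": "boolean",
    "int": "integer",
    "bigint": "bigint",
    "datetime": "timestamp",
    "longtext": "text",
    "double": "double precision",
}

def map_type(mysql_type):
    # Unknown types fall back to text, which accepts any value.
    return TYPE_MAP.get(mysql_type.lower(), "text")

def coerce(value, pg_type):
    """Safe coercions: truthy ints -> bool, epoch ints -> UTC timestamp."""
    if value is None:
        return None
    if pg_type == "boolean":
        return bool(value)
    if pg_type == "timestamp" and isinstance(value, int):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return value
```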