MySQL to PostgreSQL Migration Tool

Project Date: 2025-09-17
Client: Saif Islam

This tool migrates MySQL schema and data to PostgreSQL. It creates missing tables/columns in PostgreSQL from the MySQL schema and then copies data efficiently in batches.

Key features

  • Schema sync: creates missing tables/columns in PostgreSQL using MySQL schema
  • Large-data support: keyset/limit pagination and batch inserts
  • Resume support: tracks last processed primary key per table in migration_state.json
  • Safe bulk-load options: optional truncate-before-load, optional trigger disable/enable
  • Type mapping and coercion (e.g., TINYINT(1) → BOOLEAN, epoch integers → TIMESTAMP where needed)
  • Duplicate-safe inserts via ON CONFLICT DO NOTHING
  • Sequence reset for SERIAL/IDENTITY
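The type mapping step can be pictured as a small lookup table plus a fallback for parameterized types. The sketch below is illustrative only: `map_mysql_type` and the entries in `TYPE_MAP` are assumptions, not the script's actual API, and the real table is certainly more complete.

```python
import re

# Illustrative MySQL → PostgreSQL type map (assumption: common cases only;
# the real script covers many more types).
TYPE_MAP = {
    "tinyint(1)": "BOOLEAN",
    "datetime": "TIMESTAMP",
    "longtext": "TEXT",
    "mediumtext": "TEXT",
    "double": "DOUBLE PRECISION",
    "blob": "BYTEA",
}

def map_mysql_type(mysql_type: str) -> str:
    """Map a MySQL column type to a PostgreSQL type (sketch)."""
    t = mysql_type.strip().lower()
    if t in TYPE_MAP:
        return TYPE_MAP[t]
    # int(11), bigint(20), ... → drop the display width MySQL adds
    m = re.match(r"(tinyint|smallint|mediumint|int|bigint)\(\d+\)", t)
    if m:
        return {"tinyint": "SMALLINT", "smallint": "SMALLINT",
                "mediumint": "INTEGER", "int": "INTEGER",
                "bigint": "BIGINT"}[m.group(1)]
    return t.upper()  # varchar(255), numeric(10,2), … pass through

print(map_mysql_type("tinyint(1)"))    # BOOLEAN
print(map_mysql_type("int(11)"))       # INTEGER
print(map_mysql_type("varchar(255)"))  # VARCHAR(255)
```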

Prerequisites

  • Python 3.x
  • psql and MySQL clients (optional but helpful for verification)

Folder contents

  • migrate_db.py: main migration script
  • set_env.sh: your local DB credentials (do not commit)
  • README.md: this guide
  • migration_state.json: auto-generated resume file (safe to delete)
  • connect_db.sh (optional): helper to diff schemas with mysqldump/pg_dump

Setup

  1. Go to the project directory
cd /mysql-to-postgresql-migration/migration-db
  2. Provide database credentials in set_env.sh
#!/bin/bash
export MYSQL_HOST='localhost'
export MYSQL_USER='root'
export MYSQL_PASSWORD='12345678'
export MYSQL_DATABASE='dc_live_2024'
export DATABASE_URI='postgresql://USER:PASS@HOST:PORT/DBNAME'
  3. Install Python dependencies
python3 -m pip install --user --upgrade mysql-connector-python psycopg2-binary --break-system-packages

How to run (terminal)

Basic run (schema sync + full data migration):

source ./set_env.sh
python3 -u migrate_db.py

Recommended high-performance run for large datasets:

source ./set_env.sh
BATCH_SIZE=10000 \
PRESERVE_IDS=true \
TRUNCATE_BEFORE_LOAD=true \
DISABLE_TRIGGERS=true \
PRINT_PROGRESS_EVERY=100000 \
python3 -u migrate_db.py
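The large-dataset path reads rows with keyset pagination: `WHERE id > last_id ORDER BY id LIMIT batch`, checkpointing the last primary key after each batch. A minimal sketch of that loop, demonstrated here against an in-memory SQLite table as a stand-in (the real script talks to MySQL and PostgreSQL; the function name is illustrative):

```python
import sqlite3

def keyset_batches(conn, table, batch_size, start_after=0):
    """Yield batches of rows ordered by primary key, resuming after start_after."""
    last_id = start_after
    while True:
        rows = conn.execute(
            f"SELECT id, name FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # checkpoint: last processed primary key

# Demo with a stand-in table of 7 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO client VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 8)])
sizes = [len(b) for b in keyset_batches(conn, "client", batch_size=3)]
print(sizes)  # [3, 3, 1]
```

Unlike OFFSET pagination, each query here only scans forward from the last seen key, so batch cost stays flat as the table grows.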

Migrate only specific tables:

source ./set_env.sh
ONLY_TABLES=client,casestudy,casestudy_client_rel python3 -u migrate_db.py

Skip specific tables:

source ./set_env.sh
SKIP_TABLES=queue,log_table python3 -u migrate_db.py

Environment options

  • BATCH_SIZE (default: 10000): rows per batch
  • PRESERVE_IDS (default: true): keep MySQL IDs when possible
  • TRUNCATE_BEFORE_LOAD (default: false): truncate target tables first
  • DISABLE_TRIGGERS (default: false): set session_replication_role=replica during load
  • PRINT_PROGRESS_EVERY (default: 100000): progress log frequency
  • ONLY_TABLES: comma-separated whitelist of tables to migrate
  • SKIP_TABLES: comma-separated list of tables to skip
  • RESUME_FILE (default: migration_state.json): resume state file path
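All of these knobs are plain environment variables, so parsing them reduces to a couple of small helpers. The names below (`env_bool`, `env_tables`) are illustrative, not necessarily what migrate_db.py uses:

```python
import os

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag like TRUNCATE_BEFORE_LOAD=true from the environment."""
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_tables(name: str) -> set:
    """Read a comma-separated table list like ONLY_TABLES=client,casestudy."""
    raw = os.environ.get(name, "")
    return {t.strip() for t in raw.split(",") if t.strip()}

os.environ["SKIP_TABLES"] = "queue, log_table"   # demo value
batch_size = int(os.environ.get("BATCH_SIZE", "10000"))
skip = env_tables("SKIP_TABLES")
print(batch_size, env_bool("PRESERVE_IDS", True), sorted(skip))
```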

migration_state.json is created automatically and lets you resume large migrations. If you set TRUNCATE_BEFORE_LOAD=true, the resume state for each truncated table is reset automatically.
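Conceptually the resume file is just a per-table checkpoint map. A plausible shape and its load/save helpers, sketched below (the field layout is an assumption and may not match migrate_db.py exactly):

```python
import json
import os
import tempfile

def load_state(path):
    """Return {table: last_processed_id}, or {} if no resume file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def save_state(path, state):
    """Persist checkpoints so an interrupted run can pick up where it left off."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

path = os.path.join(tempfile.mkdtemp(), "migration_state.json")
state = load_state(path)       # {} on a fresh run
state["client"] = 15000        # e.g. last migrated primary key for table "client"
save_state(path, state)
print(load_state(path))        # {'client': 15000}
```

On restart, each table's migration resumes from `last_processed_id` via the keyset query, which is why deleting the file simply restarts from the beginning.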

Verifying in PostgreSQL

Check counts for a few tables:

psql "$DATABASE_URI" -At -c "select 'casestudy', count(*) from casestudy union all select 'client', count(*) from client;"

Check/reset a sequence if needed (usually automatic):

# Show current max(id) and sequence last value
psql "$DATABASE_URI" -At -c "select max(id) from activity_log; select last_value from public.activity_log_id_seq;"

# Manually reset (if required)
psql "$DATABASE_URI" -c "select setval('public.activity_log_id_seq'::regclass, (select coalesce(max(id),0) from activity_log), true);"

Notes

  • Unused files removed: the old Node script (migrate.js) and editor/venv folders were deleted from this folder. The Python script is the single source of truth.
  • connect_db.sh is kept as an optional helper to diff schemas using mysqldump/pg_dump.

Cleanup and .gitignore

Recommended cleanup commands (safe):

cd /mysql-to-postgresql-migration/migration-db
rm -f migration_state.json # optional: reset resume state
# rm -f connect_db.sh      # optional: if you don't use the schema diff helper

If this folder is under git, add these entries to .gitignore:

set_env.sh
migration_state.json

Troubleshooting

  • Ensure set_env.sh is correct and both databases are reachable
  • Managed PostgreSQL may not allow DISABLE_TRIGGERS=true; if you see permission errors, set it to false
  • If types clash (e.g., integer to timestamp), the script applies safe coercions; adjust data manually if needed for custom cases
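The integer-to-timestamp coercion mentioned above typically means treating MySQL integer columns as Unix epochs. A hedged sketch of such a coercion helper; `coerce_value` and its rules are illustrative and the script's real behavior may differ:

```python
from datetime import datetime, timezone

def coerce_value(value, pg_type: str):
    """Coerce a MySQL value to a PostgreSQL-friendly one (illustrative rules)."""
    if value is None:
        return None
    if pg_type == "BOOLEAN":
        return bool(value)  # TINYINT(1) stores 0/1 → False/True
    if pg_type == "TIMESTAMP" and isinstance(value, int):
        # Assumption: the integer is a Unix epoch in seconds
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return value

print(coerce_value(1, "BOOLEAN"))    # True
print(coerce_value(0, "TIMESTAMP"))  # 1970-01-01 00:00:00+00:00
```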

Technologies Used

  • Backend: Python
  • Database & Storage: MySQL, PostgreSQL

Project Details

Published: November 30, 2025