MySQL to PostgreSQL Migration Tool

Project Date: 2025-09-17
Client: Saif Islam

This tool migrates MySQL schema and data to PostgreSQL. It creates missing tables/columns in PostgreSQL from the MySQL schema and then copies data efficiently in batches.

Key features

  • Schema sync: creates missing tables/columns in PostgreSQL using MySQL schema
  • Large-data support: keyset/limit pagination and batch inserts
  • Resume support: tracks last processed primary key per table in migration_state.json
  • Safe bulk-load options: optional truncate-before-load, optional trigger disable/enable
  • Type mapping and coercion (e.g., TINYINT(1) → BOOLEAN, epoch integers → TIMESTAMP where needed)
  • Duplicate-safe inserts via ON CONFLICT DO NOTHING
  • Sequence reset for SERIAL/IDENTITY
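The type mapping step can be pictured as a small lookup table plus a fallback for parameterized types. The sketch below is illustrative only: `map_mysql_type` and the entries in `TYPE_MAP` are assumptions, not the script's actual API, and the real table is certainly more complete.

```python
import re

# Illustrative MySQL → PostgreSQL type map (assumption: common cases only;
# the real script covers many more types).
TYPE_MAP = {
    "tinyint(1)": "BOOLEAN",
    "datetime": "TIMESTAMP",
    "longtext": "TEXT",
    "mediumtext": "TEXT",
    "double": "DOUBLE PRECISION",
    "blob": "BYTEA",
}

def map_mysql_type(mysql_type: str) -> str:
    """Map a MySQL column type to a PostgreSQL type (sketch)."""
    t = mysql_type.strip().lower()
    if t in TYPE_MAP:
        return TYPE_MAP[t]
    # int(11), bigint(20), ... → drop the display width MySQL adds
    m = re.match(r"(tinyint|smallint|mediumint|int|bigint)\(\d+\)", t)
    if m:
        return {"tinyint": "SMALLINT", "smallint": "SMALLINT",
                "mediumint": "INTEGER", "int": "INTEGER",
                "bigint": "BIGINT"}[m.group(1)]
    return t.upper()  # varchar(255), numeric(10,2), … pass through

print(map_mysql_type("tinyint(1)"))    # BOOLEAN
print(map_mysql_type("int(11)"))       # INTEGER
print(map_mysql_type("varchar(255)"))  # VARCHAR(255)
```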

Prerequisites

  • Python 3.x
  • psql and MySQL clients (optional but helpful for verification)

Folder contents

  • migrate_db.py: main migration script
  • set_env.sh: your local DB credentials (do not commit)
  • README.md: this guide
  • migration_state.json: auto-generated resume file (safe to delete)
  • connect_db.sh (optional): helper to diff schemas with mysqldump/pg_dump

Setup

  1. Go to the project directory
cd /mysql-to-postgresql-migration/migration-db
  2. Provide database credentials in set_env.sh
#!/bin/bash
export MYSQL_HOST='localhost'
export MYSQL_USER='root'
export MYSQL_PASSWORD='12345678'
export MYSQL_DATABASE='dc_live_2024'
export DATABASE_URI='postgresql://USER:PASS@HOST:PORT/DBNAME'
  3. Install Python dependencies
python3 -m pip install --user --upgrade mysql-connector-python psycopg2-binary --break-system-packages

How to run (terminal)

Basic run (schema sync + full data migration):

source ./set_env.sh
python3 -u migrate_db.py

Recommended high-performance run for large datasets:

source ./set_env.sh
BATCH_SIZE=10000 \
PRESERVE_IDS=true \
TRUNCATE_BEFORE_LOAD=true \
DISABLE_TRIGGERS=true \
PRINT_PROGRESS_EVERY=100000 \
python3 -u migrate_db.py
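The large-dataset path reads rows with keyset pagination: `WHERE id > last_id ORDER BY id LIMIT batch`, checkpointing the last primary key after each batch. A minimal sketch of that loop, demonstrated here against an in-memory SQLite table as a stand-in (the real script talks to MySQL and PostgreSQL; the function name is illustrative):

```python
import sqlite3

def keyset_batches(conn, table, batch_size, start_after=0):
    """Yield batches of rows ordered by primary key, resuming after start_after."""
    last_id = start_after
    while True:
        rows = conn.execute(
            f"SELECT id, name FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # checkpoint: last processed primary key

# Demo with a stand-in table of 7 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO client VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 8)])
sizes = [len(b) for b in keyset_batches(conn, "client", batch_size=3)]
print(sizes)  # [3, 3, 1]
```

Unlike OFFSET pagination, each query here only scans forward from the last seen key, so batch cost stays flat as the table grows.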

Migrate only specific tables:

source ./set_env.sh
ONLY_TABLES=client,casestudy,casestudy_client_rel python3 -u migrate_db.py

Skip specific tables:

source ./set_env.sh
SKIP_TABLES=queue,log_table python3 -u migrate_db.py

Environment options

  • BATCH_SIZE (default: 10000): rows per batch
  • PRESERVE_IDS (default: true): keep MySQL IDs when possible
  • TRUNCATE_BEFORE_LOAD (default: false): truncate target tables first
  • DISABLE_TRIGGERS (default: false): set session_replication_role=replica during load
  • PRINT_PROGRESS_EVERY (default: 100000): progress log frequency
  • ONLY_TABLES: comma-separated whitelist of tables to migrate
  • SKIP_TABLES: comma-separated list of tables to skip
  • RESUME_FILE (default: migration_state.json): resume state file path
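All of these knobs are plain environment variables, so parsing them reduces to a couple of small helpers. The names below (`env_bool`, `env_tables`) are illustrative, not necessarily what migrate_db.py uses:

```python
import os

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag like TRUNCATE_BEFORE_LOAD=true from the environment."""
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_tables(name: str) -> set:
    """Read a comma-separated table list like ONLY_TABLES=client,casestudy."""
    raw = os.environ.get(name, "")
    return {t.strip() for t in raw.split(",") if t.strip()}

os.environ["SKIP_TABLES"] = "queue, log_table"   # demo value
batch_size = int(os.environ.get("BATCH_SIZE", "10000"))
skip = env_tables("SKIP_TABLES")
print(batch_size, env_bool("PRESERVE_IDS", True), sorted(skip))
```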

migration_state.json is created automatically and lets you resume large migrations. If you set TRUNCATE_BEFORE_LOAD=true, the resume state for each truncated table is reset automatically.
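Conceptually the resume file is just a per-table checkpoint map. A plausible shape and its load/save helpers, sketched below (the field layout is an assumption and may not match migrate_db.py exactly):

```python
import json
import os
import tempfile

def load_state(path):
    """Return {table: last_processed_id}, or {} if no resume file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def save_state(path, state):
    """Persist checkpoints so an interrupted run can pick up where it left off."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

path = os.path.join(tempfile.mkdtemp(), "migration_state.json")
state = load_state(path)       # {} on a fresh run
state["client"] = 15000        # e.g. last migrated primary key for table "client"
save_state(path, state)
print(load_state(path))        # {'client': 15000}
```

On restart, each table's migration resumes from `last_processed_id` via the keyset query, which is why deleting the file simply restarts from the beginning.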

Verifying in PostgreSQL

Check counts for a few tables:

psql "$DATABASE_URI" -At -c "select 'casestudy', count(*) from casestudy union all select 'client', count(*) from client;"

Check/reset a sequence if needed (usually automatic):

# Show current max(id) and sequence last value
psql "$DATABASE_URI" -At -c "select max(id) from activity_log; select last_value from public.activity_log_id_seq;"

# Manually reset (if required)
psql "$DATABASE_URI" -c "select setval('public.activity_log_id_seq'::regclass, (select coalesce(max(id),0) from activity_log), true);"

Notes

  • Unused files removed: the old Node script (migrate.js) and editor/venv folders were deleted from this folder. The Python script is the single source of truth.
  • connect_db.sh is kept as an optional helper to diff schemas using mysqldump/pg_dump.

Cleanup and .gitignore

Recommended cleanup commands (safe):

cd /mysql-to-postgresql-migration/migration-db
rm -f migration_state.json # optional: reset resume state
# rm -f connect_db.sh      # optional: if you don't use the schema diff helper

If this folder is under git, add these entries to .gitignore:

set_env.sh
migration_state.json

Troubleshooting

  • Ensure set_env.sh is correct and both databases are reachable
  • Managed PostgreSQL may not allow DISABLE_TRIGGERS=true; if you see permission errors, set it to false
  • If types clash (e.g., integer to timestamp), the script applies safe coercions; adjust data manually if needed for custom cases
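The integer-to-timestamp coercion mentioned above typically means treating MySQL integer columns as Unix epochs. A hedged sketch of such a coercion helper; `coerce_value` and its rules are illustrative and the script's real behavior may differ:

```python
from datetime import datetime, timezone

def coerce_value(value, pg_type: str):
    """Coerce a MySQL value to a PostgreSQL-friendly one (illustrative rules)."""
    if value is None:
        return None
    if pg_type == "BOOLEAN":
        return bool(value)  # TINYINT(1) stores 0/1 → False/True
    if pg_type == "TIMESTAMP" and isinstance(value, int):
        # Assumption: the integer is a Unix epoch in seconds
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return value

print(coerce_value(1, "BOOLEAN"))    # True
print(coerce_value(0, "TIMESTAMP"))  # 1970-01-01 00:00:00+00:00
```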

Technologies Used

  • Backend: Python
  • Database & Storage: MySQL, PostgreSQL

Project Details

Published: November 30, 2025