MySQL Binary Logs and Replication

If you’re a MySQL admin and use replication in either a master-slave or master-master setup, you no doubt had replication fail at one point or another.  This situation can be quite scary and fixing it can range from being simple to being very complex where rebuilding a database server is the only option.

Hopefully I can lend a few tips and tricks to help you resolve your issue.

Binary Logs

MySQL binary logs can be thought of as files that contain a record of SQL statements, in chronological order, that would alter your data.  Some types of commands that are stored are CREATE, INSERT, DELETE, etc.  These commands are stored with the position at which they occurred.  So, lets say for instance you had database at position 10 (real world positions are much larger), a dump taken at position 5, had the binary logs that contained 5-10.  Then, with only the dump and binary log, you can rebuild your database to position 10.

Backup Your Data

First things first, I hope you regularly backup your data.  There are 2 easy ways to backup your data, full dump with master data (contains binary log information) and databases only.  The problem with database only is that the most you can restore to is that point in time that snapshot was taken at and you would need to rebuild all your DB servers.  At least with master data, you have the binary logs to rebuild to in case your file became corrupted and can rebuild just the server that went down.

Master Data:
mysqldump –all-databases –master-data -u user -ppass > mydata.sql

Databases Only:
mysqldump –all-databases –single-transaction -u user -ppass > mydata.sql

Show Slave Status

Open mysql command line interface and connect it to your running MySQL server box, then run:

SHOW SLAVE STATUS \G

There are 4 fields that are extremely useful to look at Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master and Last_Error.

Slave_IO_Running – Should be yes unless STOP SLAVE was issued.

Slave_SQL_Running – Tells if the commands from thy logs are running.

Seconds_Behind_Master – This should normally be near 0, a high number (or NULL) indicates that the slave is far behind the master.  I should mention, this is not really the number of seconds behind the master, but more of the bigger the number, the farther behind master.

Last_Error – The last SQL statement that caused an error when running the relay log.  We want this to be blank.

Fixing The Error

I see a lot of posts out there that say just do SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; and while this will hopefully get you past the error, you really should know why the statement is failing.  If you just run that command, you will most likely lose data.

Ask yourself, if you skip the SQL statement, will it be bad for my data?  If it’s just a inserting logging data, it’s not such a big deal, but if it inserts an order into your ecommerce website, you will lose that order in the slave database.  The master and slave databases will then be inconsistent and will not mirror each other.  I first like to verify if that SQL statement executed on the master.  If it was an insert, see if it inserted, if it was a delete, see if it deleted.

After you verify the failing statement, and only after you know there aren’t any dire side effects, then you can run SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; .  This will skip the current error statement and try the next.  Remember, never skip more then 1 at a time, that this record may be a key to another insert (foreign key) and might cause a chain reaction, and never skip if you cant recover lost data.

Catastrophe Strikes!

You skipped a bunch of statements, perhaps your file is corrupted, or maybe the drive just died.  Any one of these things could happen, just know when to tap out.  As a last resort, I just go onto the master server and create a dump with –master-data as described above, then on the slave, make sure MySQL is not running, delete the data and binary log files, start MySQL and import the dump via command line.  After the import, run SHOW SLAVE STATUS to find out the master file and position, then run CHANGE MASTER TO to set all slave information and also using the values that were imported (and shown in slave status) set MASTER_LOG_FILE,  MASTER_LOG_POS appropriately (needs to be done in 1 command).  The CHANGE MASTER TO must contain all your slave info, then when done, START SLAVE.  This will catch up to the master in a short time.

Conclusion

There are many ways to deal with MySQL and as time goes on, you’ll pick up tricks.  Always have a local and remote backup and never panic.  And remember, call a DB admin if you can’t figure it out before it becomes unrepairable.