Tuesday, August 28, 2012

Recovering from a Mysql crash

Our Bugzilla and Mediawiki was down today due to MySQL corruption. Mysql daemon was not running and attempting to connect would give us:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
The error log showed:
120828 14:55:33 InnoDB: Page checksum 3094081080, prior-to-4.0.14-form checksum 1158860183 InnoDB: stored checksum 841888054, prior-to-4.0.14-form stored checksum 808333620 InnoDB: Page lsn 1529887023 1298231855, low 4 bytes of lsn at page end 824192288 InnoDB: Page number (if stored to page already) 942551086, InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 976564768 InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 34110.InnoDB: You may have to recover from a backup. InnoDB: It is also possible that your operating InnoDB: system has corrupted its own file cache InnoDB: and rebooting your computer removes the InnoDB: error. InnoDB: If the corrupt page is an index page InnoDB: you can also try to fix the corruption InnoDB: by dumping, dropping, and reimporting InnoDB: the corrupt table. You can use CHECK InnoDB: TABLE to scan your table for corruption. InnoDB: See also http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. InnoDB: Ending processing because of a corrupt database page. 120828 14:55:33 [Note] Plugin 'FEDERATED' is disabled. InnoDB: Log scan progressed past the checkpoint lsn 0 2062642898 120828 14:55:33 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Doing recovery: scanned up to log sequence number 0 2062644241 120828 14:55:33 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 6 4 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed 120828 14:55:33 InnoDB: Started; log sequence number 0 2062644241 120828 14:55:33 [Note] Event Scheduler: Loaded 0 events 120828 14:55:33 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.49-1ubuntu8.1' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)
So visiting the page as suggested gave us the hint we needed.

Solution 1

There is no way you can guarantee no disruption. The database will need to go down for repair and no process should be able to interfere with that so I recommend to listen to a different port as you ca see below:
  1. Communicate there will be disruption to the proper channels. The database will be unavailable
  2. Edit the mysql configuration file to force a MySQL InnoDB recovery and start mysql in a different port:
    $ sudo vi /etc/mysql/my.cnf
    ...
    [client]
    port            = 13306
    ...
    ...
    [mysqld]
    innodb_force_recovery = 2
    ...
    
  3. Restart MySQL
  4. Backup the databases:
    mysqldump --routines -P 13306 -u root -p --databases bugzilla wiki > ~/db-backup.sql
    
  5. Comment out the "innodb_force_recovery" line from /etc/mysql/my.cnf
  6. Restart MySQL
  7. Restore from backup
    mysql -P 13306 -u root -p < ~/db-backup.sql 
    
  8. Change back the port to the original
  9. Restart MySQL

Solution 2

If the above does nnot fix the issues you can probably try a couple of other recipes, however the one that should not fail is reinstalling mysql:
  1. Communicate there will be disruption to the proper channels. The database will be unavailable
  2. Edit the mysql configuration file to force MySQL listening in a different port:
    $ sudo vi /etc/mysql/my.cnf
    ...
    [client]
    port = 13306
    ...
    
  3. Restart MySQL
  4. Make sure you have a file (like security.sql) with all permissions saved in a secure place. You can extract such information from the current db using show-grants-mysql.sh
  5. Backup the databases:
    $ databases="mydb"
    $ mysqldump --routines -P 13306 -u root -p --databases $databases > ~/db-backup.sql
    
  6. Backup my.cnf in a safe place (You should really keep this in a version control repository and use POB recipes to deploy it in the server instead of touching the file directly.
  7. Reinstall mysql using a POB recipe but from the local machine:
    $ wget https://raw.github.com/nestoru/pob-recipes/master/common/mysql/mysql-reinstall.sh
    $ chmod +x mysql-reinstall.sh
    $ ./mysql-reinstall.sh
    
  8. Restore the my.cnf file from your version control or (less ideal as said above) from the backup you have made
  9. Change the port to 13306 as explained above
  10. Restart mysql (service mysql restart)
  11. Restore mysql databases from the backup
    $ mysql -P 13306 -u root -p < ~/db-backup.sql
    
  12. Run security.sql script to set the correct permissions. Be sure to delete that file after
  13. Change port to original 3306 a explained before
  14. Restart mysql (service mysql restart)

3 comments:

Deryl Spielman said...

Be careful with the step find / -name 'mysql*' -exec rm -fr {} \; as this could remove any files on the filesystem that start with mysql. We also had an issue with php5 mysql dependencies being uninstalled when using aptitude, which cause our Wikimedia to fail.

Nestor Urquiza said...

@Deryl, Thanks for the feedback. I have updated now the command for:

#Take a peak at the mysql files still in your system and decide if you still need them
$ find / -name 'mysql*'

Nestor Urquiza said...

I actually built a POB recipe that does the trick thanks to the apt-get purge. Post has been updated, get the file from https://raw.github.com/nestoru/pob-recipes/master/common/mysql/mysql-reinstall.sh

Followers