Wednesday, January 28, 2009

How to repair a corrupt Firebird database?


Disconnect users and disable incoming connections to the database

* make a copy of database file (or two copies) and work on that
* use GFIX with -v option to validate the database file
* use GFIX with -v and -f to do full validation

If problem is not too serious, you can try to backup the broken db and restore under a new name:

* use GFIX -mend to prepare corrupt database for backup
* use GBAK -b -g to backup the database. -g disables garbage collection
* use GBAK -c to restore backup to a new database.

Tuesday, January 27, 2009

Firebird SuperServer running at 100% CPU?

This problem usually shows up when you have an UPDATE intensive database and the sweep process has to kick in periodically to clean up the old versions of records. This is also known as the garbage collection. When it starts running on the SuperServer, it takes most of the server's resources until it completes (This is due to be fixed in Firebird 3). The default sweep interval setting is 20000, meaning that it would kick in when the difference between the Oldest Interesting Transaction (OIT) and Oldest Active Transaction (OAT) becomes larger than that. Each transaction checks this gap when it starts and starts the sweep if interval is breached. Sweep interval is set on database basis, i.e. you can have a different setting for each database.

Cure to this problem is to disable the automatic sweep (by setting the interval to zero) and running the sweep in the off-line hours. This works if you don't have a 24/7 application. To set the sweep interval, you can use your administration tool (Database properties window in FlameRobin) or gfix:

gfix -sweep -user SYSDBA -password ***** myserver:/path/to/db.fdb

Sweep is automatically done by GBAK while backing up the database, unless you use -g option which disables it. If you run regular nightly backup, there is no need for a special sweep job.

Beside this, there is a known problem when you set the database cache to a very high value (high compared to total RAM available to the machine). When this happens, swap space starts to get used and the whole system slows down.

Why are sweep and mend exclusive operations?


Mend is used to prepare a broken database for backup. As it is broken, there is a high chance that sweep would fail - it's enough gfix needs to deal with valid records, why should it bother with outdated ones. In any case, if you have a corrupted database and are using mend to fix it, it doesn't make much sense to clear up the broken database file from old record versions. If you really want it, you can try to run sweep separately.Please note that mend is not normally required - only when you have a corrupt database. In such cases, you need to mend, backup and if all goes well, restore that backup. The newly restored database would not have any garbage anyway, so there's no need for sweep at all.

Is it dangerous to run sweep on live database with active users?

No. Sweep is just another process accessing the database. All other users can keep working without any problems. Sweep only cleans up the records that nobody is using, so there is no chance to lose some data or see some invalid data.The only issue with sweep is that it consumes a lot of server's resoures, and if you have a large database it might hog the server until it completes. So, even there are no data integrity concerns, if performance is important, it's better to run sweep on off-hours, or periods of low user activity.