I had a client with a broken LDAP database on their OS X Server today. The power had failed and the UPS didn’t do its thing, so the power was yanked from the server.
Upon rebooting, things weren’t looking too good: no one could authenticate to it over the network. Logging in as the localadmin on the console worked, but I couldn’t authenticate as anyone via the VPN (which was hosted on the server itself rather than on a firewall). Fortunately I could ssh in to it, and that’s where it all started.
The logs were showing a heap of errors with slapd, and my first guess was that the database had hosed itself when the power went out. It’s supposed to run an automatic recovery, but this wasn’t working either.
After checking the logs, I tried running slapd manually in tool mode to test the database:
[localadmin@hostname /var/log]$ sudo /usr/libexec/slapd -Tt
overlay_config(): warning, overlay "dynid" already in list
overlay_config(): warning, overlay "dynid" already in list
overlay_config(): warning, overlay "dynid" already in list
overlay_config(): warning, overlay "dynid" already in list
overlay_config(): warning, overlay "dynid" already in list
bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
config file testing succeeded
[localadmin@hostname /var/log]$
Well, that sure didn’t look good. A lot of the guides I saw online dealt with OpenLDAP running on Linux, and they generally recommended running slapd_db_recover. Unfortunately, this command doesn’t exist on Mac OS X. It turns out that it’s simply named db_recover, so I tried running it, but that produced a whole heap of errors:
[localadmin@hostname /var/log]$ sudo db_recover -h /var/db/openldap/openldap-data
db_recover: Log sequence error: page LSN 6 5493846; previous LSN 6 5840038
db_recover: Recovery function for LSN 6 5840196 failed on forward pass
db_recover: PANIC: Invalid argument
db_recover: PANIC: fatal region error detected; run recovery
... snip...
db_recover: PANIC: fatal region error detected; run recovery
db_recover: DB_ENV->open: DB_RUNRECOVERY: Fatal error, run database recovery
[localadmin@hostname /var/log]$
This was also not looking very good. After trying to interpret the error messages and reading a few other pages about OpenLDAP on the web, it sounded like the logs may have become corrupt, so I backed them up and deleted them.
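The backup-and-delete step can be sketched roughly as follows. This is an illustration, not the exact commands used here: it assumes the logs in question are the Berkeley DB transaction logs (files named log.* in the data directory) and that slapd is not running at the time. The function name backup_bdb_logs and the backup directory in the usage comment are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): copy Berkeley DB
# transaction logs (log.*) out of a data directory, then remove them.
# Assumes slapd is stopped so nothing is writing to the database.
backup_bdb_logs() {
    data_dir="$1"     # e.g. /var/db/openldap/openldap-data
    backup_dir="$2"   # any directory outside the data directory

    mkdir -p "$backup_dir" || return 1

    for f in "$data_dir"/log.*; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Only delete a log file once its copy has succeeded.
        cp -p "$f" "$backup_dir"/ && rm "$f" || return 1
    done
}

# Example usage (run as root; the backup path is an assumption):
#   backup_bdb_logs /var/db/openldap/openldap-data /var/db/openldap/log-backup
```

Copying before deleting means the original logs are still around if removing them turns out to make things worse. Only the log.* files are touched; the database files themselves stay where they are.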
I then ran db_recover once more, and this time it didn’t report any errors. Finally, running slapd in tool mode again didn’t throw up any errors, just a couple of warnings, so I rebooted the box and everything is looking a lot better now. Except for the VPN service still not letting me in, but that’s another problem for another day…