Monday, 22 May 2006

Why keep a journal

Well, just the other day I was called by a colleague of my father's and asked to help out with his ailing Mac G5. So off I went to pay a visit, thinking that I would delete a few cache files, run the Mac OS X maintenance stuff and whatnot, and be on my way.
What in fact happened is that I ended up taking his hobbling, unhappy Mac and euthanizing the poor thing. So what went wrong? Well, it boils down to the Mac having a journaled filesystem and yet still having a very sick filesystem. So much for journaling.
When I arrived I was informed that the system had occasionally been showing the spinning beach ball of death and, with one particular account, was refusing to empty the trash. Apparently an attempt to run the repair permissions utility had resulted in a few repairs, after which the program stopped responding with no errors in sight. So I deleted the trash contents for the rogue account by hand from the command line using a power user account, figuring that whatever program thought it was using those files would, when the account next logged in, either complain or, more likely, create new cache files. I then ran the disk verify tool. Oh dear. It reported problems with nodes and the catalogue. As this was the boot volume, I obviously couldn't run the repair program on it.
So, before taking any action on the filesystem, it seemed sensible to back up the all-important data first. We popped in a CD and asked the Mac to dump a load of stuff onto it. While this was running I went off to do something a little more interesting. What I came back to a little while later was an error message and a CD which, when put back into the Mac, couldn't be mounted. Oddly enough, a later attempt at reading the CD in a Windows machine showed that not everything had been written to it but a fair amount had, so why the Mac wouldn't read it is anyone's guess. Anyway, this made me believe that trying to back up the dodgy filesystem was a doomed endeavor (200+ GB of doomed endeavor in fact, so I couldn't even "dd" it somewhere, as I don't have that much storage space anywhere else).
Well, having failed to back up the data, I booted from the install CD and tried the repair program from there. Another oh dear moment ensued as I was informed that the repair tool couldn't salvage the disk. Next, I risked using fsck_hfs in a last-ditch attempt and was informed by that program that it couldn't fix the disk either. So, as the system was covered by AppleCare, we phoned Apple, who confirmed I had taken appropriate action but recommended using the TechTool Deluxe CD from the AppleCare pack, which has another disk recovery program. The irony here, as I found out, is that the TechTool Deluxe program is no better equipped to repair dodgy filesystems than the built-in Apple tool, but the company that provides it will sell you a "professional" rather than Deluxe product that is apparently quite good at repairing such problems.
Anyway, having done all these things and concluded that I couldn't fix the disk, I decided I may as well go back to trying to back up the data to CD, having at this point found that the CD was readable on an NT machine. This is where I discovered I had euthanized the sickly Mac. You see, having tried and failed to repair the disk, the kindly repair programs had marked the disk as "bad". As a result I could no longer boot into even a limping version of Mac OS X, because at each boot attempt the OS would notice and try to fix the "bad" disk, which would obviously fail, sadly resulting in the Mac switching off.
I then had the slightly embarrassing task of explaining to the Mac's owner that my attempts to fix his problems appeared to have made things worse. The explanation is that, because the disk is journaled, the OS assumes it can't be bad, so there is no point checking it at boot. The system therefore boots and then throws odd errors or behaves strangely when it tries to use the bad bits of the filesystem, failed backups being one example. However, as mentioned before, the programs that would normally repair such problems had failed and so marked the disk as bad, thus overriding the OS's normally blasé approach to filesystem integrity at boot-up.
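To make that boot-time behaviour concrete, here is a rough sketch of the decision as I understand it from what I saw. It is a toy model only, not Apple's actual logic, and the names (Volume, journaled, marked_bad, needs_boot_check) are ones I have made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    journaled: bool   # hypothetical flag: journaling is enabled on the volume
    marked_bad: bool  # hypothetical flag: a failed repair left the volume flagged as damaged


def needs_boot_check(vol: Volume) -> bool:
    """Return True if this toy boot sequence would insist on a full check and repair."""
    if vol.marked_bad:
        # A volume flagged as bad is always checked, journal or no journal.
        return True
    # Otherwise a journaled volume is trusted and the full check is skipped.
    return not vol.journaled


# The sick Mac before my repair attempts: journaled, not flagged, so it booted.
before = Volume(journaled=True, marked_bad=False)
# The same Mac afterwards: the failed repairs had flagged the disk.
after = Volume(journaled=True, marked_bad=True)
print(needs_boot_check(before), needs_boot_check(after))
```

In this model the damaged disk only gets checked at boot once something has flagged it, which matches the order of events above: it limped along until the repair tools gave up on it.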
Anyway, not wanting to leave the guy in a worse state than when I arrived, I invested in a copy of DiskWarrior, which I found recommended in several books and favorably compared against other disk repair tools in web reviews. This tool did solve the filesystem integrity problem, to the point that the disk now passed Disk Utility verification, meaning we could mount the disk as a FireWire target and at least recover some data. The disk still wouldn't boot, however.
To get the disk to actually boot, I then had to boot from the Apple install CD and perform a rather time-consuming "Archive and Install", which re-installs the Tiger OS around the user data. After this, and the then obligatory software update to get back up to date, voila: we had a working system again.
So, what is my conclusion from this? Well, it's simple really: I still don't trust journaled filesystems. This is not limited to journaled HFS+. I have had similar issues with journaled filesystems on Solaris machines, and it makes me wonder whether journaled filesystems in fact don't protect your data any better, but rather simply make machine boot-ups quicker while hiding some rather nasty issues away from you. Let's just hope the ZFS (or whatever it's now called) filesystem from Sun is all it's hyped to be rather than just another evolution of the existing journaling technologies I have had problems with. I'd also like to know why it is that fsck_hfs doesn't understand journal files. Surely a repair program for a filesystem should understand how to repair all the varieties of that filesystem, rather than being likely to cause additional damage. I hope Sun provides a working fsck_zfs or whatever for their ZFS filesystem. If nothing else, it makes people like me a little more comfortable to know you can verify a disk on-line and, if the journaling has let you down, repair it safely and simply.
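For what it's worth, here is a very simplified sketch of why I suspect a journal can leave this sort of damage hidden. It is a toy model under my own assumptions, not how HFS+ or any real filesystem implements journaling: the "disk" is just a dictionary, and replay is a name I made up.

```python
def replay(journal, disk):
    """Apply committed transactions from the toy journal to the toy disk, then empty the journal."""
    for txn in journal:
        if txn["committed"]:
            disk.update(txn["writes"])
        # Uncommitted transactions are simply dropped.
    journal.clear()
    return disk


# Case 1: a crash part-way through an update. Replay tidies this up nicely.
disk = {"catalog": "v1", "extents": "v1"}
journal = [
    {"writes": {"catalog": "v2", "extents": "v2"}, "committed": True},
    {"writes": {"catalog": "v3"}, "committed": False},
]
replay(journal, disk)
print(disk)

# Case 2: damage that never went through the journal. Replay has nothing to say about it.
disk["catalog"] = "garbage"
replay(journal, disk)
print(disk)
```

In the first case the disk ends up consistent at "v2", and in the second the garbage stays exactly where it was. If the OS treats a clean journal replay as proof that the disk is healthy, the second kind of problem goes unnoticed until something trips over it.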
