Chris' journal has moved to chris.boyle.name - Notes from a disk rescue
Chris Boyle


Notes from a disk rescue [Jul. 3rd, 2008|04:09 pm]

[Mood | annoyed]
[Music |Mesh - Not Prepared]

For 9 or 10 months, I've had a media server at home running MythTV on Ubuntu. It does the usual DVR operations and network streaming fairly well (there's only one tuner, so we can only receive one channel at a time, but it's quite possible to, for example, watch two recordings at once in different places). On Monday, we noticed that the machine was complaining of disk errors in syslog.

Rescuing 500GB is not fun, but every last byte was eventually recovered. The main tool used was GNU ddrescue (not to be confused with dd_rescue). It can do partial copies (leaving blanks where errors occur), keep a state file to cope with interruptions, and go back to the areas with errors and do binary chops to retrieve as much information as possible. That's essentially what I did: a first pass with -n, then a few runs with -r 3, using a log file (and sacrificing a steady stream of small rodents) throughout.

A very odd thing happened when the first pass hit the first few errors, at about 380GB: the copying speed dropped from about 50MB/s to about 7MB/s. At first I thought some increased error correction/paranoia had kicked in somewhere in the hardware, but then I realised the disk or head might be in a physically disadvantageous state, so as a shot in the dark, I stopped the copy, did hdparm -Y (the sleep command) and resumed. This worked; it was 50MB/s again. Determining why this voodoo worked is left as an exercise for the reader. :-)
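For anyone wanting to repeat the two-stage ddrescue run described above, here is a rough Python sketch that only builds and prints the two command lines; it does not run anything. The device paths `/dev/sdX` and `/dev/sdY`, the file name `rescue.log` and the helper `rescue_commands` are placeholders invented for illustration, not what was actually typed during the rescue.

```python
def rescue_commands(source, target, logfile, retries=3):
    """Return the argv lists for a two-stage ddrescue run.

    All arguments are hypothetical placeholders: substitute the real
    failing disk, the replacement disk and a log file kept on a third,
    healthy disk.
    """
    # Stage 1: -n grabs the easy areas first and skips the slow work
    # on regions that produce errors.
    first_pass = ["ddrescue", "-n", source, target, logfile]
    # Stage 2: -r N goes back over the bad areas, retrying up to N times.
    # Reusing the same log file lets ddrescue resume where it left off.
    retry_pass = ["ddrescue", "-r", str(retries), source, target, logfile]
    return [first_pass, retry_pass]


# Print the commands rather than executing them; pointing ddrescue at
# the wrong target destroys data, so check the paths by hand first.
for argv in rescue_commands("/dev/sdX", "/dev/sdY", "rescue.log"):
    print(" ".join(argv))
```

The retry stage can be repeated as many times as the disk will tolerate, since the log file records what has already been recovered.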

Getting into an environment where I could run ddrescue was slightly complicated. The only big enough disk I had available to rescue onto was the replacement disk, both disks are SATA, and no machine in the building has more than two SATA ports, so I needed to borrow the IDE disk from [info]dougalwuff's machine to boot from. (I didn't like the idea of doing this from a CD, needed somewhere to keep the state file, and had no USB keys in the building.) Since the new disk is twice the size, I could have done a temporary install onto it first, but I wouldn't then have been able to easily grow the filesystem afterwards to the full size of the disk (that would mean asking said temporary install to obliterate itself). As a final piece of silliness, I will of course need to redo grub on the new disk by running grub-install on the media server itself (running it on the spare machine didn't seem to produce something bootable).

Comments:
From: [info]robhu.livejournal.com
2008-07-03 11:48 am (UTC)


Thanks for that - that's helpful to know. I have a friend whose 2.5" SATA disk has physical damage, which meant I was unable to mount her HFS+ partition. Perhaps if I make an image with GNU ddrescue I'll be able to mount the image?

Did you not consider saving the log file in a ram disk (e.g. in /dev/shm or something), or are the logs very big (or your memory very small ;-))?
From: [info]shortcipher
2008-07-03 12:03 pm (UTC)


You may indeed be able to mount a copy created with ddrescue, perhaps after using an fsck-like tool (if such exists for HFS+) to turn FS-with-holes into something valid.

The log/state file is plain text, each line being an offset, size and status character for a region of the device (the logger merges regions when appropriate, so you aim to end up with a log saying approximately "0 500000000 +"). I didn't put it in RAM because otherwise, had there been a power cut, I would have had no record of which bits had been successfully copied, and would have had to start again, which is not good on a disk with a limited life expectancy. :-)
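To make the region merging concrete, here is a small Python sketch that reads lines in the simplified "offset size status" form described above and merges neighbouring regions that share a status. It is an illustration under assumptions, not ddrescue's own code: the function names are made up, and a real log file has extra header lines that this sketch makes no serious attempt to handle (it skips `#` comments and anything that is not three fields).

```python
def parse_regions(text):
    """Parse 'offset size status' lines into (offset, size, status) tuples."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) != 3:
            continue  # not a region line in the simplified format assumed here
        # Base 0 accepts both decimal ("1024") and hex ("0x400") numbers.
        regions.append((int(fields[0], 0), int(fields[1], 0), fields[2]))
    return regions


def merge_regions(regions):
    """Merge adjacent regions that have the same status character."""
    merged = []
    for offset, size, status in sorted(regions):
        if merged:
            prev_offset, prev_size, prev_status = merged[-1]
            if prev_status == status and prev_offset + prev_size == offset:
                merged[-1] = (prev_offset, prev_size + size, status)
                continue
        merged.append((offset, size, status))
    return merged


def rescued_fraction(regions):
    """Fraction of the listed bytes marked '+' (successfully copied)."""
    total = sum(size for _, size, _ in regions)
    good = sum(size for _, size, status in regions if status == "+")
    return good / total if total else 0.0


# A made-up log with one bad patch in the middle.
sample = """
0 1000 +
1000 24 -
1024 976 +
"""
print(merge_regions(parse_regions(sample)))
print(rescued_fraction(parse_regions(sample)))
```

Once a retry pass turns the middle region into "+", merging collapses the three lines into one, which is the "one big + region" end state described above.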
From: [info]mas90.livejournal.com
2008-07-03 04:44 pm (UTC)


...Glad to see that my random spare box (now property of [info]dougalwuff) has yet further uses :-)