Friday, 18 July 2008

Time Machine fails on 10.5.4 when interrupted by sleep

Ok, so I've been using my QNAP TS-109 NAS box as a remote Time Machine server for my various home Macs for several months now. Everything's been going swimmingly and for the first time in years I've felt a little more secure knowing that my files are all backed up to several generations, without me having to do anything.

However... in the last couple of days, first one then another of the backups failed and wouldn't restart. The symptoms were Time Machine errors saying that the target volume was read-only.

I ran a couple of quick checks -- first seeing if I could open the sparse bundle used for the backup image directly and try to repair it using Disk Utility -- but it wouldn't even mount. Then I checked in the logs to see exactly what happened when.

It turns out that in both cases, the backup failed when the machine was put to sleep during a Time Machine run. When the Mac woke up, it tried to automatically reconnect to the SMB share and logged a failure message such as the following:

Jul 16 15:34:48 imac kernel[0]: smbfs_smb_reopen_file: Reopen 2b with open deny modes failed because the modify time has changed was 1216197810s 0ns now 1216197813s 0ns!

These messages were often succeeded by various other disk errors and culminated in the backupd process returning an error when it was trying to do something. I have two examples of the backupd failures:

Jul 16 15:34:53 imac /System/Library/CoreServices/backupd[72423]: Error -47 deleting backup: /Volumes/Backup of iMac/Backups.backupdb/iMac/2008-07-15-084358

and

Jul 17 21:42:05 ruth_macbook /System/Library/CoreServices/backupd[6543]: Error: (-50) copying /Users/ruth/Library/Calendars/Calendar Sync Changes to /Volumes/Backup of Ruth's MacBook/Backups.backupdb/Ruth's MacBook/2008-07-17-211219.inProgress/A11E4033-1FAB-457A-96FA-B23B3FED2579/Macintosh HD/Users/ruth/Library/Calendars

but both appear to be due to the smbfs error above.

I've also just recently upgraded all my machines to 10.5.4. Given that it's quite unlikely that two machines would start to fail so close to each other, I suspect that the latest OS upgrade may have introduced a bug.

Other people seem to have encountered similar issues, and not only with unsupported NAS boxes but with official Time Capsules too.

I tried attaching the backup image without mounting it and then running Disk Utility (as suggested here), but Disk Utility wasn't happy then either -- giving me "sibling link" errors. Running fsck by itself just results in "BAD SUPER BLOCK: MAGIC NUMBER WRONG" which is kind of worrying. I haven't yet tried using DiskWarrior, though it's probably worth the $99 for some peace of mind.

I'm going to try a couple of things next:

  1. Check the NAS disk itself for corruption, just in case.
  2. Buy DiskWarrior and see if that fares any better.

I guess after that, I may have to wipe the backup and start again. At least it's a backup and I don't need anything from it right now (famous last words...).

A few days later...

The NAS disk itself seems fine -- I successfully backed up to another image on the same disk with no trouble.

I bought DiskWarrior and ran it against one of the corrupted backup images, but this didn't work either.

Please post a comment if you've had something similar happen to you. Especially if you've been able to fix things!

What I have noticed is that the Time Machine Options (under the Time Machine Preference pane) has a checkbox labelled "Back up while on battery power". Since I only tend to put my machines to sleep when they're on battery power, avoiding backups completely during this time should avoid this corruption issue for the moment.

Backups are now going fine again, just not on battery power.