If you read my previous blog post, you'll know that I was suffering what I thought was a potentially catastrophic data loss.
A Quick Recap
It started with a drive which was exhibiting bad blocks and needed to be removed from my Windows Home Server. When the drive finally failed, Home Server refused to remove it, reporting "File conflicts prevented the hard drive removal".
I attached a 2TB drive (which became E:) and tried copying all the data from D:\Shares with XCOPY. I got lots of "File creation error - The device is not connected" errors.
I was starting to panic, but decided to do some spelunking around the file systems to see what I could find, and see if the missing data was recoverable.
How WHS Replicates Data
I'm making some educated guesses here about how WHS works, based on the recovery effort I made on Christmas Eve.
First, let's talk about mount points. The primary hard drive (in my system, the one in drive tray 1) is the C: drive of the system. All the other hard drives are mounted under C:\fs. All drives are formatted with NTFS. So far, there's nothing special going on here. You can browse to these folders, see what's on each drive, and interact with those files (though you should be careful not to edit or delete anything here).
Then we come to the magical D: drive, which is the combined storage of all the hard drives. It's a virtual drive of sorts. All the shares are created in D:\shares. When files are placed into the shares, they actually end up in the C:\fs mount area, depending on which drive the system decides a file should belong on (it will actually dynamically rearrange data so that all drives are used in a balanced fashion).
The system apparently maintains a database that records where to find each file when users access the shares. When a share is duplicated, identical copies of each file are placed on two different hard drives, and the database alternates, from file to file, which of the two drives it points to.
For a simplified example, assume you have a single duplicated share and two hard drives. You place 10 files in there. All 10 files will exist on both drives, but in order to level the wear on each drive, the database index will say to find 5 of the files on drive 1, and the other 5 files on drive 2.
So when a drive has failed, but the system refuses to remove the drive, you have a situation where some of the files pointed to by the database will not be readable (because of the failed drive). That's exactly what the XCOPY errors above show: the files which refuse to be read are the ones the database says reside on the dead drive.
But the data is duplicated, so it also exists on another drive. We just don't know which drive, and the database won't look at the other drive for us.
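To make that concrete, here is a toy Python model of the two-drive example. It only illustrates my guesses above; the names (files, drives, index) are made up for the sketch and have nothing to do with how WHS actually stores its database.

```python
# Toy model of the behaviour guessed at above; NOT WHS's real data structures.
files = [f"file{i:02d}.dat" for i in range(10)]

# Duplication: every file physically exists on both drives.
drives = {"drive1": set(files), "drive2": set(files)}

# The index points each file at one drive, alternating between the two.
index = {name: ("drive1" if i % 2 == 0 else "drive2")
         for i, name in enumerate(files)}

failed = "drive2"

# Reads through the share follow the index only, so every file
# indexed on the failed drive produces an error.
unreadable = [name for name in files if index[name] == failed]

# The same files are still sitting on the surviving drive.
recoverable = [name for name in unreadable
               if any(name in contents
                      for drive, contents in drives.items()
                      if drive != failed)]

print(len(unreadable), "files unreadable through the share")
print(len(recoverable), "of those still present on another drive")
```

In this model half of the files fail when read through the share, yet every one of them is still present on the other drive, which matches what I was seeing.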
I installed the Windows Server 2003 Resource Kit so I could get access to ROBOCOPY, and ended up with this command to fill in the gaps for me:
for %y in (Music,Photos,Public,Television,Videos) do @for /D %x in (C:\fs\*.*) do robocopy /ndl /xx /e /njh /njs /xa:hs %x\DE\shares\%y E:\%y
The first list is the names of all the shares I wanted data from, and the inner loop walks every drive mounted under C:\fs. The E: drive is where I was copying the data to. I used ROBOCOPY because, by default, it skips files that already exist unchanged at the destination, so it just fills in the holes (I already had all the other data copied).
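For anyone who would rather see the logic spelled out, this is roughly what that one-liner does, sketched in Python. It is not what I ran, and it is simpler than ROBOCOPY: the function name fill_gaps is made up, it only checks whether the destination file exists (ROBOCOPY also compares size and timestamp), and it does not skip hidden or system files the way /xa:hs does.

```python
import shutil
from pathlib import Path


def fill_gaps(mount_root, shares, dest_root):
    """Copy files missing from dest_root out of each drive's DE/shares/<share>."""
    copied = []
    for share in shares:
        # Every directory under the mount root is treated as one physical drive.
        for drive in sorted(Path(mount_root).iterdir()):
            source = drive / "DE" / "shares" / share
            if not source.is_dir():
                continue  # this drive holds no part of this share
            for src in source.rglob("*"):
                if not src.is_file():
                    continue
                target = Path(dest_root) / share / src.relative_to(source)
                if target.exists():
                    continue  # already rescued by an earlier copy
                target.parent.mkdir(parents=True, exist_ok=True)
                try:
                    shutil.copy2(src, target)
                except OSError:
                    # Unreadable on this drive; drop any partial copy so
                    # the duplicate on another drive can fill the gap.
                    target.unlink(missing_ok=True)
                    continue
                copied.append(target)
    return copied
```

On my system the call would have looked something like fill_gaps("C:\\fs", ["Music", "Photos", "Public", "Television", "Videos"], "E:\\"). Because an existing destination file is never overwritten, it is safe to run after a partial XCOPY.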
A little detective work (and ROBOCOPY) rescued what was certain to be a horrible data loss and put it into the WIN column. I'm still not very happy that WHS refused to remove the dead drives, which meant I had to completely repave the WHS machine and copy the data back onto it. At least, though, I learned a lot about what WHS was doing and will be more comfortable if there's a future drive issue like this.