One of the more difficult-to-debug scenarios in .NET is failure to load an assembly, especially as you create new App Domains programmatically.
The system responsible for resolving and loading assemblies is called Fusion. Assembly loading can be complex, so the Windows SDK (and Visual Studio) ships with a tool called the Fusion Log Viewer (fuslogvw.exe). Suzanne Cook's blog contains extensive information about Fusion and its inner workings.
Most assembly loading issues can be resolved by judicious use of the Fusion Log Viewer. You can ask the tool to log binding successes and/or failures and then review the logs to see exactly what the system did and did not do on your behalf. Note that changes to Fusion logging settings only affect an application when it starts up, so you will need to stop and restart your application after making changes.
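If the viewer isn't handy, the same logging switches can be flipped in the registry. This is a minimal sketch, not a recommendation: it assumes the Fusion values ForceLog, LogFailures and LogPath under HKLM\SOFTWARE\Microsoft\Fusion, it uses a hypothetical log directory (C:\FusionLogs\), and it has to run elevated. A 32-bit process on 64-bit Windows will write to the 32-bit registry view instead.

```csharp
using System;
using System.IO;
using Microsoft.Win32;

public static class EnableFusionLogging
{
    public static void Main()
    {
        // Hypothetical log directory; Fusion expects it to exist already.
        const string logPath = @"C:\FusionLogs\";
        Directory.CreateDirectory(logPath);

        // Requires administrator rights because the key lives under HKLM.
        using (RegistryKey key = Registry.LocalMachine.CreateSubKey(@"SOFTWARE\Microsoft\Fusion"))
        {
            key.SetValue("ForceLog", 1, RegistryValueKind.DWord);    // log every bind, not only failures
            key.SetValue("LogFailures", 1, RegistryValueKind.DWord); // log failed binds
            key.SetValue("LogPath", logPath, RegistryValueKind.String);
        }

        Console.WriteLine("Fusion logging enabled; restart the application you want to trace.");
    }
}
```

Remember to set the values back to 0 (or delete them) when you're done, since logging every bind slows assembly loading down.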
Fusion Log Viewer never failed me until Wednesday, when Jim and I spent several hours debugging code that took us about 10 minutes to write. I wanted to document the process we went through here, in case others have the same problem, since we weren't able to find sufficient information about what was going on, and had to do a lot of guesswork and spelunking to find the solution.
Here's the problem as it manifested itself to us. If you're not interested in the arcane details of creating App Domains and using cross-App Domain remoting, then go on outside and play and be happy that these issues don't affect you. :)
When you create an App Domain and ask to load assemblies into it, those assemblies usually need to reside on the disk in a path that is relative to the App Domain's ApplicationBase path. So when we create an App Domain for running the tests, we set the ApplicationBase path to the path containing the DLL that is being tested, which should also contain xunit.dll (adding a reference to the DLL copies it into your BIN directory during build).
The xUnit.net console runner (xunit.console.exe) creates a separate App Domain to run the test code in, so that the test code is completely isolated from the runner. The console runner ships with its own copy of xunit.dll in the same directory, so the base App Domain and the test App Domain each have a copy of xunit.dll under their respective ApplicationBase paths. So far, so good. (Another day, we'll talk about what might happen if those two xunit.dll files are not the same version.)
When we create our test App Domain, we create an instance of a class named RemoteAssemblyCommand in the test App Domain. The short answer is that this class runs the tests for us, but it lives in the test App Domain instead of the base App Domain. We call AppDomain.CreateInstanceAndUnwrap() for the RemoteAssemblyCommand type. Because that type derives from MarshalByRefObject, the instance stays in the test App Domain and we get back a transparent proxy that allows us to call it remotely.
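For anyone who hasn't done this before, the setup looks roughly like the sketch below. It targets the .NET Framework (App Domains can't be created on .NET Core), and RemoteCommand is a hypothetical stand-in for RemoteAssemblyCommand, not the real xUnit.net class. The directory argument is also an assumption; it must contain a copy of the assembly that defines RemoteCommand, which is exactly the role xunit.dll plays in the test directory.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for RemoteAssemblyCommand. Deriving from
// MarshalByRefObject keeps the instance in the App Domain that created it.
public class RemoteCommand : MarshalByRefObject
{
    public string WhereAmI()
    {
        // Executes inside the test App Domain, not in the caller's.
        return AppDomain.CurrentDomain.FriendlyName;
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Assumption: the test directory is passed on the command line;
        // fall back to our own directory so the sketch runs without arguments.
        string testDirectory = args.Length > 0
            ? Path.GetFullPath(args[0])
            : AppDomain.CurrentDomain.BaseDirectory;

        var setup = new AppDomainSetup();
        setup.ApplicationBase = testDirectory;

        AppDomain testDomain = AppDomain.CreateDomain("TestDomain", null, setup);
        try
        {
            // The object is created in testDomain; we only hold a proxy.
            // This cast is the line that blew up for us under MSBuild.
            var command = (RemoteCommand)testDomain.CreateInstanceAndUnwrap(
                typeof(RemoteCommand).Assembly.FullName,
                typeof(RemoteCommand).FullName);

            Console.WriteLine(command.WhereAmI());
        }
        finally
        {
            AppDomain.Unload(testDomain);
        }
    }
}
```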
Whew! Trust me, setting this stuff up to work the first time is non-trivial. Even though Jim and I were both very familiar with App Domains, we still spent several weeks making sure all scenarios worked correctly.
The problem we hit came when we were making the MSBuild task. The key difference here is that the base App Domain is created by MSBuild.exe, not xunit.console.exe, so the base App Domain's ApplicationBase path is actually C:\Windows\Microsoft.NET\Framework\v2.0.50727. This is important. (Where's the blink tag when you need it? :-p)
We built our MSBuild task using code copied from the console runner, with calls to Console.WriteLine() replaced with Log.LogXxx(). We kept it really simple to start, just to make sure it was working correctly.
It failed. Spectacularly, and strangely.
Our call to CreateInstanceAndUnwrap() was telling us that the thing we created was not castable to RemoteAssemblyCommand. So we fired up the debugger and watched the whole process, including watching as RemoteAssemblyCommand was indeed instantiated properly inside the test App Domain. It's just that when we popped back out the other side, the object we got back claimed to be a MarshalByRefObject but knew nothing of its RemoteAssemblyCommand heritage.
We couldn't really understand why. Identical versions of xunit.dll were loaded into both App Domains successfully. We thought for a while that it might be because there were two different xunit.dll files being loaded (one from the directory that contained the MSBuild task, and one from the directory that contained the test DLL), but that wasn't the problem, either.
We fired up Fusion Log Viewer and poked through some of the resolution failures, but there weren't actually any failures logged around the time we hit the casting problem. On a lark, Jim suggested that we copy xunit.dll alongside MSBuild.exe, which resolved the problem. We definitely had a Fusion binding problem, but it wasn't being logged.
That didn't make any sense, though. The xunit.dll assembly was already loaded in our App Domain. Why would it feel the need to attempt to re-load it?
To confirm our suspicions that this was an unlogged binding failure, we attached to the AssemblyResolve event of the base App Domain (System.AppDomain.CurrentDomain.AssemblyResolve). Fusion raises this event to give you a chance to resolve any failed assembly lookups yourself. Sure enough, it got called at the point where we would normally have failed, and as we suspected, it was trying to load xunit.dll. We inserted code there to return the already loaded assembly, and everything worked again.
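A handler along these lines is enough to reproduce the workaround. It is a sketch rather than the exact code we shipped: LoadedAssemblyResolver and its method names are made up for illustration, and matching on the simple name alone (ignoring version and public key) is a simplifying assumption.

```csharp
using System;
using System.Reflection;

public static class LoadedAssemblyResolver
{
    // Returns an assembly already loaded in the current App Domain whose
    // simple name matches the requested display name, or null if none does.
    public static Assembly FindLoaded(string requestedName)
    {
        string simpleName = new AssemblyName(requestedName).Name;

        foreach (Assembly loaded in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (string.Equals(loaded.GetName().Name, simpleName,
                              StringComparison.OrdinalIgnoreCase))
            {
                return loaded;
            }
        }

        // Returning null tells the runtime we could not help with this bind.
        return null;
    }

    // Hooks the handler up; it is only consulted after normal probing fails.
    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve +=
            (sender, args) => FindLoaded(args.Name);
    }
}
```

Call LoadedAssemblyResolver.Install() once in the base App Domain, before the call to CreateInstanceAndUnwrap().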
We're still not sure why Fusion decided it wanted to re-load the xunit.dll assembly when it was already loaded into the App Domain. It is clear now, though, that Fusion had always been doing it, except in the case of the console runner, it always found xunit.dll sitting right there next to the runner and so it succeeded. It wasn't until we tried to do the same thing from MSBuild that it failed, because our base App Domain's ApplicationBase path was now pointing under Windows instead of at the console runner.
Hopefully this will help someone else when they hit the same problem. :)