Panopticon Central

a blog on Visual Basic, .NET and other stuff

Roy points to Philip’s complaint that VB still exhibits problems with multi-language solutions that have been around since the VS 2002 beta. Philip’s completely correct, and the reason this bug still hasn’t been fixed, even though we’ve known about it since before VS 2002 shipped, bears some explanation. Specifically, the problem stems from a mistake we made when designing our background compilation system a very long time ago. Since I’ve been asked more than a few times about how background compilation works, this is an excellent chance to delve into that subject. So let me talk about background compilation for a while and then we’ll get back to Philip’s bug.

“Background compilation” is the feature in VB that gives you a complete set of errors as you type. People who move back and forth between VB and C# notice this, but VB-only developers may not realize that other languages such as C# don’t always give you 100% accurate Intellisense and don’t always give you all of the errors that exist in your code. This is because their Intellisense engines are separate, scaled-down compilers that don’t do full compilation in the background. VB, on the other hand, compiles your entire project from start to finish as Visual Studio sits idle, allowing us to immediately populate the task list with completely accurate errors and allowing us to give you completely accurate Intellisense. As Martha would say, it’s a good thing.

However, doing background compilation is a tricky prospect. The problem is that just as soon as you’ve finished compiling the project in the background, the user is likely to do something annoying like edit their code. Once they do that, the application you just finished compiling is now incorrect: it no longer reflects the current state of the user’s code. So, the question is: how do you handle that? The brute force way would be to throw away the entire result of the compilation and start over again. However, since Intellisense depends on compilation being mostly complete, this is impractical. Given a reasonably large project, you may never get the chance to provide Intellisense, because by the time you’re almost done recompiling the whole project, the user has had the chance to type in another line of code, thus invalidating all the work you just did. You’ll never catch up.

To deal with this, we implement a concept we call “partial decompilation.” When a user makes an edit, instead of throwing the entire compilation state away, we figure out the smallest amount of stuff we can throw away and then keep everything else. Since most edits don’t actually affect the project as a whole, this means we can usually throw out minimal information and get back to being fully compiled pretty quickly. Here’s how we do it: each file in the project is considered to be in one of the following states at any one time:

  • NoState: We’ve done nothing with the file.
  • Declared: We’ve built symbols for the declarations in the file, but we haven’t bound references to other types yet.
  • Bound: We’ve bound all references to types.
  • Compiled: We’ve emitted IL for all the properties and methods in the file.

When a project is compiled, all the files in the project are brought up to each successive state. (In other words, we have to have gotten all files to Declared before we can bring any file up to Bound, because we need to have symbols for all the declarations in hand before we can bind type references.) When all the files have reached Compiled, then the project is fully compiled.
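To make the phase ordering concrete, here is a minimal sketch in Python. It is not the VB compiler's code: the `State` enum, the `compile_project` function and the dictionary of files are all made up for illustration, and the real work done in each phase is reduced to a state change.

```python
from enum import IntEnum


class State(IntEnum):
    """The four per-file states from the list above, in order."""
    NO_STATE = 0
    DECLARED = 1
    BOUND = 2
    COMPILED = 3


def compile_project(files):
    """Raise every file one state at a time and return the steps taken.

    `files` maps a file name to its current State (hypothetical layout).
    """
    steps = []
    for target in (State.DECLARED, State.BOUND, State.COMPILED):
        # No file moves past `target` until every file has reached it.
        for name in sorted(files):
            if files[name] < target:
                files[name] = target
                steps.append((name, target.name))
    return steps


files = {"a.vb": State.NO_STATE, "b.vb": State.NO_STATE}
steps = compile_project(files)
print(steps)
```

The point of the outer loop is that both files are Declared before either one is Bound, which is exactly the constraint described above.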

Now let’s say that a user walks up to a project that’s reached Compiled state and makes an edit to a file. The first thing that we have to do is classify the kind of edit that the user made. (Keep in mind that “an edit” can actually be an extremely complex one if the user chose to cut and paste one block of code over another block of code.) Edits can generally be broken down into two classifications:

  • Method-level edits, i.e. edits that occur within a method or a property accessor. These are the most common and also the easiest to deal with because a method-level edit can never affect anything outside of the method itself.
  • Declaration-level edits, i.e. edits that occur in the declaration of a type or type member (method, property, field, etc). These are less common and can affect anyone who references them or might reference them anywhere in the project.

When an edit comes through, it’s first classified. If it’s a method-level edit, then the file that the edit took place in is decompiled to Bound. This involves the relatively small work of throwing away all the IL for the properties and methods defined in the file. Then we can just recompile all the methods and we’re back to being fully compiled. Not a lot of work. Say, though, that the edit is a declaration-level edit. Now, we have to do some more work.
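As a rough illustration of the classification step, the following Python sketch treats an edit as a range of line numbers and a file as a list of method-body line ranges. That representation is purely an assumption made for the example (the real compiler works on its own parse structures), and `classify_edit` and `apply_method_edit` are hypothetical names.

```python
def classify_edit(edit_start, edit_end, body_ranges):
    """Return "method" if the edit lies wholly inside one body, else "declaration".

    body_ranges is a list of (first_line, last_line) pairs covering only the
    statements inside each method or accessor, not the declaration lines.
    """
    for body_start, body_end in body_ranges:
        if body_start <= edit_start and edit_end <= body_end:
            return "method"
    return "declaration"


def apply_method_edit(file_states, edited_file):
    """A method-level edit only drops the edited file from Compiled to Bound."""
    if file_states[edited_file] == "Compiled":
        file_states[edited_file] = "Bound"


bodies = [(3, 8), (12, 20)]                    # two method bodies in one file
kind_inside = classify_edit(4, 5, bodies)      # typing inside the first method
kind_paste = classify_edit(7, 13, bodies)      # a paste spanning two methods
kind_between = classify_edit(10, 10, bodies)   # an edit on a declaration line

file_states = {"a.vb": "Compiled", "b.vb": "Compiled"}
if kind_inside == "method":
    apply_method_edit(file_states, "a.vb")
print(kind_inside, kind_paste, kind_between, file_states)
```

Note how the paste that spans two methods falls out as a declaration-level edit: it touches the declaration lines in between, so it cannot be treated as local to a single method.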

Earlier, when we were bringing files up to Bound state, we kept track of all the inter-file dependencies caused by the binding process. So if a file a.vb contained a reference to a class in b.vb, we recorded a dependency from a.vb to b.vb. When we go to decompile a file that’s had a declaration edit, we call into the dependency manager to determine which files depend on the edited file. We then decompile the edited file all the way down to NoState, because we have to rebuild symbols for the file. Then we go through and decompile all the files that depend on the edited file down to Declared, because those files now have to rebind all their name references in case something changed (for example, maybe the class the file depended on got removed). This is a bit more work, but in most cases the number of files being decompiled is limited and we’re still doing a lot less work than doing a full recompile.
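Here is a small Python sketch of the declaration-level case. The `DependencyManager` class and its methods are hypothetical stand-ins for the dependency manager mentioned above, and the sketch only follows direct dependents; the real process has more cases than this.

```python
from collections import defaultdict
from enum import IntEnum


class State(IntEnum):
    NO_STATE = 0
    DECLARED = 1
    BOUND = 2
    COMPILED = 3


class DependencyManager:
    """Records which files reference declarations in which other files."""

    def __init__(self):
        self._dependents = defaultdict(set)

    def record(self, referencing_file, referenced_file):
        # Called during binding: referencing_file depends on referenced_file.
        self._dependents[referenced_file].add(referencing_file)

    def dependents_of(self, file_name):
        return set(self._dependents.get(file_name, ()))


def decompile_for_declaration_edit(files, deps, edited_file):
    """Edited file drops to NoState; its direct dependents drop to Declared."""
    files[edited_file] = State.NO_STATE
    for name in deps.dependents_of(edited_file):
        files[name] = min(files[name], State.DECLARED)


deps = DependencyManager()
deps.record("a.vb", "b.vb")  # a.vb references a class declared in b.vb

files = {name: State.COMPILED for name in ("a.vb", "b.vb", "c.vb")}
decompile_for_declaration_edit(files, deps, "b.vb")
print({name: state.name for name, state in files.items()})
```

In this example c.vb never referenced b.vb, so it keeps its Compiled state, which is where the savings over a full recompile come from.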

This is the high-level overview of how it works. There are lots of little details that I’ve glossed over, and the process is quite a bit more complex than this, but you get the idea. I’m going to stop here for the moment and pick up the thread again in a few days, because there are a few more pieces of the puzzle that we have to put into place before we get to explaining the bug.


When I last left off with background compilation, we were talking about how compilation and decompilation worked within a project. But what about between projects? This is where things start to get a little more interesting and problematic.

The simplest case is two VB projects that have a reference between them. Because they're both source code projects, we can handle compilation and decompilation by using the same scheme that we use inside of projects - in essence, all the VB projects in a solution look like one big project that happens to produce multiple assemblies. (This is why Mike has his problem with the compiler complaining about two types with the same name in two projects that don't reference each other, although we're planning to fix that particular error for Whidbey.) So we just track dependencies between files regardless of which project they are in and everything works fine. There are a few added complications because now we need to track a project's compilation level, but they aren't really worth mentioning.

Now let's talk about referenced assemblies such as mscorlib, System, System.Windows.Forms, etc. When you reference a compiled assembly, we have to build a symbol table for it just like we have to for a source project. The way we handle this is to treat each assembly as its own "metadata project" with a single file in it (i.e. the assembly). Metadata projects go through the same compilation steps as source projects: NoState, Declared, Bound and Compiled. However, the compilation process for a metadata assembly can be much simplified, for the following reasons:

  • Since a metadata assembly is already compiled, the Compiled state has no meaning.
  • Metadata assemblies rarely, if ever, change.
  • Metadata assemblies only ever reference other metadata assemblies (since VS doesn't support circular builds, at least not in the IDE).

Since the Compiled state makes no sense for metadata assemblies, we ignore it. And since metadata assemblies don't really ever change (or change very infrequently) and only depend on other metadata assemblies, we can skip much of the decompilation work that I described in the first part of this discussion. In fact, we can make it very simple: when any metadata project changes, we just decompile every metadata project in the solution to NoState and decompile every source project down to Declared. Essentially, we throw away all metadata information and start over. Since metadata projects don't really change often, usually only when you add or remove a reference, this is a reasonable simplification.
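The simplified rule fits in a few lines of Python. As before, this is only a sketch: the `projects` dictionary and the `on_metadata_change` function are invented for the example and are not how the compiler actually stores its projects.

```python
from enum import IntEnum


class State(IntEnum):
    NO_STATE = 0
    DECLARED = 1
    BOUND = 2
    COMPILED = 3


def on_metadata_change(projects):
    """When any metadata project changes, reset all of them.

    Every metadata project goes back to NoState and every source project
    drops to Declared (or stays lower if it was already below that).
    """
    for project in projects.values():
        if project["kind"] == "metadata":
            project["state"] = State.NO_STATE
        else:
            project["state"] = min(project["state"], State.DECLARED)


projects = {
    "mscorlib": {"kind": "metadata", "state": State.BOUND},
    "System": {"kind": "metadata", "state": State.BOUND},
    "MyApp": {"kind": "source", "state": State.COMPILED},
}
on_metadata_change(projects)
print({name: p["state"].name for name, p in projects.items()})
```

Nothing here looks at which metadata project changed or who depends on it, which is what makes the rule cheap to implement and expensive to run if metadata changes often.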

Right? Wrong.

Unfortunately, much of the logic of the previous few paragraphs is faulty because it neglects something fairly significant: multi-language solutions. Let's say that you have a solution with a VB project and a C# project, and the VB project references the C# project. How does that C# project look to the VB project? It's not a source project because the VB compiler only understands VB code. Yes, that's right, class, it looks like a metadata project. And C# projects completely violate 2 of the 3 bullet points I listed above: they can change frequently (i.e. with every rebuild) and they can have references into VB source projects.

Here's where the train wreck happens. Let's say you've got three projects: VB1, VB2 and CS1. VB1 has a reference to VB2 and CS1. CS1 has a reference to VB2. When we go to load in the metadata for CS1, the compiler finds a reference to type Foo in VB2. But because we assume that metadata projects can't refer to source projects, we only look for Foo in the other metadata projects and fail to find it. So we mark Foo as a bad type. Now VB1 tries to call some method in CS1 that returns a Foo. As we compile the call, we notice that the type Foo is bad, so we generate an error that says something along the lines of "We can't find 'Foo' in 'VB2'. Add a reference to 'VB2.'" And, of course, the user ends up scratching his head because VB1 already has a reference to VB2. (Even better, it's possible to get error messages along the lines of "Can't convert type 'Foo' to type 'Foo'." if you try and use a Foo you got from CS1 and a Foo you got from VB2 together.)
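The failed lookup can be sketched like this in Python. The project table, the `resolve_metadata_reference` function and the `Widget` type are all hypothetical; the only thing taken from the scenario above is that references coming out of a metadata project are resolved against other metadata projects only.

```python
BAD_TYPE = None  # stand-in for the compiler's "bad type" marker


def resolve_metadata_reference(type_name, target_project, projects):
    """Resolve a type referenced from a metadata project.

    Mirrors the faulty assumption: the target is only searched if it is
    itself a metadata project, so source projects are never consulted.
    """
    target = projects[target_project]
    if target["kind"] != "metadata":
        return BAD_TYPE
    if type_name in target["types"]:
        return f"{target_project}.{type_name}"
    return BAD_TYPE


projects = {
    "VB1": {"kind": "source", "types": set()},
    "VB2": {"kind": "source", "types": {"Foo"}},
    "CS1": {"kind": "metadata", "types": {"Widget"}},
}

# CS1's metadata mentions Foo from VB2, but VB2 is a source project.
resolved = resolve_metadata_reference("Foo", "VB2", projects)
foo_exists = "Foo" in projects["VB2"]["types"]
print(resolved, foo_exists)
```

Foo is sitting right there in VB2's symbols, but the lookup never gets that far, so the type comes back bad and the confusing error follows.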

OK, you say, that's bad. But why don't you just allow CS1 to look up Foo in VB2's symbol table? Then this would all work. And it would. Until the first time you actually edited VB2 and caused the project to decompile. If you'll remember, we assumed that metadata projects don't need to participate in decompilation. So now VB2 decompiles and CS1's symbol table is left with a bogus pointer off into hyperspace because it didn't know how to handle the decompilation properly. (We could just decompile the world at this point, but the performance of that would be horrendous.)
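Python has no dangling pointers, but a stale object reference shows the same shape of problem. In the sketch below, the `Symbol` and `SourceProject` classes and the `valid` flag are invented for illustration; a plain variable plays the role of the entry in CS1's symbol table.

```python
class Symbol:
    def __init__(self, name):
        self.name = name
        self.valid = True  # hypothetical flag marking a live symbol


class SourceProject:
    def __init__(self, name, type_names):
        self.name = name
        self.type_names = list(type_names)
        self.symbols = {}
        self.declare()

    def declare(self):
        """Build fresh symbols for every declared type."""
        self.symbols = {n: Symbol(n) for n in self.type_names}

    def decompile(self):
        """Throw the symbols away, as a declaration edit would."""
        for symbol in self.symbols.values():
            symbol.valid = False
        self.symbols = {}


vb2 = SourceProject("VB2", ["Foo"])

# Suppose CS1 were allowed to reach into VB2 and hold on to Foo's symbol.
cs1_foo = vb2.symbols["Foo"]

# The user edits VB2: it decompiles and later rebuilds its symbols.
# Nobody tells CS1, because metadata projects do not take part in decompilation.
vb2.decompile()
vb2.declare()

still_same_symbol = cs1_foo is vb2.symbols["Foo"]
print(still_same_symbol, cs1_foo.valid)
```

After the rebuild, CS1 is holding a Foo that VB2 no longer knows about, while VB2 has a brand new Foo of its own.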

The real fix is to make metadata projects work like source projects: make them track inter-file dependencies and decompile properly. The problem is that this fix requires a fundamental reworking of our project system, which is a major, major undertaking because we made so many faulty assumptions about metadata projects. Rewriting a basic piece of the compiler like this requires an extended period of time for stabilization and a massive amount of testing to ensure we got everything right. Since we found this very late in the VS 2002 cycle, it was too late to make a change of this magnitude without causing a significant slip to the entire VS/.NET/ASP.NET product agglomeration. And the VS 2003 cycle was way too short to do this kind of work. This will be fixed, I can promise everyone, for VS 2005.

All that said, I think it's worth acknowledging that this was a screwup on our part, plain and simple. There are reasons why multi-language solution testing came online so late, but none of them really justify the pain and annoyance that this has caused (and still causes) for customers. It doesn't really make it better, but we really apologize for not fixing this problem in time. As always, we strive to do better.

I will add that there is a workaround for the problem. If you create a reference between VB projects using a file reference (i.e. using the ".NET" tab and browsing to the actual DLL) instead of a project reference (i.e. using the "Projects" tab), then you'll force the compiler to see all the references as metadata projects and you won't get weird errors. The downside is that you'll have duplicate symbol tables for VB projects that you reference that are also in your solution. It's an imperfect solution, but it's all there is for the moment.

So ends the lesson on background compilation... All of this material will be on the test.


Original URLs: http://panopticoncentral.net/archive/2004/02/25/276.aspx (part 1), http://panopticoncentral.net/archive/2004/03/19/291.aspx (part 2)

posted on Friday, April 16, 2004 10:43 AM