Released: ReliabilityPatterns – a circuit breaker implementation for .NET

How to get it

Library: Install-Package ReliabilityPatterns (if you’re not using NuGet already, start today)

Source code: hg.tath.am/reliability-patterns

What it solves

In our homes, we use circuit breakers to quickly isolate an electrical circuit when there’s a known fault.

Michael T. Nygard introduces this concept as a programming pattern in his book Release It!: Design and Deploy Production-Ready Software.

The essence of the pattern is that when one of your dependencies stops responding, you need to stop calling it for a little while. A file system that has exhausted its operation queue is not going to recover while you keep hammering it with new requests. A remote web service is not going to come back any faster if you keep opening new TCP connections and mindlessly waiting for the 30 second timeout. Worse yet, if your application normally expects that web service to respond in 100ms, suddenly starting to block for 30s is likely to deteriorate the performance of your own application and trigger a cascading failure.

Electrical circuit breakers ‘trip’ when a high current condition occurs. They then need to be manually ‘reset’ to close the circuit again.

Our programmatic circuit breaker will trip after an operation has more consecutive failures than a predetermined threshold. While the circuit breaker is open, operations will fail immediately without even attempting to be executed. After a reset timeout has elapsed, the circuit breaker will enter a half-open state. In this state, only the next call will be allowed to execute. If it fails, the circuit breaker will go straight back to the open state and the reset timer will be restarted. If it succeeds, the service is taken to have recovered, the circuit breaker closes, and calls start flowing normally again.
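To make the state transitions concrete, here is a minimal sketch of such a state machine. It is not the ReliabilityPatterns source: every name in it (SketchCircuitBreaker, BreakerOpenException, the injectable clock) is hypothetical, and it simplifies the concurrency handling a production implementation would need.

```csharp
using System;

// Hypothetical names throughout; this is an illustration, not the library's code.
public enum BreakerState { Closed, Open, HalfOpen }

public class BreakerOpenException : Exception
{
    public BreakerOpenException() : base("Circuit breaker is open; call was not attempted.") { }
}

public class SketchCircuitBreaker
{
    private readonly object _gate = new object();
    private readonly int _failureThreshold;
    private readonly TimeSpan _resetTimeout;
    private readonly Func<DateTime> _clock;   // injectable so the timeout can be tested
    private int _consecutiveFailures;
    private DateTime _openedAt;
    private bool _trialInFlight;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public SketchCircuitBreaker(int failureThreshold, TimeSpan resetTimeout, Func<DateTime> clock = null)
    {
        _failureThreshold = failureThreshold;
        _resetTimeout = resetTimeout;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public void Execute(Action operation)
    {
        lock (_gate)
        {
            // Once the reset timeout has elapsed, an open breaker becomes half-open.
            if (State == BreakerState.Open && _clock() - _openedAt >= _resetTimeout)
                State = BreakerState.HalfOpen;

            // While open, fail immediately without attempting the operation.
            if (State == BreakerState.Open)
                throw new BreakerOpenException();

            // While half-open, only one trial call is allowed through.
            if (State == BreakerState.HalfOpen)
            {
                if (_trialInFlight) throw new BreakerOpenException();
                _trialInFlight = true;
            }
        }

        try
        {
            operation();
        }
        catch
        {
            lock (_gate)
            {
                _trialInFlight = false;
                _consecutiveFailures++;
                // A failed trial call, or more consecutive failures than the threshold, trips the breaker.
                if (State == BreakerState.HalfOpen || _consecutiveFailures > _failureThreshold)
                {
                    State = BreakerState.Open;
                    _openedAt = _clock();
                }
            }
            throw;
        }

        lock (_gate)
        {
            // A success closes the circuit and clears the failure count.
            _trialInFlight = false;
            _consecutiveFailures = 0;
            State = BreakerState.Closed;
        }
    }
}
```

Note that the original exception is always rethrown, so the caller sees the same failure it would have seen without the breaker; only the fast-fail case introduces a new exception type.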

Writing all this extra management code would be painful. This library manages it for you instead.

How to use it

Taking advantage of the library is as simple as wrapping your outgoing service call with circuitBreaker.Execute:

// Note: you'll need to keep this instance around
var breaker = new CircuitBreaker();

var client = new SmtpClient();
var message = new MailMessage();
breaker.Execute(() => client.Send(message));

The only caveat is that you need to manage the lifetime of the circuit breaker(s). You should create one instance for each distinct dependency, then keep this instance around for the life of your application. Do not create different instances for different operations that occur on the same system.
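One way to keep a single long-lived instance per dependency, without a container, is a small registry keyed by dependency name. The sketch below is an assumption about how you might structure this, not something the library ships; PerDependencyRegistry and its For method are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: hands out one long-lived instance per dependency name.
public class PerDependencyRegistry<T>
{
    private readonly ConcurrentDictionary<string, Lazy<T>> _instances =
        new ConcurrentDictionary<string, Lazy<T>>(StringComparer.OrdinalIgnoreCase);
    private readonly Func<T> _factory;

    public PerDependencyRegistry(Func<T> factory)
    {
        _factory = factory;
    }

    // Lazy<T> ensures the factory runs once per key even if two threads race on GetOrAdd.
    public T For(string dependencyName)
    {
        return _instances.GetOrAdd(dependencyName, _ => new Lazy<T>(_factory)).Value;
    }
}
```

Held in a static field, something like new PerDependencyRegistry<CircuitBreaker>(() => new CircuitBreaker()) would let every operation against the mail server share the breaker returned by For("smtp"), while a billing service gets its own.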

(Managing multiple circuit breakers via a container can be a bit tricky. I’ve published a separate example for how to do it with Autofac.)

It’s generally safe to add this pattern to existing code because it will only throw an exception in a scenario where your existing code would anyway.

You can also take advantage of built-in retry logic:

breaker.ExecuteWithRetries(() => client.Send(message), 10, TimeSpan.FromSeconds(20));
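The post doesn't spell out what the two extra arguments mean. Assuming they are a maximum number of attempts and a pause between attempts, the general shape of such a retry loop can be sketched as follows. RetrySketch is a hypothetical stand-alone helper, not the library's implementation, and it leaves out the interaction with the breaker state.

```csharp
using System;
using System.Threading;

// Hypothetical stand-alone helper illustrating a bounded retry loop.
public static class RetrySketch
{
    // Runs the operation up to maxAttempts times, pausing between failed attempts.
    public static void Execute(Action operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure and wait; the final attempt's exception propagates.
                Thread.Sleep(delay);
            }
        }
    }
}
```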

Why is the package named ReliabilityPatterns instead of CircuitBreaker?

Because I hope to add more useful patterns in the future.

This blog post in picture form

Sequence diagram

.NET Rocks! #687: ‘Tatham Oddie Makes HTML 5 and Silverlight Play Nice Together’

I spoke to Carl + Richard on .NET Rocks! last week about using HTML5 and Silverlight together. We also covered a bit of Azure toward the end.

The episode is now live here:

http://www.dotnetrocks.com/default.aspx?showNum=687

JT – Sorry, I referred to you as “the other guy I presented with” and never intro-ed you. 😦

Everyone else – JT is awesome.

Peer Code Reviews in a Mercurial World

Mercurial, or Hg, is a brilliant DVCS (Distributed Version Control System). Personally I think it’s much better than Git, but that’s a whole religious war in itself. If you’re not familiar with at least one of these systems, do yourself a favour and take a read of hginit.com.

Massive disclaimer: This worked for our team. Your team is different. Be intelligent. Use what works for you.

The Need for Peer Code Reviews

I’ve previously worked in a number of environments which required peer code reviews before check-ins. For those not familiar, the principle is simple – get somebody else on the team to come and validate your work before you hit the check-in button. Now, before anybody jumps up and says this is too controlling, let me highlight that this was back in the unenlightened days of centralised VCSs like Subversion and TFS.

This technique is just another tool in the toolbox for finding problems early. The earlier you find them, the quicker and cheaper they are to fix.

  • If you’ve completely misinterpreted the task, the reviewer is likely to pick up on this. If they’ve completely misinterpreted the task, it spurs a discussion between the two of you that’s likely to help them down the track.
  • Smaller issues like typos can be found and fixed immediately, rather than being relegated to P4 bug status and left festering for months on end.
  • Even if there aren’t any issues, it’s still useful as a way of sharing knowledge around the team about how various features have been implemented.

On my current project we’d started to encounter all three of these issues – we were reworking code to fit the originally intended task, letting small issues creep into our codebase and using techniques that not everybody understood. We raised these in our sprint retrospective and chose peer code reviews as one of the techniques we’d use to counter them.

Peer Code Reviews in a DVCS World

One of the most frequently touted benefits of a DVCS is that you can check in anywhere, anytime, irrespective of network access. Whilst you definitely can, and this is pretty cool, it’s less applicable for collocated teams.

Instead, the biggest benefit I perceive is how frictionless commits enable smaller but more frequent commits. Smaller commits provide a clearer history trail, easier merging, easier reviews, and countless other benefits. That’s a story for a whole other post though. If you don’t already agree, just believe me that smaller commits are a good idea.

Introducing a requirement for peer review before each check-in would counteract these benefits by introducing friction back into the check-in process. This was definitely not an idea we were going to entertain.

The solution? We now perform peer reviews prior to pushing. Developers still experience frictionless commits, and can pull and merge as often as possible (also a good thing), yet we’ve been able to bring in the benefits of peer reviews. This approach has been working well for us for 3 weeks so far (1.5 sprints).

It’s a DVCS. Why Not Forks?

We’ve modelled our source control as a hub and spoke pattern. BitBucket has been nominated as our ‘central’ repository that is the source of truth. Generally, we all push and pull from this one central repository. Because our team is collocated, it’s easy enough to just grab the person next to you to perform the review before you push to the central repository.

Forks do have their place though. One team member worked from home this week to avoid infecting us all. He quickly spun up a private fork on BitBucket and started pushing to there instead. At regular intervals he’d ask one of us in the office for a review via Skype. Even just using the BitBucket website, it was trivial to review his pending changesets.

The forking approach could also be applied in the office. On the surface it looks like a nice idea because it means you’re not blocked waiting on a review. In practice though, it just becomes another queue of work which the other developer is unlikely to get to in as timely a manner. “Sure, I’ll take a look just after I finish this.” Two hours later, the code still hasn’t hit the central repository. The original developer has moved on to other tasks. By the time a CI build picks up any issues, ownership and focus have long since moved on. An out-of-band review also misses the ‘let’s sit and have a chat’ mentality and knowledge sharing we were looking for.

What We Tried

To kick things off, we started with hg out. The outgoing command lists all of the changesets that would be pushed if you ran hg push right now. By default it only lists the header detail of each changeset, so we’d then run through hg exp 1234, hg exp 1235, hg exp 1236, etc. to review each one. The downsides to this approach were that we didn’t get coloured diff outputs, we had to review them one at a time and it didn’t exclude things like merge changesets.

Next we tried hg out -p. This lists all of the outgoing changesets, in order, with their patches and full colouring. This is good progress, but we still wanted to filter out merges.

One of the cooler things about Mercurial is revsets. If you’re not familiar with them, it’d pay to take a look at hg help revsets. This allows us to use the hg log command, but pass in a query that describes which changesets we want to see listed: hg log -pr "outgoing() and not merge()".

Finally, we added a cls to the start of the command so that it was easy to just scroll back and see exactly what was in the review. This took the full command to cls && hg log -pr "outgoing() and not merge()". It’d be nice to be able to do hg log -pr "outgoing() and not merge()" | more but the more command drops the ANSI escape codes used for coloring.

What We Do Now

To save everybody from having to remember and type this command, we added a file called review.cmd to the root of our repository. It just contains this one command.

Whenever we want a review we just type review and press enter. Too easy!

One Final Tweak

When dealing with multiple repositories, you need to specify which path outgoing() applies to in the revset. We updated the contents of review.cmd to cls && hg log -pr "outgoing(%1) and not merge()". If review.cmd is called with an argument, %1 carries it through to the revset. That way we can run review or review myfork as required.

Released: FormsAuthenticationExtensions

What it does

Think about a common user table. You probably have a GUID for each user, but you want to show their full name and maybe their email address in the header of each page. This commonly ends up being an extra DB hit (albeit hopefully cached).

There is a better way though! A little known gem of the forms authentication infrastructure in .NET is that it lets you embed your own arbitrary data in the ticket. Unfortunately, setting this is quite hard – upwards of 15 lines of rather undiscoverable code.
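For a sense of what the manual route involves, here is a rough sketch using the standard System.Web.Security types. It assumes a .NET Framework ASP.NET application; the class and method names are made up for illustration, and this is not the code inside FormsAuthenticationExtensions.

```csharp
using System;
using System.Web;
using System.Web.Security;

// Hypothetical helper showing the hand-rolled approach the library replaces.
public static class ManualTicketSketch
{
    // Builds a ticket whose UserData field carries the caller's custom string.
    public static FormsAuthenticationTicket BuildTicket(string userName, bool persistent, string userData)
    {
        var issued = DateTime.Now;
        return new FormsAuthenticationTicket(
            1,                                        // ticket version
            userName,
            issued,
            issued.Add(FormsAuthentication.Timeout),  // honour the configured timeout
            persistent,
            userData,
            FormsAuthentication.FormsCookiePath);
    }

    // Encrypts the ticket and writes it out as the forms authentication cookie.
    public static void IssueCookie(HttpResponseBase response, FormsAuthenticationTicket ticket)
    {
        var cookie = new HttpCookie(FormsAuthentication.FormsCookieName, FormsAuthentication.Encrypt(ticket))
        {
            HttpOnly = true,
            Secure = FormsAuthentication.RequireSSL,
            Path = FormsAuthentication.FormsCookiePath
        };

        // Only persistent tickets get an explicit expiry; others stay session cookies.
        if (ticket.IsPersistent)
            cookie.Expires = ticket.Expiration;

        response.Cookies.Add(cookie);
    }
}
```

Even this trimmed version has to know about ticket versions, cookie names, paths and expiry rules, which is the undiscoverable part.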

Sounds like a perfect opportunity for another NuGet package.

How to get it

Library: Install-Package FormsAuthenticationExtensions

(if you’re not using NuGet already, start today)

Source code: formsauthext.codeplex.com

How to use it

Using this library, all you need to do is add:

 using FormsAuthenticationExtensions; 

then change:

 FormsAuthentication.SetAuthCookie(user.UserId, true); 

to:

var ticketData = new NameValueCollection {
    { "name", user.FullName },
    { "emailAddress", user.EmailAddress }
};
new FormsAuthentication().SetAuthCookie(user.UserId, true, ticketData);

Those values will now be encoded and persisted into the authentication ticket itself. No need to store it in any form of session state, custom cookies or extra DB calls.

To read the data out at a later time:

var ticketData = ((FormsIdentity) User.Identity).Ticket.GetStructuredUserData();
var name = ticketData["name"];
var emailAddress = ticketData["emailAddress"];

If you want something even simpler, you can also just pass a string in:

 new FormsAuthentication().SetAuthCookie(user.UserId, true, "arbitrary string here"); 

and read it back via:

 var userData = ((FormsIdentity) User.Identity).Ticket.UserData; 

Things to Consider

Any information you store this way will live for as long as the ticket.

That can be quite a while if users are active on your application for long periods of time, or if you give out long-term persistent sessions.

Whenever one of the values stored in the ticket needs to change, all you need to do is call SetAuthCookie again with the new data and the cookie will be updated accordingly. In our user name / email address example, this is actually quite advantageous. If the user was to update their display name or email address, we’d just update the ticket with new values. This updated ticket would then be supplied for future requests. In web farm environments this is about as perfect as it gets – we don’t need to go back to the DB to load this information for each request, yet we don’t need to worry about invalidating the cache across machines. (Any form of shared, invalidatable cache in a web farm is generally bad.)

Size always matters.

The information you store this way is embedded in the forms ticket, which is then encrypted and sent back to the user’s browser. On every single request after this, that entire cookie gets sent back up the wire and decrypted. Storing any significant amount of data here is obviously going to be an issue. Keep it to absolutely no more than a few simple values.

Twavatar – coming to a NuGet server near you

Yet another little micro-library designed to do one thing, and do it well:

twavatar.codeplex.com

Install-Package twavatar

I’ve recently been working on a personal project that lets me bookmark physical places.

To avoid having to build any of the authentication infrastructure, I decided to build on top of Twitter’s identity ecosystem. Any user on my system has a one-to-one mapping back to a Twitter account. Twitter get to deal with all the infrastructure around sign ups, forgotten passwords and so forth. I get to focus on features.

The other benefit I get is being able to easily grab an avatar image and display it on the ‘mark’ page like this:

image

(Sidenote: You might also notice why I recently built relativetime and crockford-base32.)

Well, it turns out that grabbing somebody’s Twitter avatar isn’t actually as easy as one might hope. The images are stored on Amazon S3 under a URL structure that requires you to know the user’s Twitter Id (the numeric one) and the original file name of the image they uploaded. To throw another spanner in the works, if the user uploads a new profile image, the URL changes and the old one stops working.

For most Twitter clients this isn’t an issue because the image URL is returned as part of the JSON blob for each status. In our case, it’s a bit annoying though.

Joe Stump set out to solve this problem by launching tweetimag.es. This service lets you use a nice URL like http://img.tweetimag.es/i/tathamoddie_n and let them worry about all the plumbing to make it work. Thanks Joe!

There’s a risk though … This is a free service, with no guarantees about its longevity. As such, I didn’t want to hardcode too many dependencies on it into my website.

This is where we introduce Twavatar. Here’s what my MVC view looks like:

 @Html.TwitterAvatar(Model.OwnerHandle) 

Ain’t that pretty?

We can also ask for a specific size:

 @Html.TwitterAvatar(Model.OwnerHandle, Twavatar.Size.Bigger) 

The big advantage here is that if / when tweetimag.es disappears, I can just push an updated version of Twavatar to NuGet and everybody’s site can keep working. We’ve cleanly isolated the current implementation into its own library.

It’s scenarios like this where NuGet really shines.

Update 1: Paul Jenkins pointed out a reasonably sane API endpoint offered by Twitter in the form of http://api.twitter.com/1/users/profile_image/tathamoddie?size=bigger. There are two problems with this API. First up, it issues a 302 redirect to the image resource rather than returning the data itself. This adds an extra DNS resolution and HTTP round trip to the page load. Second, the documentation for it states that it “must not be used as the image source URL presented to users of your application” (complete with the bold). To meet this requirement you’d need to call it from your application server-side, implement your own caching and so forth.

The tweetimag.es service most likely uses this API under the covers, but they do a good job of abstracting all the mess away from us. If the tweetimag.es service was ever to be discontinued, I imagine I’d update Twavatar to use this API directly.

Released: RelativeTime

Ruby on Rails has a nifty little helper called time_ago_in_words. You pass it an arbitrary number of seconds and it gives you back something friendly like “about 2 weeks ago”.

Today, I implemented a similar routine for .NET.

relativetime.codeplex.com

nuget.org/List/Packages/relativetime

To use it, just include the namespace, then call ToHumanTime() on a TimeSpan object.
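The general technique behind a routine like this is to bucket the duration by size and pick a matching phrase. The sketch below illustrates that idea only: RelativeTimeSketch and ToRoughWords are hypothetical names, and the thresholds and wording are assumptions rather than what the library's ToHumanTime() produces.

```csharp
using System;

// Hypothetical illustration of bucketing a TimeSpan into a friendly phrase.
public static class RelativeTimeSketch
{
    public static string ToRoughWords(TimeSpan span)
    {
        double seconds = Math.Abs(span.TotalSeconds);

        // Walk up through the units until one fits.
        if (seconds < 60) return "less than a minute ago";
        if (seconds < 3600) return Phrase((int)(seconds / 60), "minute");
        if (seconds < 86400) return Phrase((int)(seconds / 3600), "hour");
        if (seconds < 7 * 86400) return Phrase((int)(seconds / 86400), "day");
        return Phrase((int)(seconds / (7 * 86400)), "week");
    }

    // Formats a count and unit, pluralising when needed.
    private static string Phrase(int count, string unit)
    {
        return "about " + count + " " + unit + (count == 1 ? "" : "s") + " ago";
    }
}
```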

If you want more of an idea of what it generates, take a look at the test suite.

Released: Crockford Base32 Encoder

Now, doesn’t that just sound sexy? No, not really. I hear you.

Alas, I went and built it anyway.

crockfordbase32.codeplex.com

nuget.org/List/Packages/crockford-base32

Crockford Base32 lets you encode a number into an alphanumeric string, and back again.

Where it shines is in the character set it uses.

It’s resilient to humans:

  • No crazy characters or keyboard gymnastics
  • Totally case insensitive
  • 0, O and o all decode to the same thing
  • 1, I, i, L and l all decode to the same thing
  • Doesn’t use U, so a number like 519,571 encodes to FVCK instead
  • Optional check digit on the end

It’s great for URLs:

  • No funky characters that require special encoding
  • No plus, slash or equals symbols like base 64

It handles really big numbers. (Well, my implementation is limited to 18,446,744,073,709,551,615 but you could extend the algorithm even further just by changing the data type from ulong to something even bigger.)

Number                     | Encoded       | Encoded with optional check digit
1                          | 1             | 11
194                        | 62            | 629
45,678                     | 1CKE          | 1CKEM
398,373                    | C515          | C515Z
3,838,385,658,376,483      | 3D2ZQ6TVC93   | 3D2ZQ6TVC935
18,446,744,073,709,551,615 | FZZZZZZZZZZZZ | FZZZZZZZZZZZZB
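The encoding itself is small enough to sketch. The following is an illustrative version written from the published Crockford Base32 description, not the code in the crockford-base32 package; CrockfordSketch and its method names are hypothetical, and Decode here does not handle a trailing check digit.

```csharp
using System;
using System.Text;

// Hypothetical illustration of Crockford Base32 for unsigned 64-bit numbers.
public static class CrockfordSketch
{
    // 32 symbols: digits plus letters, with I, L, O and U left out.
    private const string Symbols = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // The check digit is the value modulo 37, so it needs five extra symbols.
    private const string CheckSymbols = Symbols + "*~$=U";

    public static string Encode(ulong value, bool withCheckDigit = false)
    {
        var sb = new StringBuilder();
        ulong remaining = value;
        do
        {
            // Peel off the lowest five bits and prepend the matching symbol.
            sb.Insert(0, Symbols[(int)(remaining % 32)]);
            remaining /= 32;
        } while (remaining > 0);

        if (withCheckDigit)
            sb.Append(CheckSymbols[(int)(value % 37)]);
        return sb.ToString();
    }

    public static ulong Decode(string text)
    {
        ulong result = 0;
        foreach (char raw in text)
        {
            char c = char.ToUpperInvariant(raw);       // case insensitive
            if (c == 'O') c = '0';                     // O and o read as zero
            if (c == 'I' || c == 'L') c = '1';         // I, i, L and l read as one
            int digit = Symbols.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid symbol: " + raw);
            result = checked(result * 32 + (ulong)digit);
        }
        return result;
    }
}
```

With this sketch, Encode(194, true) gives 629 and Decode("c5l5") gives 398373, in line with the table above.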

 

Don’t have too much fun now.

Yet Another Debugging Tale – Visual Studio Disappearing

Call me a nerd (that’s obvious!), but I find a good debugging tale to be something of a geek murder thriller. Every issue has its own little debugging quirks. This blog post, and some of my previous ones, are meant to be both entertaining and educational. I don’t want to bore you to death with cdb or WinDbg documentation, but you might find some of the approaches useful in the future.

The Issue

This morning ScottGu announced NuPack, a package management solution for .NET.

Eager to try it out, I opened an existing solution, expanded a web application project, right clicked on the References node and chose Add Package Reference.

The dialog popped up for a second or so and then my entire VS shell just disappeared without a trace. No error. No crash dialogs. Nothing.

This happened reliably every time.

Note: This issue is now fixed in the latest source.

My Debugging Steps

I opened a fresh instance of VS, attached WinDbg, opened the solution in question, and expanded the project nodes.

Before opening the context menu, I set a pretty wide exception breakpoint:

0:051> !soe -derived -create System.Exception 1
Breakpoint set

Then resumed execution:

 0:051> g 

Next, I right clicked on the References node and clicked Add Package Reference.

Bam! Exception:

(1ee4.1144): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=2b542f64 edi=2b5fb018
eip=03458ddf esp=00457b30 ebp=00457b38 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
03458ddf 8b721c          mov     esi,dword ptr [edx+1Ch] ds:002b:0000001c=????????

Where is it?

0:000> !clrstack
OS Thread Id: 0x1144 (0)
Child SP IP       Call Site
00457b30 03458ddf NuPack.Dialog.Providers.OnlinePackagesProvider.IsInstalled(System.String)*** ERROR: Module load completed but symbols could not be loaded for NuPack.Dialog.dll

00457b40 03458dae NuPack.Dialog.Providers.OnlinePackagesItem.get_IsInstalled()
00458094 5ad921db [DebuggerU2MCatchHandlerFrame: 00458094] 
00458060 5ad921db [CustomGCFrame: 00458060] 
00458034 5ad921db [GCFrame: 00458034] 
00458018 5ad921db [GCFrame: 00458018] 
0045823c 5ad921db [HelperMethodFrame_PROTECTOBJ: 0045823c] System.RuntimeMethodHandle._InvokeMethodFast(System.IRuntimeMethodInfo, System.Object, System.Object[], System.SignatureStruct ByRef, System.Reflection.MethodAttributes, System.RuntimeType)
004582b8 55b1d689 System.RuntimeMethodHandle.InvokeMethodFast(System.IRuntimeMethodInfo, System.Object, System.Object[], System.Signature, System.Reflection.MethodAttributes, System.RuntimeType)*** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\4ff1f12a08d455f195ba996fe77497c6\mscorlib.ni.dll

0045830c 55b1d3d0 System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, Boolean)
00458348 55b1bfed System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
0045836c 55af63f8 System.Reflection.RuntimePropertyInfo.GetValue(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
00458390 55af63ac System.Reflection.RuntimePropertyInfo.GetValue(System.Object, System.Object[])
0045839c 52076f58 MS.Internal.Data.PropertyPathWorker.GetValue(System.Object, Int32)

[...]

At this stage there’s no managed exception yet, so I stepped out a few times until I got one:

0:000> gu
(1ee4.1144): CLR exception - code e0434352 (first chance)
'System.Exception hit'
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00457e40 ebx=00000005 ecx=00000005 edx=00000000 esi=00457eec edi=0027fbe0
eip=75d9b727 esp=00457e40 ebp=00457e90 iopl=0         nv up ei pl nz ac po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212
KERNELBASE!RaiseException+0x58:
75d9b727 c9              leave
0:000> !pe
Exception object: 2b613d1c
Exception type:   System.Reflection.TargetInvocationException
Message:          Exception has been thrown by the target of an invocation.
InnerException:   System.NullReferenceException, Use !PrintException 2b6125a0 to see more.
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 80131604
0:000> !pe 2b6125a0
Exception object: 2b6125a0
Exception type:   System.NullReferenceException
Message:          Object reference not set to an instance of an object.
InnerException:   <none>
StackTrace (generated):
    SP       IP       Function
    00457B20 03458DDF NuPack_Dialog_1b750000!NuPack.Dialog.Providers.OnlinePackagesProvider.IsInstalled(System.String)+0xf
    00457B30 03458DAE NuPack_Dialog_1b750000!NuPack.Dialog.Providers.OnlinePackagesItem.get_IsInstalled()+0x1e

StackTraceString: <none>
HResult: 80004003

A quick trip to Reflector shows this to be the culprit (NuPack source was only published after this debugging exercise):

public bool IsInstalled(string id)
{
    return (this.ProjectManager.LocalRepository.FindPackage(id) != null);
}

Either get_ProjectManager or get_LocalRepository is returning null.

I let the app crash, then set up the scenario again.

This time I caught a tighter exception:

0:061> !soe -create2 System.NullReferenceException
Breakpoint set
0:061> g

As expected, we got a hit in the same place:

0:000> !clrstack
OS Thread Id: 0x1634 (0)
Child SP IP       Call Site
00367ab0 027bd85f NuPack.Dialog.Providers.OnlinePackagesProvider.IsInstalled(System.String)*** ERROR: Module load completed but symbols could not be loaded for NuPack.Dialog.dll

00367ac0 027bd82e NuPack.Dialog.Providers.OnlinePackagesItem.get_IsInstalled()
00368014 5ad921db [DebuggerU2MCatchHandlerFrame: 00368014] 
00367fe0 5ad921db [CustomGCFrame: 00367fe0] 
00367fb4 5ad921db [GCFrame: 00367fb4] 
00367f98 5ad921db [GCFrame: 00367f98] 
003681bc 5ad921db [HelperMethodFrame_PROTECTOBJ: 003681bc] System.RuntimeMethodHandle._InvokeMethodFast(System.IRuntimeMethodInfo, System.Object, System.Object[], System.SignatureStruct ByRef, System.Reflection.MethodAttributes, System.RuntimeType)
00368238 55b1d689 System.RuntimeMethodHandle.InvokeMethodFast(System.IRuntimeMethodInfo, System.Object, System.Object[], System.Signature, System.Reflection.MethodAttributes, System.RuntimeType)*** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\4ff1f12a08d455f195ba996fe77497c6\mscorlib.ni.dll

0036828c 55b1d3d0 System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, Boolean)
003682c8 55b1bfed System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
003682ec 55af63f8 System.Reflection.RuntimePropertyInfo.GetValue(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
00368310 55af63ac System.Reflection.RuntimePropertyInfo.GetValue(System.Object, System.Object[])
0036831c 52416f58 MS.Internal.Data.PropertyPathWorker.GetValue(System.Object, Int32)

[...]

I grabbed the native disassembly:

0:000> !u 027bd85f
Normal JIT generated code
NuPack.Dialog.Providers.OnlinePackagesProvider.IsInstalled(System.String)
Begin 027bd850, size 51
027bd850 55              push    ebp
027bd851 8bec            mov     ebp,esp
027bd853 57              push    edi
027bd854 56              push    esi
027bd855 8bfa            mov     edi,edx
027bd857 ff1570007e02    call    dword ptr ds:[27E0070h] (NuPack.Dialog.Providers.OnlinePackagesProvider.get_ProjectManager(), mdToken: 06000048)
027bd85d 8bd0            mov     edx,eax
>>> 027bd85f 8b721c          mov     esi,dword ptr [edx+1Ch]
027bd862 33c9            xor     ecx,ecx
027bd864 33d2            xor     edx,edx
027bd866 e8057b2e53      call    mscorlib_ni+0x245370 (55aa5370) (System.Version.op_Equality(System.Version, System.Version), mdToken: 06001487)
027bd86b 85c0            test    eax,eax
027bd86d 750e            jne     027bd87d
027bd86f 6a00            push    0
027bd871 8bd7            mov     edx,edi
027bd873 8bce            mov     ecx,esi
027bd875 ff153818f901    call    dword ptr ds:[1F91838h]
027bd87b eb18            jmp     027bd895
027bd87d 8bd7            mov     edx,edi
027bd87f 8bce            mov     ecx,esi
027bd881 ff15a82a7f02    call    dword ptr ds:[27F2AA8h] (NuPack.PackageRepositoryExtensions.FindPackagesById(NuPack.IPackageRepository, System.String), mdToken: 06000285)
027bd887 6a00            push    0
027bd889 6a00            push    0
027bd88b 8bc8            mov     ecx,eax
027bd88d 33d2            xor     edx,edx
027bd88f ff15142b7f02    call    dword ptr ds:[27F2B14h] (NuPack.PackageExtensions.FindByVersion(System.Linq.IQueryable`1<NuPack.IPackage>, System.Version, System.Version, System.Version), mdToken: 060001c5)
027bd895 85c0            test    eax,eax
027bd897 0f95c0          setne   al
027bd89a 0fb6c0          movzx   eax,al
027bd89d 5e              pop     esi
027bd89e 5f              pop     edi
027bd89f 5d              pop     ebp
027bd8a0 c3              ret

From this we can note that the crash is after the call to get_ProjectManager but before a call to System.Version.op_Equality.

Checking the eax register shows that the call to get_ProjectManager returned null:

0:000> r
eax=00000000 ebx=00465938 ecx=0046c824 edx=00000004 esi=00000043 edi=00004000
eip=5ad91984 esp=00465868 ebp=00465868 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
clr!StressLog::StressLogOn+0xa:
5ad91984 854508          test    dword ptr [ebp+8],eax ss:002b:00465870=00004000

How could this occur?

protected ProjectManager ProjectManager
{
    get
    {
        return this.PackageManager.GetProjectManager((Project) this.Project);
    }
}

public ProjectManager GetProjectManager(Project project)
{
    ProjectManager manager;
    this.EnsureProjectManagers();
    this._projectManagers.TryGetValue(project, out manager);
    return manager;
}

From Reflector, we can see that get_ProjectManager calls into GetProjectManager.

This latter method ‘ensures’ that a dictionary is initialized, then tries to return a value from it.

If the EnsureProjectManagers() logic is wrong at all and TryGetValue fails to find the project, manager is left as null, that null is returned, and the application crashes further up the stack. If this method is always expected to return a value, TryGetValue(Project, out ProjectManager) should be replaced with a standard dictionary lookup so that a missing key fails at the source. If it’s valid for this method to return null, then NuPack.Dialog.Providers.OnlinePackagesProvider.get_ProjectManager or NuPack.Dialog.Providers.OnlinePackagesProvider.IsInstalled(string) needs to be updated to handle null values.
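The difference between the two lookup styles is easy to see in isolation. This is a stand-alone illustration using a plain string dictionary, not NuPack code; LookupSketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of lenient versus strict dictionary lookups.
public static class LookupSketch
{
    // Mirrors the decompiled pattern: a miss silently yields null.
    public static string LenientLookup(Dictionary<string, string> map, string key)
    {
        string value;
        map.TryGetValue(key, out value);   // returns false on a miss; value is left as null
        return value;
    }

    // A plain indexer fails at the lookup itself, pointing at the real problem.
    public static string StrictLookup(Dictionary<string, string> map, string key)
    {
        return map[key];                   // throws KeyNotFoundException on a miss
    }
}
```

With the lenient version the null surfaces later as a NullReferenceException somewhere else; with the strict version the stack trace points straight at the lookup.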

Let’s work out why it returned null. First up, let’s find the objects on the heap that involve ProjectManager so we can interrogate them:

0:000> !dumpheap -type NuPack.ProjectManager
Address       MT     Size
2bceed0c 21187e4c       52     
2bcf7450 21189858       60     
total 0 objects
Statistics:
      MT    Count    TotalSize Class Name
21187e4c        1           52 System.Collections.Generic.Dictionary`2[[EnvDTE.Project, NuPack.VisualStudio],[NuPack.ProjectManager, NuPack.Core]]
21189858        1           60 System.Collections.Generic.Dictionary`2+Entry[[EnvDTE.Project, NuPack.VisualStudio],[NuPack.ProjectManager, NuPack.Core]][]
Total 2 objects

Out of these two, the first one is the interesting one:

0:000> !do 2bceed0c
Name:        System.Collections.Generic.Dictionary`2[[EnvDTE.Project, NuPack.VisualStudio],[NuPack.ProjectManager, NuPack.Core]]
MethodTable: 21187e4c
EEClass:     558b99ac
Size:        52(0x34) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
55b82938  4000bd3        4       System.Int32[]  0 instance 2bcf7438 buckets
56320924  4000bd4        8 ...non, mscorlib]][]  0 instance 2bcf7450 entries
55b82978  4000bd5       20         System.Int32  1 instance        3 count
55b82978  4000bd6       24         System.Int32  1 instance        3 version
55b82978  4000bd7       28         System.Int32  1 instance       -1 freeList
55b82978  4000bd8       2c         System.Int32  1 instance        0 freeCount
55b82fd8  4000bd9        c ...Canon, mscorlib]]  0 instance 2bceed70 comparer
55b7568c  4000bda       10 ...Canon, mscorlib]]  0 instance 00000000 keys
55b7b730  4000bdb       14 ...Canon, mscorlib]]  0 instance 00000000 values
55b7f5e8  4000bdc       18        System.Object  0 instance 00000000 _syncRoot
55b7b794  4000bdd       1c ...SerializationInfo  0 instance 00000000 m_siInfo

The count is correct. (My solution has three projects.)

This indicates that the dictionary contents are correct, but that we’re looking for the wrong key.

This is where the lookup key comes from:

protected ProjectManager ProjectManager
{
    get
    {
        return this.PackageManager.GetProjectManager((Project) this.Project);
    }
}

protected Project Project
{
    get
    {
        return Utilities.GetActiveProject(this.DTE);
    }
}

public static Project GetActiveProject(_DTE dte)
{
    Project project = null;
    if (dte != null)
    {
        object activeSolutionProjects = dte.ActiveSolutionProjects;
        if (((activeSolutionProjects != null) && (activeSolutionProjects is Array)) && (((Array) activeSolutionProjects).Length > 0))
        {
            object obj3 = ((Array) activeSolutionProjects).GetValue(0);
            if ((obj3 != null) && (obj3 is Project))
            {
                project = (Project) obj3;
            }
        }
    }
    return project;
}

In the last method, there are a whole host of scenarios that would cause it to return null. This will then cascade through GetProjectManager(Project) and get_ProjectManager before resulting in the NullReferenceException we saw in IsInstalled(string). Either GetActiveProject(_DTE) should be updated to fail fast when the active project cannot be determined, or IsInstalled(string) needs to be updated to handle this scenario.
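As a sketch of the fail-fast option, the same ‘first element of a loosely typed array’ logic can throw a descriptive exception instead of returning null. This uses a generic stand-in rather than the real EnvDTE types; ActiveItemSketch and GetFirstOrThrow are hypothetical names, and this is not how NuPack ended up fixing it.

```csharp
using System;

// Hypothetical stand-in for GetActiveProject that fails fast instead of returning null.
public static class ActiveItemSketch
{
    public static T GetFirstOrThrow<T>(object candidates) where T : class
    {
        // The COM property hands back a loosely typed object, so check it is a non-empty array.
        var array = candidates as Array;
        if (array == null || array.Length == 0)
            throw new InvalidOperationException("No active item could be determined.");

        // Check that the first element is of the expected type.
        var first = array.GetValue(0) as T;
        if (first == null)
            throw new InvalidOperationException("The active item was not of the expected type.");

        return first;
    }
}
```

A caller then either gets a usable value or an exception that names the actual problem, rather than a null that travels through two more methods before anything notices.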

Which scenario caused GetActiveProject(_DTE) to return null for me?

Interrogating the _DTE instance was beyond me because it’s a COM object. By this point I had a pretty good idea though, so trial and error got me the rest of the way.

My Diagnosis

The traditional Solution Explorer tool window needs to be opened at least once between opening a solution and trying to add a package reference. If you’re exclusively using the Solution Navigator tool window from the power tools, your VS will disappear as soon as you try to add the package reference.

My Repro Steps

  1. Open Visual Studio
  2. Close the Solution Explorer tool window
  3. Open the Solution Navigator tool window (the one from the power tools)
  4. Open your .sln
  5. In the Solution Navigator, expand a project
  6. Right click the ‘References’ node
  7. Click ‘Add Package Reference’
  8. Boom!

I hope you found this story interesting for the mystery factor, as well as showing a few debugging techniques that I use. 🙂