Lock-free thread safe initialisation in C#

I used to do quite a lot of concurrent programming early in my career. Recently I’ve been working again on optimising a large piece of code to behave better when ran in concurrent environment and I must share a secret with you – I had a great fun. It all started with a semi-serious challenge from a friend, and ended up consuming big part of my Christmas-new year break.

One of the problems I found, was that a group of objects that was being used on multiple threads had to be initialised in a thread safe manner – that is the initialisation code had to be ran just once, atomically, on a single thread. However afterwards it could be safely executed concurrently on multiple threads. This was being accomplished by using code similar to following:

public void Foo GetInitializedFoo()
{
   if(this.fooIsInitialized) return foo;
   lock(lockObject)
   {
      if(this.fooIsInitialized) return foo;
      
      InitializeFoo();
      return foo;
   }
}

As you can see the code usually works lock-free, and only the first time when foo is not initialised it has to lock to perform the initialisation algorithm. The reason that was suboptimal is that in heavily multi-threaded scenarios it yielded suboptimal performance when multiple threads were trying to initialise foo.

Imagine there are four threads. First of them succeeds in obtaining the lock, and goes ahead to perform the initialisation, while the other two go into the waiting queue and sleep there waiting for the lock to be freed. The first thread finishes the initialisation pretty soon, but by the time the first thread wakes up already quite a lot of time is wasted. Then the second thread obtains the lock, but the remaining two, even though the variable is already initialised and therefore safe to be read concurrently, still wait in the lock queue. The three remaining threads could execute all at once now, yet they will be woken up and executed each in turn.

Mad scientist solution

The problem was that using lock proved to be suboptimal for the problem at hand. To try to address this (and to make my inner geek happy) I decided to look at making this code perform better.

While .NET framework library has lots of synchronisation primitives none of them really seemed to be any better suited for the job then good old Monitor. I didn’t want to put waiting threads to sleep immediately, as this is quite an expensive operation (and so is waking them up back again), and the initialisation would usually be performed quite quickly. I also wanted all the waiting threads to be able to execute concurrently with no synchronisation after the first one of them finished initialisation. To accomplish this I came up with the following solution:

public sealed class ThreadSafeInit
{
	// We are free to use negative values here, as ManagedTreadId is guaranteed to be always > 0
	// http://www.netframeworkdev.com/net-base-class-library/threadmanagedthreadid-18626.shtml
	// the ids can be recycled as mentioned, but that does not affect us since at any given point in time
	// there can be no two threads with the same managed id, and that's all we care about
	private const int Initialized = int.MinValue + 1;
	private const int NotInitialized = int.MinValue;
	private int state = NotInitialized;

	public void EndThreadSafeOnceSection()
	{
		if (state == Initialized)
		{
			return;
		}
		if (state == Thread.CurrentThread.ManagedThreadId)
		{
			state = Initialized;
		}
	}

	public bool ExecuteThreadSafeOnce()
	{
		if (state == Initialized)
		{
			return false;
		}
		var inProgressByThisThread = Thread.CurrentThread.ManagedThreadId;
		var preexistingState = Interlocked.CompareExchange(ref state, inProgressByThisThread, NotInitialized);
		if (preexistingState == NotInitialized)
		{
			return true;
		}
		if (preexistingState == Initialized || preexistingState == inProgressByThisThread)
		{
			return false;
		}

		var spinWait = new SpinWait();
		while (state != Initialized)
		{
			spinWait.SpinOnce();
		}
		return false;
	}
}

The ThreadSafeInit class manages lock-free access to part of codebase. It has a state int flag which starts off in a non-initialised state. Then as ExecuteThreadSafeOnce method, which marks the beginning of the initialisation section is being called, it uses Interlocked.CompareExchange to set the flag in a lock-free thread safe manned to the id of current thread. The thread that succeeds returns true and is free to go ahead and do the initialisation. Any other thread waits for the first one to finish, but instead of going immediately to sleep it spins for a while, effectively wasting cycles, and then once the first thread finishes, and calls EndThreadSafeOnceSection method, all the other threads can proceed in parallel.

Usage pattern looks something like this:

public void Foo GetInitializedFoo()
{
   if(this.fooIsInitialized) return foo;
   var initialising = false;
   try
   {
      initializing = init.ExecuteThreadSafeOnce();
      if(this.fooIsInitialized) return foo;
      
      InitializeFoo();
      return foo;
   }
   finally
   {
      if(initialising)
      {
         init.EndThreadSafeOnceSection();
      }
   }
}

Using this code I was able to measure over two-fold improvement over the old solution in this part of code. Sure, it’s a micro-optimisation and in 99.9% of cases lock would be just fine, but as I said, it was a challenge, and I had a blast working on this, and that’s all I care about. Well, not all perhaps. If you see any flaws in this code, ways to improve it, or I just replicated something that someone else did much better long time ago, let me know.

  • zlamma

    Me thinks that moving the:

    try
    {

    below the:

    if(this.fooIsInitialized) return foo;

    shouldn’t hurt.

    Also that would mean that only the thread that actually did the initialization gets to EndThreadSafeOnceSection, so checking which thread entered the EndThreadSafeOnceSection would be unnecessary, as would using the initialising flag.

  • zlamma

    Also the code:
    if (preexistingState == Initialized || preexistingState == inProgressByThisThread)
    {
    return false;
    }

    doesn’t seem to have any use. You could probably save on the creation of a new SpinWait() with 100% accuracy, using:
    System.Threading.Thread.SpinWait(int iterations) iside the loop.

    And from what I can see – you don’t need Thread.CurrentThread.ManagedThreadId at all, just a third const – InProgress.

    But the fact that you put the mysterious

    || preexistingState == inProgressByThisThread

    at all, there tells me that it is me who is still missing something, and might as well be all wrong.

    • Krzysztof

      That’s to support reentrancy. If I didn’t track the thread already is doing the init, it would spin waiting for itself to finish.

  • What about the following pseudo code:

    class WithLazyInitialization
    {
    int initializationsCounter = 0;
    ManualResetEventSlim wait = new ManualResetEventSlim(false);
    private string someLazyState;

    void OneTimeInitialize()
    {
    Thread.Sleep(10); //otherwise it will end before the next thread gets to Get();
    someLazyState = “HELLO”;
    }

    public string Get()
    {
    if (someLazyState!=null) return someLazyState;

    var myturn = Interlocked.Increment(ref initializationsCounter);
    if (myturn > 1)
    {
    wait.Wait();
    }
    else
    {
    OneTimeInitialize();
    wait.Set();
    }

    return someLazyState;
    }
    }

    class Program
    {
    static void Main(string[] args)
    {
    var thing = new WithLazyInitialization();

    Parallel.For(0, 10, i => thing.Get(););

    }
    }

    • Krzysztof

      Ken,

      the problem with this approach is it does not support re-entrancy (see my comment above). Also it blocks the threads which ends up wasting more time than needed usually.

  • Bozydar

    IMHO this if also have to be inside lock statement (in your first example):

    if(this.fooIsInitialized) return foo;

    Outside of the lock statement you cannot be sure, that the other threads would see correct value of fooIsInitialized. You have force memory barier on CPU.

    You could make fooIsInitialized volatile, but it would also be broken on some memory-model relaxed CPU (but it would probably word on x86).

    As far as I remember this is similar problem to the singleton case and double check locking. There is a lot of realy scary articles about it.

  • Krzysztof

    Bozydar,

    you’re right in general. However in this case I don’t really care.

    The only place I do care about absolute up to date information is where I try to access the section and Interlocked CompareExchange guarantees just that.

    Notice how fooIsInitialized changes its value:

    NotInitialized –> Id-of-the-thread-performing-init –> Initialized

    I don’t really care if some thread reads a bit stale data. Nothing bad will happen.

  • krlm

    Why don’t you use a static constructor? It’s well defined that for a multiple threads it’ll be called only once per app.domain. Take a look on this: http://csharpindepth.com/Articles/General/Beforefieldinit.aspx