C# Compiler Emits call IL Instruction for Instance Methods Called on the Reference Returned by a Constructor

Some time ago, I wrote about how the call instruction can actually call an instance method on a null reference, and how, inside that instance method, the this keyword then refers to null.

I find that very interesting, so I kept disassembling sample code to look at the generated IL and try to grasp some of the compiler’s logic.

Here is some simple code for looking at the IL generated by the C# compiler:

class Hello1
{
    internal String GetHello()
    {
        return "Hello1";
    }
}

sealed class Hello2
{
    internal String GetHello()
    {
        return "Hello2";
    }
}

static void Main(string[] args)
{
    var h1 = new Hello1();
    Console.WriteLine(h1.GetHello());

    var h2 = new Hello2();
    Console.WriteLine(h2.GetHello());

    Console.WriteLine(new Hello1().GetHello());

    Console.WriteLine(new Hello2().GetHello());

    Console.ReadLine();
}

In the first two calls, we call the GetHello method on a local variable. In the last two calls, we instantiate the object and call GetHello directly on the reference returned by the constructor, a reference that we don’t keep.

Here’s the IL generated for the Main method:

[Image: IL disassembly of the Main method, showing the callvirt and call instructions]

We can see that in the first two calls, the callvirt instruction is emitted by the compiler. GetHello is not virtual, so no virtual dispatch is actually needed here. The C# compiler emits callvirt anyway because that instruction also checks the reference for null, and as far as the compiler is concerned a local variable could be null (it is not “smart” enough to track that h1 and h2 were just assigned a freshly constructed object).

In the two subsequent calls, however, the method call is made on the reference returned by the constructor, which can never be null, so the instruction emitted is call, which is slightly more performant than callvirt.

For more information on call and callvirt instructions, see ECMA 335 12.4.1.2.

Instance Methods Called on null References

In a previous post, I wrote about how you can call extension methods on null references: they are in fact static methods with one more parameter, the extended object itself.
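As a reminder of what that looks like, here is a minimal sketch. The StringExtensions class and its OrPlaceholder method are hypothetical names made up for this illustration; they are not part of the framework or of the earlier post.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical extension method, written only for this illustration.
    // 'value' is an ordinary parameter of a static method, so it may be null.
    public static string OrPlaceholder(this string value)
    {
        return value == null ? "(null)" : value;
    }
}

public static class ExtensionOnNullDemo
{
    public static void Main()
    {
        string missing = null;

        // The compiler turns this into the static call
        // StringExtensions.OrPlaceholder(missing), so no instance is dereferenced.
        Console.WriteLine(missing.OrPlaceholder());
        Console.WriteLine("text".OrPlaceholder());
    }
}
```

The first line printed is the placeholder and the second is the original string: no NullReferenceException is thrown, because nothing is ever invoked on the null reference itself.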

I’m currently reading CLR via C# (which is a fascinating read), and I was surprised to learn in chapter 6 how the CIL instructions call and callvirt actually work.

What is amazing is that for methods called with the call instruction, the CLR does not check if the referenced object is null. The method call will succeed, but the this reference will be null in the instance method. Actually, in both cases, the reference to the object that the method was called on is passed as a hidden parameter to the method.

Before examining this, another interesting fact is that the C# compiler mostly emits callvirt instructions when calling a method, and callvirt checks whether the reference is null. To test the call instruction easily, we will have to disassemble, modify, and then reassemble the following code:

public class SomeClass
{
    public String GetHello()
    {
        if (this == null)
        {
            return "Amazing!";
        }

        return "Hello";
    }
}

class Program
{
    static void Main(string[] args)
    {
        var o = null as SomeClass;
        var hello = o.GetHello();

        Console.WriteLine(hello);
    }
}

Pretty dumb, right? Especially the if statement where we check if this is null. It seems logical to most of us that this will throw a NullReferenceException. However, this is just to get the compiler to build code that is very close to what we want to achieve, so we don’t have to write the IL ourselves.

After running ILDasm.exe on the assembly, this is what we have in the Main method:

  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       18 (0x12)
    .maxstack  1
    .locals init ([0] class Sandbox09.SomeClass o,
             [1] string hello)
    IL_0000:  nop
    IL_0001:  ldnull
    IL_0002:  stloc.0
    IL_0003:  ldloc.0
    IL_0004:  callvirt   instance string Sandbox09.SomeClass::GetHello()
    IL_0009:  stloc.1
    IL_000a:  ldloc.1
    IL_000b:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_0010:  nop
    IL_0011:  ret
  } // end of method Program::Main

As we can see, the call to GetHello is done with the callvirt instruction. As this instruction checks if the object is null (and in this case, it is), this will fail at runtime.

Just to make sure, I used ILasm.exe to build the assembly and ran it. Here is what it outputs:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.

   at Pvle.Program.Main(String[] args)

Now, let’s try replacing callvirt with call to see how it behaves.

    IL_0004:  call       instance string Sandbox09.SomeClass::GetHello()

Now, reassemble it with ILasm.exe and run it once more. Here’s what it outputs:

Amazing!

The actual difference between call and callvirt is that call calls the method on the compile time type of the object, so there is no need to check if the reference is null. The object will be passed as a hidden parameter to the method and will be referenced as this. It’s very similar to extension methods.

Callvirt, on the other hand, will resolve the method that is to be called at runtime, depending on the runtime type of the object, so the object cannot be null. The CLR enforces this check at runtime.
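If you don’t want to round-trip through ILDasm.exe and ILasm.exe, the same experiment can probably be reproduced from C# by emitting the three IL instructions at runtime. The following is only a sketch, assuming System.Reflection.Emit is available on your runtime; CallOnNullDemo and BuildInvoker are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class SomeClass
{
    public string GetHello()
    {
        // Can only be true when the method was invoked with the call instruction.
        if (this == null)
        {
            return "Amazing!";
        }

        return "Hello";
    }
}

public static class CallOnNullDemo
{
    // Hypothetical helper: builds a method whose body is
    //   ldnull; <call or callvirt> SomeClass::GetHello(); ret
    public static Func<string> BuildInvoker(OpCode callOpCode)
    {
        MethodInfo getHello = typeof(SomeClass).GetMethod("GetHello");
        var method = new DynamicMethod("InvokeOnNull", typeof(string), Type.EmptyTypes);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldnull);       // push a null reference as 'this'
        il.Emit(callOpCode, getHello); // invoke GetHello on it
        il.Emit(OpCodes.Ret);          // return the string left on the stack

        return (Func<string>)method.CreateDelegate(typeof(Func<string>));
    }

    public static void Main()
    {
        // call: no null check, so the body of GetHello runs with a null 'this'.
        Console.WriteLine(BuildInvoker(OpCodes.Call)());

        // callvirt: the null check happens before GetHello is entered.
        try
        {
            Console.WriteLine(BuildInvoker(OpCodes.Callvirt)());
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("callvirt threw NullReferenceException");
        }
    }
}
```

This should mirror the two runs above: the call version reaches the this == null branch, while the callvirt version fails before the method body executes.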

What About Value Types?

For value types, it’s a bit different. As they are implicitly sealed, the only methods that are virtual are the ones that are defined in System.Object. Oh wait, there is another case: if the value type is cast to an interface it implements, calls to methods on that variable will be using callvirt, as the value type will have to be boxed.

Here is some sample code that demonstrates this:

public interface ISomeInterface
{
    String GetHelloFromInterface();
}

public struct SomeClass : ISomeInterface
{
    public String GetHello()
    {
        return "Hello";
    }

    public override string ToString()
    {
        return "Hello";
    }

    public String GetHelloFromInterface()
    {
        return "Hello from interface";
    }
}

class Program
{
    static void Main(string[] args)
    {
        var o = new SomeClass();
        var hello = o.GetHello();

        o.ToString();
        o.GetHelloFromInterface();
        ((ISomeInterface)o).GetHelloFromInterface();
        o.GetHashCode();
    }
}

And here is the corresponding IL for the Main method:

  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       66 (0x42)
    .maxstack  1
    .locals init ([0] valuetype Pvle.SomeClass o,
             [1] string hello)
    IL_0000:  nop
    IL_0001:  ldloca.s   o
    IL_0003:  initobj    Pvle.SomeClass
    IL_0009:  ldloca.s   o
    IL_000b:  call       instance string Pvle.SomeClass::GetHello()
    IL_0010:  stloc.1
    IL_0011:  ldloca.s   o
    IL_0013:  constrained. Pvle.SomeClass
    IL_0019:  callvirt   instance string [mscorlib]System.Object::ToString()
    IL_001e:  pop
    IL_001f:  ldloca.s   o
    IL_0021:  call       instance string Pvle.SomeClass::GetHelloFromInterface()
    IL_0026:  pop
    IL_0027:  ldloc.0
    IL_0028:  box        Pvle.SomeClass
    IL_002d:  callvirt   instance string Pvle.ISomeInterface::GetHelloFromInterface()
    IL_0032:  pop
    IL_0033:  ldloca.s   o
    IL_0035:  constrained. Pvle.SomeClass
    IL_003b:  callvirt   instance int32 [mscorlib]System.Object::GetHashCode()
    IL_0040:  pop
    IL_0041:  ret
  } // end of method Program::Main

We can see that when calling the method through the interface, the value type is boxed.
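The boxing is also observable from plain C#, because the box holds a copy of the value. Here is a small sketch with a mutable struct; ICounter, Counter and IncrementInPlace are hypothetical names used only for this illustration.

```csharp
using System;

public interface ICounter
{
    void Increment();
}

public struct Counter : ICounter
{
    public int Value;

    public void Increment()
    {
        Value++;
    }
}

public static class BoxingDemo
{
    // A call through a constrained generic parameter is compiled as
    // constrained. + callvirt, so the struct is not boxed.
    public static void IncrementInPlace<T>(ref T counter) where T : ICounter
    {
        counter.Increment();
    }

    public static void Main()
    {
        var counter = new Counter();

        counter.Increment();               // direct call on the variable
        Console.WriteLine(counter.Value);  // 1

        ((ICounter)counter).Increment();   // boxes a copy and increments the copy
        Console.WriteLine(counter.Value);  // still 1

        IncrementInPlace(ref counter);     // no box, the variable itself changes
        Console.WriteLine(counter.Value);  // 2
    }
}
```

The cast to the interface leaves the original variable untouched, which is the visible side effect of the box instruction in the IL above. The generic version relies on the same constrained. prefix that the compiler emitted for ToString and GetHashCode.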

I find this very interesting for understanding how method calls actually work. Getting your nose in IL is always a good idea when you want to see what’s happening under the hood, but I have to admit that this is the first time I have modified and reassembled it.