Overload Resolution and Covariance
As of C# 4.0, generic interfaces and delegates support covariance and contravariance. I won’t talk about contravariance in this post, however.
To sum it up quickly, covariance enables the implicit conversion of a generic collection of a type that derives from T to the same generic collection of type T (this works for reference types only). In short:
IEnumerable<Object> list = new List<String>().AsEnumerable();
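The same conversion applies to method arguments. Here is a minimal sketch of that; CovarianceDemo and CountItems are hypothetical names made up for this illustration:

```csharp
using System;
using System.Collections.Generic;

static class CovarianceDemo
{
    // Hypothetical helper: accepts any sequence of objects and counts its items.
    internal static int CountItems(IEnumerable<object> items)
    {
        int count = 0;
        foreach (object item in items)
        {
            count++;
        }
        return count;
    }

    static void Main()
    {
        List<string> names = new List<string> { "Ada", "Grace" };

        // Compiles with C# 4.0 on .NET 4.0 because IEnumerable<out T> is covariant.
        Console.WriteLine(CountItems(names)); // prints 2

        // Variance only applies to reference types, so this line does not compile:
        // Console.WriteLine(CountItems(new List<int> { 1, 2 }));
    }
}
```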
This means that as of now, any method that accepts something like IEnumerable<Object> can accept an IEnumerable<String> as well. This changes the overload resolution mechanism:
class A { }
class B : A { }

class T
{
    internal void Foo(IEnumerable<B> sequence)
    {
        Console.WriteLine("In T.Foo.B");
    }
}

class U : T
{
    internal void Foo(IEnumerable<A> sequence)
    {
        Console.WriteLine("In U.Foo.A");
    }
}
In the following code, how does method resolution change?
U u = new U();
var l = new List<A>();
var m = new List<B>();
u.Foo(l);
u.Foo(m);
When run in Visual Studio as a .NET 3.5 application, here is the result:
In U.Foo.A
In T.Foo.B
When run in Visual Studio as a .NET 4.0 application, here is the result:
In U.Foo.A
In U.Foo.A
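Prior to C# 4.0, U.Foo was not a candidate for a call using a generic collection of B. However, with covariance, it is, and since an applicable method declared in the derived class takes precedence over the methods declared in its base class, U.Foo is picked, hence the different result. So, nothing stops compiling, but the behavior of an application might be silently affected.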
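If the old behavior is what you want, one possible way to get it back is to make the call through a reference of the base type, so that only the members of T are considered. This is a sketch, not the only option. It reuses the classes above with one small variation: Foo returns the message instead of printing it, which is my own change to make the result easier to check. OverloadDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

class A { }
class B : A { }

class T
{
    // Variation on the original sample: returns the message instead of printing it.
    internal string Foo(IEnumerable<B> sequence) { return "In T.Foo.B"; }
}

class U : T
{
    internal string Foo(IEnumerable<A> sequence) { return "In U.Foo.A"; }
}

static class OverloadDemo
{
    static void Main()
    {
        U u = new U();
        var m = new List<B>();

        // With C# 4.0 and covariance, the applicable method declared in U is chosen.
        Console.WriteLine(u.Foo(m));      // In U.Foo.A

        // Through a T reference, only the methods declared in T are candidates.
        Console.WriteLine(((T)u).Foo(m)); // In T.Foo.B
    }
}
```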
The Misunderstood var Keyword
I find it amazing how much the var keyword introduced in C# 3.0 is misunderstood.
The web is full of questions about the performance implications of using var, claims that var is not type safe, and other completely false statements.
Also, one day, I had a discussion with colleagues who argued that if I was using var, I might as well use Object as the type for all the variables and then cast everywhere.
I don’t know why var appears so misleading. I find it very simple, and have no issues using it a lot. Now, there are some readability arguments against using var, but that is purely subjective.
Everything Happens at Compile-Time
The very important concept that has to be understood is that with var, everything happens at compile time. You make the compiler work a little more when building your assembly, but the generated IL is exactly the same as if you had used explicit types.
So, any performance cost is paid only at compile time, and frankly, who cares about performance at compile time?
Let’s go over this again.
Look at this code and the generated IL:
static void Main(string[] args)
{
    String s = "Hello";
}

.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] string s)
    L_0000: nop
    L_0001: ldstr "Hello"
    L_0006: stloc.0
    L_0007: ret
}
Now the same code, using var instead of String:
static void Main(string[] args)
{
    var s = "Hello";
}

.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] string s)
    L_0000: nop
    L_0001: ldstr "Hello"
    L_0006: stloc.0
    L_0007: ret
}
Exactly the same IL. Exactly.
To quote MSDN:
It is important to understand that the var keyword does not mean "variant" and does not indicate that the variable is loosely typed, or late-bound. It just means that the compiler determines and assigns the most appropriate type.
It cannot be any clearer!
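To make the difference with Object concrete, here is a small sketch. VarDemo and StaticTypeOf are hypothetical names I made up for it; the helper relies on generic type inference to reveal the compile-time type of a variable:

```csharp
using System;

static class VarDemo
{
    // Hypothetical helper: T is inferred from the compile-time type of the argument,
    // so typeof(T) shows the static type that the compiler gave to the variable.
    internal static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    static void Main()
    {
        var s = "Hello";    // the compiler infers string
        object o = "Hello"; // explicitly typed as object

        Console.WriteLine(StaticTypeOf(s)); // System.String
        Console.WriteLine(StaticTypeOf(o)); // System.Object

        Console.WriteLine(s.Length);        // fine: s is a string at compile time
        // Console.WriteLine(o.Length);     // does not compile: object has no Length
        // s = 42;                          // does not compile: s is a string, not a variant
    }
}
```

With var, no cast is ever needed, and assigning a value of the wrong type is rejected by the compiler, exactly as with an explicit String declaration.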
So please, stop picking on the poor var. Go pick on dynamic.
Parameter Passing and Reference Types in C#
A few days ago, I had a question at work on why “regular” objects could be modified when passed to a method, while strings couldn’t. That’s a good question that any C# or Java developer (and developers in many other languages) will ask at some point.
The issue there is the general misunderstanding about parameter passing in C#. My response to that is generally to say that “In C#, all parameters are passed by value”. Which is wrong if you include ref and out modifiers, but I won’t cover these in this post. I say it that way because it makes people think about it. Generally the first answer is “No because I can modify an object in a method!”.
That’s when you realize that most people understand the logic, but have trouble stating precisely what is truly happening. It has to do with the fact that strings are reference types in .NET. Taken from MSDN:
Variables of reference types, referred to as objects, store references to the actual data.
To tie this back to my previous statement, the value of a reference variable is the reference to the actual data. Also, the actual data will be stored on the heap, but this is an implementation detail that we should not take into account.
So, to sum it up, when you pass a variable of reference type to a method, you actually pass the reference itself, as a value, to the method.
Here is a very simple code sample to illustrate all of that:
static void Main()
{
    String name = "Philippe";
    Console.WriteLine(name);
    Modify(name);
    Console.WriteLine(name);
    Console.ReadLine();
}

static void Modify(String text)
{
    Console.WriteLine(text);
    text = "Hello";
    Console.WriteLine(text);
}
And here is the printout for this:
Philippe
Philippe
Hello
Philippe
Let’s examine the memory during the different phases.
Here is the memory just before the call to Modify:
Now, here is the memory when entering Modify:
We can see that both the text and name variables have the same value, namely the reference to a location in the heap that contains a string whose content is “Philippe”.
Now, with that picture in mind, it’s very easy to imagine what will happen when we change the value of text:
As simple as that. We modified the value of text, assigning it the reference to a string containing “Hello” somewhere in the heap. But we didn’t modify the name variable nor its content.
When we exit the Modify method, the text variable goes out of scope and simply disappears (only objects on the heap that nothing references anymore become eligible for garbage collection). The name variable was not modified in the process.
Now, this confusion also arises because strings are immutable. A mutable object’s internal content can be modified by anyone who holds a reference to it, but this does not hold true for immutable objects: you don’t modify them, you create new ones.
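To see both behaviors side by side, here is a small sketch that uses StringBuilder as an example of a mutable reference type. ParameterDemo, Mutate and Reassign are hypothetical names chosen for this illustration:

```csharp
using System;
using System.Text;

static class ParameterDemo
{
    // Modifies the object that both the caller's variable and the parameter refer to.
    internal static void Mutate(StringBuilder text)
    {
        text.Append(" Hello");
    }

    // Only changes the method's own copy of the reference; the caller is unaffected.
    internal static void Reassign(StringBuilder text)
    {
        text = new StringBuilder("Hello");
    }

    static void Main()
    {
        StringBuilder name = new StringBuilder("Philippe");

        Reassign(name);
        Console.WriteLine(name.ToString()); // Philippe

        Mutate(name);
        Console.WriteLine(name.ToString()); // Philippe Hello
    }
}
```

In both methods the reference is passed by value. The difference is only in what the method does with it: follow it to change the object, or overwrite the local copy.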
For a very good and extensive tutorial on this topic, please see Jon Skeet’s excellent article on the subject. It’s probably much clearer than what I can explain.
Convert.ToInt32(String) With an Empty String
Lately, I have been struggling with the Convert.ToInt32 overload that takes a String as a parameter. Basically, it’s the same as Int32.Parse, except that if the given String is null, it returns 0 instead of throwing an ArgumentNullException. That’s quite cool, but Convert.ToInt32 still throws a FormatException when the argument is an empty String…
Now, my particular case is that I’m retrieving data from SharePoint, and the field retrieved can be empty if the user left it blank (when retrieving the field through a DataTable).
The workaround to that issue is actually pretty silly: just add a 0 at the beginning of the parsed String, and it will work all the time (and return 0 when the string is empty, as it does with null). Now mind you, this only works with non-negative integers. If your integer is negative, adding a 0 in front of it will make Convert.ToInt32 throw a FormatException again…
String theInt = "";
//Throws...
Console.WriteLine(Convert.ToInt32(theInt));
//Doesn't throw
Console.WriteLine(Convert.ToInt32("0" + theInt));
theInt = Int32.MaxValue.ToString();
//Doesn't throw
Console.WriteLine(Convert.ToInt32("0" + theInt));
theInt = Int32.MinValue.ToString();
//Throws again...
Console.WriteLine(Convert.ToInt32("0" + theInt));
So, this is safe to use if you are certain that the integer in the String is never negative.
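If negative values are possible, a different technique than the one above is Int32.TryParse, which reports failure instead of throwing. Here is a minimal sketch; SafeConvert and ToInt32OrZero are hypothetical names. Note that it is more forgiving than the "0" trick: any unparsable or out-of-range input also yields 0, which may or may not be what you want.

```csharp
using System;

static class SafeConvert
{
    // Hypothetical helper: returns 0 for null, empty, or otherwise unparsable input.
    internal static int ToInt32OrZero(string text)
    {
        int value;
        return Int32.TryParse(text, out value) ? value : 0;
    }

    static void Main()
    {
        Console.WriteLine(ToInt32OrZero(null));                      // 0
        Console.WriteLine(ToInt32OrZero(""));                        // 0
        Console.WriteLine(ToInt32OrZero(Int32.MaxValue.ToString())); // 2147483647
        Console.WriteLine(ToInt32OrZero(Int32.MinValue.ToString())); // -2147483648
    }
}
```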
PS: note that I always write String with a capital S in my code (and in text), this is a habit left from Java I guess…
C# Compiler Emits call IL Instruction for Instance Methods Called on the Reference Returned by a Constructor
Some time ago, I wrote about how the call instruction could actually call an instance method on a null reference, and that inside this instance method, the this keyword would refer to null.
I find that very interesting, so I kept on disassembling some sample code to see what IL is generated and to try to grasp some of the compiler’s logic.
Here is some simple code used to see what IL the C# compiler generates:
class Hello1
{
    internal String GetHello()
    {
        return "Hello1";
    }
}

sealed class Hello2
{
    internal String GetHello()
    {
        return "Hello2";
    }
}

static void Main(string[] args)
{
    var h1 = new Hello1();
    Console.WriteLine(h1.GetHello());
    var h2 = new Hello2();
    Console.WriteLine(h2.GetHello());
    Console.WriteLine(new Hello1().GetHello());
    Console.WriteLine(new Hello2().GetHello());
    Console.ReadLine();
}
In the first two calls, we use a local variable that we call the GetHello method on, and in the last two calls we instantiate the object and call the GetHello method on the reference returned by the constructor, a reference that we don’t keep.
Here’s the IL generated for the Main method:
We can see that in the first two calls, the callvirt instruction is emitted by the compiler, even though GetHello is not virtual and Hello2 is sealed. For a non-virtual method, callvirt invokes exactly the same method as call would; what it adds is a null check on the reference. As the call happens on a variable, the compiler does not try to prove that the variable is not null (it is not “smart” enough to detect that it was assigned just above).
In the two subsequent calls, however, as the method call is done on the reference returned by the constructor, which can never be null, the instruction emitted is call, which is slightly more performant than callvirt.
For more information on call and callvirt instructions, see ECMA 335 12.4.1.2.
