Instance Methods Called on null References
In a previous post, I wrote about how you can call extension methods on null references: they are in fact static methods with one extra parameter, the extended object itself.
I’m currently reading CLR via C# (which is a fascinating read), and I was surprised to learn in chapter 6 how the CIL instructions call and callvirt actually work.
What is amazing is that for methods called with the call instruction, the CLR does not check if the referenced object is null. The method call will succeed, but the this reference will be null in the instance method. Actually, in both cases, the reference to the object that the method was called on is passed as a hidden parameter to the method.
Before examining this, another interesting fact is that the C# compiler emits callvirt for most instance method calls, even non-virtual ones, and callvirt checks whether the reference is null. To test the call instruction easily, we will have to disassemble, modify and then reassemble the following code:
using System;

public class SomeClass
{
    public String GetHello()
    {
        // Only reachable with a null "this" if the method is invoked with "call"
        if (this == null)
        {
            return "Amazing!";
        }
        return "Hello";
    }
}

class Program
{
    static void Main(string[] args)
    {
        var o = null as SomeClass;
        var hello = o.GetHello();
        Console.WriteLine(hello);
    }
}
Pretty dumb, right? Especially the if statement where we check whether this is null. It seems logical to most of us that this call will throw a NullReferenceException. However, the point is just to get the compiler to build code that is very close to what we want to achieve, so we don’t have to write IL ourselves.
After running ILDasm.exe on the assembly, this is what we have in the Main method:
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 18 (0x12)
.maxstack 1
.locals init ([0] class Sandbox09.SomeClass o,
[1] string hello)
IL_0000: nop
IL_0001: ldnull
IL_0002: stloc.0
IL_0003: ldloc.0
IL_0004: callvirt instance string Sandbox09.SomeClass::GetHello()
IL_0009: stloc.1
IL_000a: ldloc.1
IL_000b: call void [mscorlib]System.Console::WriteLine(string)
IL_0010: nop
IL_0011: ret
} // end of method Program::Main
As we can see, the call to GetHello is done with the callvirt instruction. As this instruction checks if the object is null (and in this case, it is), this will fail at runtime.
Just to make sure, I used ILasm.exe to build the assembly and ran it. Here is what it outputs:
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
at Pvle.Program.Main(String[] args)
Now, let’s replace callvirt with call to see how it behaves.
IL_0004: call instance string Sandbox09.SomeClass::GetHello()
Now, run it through ILasm.exe once more and execute the result. Here’s what it outputs:
Amazing!
The actual difference between call and callvirt is that call invokes the method on the compile-time type of the object, so there is no need to check whether the reference is null. The object is passed as a hidden parameter to the method, where it is referenced as this. It’s very similar to extension methods.
Callvirt, on the other hand, will resolve the method that is to be called at runtime, depending on the runtime type of the object, so the object cannot be null. The CLR enforces this check at runtime.
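If you would rather not round-trip through ILDasm and ILasm, the same experiment can be sketched with System.Reflection.Emit. This is only a sketch under assumptions: the names CallVersusCallvirt, BuildInvoker and Describe are made up for illustration, and it relies on the runtime not verifying the IL of a DynamicMethod. It emits "ldnull; call/callvirt GetHello; ret" and reports what happens.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class SomeClass
{
    // Non-virtual instance method; "this" can only be null when invoked with "call".
    public string GetHello()
    {
        return this == null ? "Amazing!" : "Hello";
    }
}

// Hypothetical helper, not part of the original post.
public static class CallVersusCallvirt
{
    // Emits: ldnull; <opcode> SomeClass::GetHello(); ret
    public static Func<string> BuildInvoker(OpCode opcode)
    {
        MethodInfo target = typeof(SomeClass).GetMethod("GetHello");
        var dynamicMethod = new DynamicMethod("InvokeOnNull", typeof(string), Type.EmptyTypes);
        ILGenerator il = dynamicMethod.GetILGenerator();
        il.Emit(OpCodes.Ldnull);   // push a null reference as "this"
        il.Emit(opcode, target);   // OpCodes.Call or OpCodes.Callvirt
        il.Emit(OpCodes.Ret);
        return (Func<string>)dynamicMethod.CreateDelegate(typeof(Func<string>));
    }

    // Returns the method result, or the exception name if the null check fires.
    public static string Describe(OpCode opcode)
    {
        try
        {
            return BuildInvoker(opcode)();
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }

    public static void Main()
    {
        Console.WriteLine("call:     " + Describe(OpCodes.Call));
        Console.WriteLine("callvirt: " + Describe(OpCodes.Callvirt));
    }
}
```

With call, the method body runs and sees a null this, as in the reassembled program above; with callvirt, the null check fires before the method is entered.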
What About Value Types?
For value types, it’s a bit different. As they are implicitly sealed, the only methods that are virtual are the ones that are defined in System.Object. Oh wait, there is another case: if the value type is cast to an interface it implements, calls to methods on that variable will be using callvirt, as the value type will have to be boxed.
Here is some sample code that demonstrates this:
using System;

public interface ISomeInterface
{
    String GetHelloFromInterface();
}

public struct SomeClass : ISomeInterface
{
    public String GetHello()
    {
        return "Hello";
    }

    public override string ToString()
    {
        return "Hello";
    }

    public String GetHelloFromInterface()
    {
        return "Hello from interface";
    }
}

class Program
{
    static void Main(string[] args)
    {
        var o = new SomeClass();
        var hello = o.GetHello();
        o.ToString();
        o.GetHelloFromInterface();
        ((ISomeInterface)o).GetHelloFromInterface();
        o.GetHashCode();
    }
}
And here is the corresponding IL for the Main method:
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 66 (0x42)
.maxstack 1
.locals init ([0] valuetype Pvle.SomeClass o,
[1] string hello)
IL_0000: nop
IL_0001: ldloca.s o
IL_0003: initobj Pvle.SomeClass
IL_0009: ldloca.s o
IL_000b: call instance string Pvle.SomeClass::GetHello()
IL_0010: stloc.1
IL_0011: ldloca.s o
IL_0013: constrained. Pvle.SomeClass
IL_0019: callvirt instance string [mscorlib]System.Object::ToString()
IL_001e: pop
IL_001f: ldloca.s o
IL_0021: call instance string Pvle.SomeClass::GetHelloFromInterface()
IL_0026: pop
IL_0027: ldloc.0
IL_0028: box Pvle.SomeClass
IL_002d: callvirt instance string Pvle.ISomeInterface::GetHelloFromInterface()
IL_0032: pop
IL_0033: ldloca.s o
IL_0035: constrained. Pvle.SomeClass
IL_003b: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
IL_0040: pop
IL_0041: ret
} // end of method Program::Main
We can see that when calling the method through the interface, the value type is boxed.
I find this very interesting for understanding how method calls actually work. Getting your nose into IL is always a good idea when you want to see what is happening under the hood, but I have to admit that this is the first time I have modified and reassembled it.
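The boxing also has a visible effect at the C# level: the box holds a copy of the struct, so a mutation made through the interface does not reach the original variable. Here is a small sketch of that; Counter and ICounter are hypothetical types introduced only for this illustration.

```csharp
using System;

// Hypothetical types used only for this illustration.
public interface ICounter
{
    void Increment();
    int Count { get; }
}

public struct Counter : ICounter
{
    private int count;

    public int Count { get { return count; } }

    public void Increment() { count++; }
}

public static class BoxingDemo
{
    public static int CountAfterDirectCall()
    {
        var counter = new Counter();
        counter.Increment();               // direct call on the struct itself
        return counter.Count;
    }

    public static int CountAfterInterfaceCall()
    {
        var counter = new Counter();
        ((ICounter)counter).Increment();   // boxes a copy and increments the copy
        return counter.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterDirectCall());     // prints 1
        Console.WriteLine(CountAfterInterfaceCall());  // prints 0
    }
}
```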
SelectMany, Sorting and Grouping Objects
So here is the problem: I have a list of items that looks like this:
var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};
The References field of each object is a semicolon-separated list of categories. What I want is one list per distinct reference (in this example: 1, 2, 3 and 4) containing all the items that belong to that reference. Items will be duplicated if they are in more than one category.
To sum it up, the expected output would be: One, Three, Two, One, Two, Three, Four
After fooling around a bit, here is the query I came up with:
var query = from c in collection
            from d in c.References.Split(';')
            orderby d
            group c by d into groups
            select groups;
This does exactly what I want and produces the output I expected from the input data.
However, when I use LINQ, I generally call the extension methods directly rather than use the pretty query syntax. This is mostly because I want to understand what happens behind the scenes, and I have to admit that this query was quite a beast.
First, as there are two from clauses, there is a SelectMany somewhere. You probably know that SelectMany is a bit of a beast itself and that understanding it fully is quite a challenge compared to the other operators and extension methods. Also, I thought that the GroupBy clause was going to be tough, as we group the c items by d, which comes from the other collection.
I couldn’t figure out by myself how to write that query using extension methods, so I fell back on the good old Reflector that gave me a straight answer:
var query = collection
    .SelectMany(
        delegate (<>f__AnonymousType0<string, string> c) {
            return c.References.Split(new char[] { ';' });
        },
        delegate (<>f__AnonymousType0<string, string> c, string d) {
            return new { c = c, d = d };
        })
    .OrderBy(
        delegate (<>f__AnonymousType1<<>f__AnonymousType0<string, string>, string> <>h__TransparentIdentifier0) {
            return <>h__TransparentIdentifier0.d;
        })
    .GroupBy(
        delegate (<>f__AnonymousType1<<>f__AnonymousType0<string, string>, string> <>h__TransparentIdentifier0) {
            return <>h__TransparentIdentifier0.d;
        },
        delegate (<>f__AnonymousType1<<>f__AnonymousType0<string, string>, string> <>h__TransparentIdentifier0) {
            return <>h__TransparentIdentifier0.c;
        })
    .Select(
        delegate (IGrouping<string, <>f__AnonymousType0<string, string>> groups) {
            return groups;
        });
After reading that, it made much more sense. Here is what I came up with when writing it on my own:
var p = collection
    .SelectMany(c => c.References.Split(';'), (c, d) => new { c, d })
    .OrderBy(t => t.d)
    .GroupBy(t => t.d, c => c.c);
Much more readable. The idea here is that the SelectMany call outputs a sequence of anonymous objects, each pairing an item (c) with one of its references (d). This sequence is then sorted by OrderBy, and finally fed through a GroupBy that uses the d property as the grouping key and the c property as the projection into the resulting groups. Not that difficult after all…
Here is another version that is probably a bit clearer:
var q = collection
    .SelectMany(c => c.References.Split(';'), (c, d) => new { Title = c.Title, Reference = d })
    .GroupBy(c => c.Reference, c => c.Title)
    .OrderBy(g => g.Key);
Note that this is a simplified version of the original issue. The issue itself was to do this with some ListItems retrieved from SharePoint. The objects were a bit more complicated, but the logic is the same.
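To check the result, here is a small self-contained sketch that runs the extension-method version against the sample data and prints one line per group. The GroupingDemo class and its Describe method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around the query, for illustration only.
public static class GroupingDemo
{
    public static List<string> Describe()
    {
        var collection = new[]
        {
            new { Title = "One", References = "1;3" },
            new { Title = "Two", References = "2;3" },
            new { Title = "Three", References = "1;4" },
            new { Title = "Four", References = "4" }
        };

        var groups = collection
            // Flatten: one (item, reference) pair per reference of each item
            .SelectMany(c => c.References.Split(';'), (c, d) => new { c, d })
            // OrderBy is a stable sort, so items keep their relative order
            .OrderBy(t => t.d)
            // Key = reference, elements = the original items
            .GroupBy(t => t.d, t => t.c);

        return groups
            .Select(g => g.Key + ": " + string.Join(", ", g.Select(c => c.Title)))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Describe())
        {
            Console.WriteLine(line);
        }
        // 1: One, Three
        // 2: Two
        // 3: One, Two
        // 4: Three, Four
    }
}
```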
Double-Checked Locking Extension Method for ASP.NET Cache
On my current project, we need to cache some of the results retrieved when querying SharePoint. As the retrieval is quite expensive (as always with SharePoint) and performance is very important, we must make sure that it is only executed once.
So, the way to do this is to use the double-checked locking mechanism. The idea is to check whether the cache contains the item sought; if it does not, acquire the lock and then check again, in case another thread added the item while we were acquiring the lock.
I wrote a generic extension method to do this. It takes the lock handle to lock on as a parameter, as well as the function used to retrieve the item if it is not in the cache. In that case, the retrieved item is stored in the cache (and of course returned to the caller).
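The check, lock, check again sequence can be sketched independently of ASP.NET. The example below is an illustration only: it swaps the ASP.NET Cache for a ConcurrentDictionary so that it runs anywhere, and the names DoubleCheckedLockingDemo, GetOrLoad and RunConcurrently are made up for the occasion. Sixteen parallel callers ask for the same key, and the expensive load should run exactly once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the real cache; names are hypothetical.
public static class DoubleCheckedLockingDemo
{
    private static readonly ConcurrentDictionary<string, object> store =
        new ConcurrentDictionary<string, object>();
    private static readonly object lockHandle = new object();
    private static int loadCount;

    public static T GetOrLoad<T>(string key, Func<T> load)
    {
        object item;
        if (!store.TryGetValue(key, out item))          // first check, without the lock
        {
            lock (lockHandle)
            {
                if (!store.TryGetValue(key, out item))  // second check, under the lock
                {
                    item = load();                      // expensive operation
                    store[key] = item;
                }
            }
        }
        return (T)item;
    }

    // Returns how many times the expensive load actually ran.
    public static int RunConcurrently()
    {
        Parallel.For(0, 16, i =>
        {
            GetOrLoad("answer", () =>
            {
                Interlocked.Increment(ref loadCount);
                Thread.Sleep(50);                       // simulate a slow query
                return "expensive result";
            });
        });
        return loadCount;
    }

    public static void Main()
    {
        Console.WriteLine(RunConcurrently());           // prints 1
    }
}
```

The first check keeps the common case (item already cached) lock-free; the second check is what prevents two threads that both missed the cache from loading the item twice.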
Code speaks louder than words:
using System;
using System.Web.Caching;

public static class CacheExtension
{
    public static T SetAndGetItemFromCache<T>(this Cache cache, String cacheKey, Object lockHandle, Func<Cache, T> addItemToCache)
    {
        Object item = cache.Get(cacheKey);
        if (item == null)
        {
            lock (lockHandle)
            {
                item = cache.Get(cacheKey);
                if (item == null)
                {
                    // addItemToCache is expected to store the item in the cache and return it
                    item = addItemToCache(cache);
                }
                else
                {
                    // The sought item has been added meanwhile by another thread
                }
            }
        }
        else
        {
            // Cache contains the key, return it
        }

        if (item is T)
        {
            return (T)item;
        }
        else
        {
            // The object in the cache is not of type T, some other component might be using that key
            throw new ApplicationException(String.Format("Object retrieved from the cache is not of the expected {0} type. Another component might be using the same key to store in the cache.", typeof(T).FullName));
        }
    }
}
This can be used on any System.Web.Caching.Cache object.
Foreach Statement Calls Dispose() on IEnumerator
Again, something that might seem natural because you generally don’t see it or even think about it, but that is interesting to know.
If the IEnumerator/IEnumerator<T> returned by the GetEnumerator() function of a collection that is foreach-ed implements IDisposable, Dispose() will be called on it when the foreach is over.
Here is some sample code that does just that:
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        foreach (var item in new SomeEnumerable())
        {
            Console.WriteLine(item);
        }
        Console.ReadLine();
    }

    class SomeEnumerable : IEnumerable<String>
    {
        #region IEnumerable<string> Members
        public IEnumerator<String> GetEnumerator()
        {
            return new CustomEnumerator(new List<String>() { "One", "Two" }.GetEnumerator());
        }
        #endregion

        #region IEnumerable Members
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
        #endregion

        class CustomEnumerator : IEnumerator<String>
        {
            IEnumerator<String> enumerator;

            public CustomEnumerator(IEnumerator<String> enumerator)
            {
                this.enumerator = enumerator;
            }

            #region IEnumerator<string> Members
            public string Current
            {
                get { return this.enumerator.Current; }
            }
            #endregion

            #region IDisposable Members
            // Called by foreach once the loop is over
            public void Dispose()
            {
                Console.WriteLine("SomeEnumerable Enumerator Disposed!");
                this.enumerator.Dispose();
            }
            #endregion

            #region IEnumerator Members
            object System.Collections.IEnumerator.Current
            {
                get { return this.enumerator.Current; }
            }

            public bool MoveNext()
            {
                return this.enumerator.MoveNext();
            }

            public void Reset()
            {
                this.enumerator.Reset();
            }
            #endregion
        }
    }
}
You can look at the two possible code expansions for the foreach statement on the MSDN page.
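For the common IEnumerable<T> case, the expansion amounts to a try/finally around the MoveNext loop. The sketch below is a simplified hand-written version, not the exact text of the specification, and the names ForeachDisposeDemo, Items, WithForeach and ExpandedByHand are invented for it. It uses an iterator whose finally block records when the enumerator is disposed, and leaves the loop early to show that Dispose() is still called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, for illustration only.
public static class ForeachDisposeDemo
{
    // The finally block of an iterator runs when its enumerator is disposed.
    static IEnumerable<string> Items(List<string> log)
    {
        try
        {
            yield return "One";
            yield return "Two";
        }
        finally
        {
            log.Add("disposed");
        }
    }

    public static List<string> WithForeach()
    {
        var log = new List<string>();
        foreach (var item in Items(log))
        {
            log.Add(item);
            break;                 // leave early; Dispose() is still called
        }
        return log;
    }

    // Roughly what the compiler generates for the loop above.
    public static List<string> ExpandedByHand()
    {
        var log = new List<string>();
        IEnumerator<string> e = Items(log).GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                log.Add(e.Current);
                break;
            }
        }
        finally
        {
            e.Dispose();
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithForeach()));     // One, disposed
        Console.WriteLine(string.Join(", ", ExpandedByHand()));  // One, disposed
    }
}
```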
Using Statement with an “expression”
While reading the book More Effective C#, I was amazed to find out that the using statement can be used with a plain expression:
using-statement:
    using ( resource-acquisition ) embedded-statement
resource-acquisition:
    local-variable-declaration
    expression
If the resource-acquisition is an expression, the variable resulting from the expression will be inaccessible to the embedded-statement.
Now, the reason why this is outlined in Bill Wagner’s book is that it is useful when you write generic classes. Simply put, if the expression is something like “a as IDisposable” and a actually implements IDisposable, it will be disposed at the end of the using statement. If it does not implement IDisposable, the expression evaluates to null and nothing happens.
The beauty of this is that you don’t have to know, it will work in all cases.
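Here is a minimal sketch of that idea in a generic method. Resource, UsingExpressionDemo, Process and Run are hypothetical names made up for the example; the only language features relied on are the "as" operator on a type parameter and the fact that a using statement over a null resource does not call Dispose().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable type that records its disposal.
public sealed class Resource : IDisposable
{
    private readonly List<string> log;

    public Resource(List<string> log) { this.log = log; }

    public void Dispose() { log.Add("disposed"); }
}

public static class UsingExpressionDemo
{
    // T is unconstrained: the method does not know whether item is disposable.
    public static void Process<T>(T item, List<string> log)
    {
        using (item as IDisposable)   // null if T does not implement IDisposable
        {
            log.Add("processing " + typeof(T).Name);
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        Process(new Resource(log), log);  // disposed at the end of the using block
        Process("plain string", log);     // nothing to dispose, nothing happens
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
        // processing Resource, disposed, processing String
    }
}
```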
I didn’t know that, and I find it very neat!
