Why I Love Extension Methods

The other day, I used the Html Agility Pack library for the first time.

The method I use the most is SelectNodes: you give it an XPath expression and it returns an HtmlNodeCollection containing the matching HtmlNodes, or null if no node was found.

I don’t know if this is a design decision, but returning null when there is no match is not very nice. If you use the expression as is in a foreach statement, it will throw a NullReferenceException when there is no match.

A simple solution is to use the null-coalescing operator (??) right after the method call, in order to give the foreach an empty enumerable and avoid the exception.

htmlNode.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>()

This works well, but it’s a bit ugly to repeat that in every foreach statement.

This is where Extension Methods are so enjoyable. Let’s just add a new method to our HtmlNode friend that returns an empty enumerable when SelectNodes returns null.

internal static class HtmlAgilityPackExtension
{
    internal static IEnumerable<HtmlNode> SelectNodesOrEmpty(this HtmlNode htmlNode, String xpath)
    {
        return htmlNode.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>();
    }
}

There we go. From now on, I can simply foreach over a SelectNodesOrEmpty result of any HtmlNode, with no fear of any exception.
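The same trick generalizes beyond Html Agility Pack. Here is a minimal sketch of a generic version; the OrEmpty name and the OrEmptyDemo class are hypothetical helpers invented for this illustration, not part of any library. It relies on the fact that an extension method is just a static call, so it can be invoked on a null reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Html Agility Pack or the .NET Framework).
public static class EnumerableExtensions
{
    // Returns the sequence itself, or an empty sequence when it is null.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source)
    {
        return source ?? Enumerable.Empty<T>();
    }
}

public static class OrEmptyDemo
{
    // Counts items with a foreach; safe even when 'items' is null.
    public static int CountItems(IEnumerable<string> items)
    {
        int count = 0;
        foreach (string item in items.OrEmpty())
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        IEnumerable<string> missing = null;
        Console.WriteLine(CountItems(missing));            // 0
        Console.WriteLine(CountItems(new[] { "a", "b" })); // 2
    }
}
```

With such a helper, any method that may return null instead of an empty collection can be wrapped at the call site without writing a dedicated extension for each type.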

Do You Noda Time?

A few days ago, Jon Skeet announced that he was starting an open source project that had no name yet (it ended up being named Noda Time) and that aims to provide a .NET equivalent to Joda Time.

As there were clearly some openings, I offered to join, and there is now a Google Group for the project. Jon has great expectations for the project itself but also for the methodology. He wants the project to be a model:

I want it to be a shining example of how to build, maintain and deploy an open source .NET library.

I’m really eager to see how this will go. I think this is a great opportunity for me to learn a lot of new stuff and meet people.

Next steps for me are to keep up with the ongoing discussions and learn more about Joda Time, as I have never used it.

I hope I can prove myself useful!

Building Paths in C# and Java

When it comes to building full file path strings in C# or Java, I very often see the same methods being reinvented again and again.

However, both frameworks offer utility classes that do just that for you. And they have been developed by some of the most skilled developers on the planet. How can you beat that? So stop rolling your own path building functions!

.NET Version (in C#)

I’m always astonished that so many developers don’t know the System.IO.Path static utility class of the .NET Framework.

As an example, see this code:

String file1 = Path.Combine(@"C:\tmp\test\", "test.txt");
String file2 = Path.Combine(@"C:\tmp\test", "test.txt");

//file1 == file2

//"C:\tmp\test\test.txt"
Console.WriteLine("File: {0}", file1); 

It doesn’t matter if you forgot the “\” at the end of the directory string: it will be added. If it’s already there, that’s fine too! If you don’t know this class yet, have a look at the other methods available.
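As a small taste of those other methods, here is a sketch that builds a path and then takes it apart again. The PathDemo class name and the relative "tmp/test" layout are just illustrative; the results are compared against Path.Combine output rather than hard-coded, because the separator character depends on the platform.

```csharp
using System;
using System.IO;

public static class PathDemo
{
    public static void Main()
    {
        // Build a path from three segments; separators are inserted as needed.
        string full = Path.Combine("tmp", "test", "test.txt");

        Console.WriteLine(Path.GetFileName(full));                 // test.txt
        Console.WriteLine(Path.GetExtension(full));                // .txt
        Console.WriteLine(Path.GetFileNameWithoutExtension(full)); // test

        // The directory part is the same as combining the first two segments.
        string directory = Path.GetDirectoryName(full);
        Console.WriteLine(directory == Path.Combine("tmp", "test")); // True

        // Swap the extension without any manual string manipulation.
        string backup = Path.ChangeExtension(full, ".bak");
        Console.WriteLine(Path.GetFileName(backup));               // test.bak
    }
}
```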

Now, onto Java.

Java Version

Java has the File class to do that. It’s very easy to use, but it doesn’t work the same way. You have to instantiate a new File object every time, instead of calling a static method like in .NET.

File file1 = new File("C:\\tmp\\test", "test.txt");
File file2 = new File("C:\\tmp\\test\\", "test.txt");

Now, this object is an abstract representation of the “C:\tmp\test\test.txt” file, so if you call toString() on it, you get that path as a string.

Side Notes

Two interesting things I noticed while writing this.

First, a difference between the .NET and Java path creators. The .NET Path.Combine returns only the second string if it starts with the “\” character, while the Java File class ignores that leading separator. The .NET class treats such a second argument as a rooted path, which replaces everything before it, while the Java class resolves it against the parent directory anyway.
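The .NET side of this is easy to reproduce. The sketch below uses Path.DirectorySeparatorChar instead of a literal “\” so that it behaves the same way on every platform; the RootedCombineDemo class and its method names are made up for the example, and trimming the leading separator is only one possible workaround, not an official recommendation.

```csharp
using System;
using System.IO;

public static class RootedCombineDemo
{
    // Combines "tmp" with a second segment that starts with a separator.
    public static string CombineWithRootedSecond()
    {
        string rooted = Path.DirectorySeparatorChar + "test.txt";
        return Path.Combine("tmp", rooted);
    }

    // Same call, but with the leading separators trimmed first.
    public static string CombineAfterTrimming()
    {
        string rooted = Path.DirectorySeparatorChar + "test.txt";
        string relative = rooted.TrimStart(
            Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        return Path.Combine("tmp", relative);
    }

    public static void Main()
    {
        string rooted = Path.DirectorySeparatorChar + "test.txt";

        // The first argument is dropped because the second one is rooted.
        Console.WriteLine(CombineWithRootedSecond() == rooted); // True

        // After trimming, both segments are kept.
        Console.WriteLine(
            CombineAfterTrimming() == Path.Combine("tmp", "test.txt")); // True
    }
}
```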

Second, if you try to create an instance of the .NET Path class, Visual Studio will give you the following error message: “Cannot create an instance of the abstract class or interface ‘System.IO.Path’”. Once you try to build, it fails and the error message changes to “Cannot create an instance of the static class ‘System.IO.Path’”. It’s clearly related to Visual Studio, but it’s the first time I have noticed that behavior (the first error message was quite suspicious with the “abstract class or interface” thingy). Anyway, Visual Studio warns you that it cannot be done, which is the important thing, then it refines its error message based on the output of the compiler.

Boxing and Unboxing in .NET

I realized that the way boxing and unboxing work in .NET was not something I knew well, so I decided to write a small recap along with some code to test boxing/unboxing behaviors.

Some Theory

In .NET, even though value types derive from the uber root type System.Object, they need the special boxing operation to be treated as objects.

A good explanation is given in the C# Language Specification:

A value of a class type can be converted to type object or to an interface type that is implemented by the class simply by treating the reference as another type at compile-time. Likewise, a value of type object or a value of an interface type can be converted back to a class type without changing the reference (but of course a run-time type check is required in this case).

Since structs are not reference types, these operations are implemented differently for struct types. When a value of a struct type is converted to type object or to an interface type that is implemented by the struct, a boxing operation takes place. Likewise, when a value of type object or a value of an interface type is converted back to a struct type, an unboxing operation takes place.

So, when a value type needs to be converted to an object, it is boxed in a reference type. As stated below, boxing means that the value gets copied into the wrapping reference type.

A boxing conversion implies making a copy of the value being boxed. This is different from a conversion of a reference-type to type object, in which the value continues to reference the same instance and simply is regarded as the less derived type object.

The important thing to bear in mind is that boxing happens automatically. Everywhere a reference type is expected but a value type is used instead, the value type is implicitly boxed. Unboxing, on the other hand, requires an explicit cast back to the value type.

Another important point is that if the value type overrides some of the virtual methods inherited from object, invocation of these methods on the value type does not require boxing.
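To make the interface case from the specification concrete, here is a minimal sketch. ICounter, Counter and BoxingDemo are hypothetical types written for this example only. Assigning the struct to an interface variable boxes it silently, and the box holds its own copy of the value.

```csharp
using System;

// Hypothetical types for illustration only.
public interface ICounter
{
    int Value { get; }
    void Increment();
}

public struct Counter : ICounter
{
    private int _value;
    public int Value { get { return _value; } }
    public void Increment() { _value++; }
}

public static class BoxingDemo
{
    // Returns the original's value after incrementing only the boxed copy.
    public static int OriginalAfterBoxedIncrement()
    {
        Counter original = new Counter();
        ICounter boxed = original; // implicit boxing: the struct is copied
        boxed.Increment();         // changes the copy inside the box
        return original.Value;
    }

    public static void Main()
    {
        Counter original = new Counter();
        ICounter boxed = original;
        boxed.Increment();

        Console.WriteLine(original.Value); // 0
        Console.WriteLine(boxed.Value);    // 1

        object o = 42;                     // implicit boxing, no cast needed
        Console.WriteLine(o.GetType());    // System.Int32
    }
}
```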

Some Examples

Now, let’s see how it works, and what the caveats are.

A first interesting case is taken directly from Bill Wagner’s “Effective C#”. Consider this code:

Console.WriteLine("A few numbers:{0}, {1}, {2}", 25, 32, 50);

WriteLine takes object references as parameters. This means that the three value types will be boxed before the ToString() method is called on them. To avoid this, the ToString() method should be called explicitly on each of these ints in order to provide WriteLine with strings, which are reference types, so there is no boxing.

Console.WriteLine("A few numbers:{0}, {1}, {2}", 25.ToString(), 32.ToString(), 50.ToString());

So, that’s one thing to keep in mind. It matters for performance reasons, but the boxing here doesn’t introduce any bug.

The next thing to remember is that the box holds its own copy of the value, meaning that a change made on one side (the original variable or the boxed copy) will not be reflected on the other. Also, when unboxing, the value from the box is copied again.

int i = 5;
var j = (object)i;

i++;

Console.WriteLine("i: {0}", i); //Prints 6
Console.WriteLine("j: {0}", j); //Prints 5

j = (object)i;
var k = (int)j;

k++;

Console.WriteLine("j: {0}", j); //Prints 6
Console.WriteLine("k: {0}", k); //Prints 7

When you think about it, it is quite easy and it makes a lot of sense, as in C#, things are copied by value (meaning that when you copy a reference type, you copy the value of the reference, which is what the variable stores).

Last funny thing:

Object.ReferenceEquals(5, 5); //Returns false

It’s easy to understand. The two values (5 and 5) are each boxed in a separate object, so the two references are different.

A small warning though: everything I write here is from quite trusted sources, but it may happen that I misunderstood something, so ask a real expert if you want to be sure…
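For contrast, here is a short sketch (the ReferenceEqualsDemo class is just a name picked for the example) showing that value equality still holds, and that reference equality holds too as long as a single box is shared.

```csharp
using System;

public static class ReferenceEqualsDemo
{
    public static void Main()
    {
        // Each argument is boxed separately: two distinct objects.
        Console.WriteLine(Object.ReferenceEquals(5, 5));        // False

        // Object.Equals compares the boxed values, not the references.
        Console.WriteLine(Object.Equals(5, 5));                 // True

        // Box once and copy the reference: both variables share one box.
        object box = 5;
        object sameBox = box;
        Console.WriteLine(Object.ReferenceEquals(box, sameBox)); // True
    }
}
```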

Difference between Simplified Initialisation and Anonymous Types

Here is some sample code that declares a Person class, then uses Simplified Initialization to set its public properties. This code also instantiates an Anonymous Type that has two properties named exactly like the two properties of the Person class.

class Person
{
    public String Name { get; set; }
    public int Age { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var p = new Person { Name = "Philippe", Age = 27 };
        var q = new { Name = "Johndoe", Age = 88 };

        Console.ReadLine();
    }
}

The initializations of variables p and q look quite similar. However, under the covers, they are very different.

Let’s look at the generated C# for this, using Reflector:

private static void Main(string[] args)
{
    Person <>g__initLocal0 = new Person();
    <>g__initLocal0.Name = "Philippe";
    <>g__initLocal0.Age = 0x1b;
    Person p = <>g__initLocal0;
    var q = new {
        Name = "Johndoe",
        Age = 0x58
    };
    Console.ReadLine();
}

So, what do we see there?

First, Person’s constructor is invoked (I guess the point is that you don’t have to invoke it yourself), and the compiler uses a temporary variable to store this new Person object. Then, the properties listed in the simplified initialization are set on that temporary variable, and finally our variable is assigned the reference to the object, which is now ready to be used. This is clearly done for atomicity reasons: the object is never reachable through our variable while it is in an inconsistent state.
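One way to observe the effect of that temporary variable is to make a property setter fail. The sketch below uses a variation of the Person class with a validating Age setter; that validation and the InitializerDemo class are assumptions added for the illustration, not part of the original sample. Because the target variable is only assigned after every property has been set, it keeps its previous value when the initializer throws.

```csharp
using System;

// Variation of the Person class with a validating setter (illustrative only).
public class Person
{
    private int _age;

    public string Name { get; set; }

    public int Age
    {
        get { return _age; }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException("value");
            _age = value;
        }
    }
}

public static class InitializerDemo
{
    // Returns the new Person, or null when the initializer throws.
    public static Person TryCreate(string name, int age)
    {
        Person p = null;
        try
        {
            // Name is set on a hidden temporary, then Age; p is assigned last.
            p = new Person { Name = name, Age = age };
        }
        catch (ArgumentOutOfRangeException)
        {
            // p was never assigned, so no half-initialized object leaks out.
        }
        return p;
    }

    public static void Main()
    {
        Console.WriteLine(TryCreate("Philippe", 27) != null); // True
        Console.WriteLine(TryCreate("Philippe", -1) == null); // True
    }
}
```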

Second, we see that for the anonymous type, it’s pretty much the same as the original code. However, there is no trace of an intermediate variable used during initializations of the object’s properties. Let’s have a look at the code generated for that anonymous type:

[CompilerGenerated, DebuggerDisplay(@"\{ Name = {Name}, Age = {Age} }",
    Type="<Anonymous Type>")]
internal sealed class <>f__AnonymousType0<<Name>j__TPar, <Age>j__TPar>
{
    // Fields
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <Age>j__TPar <Age>i__Field;
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <Name>j__TPar <Name>i__Field;

    // Methods
    [DebuggerHidden]
    public <>f__AnonymousType0(<Name>j__TPar Name, <Age>j__TPar Age);
    [DebuggerHidden]
    public override bool Equals(object value);
    [DebuggerHidden]
    public override int GetHashCode();
    [DebuggerHidden]
    public override string ToString();

    // Properties
    public <Age>j__TPar Age { get; }
    public <Name>j__TPar Name { get; }
}

Now, two things to note:

  1. There is no default constructor for that anonymous type; there is only a constructor that takes the values of the two properties.
  2. The two properties are read-only (they only have getters, backed by readonly fields), so they cannot be assigned once the object has been instantiated. In fact, Anonymous Types are immutable.

So, it is quite clear from the generated code that when instantiating an anonymous type, the compiler translates this into a call to the anonymous type’s constructor. It is not shown in Reflector’s C# disassembler, but it can be seen using Reflector’s IL disassembler:

L_001e: ldstr "Johndoe"
L_0023: ldc.i4.s 0x58
L_0025: newobj instance void <>f__AnonymousType0`2<string, int32>::.ctor(!0, !1)

As expected, the call will be to the constructor of the anonymous type.

To sum it up, even if the following two lines look very similar, the code generated for them is very different:

var p = new Person { Name = "Philippe", Age = 27 };
var q = new { Name = "Johndoe", Age = 88 };

For variable p, the specified constructor will be called (in this case, the default one), then properties will be set on the newly created object. For variable q, the generated constructor will be called using properties given as parameters.
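The immutability and the generated Equals override noted above can be checked with a few lines. This is a small sketch; AnonymousTypeDemo and its method names are hypothetical and only exist to keep the example self-contained.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Two anonymous objects with the same property names, types and order
    // (in the same assembly) share one generated type and compare by value.
    public static bool SameShapeIsEqual()
    {
        var q = new { Name = "Johndoe", Age = 88 };
        var r = new { Name = "Johndoe", Age = 88 };
        return q.GetType() == r.GetType()
            && q.Equals(r)
            && !Object.ReferenceEquals(q, r);
    }

    // Properties cannot be reassigned, so "changing" one means building
    // a new object from the old one.
    public static int AgeAfterBirthday()
    {
        var q = new { Name = "Johndoe", Age = 88 };
        // q.Age = 89; // does not compile: the property has no setter
        var older = new { q.Name, Age = q.Age + 1 };
        return older.Age;
    }

    public static void Main()
    {
        Console.WriteLine(SameShapeIsEqual()); // True
        Console.WriteLine(AgeAfterBirthday()); // 89
    }
}
```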
