Code Kata: Kata Two

I haven’t been coding much at work recently, so I decided that I needed to sharpen the saw a bit and get some exercise.

I decided to start with some Code Kata. The first one is somewhat of an academic problem: implement binary search in five different ways.

Here is my first solution, the iterative way:

        //Classic iterative method
        public int Chop1(int element, IEnumerable<int> collection)
        {
            //Empty collection -> not found
            if (collection.Count() == 0) return -1;

            var lower = 0;
            var upper = collection.Count() - 1;
            int index, candidate;

            while (lower != upper)
            {
                index = lower + ((upper - lower) / 2);
                candidate = collection.ElementAt(index);

                if (candidate == element)
                {
                    return index;
                }
                else if (candidate < element)
                {
                    lower = index + 1;
                }
                else //if (candidate > element)
                {
                    upper = index;
                }
            }

            return (collection.ElementAt(upper) == element) ? upper : -1;
        }

I was actually scared by how long it took me. This is the kind of problem you are expected to solve in 5 minutes flat at university, but the lack of programming puzzles lately has softened my brain. On the bright side, I challenged myself not to compile until I was absolutely sure that the code would pass the tests, which it did. This involved some scratch paper to work through the different corner cases, but I felt good when the green light of passing tests appeared.
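
For reference, the corner cases I worked through on paper boil down to assertions like these (a minimal sketch, assuming NUnit and the Chop1 signature above; the test name is made up):

        //Corner cases worked through on paper (hypothetical NUnit test)
        [Test]
        public void Chop1_CornerCases()
        {
            Assert.AreEqual(-1, Chop1(3, new int[0]));        //empty collection
            Assert.AreEqual(0, Chop1(1, new[] { 1 }));        //single element, present
            Assert.AreEqual(-1, Chop1(2, new[] { 1 }));       //single element, absent
            Assert.AreEqual(0, Chop1(1, new[] { 1, 3, 5 }));  //first element
            Assert.AreEqual(2, Chop1(5, new[] { 1, 3, 5 }));  //last element
            Assert.AreEqual(-1, Chop1(4, new[] { 1, 3, 5 })); //absent, falls between elements
        }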

Here is my second solution, recursive and passing slices of the original array:

        //Recursively passing smaller slices of the initial array
        public int Chop2(int element, IEnumerable<int> collection)
        {
            //Empty collection -> not found
            if (collection.Count() == 0) return -1;

            var index = collection.Count() / 2;
            var candidate = collection.ElementAt(index);

            if (candidate == element)
            {
                return index;
            }
            else if (candidate < element)
            {
                var offset = index + 1;
                var position = Chop2(element, collection.Skip(index + 1));
                return position > -1 ? offset + position : -1;
            }
            else //if (candidate > element)
            {
                return Chop2(element, collection.Take(index));
            }
        }

After finishing this one, I started thinking about it and found it very funny that this came more naturally than a recursive function using boundaries inside the array. I tend to use LINQ a lot when I code in C#, and I guess this is where it shows. The good news is that this led me to the third possible solution, which would have come second if I were just out of university.

Third solution, recursive and passing arrays boundaries:

        //Classic recursive method, passing boundaries around
        public int Chop3(int element, IEnumerable<int> collection)
        {
            //Empty collection -> not found
            if (collection.Count() == 0) return -1;

            return SubChop3(element, collection, 0, collection.Count() - 1);
        }
        //...and its sub method required for passing boundaries around
        private int SubChop3(int element, IEnumerable<int> collection, int lower, int upper)
        {
            //End case: lower == upper
            if (lower == upper)
            {
                if (collection.ElementAt(lower) == element) return lower;
                else return -1;
            }

            var index = lower + ((upper - lower) / 2);
            var candidate = collection.ElementAt(index);

            if (candidate == element)
            {
                return index;
            }
            else if (candidate < element)
            {
                return SubChop3(element, collection, index + 1, upper);
            }
            else //if (candidate > element)
            {
                return SubChop3(element, collection, lower, index);
            }
        }

Nothing too fancy here.

I don’t have numbers 4 and 5 yet, but I already have one idea that is very challenging: continuation passing style. I was also tempted to implement a version using the TPL, but it wouldn’t be very different from the recursive methods, so I am not sure… Maybe using a custom partitioner?
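
For what it’s worth, here is a very rough first sketch of the continuation passing style idea – success and failure continuations instead of a return value. The names, signatures and Action-based shape are just guesses at this point, not a finished solution:

        //CPS sketch: the result is delivered to a continuation instead of being returned
        public void Chop4(int element, int[] array, Action<int> found, Action notFound)
        {
            ChopCps(element, array, 0, array.Length - 1, found, notFound);
        }

        private void ChopCps(int element, int[] array, int lower, int upper,
                             Action<int> found, Action notFound)
        {
            //Empty range -> invoke the failure continuation
            if (lower > upper) { notFound(); return; }

            var index = lower + ((upper - lower) / 2);
            var candidate = array[index];

            if (candidate == element) found(index);
            else if (candidate < element) ChopCps(element, array, index + 1, upper, found, notFound);
            else ChopCps(element, array, lower, index - 1, found, notFound);
        }

Something like Chop4(5, new[] { 1, 3, 5 }, i => Console.WriteLine(i), () => Console.WriteLine(-1)) would then print 2.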

Comments

5 Responses to “Code Kata: Kata Two”

  1. Christophe Beyls on September 9th, 2013 21:30

    Hi Philippe!

    I’m concerned about the performance of your solutions, because the way you access the collection is entirely based on the IEnumerable interface, which prevents you from accessing any element directly by index and makes it much slower, requiring multiple unnecessary collection traversals.

    For example, the Count() method in LINQ will actually traverse the entire collection of elements before returning, so it has O(n) complexity. (BTW, if you just want to check whether an IEnumerable collection is empty, it’s better to use the IsEmpty() LINQ extension method because it returns immediately after checking the first element.)

    Each time you call ElementAt(i), the LINQ implementation will actually traverse the first i nodes of the collection before returning, which is also much slower than direct array-based access.

    Not to mention that each call to a LINQ extension method will also instantiate a new IEnumerator by calling GetEnumerator() on your IEnumerable, which is also a heavy operation in this scenario.

    So to implement this kind of algorithm you must always use an ICollection to get O(1) element access, not an IEnumerable (C# arrays and List already implement ICollection).

  2. Christophe Beyls on September 9th, 2013 22:25

    After looking it up, it seems that recent versions of the .NET runtime are smarter than I thought and will at least call the Count property directly instead of iterating if your IEnumerable is actually an ICollection instance.

    But this trick will only work if you pass a raw ICollection as a parameter, which means that if you pass the result of another LINQ method to your binary search function, it will suffer from these performance issues anyway.

    This also means that in your first recursive solution, only the first-level call can benefit from a slight performance gain; since you use Skip() and Take() and pass the resulting IEnumerable to the other levels, all those other calls will have to deal with slow O(n) operations where O(1) would have been possible.

    Also, don’t forget that methods like Skip() and Take() create new IEnumerable objects that are lazily evaluated – each one is just a filtered view of your collection. If you do this recursively, n levels deep, your collection ends up wrapped in n IEnumerable layers, which makes the execution of your LINQ methods slower.

    So for better clarity and fewer unpredictable side effects, always use the interface most suited to the algorithm, which is ICollection in this case. Hope this helps!

  3. Philippe on September 10th, 2013 11:09

    Most of the LINQ operators (in the Enumerable class, as extension methods) are actually quite smart :-)

    I suppose that by IsEmpty(), which doesn’t exist, you mean Any() – indeed I should have done that, thanks for pointing it out.

    You are also right about Skip and Take (which, interestingly, are not optimized at all – the speculation is that this is in case the source sequence is modified during the process): the result is a lazy sequence that will have to be iterated at the next Count() call. I could add a ToArray() or ToList(), which would improve performance a little bit.

    Agreed, ICollection is better suited in this case than IEnumerable, but I picked up the reflex of always working with the topmost interface with LINQ. This makes for an interesting exercise: rewriting this to check whether the sequence is an ICollection and optimizing accordingly!

  4. Christophe Beyls on September 10th, 2013 19:03

    You’re right, I forgot the LINQ method name; it’s actually called Any() and not IsEmpty().

    The whole point of binary search is to find an element with O(log n) complexity, which is faster than iterating over an array, which has O(n) complexity. And in the first line of your code, the Count() call already has O(n) complexity, which makes the algorithm basically pointless (except for the performance tweak of the ICollection check that LINQ does).

    My point is that you should not use LINQ to implement this algorithm, because LINQ is at a higher level of abstraction (it works with enumerables, not sorted lists). You could copy your LINQ collection to a temporary array, but again this is pointless because the copy itself has O(n) complexity! So the only way to find a value in a sorted IEnumerable with LINQ is to iterate through it until you find the value or a greater one, which is slower than a binary search anyway.

    There is a BinarySearch() method already available on the List and Array classes, not on the Enumerable class.

    I think many developers use LINQ all the time without knowing how it actually works. It’s not a general-purpose replacement for arrays and list methods, but it’s still a good solution in many cases to make the code cleaner and simpler.

  5. Philippe on September 10th, 2013 23:54

    I completely agree, the method should use an ICollection and not an IEnumerable; it’s not the right abstraction for the job. However, I picked up the habit of writing methods against the highest interface, generally IEnumerable, to make them as generic as possible, relying on LINQ’s underlying implementation to make the right call.

    I would still argue that the point here was to write it in a functional way, recursively passing sub-arrays – which requires creating a sub-array from the given array. But I should have done that with ArraySegment, which implements ICollection, rather than with Skip/Take, which ruins performance.
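
    Off the top of my head, the ArraySegment version would look something like this (an untested sketch – slicing is O(1) because every segment shares the same backing array, and the result is directly an index into the original array):

        //Untested sketch: recursive chop over ArraySegment slices
        public int Chop(int element, int[] array)
        {
            return Chop(element, new ArraySegment<int>(array, 0, array.Length));
        }

        private int Chop(int element, ArraySegment<int> segment)
        {
            //Empty segment -> not found
            if (segment.Count == 0) return -1;

            var index = segment.Count / 2;
            var candidate = segment.Array[segment.Offset + index];

            if (candidate == element)
            {
                return segment.Offset + index; //already an index into the original array
            }
            else if (candidate < element)
            {
                return Chop(element, new ArraySegment<int>(segment.Array,
                    segment.Offset + index + 1, segment.Count - index - 1));
            }
            else //candidate > element
            {
                return Chop(element, new ArraySegment<int>(segment.Array, segment.Offset, index));
            }
        }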

    In my experience, I prefer to see people use LINQ operators a lot, achieving (sometimes slow) correct results with readable code that can be fine-tuned later when and if required, rather than the other way around (we have all stumbled upon for i; for j; for k; loops that ran fast but produced the wrong results).