Creating distinct collections of items is generally pretty easy in C#. We’ve got the Enumerable.Distinct() extension method, which can take an IEqualityComparer<T> if required, and most of the time we don’t need anything more powerful than this.
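For reference, here is a minimal sketch of the Distinct() case with a custom comparer. Item, ItemIdComparer and DistinctDemo are hypothetical names used only for illustration. The important detail is that GetHashCode() must return equal values for any two items that Equals() treats as equal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, used only for this example.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Treats two items as equal when their Ids match.
public class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    // Must agree with Equals(): items with equal Ids get equal hash codes.
    public int GetHashCode(Item obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class DistinctDemo
{
    public static List<Item> Run()
    {
        var items = new List<Item>
        {
            new Item { Id = 1, Name = "first" },
            new Item { Id = 2, Name = "second" },
            new Item { Id = 1, Name = "first again" },
        };

        // One item survives per Id; in LINQ to Objects the first one seen is kept.
        return items.Distinct(new ItemIdComparer()).ToList();
    }
}
```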

At other times we want our deduplication process to be a bit more involved. In my case, I need to deduplicate one collection against the items in any number of other collections.

Enter “DeduplicationService”. As Phil Karlton once said (and the latter half is more than applicable here):

“There are only two hard things in Computer Science: cache invalidation and naming things.”

To begin with, I had an idea of how I wanted to call my service, and it looked like this:

service.Deduplicate(operand, x => x.Id, operatorList1, operatorList2, ...);

After a bit of investigation, and after implementing my first version, I ended up with a method signature more like:

service.Deduplicate(operand, (x, y) => x.Id == y.Id, operatorList1, operatorList2, ...);

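This turns out to be rather useful if you want to deduplicate based on different properties of your operand and your operators. For example, you may want to deduplicate your operand based on its Id being equal to the ParentId of your operators. One caveat: Except() does not document the order in which it passes items to the comparer, so an asymmetric predicate like this one is worth covering with a unit test.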
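If the Id/ParentId comparison is all you need, a key-based approach avoids depending on how Except() orders the comparer’s arguments. The sketch below is a swapped-in alternative, not part of the service: it assumes a hypothetical Node type with Id and ParentId, collects every operator’s ParentId into a HashSet<int>, and keeps the operand items whose Id is not in that set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: Id identifies a node, ParentId points at another node.
public class Node
{
    public int Id { get; set; }
    public int ParentId { get; set; }
}

public static class ParentIdDedup
{
    // Keeps operand items whose Id does not appear as a ParentId in any operator collection.
    public static IEnumerable<Node> ExceptReferencedAsParent(
        IEnumerable<Node> operand, params IEnumerable<Node>[] operators)
    {
        // Build the set of "taken" keys once, so each lookup is O(1).
        var parentIds = new HashSet<int>(
            operators.SelectMany(list => list).Select(n => n.ParentId));

        // Where() preserves the operand's order.
        return operand.Where(n => !parentIds.Contains(n.Id));
    }
}
```

Unlike Except(), this filter leaves repeated items within the operand alone, which may or may not be what you want.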

Since we’re currently passing a Func into our Deduplicate method, I also wanted a way to deduplicate complex types using a comparison more involved than a single property. This led to a new signature that takes an IEqualityComparer<T>, which turns out to be a great base for the rest of the method signatures I’ll describe.
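As an illustration of a comparison more involved than a single property, here is a hypothetical comparer that treats two people as equal when both their first and last names match, ignoring case. The Person type and FullNameComparer are assumptions for the example; note that the hash is built from the same fields, with the same case rules, as Equals().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for this example.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Two people are equal when both names match, ignoring case.
public class FullNameComparer : IEqualityComparer<Person>
{
    private static readonly StringComparer Names = StringComparer.OrdinalIgnoreCase;

    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return Names.Equals(x.FirstName, y.FirstName) && Names.Equals(x.LastName, y.LastName);
    }

    // Combines case-insensitive hashes of both names, so equal people hash equally.
    public int GetHashCode(Person obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (obj.FirstName == null ? 0 : Names.GetHashCode(obj.FirstName));
            hash = hash * 31 + (obj.LastName == null ? 0 : Names.GetHashCode(obj.LastName));
            return hash;
        }
    }
}
```

With this comparer, `operand.Except(operators, new FullNameComparer())` would treat “Ada Lovelace” and “ada LOVELACE” as the same person.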

So here’s how that method looks in its entirety:

public IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, IEqualityComparer<T> comparer, params IEnumerable<T>[] operators)
{
    // Flatten the operator collections into one collection.
    var operatorItems = operators.SelectMany(x => x);

    // Remove items from our operand collection that the comparer considers equal to an operator item.
    // Except() is a set operation, so repeated items within the operand are also collapsed.
    return operand.Except(operatorItems, comparer);
}
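Wrapped in a hypothetical DeduplicationSketch class so that it compiles on its own, the base method makes one side effect easy to see: because Except() is a set operation, repeated items within the operand are collapsed too, not just those that appear in the operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical host class; the method body mirrors the base method above.
public class DeduplicationSketch
{
    public IEnumerable<T> Deduplicate<T>(
        IEnumerable<T> operand, IEqualityComparer<T> comparer, params IEnumerable<T>[] operators)
    {
        // Flatten all operator collections into one sequence.
        var operatorItems = operators.SelectMany(x => x);

        // Set difference: drops operand items found in the operators and repeats within the operand.
        return operand.Except(operatorItems, comparer);
    }
}
```

Calling `new DeduplicationSketch().Deduplicate(new[] { 1, 1, 3, 5, 5 }, EqualityComparer<int>.Default, new[] { 3 }, new[] { 7 })` yields 1 and 5: the 3 is removed by the first operator list, and the repeated 1 and 5 each collapse to a single item.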

Now that we have a flexible base to work from, we need some prettier methods to call that handle the most common operations for us. The first of these brings us back to a signature mentioned earlier:

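public IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, Func<T, T, bool> func, params IEnumerable<T>[] operators)
{
    // Wrap the predicate in an IEqualityComparer<T> so the base method can use it.
    var comparer = new Comparer<T>(func);
    return this.Deduplicate(operand, comparer, operators);
}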

In this method we take a Func<T, T, bool>, which in practice is a lambda like:

 (x, y) => x.SomeProp == y.SomeProp

We’ve also introduced a new type here, Comparer<T>. This is a simple class that implements IEqualityComparer<T> and gives us a way to wrap our Func in something we can pass to Enumerable.Except(). It looks like this:

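using System;
using System.Collections.Generic;

public class Comparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> comparer;

    public Comparer(Func<T, T, bool> comparer)
    {
        this.comparer = comparer;
    }

    public bool Equals(T x, T y)
    {
        return this.comparer(x, y);
    }

    public int GetHashCode(T obj)
    {
        // Items that Equals() treats as equal must return the same hash code.
        // An arbitrary predicate gives us nothing consistent to hash on, so return a constant;
        // Except() then falls back to calling Equals() on every candidate (correct, but O(n * m)).
        return 0;
    }
}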
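Whatever GetHashCode() returns, it has to agree with Equals(). In LINQ to Objects, Except() is hash-based, so it only calls Equals() for items whose hash codes match. The small demonstration below, using a hypothetical Tag type, shows what happens when the hash is computed from a different field than the one the comparison uses, which is also what a ToString()-based hash does when ToString() is overridden to show other fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: Id is the identity, Label is just display text.
public class Tag
{
    public int Id { get; set; }
    public string Label { get; set; }
}

// Deliberately broken: Equals() looks at Id, but GetHashCode() looks at Label.
// Null handling is omitted for brevity.
public class InconsistentComparer : IEqualityComparer<Tag>
{
    public bool Equals(Tag x, Tag y) { return x.Id == y.Id; }
    public int GetHashCode(Tag obj) { return obj.Label.Length; }
}

// Consistent: both members look at Id only.
public class ConsistentComparer : IEqualityComparer<Tag>
{
    public bool Equals(Tag x, Tag y) { return x.Id == y.Id; }
    public int GetHashCode(Tag obj) { return obj.Id.GetHashCode(); }
}

public static class HashContractDemo
{
    // Returns how many operand items survive Except() with the given comparer.
    public static int Remaining(IEqualityComparer<Tag> comparer)
    {
        var operand = new[] { new Tag { Id = 1, Label = "a" } };
        var operators = new[] { new Tag { Id = 1, Label = "bb" } };
        return operand.Except(operators, comparer).Count();
    }
}
```

`Remaining(new InconsistentComparer())` returns 1 even though Equals() says the two tags match, because their hash codes (1 and 2) differ and Equals() is never consulted; `Remaining(new ConsistentComparer())` returns 0.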

So now we can pass in a lambda rather than writing a new equality comparer each time; however, this is still a bit convoluted for most situations. This brings me back to my most desired method signature:

service.Deduplicate(operand, x => x.Id, operatorList1, operatorList2, ...);

As it turns out, this is pretty easy to achieve. We simply need to turn our Func<TIn, TOut> key selector into a Func<TIn, TIn, bool> predicate that applies the selector to both x and y and compares the two results. That gives us a slimline method that hides the detail from the consumer:

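public IEnumerable<T> Deduplicate<T, TKey>(IEnumerable<T> operand, Func<T, TKey> func, params IEnumerable<T>[] operators)
{
    // Turn the key selector into a pairwise predicate: two items match when their keys are equal.
    // EqualityComparer<TKey>.Default is used because == does not compile for an unconstrained generic type.
    Func<T, T, bool> f = (x, y) => EqualityComparer<TKey>.Default.Equals(func(x), func(y));
    return this.Deduplicate(operand, f, operators);
}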

We now have a clean method signature that accepts a simple Lambda and a few collections.
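When the comparison really is “same key”, the comparer can hash on that key too, which keeps Except() at its usual hash-set speed instead of comparing every pair. A minimal sketch, where KeyEqualityComparer, DeduplicateBy and the Product type are hypothetical names rather than part of the service:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares items by a key and hashes by the same key, keeping GetHashCode() consistent with Equals().
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        this.keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return keyComparer.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        TKey key = keySelector(obj);
        return key == null ? 0 : keyComparer.GetHashCode(key);
    }
}

public static class DedupExtensions
{
    // Hypothetical helper: removes operand items whose key matches any operator item's key.
    public static IEnumerable<T> DeduplicateBy<T, TKey>(
        this IEnumerable<T> operand, Func<T, TKey> keySelector, params IEnumerable<T>[] operators)
    {
        var comparer = new KeyEqualityComparer<T, TKey>(keySelector);
        return operand.Except(operators.SelectMany(x => x), comparer);
    }
}

// Hypothetical type used only for this example.
public class Product
{
    public string Sku { get; set; }
}
```

On .NET 6 or later, the built-in Enumerable.ExceptBy() covers a similar need, although it takes the second sequence as keys rather than as items.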

Job done? Almost.

Since internally (within our team) we know that most of our deduplication will happen on items of a specific base class, we can introduce a new signature that keeps things even simpler. It would be nice to have a method that knows what base type it is accepting and can supply a default comparison, removing the need to pass in a lambda at all. For this we can create a new signature that always compares on a fixed property (Id), and we’re done!

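// EntityBase is a hypothetical stand-in for the team's base class, assumed to expose an Id property.
public IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, params IEnumerable<T>[] operators) where T : EntityBase
{
    return this.Deduplicate(operand, x => x.Id, operators);
}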
service.Deduplicate(operand, operatorList1, operatorList2, ...);
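Spelled out as a standalone sketch, the team-specific overload might look like the following. EntityBase, Customer, EntityIdComparer<T> and EntityDeduplication are hypothetical stand-ins, since the post doesn’t name the real base class; the `where T : EntityBase` constraint is what lets the method read Id without being handed a lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base class: the real one isn't named in the post.
public abstract class EntityBase
{
    public int Id { get; set; }
}

// Hypothetical derived type for the usage example.
public class Customer : EntityBase
{
    public string Name { get; set; }
}

// Compares entities by Id and hashes by Id, so the two stay consistent.
public class EntityIdComparer<T> : IEqualityComparer<T> where T : EntityBase
{
    public bool Equals(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(T obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class EntityDeduplication
{
    // Default comparison for anything deriving from EntityBase: match on Id.
    public static IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, params IEnumerable<T>[] operators)
        where T : EntityBase
    {
        return operand.Except(operators.SelectMany(list => list), new EntityIdComparer<T>());
    }
}
```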

The full source code for this, with unit tests, can be found on GitHub here:

https://github.com/JamieDixon/DeduplicationService

If you have some ideas on how this can be improved then I’d love to hear them either in the comments or send me a pull request on GitHub!


One Comment

  1. Posted November 24, 2013 at 10:29 am

    Hi Jamie,

    Nice post.

    Here is another implementation using an Expression tree in the case where a single property is used in the comparison.

    Another note: the GetHashCode() in the Comparer class is a bit too aggressive; check for nulls… (although I acknowledge that this may never be the case in your domain…).

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;

    public interface IDeduplicationService
    {
        IEnumerable<T> Deduplicate<T, TKey>(IEnumerable<T> operand, Expression<Func<T, TKey>> propertyExpression, params IEnumerable<T>[] operators);
        IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, Func<T, T, bool> func, params IEnumerable<T>[] operators);
        // EntityBase is a placeholder for the shared base class that exposes Id.
        IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, params IEnumerable<T>[] operators) where T : EntityBase;
    }

    public class DeduplicationService : IDeduplicationService
    {
        public IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, Func<T, T, bool> func, params IEnumerable<T>[] operators)
        {
            var comparer = new Comparer<T>(func);
            var operatorItems = operators.SelectMany(x => x);
            return operand.Except(operatorItems, comparer);
        }

        public IEnumerable<T> Deduplicate<T>(IEnumerable<T> operand, params IEnumerable<T>[] operators) where T : EntityBase
        {
            return this.Deduplicate(operand, type => type.Id, operators);
        }

        public IEnumerable<T> Deduplicate<T, TKey>(IEnumerable<T> operand, Expression<Func<T, TKey>> propertyExpression, params IEnumerable<T>[] operators)
        {
            var selector = propertyExpression.Compile();
            // == does not compile for an unconstrained TKey, so use the default equality comparer.
            Func<T, T, bool> func = (x, y) => EqualityComparer<TKey>.Default.Equals(selector(x), selector(y));
            return this.Deduplicate(operand, func, operators);
        }
    }
