When working with collections of data to which a set of rules, filters or transformations has to be applied, I often see implementations that construct one list after another to hold data between workflow steps. Such solutions can be inelegant, make the code hard to read and consume unnecessary memory. These issues can easily be addressed with the help of the IEnumerable<T> interface and extension methods.
First, imagine a scenario in which we load data from an external source, let's say a CSV file provided by a customer. The data can be expressed by the following entity:
public class Entity
{
    public int Id { get; set; }
    public int CategoryId { get; set; }
    public int UserId { get; set; }
    public DateTime Date { get; set; }
    public string Name { get; set; }
    public decimal Amount { get; set; }
}
Now, before we can enter the data into the system, we need to normalise the value of the Name property. For this task we use an implementation of INameCanonicalisator. We also have to apply tax to the Amount; this calculation is done by an implementation of IAmountTaxCalculator. Below are the definitions of those interfaces:
public interface INameCanonicalisator
{
    string ToCanonicalForm(string name);
}
public interface IAmountTaxCalculator
{
    decimal CalculateTax(decimal value);
}
To make our example more interesting, let's also assume that we are interested only in entries where the normalised name starts with the letter “a” and the amount after taxes is equal to or greater than 50000. One way of implementing the above requirements is as follows:
private void ProcessData(IEnumerable<Entity> entities)
{
    // Step 1: normalise names and keep only those starting with "a".
    var entitiesWithCanonicalName = new List<Entity>();
    foreach (var entity in entities)
    {
        entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
        if (entity.Name.StartsWith("a"))
            entitiesWithCanonicalName.Add(entity);
    }
    // Step 2: apply tax and keep only amounts of at least 50000.
    var entitiesWithRecalculatedTax = new List<Entity>();
    foreach (var entity in entitiesWithCanonicalName)
    {
        entity.Amount = taxCalculator.CalculateTax(entity.Amount);
        if (entity.Amount >= 50000)
            entitiesWithRecalculatedTax.Add(entity);
    }
    // Step 3: save what is left.
    foreach (var entity in entitiesWithRecalculatedTax)
    {
        // code for saving an entity
    }
}
The first problem with the above approach is that the method has multiple responsibilities (normalising names, calculating taxes, filtering and saving data). The other problem is related to memory consumption: every new list has to allocate an array to hold its entities, and with a big set of input data we risk an OutOfMemoryException.
The first issue can be solved by moving chunks of code into separate methods, leaving ProcessData responsible for managing the workflow only:
private void ProcessData(IEnumerable<Entity> entities)
{
    var entitiesWithCanonicalName = CanonicaliseEntityNames(entities);
    var entitiesWithRecalculatedTax = EntitiesWithRecalculatedTax(entitiesWithCanonicalName);
    SaveEntities(entitiesWithRecalculatedTax);
}
// Not static: the method uses the nameCanonicalisator field.
private IEnumerable<Entity> CanonicaliseEntityNames(IEnumerable<Entity> entities)
{
    var entitiesWithCanonicalName = new List<Entity>();
    foreach (var entity in entities)
    {
        entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
        if (entity.Name.StartsWith("a"))
            entitiesWithCanonicalName.Add(entity);
    }
    return entitiesWithCanonicalName;
}
// Not static: the method uses the taxCalculator field.
private IEnumerable<Entity> EntitiesWithRecalculatedTax(IEnumerable<Entity> entities)
{
    var entitiesWithRecalculatedTax = new List<Entity>();
    foreach (var entity in entities)
    {
        entity.Amount = taxCalculator.CalculateTax(entity.Amount);
        if (entity.Amount >= 50000)
            entitiesWithRecalculatedTax.Add(entity);
    }
    return entitiesWithRecalculatedTax;
}
private static void SaveEntities(IEnumerable<Entity> entities)
{
    foreach (var entity in entities)
    {
        // code for saving an entity
    }
}
To address the second issue, we need to find a way to avoid creating a new collection of items at each step of our workflow. LINQ has a set of methods which can be applied to IEnumerable<T> and which allow us to process data “on the fly”, without the need to create new collections. To solve the memory problem we could rewrite our methods as follows:
// Not static: the method uses the nameCanonicalisator field.
private IEnumerable<Entity> CanonicaliseEntityNames(IEnumerable<Entity> entities)
{
    return entities
        .Select(i =>
        {
            i.Name = nameCanonicalisator.ToCanonicalForm(i.Name);
            return i;
        })
        .Where(i => i.Name.StartsWith("a"));
}
// Not static: the method uses the taxCalculator field.
private IEnumerable<Entity> EntitiesWithRecalculatedTax(IEnumerable<Entity> entities)
{
    return entities
        .Select(i =>
        {
            i.Amount = taxCalculator.CalculateTax(i.Amount);
            return i;
        })
        .Where(i => i.Amount >= 50000);
}
By using the Select and Where methods from LINQ, we defer execution of the code to the point when the data is requested. We also avoid creating new collections, as the data is returned by an enumerator one item at a time.
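To make the deferred behaviour visible, here is a minimal sketch that records the order in which the lambdas run. The class name DeferredExecutionDemo, the Run method and the integer input are made up for illustration and are not part of the entity example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns a log of the order in which the pipeline stages executed.
    public static List<string> Run()
    {
        var log = new List<string>();
        var numbers = new[] { 1, 2, 3 };

        // Building the query runs neither lambda.
        IEnumerable<int> query = numbers
            .Select(n => { log.Add("select " + n); return n * 10; })
            .Where(n => { log.Add("where " + n); return n >= 20; });

        log.Add("query built");

        // Enumerating pulls each item through both stages, one at a time.
        foreach (var n in query)
        {
            log.Add("got " + n);
        }

        return log;
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The log starts with "query built", followed by "select 1", "where 10", "select 2", "where 20", "got 20", "select 3", "where 30" and "got 30". Nothing runs until the foreach starts, and each item passes through both stages before the next one is read, so no intermediate collection is needed.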
By using extension methods we can make the code even more readable by chaining calls:
private void ProcessData(IEnumerable<Entity> entities)
{
    // The chain returns a new lazy sequence; it has to be captured and passed on.
    var processedEntities = entities
        .WithCanonicalName(nameCanonicalisator)
        .Where(i => i.Name.StartsWith("a"))
        .WithTaxApplied(taxCalculator)
        .Where(i => i.Amount >= 50000);
    SaveEntities(processedEntities);
}
And below is the class with the extension methods:
public static class EntitiesEnumerableExtensions
{
    public static IEnumerable<Entity> WithCanonicalName(this IEnumerable<Entity> entities, INameCanonicalisator nameCanonicalisator)
    {
        foreach (var entity in entities)
        {
            entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
            yield return entity;
        }
    }
    public static IEnumerable<Entity> WithTaxApplied(this IEnumerable<Entity> entities, IAmountTaxCalculator taxCalculator)
    {
        foreach (var entity in entities)
        {
            entity.Amount = taxCalculator.CalculateTax(entity.Amount);
            yield return entity;
        }
    }
}
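To try the whole chain end to end, the following self-contained sketch wires the extension methods to two stand-in implementations. LowerCaseNameCanonicalisator, FlatRateTaxCalculator, the 20% rate, the trimmed-down Entity, the sample data and the PipelineDemo class are all assumptions made for illustration; real implementations would of course differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down entity: only the properties the pipeline touches.
public class Entity
{
    public string Name { get; set; }
    public decimal Amount { get; set; }
}

public interface INameCanonicalisator { string ToCanonicalForm(string name); }
public interface IAmountTaxCalculator { decimal CalculateTax(decimal value); }

// Hypothetical: canonical form is the trimmed, lower-cased name.
public class LowerCaseNameCanonicalisator : INameCanonicalisator
{
    public string ToCanonicalForm(string name) { return name.Trim().ToLowerInvariant(); }
}

// Hypothetical: adds an assumed flat 20% tax to the amount.
public class FlatRateTaxCalculator : IAmountTaxCalculator
{
    public decimal CalculateTax(decimal value) { return value * 1.20m; }
}

public static class EntitiesEnumerableExtensions
{
    public static IEnumerable<Entity> WithCanonicalName(this IEnumerable<Entity> entities, INameCanonicalisator nameCanonicalisator)
    {
        foreach (var entity in entities)
        {
            entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
            yield return entity;
        }
    }

    public static IEnumerable<Entity> WithTaxApplied(this IEnumerable<Entity> entities, IAmountTaxCalculator taxCalculator)
    {
        foreach (var entity in entities)
        {
            entity.Amount = taxCalculator.CalculateTax(entity.Amount);
            yield return entity;
        }
    }
}

public static class PipelineDemo
{
    // Returns the names of the entities that reach the "save" step.
    public static List<string> Run()
    {
        var input = new List<Entity>
        {
            new Entity { Name = " Acme ", Amount = 45000m },  // 54000 after tax: kept
            new Entity { Name = "Apex", Amount = 10000m },    // 12000 after tax: dropped
            new Entity { Name = "Zenith", Amount = 90000m },  // does not start with "a": dropped
        };

        // Ordinal comparison is a choice made for this sketch.
        var processed = input
            .WithCanonicalName(new LowerCaseNameCanonicalisator())
            .Where(e => e.Name.StartsWith("a", StringComparison.Ordinal))
            .WithTaxApplied(new FlatRateTaxCalculator())
            .Where(e => e.Amount >= 50000m);

        // Stands in for saving; the pipeline is enumerated exactly once.
        var saved = new List<string>();
        foreach (var entity in processed)
        {
            saved.Add(entity.Name);
        }
        return saved;
    }

    public static void Main()
    {
        foreach (var name in Run())
        {
            Console.WriteLine(name);
        }
    }
}
```

With this sample data only "acme" reaches the save step. One thing to keep in mind: the extension methods modify the entities as they pass through, so enumerating the resulting sequence a second time over the same in-memory source would canonicalise the names and apply the tax again. It is safest to enumerate the pipeline once.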