Thursday, August 19, 2010

Side Effects of Linq Select Statements due to Deferred Execution

I have a function that reads a line of delimited text, which is split into a string array. If the caller doesn't specify which columns to extract, it extracts all columns; otherwise it extracts only the specified ones.

Assume the delimited text contains only 2 columns:

public DataField[] Read()
{
    // Splits the text according to the defined delimiters.
    var data = _reader.Read();
    // Nothing was read, so there is nothing to map.
    if (data == null)
        return null;

    // When no columns are specified, take all columns from the data.
    if (_columns == null)
    {
        var counter = 0;
        // Save in the instance field to avoid recomputing on every call.
        _columns = data.Select(item => new ColumnInfo
        {
            Index = counter++
        });
    }

    return _columns.Select(item => new DataField
    {
        Column = item,
        Data = data[item.Index]
    }).ToArray();
}


To my surprise, this Read method throws an IndexOutOfRangeException on every call after the first. Can anyone guess what's wrong? :) (Answer: scroll down, please.)

The first call works: enumerating _columns runs the lambda once per element and leaves counter at 2. However, what I thought was a real IEnumerable collection created by the LINQ statement is not a collection at all; no ColumnInfo instances are stored anywhere.

_columns = data.Select(item => new ColumnInfo
{
    Index = counter++
});


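Rather than storing a result, this statement only associates the _columns field with the query itself: the source array plus the lambda (like a callback, I think). Every subsequent Read() call enumerates the query again from the start, over the array from the first call, and invokes the lambda afresh. The captured counter variable is never reset, so the second call produces a column with Index = 2, which is out of range for a two-element array. (Deferred execution is also part of why LINQ has so little overhead: no intermediate collection is built until the results are actually needed.)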
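The effect is easy to reproduce in isolation. Here is a minimal sketch, assuming a hypothetical two-element array in place of the real reader output; the DeferredDemo class and its method name are made up for illustration. It enumerates one deferred query twice and records the values the lambda produced each time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Enumerates the same deferred query twice and returns the values seen on each pass.
    public static (int[] First, int[] Second) EnumerateTwice()
    {
        var data = new[] { "a", "b" };   // hypothetical two-column row
        var counter = 0;

        // No lambda runs here; the query only remembers its source and the lambda.
        IEnumerable<int> indexes = data.Select(item => counter++);

        var first = indexes.ToArray();   // runs the lambda twice: 0, 1
        var second = indexes.ToArray();  // runs it again over the same data: 2, 3
        return (first, second);
    }

    public static void Main()
    {
        var (first, second) = EnumerateTwice();
        Console.WriteLine(string.Join(", ", first));   // 0, 1
        Console.WriteLine(string.Join(", ", second));  // 2, 3
    }
}
```

The second pass starts from the first element of data again, but counter carries on from where the first pass left it, which is exactly what happens to _columns between Read() calls.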

To work around this side effect, you need to explicitly tell LINQ to materialize the query into a real array or list (for example with ToArray() or ToList()). The cached result is then reused and the side effect is gone.
The solution:

public DataField[] Read()
{
    var data = _reader.Read();
    // Nothing was read, so there is nothing to map.
    if (data == null)
        return null;

    if (_columns == null)
    {
        var counter = 0;
        // ToArray() runs the query once and caches a real array instead of the deferred query.
        _columns = data.Select(item => new ColumnInfo
        {
            Index = counter++
        }).ToArray();
    }

    return _columns.Select(item => new DataField
    {
        Column = item,
        Data = data[item.Index]
    }).ToArray();
}
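Another way to avoid the side effect altogether is to drop the captured counter and let Select supply the position through its overload that passes the element index to the lambda. The following is only a sketch: the ColumnInfo class here is a hypothetical stand-in with just an Index property, and ColumnBuilder.FromRow is a made-up helper, not part of the original code.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the post's ColumnInfo; only Index is assumed.
public class ColumnInfo
{
    public int Index { get; set; }
}

public static class ColumnBuilder
{
    // Builds one ColumnInfo per element, taking the position from Select itself.
    public static ColumnInfo[] FromRow(string[] data)
    {
        return data
            .Select((item, index) => new ColumnInfo { Index = index })
            .ToArray(); // still materialize, so the columns are built only once
    }

    public static void Main()
    {
        var columns = FromRow(new[] { "a", "b" });
        Console.WriteLine(string.Join(", ", columns.Select(c => c.Index))); // 0, 1
    }
}
```

Because the index is supplied by Select on each enumeration, it starts at 0 every time, so the lambda no longer mutates any outside state. ToArray() is kept so that the ColumnInfo objects are created once rather than on every call.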


Read Jon Skeet's article on Human LINQ

See also here about side effects with Select

2 comments:

  1. Hi,

    As I come here quite often to get .NET coding tips I wonder if you could use a syntax highlighter - it'll make your sample code way easier to read. Here's one way to do it on Blogger: Gist. Thanks!

  2. I looked into Gist as you suggested, but for some reason I can't get it to work here. Also, since that would depend on Gist's availability (in case the company goes down), I have eventually resorted to a longer-term solution: embedding some CSS into the template. Anyway, thanks for the suggestion. :)
