Thursday, August 19, 2010

Side Effects of LINQ Select Statements Due to Deferred Execution

I have a function that reads data from a delimited text file and splits each line into a string array. If the caller does not specify the columns to extract, it extracts all columns; otherwise it extracts only the specified ones.

Assume the delimited text contains only two columns: -

public DataField[] Read()
{
    // Split the text according to the defined delimiters.
    var data = _reader.Read();

    // When no columns are specified, take all columns from the data.
    if (_columns == null)
    {
        var counter = 0;
        // Saved as an instance field to avoid recomputing the column set.
        _columns = data.Select(item => new ColumnInfo
        {
            Index = counter++
        });
    }

    return data == null
        ? null
        : _columns.Select(item => new DataField
        {
            Column = item,
            Data = data[item.Index]
        }).ToArray();
}


To my surprise, this Read method keeps throwing IndexOutOfRangeException after the first call. Can anyone guess what's wrong? :) (Answer: scroll down please)

At first I assumed that after the first call the enumerator was simply exhausted (past its last element). However, the IEnumerable "created" from a LINQ statement (which is what I thought it was) is not a real, populated IEnumerable instance at all.

_columns = data.Select(item => new ColumnInfo
{
    Index = counter++
});


This statement merely associates the _columns field with the LINQ query (like a callback, I think). Each subsequent Read() call re-enumerates the query and invokes the lambda again; because the counter variable is captured by the closure and was already incremented to 2 by the first call, the re-enumeration produces columns with Index = 2 and 3, which are out of range for a two-column array. (This deferred execution also explains why LINQ performs well without much up-front overhead.)
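The same trap is easy to reproduce outside C#. The sketch below uses Python purely because it is quick to run; the `Select` class is a hypothetical stand-in for LINQ's Select (Python's own `map()` is single-use, so a tiny re-enumerable class is needed to mimic LINQ's "re-run the projection on every enumeration" behavior):

```python
class Select:
    """A re-enumerable lazy projection, mimicking LINQ's Select.

    Hypothetical illustration only: each iteration re-runs the
    projection function, just as re-enumerating a LINQ query does.
    """
    def __init__(self, source, projection):
        self.source = source
        self.projection = projection

    def __iter__(self):
        # The projection runs again on EVERY enumeration.
        return (self.projection(item) for item in self.source)


counter = 0

def next_index(item):
    # Mutable state captured by the "lambda", like the counter closure.
    global counter
    i = counter
    counter += 1
    return i


data = ["a", "b"]
columns = Select(data, next_index)       # like _columns: deferred, nothing runs yet

first = [data[i] for i in columns]       # enumeration #1: indexes 0, 1 -> works
try:
    second = [data[i] for i in columns]  # enumeration #2: indexes 2, 3 -> out of range
    failed = False
except IndexError:
    failed = True

# The fix: materialize once, like .ToArray(), so indexes are computed only once.
counter = 0
cached = list(Select(data, next_index))
reread_1 = [data[i] for i in cached]
reread_2 = [data[i] for i in cached]     # safe to enumerate repeatedly
```

The second enumeration fails exactly like the C# version, and caching the materialized list makes repeated reads safe.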

To work around this side effect, you need to tell LINQ to materialize the query into a real array or list for caching; the side effect then disappears.
Solution as such: -

public DataField[] Read()
{
    var data = _reader.Read();

    if (_columns == null)
    {
        var counter = 0;
        // ToArray() cuts off the deferred query by caching a real array.
        _columns = data.Select(item => new ColumnInfo
        {
            Index = counter++
        }).ToArray();
    }

    return data == null
        ? null
        : _columns.Select(item => new DataField
        {
            Column = item,
            Data = data[item.Index]
        }).ToArray();
}


Read Jon Skeet's article on Human Linq.

Also see this article about side effects with Select.

Thursday, August 5, 2010

TDD: Unit Test Public Only

Many times when I speak to people about TDD and unit testing, quite surprisingly I get trapped in a debate of "Unit test public only?".

My quick answer is "Yes".

Reason?
Internal, protected, and private "properties" of an assembly or class form the abstraction of each, respectively. Unit tests should never cross that border and break the abstraction intent of a design.

DISCLAIMER: "properties" as used here refers to methods, subtypes, properties (fields), etc.

Coming back to this topic, why would anyone ask such a question? (Unit test public only?)
My view is that these may be people who write the real implementation first, before writing the tests. They know the design and implementation before even writing a unit test; they know the importance of the internal implementation and want to secure it (secure the intent of the logic implementation). Does that sound like TDD practice? (Refer to my earlier TDD post HERE.)

If one starts by thinking about how the code will be used before thinking about how to develop (or design) it, one should start with the accessible content, which is the public content. Getting the usage right is the prime criterion for TDD, and for many other software quality assurance measures. Correct usage of the public "properties", matched with the right expectations, is what makes the "right" deliverable.

So, spend more time thinking about what may be "right" at the time of writing, and unit test the "right" intent that you want to enforce. Internal implementation is an abstraction meant for future redevelopment, refactoring, or enhancement; it is likely to change from time to time without prior notice.

This way, you have done your team and yourself a favor by creating maintainable unit tests.

Tuesday, August 3, 2010

MSSQL Common Table Expression (CTE)


What is Common Table Expression?
  • Can be thought of as a temporary result set defined within the execution scope of a single SQL statement.
  • Available since SQL Server 2005.
  • Usage: -
  1. Create recursive queries
  2. Substitute for a view
  3. Enable grouping by a column derived from a scalar subselect
  4. Reference the resulting table multiple times in the same statement


Syntax: -

WITH cte_name [(column_name [, column_name ...n])]
AS
(
    /* anchor member definition */
    UNION ALL | UNION | EXCEPT | INTERSECT
    /* recursive member definition */
)



Statements that can use a CTE include SELECT, INSERT, UPDATE, DELETE, and CREATE VIEW.

A CTE can also be used without a recursive member, and it serves very well as a temporary result set for further joins.

Example: -
/*************** BEGIN OF QUERY **********************/
USE NRule
GO

WITH DocumentReport
AS
(
SELECT a.id, a.[Description], a.Name AS [DocumentName], c.Name AS [AttributeName], d.value
FROM Documents AS a
INNER JOIN DocumentMetaAttributePathIndex AS b
ON a.id = b.documentId
INNER JOIN MetaAttributes AS c
ON b.attributeId = c.id
INNER JOIN AttributeValues AS d
ON b.valueId = d.id
)
SELECT *
FROM DocumentReport ORDER BY id
GO
/*************** END OF QUERY **********************/
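The non-recursive pattern above can be tried end-to-end with SQLite, which also supports CTEs. The table and column names below are illustrative stand-ins, not the actual NRule schema:

```python
import sqlite3

# Build a tiny in-memory schema resembling the joins in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE MetaAttributes (id INTEGER PRIMARY KEY, Name TEXT, documentId INTEGER);
    INSERT INTO Documents VALUES (1, 'invoice'), (2, 'receipt');
    INSERT INTO MetaAttributes VALUES (1, 'author', 1), (2, 'date', 1), (3, 'total', 2);
""")

# The CTE acts as a named temporary result set for the outer SELECT.
rows = conn.execute("""
    WITH DocumentReport AS (
        SELECT d.id, d.Name AS DocumentName, a.Name AS AttributeName
        FROM Documents AS d
        INNER JOIN MetaAttributes AS a ON a.documentId = d.id
    )
    SELECT * FROM DocumentReport ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

This returns one row per (document, attribute) pair, exactly as if the CTE were a view defined for the duration of the statement.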

A recursive CTE is best used for queries whose result set expands dynamically. A typical example is traversing a tree or graph data structure stored with the adjacency-list table design pattern in the relational model of a database.

Example: -
/*************** BEGIN OF QUERY **********************/
USE NRule
GO

DECLARE @documentId bigint
DECLARE @ancestorId bigint
SET @documentId = 1
SET @ancestorId = NULL;
WITH AttributeHierarchyReport(AncestorId, AttributeId, DocumentId, ValueId, LEVEL)
AS
(
-- Anchor member definition
SELECT meta.AncestorId, meta.attributeId, meta.[DocumentId], meta.[ValueId], 0 AS LEVEL
FROM dbo.DocumentMetaAttributePathIndex AS meta
WHERE (@ancestorId is NULL or meta.AncestorId = @ancestorId) AND
(@documentId is NULL or meta.DocumentId = @documentId)
UNION ALL
-- Recursive member definition
SELECT meta.ancestorId, meta.attributeId, meta.[documentId], meta.[valueId], LEVEL + 1
FROM dbo.DocumentMetaAttributePathIndex AS meta
INNER JOIN AttributeHierarchyReport rpt ON meta.ancestorId = rpt.AttributeId
WHERE (@ancestorId is NULL or meta.AncestorId = @ancestorId) AND
(@documentId is NULL or meta.DocumentId = @documentId)
)
SELECT AncestorId, AttributeId, DocumentId, ValueId, Level
FROM AttributeHierarchyReport ORDER BY Level;
GO
/*************** END OF QUERY **********************/
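To make the recursive pattern easy to experiment with, here is a self-contained sketch using SQLite (which supports `WITH RECURSIVE`). The adjacency-list `Attributes` table is hypothetical, not the NRule schema, but the anchor/recursive member structure mirrors the query above:

```python
import sqlite3

# A minimal adjacency-list tree: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attributes (id INTEGER PRIMARY KEY, parentId INTEGER, name TEXT);
    INSERT INTO Attributes VALUES
        (1, NULL, 'root'),
        (2, 1, 'child-a'),
        (3, 1, 'child-b'),
        (4, 2, 'grandchild');
""")

rows = conn.execute("""
    WITH RECURSIVE Tree(id, name, level) AS (
        -- Anchor member: root nodes only (no parent)
        SELECT id, name, 0 FROM Attributes WHERE parentId IS NULL
        UNION ALL
        -- Recursive member: repeatedly join children onto the growing result set
        SELECT a.id, a.name, t.level + 1
        FROM Attributes AS a
        INNER JOIN Tree AS t ON a.parentId = t.id
    )
    SELECT id, name, level FROM Tree ORDER BY level, id
""").fetchall()

for row in rows:
    print(row)
```

Each pass of the recursive member descends one level of the tree, and recursion stops once no child rows remain to join.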

However, watch out for redundant rows appearing in the query result set. Redundancy within a recursive CTE can be a killer for the database application using it: the result set grows, sometimes exponentially, until the application runs out of memory.

When redundant row sets appear, the CTE needs tweaking. If you care about your application, never live with redundant row sets: either tune the CTE further or improve the database/table design to enable better, more efficient use of the CTE.

Example: -
/*************** BEGIN OF QUERY **********************/
/* this version is using additional column ParentDescendantId to enforce root node validation at anchor member */
USE NRule
GO

DECLARE @documentId bigint
DECLARE @ancestorId bigint
/* @documentId = NULL means get attributes for all documents, regardless */
/* alternatively, use @documentId to narrow retrieval to a specific document's attribute tree */
SET @documentId = 1
/* @ancestorId = NULL means get all attributes, regardless of ancestor */
/* alternatively, use @ancestorId to narrow down the attribute tree for retrieval */
SET @ancestorId = NULL;
WITH AttributeHierarchyReport(AncestorId, DescendantId, DocumentId, ValueId, ParentDescendantId, LEVEL)
AS
(
-- Anchor member definition
SELECT meta.AncestorId, meta.attributeId, meta.[DocumentId], meta.[ValueId], parent.attributeId, 1 AS LEVEL
FROM dbo.DocumentMetaAttributePathIndex AS meta
LEFT JOIN dbo.DocumentMetaAttributePathIndex AS parent ON meta.AncestorId = parent.attributeId
WHERE (@ancestorId is NULL or meta.AncestorId = @ancestorId) AND /* Ensure the ancestor id is matched */
(@documentId is NULL or meta.DocumentId = @documentId) AND /* Ensure the document id is matched */
(parent.attributeId IS NULL) /* Ensure there is no ancestor, Root node only */
UNION ALL
-- Recursive member definition
SELECT meta.ancestorId, meta.attributeId, meta.[documentId], meta.[valueId], parent.DescendantId, LEVEL + 1
FROM dbo.DocumentMetaAttributePathIndex AS meta
INNER JOIN AttributeHierarchyReport parent ON meta.ancestorId = parent.DescendantId /* Ensure all children are recursively joined */
)
SELECT AncestorId, DescendantId, DocumentId, ValueId, ParentDescendantId, Level
FROM AttributeHierarchyReport;
GO
/*************** END OF QUERY **********************/

Sunday, August 1, 2010

Don't Limit Yourself More Than Your Users or Clients Do

Many times, before planning a new software release, you hear voices from the team about backward compatibility, upgrade costs, additional component dependencies, operating platform upgrades, and so on. Are these important? Yes, they are. Are these debates necessary before planning a release? Absolutely not. We need to look at them and be aware of them, but let's not debate them now.

A new software release should maintain attractiveness and promise improved workability, usability, flexibility, integrability, scalability, security, marketability, and more.

If, before even looking at the release plan or product backlog, you are the first to attack your own team, you are not doing your team a favor; you are holding them back, bringing them closer to FAILURE and to not being able to deliver.

One phrase I often hear, and it is always true: "Many times, we limit ourselves much more than the customers or users do."

Wake up, guys! Be wild and creative when envisioning what you want your software to do during the visioning stage, and only become more focused and cautious about over-committing when you are planning and delivering it.

Being overly conservative does not help bring innovation. Bottom line: "Don't limit yourself more than your users or customers do" will always be a golden phrase to follow.