Sunday, March 18, 2012

A Simple External Sort


Sort is a simple word, and surely everyone in the computing field has learned some simple sorting algorithm. Using it is trivial, and we often apply it to help solve more complex problems such as merging, unification, searching and indexing, to name a few.

Often, buying a tool just to do efficient sorting can be expensive, especially one that comes as part of an ETL package. What matters is how much data must be processed and how soon the sorting must be finished.


  • Consider sorting data at your database server.
  • Favor incremental updates over batch. You don't need sorting when you update incrementally.
  • Consider an efficient ETL tool if none of the above applies.
  • Consider writing your own highly threaded sorting tool if none of the above applies.

Also, refer to ETL Sorting or SQL Sorting.

External Sort

If you find yourself resolved to write your own sorting tool, consider this Simple External Sort.

External Sort, by definition, uses external storage as intermediate storage so that it can sort a data set larger than the available memory.

Here is the outline of a simple External Sort.
  1. Split a subset of the data into multiple smaller collections, each bounded by the permitted memory size.
  2. Run Quick Sort on every collection created, each in its own thread.
  3. Write every quick-sorted collection to a file on your hard disk drive.
  4. Repeat the steps above until the whole data set has been processed.
  5. You now have multiple sorted subsets in files on the hard disk drive; merge them, as a Merge Sort does, to build your final sorted data set.

To be exact, this is the External Merge + Quick Sort.
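
The steps above could be sketched roughly as follows in Python. This is only a minimal sketch under stated assumptions, not the tool measured in this post: the names external_sort, _sort_and_spill, _read_run, chunk_size and workers are hypothetical, records are assumed to be newline-free strings, and Python's built-in list sort stands in for Quick Sort. The threads mirror step 2, although in CPython the global interpreter lock limits how much the in-memory sorts actually overlap.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def _sort_and_spill(chunk, tmp_dir):
    """Steps 2-3: sort one in-memory chunk and write it to its own run file."""
    chunk.sort()  # assumption: the built-in sort stands in for Quick Sort
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(record + "\n" for record in chunk)
    return path


def _read_run(path):
    """Stream one sorted run file back, one record at a time."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records, chunk_size, tmp_dir, workers=4):
    """Yield `records` (newline-free strings) in sorted order."""
    source = iter(records)
    paths = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Step 1: take at most `workers` chunks so memory stays bounded.
            batch = [list(islice(source, chunk_size)) for _ in range(workers)]
            batch = [chunk for chunk in batch if chunk]
            if not batch:
                break  # Step 4: the whole input has been processed.
            paths.extend(pool.map(lambda c: _sort_and_spill(c, tmp_dir), batch))
    try:
        # Step 5: k-way merge of the sorted run files.
        yield from heapq.merge(*(_read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)
```

With this sketch, roughly workers × chunk_size records are held in memory at any one time, and the merge reads each run file one record at a time.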

Future Work

From my observation, a properly implemented Simple External Sort is faster than the Sort function available in the .NET collection types. The performance improvement is proportional to the size of the data set you sort. Personally, I recorded it running about 10 times faster than the sort available in .NET once the record count went over a million.

One thing to consider as future work is to build in an advanced IO elimination step that avoids writing the final quick-sorted data chunk to the hard disk drive and instead merges it straight from memory with the chunks already persisted to disk. This way, you safely save *at least* one chunk-file IO round trip, for better performance and less IO.
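
As a rough illustration of that idea, the merge step could accept the last chunk as an in-memory list alongside the runs read from disk. This is a hedged sketch only: merge_with_resident_chunk and its parameters are hypothetical names, and the sorted runs are assumed to arrive as iterators that are already in order.

```python
import heapq


def merge_with_resident_chunk(sorted_runs, last_chunk):
    """Merge sorted runs read from disk with a final chunk kept in memory."""
    last_chunk.sort()  # sorted in place, never written to disk
    return heapq.merge(*sorted_runs, last_chunk)
```

The final chunk skips both its write and its read back, which is the round trip the paragraph above aims to save.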

Hope this helps.

Wednesday, February 29, 2012

SCRUM - Dos and Don'ts


Many people practice Agile these days and many of them use the SCRUM methodology. We need to draw a line between what we must do and what we must not. The reason is simple: the don'ts violate the essence of Agile.

Read here: SCRUM Principles.

Dos and Don'ts

  • Do: Allow only one person to play the SCRUM Master role within the team.
  • Don't: Never run a SCRUM with more than one SCRUM Master, and never rotate the SCRUM Master.
  • Do: Ensure the SCRUM Master and Product Owner roles are played by different people.
  • Don't: Never make the Product Owner and the SCRUM Master the same person. It creates a conflict of interest.
  • Do: Scale your SCRUM team when it grows large or is geographically separated.
  • Don't: Never make a team larger than 7 or geographically separated. Your first impediment before your SCRUM starts will be distance, and distance and distance!
  • Do: Favour the SCRUM of SCRUMs approach. Create sub-teams that each run a proper SCRUM, and run another final SCRUM with ALL the sub-teams' SCRUM Masters and Product Owners respectively.
  • Don't: Never run a SCRUM that cannot be scaled.
  • Do: Give each SCRUM Master/Product Owner one team to run. He/she works best with 1 team and less well with 2.
  • Don't: Never let the same SCRUM Master/Product Owner run more than one team if you want to build a successful SCRUM team.
  • Do: Listen to everyone's concerns in the retrospective and make sure you have them written down.
  • Don't: Never get defensive on behalf of the team in the SPRINT Retrospective.
  • Do: Be team oriented and take ownership.
  • Don't: Never take anything beyond the team. Don't make it personal; it is never about you or anybody else.

Monday, November 14, 2011

Micro Management vs Macro Management

More often than not, we hear these terms. How many of us try to understand their definitions and see how they impact and influence our day-to-day activities?

By Definition

Micro Management - Requires every staff member to report on progress very frequently, and is reluctant to delegate decisions. Also known as the Bully Approach.

Macro Management - Delegates decisions to staff and monitors progress less aggressively. However, such a manager often delegates too many decisions without having the necessary knowledge of the work.

We often learn about a leader when we start working in a team. We then come across many different styles of management. Each style tells us, indirectly, what type of leader we work with.

Micromanagement is suitable only for: -

  • Start up company
  • Incompetent team members
  • Mission critical but proprietary knowledge of work

A manager with a micromanaging style suggests that he/she is still the better person to take on every lower-level task assigned. A company whose managers tend to micromanage risks losing a good employee by turning him/her into a manager who does not manage well but loves detail work.

Macromanagement is suitable only for: -

  • Well established company
  • Not only competent but knowledge independent team members
  • Generally all types of work

A manager with a macromanaging style suggests that he/she trusts the team members to be knowledge independent and capable of producing good work. Decisions and rights are handed over in trust, with minimal supervision. However, it may also mean that a manager practising Macro Management loses his/her initial hands-on touch with the business and relies too heavily on the team members, which results in decision dependencies and politically driven decisions and outcomes. A company must love these managers, as they work themselves out of a job while growing people who outshine them.


While micromanaging managers tend to delegate only the risks to their staff and claim all the credit for themselves when things succeed, macromanaging managers tend to delegate the credit to their staff and take the risks on themselves.

Either way, you are either too good only to yourself or too good to your people. That leads you only one way: failure! Yes, you got me right, it is failure! With micromanaging, higher management will learn and notice that they can't scale your department. You are so important that they need to find someone to manage you. You have just missed the promotion. With macromanaging, however, everyone is doing the right thing and performing well enough to replace you, because you made them so good that you put yourself out of a job. You are paid more than the others, so why should I pay more if they are as good as you, with a better grasp of the details, and miles cheaper?

You need to strike a balance in your style. If you have great people, do more macromanagement and learn the details from your team through a suitable dose of micromanagement. This way, your people won't leave you behind in business knowledge, and they still take the credit they deserve.

If you have average people, then depending on how average they are, apply micromanagement as you see fit, without letting slip the opportunity for your people to learn and excel through some delegation via macromanagement.

Speaking from a company perspective, I would love to have more macromanaging managers than micromanaging ones. At least I'm not narrowing my options on resources.

Tuesday, April 12, 2011

SCRUM - Manage Your User Story


We usually have a number of user stories to deliver in each sprint. Often, the user stories were created to the best of our knowledge about some business value that we are going to deliver. They get updated with clearer detail over time as they rise in priority. However, very often not all the details can be gathered before we start work on a user story. Very often, too, there are new discoveries as we progress towards completing it. These are risks, and they may prevent a user story from being completed. If they do, how do we handle it? Can a story extend all the way into the next sprint, or the next couple of sprints?

Manage your new found risks in a user story

In agile, we don't like a large user story that spreads over several sprints without a concrete indication of where we are. That does not manage the risks well. However, if it is a main (big) user story with several not-too-large but measurable sub user stories, each of which can be completed concurrently or independently within a sprint, I believe that is very much acceptable and the way I *think* it should be. This makes the user story, whether the main one or its sub user stories, more accountable, quantifiable and manageable. It also makes things transparent as to when I can expect a certain deliverable, based on what we know best now. However, many people would argue that this is in fact a dependency between user stories, that dependency management usually needs project planning, and that this is not agile after all. If you think so, you have mistaken the point I am trying to make here.

Say a developer is going to deliver user story "A" in this sprint and, halfway through, he discovers an impediment, say "A'", that prevents him from getting user story "A" done. He has no spare bandwidth or capacity to get "A'" done this sprint. He has discussed it with the SCRUM Master, and it seems nobody has better knowledge than he does to handle it, so what a SCRUM Master would normally do is bring the Product Owner into the discussion. If nothing much has changed, what should the Product Owner do with user story "A" and the impediment "A'"?

As I would suggest: -
  1. The Product Owner should split user story "A" into 2 separate sub user stories, say "A1" and "A2", and ensure they are associated back to the original user story "A".
    • Eg. A = A1 + A2
  2. Then, ensure that user story "A1" covers what we needed to get done initially, with some mock functionality for "A2" at a very high level of abstraction (hard coded, maybe?)
    • Eg. A1 = A1' + A''
    • A1' => What we were supposed to do initially in A, but with a slight change in abstraction to cater for the impediment A' we discovered during this iteration.
    • A'' => Mock functionality or resolution for the impediment A' that we discovered.

Then, within this sprint, we can have the developer complete user story A1 and move user stories A and A2 to the next sprint, or re-prioritize them as we see fit.


  1. Are we not agile?
    • Yes, we still are. We still follow "just in time" (as we discovered it), "just enough" (just the mock functionality for now) and "just because" (mocking because we can't handle it now, only later).
  2. Are we doing project planning in advance to identify the dependency?
    • No, we aren't. We let it happen naturally, without spending any additional effort in advance on prediction.
  3. More importantly, do we need to emphasise Agile to be Agile?
    • No. Agile is the thought process of getting things done and SCRUM is a framework for doing things in an agile way. They are like recommended best practices; however, you should only follow a practice if it makes sense to you. Ultimately, it is the principles of being continuously transparent, thinking lean, being ready for change and managing risks with adaptive sprint planning that drive us to the planned release that makes sense to the business.


We need real workers in Agile and SCRUM, not philosophers.

Wednesday, March 30, 2011

SCRUM - Definition of Done?


The definition of done varies from one software development company to another. Some companies have additional constraints to fulfil while others have very few. The definition of done is crucial; one must get it right and customize it to one's needs to ensure continuous delivery. For instance, in a TDD environment, we usually need unit tests for every feature introduced, with sufficient predefined test coverage (usually > 90%). Next, we need a code review, with the code checked in after the review, and then the Continuous Integration build completed with all unit tests passing.

Do we need one in SCRUM?

Certainly we need one. Indeed, we need it to a much wider extent. In SCRUM, we create user stories and then plan out the deliverables in the release plan and the sprint plan. At the end of each sprint, we mark whichever user stories are "done" as done. The business users can then expect a somewhat "workable" solution to be delivered.

The question now is, how much do we need to constrain ourselves when we are in SCRUM? How much do we want our business users to be able to expect from the "workable" solution delivered at each sprint?

If the business users accept, as a potential deliverable, a solution that works but may have many defects because it was not properly tested, you can then schedule your QA tests to come in after the sprint!

On many occasions, and in SCRUM in fact, this is not acceptable. You cannot say you've completed a user story when it is highly likely to contain defects, especially functional or workability defects. When you mark a user story as done, we expect it to be stable and workable, with minimal functional defects, and to meet some level of quality.

So in Agile or SCRUM, we test much sooner, and user stories that fail the tests can never be made DONE. We bring the test team into our SCRUM team; they are part of our team. As soon as we complete the development work, the tester tests according to the user story's needs. We don't delay the tests, wait until the end of the sprint and then package the deliverable for testing. One of the key reasons many SCRUM teams fail is the post-sprint test approach. You always need to factor in an additional period for the testers to test, and the defects of this sprint can only get fixed in the next sprint. Your SCRUM fails; you are running a waterfall approach (miniature waterfall style). So, SCRUM Masters, please change this to in-sprint testing for better success; Product Owners, please voice your concern about having defects and quality delayed to the next sprint. You don't always have that next sprint and, worse, you are always wasting an extra sprint!

Compulsory Items in SCRUM DoD List

  1. Automated Unit Tested with Good Coverage (above 90%)
  2. Continuously Integrated and Successful Build
  3. Automated Integration Tested
  4. QA Tested (with Test Team)
  5. All Defects Fixed.

Tuesday, March 29, 2011

SCRUM - Endlessly Growing Product Backlog


Often, when we have run our SCRUM over a period of time (a year or two), we notice the product backlog expanding at an increasing velocity: up from 1 or 2 new user stories to several tens of new user stories introduced within a sprint. Is this a good sign, or a worry that one should look into?

User stories in the product backlog indicate some missing business feature or value that one must look into and plan out properly to ensure business satisfaction. This usually has to be in line with the product roadmap and how we want to grow the product over time. The roadmap can change as time goes by, to ensure it meets current and near-future (not too far in the future) business needs. Remember the phrases we talked about: "Just in time", "Just enough" and "Just because"!!!

Why It Happens

  • Everyone raises user stories they "think" add value.
  • No clear indication of a product roadmap telling where we want to be within a fixed schedule.
  • No fixed schedules planned out.
  • Overloaded with too many technically driven user stories.

How to Turn It Around

  1. Who can raise user stories?
    • We must limit the ability to raise a user story to those who possess business interests. We are dealing with delivering genuine business value that benefits the business users, not architects, developers or consultants.
  2. Where do we want to be, at a bird's-eye view?
    • There must be a fixed schedule and a fixed cost. I'm not talking about a short-term plan like a sprint or a release, but a very high-level objective that we want to achieve for the business after several releases. This is a directional measure: a pool of resources as part of the costs, and how we plan to spend these costs to achieve our business direction or intent (roadmap). Importantly, I'm not suggesting a fixed or permanent roadmap that cannot be altered, rather a roadmap for everyone to follow as long as nothing suggests a change is needed. If there is a change, we prioritize some items into the roadmap and deprioritize others out of it. If an item is too far in the future, we may opt to drop it from the product backlog if it is not at all important or its value has depreciated over time. Why waste effort tracking potentially unneeded features that do not fulfil the business now or in the near future? Remember: "Just in time", "Just enough" and "Just because"!!!
  3. Where do we want to be, at a bird's-eye view with a minimal lookahead?
    • Just like what we discussed above, but we need to know the pipeline of fixed schedules. However, look ahead lightly, and at a higher level of abstraction than the current fixed schedule.
  4. Should we raise technical user stories?
    • We can get endless technically driven user stories, especially for the sake of perfection. I'm not saying we cannot have a task nailing down some technical or architectural aspect, but it must all be driven by business value. For example: I need an Online Store that serves all my customers, and I have 20 thousand customers holding my company's loyalty card. This suggests the need to load balance the web application and to scale out. We may need a proper (distributed) cache, or a highly clustered database farm that is durable, reliable and efficient. However, we do not create these as first-class citizens in the product backlog. At best, each is a supporting (dependent) user story that fulfils the first user story above it. Whenever the business value gets deprioritized, all its supporting user stories go the same way, unless you have nothing better to do.

Friday, March 18, 2011

Never write your own Message Queuing Framework

Software development is fun, especially if you are developing an Enterprise Application. There will be many challenges, world-class challenges! High throughput, efficiency, low roll-out cost, and sophisticated, complex business logic and workflows that go beyond what you have learned and practiced in the past. If you are not ready for them, you had better look elsewhere.

It is very common for enterprise solutions to have distributed application support that requires fast response times and yet reliable communication and services to deliver the requests. Secure, durable and reliable messages that encapsulate business operations and get processed at the distributed end, usually the server tiers, are common. These are normally handled with one of the many ready-made messaging frameworks, such as MSMQ, AppFabric Service Bus, Enterprise Service Bus, WebSphere MQ. There are also a couple of open-source options such as RabbitMQ and ZeroMQ, as well as the open AMQP protocol.

We have more than enough variety in MQ selection; however, there are always some techy developers or architects who would like to invent one themselves within their own team. Often, the discussion revolves around freedom and lightweight modules that best suit the solution, the team or the company. It is always good to have the experience of developing such a framework on your resume, but it is often bad for the team and the company.

Never invent one when there is a ready-made solution.
It takes several years for a framework to mature, to be tested against many genuine business requirements, and to go through many fixes, enhancements, simplifications, optimizations, customizations, usability and security tests while meeting the need to be orderable, durable, reliable and securable. Developing one yourself is a short-sighted move. You are ignoring the fact that a framework takes millions of hours of effort to mature and prove itself. You are assuming that one man's view is always better than the effort of a group of people. You might be right for the first few months, but as your solution matures and more challenges come on board, you will quickly find that you shot yourself in the foot with the decision you made.

You often find that your invented MQ framework does not scale as well. It has many shortcomings in throttling, fail-safety and security. It has no dead-letter policy support. It loses sight of dynamic expansion of queues, priority overrides and poison message control. It is worse when you have no idea how to maintain or support disaster recovery, and worse still when you find your own MQ framework does not support distributed transactions or transaction flow (a distributed but integrated transaction model), does not log properly and is not autonomous. Your solution may end up as just a distributed but serialized messaging gateway that processes messages from different sources in a single-queue fashion and starts to suffer when the business grows large.

Business factors aside, you do your team, and your company at large, a disservice by requiring many innocent teammates to endlessly support it and fix defects that were common but long since fixed in the mature frameworks. You find yourself heading to where the other frameworks already are, doing more work and getting less throughput. You burn many pricey man-hours for the company to get less. You trick your team members into long working hours of support and defect fixing to deliver a sub-standard solution to the users. You make your product less competitive, and you shift the team's focus from product development to platform components that are no better than what has long been available. You sincerely give your competitors a chance to close the gap on your leading role in the industry. You make your team lose enthusiasm. You push your company and the team to the losing end only to build a good resume for yourself.

From the business, team and product perspectives, there is no reason to develop your own messaging framework. Unless you want to compete with the giants in the market, this is not your cup of tea. Don't lose sight: you should and must maintain your business focus and keep your industry-leading role.

Final words: don't reinvent the wheel if this is not your strength and is not on your business roadmap. Do what you do best and leave the worries to the experts.