Wednesday, September 17, 2008

Whose Fault Is It?

I recently read a newsgroup discussion with differing opinions on who is at fault, and therefore responsible, for bugs that are discovered at the end of an iteration or during a following one.

One position was that bugs are in the nature of software development: they do come up, and they should be handled like user stories and processed in an upcoming iteration. The other was that the developers messed up, and since it is their fault, they have to fix the bugs in their spare time.

What does each position mean? Under the first, the customer gets less value in subsequent iterations because those iterations include bug fixes. The developers get immunity and are not held accountable for their failures. The customer suffers.
The second position, by contrast, sides with the customer: it clearly pushes the fault onto the developers, who then need to fix the bugs in their spare time, which means long nights and/or weekends.

First, we should distinguish between a greenfield project and an existing legacy system, which by its very nature tends to be more brittle. Regardless, in my opinion both approaches are too polarized, and one of them is possibly illegal in certain countries. The overall goal should be a customer who fairly gets all the value they can expect, and programmers who can have a social life. A different solution to the problem is required.

In the last couple of months I've been reading quite a lot about Lean and how Toyota manages to make a $16B profit while the US car makers are asking for federal help. Lean has some great approaches, and one of them is the 5 Whys, a form of Root Cause Analysis. The idea is to ask "Why?" five times; the fifth answer points to the root cause. By fixing the root cause you fix all the symptoms you discovered along the way, and the problem should disappear for good.

So I applied the 5 Whys, in retrospect, to a project I worked on a while ago that had exactly these problems.

Q: Why do we have bugs at the end of the iteration?
A: QA does not accept some user stories because they don't fulfill the acceptance criteria.

Q: Why don't the user stories fulfill the acceptance criteria?
A: QA and the domain expert don't have enough time to specify the acceptance criteria in time, so the developers don't have access to them during development.

Q: Why don't QA and the domain expert have enough time?
A: They work on other projects as well and split their time between them.

Q: Why do they work on several projects at the same time?
A: Matrix Management.

Q: Why Matrix Management?
A: Upper Management believes that this improves efficiency.

With this process we were able to identify one root cause: Matrix Management. Removing this cause should bring improvement. In this scenario the solution is structural (I dare say political) rather than technical. Upper Management needs to be convinced that dedicated QA and domain experts per project are necessary. Don't assume that this will be easy. I strongly recommend providing hard data: gather statistics that describe the current situation and show how the change would improve productivity.

I want to re-emphasize the connection between Matrix Management and the problem of not having dedicated resources per team. The lack of dedicated people was obvious after answer number three, but the root cause is the management structure. With that knowledge, you will most likely be able to identify and address further issues.

Once you get this sorted out and see improvement, don't let inertia slow you down. Look out for the next problem and apply the 5 Whys once more.

One final thought: when too many people work on too many projects in parallel, this is often a symptom of prioritization gone wrong. Fear of postponing the wrong projects leads to more projects being underway than there should be.