A lot of people talk about the benefits of Test-Driven Development (TDD) and how great it is compared with development that is not test-driven. People will give you a massive list of advantages of using TDD. For me, these are the core reasons:

  • Writing tests first requires you to really consider what you want from the code
  • It makes writing unit tests easy
  • The tests act as documentation for the code

There is so much praise around it that people come to believe it solves problems it was never meant to solve. For example, just because you use TDD does not mean your code has no bugs. In fact, depending on the developer, it might not make any difference at all, for instance when the code is written by a good developer who has a very clear head and knows exactly what they are doing. On top of that, it takes a lot of brain power to think through all the different scenarios and design the possible inputs and outputs beforehand. So it is kind of a moot point if it doesn’t add much value.

The reason I wanted to write about this today is that I read a blog post from GitHub introducing their Scientist 1.0 library. They highlighted a very good point: “tests aren’t enough”. The scenario they describe is a legacy system in which you wish to replace a component, whether to improve performance or for some other reason. How can you be confident that the new code won’t cause issues and trip over in production? Their solution: if you want to replace a function in the system, run the old and new functions side by side for a period of time and compare their results. Record any differences, and remove the old function once you have enough confidence in the new one.

Here’s some pseudocode to explain the concept:

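function experiment() {
    oldResult = oldFunction()
    newResult = newFunction()
    if (oldResult != newResult) {
        record the mismatch
    }
    return oldResult    // callers keep getting the old behavior
}

function oldFunction() {
    do x the old way
}

function newFunction() {
    do x the new way
}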
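
For readers who want something they can actually run, here is a minimal Python sketch of the same idea. It is not the Scientist API (Scientist is a Ruby library); the names `run_experiment`, `old_total`, `new_total` and the in-memory `mismatches` list are all made up for illustration, and a real setup would presumably report to a logging or metrics system instead.

```python
# Hypothetical in-memory record of mismatches (illustration only).
mismatches = []


def run_experiment(control, candidate, *args):
    """Run both functions, record any difference, return the control's result."""
    control_result = control(*args)
    try:
        candidate_result = candidate(*args)
    except Exception as exc:
        # A crash in the new code must never reach the caller;
        # treat the exception itself as the candidate's "result".
        candidate_result = exc
    if candidate_result != control_result:
        mismatches.append((args, control_result, candidate_result))
    return control_result


def old_total(prices):
    # Legacy implementation: plain loop.
    total = 0
    for price in prices:
        total += price
    return total


def new_total(prices):
    # Rewrite with a deliberate bug: it drops negative values (e.g. refunds).
    return sum(price for price in prices if price > 0)


print(run_experiment(old_total, new_total, [10, 5]))   # 15, both agree
print(run_experiment(old_total, new_total, [10, -3]))  # 7, the old result is still returned
print(mismatches)  # [(([10, -3],), 7, 10)]
```

The important design choice is that the caller always gets the old function’s result, so a wrong or crashing new function can only show up in the recorded mismatches, never in production behavior.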

Here’s the original text from the GitHub article:

Why tests aren’t enough

If you want to test correctness, you just write some tests for your new system, right? Well, not quite. Tests are a good place to start verifying the correctness of a new system as you write it, but they aren’t enough. For sufficiently complicated systems, it is unlikely you will be able to cover all possible cases in your test suite. If you do, it will be a large, slow test suite that slows down development considerably.

There’s also a more concerning reason not to rely solely on tests to verify correctness: Since software has bugs, given enough time and volume, your data will have bugs, too. Data quality is the measure of how buggy your data is. Data quality problems may cause your system to behave in unexpected ways that are not tested or explicitly part of the specifications. Your users will encounter this bad data, and whatever behavior they see will be what they come to rely on and consider correct. If you don’t know how your system works when it encounters this sort of bad data, it’s unlikely that you will design and test the new system to behave in the way that matches the legacy behavior. So, while test coverage of a rewritten system is hugely important, how the system behaves with production data as the input is the only true test of its correctness compared to the legacy system’s behavior.
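
To make the bad-data point concrete, here is a small Python sketch of my own (it does not come from the GitHub article). The two parsing functions and the sample inputs are hypothetical: the legacy function quietly turns unparseable input into 0, while the rewrite follows the specification and rejects it. A test suite built from clean inputs sees no difference; only production-like data does.

```python
def legacy_parse_quantity(raw: str) -> int:
    # Hypothetical legacy behavior: anything unparseable quietly becomes 0.
    try:
        return int(raw.strip())
    except ValueError:
        return 0


def rewritten_parse_quantity(raw: str) -> int:
    # Hypothetical rewrite, built to the spec: invalid input raises ValueError.
    return int(raw.strip())


def differs(raw: str) -> bool:
    """True when the rewrite does not reproduce the legacy result."""
    try:
        return legacy_parse_quantity(raw) != rewritten_parse_quantity(raw)
    except ValueError:
        return True


# The kind of clean input a test suite typically covers: both versions agree.
clean_inputs = ["1", " 42 ", "007"]
assert not any(differs(raw) for raw in clean_inputs)

# A made-up sample of what production data might contain.
production_inputs = ["3", "", "N/A", "12 "]
disagreements = [raw for raw in production_inputs if differs(raw)]
print(disagreements)  # ['', 'N/A']
```

Neither version is obviously “wrong” here, which is exactly the problem: users may already rely on the legacy 0, and nothing in the test suite would have told you that.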