Test-driven development has proven to increase the quality of software in many cases. I believe the same principle should be applied to network management. From time to time, I manage fairly large, distributed networks consisting of many different network segments, routers, servers, etc.
The primary tool for managing any network is monitoring software that tells you whether everything is all right or whether you should worry. For various reasons I have become a huge fan of Nagios for monitoring the networks I am responsible for, especially because it is so easy to extend by writing your own check scripts (plugins).
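For readers who have not written one yet: a Nagios plugin is simply an executable that prints a one-line status and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); anything after a `|` in that line is treated as performance data. The following is a minimal sketch in Python of such a check for free disk space. The script, its option names (`-p`, `-w`, `-c`) and the threshold semantics are illustrative assumptions, not a stock Nagios plugin.

```python
#!/usr/bin/env python3
"""Minimal Nagios-style check plugin (sketch): free space on a filesystem.

Plugin contract: print one status line to stdout and exit with
0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
"""
import argparse
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(free_pct: float, warn: float, crit: float) -> int:
    """Map 'percent free' to a Nagios state; lower values are worse."""
    if free_pct < crit:
        return CRITICAL
    if free_pct < warn:
        return WARNING
    return OK


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Check free disk space")
    parser.add_argument("-p", "--path", default="/")
    parser.add_argument("-w", "--warning", type=float, required=True,
                        help="WARNING if free space drops below this many %%")
    parser.add_argument("-c", "--critical", type=float, required=True,
                        help="CRITICAL if free space drops below this many %%")
    args = parser.parse_args(argv)

    try:
        usage = shutil.disk_usage(args.path)
    except OSError as exc:  # e.g. the path does not exist
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"DISK UNKNOWN - {args.path} reports zero size")
        return UNKNOWN

    free_pct = 100.0 * usage.free / usage.total
    state = classify(free_pct, args.warning, args.critical)
    # Text before '|' is shown to humans; after '|' is performance data.
    print(f"DISK {LABELS[state]} - {free_pct:.1f}% free on {args.path}"
          f" | free_pct={free_pct:.1f}%")
    return state


# Nagios invokes the plugin with arguments from its command definition;
# without arguments the file only defines its helpers (handy for testing).
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Invoked as, say, `check_disk_free.py -p /var -w 20 -c 10`, it reports WARNING below 20 % free and CRITICAL below 10 % free.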
While working through some issues in a network, I spontaneously decided to try an approach I called “test-driven network management”¹. The steps are simple (and are a one-to-one translation of agile software-development principles):
- Write a Nagios test that checks for the requested/required feature.
- This test will fail.
- Implement a solution satisfying the test.
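The steps above can be sketched with a hypothetical requirement such as "host H must accept TCP connections on port P" (for instance after a firewall change has been requested). The check is written first, reports CRITICAL while the port is still blocked, and should turn OK once the change is in place. The script, its arguments and the default timeout are assumptions for illustration only.

```python
#!/usr/bin/env python3
"""Hypothetical requirement check: host H must accept TCP connections on port P.

Written before the change is implemented, it reports CRITICAL;
once the firewall rule or service exists, it should report OK.
"""
import argparse
import socket
import sys

OK, CRITICAL = 0, 2  # the two Nagios exit codes this check uses


def check_tcp(host: str, port: int, timeout: float = 3.0) -> tuple[int, str]:
    """Try to open a TCP connection and return (nagios_state, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:  # refused, timed out, unreachable, DNS failure, ...
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="TCP reachability requirement")
    parser.add_argument("host")
    parser.add_argument("port", type=int)
    parser.add_argument("-t", "--timeout", type=float, default=3.0)
    args = parser.parse_args(argv)
    state, line = check_tcp(args.host, args.port, args.timeout)
    print(line)
    return state


# Nagios always passes arguments; without them the file only defines helpers.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Hooked into Nagios through a command that calls something like `check_tcp_requirement.py $HOSTADDRESS$ 443`, the service stays CRITICAL until the implementation satisfies the requirement, and it keeps guarding that requirement afterwards.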
The advantages of automated testing (more precisely, unit testing) in software development apply to network management tasks as well:
- The test documents what you want to achieve in a fairly formal way.
- You will (almost) immediately know when your solution breaks other requirements (if tests exist for them).
- As networks tend to be even more fragile than software, you have to monitor whatever you implemented anyway 🙂
Whenever possible, I add a test (or tweak an existing one) for every trouble ticket or feature request I come across. In my experience, customer satisfaction tends to increase, because you notice problems before the customers do, and you put measures in place that keep the same problems from recurring over and over again.
¹ I am quite sure there is another technical term for this, since I doubt I am inventing anything new here… If you know what others call it, please tell me in the comments.