Reading time: 10 minutes
How to create a zero bug culture
Is it possible to create a zero bug culture? I get mad whenever I'm leading a transformation where people are really stressed out by bugs and somebody says a zero bug culture is impossible. What may be impossible is changing the willingness, the mindset and the behavior of the people involved. Creating a zero bug culture itself is certainly possible. Of course, it will very much depend on whether those individuals are willing to change their behavior. Nevertheless, there are simple techniques that lead to a zero bug culture when applied with discipline. In this post I would like to present those techniques, which can take you to the much-desired bug-free environment.
Before diving into it, I would like to emphasize the fundamentals that support this culture – Building integrity in!
You may know companies where quality is assured by testers who can't code and who did not take part in the inception and development phases of the software they need to test. They execute only manual tests and will never be able to run a full-fledged regression test of the system. Since they can't test the whole system, they will not be aware of the undesired side effects that each change introduces. You already know how this story ends: the users are the ones who discover the problems. Problems that customers discover after a version is released are called “escaped defects”. When we are able to eliminate escaped defects, we are creating a zero bug culture.
Lean processes do not rely on weak, after-the-fact strategies to guarantee quality. Instead, they build quality in from the very beginning by testing everything up front, in what we call the “test first approach”. Long before any code is written, testing starts with well-defined acceptance criteria that describe the hypotheses we believe to be true. We thoroughly validate all hypotheses before writing any code, using the simplest, cheapest and fastest means available. This gives us more confidence that the code we develop will generate the desired value. It also gives us a much clearer understanding of what we're supposed to build. Then every small piece of code is written with TDD from the ground up. The result is a comprehensive test suite that becomes the cornerstone and most important foundation of safety, not only for developers and users but for the entire business.
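Here is a minimal sketch of the test-first idea in Python, using the standard library's unittest. The business rule is hypothetical and chosen only to show the red, green, refactor rhythm of TDD: orders of 100.00 or more ship for free, and smaller orders pay a flat 9.90. The names shipping_cost, FREE_SHIPPING_THRESHOLD_CENTS and FLAT_RATE_CENTS are also hypothetical. The acceptance criteria are written as tests first, and only then comes the simplest code that makes them pass.

```python
import unittest

# Hypothetical business rule used only for illustration:
# orders of 100.00 or more ship for free; smaller orders pay a flat 9.90.
# Amounts are in cents to avoid floating-point rounding issues.


# Step 1 (red): the acceptance criteria are written as tests before the
# production code exists.
class ShippingCostTest(unittest.TestCase):
    def test_orders_of_100_or_more_ship_for_free(self):
        self.assertEqual(shipping_cost(order_total_cents=10_000), 0)

    def test_smaller_orders_pay_the_flat_rate(self):
        self.assertEqual(shipping_cost(order_total_cents=9_999), 990)


# Step 2 (green): the simplest implementation that makes both tests pass.
FREE_SHIPPING_THRESHOLD_CENTS = 10_000
FLAT_RATE_CENTS = 990


def shipping_cost(order_total_cents: int) -> int:
    """Return the shipping cost in cents for an order total in cents."""
    if order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_RATE_CENTS

# Step 3 (refactor): with the tests green, the code can be reshaped safely,
# and the tests will flag any change in behavior.
```

You can run it with `python -m unittest` followed by the module name. Before shipping_cost exists, both tests fail (red). The short implementation turns them green, and from then on any refactoring is protected by the tests.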
With that philosophy in mind, I'm going to describe the simple steps you can use to achieve a zero bug culture.
1 – Automated code review
Adopting a good static code analysis tool is one of the best things a team can do to help keep track of code quality. The best tools in this arena correlate several metrics, such as code complexity, duplication, known bugs, bad patterns, style and even security. From these they assign each source file a grade, generally from A to F. The best tools even integrate with your git repository and continuously monitor all the changes in your branches in search of code smells. The biggest benefit of this kind of tool is the very short feedback loop that automated code review enables. Even if you unconsciously push bad code into the repository, these tools will warn you immediately. Not only that, they will also point out where the problem is and show you what you can do to improve it. One could argue that I should have put clean code as an item on this list. The thing is that clean code follows naturally from using a tool like that. You'll be forced to write clean code, otherwise your grade will be very low. Nonetheless, you need to learn all about how to write clean, simple and elegant code. My favorite sources for that are the books by Robert C. Martin (Uncle Bob), Clean Code and The Clean Coder. One of my favorite tools for automated code review is Code Climate.
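Tools like Code Climate combine many metrics, and their exact rules are their own. The sketch below is only a rough illustration of the idea, using Python's standard ast module. It computes a simplified cyclomatic complexity for each function (1 plus the number of branching points) and maps it to a letter grade. The thresholds in GRADE_LIMITS are assumptions made up for this example and do not reproduce any real tool's grading.

```python
import ast

# Node types treated as decision points. This is a simplified
# approximation of McCabe's cyclomatic complexity, not any tool's exact rule.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
                  ast.ExceptHandler, ast.comprehension)

# Hypothetical grade thresholds: the highest complexity allowed per grade.
GRADE_LIMITS = [(5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")]


def function_complexity(func: ast.AST) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand.
            complexity += len(node.values) - 1
    return complexity


def grade(complexity: int) -> str:
    """Map a complexity score to a letter grade using GRADE_LIMITS."""
    for limit, letter in GRADE_LIMITS:
        if complexity <= limit:
            return letter
    return "F"


def review(source: str) -> dict[str, tuple[int, str]]:
    """Return {function name: (complexity, grade)} for each function in source."""
    report = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = function_complexity(node)
            report[node.name] = (score, grade(score))
    return report


if __name__ == "__main__":
    sample = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for _ in range(n):
        pass
    return "other"
'''
    print(review(sample))  # {'classify': (4, 'A')}
```

Running a check like this on every push gives you the short feedback loop described above. A real tool adds duplication, style and security analysis on top of it.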
2 – Evaluate code coverage
Besides using an automated code review tool, you should continuously evaluate how much of your code base is covered by automated tests. It goes without saying that you need automated tests, otherwise you wouldn't have any coverage. However, I want to draw your attention to the importance of writing code using TDD. Your code will be better structured, more decoupled and cleaner. You will also train your brain to visualize the end state of a piece of code before you have even written it. Like automated code review, coverage measurement should be automated in the build process, breaking the build when the desired coverage is not achieved. Analyzing code coverage reports will always give you good hints about which tests you're missing. Even 100% code coverage does not guarantee error-free code, but it's always good practice to keep coverage high. Each team should make the expected level of code coverage explicit as a working agreement. The build should then fail whenever coverage drops below that threshold.
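The build-breaking rule can be sketched as a tiny gate function. In practice you would rarely write this yourself. Coverage.py, for instance, supports `coverage report --fail-under=85`, and pytest-cov offers `--cov-fail-under`. The Python sketch below only shows the logic. The 85% threshold stands in for whatever your team agrees on, and the line counts would come from your coverage report.

```python
# Minimum coverage agreed as a team working agreement.
# 85.0 is a placeholder value, not a recommendation.
COVERAGE_THRESHOLD = 85.0


def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Percentage of executable lines exercised by the test suite."""
    if total_lines == 0:
        return 100.0  # nothing to cover
    return 100.0 * covered_lines / total_lines


def coverage_gate(covered_lines: int, total_lines: int,
                  threshold: float = COVERAGE_THRESHOLD) -> int:
    """Return an exit code: 0 keeps the build green, 1 breaks it."""
    percent = coverage_percent(covered_lines, total_lines)
    if percent < threshold:
        print(f"FAIL: coverage {percent:.1f}% is below the agreed {threshold:.1f}%")
        return 1
    print(f"OK: coverage {percent:.1f}% meets the agreed {threshold:.1f}%")
    return 0


if __name__ == "__main__":
    # Hypothetical numbers; a real pipeline would read them from the coverage report.
    print("exit code:", coverage_gate(covered_lines=812, total_lines=1000))
```

A CI script would pass the returned value to sys.exit. Any drop below the agreed threshold then fails the build.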
3 – Componentization strategy
As soon as you deploy code to production, it becomes legacy. The problem with legacy code arises when you lose the ability to evolve it over time. That is the situation at several large corporations whose entire operation relies on code written 20, perhaps 30 years ago. Those systems are generally monolithic, which makes them very hard to change and very expensive to evolve. Usually this kind of old code base has no test coverage. Sometimes the architecture also mixes business logic into user interface code, not to mention duplicated business rules. All this adds up to a big mess that leads to bad quality and very stressful environments. Nowadays, technologies have evolved and offer much better ways to componentize. Backend services can use a microservices strategy to compose federated architectures. User interfaces can have each small piece of functionality written as a well-tested component. In a complex screen, for example, each action could be a component. That makes it much easier and cheaper to evolve the architecture over time, especially because any individual component can be replaced at any time by a more modern one. Then you just need to run your complete regression test suite to make sure everything continues to work as it should. In other words, you don't need to rewrite your entire system to get it into better shape. You do it one piece at a time.
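Here is a minimal Python sketch of the replace-one-piece-at-a-time idea, applied to a backend component rather than a UI. A typing.Protocol defines the contract. A legacy implementation and a modern one both honor it, and the same regression suite runs against each of them. The 10% tax rule, the class names and the test cases are all hypothetical.

```python
from typing import Protocol


class TaxCalculator(Protocol):
    """Contract every tax component must honor (hypothetical 10% tax)."""

    def tax_for(self, amount_cents: int) -> int: ...


class LegacyTaxCalculator:
    """Old component, still in production while it is being replaced."""

    def tax_for(self, amount_cents: int) -> int:
        # 10% of the amount, rounded half up, computed step by step.
        tax, remainder = divmod(amount_cents * 10, 100)
        if remainder >= 50:
            tax += 1
        return tax


class ModernTaxCalculator:
    """New component that should be a drop-in replacement."""

    def tax_for(self, amount_cents: int) -> int:
        # Same rule (10%, rounded half up) in a single integer expression.
        return (amount_cents * 10 + 50) // 100


# Regression cases: (amount in cents, expected tax in cents).
REGRESSION_CASES = [(0, 0), (999, 100), (1000, 100), (1004, 100), (1005, 101)]


def run_regression_suite(component: TaxCalculator) -> list[str]:
    """Run every regression case against a component; return the failures."""
    failures = []
    for amount, expected in REGRESSION_CASES:
        actual = component.tax_for(amount)
        if actual != expected:
            failures.append(
                f"{type(component).__name__}.tax_for({amount}) = {actual}, expected {expected}"
            )
    return failures


if __name__ == "__main__":
    for component in (LegacyTaxCalculator(), ModernTaxCalculator()):
        print(type(component).__name__, run_regression_suite(component) or "all cases pass")
```

The regression suite depends only on the contract. Once run_regression_suite returns an empty list for ModernTaxCalculator, it is safe to swap it in for LegacyTaxCalculator.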
4 – Seamless automated deploy
When deploying a system requires manual procedures such as compilation, packaging, database migrations and file transfers, the chance of introducing bugs is very high. Such manual operations also greatly increase the chance of downtime, leaving your users unable to use the system during the deployment. One of the most important aspects of a good deployment system is what Lean calls Jidoka, which means intelligent automation. We're going to use automation extensively for everything, from preparing the environments to migrating the database, with all the steps in between. The ability to reliably deploy a system at the push of a button is essential for any zero bug culture. High performance teams build effective continuous integration systems and deployment pipelines. These allow for concurrent development and a totally seamless deploy for the user. Traditional organizations that lack that ability tend to deploy less frequently in an attempt to introduce fewer bugs. Precisely because they can't guarantee error-free deployments, they deploy less and less often, and each deploy ships more changes. That leads to a vicious cycle in which each deploy introduces more bugs, and those bugs are increasingly expensive and hard to fix. Organizations that can deploy seamlessly and automatically do the opposite. They deploy smaller changes much more frequently, in some cases dozens of times a day.
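The core of a push-button deployment is a pipeline that runs every step in a fixed order and stops as soon as one fails, in the spirit of Jidoka's stop-the-line principle. The Python sketch below is a simplified illustration, not a real CI/CD tool. The step names are hypothetical. Each action is a stub returning True or False, where a real pipeline would invoke build, test, migration and deployment commands.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One automated deployment step; action returns True on success."""
    name: str
    action: Callable[[], bool]


def run_pipeline(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order and stop the line at the first failure."""
    log: list[str] = []
    for step in steps:
        succeeded = step.action()
        log.append(f"{step.name}: {'ok' if succeeded else 'FAILED'}")
        if not succeeded:
            log.append("pipeline stopped; later steps were not run")
            return False, log
    return True, log


if __name__ == "__main__":
    # Hypothetical steps; the lambdas stand in for real build, test,
    # migration and release commands.
    steps = [
        Step("build and package", lambda: True),
        Step("run full regression suite", lambda: True),
        Step("migrate database", lambda: False),  # simulated failure
        Step("switch traffic to new version", lambda: True),
    ]
    succeeded, log = run_pipeline(steps)
    print("\n".join(log))
    print("deployed" if succeeded else "not deployed")
```

Every deploy goes through the same automated steps, so shipping small changes often becomes cheap. That is exactly what breaks the vicious cycle described above.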
5 – Fully identical environments
One of the great sources of bugs in a complex system environment is the inability to keep development, testing and production environments fully identical. Doing so used to be harder and more expensive before the advent of cloud computing and virtualization. Another common problem is the long time needed to set up environments. Sometimes it takes a few days just to set up the development environment. Then there is the all too familiar excuse: “works on my machine”. If it works on one machine and not on another, it's due to differences between the environments. Today several technologies, such as Docker and Kubernetes, help us virtualize and containerize environments so that they are fully identical. We can also version the environment descriptor files as the environments evolve. Just to give you an idea, a new member joining my team needs just two commands to set up the whole development system: git clone and docker-compose up. Before they can finish a cup of coffee, the whole environment will be ready and fully identical to what all the other developers are using. The same is true for staging and production environments.
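Containers keep environments identical by construction, but it can still be worth checking for drift between versioned environment descriptors. The Python sketch below compares two descriptors, represented here as plain dictionaries that map each component name to a version. In a real setup these might be derived from the image tags in your versioned docker-compose files. The component names and version numbers are illustrative only.

```python
from typing import Optional

# An environment descriptor maps each component to its pinned version.
Descriptor = dict[str, str]


def environment_drift(reference: Descriptor,
                      candidate: Descriptor) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {component: (reference version, candidate version)} for every
    component that differs or exists in only one of the two environments."""
    drift = {}
    for name in sorted(reference.keys() | candidate.keys()):
        expected, actual = reference.get(name), candidate.get(name)
        if expected != actual:
            drift[name] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Illustrative descriptors; real ones would come from versioned files.
    production = {"python": "3.12", "postgres": "16", "redis": "7.2"}
    development = {"python": "3.12", "postgres": "15", "redis": "7.2",
                   "elasticsearch": "8.13"}
    for name, (prod, dev) in environment_drift(production, development).items():
        print(f"{name}: production={prod} development={dev}")
```

An empty result means the two environments match component for component. Any entry points at exactly the kind of difference behind “works on my machine”.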
6 – Live documentation
Keeping documentation up to date for an evolving system is very expensive. Usually it doesn't happen. Documentation quickly becomes obsolete and nobody gives a damn about it. That is surely no good. The best way to keep good documentation of the system is what we call live documentation, which is done through automated tests. Automated tests serve as live documentation of how system components are used, which makes onboarding new team members easier. That applies not only to unit and integration tests but especially to acceptance and behavior tests. Those tests can describe the system's behavior in a high-level language, close to the language business guys use. This greatly eases communication between business and technical guys. Since those tests are written as the application evolves, the live documentation will always be up to date and therefore useful.
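Here is a minimal sketch of tests that double as documentation, written with Python's unittest. The test names and Given/When/Then comments read almost like a business specification. The Account class and its withdrawal rule are hypothetical. Dedicated BDD tools such as Cucumber or behave go further and let this kind of scenario live in plain-language Gherkin feature files.

```python
import unittest


class Account:
    """Hypothetical domain object used to illustrate the tests below."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class WithdrawingCash(unittest.TestCase):
    """Feature: a customer withdraws cash from their account."""

    def test_customer_with_enough_balance_receives_the_money(self):
        # Given an account with a balance of 100
        account = Account(balance=100)
        # When the customer withdraws 30
        account.withdraw(30)
        # Then the remaining balance is 70
        self.assertEqual(account.balance, 70)

    def test_customer_cannot_withdraw_more_than_the_balance(self):
        # Given an account with a balance of 20
        account = Account(balance=20)
        # When the customer tries to withdraw 50,
        # then the withdrawal is refused and the balance is unchanged
        with self.assertRaises(ValueError):
            account.withdraw(50)
        self.assertEqual(account.balance, 20)
```

These tests run in every build, so they fail as soon as the described behavior changes. The documentation cannot silently drift away from the code.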
7 – Intolerance
This is the most important thing: intolerance for defects. This means that whenever we have an escaped defect, we must show it no mercy. The team should stop the production line, thoroughly find the root cause and fix it. Fixing the bug and covering it with an automated test to prevent regression is definitely not enough. We need to identify what in the process let the error reach production. Then we change the process so that it never happens again.
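One way to make this intolerance visible is a simple release gate. Under this rule, an escaped defect keeps blocking the line until it has a regression test, a documented root cause and a process change. The Python sketch below is a hypothetical illustration of that rule. The EscapedDefect fields and the defect IDs are assumptions, not part of any particular issue tracker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EscapedDefect:
    """A bug found by users in production (fields are hypothetical)."""
    defect_id: str
    regression_test: Optional[str] = None  # test that reproduces the bug
    root_cause: Optional[str] = None       # why the process let it through
    process_change: Optional[str] = None   # what changed so it cannot recur

    def is_resolved(self) -> bool:
        # A regression test alone is not enough: root cause and
        # process change must be recorded as well.
        return all((self.regression_test, self.root_cause, self.process_change))


def release_blockers(defects: list[EscapedDefect]) -> list[str]:
    """IDs of escaped defects that still stop the production line."""
    return [d.defect_id for d in defects if not d.is_resolved()]


if __name__ == "__main__":
    defects = [
        EscapedDefect("BUG-1", "test_rounding_of_refunds",
                      "rounding rule had no acceptance criterion",
                      "acceptance criteria now list rounding rules"),
        EscapedDefect("BUG-2", regression_test="test_empty_cart_checkout"),
    ]
    blockers = release_blockers(defects)
    print("line stopped by:" if blockers else "line running", blockers)
```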
Creating a bug-free culture is possible, but it will require relentless effort from everybody involved in the organization. The business guys will need to learn how to interact better and more constantly with the development team. The development team itself will need to learn new techniques and technologies that enable it to achieve the much-desired bug-free culture.