
Root Cause Analysis in Quality Engineering

 1 year ago
source link: https://www.percona.com/blog/root-cause-analysis-in-quality-engineering/


In software development, dealing with bugs found by customers is inevitable. There is no perfect software. We’ve all been there: frantically trying to find a workaround, hurrying to release a hotfix, or trying to find an explanation for the relevant stakeholders. A vicious circle repeats itself with more or less intensity, depending on the bug’s severity or the size of the affected customer. But it doesn’t have to be this way. We can turn bad experiences into productive ones by:

  • Learning about the actual real-world usage of the software that led to the discovery
  • Improving testing methods and processes based on these learnings
  • Assuring stakeholders that relevant learnings have been acquired and that actions have been taken to ensure that the problems don’t happen again

Root Cause Analysis

In science and engineering, Root Cause Analysis (RCA) is defined as the process of discovering the root cause of a problem to identify the appropriate solution. Its purpose is to solve the underlying cause of the problem, instead of the symptom, and to apply a definitive solution instead of putting out the fire.

Most tech organizations have well-established RCA processes, especially for security breaches or uptime incidents. We MUST take the same approach in engineering and apply it to our quality process to acknowledge, analyze, resolve, learn and remove gaps in the software development life cycle (SDLC) that cause escapes (i.e. bugs reported by users not found during the SDLC) and problems for our customers.


What will this accomplish?

  • We take a critical look at our own mistakes. Instead of taking the quick and easy route of fixing, testing, delivering, and forgetting, we pause for just a moment to think about why mistakes have happened. We become accountable;
  • We ensure that the appropriate tests are in place so that the reported problem does not happen again. Remember that we do not just aim to add “full coverage”: we need a representative test for the particular scenario, one that is efficient and capable of catching the problem in every later iteration;
  • In general, a Root Cause Analysis is a reactive activity. But by gathering, analyzing, and acting upon collected historical data, we become proactive in improving our processes and test procedures, before more feedback comes in;
  • Last but not least, this is a great opportunity to interact with the problems and the pains of our customers. We learn more about real-life production application usage; how to think and act like a real user of the software. 

How do we make this work?

Dealing with customer-reported bugs can be a time-sensitive issue, so we must be mindful not to add bottlenecks. A simple, asynchronous exercise is a good way to start.

We can streamline this process in several simple steps with the help of Jira workflow:

  1. Filter for bugs reported by paying customers and enforce the RCA workflow on them. This activity can also be expanded to other groups of reporters, such as communities; places like the Percona Forum can be a good source of feedback.
  2. During bug resolution activities, the assigned developer investigates how the bug was introduced. When transitioning the ticket from a state like “In Progress” to a state like “In Review”, the developer posts a short description of the investigation in a mandatory text field*.
  3. During the testing of the fix provided for the respective bug, the assigned quality engineer investigates why the problem was missed during all SDLC testing stages. When transitioning the ticket from a state like “In QA” to a state like “Ready for Merge”, the quality engineer posts a short description of the investigation in a mandatory text field*.
  4. Depending on the level of testing where we take the preventive action, either the developer or the quality engineer picks options from three dropdown fields:
    1. Problem category (Requirements, Testing, Coding, Environment, etc.)
    2. Problem detail (Missing Test Case, Inadequate Requirements, Missing Documentation, Insufficient Unit Test coverage, etc.)
    3. Corrective action (Automated Test Cases, Test Case Added to Regression suite, Documentation Updated, Process Improvement, etc.)
  5. Analysis of the data: the problem categories that most often end up needing corrective action are our weak spots, and the problem details tell us what exactly we are doing wrong.
  6. The corrective action historical data provides the opportunity to become proactive and take preventative actions to improve weak areas.
* Steps two and three do not have any measurable data or action tied to them, hence we use a free text field. The only purpose of these two steps is to make sure that engineers learn and convey the learnings to their teams. There is no place for naming or shaming in this activity.
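
The data captured by this workflow can be analyzed with very little tooling. The following is a minimal sketch in Python, not the workflow's actual implementation: the `RcaRecord` class, the `weak_spots` function, the `DEMO-*` ticket keys, and the sample notes are all hypothetical and made up for illustration. It assumes the three dropdown values and the two free-text notes have already been exported from the tracker, and it simply counts which (category, detail) pairs occur most often, as in step 5.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RcaRecord:
    """One customer-reported bug after the RCA workflow (hypothetical model)."""
    ticket: str             # made-up ticket key, e.g. "DEMO-1"
    dev_notes: str          # step 2: how the bug was introduced (free text)
    qa_notes: str           # step 3: why testing missed it (free text)
    category: str           # step 4.1: problem category
    detail: str             # step 4.2: problem detail
    corrective_action: str  # step 4.3: corrective action

    def __post_init__(self):
        # Mirror the "mandatory text field" rule from steps 2 and 3.
        if not self.dev_notes.strip() or not self.qa_notes.strip():
            raise ValueError(f"{self.ticket}: both investigation notes are required")


def weak_spots(records, top=3):
    """Step 5: rank (category, detail) pairs by how often they occur."""
    counts = Counter((r.category, r.detail) for r in records)
    return counts.most_common(top)


# Illustrative, invented sample data.
records = [
    RcaRecord("DEMO-1", "Off-by-one in pagination", "No test for the last page",
              "Testing", "Missing Test Case", "Automated Test Cases"),
    RcaRecord("DEMO-2", "Null config value not handled", "Only default config tested",
              "Testing", "Missing Test Case", "Test Case Added to Regression suite"),
    RcaRecord("DEMO-3", "Ambiguous spec for time zones", "Tests followed the same spec",
              "Requirements", "Inadequate Requirements", "Documentation Updated"),
]

print(weak_spots(records, top=1))
# prints [(('Testing', 'Missing Test Case'), 2)]
```

The free-text notes are deliberately left out of the counting: as noted above, they exist so that engineers learn and share, not to be measured.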

Conclusion

In practice, this is an adapted version of the fishbone technique for Root Cause Analysis. The missed bug represents the problem statement, and we ask “Why did this happen?” at several points: why did we introduce the bug, why did we miss it in testing, and what exactly is the problem? Based on the answers to these questions, we determine the actual root cause and take action to remove it for good.

Applied diligently and over the long term, a Root Cause Analysis process does more than fill gaps in development and testing. It grows and matures the engineering organization. It encourages us to continuously scrutinize our methods and ask ourselves how we can do better. Let’s not miss this opportunity!

