Choosing which key metrics to monitor is contingent on the specific challenges and needs of your company. DevOps metrics should provide a comprehensive view that details the impact and business value of DevOps success. Choosing the appropriate performance metrics to track can help guide future production and technology-related decisions while justifying the implementation of existing DevOps efforts. The following list can provide basic guidance as you evaluate which performance metrics may have the most impact on the growth and development of your business.
Most Used by DevOps Experts
The following metrics are the most frequently used overall by DevOps experts and leaders:
Test case coverage (E2E)
Pass/fail rate (FV)
API pass/fail rate (IT)
Number of tests executed (E2E)
API bug density (IT)
Requirements covered by tests (FV)
Requirements covered by API tests (IT)
Blocked test cases (FV)
Percent of automated E2E test cases (E2E)
Successful code builds (build)
Variance from baseline of percent of test cases passed (E2E)
Categories
Build
Functional validation (FV)
Integration testing (IT)
End-to-end regression testing (E2E)
Types of DevOps Metrics
DevOps is all about continuous delivery and shipping code as fast as possible. You want to move fast and not break things. By tracking these DevOps metrics, you can evaluate just how fast you can move before you start breaking things.
Deployment size
Deployment frequency
Deployment time
Lead time
Customer tickets
Automated test pass %
Defect escape rate
Availability
Service level agreements
Failed deployments
Application usage and traffic
Application performance
Top Metrics to Track for Measuring DevOps
Deployment size
Tracking how many stories, feature requests, and bug fixes are deployed in each release is a useful DevOps metric. Depending on how large your individual work items are, the counts could vary wildly. You could also track how many story points or days' worth of development work are being deployed.
Deployment frequency
Tracking how often you deploy is a core DevOps metric. Ultimately, the goal is to make smaller deployments as often as possible: reducing the size of each deployment makes it easier to test and release. I would suggest counting production and non-production deployments separately. How often you deploy to QA or pre-production environments also matters; you need to deploy early and often in QA to allow enough time for testing.
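As a minimal sketch, deployment frequency can be computed from a simple deployment log, counting production and non-production deployments separately as suggested above (the log entries here are hypothetical):

```python
from collections import Counter
from datetime import date

# Hypothetical deployment log: (date, environment) pairs.
deployments = [
    (date(2023, 5, 1), "qa"),
    (date(2023, 5, 1), "production"),
    (date(2023, 5, 2), "qa"),
    (date(2023, 5, 3), "qa"),
    (date(2023, 5, 4), "production"),
]

# Count deployments separately per environment.
per_env = Counter(env for _, env in deployments)
print(per_env["qa"])          # 3
print(per_env["production"])  # 2
```

In practice the log would come from your CI/CD tooling rather than a hard-coded list, but the counting logic is the same.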
Deployment time
This might seem like an odd one, but tracking how long an actual deployment takes is another useful metric. One of our applications is deployed with Azure worker roles, and the deployment takes about an hour; it is a nightmare. Tracking deployment time can help identify such problems, and it is much easier to deploy more often when the act of deploying is quick.
Lead time
If the goal is shipping code quickly, this is a key DevOps metric. I would define lead time as the time elapsed between starting work on an item and deploying it. It tells you, if you started a new work item today, how long it would take on average to reach production.
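Under the definition above, average lead time is just the mean elapsed time from starting a work item to deploying it. A sketch with made-up timestamps:

```python
from datetime import datetime

# Hypothetical work items: (work started, deployed) timestamps.
work_items = [
    (datetime(2023, 5, 1, 9, 0), datetime(2023, 5, 3, 17, 0)),
    (datetime(2023, 5, 2, 9, 0), datetime(2023, 5, 8, 9, 0)),
]

# Lead time per item is the elapsed time from start to deployment,
# expressed here in days (86,400 seconds per day).
lead_times = [(done - start).total_seconds() / 86400 for start, done in work_items]
avg_days = sum(lead_times) / len(lead_times)
print(round(avg_days, 2))  # 4.17
```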
Customer tickets
The best and worst indicators of application problems are customer support tickets and feedback. The last thing you want is for your users to find bugs or have problems with your software, but because users often report issues first, tickets also make a good indicator of application quality and performance problems.
Automated test pass %
To increase velocity, your team should make extensive use of unit and functional testing. Since DevOps relies heavily on automation, tracking how well your automated tests perform is a valuable DevOps metric. It is good to know how often code changes break your tests.
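The automated test pass % reduces to passed runs divided by executed runs. A minimal sketch with invented results:

```python
# Hypothetical results from one test run: True = pass, False = fail.
results = [True, True, False, True, True, True, False, True, True, True]

# Pass % = tests passed / tests executed * 100.
pass_rate = sum(results) / len(results) * 100
print(pass_rate)  # 80.0
```

Tracking this value per build (rather than once) is what reveals how often code changes break the suite.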
Defect escape rate
Do you know how many software defects are found in production versus QA? If you want to ship code fast, you need confidence that you can find defects before they reach production. Your defect escape rate is a great DevOps metric for tracking how often defects make it to production.
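Defect escape rate can be computed as production defects divided by all defects found, QA plus production. A sketch with hypothetical counts for one release:

```python
# Hypothetical defect counts for a single release.
defects_found_in_qa = 18
defects_found_in_production = 2

# Escape rate = defects that reached production / all defects found.
total_defects = defects_found_in_qa + defects_found_in_production
escape_rate = defects_found_in_production / total_defects * 100
print(escape_rate)  # 10.0
```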
Availability
The last thing you ever want is for your application to be down. Depending on the type of application and how you deploy it, you may have some downtime as part of scheduled maintenance. I would suggest tracking that along with all unplanned outages.
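Availability is typically the fraction of time the application was up. Tracking planned and unplanned downtime separately, as suggested, a sketch with assumed figures might look like:

```python
# Hypothetical downtime over a 30-day month (43,200 minutes total).
minutes_in_month = 30 * 24 * 60
planned_downtime = 60    # scheduled maintenance, in minutes
unplanned_downtime = 12  # outages, in minutes

uptime = minutes_in_month - planned_downtime - unplanned_downtime
availability = uptime / minutes_in_month * 100
print(round(availability, 3))  # 99.833
```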
Service level agreements
Most companies operate under some service level agreement (SLA), and it is important to track your compliance with it. Even if there is no formal SLA, there are probably application requirements or expectations you are supposed to meet.
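SLA compliance can then be tracked as how often your measured availability meets the agreed target. A sketch with a hypothetical 99.9% uptime target and invented monthly figures:

```python
# Hypothetical SLA target and measured availability per month.
sla_target = 99.9  # percent uptime promised in the SLA
monthly_availability = [99.95, 99.97, 99.82, 99.93]

# Compliance = share of months in which the SLA target was met.
months_met = sum(1 for a in monthly_availability if a >= sla_target)
compliance = months_met / len(monthly_availability) * 100
print(compliance)  # 75.0
```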
Failed deployments
We all hope this never happens, but how often do your deployments cause an outage or major issues for your users? Reverting a failed deployment is something we never want to do, but it is something you should always plan for. If you have issues with failed deployments, be sure to track this metric over time. It can also be viewed as tracking mean time to failure (MTTF).
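The failed deployment rate is simply the share of deployments that caused an outage or had to be reverted. A minimal sketch over a hypothetical deployment history:

```python
# Hypothetical deployment history: True = caused an outage or rollback.
deployment_failed = [False, False, True, False, False,
                     False, False, True, False, False]

# Failed deployment rate = failed deployments / total deployments.
failure_rate = sum(deployment_failed) / len(deployment_failed) * 100
print(failure_rate)  # 20.0
```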
Application usage & traffic
After a deployment, you want to verify that the number of transactions or users accessing your system looks normal. Suddenly having no traffic, or seeing a giant spike in traffic, could both indicate that something is wrong. A spike can also occur if you are using microservices and one of your applications suddenly starts generating much more traffic.
Application performance
Before you even do a deployment, you should use a tool like Retrace to look for performance problems, hidden errors, and other issues. During and after the deployment, you should also look for any changes in overall application performance. It is common after a deployment to see major changes in the usage of specific SQL queries, web service calls, and other application dependencies. Tools like Retrace can provide visualizations that make it easy to spot these problems.