versioned_docs/version-v0.16/Metrics/CFR.md

title: “DORA - Change Failure Rate” description: > DORA - Change Failure Rate sidebar_position: 29

What is this metric?

The percentage of changes that were made to a code that then resulted in incidents, rollbacks, or any type of production failure.

Why is it important?

Unlike Deployment Frequency and Lead Time for Changes that measure the throughput, Change Failure Rate measures the stability and quality of software delivery. A low CFR reflects a bad end-user experience as the production failure is relatively high.

Which dashboard(s) does it exist in

DORA dashboard. See live demo.

How is it calculated?

The number of deployments affected by incidents/total number of deployments. For example, if there are five deployments and two deployments cause one or more incidents, that is a 40% change failure rate.

Below are the benchmarks for different development teams from Google‘s report. However, it’s difficult to tell which group a team falls into when the team's change failure rate is 18% or 40%. Therefore, DevLake provides its own benchmarks to address this problem:

Groups	Benchmarks	DevLake Benchmarks
Elite performers	0%-15%	0%-15%
High performers	16%-30%	16-20%
Medium performers	16%-30%	21%-30%
Low performers	16%-30%	> 30%

Data Sources Required

This metric relies on:

Deployments collected in one of the following ways:
- Open APIs of Jenkins, GitLab, GitHub, etc.
- Webhook for general CI tools.
- Releases and PR/MRs from GitHub, GitLab APIs, etc.
Incidents collected in one of the following ways:
- Issue tracking tools such as Jira, TAPD, GitHub, etc.
- Incident or Service Monitoring tools such as PagerDuty, ServiceNow, etc.

Transformation Rules Required

This metric relies on:

Deployment configuration in Jenkins, GitLab, GitHub or BitBucket transformation rules to let DevLake know which CI builds/jobs can be regarded as Deployments.
Incident configuration in Jira, GitHub or TAPD transformation rules to let DevLake know which issues can be regarded as Incidents.

SQL Queries

If you want to measure the monthly trend of change failure rate, run the following SQL in Grafana.

with _deployments as (
-- get the deployments in each month
	SELECT
	  date_format(ct.finished_date,'%y/%m') as month,
		ct.id AS deployment_id
	FROM
		cicd_tasks ct
		join project_mapping pm on ct.cicd_scope_id = pm.row_id
	WHERE
	  pm.project_name in ($project)
		and type = 'DEPLOYMENT'
		and result = 'SUCCESS'
		and environment = 'PRODUCTION'
),

_incidents as (
-- get the incidents (caused by deployments) that are created within the selected time period in the top-right corner
	SELECT
		date_format(i.created_date,'%y/%m') as month,
		i.id AS incident_id,
		pim.deployment_id
	FROM
		issues  i
	  join project_issue_metrics pim on i.id = pim.id
	WHERE
	  pim.project_name in ($project) and
		i.type = 'INCIDENT'
),

_calendar_months as(
-- deal with the month with no incidents
	SELECT date_format(CAST((SYSDATE()-INTERVAL (month_index) MONTH) AS date), '%y/%m') as month
	FROM ( SELECT 0 month_index
			UNION ALL SELECT   1  UNION ALL SELECT   2 UNION ALL SELECT   3
			UNION ALL SELECT   4  UNION ALL SELECT   5 UNION ALL SELECT   6
			UNION ALL SELECT   7  UNION ALL SELECT   8 UNION ALL SELECT   9
			UNION ALL SELECT   10 UNION ALL SELECT  11
		) month_index
	WHERE (SYSDATE()-INTERVAL (month_index) MONTH) > SYSDATE()-INTERVAL 6 MONTH	
),

_deployment_failures as (
-- calculate the number of incidents caused by each deployment
  SELECT 
		distinct
			cm.month,
			d.deployment_id,
			count(distinct i.incident_id) as incident_count
  FROM 
  	_calendar_months cm
  	left join _deployments d on cm.month = d.month
  	left join _incidents i on d.deployment_id = i.deployment_id
	GROUP BY 1,2
)

SELECT
  month,
	case when 
		count(deployment_id) is null then null
		else count(case when incident_count = 0 then null else incident_count end)/count(deployment_id) end as change_failure_rate
FROM _deployment_failures
GROUP BY 1
ORDER BY 1

If you want to measure in which category your team falls into, run the following SQL in Grafana.

with _deployments as (
-- get the deployment deployed within the selected time period in the top-right corner
	SELECT
		ct.id AS deployment_id,
		ct.finished_date as deployment_finished_date
	FROM
		cicd_tasks ct
		join project_mapping pm on ct.cicd_scope_id = pm.row_id
	WHERE
	  pm.project_name in ($project)
		and type = 'DEPLOYMENT'
		and result = 'SUCCESS'
		and environment = 'PRODUCTION'
    and $__timeFilter(finished_date)
),

_incident_caused_by_deployments as (
-- get the incidents (caused by deployments) that are created within the selected time period in the top-right corner
	SELECT
		i.id AS incident_id,
		pim.deployment_id
	FROM
		issues  i
	  join project_issue_metrics pim on i.id = pim.id
	WHERE
	  pim.project_name in ($project) and
		i.type = 'INCIDENT'
		and $__timeFilter(i.created_date)
),

_deployment_failures as (
-- calculate the number of incidents caused by each deployment
  SELECT 
		distinct
			d.deployment_id,
			d.deployment_finished_date,
			count(distinct i.incident_id) as incident_count
  FROM 
  	_deployments d
  	left join _incident_caused_by_deployments i on d.deployment_id = i.deployment_id
	GROUP BY 1,2
),

_change_failure_rate as (
	SELECT 
		case when count(deployment_id) is null then null
		else count(case when incident_count = 0 then null else 1 end)/count(deployment_id) end as change_failure_rate
	FROM
		_deployment_failures
)

SELECT
	case  
		when change_failure_rate <= .15 then "0-15%"
		when change_failure_rate <= .20 then "16%-20%"
		when change_failure_rate <= .30 then "21%-30%"
		else "> 30%" 
	end as change_failure_rate
FROM 
	_change_failure_rate

How to improve?

Add unit tests for all new feature
“Shift left”, start QA early and introduce more automated tests
Enforce code review if it's not strictly executed