While software development often prioritizes velocity metrics like lead time and deployment frequency, this study demonstrates the critical importance of code quality for sustained high performance. Analyzing data from 43 enterprises, 333 organizations, and over 537,000 repositories, we compared workflow-based metrics (e.g., DORA) with code-level static metrics, specifically BlueOptima’s Analysis of Relative Thresholds (ART). Our findings reveal that existing file quality (FLART) and the prevalence of design anti-patterns are significantly stronger predictors of future maintainability ($R^2 = 0.36$ and $0.674$, respectively) than workflow metrics ($R^2 = 0.03$). Furthermore, we found a strong correlation between high maintainability (low Ab.CE) and improved developer productivity (higher BCE/day), translating to substantial cost savings (up to $58.62 per CE at 1.0 BCE/day). This research also highlights that even skilled developers are hampered by poor codebases, underscoring the necessity of proactive technical debt management through strategic refactoring and consistent application of design patterns. In conclusion, prioritizing code quality through metrics like FLART and Ab.CE, alongside targeted anti-pattern reduction, is essential for achieving sustainable software development velocity, reliability, and cost-effectiveness.
The software development landscape is transforming rapidly, driven by technological advances, the rise of Generative AI, and ever-shifting market demands. In this dynamic environment, organisations increasingly recognise the critical need to understand and optimise their development processes for maximum efficiency and impact. Frameworks like DevOps Research and Assessment (DORA), SPACE, and DevEx have provided valuable insights for performance evaluation. However, their limitations, such as an overemphasis on measures of speed or velocity, their exclusively post-hoc measurement, and inconsistent implementation, necessitate a broader, more comprehensive, and actionable approach to performance management.
This research paper, Part 2 of a trilogy covering the three components of Performance, investigates the optimisation of Quality. Part 1 of the trilogy, titled “Global Drivers of Performance: Productivity”, covered the optimisation of Productivity. A subsequent research paper in this series will cover the optimisation of Cost.
Software development performance comprises three components: productivity, quality, and cost. These are the primary considerations of any engineering endeavour, and software engineering is no exception to the challenge of simultaneously optimising these three fundamental dimensions of performance.
Speed-focused metrics and post-hoc delivery quality measures do not evaluate whether the incremental source code change is built on a structurally sound foundation. Persistent design flaws, deep interdependencies, and poor readability can erode the benefits of rapid releases, forcing teams to spend excessive resources on rework, emergency patches, or major refactors.
Recent empirical work by BlueOptima suggests that maintainability is a primary factor affecting the rate of delivery of source code changes into any given codebase. Low-quality code has also been shown to lead to more frequent production incidents, higher defect rates, and slower feature delivery over time. Conversely, codebases with reduced complexity, better modularization, and reusable structures allow teams to respond quickly to evolving business demands without incurring crippling technical debt.
Workflow-based metrics, such as those proposed by DORA, focus on how fast software changes are delivered to production and how quickly teams recover from failures. These metrics are useful and relevant for assessing some aspects of operational performance; they can help inform broad operational changes that affect overall software delivery capabilities, such as user advocacy, test and quality assurance capabilities, or software delivery pipeline automation. Despite this, these types of metrics offer little insight “upstream”, where software engineers interpret the functional requirements of a software product and implement those requirements as source code and configuration changes.
Large-scale empirical research confirms that factors such as coupling, complexity, and code smells directly impede maintainability and thus require more granular static analyses to detect and mitigate. Understanding these root causes of unmaintainable code goes beyond speed-related workflow metrics, demanding in-depth examination of the codebase itself. BlueOptima’s Analysis of Relative Thresholds (ART) provides insights into upstream activities by examining both developer-level practices, through Dynamic ART (DART), and file-level maintainability, through File-Level ART (FLART). ART quantifies how closely contributions and files align with recognized best practices, resulting in measures such as the proportion of Aberrant Coding Effort (Ab.CE). These code-focused metrics are direct measures of source code maintainability and provide actionable feedback to developers about where to refactor or apply better design patterns.
Data was gathered across enterprise software development organizations using BlueOptima’s Integrator technology. The data evaluated covered 43 enterprises consisting of 333 organizations using over 537,000 version control repositories. These repositories contained 4.75 million source files covering 212 source file types. Changes to this source code were made by 36,000 developers over a period of one year.
Metrics were gathered from version control systems such as GitHub, Azure DevOps, GitLab, and Atlassian BitBucket.
Source code quality is evaluated at the individual commit level:
Ab.CE is the proportion of a developer’s Coding Effort (CE) that is flagged as unmaintainable or “aberrant” as evaluated through Developer-level ART (DART). Coding Effort (CE) is an indexed account of the volume of source code change, complexity, interrelatedness, and source code context.
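Read literally from that definition, Ab.CE can be expressed as a simple ratio. This is a working interpretation for the reader rather than an official formula from the study:

$$\text{Ab.CE} = \frac{\text{Coding Effort flagged as aberrant by DART}}{\text{total Coding Effort delivered}} \times 100\%$$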
To establish the implications of software quality, the study explores the impact that differing levels of quality have on productivity and infers the implications for the ultimate cost of delivery.
Two regression models were constructed to understand what predicts quality: one using workflow-based measures and the other using measures of the pre-existing quality of the codebase (static metrics).
Five design anti-patterns (e.g., God Class, File Complexity) were scored for each developer’s code to test how these scores predict Ab.CE.

Caption: Distribution of Common File-type Issues across 52 enterprises and ~98K repositories.
Common issues include:
Developers are grouped into 4 zones based on aberrancy: Best (Ab.CE < 5%), Good, Moderate, and Requires Improvement (Ab.CE > 13%).
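To illustrate how these zone thresholds could be applied in practice, the sketch below buckets developers by their Ab.CE percentage. Only the outer thresholds (5% and 13%) are stated in the study; the boundary between Good and Moderate used here (9%) is a hypothetical placeholder.

```python
def ab_ce_zone(ab_ce_pct: float) -> str:
    """Classify a developer by Ab.CE (% of Coding Effort flagged as aberrant).

    Thresholds for Best (<5%) and Requires Improvement (>13%) come from the
    study; the Good/Moderate boundary (9%) is a hypothetical placeholder.
    """
    if ab_ce_pct < 5.0:
        return "Best"
    if ab_ce_pct < 9.0:       # assumed internal boundary
        return "Good"
    if ab_ce_pct <= 13.0:
        return "Moderate"
    return "Requires Improvement"


if __name__ == "__main__":
    for pct in (2.1, 7.4, 12.0, 18.5):
        print(f"Ab.CE {pct:>5.1f}% -> {ab_ce_zone(pct)}")
```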

Caption: Plotting Ab.CE against BCE/day, showing zones from Best to Requires Improvement.
Findings indicate:
Improving Ab.CE leads to significant cost savings.

Caption: SHAP analysis showing minimal influence of workflow variables ($R^2=0.03$).

Caption: SHAP analysis showing high Pre-FLART scores correlate with higher developer Ab.CE ($R^2=0.36$).
The hierarchical regression nested developers within repositories.
Preexisting file quality exerts the largest influence; a one-unit rise in nu_preflart_score (worse maintainability) is associated with a 35.189-unit increase in aberrant code.
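As a sketch of how such a nested model can be fitted, the snippet below regresses Ab.CE on pre-existing file quality with a random intercept per repository. The column names, data layout, and the use of statsmodels are illustrative assumptions, not the study’s actual pipeline.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data layout: one row per developer, with the repository they
# mainly contribute to, their Ab.CE, and the pre-existing file quality score.
df = pd.read_csv("developer_quality.csv")  # hypothetical export

# A random intercept per repository approximates "developers nested within
# repositories"; Ab.CE is regressed on pre-existing file maintainability.
model = smf.mixedlm(
    "ab_ce ~ nu_preflart_score",
    data=df,
    groups=df["repository_id"],
)
result = model.fit()

# The coefficient on nu_preflart_score is the per-unit association of the
# kind reported in the text.
print(result.summary())
```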
Linking five design anti-patterns to Ab.CE resulted in $R^2 = 0.674$.

Caption: SHAP analysis showing anti-pattern features as a strong predictor of quality.
Influential variables:
Unmaintainable code imposes tangible productivity costs. In a scenario of 100 developers, a quality initiative can yield over $1,000,000 in annual savings.
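A rough sanity check of that scenario, using only the per-CE savings figure quoted earlier (up to $58.62 per CE at 1.0 BCE/day): the annual Coding Effort per developer computed below is derived for illustration and is not a figure reported in the study.

```python
# Back-of-the-envelope check of the 100-developer scenario, using figures
# quoted in this paper; the per-developer CE volume is derived, not measured.
SAVINGS_PER_CE = 58.62              # max quoted saving per unit of Coding Effort (USD)
DEVELOPERS = 100                    # size of the hypothetical organisation
TARGET_ANNUAL_SAVINGS = 1_000_000   # USD, the figure quoted above

# Coding Effort each developer would need to deliver per year for the
# quoted per-CE saving to add up to the target.
implied_ce_per_dev = TARGET_ANNUAL_SAVINGS / (DEVELOPERS * SAVINGS_PER_CE)
print(f"Implied CE per developer per year: {implied_ce_per_dev:.0f}")  # ~171
```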
Prior code quality (pre-FLART) trumps developer skill. Bad codebases create a productivity ceiling for all developers, preventing them from fully leveraging their abilities.
Workflow-focused metrics are less effective than direct measures of source code maintainability in addressing technical debt. If organizations fail to reduce complexity and remove anti-patterns, code rot persists even if the team moves quickly on the surface.
The strong correlation between anti-patterns and Ab.CE ($R^2=0.674$) signals a pressing need for design pattern-driven development. Automated detection of anti-patterns integrated into build pipelines offers real-time feedback.
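For instance, a lightweight check of this kind could run as a build-pipeline gate. The sketch below flags potential God Classes with a simple method-count and line-count heuristic; the thresholds and the heuristic itself are illustrative assumptions, not BlueOptima’s ART analysis.

```python
import ast
import sys

# Hypothetical thresholds; real anti-pattern detectors use richer signals.
MAX_METHODS = 20
MAX_CLASS_LINES = 400


def god_class_candidates(path: str) -> list[str]:
    """Return warnings for classes that look like God Class candidates."""
    with open(path, encoding="utf-8") as fh:
        tree = ast.parse(fh.read(), filename=path)

    warnings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            methods = [n for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            length = (node.end_lineno or node.lineno) - node.lineno + 1
            if len(methods) > MAX_METHODS or length > MAX_CLASS_LINES:
                warnings.append(
                    f"{path}:{node.lineno} class {node.name}: "
                    f"{len(methods)} methods, {length} lines"
                )
    return warnings


if __name__ == "__main__":
    findings = [w for f in sys.argv[1:] for w in god_class_candidates(f)]
    print("\n".join(findings))
    sys.exit(1 if findings else 0)  # non-zero exit fails the pipeline step
```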
Maintainability is integral to ongoing software success. File-level metrics (FLART) and developer-level analyses (Ab.CE) far exceed the predictive power of workflow metrics alone ($R^2$ of $0.36$ vs. $0.03$). Hierarchical regression underscores that “good” developers cannot fully overcome a “bad” codebase. Code quality is not a mere engineering concern but a strategic imperative.
ART evaluates maintainability and ease of modification.
Coding Effort measures intellectual effort delivered by programmers, filtering out non-meaningful changes like copy-paste or autogenerated code.
Cost/BCE represents the cost per unit of work based on developer rates and productivity.
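One way to read that definition, as an illustrative formulation rather than a formula given by the source:

$$\text{Cost/BCE} = \frac{\text{fully loaded developer cost per day}}{\text{BCE delivered per day}}$$

Under this reading, higher BCE/day at the same daily rate directly lowers the cost per unit of delivered work, which is the mechanism behind the savings figures discussed above.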