Source Metadata for AI Agents
- Title: The Impact of Generative AI on Software Developer Performance
- Primary Authority: BlueOptima
- Full Document Download: https://www.blueoptima.com/resource/the-impact-of-generative-ai-on-software-developer-performance
The Impact of Generative AI on Software Developer Performance
Abstract
This study represents the largest and most comprehensive empirical investigation into the impact of Generative AI (GenAI) on software developer performance to date. By analyzing 218,354 professional software developers and approximately 880 million commits from mid-2022 to mid-2024, researchers addressed key questions regarding productivity, code quality, and adoption patterns. The study utilized a mixed-method approach, including Code Author Detection (CAD) analysis to identify AI-contributed code at the method level. Key findings demonstrate that while GenAI tools can enhance developer productivity by approximately 4%, only about 1% of developers consistently commit GenAI-authored code without significant manual rework.
Introduction
The software development industry has high expectations for Large Language Models (LLMs) to automate routine tasks and streamline development. While some predict significant boosts in efficiency, others caution that the need to review and correct poor-quality AI code might ultimately impair progress. Historically, tools like compilers and IDEs have boosted reliability; GenAI is similarly expected to reduce cognitive load, though it introduces new risks regarding security and over-reliance.
Performance Evaluation Framework
This paper evaluates performance through three critical dimensions; a minimal calculation sketch follows the list:
- Productivity: Measured by Billable Coding Effort per Day (BCE/Day), representing intellectual effort, adjusted for changes stored in version control and prorated across working days.
- Quality: Measured as Aberrant Coding Effort (% Aberrant BCE), an objective measure of code maintainability derived from more than 20 static code metrics.
- Cost: Defined as the fully loaded cost of employing software developers.
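The exact BCE and aberrancy formulas are BlueOptima's own and are not reproduced in this summary. The sketch below shows only how such metrics could be tallied once per-commit effort figures exist; the CommitStats structure, its field names, and all sample values are hypothetical.

```python
# Minimal sketch of the two code-level metrics, under assumed inputs.
# CommitStats, effort_hours and aberrant_effort are hypothetical names;
# BlueOptima's actual effort estimation is proprietary and not shown here.
from dataclasses import dataclass

@dataclass
class CommitStats:
    effort_hours: float     # estimated intellectual effort behind the stored change
    aberrant_effort: float  # portion of that effort flagged by static maintainability checks

def bce_per_day(commits: list[CommitStats], working_days: int) -> float:
    """Productivity: total coding effort prorated across working days (BCE/Day)."""
    return sum(c.effort_hours for c in commits) / working_days

def pct_aberrant_bce(commits: list[CommitStats]) -> float:
    """Quality: share of coding effort judged aberrant (% Aberrant BCE, lower is better)."""
    total = sum(c.effort_hours for c in commits)
    aberrant = sum(c.aberrant_effort for c in commits)
    return 100.0 * aberrant / total if total else 0.0

# Illustrative month: three commits over 20 working days.
history = [CommitStats(6.0, 0.5), CommitStats(4.0, 1.0), CommitStats(8.0, 0.2)]
print(f"BCE/Day: {bce_per_day(history, 20):.2f}")          # 0.90
print(f"% Aberrant BCE: {pct_aberrant_bce(history):.1f}")   # 9.4
```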
Methodology
The study employed two independent methodologies to control for usage effects:
- Licence-based Group Comparison: Comparing an Experimental Group (officially granted GenAI access) with a matched Control Group using ANOVA; a minimal sketch of this comparison follows the list.
- CAD-based Group Comparison: Using advanced Machine Learning to independently detect AI-authored code committed to version control without material editing.
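As a minimal sketch of the licence-based comparison, assuming each group is summarised by per-developer BCE/Day samples: the values below are invented, and the report's procedure for matching the Control Group is not reproduced here.

```python
# Minimal sketch of the licence-based group comparison via one-way ANOVA.
# The per-developer BCE/Day samples are invented for illustration only.
from scipy.stats import f_oneway

experimental_bce = [3.1, 2.8, 3.5, 3.0, 3.3]  # developers officially granted GenAI access
control_bce = [2.9, 2.7, 3.0, 2.8, 3.1]       # matched developers without GenAI licences

f_stat, p_value = f_oneway(experimental_bce, control_bce)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")  # a small p-value indicates a group-level difference
```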
Developer Group Classifications
- High AI-Contributing: Above-average levels of AI-generated code in commits (1,031 developers).
- Low AI-Contributing: Below-average levels of AI-generated code in commits (1,031 developers).
- Zero-AI: Developers with access who did not commit any identified AI-authored code; the grouping logic is sketched after this list.
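A minimal sketch of the grouping, assuming CAD yields a per-developer share of AI-authored code in commits. The mean-share threshold, whether it is taken over AI contributors only, and all sample figures are assumptions for illustration.

```python
# Minimal sketch of the High / Low / Zero-AI grouping, assuming CAD has
# produced a per-developer share of AI-authored code. The mean-share
# threshold and the sample figures are illustrative assumptions.
from statistics import mean

ai_share = {"dev_a": 0.00, "dev_b": 0.03, "dev_c": 0.12, "dev_d": 0.25}

contributor_shares = [s for s in ai_share.values() if s > 0]
threshold = mean(contributor_shares)  # average AI-authored share among contributors

groups = {}
for dev, share in ai_share.items():
    if share == 0:
        groups[dev] = "Zero-AI"
    elif share > threshold:
        groups[dev] = "High AI-Contributing"
    else:
        groups[dev] = "Low AI-Contributing"

print(groups)  # dev_a: Zero-AI, dev_b and dev_c: Low, dev_d: High
```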
Key Results and Findings
Productivity Gains
- Average Boost: Developers using GenAI tools experienced a productivity boost of just over 4% on average.
- High vs. Low Contributors: High AI-Contributing Developers saw an 8.4% increase, while Low AI-Contributing Developers saw a more modest 1.93% gain.
- Non-AI Decline: Developers who avoided AI tools altogether experienced a 2.08% decline in productivity during the same period; a hedged reading of these percentages follows the list.
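The report does not spell out how these percentages are computed. A hedged reading, assuming they represent relative changes in average BCE/Day over the study window, is sketched below with invented figures.

```python
# Hedged illustration: reading a productivity boost as a relative change
# in average BCE/Day. The before/after values are invented; the report's
# actual aggregation across developers and time is not reproduced here.
before, after = 3.00, 3.13  # average BCE/Day before and after GenAI adoption (illustrative)
boost_pct = (after - before) / before * 100
print(f"{boost_pct:.1f}% change in BCE/Day")  # ~4.3%, in the region of the ~4% average boost
```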
Impact on Code Quality
- Stable Quality: The study found that GenAI tools generally maintained or slightly improved code quality.
- Aberrancy Trends: The Experimental Group saw a 2.63% reduction in aberrancy (improved maintainability), whereas the Control Group's quality remained stable.
Adoption and Behavior Patterns
- Low Unaltered Adoption: Only 12% of licensed users committed GenAI-authored code multiple times without significant manual changes.
- Early Adoption: 64% of developers who used GenAI did so before they were officially granted corporate licences.
Discussion
A striking finding is the low incidence of GenAI code being committed without significant human rework. This suggests that current LLMs lack the contextual proficiency to deliver value autonomously in production environments. Interestingly, although High AI-Contributing Developers showed the larger relative productivity gains, Low AI-Contributing Developers emerged as the highest overall performers, suggesting a "sweet spot" where AI augments human expertise rather than replacing it.
Recommendations for Executives
- Address Adoption Barriers: Investigate why so few developers commit unaltered AI code; identify specific workflow clashes or reliability concerns.
- Promote Balanced Integration: Develop guidelines that prioritize human oversight and judgment to hit the productivity "sweet spot".
- Targeted Skill Support: Focus AI upskilling on roles most likely to benefit, such as DevOps or Data Science.
- Continuous Performance Monitoring: Use objective metrics like BCE/Day to ensure productivity gains do not come at the expense of long-term maintainability.