Our last article introduced the Coding Automation Framework. Ranging from Levels 0 (No Coding Automation) to 5 (Full Coding Automation), it offers standardised categories and definitions of Generative AI (GenAI) tool usage among developers.
This framework enables a structured evaluation of the levels of automation in software development.
BlueOptima’s recent report, Autonomous Coding: Are we there yet? offers the first large-scale, empirical analysis of coding automation. The research is based on a sample of over 110,000 developers and more than 82 million code changes across 20 organisations conducting enterprise software development, all tracked by BlueOptima’s objective metrics.
Using the Coding Automation Framework, the report provides data-driven insights into the productivity, maintainability, and security implications of each automation level. This blog delves into the report's findings to help senior decision-makers navigate these trade-offs and work toward a balanced yet strategic adoption of automation.
Results by Developer Role: Tailoring Automation to Specific Needs
The report clearly reveals the prevalence of developers who fall into the lower levels of the Coding Automation Framework. Of those analysed, 93% operated at Level 1 (Basic Code Assistance) or Level 2 (Partial Code Automation). Over 90% of backend, UI, and full-stack developers operated within these levels, using GenAI tools for functions like syntax highlighting, code completion, and basic refactoring but not automating significant portions of their work.
DevOps engineers and data scientists showed a greater tendency to adopt higher levels of automation, with 22% to 27% of developers in these roles operating at Level 3 (Conditional Code Automation) or Level 4 (High-Level Code Automation). This suggests that some roles may be inherently more suited to AI integration.
For senior decision-makers, this analysis suggests that each team’s automation strategy should be tailored to their specific role requirements. Focusing on automation in roles that benefit most while applying caution in areas where creativity and nuanced oversight are essential can help optimise automation’s impact across teams.
Productivity: Realising Gains and Recognising Diminishing Returns
A key motive for adopting automation is productivity. BlueOptima's report measures the productivity impact of each automation level through Billable Coding Effort (BCE/day). Based on 36 individual metrics, this measure represents developer output in terms of the intellectual effort required to deliver code changes. The data shows a clear link between higher automation levels and productivity gains, though these improvements vary by level and may indicate diminishing returns.
Level 1 developers, representing the baseline, achieve stable productivity at 1.54 BCE/day. The jump to Level 2 brings the most substantial increase, with BCE/day rising 42% (to roughly 2.19). This boost highlights the value of basic automation tools in streamlining repetitive tasks for a more efficient workflow.
At Level 3, productivity rises a further 21.8%. While still notable, the lower rate of increase may indicate that, although GenAI tools significantly aid developers at Level 3 in automating more complex and repetitive tasks, gains become less dramatic at higher levels of automation. At Level 4, the most advanced automation level observed in the study, productivity reaches 3.37 BCE/day, another 25.7% increase. However, these gains are tempered by the small number of developers at this level.
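For readers who want to sanity-check these figures, the per-level values follow from compounding the reported percentage increases on the Level 1 baseline. A minimal Python sketch (the baseline and percentages come from the report; small rounding differences against the published 3.37 are expected):

    # Compound the report's stated percentage increases from the
    # Level 1 baseline (1.54 BCE/day) to reproduce each level's figure.
    baseline = 1.54  # Level 1 (Basic Code Assistance)
    increases = {2: 0.42, 3: 0.218, 4: 0.257}  # gain over the previous level

    bce = baseline
    print(f"Level 1: {bce:.2f} BCE/day")
    for level, gain in increases.items():
        bce *= 1 + gain
        print(f"Level {level}: {bce:.2f} BCE/day (+{gain:.1%})")
    # Level 4 lands at ~3.35 BCE/day, matching the report's 3.37 within rounding.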
This suggests that, while automation enhances productivity, the most significant improvements occur at the initial stages, with higher levels requiring greater human involvement to manage complexity and maintain quality.
Quality: Maintaining Code Integrity with Human Oversight
BlueOptima’s report uses Aberrant Coding Effort (Ab.CE) to assess code quality. This metric quantifies deviations from coding best practices by identifying sections of code that are more difficult to maintain. Ab.CE is, therefore, essential in evaluating how different levels of automation impact long-term maintainability.
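BlueOptima's Ab.CE model is proprietary, so the sketch below is purely illustrative: it assumes each code change carries a hypothetical effort value and a flag marking whether it deviates from best practice, then reports the aberrant share of total effort, which is the kind of ratio the percentages in this section describe.

    # Illustrative only: a toy aberrancy-rate calculation. The CodeChange
    # class and effort figures are hypothetical, not BlueOptima's model.
    from dataclasses import dataclass

    @dataclass
    class CodeChange:
        effort: float   # hypothetical effort units for the change
        aberrant: bool  # True if the change deviates from best practice

    def aberrancy_rate(changes: list[CodeChange]) -> float:
        """Share of total effort spent on hard-to-maintain code."""
        total = sum(c.effort for c in changes)
        flagged = sum(c.effort for c in changes if c.aberrant)
        return flagged / total if total else 0.0

    changes = [CodeChange(3.0, False), CodeChange(1.2, True), CodeChange(2.5, False)]
    print(f"Aberrancy: {aberrancy_rate(changes):.2%}")  # Aberrancy: 17.91%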
At Levels 1 and 2, where developers primarily use basic tools or manually write code, aberrancy rates are higher due to the increased likelihood of human error. Without advanced AI to assist with repetitive or complex tasks, developers at these levels often face maintainability issues, with more inconsistencies and inefficiencies introduced into the codebase. This results in higher rates of rework and refactoring, as errors are typically identified later in development.
Level 3 developers achieve the lowest aberrancy (5.83%), as GenAI tools combine with human oversight to produce high-quality, maintainable code. AI assists with substantial code generation, while human review ensures adherence to best practices. By contrast, at Level 4, aberrancy rises slightly to 6.18%. This may represent a tipping point where developers rely excessively on AI, with insufficient oversight. This reliance boosts productivity but can compromise maintainability if GenAI lacks the nuance required for long-term code optimisation.
This data reveals the risks to code maintainability for enterprises considering higher automation levels when human oversight is diminished.
Security: Managing Risks in an AI-Enhanced Environment
Increased automation introduces vulnerabilities, such as embedded secrets in code or outdated third-party packages, that require stringent security controls. BlueOptima’s report assesses security through Secrets Detection, which identifies sensitive information unintentionally included in code. The study finds that as developers adopt higher levels of coding automation, particularly with Generative AI tools at Levels 3 and 4, they encounter increased security risks related to secrets and third-party package vulnerabilities.
When manual coding dominates (Levels 1 and 2), human oversight helps maintain control of security risks; only 0.20% of files contain secrets, and 0.01% have third-party vulnerabilities. This scrutiny during manual coding reduces the chance of security breaches.
However, as automation increases at Levels 3 and 4, risks rise significantly: at Level 3, 0.60% of files contain secrets. At Level 4, 0.90% of files contain secrets and 2.47% contain third-party vulnerabilities, as AI's emphasis on speed potentially allows the introduction of vulnerabilities. Without sufficient oversight and robust security protocols, the risks associated with GenAI-generated code could lead to significant security breaches, particularly as automation becomes more widespread.
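BlueOptima's scanner internals are not public, but secrets detection in general works by pattern-matching source files against known secret formats. A minimal sketch, assuming a handful of illustrative regex patterns (the pattern set below is hypothetical, not BlueOptima's):

    # Minimal secrets-detection sketch; the regex patterns are
    # illustrative examples of common secret formats, nothing more.
    import re

    SECRET_PATTERNS = {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "Generic API key": re.compile(r"(?i)api[_-]?key[\"']?\s*[:=]\s*[\"'][A-Za-z0-9]{20,}[\"']"),
        "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    }

    def scan_for_secrets(source: str) -> list[tuple[str, int]]:
        """Return (pattern name, line number) for each suspected secret."""
        findings = []
        for lineno, line in enumerate(source.splitlines(), start=1):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((name, lineno))
        return findings

    sample = 'config = {"api_key": "abcd1234efgh5678ijkl9012"}'
    print(scan_for_secrets(sample))  # [('Generic API key', 1)]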
Conclusion: Weighing Productivity, Quality, and Security in Automation
BlueOptima’s report highlights that while automation delivers productivity gains, especially at the early levels, it also introduces potential code quality and security trade-offs. Level 3, with conditional AI support and human oversight, emerges as the most effective level for sustaining productivity and code integrity without overextending automation capabilities. For organisations exploring higher levels of automation, striking the right balance is crucial.
In the following article, we will consider practical advice and strategic steps for adopting and scaling coding automation within organisations, based on the report's findings.
To read BlueOptima’s report in full, click here.