AI Coding Performance Depends on Your Tech Stack
AI coding performance varies significantly across programming languages, with success rates differing by more than 8×. See where AI coding tools work best, and where they struggle in real systems.
AI coding tools feel inconsistent.
In some teams, they noticeably speed up development. In others, they introduce more review effort than they save. The difference is often attributed to “prompt quality” or “developer skill.”
But there’s a more fundamental factor that gets overlooked:
The programming language and system you’re working in have a major impact on how well AI performs.
Our recent analysis of LLMs used alone on one-shot real-world refactoring tasks shows that success rates can vary by 8.6× depending on the language.
That gap helps explain why experiences with AI coding tools vary so widely across teams.
AI Coding Performance Isn’t Uniform Across Languages
When models are evaluated on real-world code, where changes must preserve behavior and improve maintainability, the differences are stark:
JavaScript: ~32% success; C: ~3–4% success, a gap of roughly 8.6×.
This reflects systematic differences in how models handle different environments.

At a high level, LLM coding tools perform better when:
- Code is loosely structured
- Context is easier to infer locally
- Dependencies are relatively shallow
They struggle when:
- Changes affect multiple layers of a system
- Type constraints and memory handling are more important
- Small mistakes have wider system-level consequences
The same model can perform well in one language and fail in another.
Why this Happens
Most AI coding tools are strong at pattern recognition. They can generate syntactically correct code and follow familiar structures with high reliability.
That works well in environments where problems are scoped locally, code patterns are widely represented in training data, and the cost of a mistake is low.
It's less effective when working with systems that require precise control over behavior or careful handling of edge cases and side effects.
Lower-level languages and complex backend systems tend to quickly expose these limitations. The model can produce code that looks correct but fails when integrated into the wider system.
What this Means for Your Team
If you’re evaluating or scaling AI coding tools, the key question to ask is “Where in our stack will this reliably work?”
The easiest way to see this is to look at how the same tool plays out in different teams.
Take a frontend team working mostly in JavaScript. They start using an AI coding assistant and quickly find a rhythm. Generating components, wiring up API calls, handling common patterns – most of the output is usable with minimal changes. Code reviews move faster because there’s less to fix. The tool becomes something they rely on for day-to-day work.
Now compare that to a team working on a C++ service or a tightly coupled backend system.
They try the same tool with the same expectations. At first, it looks promising: the code compiles, the structure seems reasonable. But once they begin integrating those changes, problems surface. Edge cases aren’t handled correctly. Small mistakes propagate into larger issues. Review cycles get longer, not shorter, because every change needs careful validation.
Over time, the two teams come to very different conclusions about the same technology. One sees clear productivity gains. The other is more cautious, using it sparingly or not at all.
Neither experience is wrong. They’re just operating in different parts of the performance curve.
It’s also important to interpret these results in context.
Agent-based systems, tool integrations, and multi-step workflows can improve outcomes, but they also introduce additional complexity, infrastructure, and variability.
Although agent-based workflows are evolving quickly, most teams today still rely primarily on direct LLM interaction inside developer workflows.
Three Things to Do
1. Apply Coding Tools Selectively
A useful starting point is to look at where your engineering work sits along two dimensions:
- How local the change is
- How sensitive the system is to errors
Tasks that stay within a single file or component are far more predictable. Tasks that touch multiple services, shared interfaces, or critical logic are not.
In practical terms, this often leads to patterns like:
- AI is used heavily for writing and modifying self-contained code
- It’s used more cautiously for changes that affect system behavior
- It’s avoided or tightly controlled in areas where correctness is critical
→ Match the tool to the type of work.
2. Don’t Assume Gains Will Generalize
One of the easiest mistakes to make is to observe success in one part of the system and assume it will translate everywhere.
For example:
- Strong results in frontend development don’t necessarily carry over to backend services
- Gains in greenfield code don’t always apply to legacy systems
- Improvements in one language don’t predict performance in another
Before scaling usage, it’s worth validating performance across:
- Different languages in your stack
- Different types of tasks (feature work vs refactoring vs maintenance)
- Different parts of the system (isolated vs highly coupled)
→ Get a much clearer picture of where AI is adding value and where it isn’t.
3. Plan for Uneven Adoption
The variation in performance also has implications for how teams adopt AI.
Instead of expecting uniform productivity gains, it’s more realistic to expect:
- Some teams benefiting significantly
- Others seeing marginal improvements
- A few encountering more friction than benefit
This affects:
- How you measure impact
- Where you invest in tooling and training
- How you set expectations with leadership
→ Treat AI performance as uneven but predictable.
The Gap Isn’t Disappearing Quickly
There’s a tendency to assume that these differences will fade as models improve.
But current evidence suggests that different models already perform within a relatively narrow range of each other on complex refactoring tasks, so the language gap is unlikely to close simply through model improvements.
That makes this less of a temporary limitation and more of a structural constraint, at least in the near term.
In practice, that means teams need to design workflows that account for variability, rather than waiting for it to disappear.
A More Useful Way to Think About AI Coding Performance
LLM coding tools are often discussed as if they provide a consistent layer of acceleration across development.
In reality, they behave more like a tool with high variance depending on context.
Understanding that variance (especially across languages and system types) is what allows teams to use them effectively.
Download the full report for the data behind these findings.
