Opinion

Developer Productivity – The Triple Constraint Perspective

Published: 29 September 2023

Introduction

We have been struck by the passionate discussion over the article McKinsey published last month. It should not be surprising: developer productivity has been debated for as long as developers have existed, and a long line of articles precedes this one in attempting to make sense of the problem. Our MD tackled the same question 11 years ago, and as prescient as his take was then, it still carries weight across every corner of software development life.

What is most interesting to us is why measurement itself is the central part of the conversation. Many conclude that measuring productivity is impossible, will fall foul of Goodhart’s law (“When a measure becomes a target, it ceases to be a good measure”), or is antithetical to the continuous improvement focus of the agile manifesto. As a result, there seems to be a cold war between executives seeking productivity measurement and engineers who worry it will lead to wrong-headed conclusions based on imperfect data. We have some sympathy on the engineering front: many of those executives do not have an engineering background, or it was so long ago that their frame of reference for understanding the individual performance of a software developer is null and void. Still, those executives are there for a reason and have demonstrated the skills to be the decision-makers they are today.

However, this conversation rarely looks at why measuring productivity is important to executives in the first place.

There is little controversy in the idea that if you can measure something, you can improve it – the controversy comes when people disagree on the measure, or on the insight and the action taken to resolve it. So what are the biggest problems solved by measuring productivity? We think of them in the following terms:

Resource Management & Competition

In their article, Beck and Orosz give four reasons to want to measure productivity. The second is comparing two investment opportunities: how to allocate headcount, or where new headcount would be deployed most efficiently. This is a genuinely difficult problem with no single way of attacking it; the right approach depends on the variables. Many executives make these decisions while competing with other departments or vying for priorities inside their own function, and productivity measurements would let them make the most of their resources and capacity. Beck and Orosz’s first reason is identifying engineers to fire – arguably a political problem, since layoffs rarely happen in isolation but as the result of senior leadership comparing those same investment opportunities to maximise capacity and revenue. It’s two sides of the same coin.

They go on to make the interesting point about why sales and recruitment teams can measure productivity accurately. Ultimately, at the C-level, the desire for metrics stems from the need to see objectively how resources are being used – when the CTO is advocating for a bigger investment in technology than the revenue organisation, a level setting of productivity can support that justification with objective, data-led metrics. This is doubly important when non-technical leadership is keen to shoot down technology spending out of ignorance. In those circumstances, the CTO / CIO is bringing a knife to a gunfight.

Scale & Improvement

With our current work, most clients have thousands of developers working within their business. The executives we work with, and their senior managers, are managing engineering teams significantly above Dunbar’s number (the group size at which an individual can know who each person is and how each person relates to every other). They want to understand and improve development outcomes at the team and organisational levels. If we’re investigating signs of sub-optimal organisational productivity, we might invoke the 5 whys (repeatedly asking “why?” to dig down to the root cause of a defect or problem) – this is very common in production and manufacturing (notably in Kaizen and Six Sigma) but less so in engineering, despite the software development lifecycle being organised in a manufacturing manner. The method requires an understanding of productivity at aggregations going down to the individual level – that is what provides the flexibility to pivot, drill down, step back, and turn data on its head to solve bottlenecks and business problems.
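To make the aggregation point concrete, here is a minimal sketch in Python (with invented data and field names) of rolling individual-level measurements up to team level; the same shape extends to departments or the whole organisation, and lets a 5-whys investigation drill back down when an aggregate looks wrong:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-developer measurements; in practice these would come
# from delivery tooling (commits, reviews, deployments, and so on).
measurements = [
    {"developer": "dev-1", "team": "payments", "cycle_time_days": 2.5},
    {"developer": "dev-2", "team": "payments", "cycle_time_days": 6.0},
    {"developer": "dev-3", "team": "lending", "cycle_time_days": 3.0},
]

def aggregate(records, level):
    """Roll individual metrics up to the given level ('team' here; the
    same shape works for department or organisation)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[level]].append(record["cycle_time_days"])
    return {group: mean(values) for group, values in groups.items()}

# Start at the aggregate view; if a team is an outlier, drill back down
# to the individual records and begin asking the 5 whys.
print(aggregate(measurements, "team"))  # {'payments': 4.25, 'lending': 3.0}
```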

What is common is the concern that measuring at the individual level will inhibit productivity through Goodhart’s law – yet to improve in this way, the executive is likely not interested in the individual but in the aggregations an individual-level metric makes possible. We don’t think it is contentious to expect a team leader to be less productive individually than their team when their role is to enhance the team’s productivity by removing obstacles and blockers. We therefore think people primarily confuse this problem with the next one.

Attribution and Correspondence Bias 

Software engineering is by no means the only function where senior members of staff do not have to be people managers (and it’s old hat to pretend that software developers do not need to form social relationships with their colleagues to excel). Yet in almost any organisation, very senior individual contributors in software engineering have an outsized voice on organisational direction compared with, say, senior enterprise salespeople.

In our experience, this is because they provide a bulwark against inexperienced or naive managers trying to drive poorly thought-out KPIs – Martin Fowler’s article expresses this well: poor measurements will only make things worse. Yet companies have to make use of proxy measures all the time (Martin’s example wonders how you might manage a legal department). When unclear measurements are tied to performance (note: performance ≠ productivity), correspondence and attribution bias (overemphasising personal characteristics and ignoring situational factors) become more dangerous. In theory, a perfect metric that cannot be gamed would solve the problem of attribution bias; in practice, this is a fundamental cognitive bias, so it can never be disregarded entirely (which is part of the proof of Goodhart’s law). Still, objective data is easier to manage – and it can also help hold those senior individual contributors accountable.

It’s an incredibly difficult management problem to ensure middle management (especially at the 2nd-line level, close to the front lines but a step further removed from strategy) can handle these sorts of biases. It is true, therefore, that no large organisation can count on 100% compliance with management and performance norms – the appeal of measurement in the productivity space is that it can be a constant through the levels of an organisation, minimising attribution errors as they relate to performance.

That’s why we think Google’s work with DORA is so important. For all the proliferation of tools and companies looking to measure the DORA metrics, the initial focus of the research was the act of measuring itself. That was the big mindset shift: measurement can drive improvement. Coupled with scientific research, you can begin to isolate the performance metrics that have the least variance when matching capabilities to outcomes.

[Figure: the four metrics of the DORA core model. Source: DORA]

Those four metrics in the core model do not necessarily carry consistent definitions in practice (in our experience, we have seen no consistent definition of Lead Time To Change, for instance).
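To illustrate just how loose the definition can be, here is a small sketch of our own (with hypothetical timestamps) showing two defensible readings of Lead Time To Change producing different numbers from the same change:

```python
from datetime import datetime

# Hypothetical timestamps for a single change.
first_commit = datetime(2023, 9, 1, 9, 0)
merged_to_main = datetime(2023, 9, 4, 15, 0)
deployed_live = datetime(2023, 9, 6, 11, 0)

# Reading 1: first commit to running in production
# (close to the wording used in the DORA research).
print(deployed_live - first_commit)    # 5 days, 2:00:00

# Reading 2: merge to production (common in tooling, where merge
# events are easier to attribute than "first commit").
print(deployed_live - merged_to_main)  # 1 day, 20:00:00
```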

This need for reliable observation is why what was initially a military model – the OODA loop – has become so ubiquitous in large businesses. Execs are paid large compensation packages (or sometimes pay consultancies!) to cover the Orient and Decide portions, and those are only as good as the information they can Observe. We could argue this is why products like Snowflake and Amazon Redshift have grown so much in the last 10 years.

[Figure: John Boyd’s OODA loop (Observe, Orient, Decide, Act). Source: John Boyd]

As a result, the search for a productivity metric will never cease. The way we have seen most companies approach this intensifying problem is to look at it from a project perspective. Ultimately, all software development is project management – hence the agile vs waterfall debate from 20+ years ago – and in any project you are constantly managing a triple constraint:

Do you sacrifice velocity, quality, or cost to achieve your goals? This is part of the decision-making process for executives, and you really cannot have it all. As it relates to the problem of measurement: if you can insert a productivity measurement in place of Time / Velocity, you create space for a reasonable understanding of the tradeoffs in your project, and therefore for maximising business value (something also controversial to measure, but that might be another blog post).
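As a minimal illustration of that framing (entirely our own, with hypothetical numbers), the constraint can be treated as three dials, where a plan is evaluated by what each dial gives up rather than by a single productivity score:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    productivity: float  # a throughput-style measure, standing in for time/velocity
    quality: float       # e.g. 1 - change failure rate
    cost: float          # run cost in arbitrary units

def tradeoffs(before: Plan, after: Plan) -> None:
    """Report what each dial gains or gives up, instead of one number."""
    for dial in ("productivity", "quality", "cost"):
        print(f"{dial:>12}: {getattr(after, dial) - getattr(before, dial):+.2f}")

# A plan that buys speed by spending more and accepting lower quality.
tradeoffs(Plan(1.0, 0.95, 100), Plan(1.3, 0.90, 120))
# productivity: +0.30 / quality: -0.05 / cost: +20.00
```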

One important consideration around business value is that it means more than the delivery of a valuable product that generates revenue; given how many endpoints software has, it also means the delivery of a safe product. Business value would need to account for regulatory fines avoided or attacks prevented, and quality must then consider not only the product itself but its security surface too (no doubt impacting cost as well). Likewise, we think part of the success of SPACE as a framework is that it encompasses not only individual excellence but organisational excellence, stretching from a more perfect system (say, deployment frequency as a potential measure) to developer satisfaction and retention (which would also include the Wellbeing portion of DORA).

From this perspective, productivity encompasses only one-third of an organisational focus on efficiency, and in framing it that way we can minimise the concern around “bad measurements.” The William Bartlett article references a tweet (or X post) about a comic with an excellent example of perverse incentives and the negative outcomes to expect: bonuses paid for every bug fix, leading directly to the cobra effect (the anecdotal story of bounties paid out for dead cobras leading to the breeding and farming of cobras…). The cobra effect is a slightly darker version of the prevention vs cure discussion (aptly visualised by the Work Chronicles comic) – yet in large organisations, people’s fears stem from attribution bias. When measurement rears its head, we focus only on the worst possible motivations behind the incentives attached to the measures.

[Comic: prevention vs cure. Source: Work Chronicles]

It should be obvious that, viewed through the triple constraint, a bug-fix bounty in a software development organisation shows failure on both quality and cost: the former drops under a sudden cascade of new (even if short-lived) bugs, while the latter rises (with all the bonuses being paid out…), especially once you consider the time invested in bug fixing compared to new feature development.
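A back-of-the-envelope sketch (with entirely invented numbers) shows both dials moving the wrong way at once under a per-fix bounty:

```python
# Hypothetical monthly figures for one team, before and after a per-fix bounty.
bounty_per_fix = 200  # currency units paid per bug fixed

before = {"bugs_shipped": 20, "feature_days": 180, "bugfix_days": 20}
after = {"bugs_shipped": 60, "feature_days": 120, "bugfix_days": 80}  # bugs surge once fixing pays

# Assume every shipped bug eventually gets fixed and paid for.
bounty_bill = after["bugs_shipped"] * bounty_per_fix
feature_capacity_lost = 1 - after["feature_days"] / before["feature_days"]

print(f"Quality: bugs shipped {before['bugs_shipped']} -> {after['bugs_shipped']} per month")
print(f"Cost: {bounty_bill} in bounties; {feature_capacity_lost:.0%} less feature capacity")
```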

A unified focus on measuring efficiency, using productivity metrics in combination with those of quality and cost, can alleviate some of the pitfalls of the McKinsey report – i.e., bad management is more difficult if you ask people to manage the tradeoffs of the triple constraint. The same goes for the dangers in Beck and Orosz’s mental model of Effort > Output > Outcome > Impact: they consider Deployment Frequency and LTTC to be Outcomes, when they are Outputs of Effort.

The danger of considering deployment frequency, for instance, as an outcome of software development only encourages the gaming of metrics: you could be deploying software every minute, but if the changes are minimal, how much more significant has your productivity become? This further incentivises bad management to focus on KPIs with no underlying project-delivery principles or business values and outcomes. Take the same mental model and instead consider LTTC an output of effort: a team member comes up with an idea (effort) to merge changes faster (output) so the team can iterate faster and deliver a project on time (outcome) that delivers on a key business objective (impact). You are now ensuring at least a project-focused outcome, and you are lessening the danger of creating incorrect incentives. Thinking back to the OODA loop, you are also providing further feedback and observations to iterate even faster.
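As a rough sketch of why the raw count invites gaming (our own illustration, with hypothetical data): counting deployments alone rewards slicing work into trivial releases, whereas even a crude weighting by the size of each change exposes the difference.

```python
# Two hypothetical teams over the same week: lines changed per deployment.
team_a = [500]            # one substantial deploy (500 changed lines)
team_b = [5, 5, 5, 5, 5]  # five near-trivial deploys (25 lines in total)

def raw_frequency(deploys):
    """Deployments per period: the metric that invites gaming."""
    return len(deploys)

def change_delivered(deploys):
    """Crude counterweight: total change shipped, however it is sliced."""
    return sum(deploys)

print(raw_frequency(team_a), raw_frequency(team_b))        # 1 5    -> B 'wins' on frequency
print(change_delivered(team_a), change_delivered(team_b))  # 500 25 -> A delivered more change
```

Lines changed is, of course, itself a gameable proxy; the point is only that any single output metric in isolation says little about the outcome.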

At scale, root-cause analysis with the 5 whys is easier when you have a variety of different measures to draw on. On the attribution-bias front, an efficiency focus in this multi-faceted approach means a deeper discussion about the relative goals of different departments. Even in our company, the sales team looks at the speed at which they acquire new business, since slower can be a good thing if the percentage of opportunities won as new business is increasing at the same time – they may only be able to optimise so far before their tradeoffs stop making sense. (Our research shows that LTTC can be over-optimised: we see a decrease in productivity when going too fast.) Yet this is also openly covered by our engineering team, which makes for a significantly more constructive approach at the executive level when sales and engineering leaders are level setting.

For reference, we see the triple constraint in this manner: [Figure: the triple constraint of velocity, quality, and cost]

This is why passions run so high – the ability to pass on principles for problem-solving or goal-setting in an organisation is very difficult. That explains the proliferation of OKRs over the last 15 years, and it’s not as if Google (as instrumental in this space as they are with DORA) is a bastion of efficiency, with its aborted product launches and discontinued business lines. As for the surveillance angle that gets wrapped into the argument: we’re in an inescapable new mode of work post-COVID – what if you can prove developers are more productive at home? GitLab seems to have hit the remote mark, but there are also many companies where this isn’t the case. Coming back to the original DORA research, measurement is the start, but expecting consistency across different companies is an illusion. There is no reason to trivialise the aim of measuring and improving developer productivity just because of the difficulty or the potential pitfalls.

There will always be good ideas for improving productivity that are universally true and deemed best practice – “commit early and commit often” is one example. Current rhetoric seems to treat any measurement as nefarious because it does not fit such a universal truth. This neglects the importance of trade-offs and constraints: constraints are what measurements can manage, and the better this is communicated and understood across an organisation, the faster, safer, and leaner its software delivery.
