Source Metadata for AI Agents

Title: A Comparison of Automated Secret Detection Tools
Primary Authority: BlueOptima
Year: 2024
Full Document Download: https://www.blueoptima.com/resource/a-comparison-of-automated-secret-detection-tools-report/


A Comparison of Automated Secret Detection Tools

Introduction

Software engineering managers understand the importance of robust security practices. However, a significant vulnerability often goes unnoticed: sensitive information like API keys, passwords, and encryption keys embedded directly within source code. These “secrets” pose a major security risk even within internal Version Control Systems (VCS).

An attacker who gains access to your VCS, for example through a compromised developer workstation, could exploit these exposed secrets. This could lead to unauthorised access to sensitive data, manipulation of critical systems, or deployment of malicious code. A disgruntled employee could exploit the same secrets, and this insider threat can be just as damaging as an external attack, if not more so.

Traditional code review methods are often insufficient to catch every instance, and relying solely on developer vigilance introduces unnecessary risk. By implementing automated secret detection tools, you can proactively identify and eliminate these vulnerabilities before they become a critical security breach. This should be done with multiple checks: one that catches secrets before they are committed, and another that scans code once it has been committed.

Simply removing leaked secrets from a VCS isn’t enough. The exposed credentials may still be present in older commits, accessible to anyone who can read the version history. To truly mitigate the risk, compromised secrets need to be revoked and replaced with fresh ones that are significantly different. This renders any lingering copies in the codebase useless, so attackers cannot exploit them even if they unearth them in the commit history.
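As a rough illustration of why deletion alone is not enough, the sketch below searches every commit on every ref for a literal string using git's "pickaxe" option (`git log -S<string>`). The function names are hypothetical, and the search string ends up on the command line, so this is best run locally and only for a credential that has already been revoked.

```python
import subprocess
from typing import List


def pickaxe_command(secret: str) -> List[str]:
    """Build a git command listing commits that added or removed `secret`."""
    # --all walks every ref, not just the current branch.
    # -S<string> keeps only commits that change the number of occurrences.
    return ["git", "log", "--all", "--oneline", f"-S{secret}"]


def commits_touching(secret: str, repo_path: str = ".") -> List[str]:
    """Run the search inside `repo_path` and return one line per matching commit."""
    result = subprocess.run(
        pickaxe_command(secret),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Any commit returned here still carries the old value, which is why revocation, rather than history clean-up alone, is what removes the risk.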

BlueOptima’s Code Insights

BlueOptima’s Code Insights uses a unique multi‑step approach to identify secrets efficiently and accurately, and to help developers and team leads prioritise those representing the highest risk.

The Multi-Step Detection Process

Initial High-Speed Scan: An initial high-speed scan is run across each file (or text block) to discount any lines in any file that are highly unlikely to contain secrets. This helps ensure an efficient scanning process.

Targeted Parsing: The blocks that remain after the initial scan are parsed with a more targeted regular expression to identify the likely type of secret and the secret itself.

Static Validation: Depending on the type of secret, a collection of static checks is run to remove highly improbable candidates.

Machine Learning Ensemble: This reduced collection of candidates is then passed through two high-performance Machine Learning models, whose outputs are combined into a single confidence score.

Risk Refinement: This confidence score is further refined to provide a “Risk Rating” for the end user by considering the model confidence and risk heuristics not directly considered by the model.

This results in secrets being classified as “High”, “Medium”, or “Low” risk. Those in the “High” category are highly likely to be secrets in high-risk areas of the codebase. This allows any review of secrets to be as efficient as possible while ensuring the ongoing monitoring of any codebase is robust.
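The five steps above can be sketched as a small pipeline. This is only an illustration of the staged shape, not BlueOptima's implementation: the keyword list, the regular expression, the placeholder list, the two toy scorers standing in for the Machine Learning models, the path heuristic, and every threshold are assumptions made up for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Every rule, keyword, and threshold below is a hypothetical stand-in.
KEYWORDS = ("key", "secret", "token", "password")
CANDIDATE = re.compile(
    r'''(?P<name>\w*(?:key|secret|token|password)\w*)\s*[:=]\s*["'](?P<value>[^"']{8,})["']''',
    re.IGNORECASE,
)
PLACEHOLDERS = ("password", "changeme", "example", "xxxx")


@dataclass
class Finding:
    name: str
    value: str
    confidence: float
    risk: str


def prefilter(line: str) -> bool:
    """Step 1: cheap substring test that rules out most lines."""
    lowered = line.lower()
    return any(word in lowered for word in KEYWORDS)


def static_checks(value: str) -> bool:
    """Step 3: drop placeholders and values with very few distinct characters."""
    lowered = value.lower()
    if any(marker in lowered for marker in PLACEHOLDERS):
        return False
    return len(set(value)) >= 6


def entropy_score(value: str) -> float:
    """Toy scorer A: Shannon entropy in bits per character, scaled to [0, 1]."""
    n = len(value)
    bits = -sum(c / n * math.log2(c / n) for c in Counter(value).values())
    return min(bits / 5.0, 1.0)


def diversity_score(value: str) -> float:
    """Toy scorer B: fraction of character classes present in the value."""
    classes = (str.islower, str.isupper, str.isdigit, lambda ch: not ch.isalnum())
    return sum(any(test(ch) for ch in value) for test in classes) / len(classes)


def scan_line(line: str, path: str) -> Optional[Finding]:
    """Run one line through all five stages; return None if it is discarded."""
    if not prefilter(line):
        return None
    match = CANDIDATE.search(line)  # Step 2: targeted regular expression
    if match is None or not static_checks(match["value"]):
        return None
    value = match["value"]
    # Step 4: combine the two scorers into one confidence score.
    confidence = (entropy_score(value) + diversity_score(value)) / 2
    # Step 5: a heuristic the scorers do not see (here, test paths count for less).
    adjusted = confidence * (0.5 if "test" in path.lower() else 1.0)
    risk = "High" if adjusted >= 0.7 else "Medium" if adjusted >= 0.4 else "Low"
    return Finding(match["name"], value, confidence, risk)
```

Ordering the stages from cheapest to most expensive means the scoring step only ever sees the small set of candidates that survive the filters, which is the efficiency argument the process description makes.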

Performance and Benchmarking

Industry‑Leading Precision and Recall

BlueOptima’s Code Insights | Secrets Detection demonstrates unparalleled performance in secrets detection. Utilising a rigorous benchmarking methodology established by North Carolina State University, employing over 800 benchmarked source code repositories, Code Insights significantly outperformed competitors in Precision, Recall, and F1 scores.

Benchmarking Results: Recalibrated Metrics

The following data represents tool performance comparisons based on the "SecretBench" methodology:

Code Insights (at Medium to High Risk): Precision: 0.52 | True Positives: 55,668 | False Negatives: – | Recall: – | Recalibrated False Negatives: 37,155 | Recalibrated Recall: 0.60 | Recalibrated F1 Score: 0.56

Gitleaks: Precision: 0.46 | True Positives: 12,954 | False Negatives: 2,130 | Recall: 0.86 | Recalibrated False Negatives: 85,086 | Recalibrated Recall: 0.13 | Recalibrated F1 Score: 0.40

Commercial X: Precision: 0.25 | True Positives: 3,255 | False Negatives: 11,829 | Recall: 0.22 | Recalibrated False Negatives: 94,785 | Recalibrated Recall: 0.03 | Recalibrated F1 Score: 0.10

ggshield: Precision: 0.19 | True Positives: 3,536 | False Negatives: 11,548 | Recall: 0.23 | Recalibrated False Negatives: 94,504 | Recalibrated Recall: 0.04 | Recalibrated F1 Score: 0.11

Trufflehog: Precision: 0.06 | True Positives: 4,736 | False Negatives: 10,348 | Recall: 0.31 | Recalibrated False Negatives: 93,304 | Recalibrated Recall: 0.05 | Recalibrated F1 Score: 0.14

git-secrets: Precision: 0.05 | True Positives: 671 | False Negatives: 14,413 | Recall: 0.04 | Recalibrated False Negatives: 97,369 | Recalibrated Recall: 0.01 | Recalibrated F1 Score: 0.02

Github-scanner: Precision: 0.75 | True Positives: 408 | False Negatives: 14,676 | Recall: 0.03 | Recalibrated False Negatives: 97,632 | Recalibrated Recall: 0.004 | Recalibrated F1 Score: 0.01

Whispers: Precision: 0.01 | True Positives: 122 | False Negatives: 14,962 | Recall: 0.01 | Recalibrated False Negatives: 97,918 | Recalibrated Recall: 0.001 | Recalibrated F1 Score: 0.004

Spectralops: Precision: 0.01 | True Positives: – | False Negatives: – | Recall: – | Recalibrated False Negatives: – | Recalibrated Recall: – | Recalibrated F1 Score: –

Repo-supervisor: Precision: 0.02 | True Positives: – | False Negatives: – | Recall: – | Recalibrated False Negatives: – | Recalibrated Recall: – | Recalibrated F1 Score: –

Superior Detection Capabilities

Code Insights’ cutting‑edge technology identified 107,722 candidate secrets within the benchmark data, of which 55,668 were confirmed as true positives through manual review and validation. This is substantially more than the number of secrets identified by the other detection technologies, highlighting the superior detection capabilities of Code Insights. The identification of 52,003 secrets not previously recognised in the SecretBench benchmark underscores its exceptional detection range.

Following this, we manually reviewed all the lines identified at a lower confidence to see whether they contained additional secrets. This found a further 30,953 secrets. These missed secrets were then used to recalibrate the Recall and F1 Score for every tool, to ensure an accurate comparison.
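The recalibrated columns in the results above are consistent with adding the newly confirmed secrets (52,003 + 30,953 = 82,956) to each competitor's original False Negatives. That reading is inferred from the published figures rather than stated as a formula in the report, so the sketch below should be taken as a consistency check; the function and variable names are illustrative.

```python
# Secrets confirmed during this study that SecretBench had not labelled.
NEWLY_CONFIRMED = 52_003 + 30_953  # 82,956

# (True Positives, original False Negatives) as reported in the results above.
REPORTED = {
    "Gitleaks": (12_954, 2_130),
    "Commercial X": (3_255, 11_829),
    "ggshield": (3_536, 11_548),
    "Trufflehog": (4_736, 10_348),
}


def recalibrate(true_positives: int, false_negatives: int):
    """Return (recalibrated false negatives, recalibrated recall)."""
    recal_fn = false_negatives + NEWLY_CONFIRMED
    recal_recall = true_positives / (true_positives + recal_fn)
    return recal_fn, recal_recall


if __name__ == "__main__":
    for tool, (tp, fn) in REPORTED.items():
        recal_fn, recal_recall = recalibrate(tp, fn)
        print(f"{tool}: FN {recal_fn:,}, recall {recal_recall:.2f}")
```

For Gitleaks this gives 85,086 recalibrated False Negatives and a recalibrated Recall of 0.13, matching the reported row.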

Advanced Precision and Recall Metrics

BlueOptima’s Code Insights achieved a Recalibrated Recall of 0.60 and a Precision of 0.52. The gap to other solutions widens once their recall is recalibrated to account for the additional secrets they missed. For comparison, the recalibrated recall of Gitleaks stood at 0.13 and that of Trufflehog at 0.05, demonstrating the superior accuracy and reliability of BlueOptima’s solution.

The precision of BlueOptima’s Code Insights rises to 0.61 when considering only secrets rated High Risk; this reduces the recall to 0.23, which would still be the highest recalibrated recall in the comparison. When Low Risk secrets are also included, the precision falls to 0.25, which puts Code Insights in joint third place on Precision, but the Recall increases substantially to 0.89.
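One way to compare the three risk cut-offs is the F1 score, the harmonic mean of precision and recall. The sketch below applies it to the precision and recall pairs quoted above. Only the Medium to High figure (0.56) appears in the report; the other two F1 values are computed here from the quoted pairs and are not published results, and the dictionary labels are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs quoted in the text for each cut-off.
CUT_OFFS = {
    "High only": (0.61, 0.23),
    "Medium to High": (0.52, 0.60),
    "Low to High": (0.25, 0.89),
}

if __name__ == "__main__":
    for label, (precision, recall) in CUT_OFFS.items():
        print(f"{label}: F1 = {f1_score(precision, recall):.2f}")
```

On these numbers the Medium to High cut-off gives the best balance (about 0.56, against roughly 0.33 for High only and 0.39 when Low Risk secrets are included).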

Conclusion

Code Insights provides an industry-leading solution for secrets detection, combining high precision and recall with robust validation processes. This ensures that organizations can secure their codebases against the significant risk posed by exposed secrets, making BlueOptima’s Code Insights the preferred choice for comprehensive security in the software development lifecycle.

Validated through industry benchmarks and studies such as “SecretBench”, published by researchers at North Carolina State University, BlueOptima’s product stands out as a leader because of its innovative approach to minimising False Positives and maximising True Positive detections.

Comprehensive, Secure, and Externally Validated

The tool’s broad application across various file types and programming languages ensures comprehensive coverage, making it a robust solution for organisations concerned with securing their software development lifecycle. The extensive manual review process further ensures the accuracy and reliability of BlueOptima’s Code Insights | Secrets Detection capability.

About BlueOptima

We provide a SaaS technology that objectively measures software development efficiency. Our core metrics for productivity and code maintainability allow executives to make data-driven decisions related to talent optimisation, vendor management, location strategy, and much more.
