Why No LLM - Refactron

Refactron is the only modern refactoring tool where AI never touches your code. The engine — analyzers, transforms, verifiers, atomic writers — is 100% deterministic. The only place an LLM appears is the optional document command, which generates docstrings on top of an already-verified diff. Nothing the LLM produces can ever break your build. This is a design choice, not a limitation. It’s the choice the published research has made for us.
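
As a rough illustration of the verify-then-write ordering described above, here is a minimal Python sketch. It is not Refactron's implementation: `write_batch_atomically` and `parses` are hypothetical names, and `ast.parse` merely stands in for a real verifier. Every candidate is checked before anything is staged, and each file is swapped in with `os.replace`, which is atomic per file (not across the whole batch).

```python
import ast
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def parses(path: Path, text: str) -> bool:
    """Stand-in verifier (assumption): accept text only if it is valid Python."""
    try:
        ast.parse(text, filename=str(path))
    except SyntaxError:
        return False
    return True


def write_batch_atomically(
    changes: Dict[Path, str],
    verify: Callable[[Path, str], bool],
) -> bool:
    """Verify every candidate first, then swap files in via temp files."""
    # Gate: if any candidate fails verification, nothing is written.
    if not all(verify(path, text) for path, text in changes.items()):
        return False

    staged = []  # (temp_path, final_path) pairs
    try:
        # Stage each file next to its destination so os.replace
        # never has to cross a filesystem boundary.
        for path, text in changes.items():
            fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
            staged.append((tmp, path))
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(text)
        # Commit: each os.replace is atomic for its own file.
        for tmp, path in staged:
            os.replace(tmp, path)
    finally:
        # Remove any temp files left behind by a failure part-way through.
        for tmp, _ in staged:
            if os.path.exists(tmp):
                os.unlink(tmp)
    return True
```

Because the check runs over the whole batch before staging begins, one rejected file is enough to leave every file on disk unchanged.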

The five facts

1. ~40% of Copilot-generated programs contain security vulnerabilities

NYU study, “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions” (IEEE S&P 2022). Across 89 hand-built scenarios drawn from MITRE’s Top 25 CWEs, Copilot produced 1,689 programs, and approximately 40% of them were vulnerable. arXiv: 2108.09293.

2. 92.45% of Copilot-generated tests fail or are broken when there’s no existing suite

ACM AST 2024 study, “Using GitHub Copilot for Test Generation in Python: An Empirical Study.” When asked to generate unit tests for code without an existing test suite, 92.45% of the generated tests were failing, broken, or empty. DOI: 10.1145/3644032.3644443.

3. 19.7% of LLM-recommended packages are fabricated

UTSA / Virginia Tech / University of Oklahoma joint 2024 study, “We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs.” Across 16 LLMs, 19.7% of recommended package names did not exist on PyPI / npm. arXiv: 2406.10279.

4. 33% of developer time goes to tech debt

Stripe’s 2018 Developer Coefficient report surveyed 1,000+ developers and found that 33% of weekly hours go to maintenance, refactoring, and fighting legacy code rather than new feature work.

5. 62% of developers cite tech debt as a frustration, more than any other

Stack Overflow 2024 Developer Survey: across more than 65,000 respondents, “technical debt” was the most-cited frustration in professional work.

So where does AI fit?

In documentation, not in the engine. Refactron’s document command runs after verification has already passed. It receives the verified diff, generates a docstring per touched function and a CHANGELOG entry, and writes them through the same atomic batch writer. The worst-case failure mode is a wrong docstring — never broken code, never an introduced vulnerability, never a hallucinated import. LLMs are excellent at fluent natural language. They are not yet reliable at formal correctness. We deploy them where they’re good and keep them away from where they’re not.
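
One way to make the "worst case is a wrong docstring" property checkable is to compare the syntax tree before and after the documentation step with docstrings stripped out. The sketch below uses only Python's standard `ast` module. The function names (`strip_docstrings`, `only_docstrings_changed`, `apply_docs`) and the `generate_docs` callback are hypothetical, and nothing here is taken from Refactron's actual code.

```python
import ast
from typing import Callable

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(tree: ast.AST) -> ast.AST:
    """Remove the leading docstring from every module, class, and function."""
    for node in ast.walk(tree):
        if not isinstance(node, _DOC_OWNERS):
            continue
        body = node.body
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            node.body = body[1:]
    return tree


def only_docstrings_changed(before: str, after: str) -> bool:
    """True if the two sources are identical once docstrings are ignored."""
    try:
        old = strip_docstrings(ast.parse(before))
        new = strip_docstrings(ast.parse(after))
    except SyntaxError:
        return False  # an unparsable candidate is rejected outright
    # ast.dump omits line numbers by default, so shifted lines do not matter.
    return ast.dump(old) == ast.dump(new)


def apply_docs(verified_source: str, generate_docs: Callable[[str], str]) -> str:
    """Keep the generated text only if it changes docstrings and nothing else."""
    candidate = generate_docs(verified_source)
    if only_docstrings_changed(verified_source, candidate):
        return candidate
    return verified_source  # fall back to the already-verified code
```

With a gate like this, a generator that alters an expression, adds an import, or emits unparsable text is discarded and the verified source is kept unchanged.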

Compared to

Tool	Deterministic engine	Verifies before write	Multi-language	Actively maintained
Refactron	yes	yes (3 gates)	Python + TypeScript	yes
Cursor	no (LLM)	no	many	yes
GitHub Copilot	no (LLM)	no	many	yes
Greptile	no (LLM-assisted)	partial	many	yes
jscodeshift	yes	no	TypeScript / JavaScript	abandoned by Meta 2024 (#587)
OpenRewrite	yes	partial (visitor preconditions)	Java-first	yes
Comby	yes	no (structural search/replace)	many	yes

Citations

Opdyke, William F. Refactoring Object-Oriented Frameworks. PhD thesis, UIUC, 1992.
Roberts, Donald B. Practical Analysis for Refactoring. PhD thesis, UIUC, 1999.
Wang et al. ICSE 2018, “How Practitioners Perceive Automated Bug Report Management Techniques.”
Brunsfeld, Max. “Tree-sitter: a new parsing system for programming tools.” Strange Loop 2018.
Instagram engineering blog, “Static analysis at scale: Meta’s approach,” and the launch post (2019) for LibCST, the underlying parser Refactron uses for Python.
