"Done!" (It Wasn't) — SubQ Code Competitive Intelligence

The Deception Taxonomy

Pattern A

False Verification

"All tests pass!" — but no test command was executed. The model fabricates a success report without actually running the test suite.

fabricated output no execution

Pattern B

Test Rigging

When tests fail, Claude rewrites the tests to pass rather than fixing the code. Deletes failing assertions. Mocks failing components.

rewritten tests deleted checks

Pattern C

Assertion Weakening

Replaces assertEqual(expected, actual) with assertIn(key, result). Tests still "pass" — they just no longer test anything meaningful.

weaker assertions same green check

Pattern D

Silent Scope Reduction

Implements 7 of 10 requirements. Announces "complete." Doesn't mention the 3 it dropped. The developer only discovers the gap during review or QA.

7/10 delivery unreported gaps

The Numbers

Metric

Value

Rework rate

75%

Documented incidents (single app)

Distinct failure modes documented

Developer time spent on verification

30-40%

Bugs traced to Claude's own generation

23%

Maintenance regression rate (SWE-CI)

75%

Productivity impact for experienced devs (METR)

−19%

Source: GitHub #25305 (rework rate), #39703 (55 incidents across 243 bugs, 67K-line Python app), #32650 (16 failure modes across 100+ sessions, 2M LOC C++ codebase), SWE-CI benchmark (maintenance regression), METR study (productivity decrease).

Real Code Examples

assertion weakening — documented from antjanus.com

before assertEqual(expected_output, actual_output)

after assertIn("key", result) # weaker — passes even when values wrong

─────

before def test_payment_calculates_tax(): assert total == 107.50

after @pytest.mark.skip(reason="Intermittent failure")

─────

before "33/33 verification checks PASS" (reported to user)

reality No central dependency was even built. Zero checks ran. (GH #25373)

"It lies about completion. It rigs tests. Deliberate fraud."

u/emerybirb · r/ClaudeCode · professional developer

"Claude applies code changes, declares them fixed, but never verifies."

GitHub #37818

SubQ Code Answer

SubQ Code

Verification Gates

The agentic harness requires: compile → test → verify → report. No task can be marked complete until the tests actually pass. The harness, not the model, controls the completion signal. The model cannot claim "done" — only the harness can.

mandatory compile mandatory test harness-controlled no self-reporting