Code Coverage Agent
The code coverage agent autonomously increases JaCoCo test coverage by analyzing code, generating comprehensive tests, and validating Spring OSS best practices.
1. Real Results
We tested the agent on Spring’s gs-rest-service tutorial (a simple REST API):
| Metric | Result |
|---|---|
| Baseline Coverage | 0% (no tests initially) |
| Final Coverage | 71.4% line coverage, 87.5% instruction coverage |
| Target | 20% (exceeded by 3.5x) |
| Tests Generated | 8 comprehensive test methods |
| Repository | |
2. Claude vs Gemini: Best Practices Adherence
Both models achieved 71.4% coverage, but Claude followed ALL Spring WebMVC best practices while Gemini did not—despite receiving identical prompts.
| Practice | Claude | Gemini | Why It Matters |
|---|---|---|---|
| @WebMvcTest | ✅ | ❌ @SpringBootTest | 10x faster startup, loads only web layer instead of entire application context |
| jsonPath() | ✅ | ❌ ObjectMapper | Cleaner API, less boilerplate, better readability |
| AssertJ | ✅ | ✅ | Both used fluent assertions correctly |
| BDD naming | ✅ | ❌ | Tests read like specifications |
| Edge cases | ✅ | ✅ | Both tested empty strings, special characters, Unicode, long inputs |
2.1. Generated Test Code (Claude)
Claude generated production-quality tests following Spring conventions:
```java
@WebMvcTest(GreetingController.class) // (1)
public class GreetingControllerTests {

    @Autowired
    private MockMvc mockMvc;

    @Test
    public void greetingShouldReturnDefaultMessageWhenNoParameterProvided() throws Exception { // (2)
        mockMvc.perform(get("/greeting"))
            .andExpect(status().isOk())
            .andExpect(content().contentType(MediaType.APPLICATION_JSON))
            .andExpect(jsonPath("$.content").value("Hello, World!")) // (3)
            .andExpect(jsonPath("$.id").isNumber());
    }

    @Test
    public void greetingShouldHandleSpecialCharactersInName() throws Exception {
        mockMvc.perform(get("/greeting").param("name", "José & María"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$.content").value("Hello, José & María!"));
    }

    @Test
    public void greetingShouldHandleUnicodeCharactersInName() throws Exception { // (4)
        mockMvc.perform(get("/greeting").param("name", "世界"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$.content").value("Hello, 世界!"));
    }

    // ... 5 more comprehensive tests
}
```
| 1 | @WebMvcTest - Loads only web layer (fast, focused) |
| 2 | BDD naming - Test name describes behavior clearly |
| 3 | jsonPath() - Clean JSON validation without manual parsing |
| 4 | Edge cases - Unicode, special characters, long inputs |
Full test files for transparency (all files are the actual, unmodified output from the agents, showing exactly what was generated):

Claude Code (1 test file):

- GreetingControllerTests.java - 8 tests, perfect Spring conventions (@WebMvcTest, jsonPath(), AssertJ, BDD naming)

Gemini (2 test files):

- GreetingControllerTests.java - 6 tests, @SpringBootTest usage
- GreetingTests.java - 2 tests for the Greeting record
3. How It Works: Two-Phase Architecture
The code coverage agent uses a setup/execute lifecycle for complex workflows requiring workspace preparation:
3.1. Phase 1: Setup
The setup phase prepares the workspace and validates preconditions:
```java
@Override
public SetupContext setup(LauncherSpec spec) throws Exception {
    // 1. Clone git repository
    syncVendir(spec.cwd(), gitUrl, gitRef, gitSubdir);

    // 2. Verify code compiles (FAIL FAST)
    BuildResult compileResult = MavenBuildRunner.runBuild(workspace, 5, "clean", "compile");
    if (!compileResult.success()) {
        return SetupContext.builder()
            .workspace(workspace)
            .successful(false)
            .error("Code does not compile")
            .build();
    }

    // 3. Run existing tests (FAIL FAST)
    TestRunResult testResult = MavenTestRunner.runTests(workspace, 5);
    if (!testResult.passed()) {
        return SetupContext.builder()
            .workspace(workspace)
            .successful(false)
            .error("Existing tests fail")
            .build();
    }

    // 4. Measure baseline coverage
    CoverageMetrics baseline = tryMeasureBaseline(workspace);

    return SetupContext.builder()
        .workspace(workspace)
        .successful(true)
        .metadata("baseline_coverage", baseline)
        .metadata("has_jacoco", baseline.lineCoverage() > 0)
        .build();
}
```
Setup responsibilities:

- Workspace preparation - Clone repository, verify structure
- Validation - Ensure code compiles and tests pass before the agent runs
- Baseline measurement - Capture initial coverage metrics
- Fast failure - Stop immediately if preconditions aren’t met
3.2. Phase 2: Execute
The execute phase runs the agent autonomously:
```java
@Override
public Result run(SetupContext setup, LauncherSpec spec) throws Exception {
    // Get baseline from setup
    CoverageMetrics baseline = setup.getMetadata("baseline_coverage");
    boolean hasJaCoCo = setup.getMetadata("has_jacoco");

    // Build AI goal with context
    String goal = CoveragePromptBuilder.create(baseline, hasJaCoCo, targetCoverage).build();

    // Create agent and run autonomously
    AgentModel agentModel = createAgentModel(provider, model, setup.getWorkspace());
    AgentClient client = AgentClient.builder(agentModel).build();

    AgentClientResponse response = client
        .goal(goal)
        .workingDirectory(setup.getWorkspace())
        .run(); // (1)

    // Measure final coverage
    CoverageMetrics finalCov = measureCoverage(setup.getWorkspace());
    return buildResult(baseline, finalCov, response, setup.getWorkspace());
}
```
| 1 | Agent runs autonomously with no human intervention |
Execute responsibilities:

- Goal construction - Build prompt with baseline metrics and Spring best practices
- Autonomous execution - Agent plans, implements, and validates tests
- Result evaluation - Measure final coverage and compare to baseline
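The result-evaluation step reduces to comparing two coverage snapshots. The following is a minimal sketch of that comparison using a hypothetical `CoverageMetrics` record; the agent's real type may differ:

```java
// Hypothetical stand-in for the agent's CoverageMetrics type, for illustration.
public record CoverageMetrics(double lineCoverage, double instructionCoverage) {

    // Improvement in percentage points between a baseline and a final snapshot.
    public static double lineDelta(CoverageMetrics baseline, CoverageMetrics finalCov) {
        return finalCov.lineCoverage() - baseline.lineCoverage();
    }

    public static void main(String[] args) {
        CoverageMetrics baseline = new CoverageMetrics(0.0, 0.0);   // no tests initially
        CoverageMetrics finalCov = new CoverageMetrics(71.4, 87.5); // after the agent run
        System.out.printf("Line coverage improved by %.1f points%n",
                lineDelta(baseline, finalCov));
    }
}
```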
4. Prompt Engineering with Spring AI
The agent uses Spring AI’s PromptTemplate infrastructure for modular, testable prompts:
```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.core.io.ClassPathResource;

public class CoveragePromptBuilder {

    private final PromptTemplate mainPromptTemplate;
    private final PromptTemplate jacocoPluginTemplate;
    private final Map<String, Object> variables = new HashMap<>();

    public CoveragePromptBuilder() {
        this.mainPromptTemplate = new PromptTemplate(
            new ClassPathResource("/META-INF/prompts/coverage-agent-prompt.txt")
        );
        this.jacocoPluginTemplate = new PromptTemplate(
            new ClassPathResource("/META-INF/prompts/jacoco-plugin.xml")
        );
    }

    public CoveragePromptBuilder withBaseline(CoverageMetrics baseline) {
        variables.put("baseline_line_coverage", String.format("%.1f", baseline.lineCoverage()));
        return this;
    }

    public CoveragePromptBuilder withTargetCoverage(int targetCoverage) {
        variables.put("target_coverage", targetCoverage);
        return this;
    }

    public String build() {
        return mainPromptTemplate.render(variables);
    }
}
```
Benefits:

- Externalized prompts - Stored in /META-INF/prompts/ for easy modification
- Variable substitution - Dynamic content (baseline %, target %)
- Modular design - Separate templates for main prompt and JaCoCo config
- Testable - Unit tests validate prompt generation
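Spring AI's PromptTemplate handles the substitution itself. To illustrate the idea without the Spring AI dependency, here is a minimal hand-rolled stand-in; the `render` helper and placeholder names are illustrative, not the library's implementation:

```java
import java.util.Map;

// Minimal sketch of template variable substitution: replace each {key}
// placeholder in the template with its value from the variables map.
public class PromptRenderSketch {

    static String render(String template, Map<String, Object> variables) {
        String result = template;
        for (Map.Entry<String, Object> e : variables.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", String.valueOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Baseline coverage: {baseline_line_coverage}%. Target: {target_coverage}%.";
        String goal = render(template, Map.of(
                "baseline_line_coverage", "0.0",
                "target_coverage", 20));
        System.out.println(goal);
    }
}
```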
4.1. Spring OSS Best Practices in Prompt
The prompt includes explicit Spring testing conventions:
```
SPRING OSS TESTING BEST PRACTICES (MANDATORY):

1. ASSERTIONS - Use AssertJ for fluent, readable assertions:
   ✅ GOOD: assertThat(greeting.id()).isEqualTo(1)

2. TEST NAMING - BDD-style: methodName[whenCondition]shouldExpectation
   ✅ GOOD: greetingShouldReturnCustomMessageWhenNameProvided()

3. CONTROLLER TESTING - Use @WebMvcTest for focused, fast controller tests:
   ✅ GOOD: @WebMvcTest(YourController.class)

4. JSON RESPONSE VALIDATION - Use jsonPath() for cleaner assertions:
   ✅ GOOD: .andExpect(jsonPath("$.content").value("Hello, World!"))

5. EDGE CASES - Test boundary conditions and special inputs:
   - Empty string parameters
   - Special characters and URL encoding
   - Unicode characters
   - Very long strings
```
Claude followed these practices perfectly. Gemini achieved the same coverage but didn’t follow the testing patterns.
5. Judge Pattern: Coverage Verification
The agent uses a CoverageJudge for deterministic verification:
```java
public class CoverageJudge implements Judge {

    private final double targetCoverage;

    public CoverageJudge(double targetCoverage) {
        this.targetCoverage = targetCoverage;
    }

    @Override
    public Judgment judge(JudgmentContext context) {
        // Parse JaCoCo report
        Path reportPath = context.workspace().resolve("target/site/jacoco/jacoco.xml");
        CoverageMetrics metrics = JaCoCoReportParser.parseReport(reportPath);

        // Compare to target
        boolean passed = metrics.lineCoverage() >= targetCoverage;

        return Judgment.builder()
            .status(passed ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
            .score(new NumericalScore(metrics.lineCoverage(), 0, 100))
            .reasoning(String.format("Coverage: %.1f%% (target: %.1f%%)",
                metrics.lineCoverage(), targetCoverage))
            .build();
    }
}
```
Verification workflow:

- Agent completes test generation
- Maven runs tests and generates JaCoCo report
- Judge parses the jacoco.xml report
- Judge compares actual vs target coverage
- Judge returns pass/fail with detailed score
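The parsing step can be illustrated with a small stand-in for the report parser (a sketch, not the project's actual `JaCoCoReportParser`). JaCoCo's XML report records per-type counters such as `<counter type="LINE" missed="4" covered="10"/>`, nested per class, package, and report, with the report-level total appearing last; the sample values below are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of extracting overall line coverage from a JaCoCo XML report string.
public class JaCoCoLineCoverage {

    private static final Pattern LINE_COUNTER = Pattern.compile(
            "<counter type=\"LINE\" missed=\"(\\d+)\" covered=\"(\\d+)\"/>");

    static double lineCoverage(String jacocoXml) {
        int missed = 0, covered = 0;
        Matcher m = LINE_COUNTER.matcher(jacocoXml);
        // Counters repeat per class/package; the report-level total is the
        // last LINE counter in the document, so keep only the last match.
        while (m.find()) {
            missed = Integer.parseInt(m.group(1));
            covered = Integer.parseInt(m.group(2));
        }
        int total = missed + covered;
        return total == 0 ? 0.0 : 100.0 * covered / total;
    }

    public static void main(String[] args) {
        // Illustrative report-level counter: 10 of 14 lines covered.
        String sample = "<counter type=\"LINE\" missed=\"4\" covered=\"10\"/>";
        System.out.printf("Line coverage: %.1f%%%n", lineCoverage(sample));
    }
}
```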
6. Usage Examples
6.1. Programmatic Usage
```java
// Create agent spec
AgentSpec agentSpec = AgentSpecLoader.loadAgentSpec("coverage");

// Configure inputs
Map<String, Object> inputs = Map.of(
    "git_url", "https://github.com/spring-guides/gs-rest-service",
    "git_ref", "main",
    "git_subdirectory", "complete",
    "target_coverage", 80,
    "provider", "claude",
    "model", "claude-sonnet-4-20250514"
);

// Create launcher spec
Path workingDir = Paths.get("/tmp/coverage-test");
LauncherSpec spec = new LauncherSpec(agentSpec, inputs, workingDir, Map.of());

// Run agent
CodeCoverageAgentRunner agent = new CodeCoverageAgentRunner();
SetupContext setup = agent.setup(spec);
Result result = agent.run(setup, spec);

// Check results
System.out.println("Baseline: " + result.data().get("baseline_coverage_line") + "%");
System.out.println("Final: " + result.data().get("final_coverage_line") + "%");
System.out.println("Workspace: " + result.data().get("workspace"));
System.out.println("Report: " + result.data().get("coverage_report"));
```
7. Future Enhancements
The current agent uses a deterministic judge (numeric coverage comparison). We’re planning a Test Quality Judge to validate Spring best practices adherence.
7.1. Option A: Full AI-Powered Test Quality Judge
Estimated effort: 8-12 hours
Features:

- AI-powered analysis of generated tests
- Validates Spring WebMVC best practices
- Checks for @WebMvcTest, jsonPath(), AssertJ usage
- Identifies anti-patterns (@SpringBootTest, manual JSON parsing)
- Provides detailed feedback on test quality

Implementation:

- Custom judge using Claude/Gemini for test review
- Prompt engineering for test quality analysis
- Integration with coverage judge (two-tier verification)
7.2. Option B: Simple Rule-Based Test Quality Judge
Estimated effort: 2-3 hours
Features:

- Pattern matching for common anti-patterns
- Regex-based validation of test conventions
- Fast, deterministic verification
- Lower accuracy than AI judge

Implementation:

- Regular expressions for @WebMvcTest detection
- Simple AST parsing for assertion validation
- Boolean pass/fail based on rule violations
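As a sketch of what Option B might look like, a few string checks are enough to flag the anti-patterns observed in the Gemini output. The class and rule names here are illustrative, not the project's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule-based check for Spring WebMVC test conventions:
// flags @SpringBootTest, missing @WebMvcTest, and manual JSON parsing.
public class RuleBasedTestQualityCheck {

    static List<String> findViolations(String testSource) {
        List<String> violations = new ArrayList<>();
        if (testSource.contains("@SpringBootTest")) {
            violations.add("Uses @SpringBootTest instead of @WebMvcTest");
        }
        if (!testSource.contains("@WebMvcTest")) {
            violations.add("Missing @WebMvcTest");
        }
        if (testSource.contains("ObjectMapper")) {
            violations.add("Manual JSON parsing instead of jsonPath()");
        }
        return violations;
    }

    public static void main(String[] args) {
        String goodTest = "@WebMvcTest(GreetingController.class) class T { /* uses jsonPath() */ }";
        String badTest = "@SpringBootTest class T { ObjectMapper om; }";
        System.out.println("good: " + findViolations(goodTest));
        System.out.println("bad:  " + findViolations(badTest));
    }
}
```

A real implementation would parse the test sources properly (e.g. via an AST) rather than matching raw strings, but even this crude version distinguishes the two agents' output in this experiment.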
Both options will be explored as the project evolves. For now, the deterministic coverage judge provides reliable, measurable verification.
8. Key Takeaways
- Agent Effectiveness - 71.4% coverage on Spring’s REST service tutorial demonstrates practical capability
- Model Quality Matters - Same prompt, different adherence to best practices (Claude > Gemini)
- Setup/Execute Pattern - Two-phase lifecycle enables complex workflows with validation
- Prompt Engineering - Spring AI PromptTemplate provides modular, testable prompt design
- Judge Pattern - Deterministic verification ensures measurable outcomes
This is just the beginning. As we integrate more agents into Spring AI Bench, we’ll gain deeper insights into model capabilities and best practices for autonomous development.