AgentSpec API
The AgentSpec class defines the configuration and parameters for agent execution in benchmarks.
1. Overview
AgentSpec is the central configuration object that specifies:
- Agent Type - Which agent implementation to use
- Model Configuration - Model version and parameters
- Task Definition - Natural language prompt describing the task
- Execution Settings - Timeout, approval settings, and metadata
2. Builder Pattern
Spring AI Bench provides a fluent builder API for creating agent specifications:
AgentSpec spec = AgentSpec.builder()
    .kind("claude-code")
    .model("claude-3-5-sonnet")
    .prompt("Fix the failing JUnit tests in this project")
    .autoApprove(true)
    .build();
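To make the shape of the fluent API concrete, the following is a minimal, self-contained sketch of how a builder of this kind could be structured. SketchAgentSpec is a hypothetical stand-in written for illustration; it is not the Spring AI Bench class, and its defaults (autoApprove defaulting to false, genParams defaulting to an empty map) are assumptions.

```java
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Spring AI Bench AgentSpec.
record SketchAgentSpec(String kind, String model, String prompt,
                       boolean autoApprove, Map<String, Object> genParams) {

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String kind;
        private String model;                              // optional, may stay null
        private String prompt;
        private boolean autoApprove = false;               // assumed default
        private Map<String, Object> genParams = Map.of();  // assumed default

        // Each setter returns the builder itself so calls can be chained.
        Builder kind(String kind) { this.kind = kind; return this; }
        Builder model(String model) { this.model = model; return this; }
        Builder prompt(String prompt) { this.prompt = prompt; return this; }
        Builder autoApprove(boolean autoApprove) { this.autoApprove = autoApprove; return this; }
        Builder genParams(Map<String, Object> genParams) {
            this.genParams = Map.copyOf(genParams);        // defensive, immutable copy
            return this;
        }

        SketchAgentSpec build() {
            return new SketchAgentSpec(kind, model, prompt, autoApprove, genParams);
        }
    }
}
```

Because every setter returns the builder, optional values such as model can simply be left out of the chain, and the resulting object is immutable once build() returns.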
3. Constructor Options
4. Parameters
4.1. kind (Required)
The agent implementation to use:
.kind("claude-code") // Claude Code CLI agent
.kind("gemini") // Google Gemini CLI agent
.kind("hello-world") // Mock agent for testing
4.2. model (Optional)
The specific model version:
// Claude models
.model("claude-3-5-sonnet")
.model("claude-3-haiku")
.model("claude-3-opus")
// Gemini models
.model("gemini-2.0-flash-exp")
.model("gemini-1.5-pro")
.model("gemini-1.5-flash")
4.3. prompt (Required)
Natural language task description:
.prompt("""
    This Spring Boot application has a security vulnerability in the
    UserController class. The endpoint allows unauthorized access.
    Tasks:
    1. Identify the security issue
    2. Fix the vulnerability using Spring Security
    3. Add appropriate tests to verify the fix
    4. Ensure all existing tests still pass
    """)
4.4. autoApprove (Optional)
Whether to bypass human confirmation prompts:
.autoApprove(true) // Skip confirmation (recommended for benchmarks)
.autoApprove(false) // Require human approval (interactive mode)
5. YAML Configuration
AgentSpec can also be defined in YAML format:
agent:
  kind: claude-code
  model: claude-3-5-sonnet
  autoApprove: true
  prompt: |
    Fix the failing JUnit tests in this Spring Boot application.
    Requirements:
    - All tests must pass after fixes
    - Do not modify test logic
    - Follow Spring Boot best practices
    - Add logging where appropriate
  genParams:
    temperature: 0.1
    max_tokens: 4096
  role: senior-developer
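As a rough illustration of how such a document could be turned into a specification object, the sketch below maps the nested map that a YAML parser would typically produce onto a simple record. It assumes that genParams and role are nested under the agent key. YamlAgentSpec and fromMap are hypothetical names introduced here; the loader that Spring AI Bench actually uses is not shown in this document and may behave differently.

```java
import java.util.Map;

// Hypothetical stand-in types; the real loader and AgentSpec may differ.
record YamlAgentSpec(String kind, String model, String prompt,
                     boolean autoApprove, Map<String, Object> genParams, String role) {

    // 'root' is the nested map a YAML parser would produce for the document above.
    @SuppressWarnings("unchecked")
    static YamlAgentSpec fromMap(Map<String, Object> root) {
        Object agentNode = root.get("agent");
        if (!(agentNode instanceof Map)) {
            throw new IllegalArgumentException("missing 'agent' section");
        }
        Map<String, Object> agent = (Map<String, Object>) agentNode;
        // genParams is optional; fall back to an empty map when it is absent.
        Object genParams = agent.getOrDefault("genParams", Map.of());
        return new YamlAgentSpec(
                (String) agent.get("kind"),
                (String) agent.get("model"),
                (String) agent.get("prompt"),
                Boolean.TRUE.equals(agent.get("autoApprove")), // absent means false (assumption)
                Map.copyOf((Map<String, Object>) genParams),
                (String) agent.get("role"));
    }
}
```

Keeping the mapping in one place makes it easy to see which YAML keys correspond to which builder calls in the Java API.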
6. Agent-Specific Configurations
6.1. Claude Code
AgentSpec.builder()
    .kind("claude-code")
    .model("claude-3-5-sonnet")
    .genParams(Map.of(
        "max_tokens", 4096,
        "temperature", 0.1
    ))
    .autoApprove(true)
    .prompt("Comprehensive task description")
    .build();
7. Validation
AgentSpec includes built-in validation: an empty kind or an empty prompt is rejected with an IllegalArgumentException.
// This will throw IllegalArgumentException
AgentSpec.builder()
    .kind("") // Empty kind not allowed
    .build();

// This will throw IllegalArgumentException
AgentSpec.builder()
    .kind("claude-code")
    .prompt("") // Empty prompt not allowed
    .build();
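The checks described above can be sketched as a small helper that a build() method might call. SpecValidation and its method names are hypothetical, and treating whitespace-only values as empty is an assumption; the real validation rules inside AgentSpec may differ.

```java
// Hypothetical helper; the real validation lives inside AgentSpec and may differ.
final class SpecValidation {
    private SpecValidation() {}

    // Returns the value unchanged, or throws if it is null, empty, or whitespace only.
    static String requireNonBlank(String value, String fieldName) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(fieldName + " must not be empty");
        }
        return value;
    }

    // Validates the two required fields named in this document.
    static void validate(String kind, String prompt) {
        requireNonBlank(kind, "kind");
        requireNonBlank(prompt, "prompt");
    }
}
```

Failing at construction time means an invalid specification never reaches an agent runner, so misconfiguration surfaces before any benchmark time is spent.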
8. Integration Examples
8.1. With BenchCase
BenchCase benchCase = BenchCase.builder()
    .id("user-auth-security-fix")
    .repo(RepoSpec.builder()
        .owner("example-org")
        .name("spring-boot-app")
        .ref("v1.0.0")
        .build())
    .agent(AgentSpec.builder()
        .kind("claude-code")
        .model("claude-3-5-sonnet")
        .prompt("Fix security vulnerability in user authentication")
        .autoApprove(true)
        .build())
    .success(SuccessSpec.builder()
        .cmd("mvn test")
        .expectExitCode(0)
        .build())
    .build();
8.2. With AgentRunner
AgentRunner runner = new ClaudeCodeAgentRunner(agentModel, verifier);

AgentSpec spec = AgentSpec.builder()
    .kind("claude-code")
    .model("claude-3-5-sonnet")
    .prompt("Implement user registration feature")
    .autoApprove(true)
    .build();

AgentResult result = runner.run(workspace, spec, Duration.ofMinutes(10));
9. Best Practices
9.1. Prompt Design
- Be Specific - Clearly define requirements and constraints
- Include Context - Provide relevant background information
- Set Expectations - Specify success criteria and testing requirements
- Use Examples - Include code examples or expected outputs when helpful
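One way to apply these guidelines consistently is to assemble prompts from named parts instead of writing free-form strings. The sketch below is illustrative only: PromptTemplate and compose are hypothetical names, not part of Spring AI Bench, and the layout (context, numbered tasks, success criteria) is just one reasonable convention.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Spring AI Bench.
final class PromptTemplate {
    private PromptTemplate() {}

    // Builds a prompt with background context, a numbered task list,
    // and an explicit statement of what counts as success.
    static String compose(String context, List<String> tasks, String successCriteria) {
        StringBuilder sb = new StringBuilder();
        sb.append(context.strip()).append("\n\nTasks:\n");
        for (int i = 0; i < tasks.size(); i++) {
            sb.append(i + 1).append(". ").append(tasks.get(i)).append('\n');
        }
        sb.append("\nSuccess criteria: ").append(successCriteria.strip()).append('\n');
        return sb.toString();
    }
}
```

The resulting string can be passed directly to .prompt(...), which keeps prompts across benchmark cases structurally comparable.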
10. Error Handling
Common validation errors and how to handle them:
try {
    AgentSpec spec = AgentSpec.builder()
        .kind("invalid-agent")
        .build();
} catch (IllegalArgumentException e) {
    // Handle unsupported agent kind
    log.error("Unsupported agent kind: {}", e.getMessage());
}

try {
    AgentSpec spec = AgentSpec.builder()
        .kind("claude-code")
        .prompt("")
        .build();
} catch (IllegalArgumentException e) {
    // Handle empty prompt
    log.error("Prompt cannot be empty: {}", e.getMessage());
}
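When specifications come from user input or configuration files, it can be more convenient to report every problem at once than to catch exceptions one at a time. The sketch below shows such a pre-check. SpecProblems is a hypothetical helper, and its list of known kinds simply mirrors the kinds mentioned in this document; the set an installation actually supports may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check helper; not part of Spring AI Bench.
final class SpecProblems {
    // Kinds listed in this document; treat this set as an assumption.
    private static final Set<String> KNOWN_KINDS =
            Set.of("claude-code", "gemini", "hello-world");

    private SpecProblems() {}

    // Returns a human-readable message per problem; an empty list means no problems found.
    static List<String> check(String kind, String prompt) {
        List<String> problems = new ArrayList<>();
        if (kind == null || kind.isBlank()) {
            problems.add("kind is required");
        } else if (!KNOWN_KINDS.contains(kind)) {
            problems.add("unsupported agent kind: " + kind);
        }
        if (prompt == null || prompt.isBlank()) {
            problems.add("prompt is required");
        }
        return problems;
    }
}
```

A caller can log or display the returned messages and only invoke the builder when the list is empty.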
11. Next Steps
- AgentRunner API - Learn about agent execution
- Verification System - Understand result verification
- Writing Benchmarks - Create custom benchmarks with AgentSpec