AgentSpec API

The AgentSpec class defines the configuration and parameters for agent execution in benchmarks.

1. Overview

AgentSpec is the central configuration object that specifies:

  • Agent Type - Which agent implementation to use

  • Model Configuration - Model version and parameters

  • Task Definition - Natural language prompt describing the task

  • Execution Settings - Auto-approval behavior and agent role; the execution timeout is supplied separately when the spec is run

2. Builder Pattern

Spring AI Bench provides a fluent builder API for creating agent specifications:

AgentSpec spec = AgentSpec.builder()
    .kind("claude-code")
    .model("claude-3-5-sonnet")
    .prompt("Fix the failing JUnit tests in this project")
    .autoApprove(true)
    .build();
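For readers unfamiliar with the pattern, the following is a minimal sketch of how a fluent builder of this shape can be implemented. AgentSpecSketch is a hypothetical stand-in, not the Spring AI Bench class. Its components are assumed to mirror the six constructor arguments in section 3.1, and the real builder may apply defaults or validation that this sketch omits.

```java
import java.util.Map;

// Hypothetical stand-in for AgentSpec; components mirror the constructor in 3.1.
record AgentSpecSketch(String kind, String model, Boolean autoApprove,
                       String prompt, Map<String, Object> genParams, String role) {

    static Builder builder() {
        return new Builder();
    }

    // Each setter stores one value and returns this, which enables chaining.
    static final class Builder {
        private String kind;
        private String model;
        private Boolean autoApprove;
        private String prompt;
        private Map<String, Object> genParams;
        private String role;

        Builder kind(String kind) { this.kind = kind; return this; }
        Builder model(String model) { this.model = model; return this; }
        Builder autoApprove(boolean autoApprove) { this.autoApprove = autoApprove; return this; }
        Builder prompt(String prompt) { this.prompt = prompt; return this; }
        Builder genParams(Map<String, Object> genParams) { this.genParams = genParams; return this; }
        Builder role(String role) { this.role = role; return this; }

        // build() hands the collected values to the record's canonical constructor.
        AgentSpecSketch build() {
            return new AgentSpecSketch(kind, model, autoApprove, prompt, genParams, role);
        }
    }
}
```

Each setter returns the same Builder instance, which is what allows the calls to be chained; fields that are never set simply stay null in this sketch.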

3. Constructor Options

3.1. Record Constructor

AgentSpec spec = new AgentSpec(
    "claude-code",           // kind
    "claude-3-5-sonnet",     // model
    true,                    // autoApprove
    "Fix failing tests",     // prompt
    Map.of("temperature", 0.1), // genParams
    "developer"              // role
);
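Because the record constructor is positional, arguments must follow the component order shown above. Assuming AgentSpec is a Java record with exactly these components, the sketch below (AgentSpecRecordSketch is a hypothetical name) shows two consequences of that design: instances with equal components compare equal, and a compact constructor can take an immutable copy of genParams. The defensive copy and the null-to-empty default are choices made for this sketch, not confirmed library behavior.

```java
import java.util.Map;

// Hypothetical record with the same component order as the constructor call above.
record AgentSpecRecordSketch(String kind, String model, Boolean autoApprove,
                             String prompt, Map<String, Object> genParams, String role) {

    // Compact constructor: runs before the component fields are assigned.
    AgentSpecRecordSketch {
        // Copy the map so later changes to the caller's map cannot leak in;
        // a missing map becomes an empty one (an assumption of this sketch).
        genParams = (genParams == null) ? Map.of() : Map.copyOf(genParams);
    }
}
```

Records derive equals and hashCode from their components, so two specs created with the same arguments are interchangeable as map keys or in assertions.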

3.2. Builder Methods

AgentSpec spec = AgentSpec.builder()
    .kind("gemini")
    .model("gemini-2.0-flash-exp")
    .autoApprove(false)
    .prompt("Implement user authentication feature")
    .genParams(Map.of(
        "temperature", 0.2,
        "max_tokens", 2048
    ))
    .role("senior-developer")
    .build();

4. Parameters

4.1. kind (Required)

The agent implementation to use:

.kind("claude-code")    // Claude Code CLI agent
.kind("gemini")         // Google Gemini CLI agent
.kind("hello-world")    // Mock agent for testing
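The kind string has to be resolved to a concrete agent implementation somewhere. One common design is a registry that maps each kind to a factory and rejects unknown values; the sketch below assumes that design. Runner, NamedRunner, and AgentRunnerRegistry are hypothetical names, and Spring AI Bench may wire its runners differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical minimal runner abstraction used only for this sketch.
interface Runner {
    String describe();
}

record NamedRunner(String description) implements Runner {
    public String describe() {
        return description;
    }
}

final class AgentRunnerRegistry {
    private AgentRunnerRegistry() {}

    // Maps each supported kind to a factory that creates its runner.
    private static final Map<String, Supplier<Runner>> FACTORIES = Map.of(
        "claude-code", () -> new NamedRunner("Claude Code CLI agent"),
        "gemini", () -> new NamedRunner("Google Gemini CLI agent"),
        "hello-world", () -> new NamedRunner("Mock agent for testing"));

    static Set<String> supportedKinds() {
        return FACTORIES.keySet();
    }

    // Looks up the factory for a kind and fails fast on unknown or missing values.
    static Runner forKind(String kind) {
        Supplier<Runner> factory = (kind == null) ? null : FACTORIES.get(kind);
        if (factory == null) {
            throw new IllegalArgumentException("Unsupported agent kind: " + kind);
        }
        return factory.get();
    }
}
```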

4.2. model (Optional)

The specific model version:

// Claude models
.model("claude-3-5-sonnet")
.model("claude-3-haiku")
.model("claude-3-opus")

// Gemini models
.model("gemini-2.0-flash-exp")
.model("gemini-1.5-pro")
.model("gemini-1.5-flash")

4.3. prompt (Required)

Natural language task description:

.prompt("""
    This Spring Boot application has a security vulnerability in the
    UserController class. The endpoint allows unauthorized access.

    Tasks:
    1. Identify the security issue
    2. Fix the vulnerability using Spring Security
    3. Add appropriate tests to verify the fix
    4. Ensure all existing tests still pass
    """)

4.4. autoApprove (Optional)

Whether to bypass human confirmation prompts:

.autoApprove(true)   // Skip confirmation (recommended for benchmarks)
.autoApprove(false)  // Require human approval (interactive mode)
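In practice, an auto-approve setting usually becomes a command-line flag for the underlying agent. The sketch below assumes that mapping: it uses the Claude Code CLI's --dangerously-skip-permissions flag and the Gemini CLI's --yolo flag, both of which skip per-action confirmation. CliArgs is a hypothetical helper, and the exact arguments Spring AI Bench passes to each CLI may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical translation of spec fields into a CLI argument list.
final class CliArgs {
    private CliArgs() {}

    static List<String> forAgent(String kind, String model, boolean autoApprove, String prompt) {
        List<String> args = new ArrayList<>();
        switch (kind) {
            case "claude-code" -> {
                args.add("claude");
                args.add("-p");          // non-interactive prompt mode
                args.add(prompt);
                if (autoApprove) {
                    args.add("--dangerously-skip-permissions");
                }
            }
            case "gemini" -> {
                args.add("gemini");
                args.add("-p");
                args.add(prompt);
                if (autoApprove) {
                    args.add("--yolo");
                }
            }
            default -> throw new IllegalArgumentException("No CLI mapping for kind: " + kind);
        }
        // model is optional, so the flag is only added when a value is present.
        if (model != null) {
            args.add("--model");
            args.add(model);
        }
        return List.copyOf(args);
    }
}
```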

4.5. genParams (Optional)

Model-specific generation parameters:

.genParams(Map.of(
    "temperature", 0.1,      // Lower for more deterministic output
    "max_tokens", 4096,      // Maximum response length
    "top_p", 0.9,            // Nucleus sampling parameter
    "frequency_penalty", 0.0 // Repetition penalty
))
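Because genParams holds values of mixed types, code that consumes it must convert entries back to typed values. In the example above, 0.1 is boxed as a Double while 4096 is boxed as an Integer, so reading through Number avoids a ClassCastException. GenParams is a hypothetical helper, assuming the parameters are exposed as a Map<String, Object>.

```java
import java.util.Map;

// Hypothetical helpers for reading typed values out of a genParams map.
final class GenParams {
    private GenParams() {}

    // Returns the entry as a double, or the fallback when the key is absent.
    static double doubleParam(Map<String, Object> params, String key, double fallback) {
        Object value = params.get(key);
        if (value == null) {
            return fallback;
        }
        if (value instanceof Number number) {
            return number.doubleValue();
        }
        throw new IllegalArgumentException(key + " must be numeric but was " + value.getClass().getSimpleName());
    }

    // Same idea for integer-valued parameters such as max_tokens.
    static int intParam(Map<String, Object> params, String key, int fallback) {
        Object value = params.get(key);
        if (value == null) {
            return fallback;
        }
        if (value instanceof Number number) {
            return number.intValue();
        }
        throw new IllegalArgumentException(key + " must be numeric but was " + value.getClass().getSimpleName());
    }
}
```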

4.6. role (Optional)

Agent role or persona:

.role("senior-developer")     // Senior developer persona
.role("security-expert")      // Security-focused approach
.role("test-engineer")        // Testing-focused approach
.role("architect")            // Architecture-focused approach

5. YAML Configuration

AgentSpec can also be defined in YAML format:

agent:
  kind: claude-code
  model: claude-3-5-sonnet
  autoApprove: true
  prompt: |
    Fix the failing JUnit tests in this Spring Boot application.

    Requirements:
    - All tests must pass after fixes
    - Do not modify test logic
    - Follow Spring Boot best practices
    - Add logging where appropriate

  genParams:
    temperature: 0.1
    max_tokens: 4096
  role: senior-developer
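A YAML parser such as SnakeYAML typically returns a document like this as nested maps. Assuming that shape, the sketch below maps the agent section onto a typed value. AgentConfig and AgentSpecFromYaml are hypothetical names, parsing is left out so the example needs only the JDK, and the actual Spring AI Bench loader may bind YAML differently.

```java
import java.util.Map;

// Hypothetical target type with the same fields as the YAML keys above.
record AgentConfig(String kind, String model, boolean autoApprove,
                   String prompt, Map<String, Object> genParams, String role) {}

final class AgentSpecFromYaml {
    private AgentSpecFromYaml() {}

    // root is the parsed document; the agent settings live under "agent".
    @SuppressWarnings("unchecked")
    static AgentConfig fromYamlTree(Map<String, Object> root) {
        Object agentNode = root.get("agent");
        if (!(agentNode instanceof Map)) {
            throw new IllegalArgumentException("missing 'agent' section");
        }
        Map<String, Object> agent = (Map<String, Object>) agentNode;
        Object params = agent.getOrDefault("genParams", Map.of());
        return new AgentConfig(
            (String) agent.get("kind"),
            (String) agent.get("model"),
            Boolean.TRUE.equals(agent.get("autoApprove")),  // absent key -> false
            (String) agent.get("prompt"),
            (Map<String, Object>) params,
            (String) agent.get("role"));
    }
}
```

A missing autoApprove key maps to false here; that default is an assumption of the sketch.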

6. Agent-Specific Configurations

6.1. Claude Code

AgentSpec.builder()
    .kind("claude-code")
    .model("claude-3-5-sonnet")
    .genParams(Map.of(
        "max_tokens", 4096,
        "temperature", 0.1
    ))
    .autoApprove(true)
    .prompt("Comprehensive task description")
    .build();

6.2. Gemini

AgentSpec.builder()
    .kind("gemini")
    .model("gemini-2.0-flash-exp")
    .genParams(Map.of(
        "temperature", 0.2,
        "top_p", 0.8,
        "top_k", 40
    ))
    .autoApprove(true)
    .prompt("Detailed task specification")
    .build();

6.3. HelloWorld (Testing)

AgentSpec.builder()
    .kind("hello-world")
    .prompt("Create a file named hello.txt with contents: Hello World!")
    .autoApprove(true)
    .build();

7. Validation

AgentSpec validates its required fields when it is built. An empty kind or an empty prompt is rejected with an IllegalArgumentException:

// This will throw IllegalArgumentException
AgentSpec.builder()
    .kind("")  // Empty kind not allowed
    .build();

// This will throw IllegalArgumentException
AgentSpec.builder()
    .kind("claude-code")
    .prompt("")  // Empty prompt not allowed
    .build();
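Rules like these fit naturally in a record's compact constructor, which runs before any field is assigned. The sketch below assumes validation happens at construction time and enforces only the two rules stated in this section. ValidatedSpec is a hypothetical name, and using isBlank (which also rejects whitespace-only strings) is a choice of the sketch.

```java
// Hypothetical record illustrating construction-time validation.
record ValidatedSpec(String kind, String prompt) {

    ValidatedSpec {
        requireText(kind, "kind");
        requireText(prompt, "prompt");
    }

    // Rejects null, empty, and whitespace-only values.
    private static void requireText(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
    }
}
```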

8. Integration Examples

8.1. With BenchCase

BenchCase benchCase = BenchCase.builder()
    .id("user-auth-security-fix")
    .repo(RepoSpec.builder()
        .owner("example-org")
        .name("spring-boot-app")
        .ref("v1.0.0")
        .build())
    .agent(AgentSpec.builder()
        .kind("claude-code")
        .model("claude-3-5-sonnet")
        .prompt("Fix security vulnerability in user authentication")
        .autoApprove(true)
        .build())
    .success(SuccessSpec.builder()
        .cmd("mvn test")
        .expectExitCode(0)
        .build())
    .build();

8.2. With AgentRunner

// agentModel, verifier, and workspace are assumed to be set up elsewhere
AgentRunner runner = new ClaudeCodeAgentRunner(agentModel, verifier);

AgentSpec spec = AgentSpec.builder()
    .kind("claude-code")
    .model("claude-3-5-sonnet")
    .prompt("Implement user registration feature")
    .autoApprove(true)
    .build();

AgentResult result = runner.run(workspace, spec, Duration.ofMinutes(10));
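The run call takes a Duration as its time budget. One plausible way to enforce such a budget is to execute the agent on a worker thread and bound the wait with Future.get; the sketch below assumes that approach. TimeBoundedRun and Outcome are hypothetical names, not the Spring AI Bench implementation.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical result type: either the agent's output or a timeout marker.
record Outcome(boolean timedOut, String output) {}

final class TimeBoundedRun {
    private TimeBoundedRun() {}

    // Runs the task on a worker thread and waits at most `timeout` for it.
    static Outcome run(Callable<String> agentTask, Duration timeout)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(agentTask);
            try {
                return new Outcome(false, future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the still-running task
                return new Outcome(true, null);
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

On timeout the sketch cancels the task with interruption, so an agent task must respond to thread interrupts for the cancellation to take effect.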

9. Best Practices

9.1. Prompt Design

  • Be Specific - Clearly define requirements and constraints

  • Include Context - Provide relevant background information

  • Set Expectations - Specify success criteria and testing requirements

  • Use Examples - Include code examples or expected outputs when helpful

9.2. Model Selection

  • Task Complexity - Use more capable models for complex tasks

  • Speed vs Quality - Balance response time with output quality

  • Cost Considerations - Consider API costs for large benchmark suites

9.3. Parameter Tuning

  • Temperature - Lower (0.0-0.2) for deterministic tasks, higher (0.7-1.0) for creative tasks

  • Max Tokens - Set appropriate limits based on expected response length

  • Auto-Approve - Enable for automated benchmarks, disable for interactive development
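The temperature guidance above can be captured as named presets. TaskStyle is a hypothetical helper, and the values 0.1 and 0.8 are illustrative picks from within the recommended ranges, not values prescribed by Spring AI Bench.

```java
import java.util.Map;

// Hypothetical presets that encode the temperature guidance above.
enum TaskStyle {
    DETERMINISTIC(0.1),  // inside the 0.0-0.2 band, e.g. bug fixes and refactors
    CREATIVE(0.8);       // inside the 0.7-1.0 band, e.g. open-ended design work

    private final double temperature;

    TaskStyle(double temperature) {
        this.temperature = temperature;
    }

    // Builds a genParams map combining the preset temperature with a token limit.
    Map<String, Object> genParams(int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        return Map.of("temperature", temperature, "max_tokens", maxTokens);
    }
}
```

The returned map can be passed directly to .genParams(...) on the builder.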

10. Error Handling

Validation failures are reported as IllegalArgumentException, so callers can catch the exception and log or report the cause:

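// Assumes an SLF4J-style logger named log is in scope
try {
    AgentSpec spec = AgentSpec.builder()
        .kind("invalid-agent")
        .prompt("Fix the failing tests")  // valid prompt, so only the kind is at fault
        .build();
} catch (IllegalArgumentException e) {
    // Handle unsupported agent kind
    log.error("Unsupported agent kind: {}", e.getMessage());
}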

try {
    AgentSpec spec = AgentSpec.builder()
        .kind("claude-code")
        .prompt("")
        .build();
} catch (IllegalArgumentException e) {
    // Handle empty prompt
    log.error("Prompt cannot be empty: {}", e.getMessage());
}

11. Next Steps