Claude Code Agent

Claude Code is Anthropic’s autonomous coding agent that can understand codebases, write code, and execute commands.

1. Overview

The Claude Code agent integration provides:

Autonomous Development - End-to-end task completion without human intervention
Command Execution - Full shell access with tool usage
Codebase Understanding - Deep comprehension of complex projects
MCP Tool Support - Model Context Protocol integration

2. Prerequisites

2.1. Install Claude CLI

# Install via npm (the CLI is published as @anthropic-ai/claude-code)
npm install -g @anthropic-ai/claude-code

# Verify installation
claude --version

2.2. API Key Setup

# Set your Anthropic API key
export ANTHROPIC_API_KEY=your-anthropic-api-key

# Verify authentication
claude auth status
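
Because the key is a secret, a pre-flight check can confirm that it is present without echoing it. The Java sketch below is illustrative only: the class `ApiKeyCheck` and its `describe` method are hypothetical names, not part of Claude Code or Spring AI Bench.

```java
// Hypothetical helper: reports whether an API key is present without revealing it.
final class ApiKeyCheck {

    // Returns "missing" for null/blank input; otherwise a masked summary that
    // shows at most the last four characters of the key.
    static String describe(String key) {
        if (key == null || key.isBlank()) {
            return "missing";
        }
        String trimmed = key.trim();
        if (trimmed.length() <= 4) {
            // Too short to show a tail without exposing the whole value.
            return "set (****)";
        }
        return "set (****" + trimmed.substring(trimmed.length() - 4) + ")";
    }

    public static void main(String[] args) {
        // Reads the same environment variable exported above.
        System.out.println("ANTHROPIC_API_KEY: " + describe(System.getenv("ANTHROPIC_API_KEY")));
    }
}
```

Masking the value keeps it out of terminal scrollback and CI logs while still letting you tell two keys apart by their tail.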

3. Configuration

3.1. Basic Agent Specification

agent:
  kind: claude-code
  model: claude-3-5-sonnet
  autoApprove: true
  prompt: |
    Fix the failing JUnit tests in this project.
    Run "./mvnw test" until all tests pass.

3.2. Advanced Configuration

agent:
  kind: claude-code
  model: claude-3-5-sonnet
  autoApprove: true
  genParams:
    max_tokens: 4096
    temperature: 0.1
  extras:
    yolo: true          # Skip permission prompts
    max_steps: 10       # Limit number of actions
    tools: ["bash", "editor", "git"]
  prompt: |
    This Java Spring Boot application has a security vulnerability.

    Tasks:
    1. Identify the security issue
    2. Fix the vulnerability
    3. Add appropriate tests
    4. Ensure all existing tests still pass
    5. Document the fix in CHANGELOG.md
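
The `extras` block is loosely typed, so code that consumes it benefits from typed accessors with explicit fallbacks. The following is a minimal sketch under the assumption that `extras` arrives as a `Map<String, Object>`; the class `AgentExtras` is hypothetical, and the fallback values are chosen by the caller rather than being documented defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical typed view over the "extras" map of an agent specification.
final class AgentExtras {

    private final Map<String, Object> raw;

    AgentExtras(Map<String, Object> raw) {
        this.raw = Map.copyOf(raw); // defensive, immutable copy
    }

    // Returns the boolean stored under 'key', or 'fallback' if absent or not a boolean.
    boolean flag(String key, boolean fallback) {
        Object value = raw.get(key);
        return (value instanceof Boolean) ? (Boolean) value : fallback;
    }

    // Returns the number stored under 'key' as an int, or 'fallback' if absent or not numeric.
    int number(String key, int fallback) {
        Object value = raw.get(key);
        return (value instanceof Number) ? ((Number) value).intValue() : fallback;
    }

    // Returns the list stored under 'key' as strings, or an empty list if absent or not a list.
    List<String> strings(String key) {
        Object value = raw.get(key);
        if (!(value instanceof List)) {
            return List.of();
        }
        List<String> out = new ArrayList<>();
        for (Object item : (List<?>) value) {
            out.add(String.valueOf(item));
        }
        return List.copyOf(out);
    }

    public static void main(String[] args) {
        // Mirrors the keys used in the advanced configuration example.
        AgentExtras extras = new AgentExtras(Map.of(
            "yolo", true,
            "max_steps", 10,
            "tools", List.of("bash", "editor", "git")));
        System.out.println("yolo=" + extras.flag("yolo", false));
        System.out.println("max_steps=" + extras.number("max_steps", 5));
        System.out.println("tools=" + extras.strings("tools"));
    }
}
```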

4. Features

4.1. Autonomous Task Execution

Claude Code can execute multi-step workflows:

Code Analysis - Understand existing codebases
File Editing - Create, modify, and delete files
Command Execution - Run build tools, tests, and utilities
Git Operations - Commit changes and manage branches

4.2. Tool Integration

Supported tools include:

Bash - Execute shell commands
Editor - Edit files with syntax awareness
Git - Version control operations
Language-specific tools - Maven, Gradle, npm, etc.

4.3. Error Handling

Claude Code can diagnose and attempt to recover from the following failures within the same run:

Build Failures - Diagnose and fix compilation errors
Test Failures - Analyze failing tests and implement fixes
Runtime Errors - Debug and resolve runtime issues
Dependency Conflicts - Resolve version conflicts and missing dependencies

5. Integration with Spring AI Bench

5.1. ClaudeCodeAgentRunner

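The ClaudeCodeAgentRunner adapts Claude Code to the AgentRunner interface: it prepares a workspace-scoped model, runs the task, and verifies the result.

import java.nio.file.Path;
import java.time.Duration;
// Project types (AgentRunner, AgentSpec, AgentResult, ...) come from the
// benchmark and agent model packages; their imports are omitted here.

public class ClaudeCodeAgentRunner implements AgentRunner {

    private final ClaudeCodeAgentModel agentModel;
    private final SuccessVerifier verifier;

    // Assumed constructor injection; the excerpt does not show how the fields are set.
    public ClaudeCodeAgentRunner(ClaudeCodeAgentModel agentModel, SuccessVerifier verifier) {
        this.agentModel = agentModel;
        this.verifier = verifier;
    }

    @Override
    public AgentResult run(Path workspace, AgentSpec spec, Duration timeout)
            throws Exception {
        // Configure agent for workspace
        ClaudeCodeAgentModel workspaceModel =
            ClaudeCodeAgentModel.createWithWorkspaceSetup(workspace, timeout);

        // Execute agent task. "options" stands for the agent options derived
        // from the spec; their construction is not shown in this excerpt.
        AgentResponse response = workspaceModel.call(
            new AgentTaskRequest(spec.prompt(), workspace, options)
        );

        // Verify results
        return verifyAndReport(response, workspace, spec);
    }
}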

5.2. Workspace-Specific Configuration

The agent automatically configures itself for each workspace:

Working Directory - Set to benchmark workspace
Tool Configuration - Enable appropriate tools for the project
Timeout Management - Respect benchmark time limits
Resource Isolation - Prevent interference between benchmarks
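
One way to picture the working-directory and isolation points above is a helper that gives every run its own scratch directory and rejects paths that escape it. This is a sketch, not the runner's actual setup code: `WorkspaceFactory` and its methods are hypothetical, and the `bench-workspace-` prefix is borrowed from the sample paths in this document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: one scratch directory per benchmark run.
final class WorkspaceFactory {

    // Creates a fresh directory under the system temp dir; the JDK appends a
    // unique suffix, so two runs of the same benchmark never share a directory.
    static Path create(String benchmarkName) {
        String safe = benchmarkName.replaceAll("[^A-Za-z0-9_-]", "_");
        try {
            return Files.createTempDirectory("bench-workspace-" + safe + "-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when 'candidate', resolved against 'workspace', stays inside it.
    // Absolute paths and "../" escapes resolve outside the root and are rejected.
    static boolean isInside(Path workspace, Path candidate) {
        Path root = workspace.toAbsolutePath().normalize();
        Path target = root.resolve(candidate).normalize();
        return target.startsWith(root);
    }

    public static void main(String[] args) {
        Path workspace = create("fix failing tests");
        System.out.println("workspace: " + workspace);
        System.out.println("inside:  " + isInside(workspace, Path.of("src/Main.java")));
        System.out.println("escapes: " + !isInside(workspace, Path.of("../other-run")));
    }
}
```

The containment check is lexical only; it does not follow symbolic links, so it complements rather than replaces OS-level isolation.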

5.3. Logging and Monitoring

Each run logs the agent's configuration and the steps it takes, for example:

[INFO] CLAUDE - Initializing Claude Code agent
[INFO] CLAUDE - Model: claude-3-5-sonnet
[INFO] CLAUDE - Workspace: /tmp/bench-workspace-123
[INFO] CLAUDE - Tools enabled: [bash, editor, git]
[INFO] CLAUDE - Executing task: Fix failing tests
[INFO] CLAUDE - Step 1: Analyzing test failures
[INFO] CLAUDE - Step 2: Identifying root cause
[INFO] CLAUDE - Step 3: Implementing fix
[INFO] CLAUDE - Step 4: Running tests
[INFO] CLAUDE - Task completed successfully
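
If you post-process these logs, the step lines can be pulled out with a regular expression. The sketch below assumes the line layout shown in the sample output above, which is not a documented, stable format; `ClaudeLogLine` is a hypothetical helper.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines such as "[INFO] CLAUDE - Step 3: Implementing fix".
final class ClaudeLogLine {

    // Group 1: log level, group 2: step number, group 3: step description.
    private static final Pattern STEP =
        Pattern.compile("^\\[(\\w+)\\] CLAUDE - Step (\\d+): (.+)$");

    // Returns the step number, or empty if the line is not a step line.
    static Optional<Integer> stepNumber(String line) {
        Matcher m = STEP.matcher(line.trim());
        return m.matches() ? Optional.of(Integer.parseInt(m.group(2))) : Optional.empty();
    }

    // Returns the step description, or empty if the line is not a step line.
    static Optional<String> stepMessage(String line) {
        Matcher m = STEP.matcher(line.trim());
        return m.matches() ? Optional.of(m.group(3)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "[INFO] CLAUDE - Step 3: Implementing fix";
        System.out.println(stepNumber(line) + " " + stepMessage(line));
    }
}
```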

6. Best Practices

6.1. Prompt Engineering

Write clear, specific prompts:

# Good prompt
prompt: |
  This Spring Boot application has failing integration tests.

  Requirements:
  1. Fix the failing tests in UserControllerTest
  2. Ensure all tests pass with "./mvnw test"
  3. Do not modify the test logic, only fix the implementation
  4. Follow Spring Boot best practices

  The application uses:
  - Spring Boot 3.x
  - JPA with H2 database
  - Spring Security

# Poor prompt
prompt: "Fix the tests"
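
When many benchmarks share the same prompt shape, it can help to assemble the prompt from its parts so that no section is forgotten. The following sketch mirrors the layout of the good prompt above; `PromptBuilder` is a hypothetical helper, not an API of Spring AI Bench.

```java
import java.util.List;

// Hypothetical helper that assembles a structured prompt from its parts.
final class PromptBuilder {

    // Produces: summary, a numbered "Requirements:" list, and, when context
    // is non-empty, a bulleted "The application uses:" list.
    static String build(String summary, List<String> requirements, List<String> context) {
        StringBuilder sb = new StringBuilder(summary.trim()).append("\n\n");
        sb.append("Requirements:\n");
        for (int i = 0; i < requirements.size(); i++) {
            sb.append(i + 1).append(". ").append(requirements.get(i)).append('\n');
        }
        if (!context.isEmpty()) {
            sb.append("\nThe application uses:\n");
            for (String item : context) {
                sb.append("- ").append(item).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(
            "This Spring Boot application has failing integration tests.",
            List.of(
                "Fix the failing tests in UserControllerTest",
                "Ensure all tests pass with \"./mvnw test\"",
                "Do not modify the test logic, only fix the implementation"),
            List.of("Spring Boot 3.x", "JPA with H2 database", "Spring Security")));
    }
}
```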

6.2. Security Considerations

API Key Protection - Never commit API keys to repositories
Workspace Isolation - Use isolated workspaces for each benchmark
Tool Restrictions - Limit tool access based on benchmark requirements
Network Controls - Consider network isolation for security-sensitive benchmarks
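
Tool restrictions can be expressed as an allow-list applied to whatever a benchmark requests. The sketch below is one possible shape for such a check; `ToolPolicy` is hypothetical, and the tool names are the ones used in the configuration examples in this document.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical allow-list for agent tools.
final class ToolPolicy {

    private final Set<String> allowed;

    ToolPolicy(Set<String> allowed) {
        this.allowed = Set.copyOf(allowed);
    }

    // Requested tools that are on the allow-list, lower-cased, de-duplicated,
    // in the order they were requested.
    List<String> permitted(List<String> requested) {
        return filter(requested, true);
    }

    // Requested tools that are NOT on the allow-list, same normalization.
    List<String> rejected(List<String> requested) {
        return filter(requested, false);
    }

    private List<String> filter(List<String> requested, boolean keepAllowed) {
        Set<String> out = new LinkedHashSet<>();
        for (String tool : requested) {
            String name = tool.trim().toLowerCase(Locale.ROOT);
            if (allowed.contains(name) == keepAllowed) {
                out.add(name);
            }
        }
        return List.copyOf(out);
    }

    public static void main(String[] args) {
        ToolPolicy policy = new ToolPolicy(Set.of("bash", "editor"));
        List<String> requested = List.of("bash", "editor", "git");
        System.out.println("permitted: " + policy.permitted(requested));
        System.out.println("rejected:  " + policy.rejected(requested));
    }
}
```

Reporting the rejected tools, rather than silently dropping them, makes it easier to see why a benchmark behaves differently under a stricter policy.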

6.3. Performance Optimization

Model Selection - Use a model appropriate to the task's complexity
Timeout Configuration - Set realistic timeouts for complex tasks
Step Limits - Prevent infinite loops with max step limits
Resource Monitoring - Monitor CPU and memory usage
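
Step limits and timeouts work together: one bounds the number of actions, the other bounds wall-clock time. The sketch below shows the general pattern under stated assumptions; `BoundedLoop` is hypothetical and does not reflect how Claude Code or the runner enforces `max_steps` internally.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical loop guard combining a step budget with a deadline.
final class BoundedLoop {

    enum Outcome { SUCCEEDED, STEP_LIMIT_REACHED, TIMED_OUT }

    // Calls 'step' until it returns true, the step budget is spent, or the
    // deadline has passed. The deadline is checked between steps only, so a
    // step that is already running is not interrupted.
    static Outcome run(BooleanSupplier step, int maxSteps, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        for (int i = 0; i < maxSteps; i++) {
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            if (step.getAsBoolean()) {
                return Outcome.SUCCEEDED;
            }
        }
        return Outcome.STEP_LIMIT_REACHED;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Stand-in for "run the tests": succeeds on the third attempt.
        Outcome outcome = run(() -> ++attempts[0] == 3, 10, Duration.ofSeconds(30));
        System.out.println(outcome + " after " + attempts[0] + " attempts");
    }
}
```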

7. Troubleshooting

7.1. Common Issues

7.1.1. Authentication Errors

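# Check that the API key is set (without printing the secret)
[ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY is set" || echo "ANTHROPIC_API_KEY is not set"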

# Verify authentication
claude auth status

# Re-authenticate if needed
claude auth login

7.1.2. CLI Version Issues

# Update the Claude Code CLI
npm update -g @anthropic-ai/claude-code

# Confirm the installed version
claude --version

7.1.3. Workspace Permissions

# Check workspace permissions
ls -la /tmp/bench-workspace-*

# Fix permissions if needed (owner read/write; execute only on directories and already-executable files)
chmod -R u+rwX /tmp/bench-workspace-*

7.2. Debug Mode

Enable detailed debugging:

# Run with debug output
export CLAUDE_DEBUG=true
./mvnw test -Dtest=ClaudeCodeIntegrationTest

7.3. Log Analysis

Check Claude-specific logs:

# Find Claude logs (replace {run-id} with the ID of the run you are inspecting)
grep -i claude /tmp/bench-reports/{run-id}/run.log

# Check for Claude-related errors
grep -i error /tmp/bench-reports/{run-id}/run.log | grep -i claude
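
The same filtering can be done in Java, for example inside a test that asserts on a run's log. This is a sketch equivalent to the chained `grep -i` calls above; `RunLogFilter` is a hypothetical helper, and the lines in `main` are made-up sample input rather than real log output.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: case-insensitive "contains all keywords" filter.
final class RunLogFilter {

    // Keeps the lines that mention every keyword, ignoring case.
    static List<String> matching(List<String> lines, String... keywords) {
        return lines.stream()
            .filter(line -> {
                String lower = line.toLowerCase(Locale.ROOT);
                for (String keyword : keywords) {
                    if (!lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                        return false;
                    }
                }
                return true;
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> lines = List.of(
            "[INFO] CLAUDE - Step 1: Analyzing test failures",
            "[ERROR] CLAUDE - sample error line",
            "[ERROR] OTHER - unrelated sample line");
        matching(lines, "error", "claude").forEach(System.out::println);
    }
}
```

To run it against a real log, read the file with `Files.readAllLines(path)` and pass the result as `lines`.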

8. Next Steps

Gemini Agent - Alternative agent option
Custom Agents - Build your own agent integration
Writing Benchmarks - Create Claude Code-specific benchmarks