Claude Code Agent
Claude Code is Anthropic’s autonomous coding agent that can understand codebases, write code, and execute commands.
1. Overview
The Claude Code agent integration provides:
- Autonomous Development - End-to-end task completion without human intervention
- Command Execution - Full shell access with tool usage
- Codebase Understanding - Deep comprehension of complex projects
- MCP Tool Support - Model Context Protocol integration
3. Configuration
3.1. Basic Agent Specification
agent:
  kind: claude-code
  model: claude-3-5-sonnet
  autoApprove: true
  prompt: |
    Fix the failing JUnit tests in this project.
    Run "./mvnw test" until all tests pass.
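As a rough illustration of how these fields might be held on the Java side, the sketch below models the specification as a record and rejects specs that lack a kind or a prompt. The `AgentSpecSketch` name and its validation rules are assumptions made for this example; they are not the actual Spring AI Bench `AgentSpec` type.

```java
import java.util.Objects;

// Hypothetical stand-in for the real AgentSpec; field names mirror the YAML keys above.
record AgentSpecSketch(String kind, String model, boolean autoApprove, String prompt) {

    // Compact constructor: fail fast on missing required fields (assumed rules).
    AgentSpecSketch {
        Objects.requireNonNull(kind, "kind is required");
        Objects.requireNonNull(prompt, "prompt is required");
        if (kind.isBlank()) {
            throw new IllegalArgumentException("kind must not be blank");
        }
        if (prompt.isBlank()) {
            throw new IllegalArgumentException("prompt must not be blank");
        }
    }

    // Builds the same spec as the basic YAML example.
    static AgentSpecSketch basicExample() {
        return new AgentSpecSketch(
                "claude-code",
                "claude-3-5-sonnet",
                true,
                "Fix the failing JUnit tests in this project.\n"
                        + "Run \"./mvnw test\" until all tests pass.\n");
    }
}
```

Validating in the constructor means a malformed spec fails before any agent is started, rather than partway through a benchmark run.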
3.2. Advanced Configuration
agent:
  kind: claude-code
  model: claude-3-5-sonnet
  autoApprove: true
  genParams:
    max_tokens: 4096
    temperature: 0.1
  extras:
    yolo: true      # Skip permission prompts
    max_steps: 10   # Limit number of actions
    tools: ["bash", "editor", "git"]
  prompt: |
    This Java Spring Boot application has a security vulnerability.
    Tasks:
    1. Identify the security issue
    2. Fix the vulnerability
    3. Add appropriate tests
    4. Ensure all existing tests still pass
    5. Document the fix in CHANGELOG.md
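Because `extras` is a free-form map, code that consumes it has to cope with missing or mistyped values. The following is a minimal sketch of typed accessors with defaults. The `ExtrasSketch` class and its fallback behaviour are assumptions for illustration, not part of the documented integration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading the free-form "extras" map with typed defaults.
final class ExtrasSketch {

    private ExtrasSketch() {
    }

    // True only when "yolo" is present and is the boolean value true.
    static boolean yolo(Map<String, Object> extras) {
        Object value = extras.get("yolo");
        return value instanceof Boolean b && b;
    }

    // Falls back to the supplied default when "max_steps" is absent or not a number.
    static int maxSteps(Map<String, Object> extras, int defaultSteps) {
        Object value = extras.get("max_steps");
        return value instanceof Number n ? n.intValue() : defaultSteps;
    }

    // Keeps only string entries; returns an empty list when "tools" is missing.
    static List<String> tools(Map<String, Object> extras) {
        Object value = extras.get("tools");
        if (!(value instanceof List<?> list)) {
            return List.of();
        }
        return list.stream()
                .filter(String.class::isInstance)
                .map(String.class::cast)
                .toList();
    }
}
```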
4. Features
4.1. Autonomous Task Execution
Claude Code can execute multi-step workflows:
- Code Analysis - Understand existing codebases
- File Editing - Create, modify, and delete files
- Command Execution - Run build tools, tests, and utilities
- Git Operations - Commit changes and manage branches
5. Integration with Spring AI Bench
5.1. ClaudeCodeAgentRunner
The ClaudeCodeAgentRunner implements the AgentRunner interface and runs Claude Code against a benchmark workspace:
import java.nio.file.Path;
import java.time.Duration;

public class ClaudeCodeAgentRunner implements AgentRunner {

    private final ClaudeCodeAgentModel agentModel;
    private final SuccessVerifier verifier;

    // Constructor and verifyAndReport(...) are omitted from this excerpt.

    @Override
    public AgentResult run(Path workspace, AgentSpec spec, Duration timeout)
            throws Exception {
        // Configure agent for workspace
        ClaudeCodeAgentModel workspaceModel =
                ClaudeCodeAgentModel.createWithWorkspaceSetup(workspace, timeout);

        // Execute agent task ("options" is not defined in this excerpt)
        AgentResponse response = workspaceModel.call(
                new AgentTaskRequest(spec.prompt(), workspace, options)
        );

        // Verify results
        return verifyAndReport(response, workspace, spec);
    }
}
5.2. Workspace-Specific Configuration
The agent automatically configures itself for each workspace:
- Working Directory - Set to benchmark workspace
- Tool Configuration - Enable appropriate tools for the project
- Timeout Management - Respect benchmark time limits
- Resource Isolation - Prevent interference between benchmarks
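Two of these points, resource isolation and timeout management, can be sketched with the standard library alone. The example below gives each run a fresh temporary directory and enforces a hard time limit on a task. `WorkspaceSketch` and its methods are hypothetical and do not reflect how Spring AI Bench implements these guarantees internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: one fresh directory per benchmark plus a hard time limit.
final class WorkspaceSketch {

    private WorkspaceSketch() {
    }

    // Each call returns a new, empty directory so runs cannot see each other's files.
    static Path createIsolatedWorkspace() throws IOException {
        return Files.createTempDirectory("bench-workspace-");
    }

    // Runs the task on a worker thread and cancels it once the timeout elapses.
    static <T> T runWithTimeout(Callable<T> task, Duration timeout)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Cancellation here relies on thread interruption, so a task that ignores interrupts would keep running in the background. A real runner that launches external processes would also need to terminate those processes.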
5.3. Logging and Monitoring
The runner logs each stage of an agent run. Sample output:
[INFO] CLAUDE - Initializing Claude Code agent
[INFO] CLAUDE - Model: claude-3-5-sonnet
[INFO] CLAUDE - Workspace: /tmp/bench-workspace-123
[INFO] CLAUDE - Tools enabled: [bash, editor, git]
[INFO] CLAUDE - Executing task: Fix failing tests
[INFO] CLAUDE - Step 1: Analyzing test failures
[INFO] CLAUDE - Step 2: Identifying root cause
[INFO] CLAUDE - Step 3: Implementing fix
[INFO] CLAUDE - Step 4: Running tests
[INFO] CLAUDE - Task completed successfully
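If the log needs to be post-processed, for example to count how many steps a run took, the step lines can be extracted with a regular expression. The sketch below assumes the exact line format of the sample output above. The real log format may differ, and `StepLogSketch` is a hypothetical helper, not part of the integration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the "Step N: description" lines in the sample log.
final class StepLogSketch {

    // Matches e.g. "[INFO] CLAUDE - Step 3: Implementing fix"
    private static final Pattern STEP_LINE =
            Pattern.compile("^\\[INFO\\] CLAUDE - Step (\\d+): (.+)$");

    record Step(int number, String description) {
    }

    private StepLogSketch() {
    }

    // Returns the steps in the order they appear; non-step lines are ignored.
    static List<Step> parseSteps(List<String> logLines) {
        List<Step> steps = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STEP_LINE.matcher(line.strip());
            if (m.matches()) {
                steps.add(new Step(Integer.parseInt(m.group(1)), m.group(2)));
            }
        }
        return steps;
    }
}
```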
6. Best Practices
6.1. Prompt Engineering
Write clear, specific prompts:
# Good prompt
prompt: |
  This Spring Boot application has failing integration tests.
  Requirements:
  1. Fix the failing tests in UserControllerTest
  2. Ensure all tests pass with "./mvnw test"
  3. Do not modify the test logic, only fix the implementation
  4. Follow Spring Boot best practices
  The application uses:
  - Spring Boot 3.x
  - JPA with H2 database
  - Spring Security

# Poor prompt
prompt: "Fix the tests"
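When prompts are generated programmatically, the same structure (summary, numbered requirements, technology context) can be assembled from its parts. This is a minimal sketch. `PromptSketch` and its `build` method are hypothetical names introduced for this example.

```java
import java.util.List;

// Hypothetical helper that assembles a prompt in the shape of the "good" example.
final class PromptSketch {

    private PromptSketch() {
    }

    static String build(String summary, List<String> requirements, List<String> stack) {
        StringBuilder sb = new StringBuilder();
        sb.append(summary).append('\n');
        sb.append("Requirements:\n");
        // Number the requirements starting at 1.
        for (int i = 0; i < requirements.size(); i++) {
            sb.append(i + 1).append(". ").append(requirements.get(i)).append('\n');
        }
        sb.append("The application uses:\n");
        for (String item : stack) {
            sb.append("- ").append(item).append('\n');
        }
        return sb.toString();
    }
}
```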
6.2. Security Considerations
- API Key Protection - Never commit API keys to repositories
- Workspace Isolation - Use isolated workspaces for each benchmark
- Tool Restrictions - Limit tool access based on benchmark requirements
- Network Controls - Consider network isolation for security-sensitive benchmarks
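For API key protection, one common approach is to read the key from the environment at run time and mask it wherever it might be logged. The sketch below assumes the key is supplied through the `ANTHROPIC_API_KEY` environment variable. `ApiKeySketch` is a hypothetical helper, not part of the integration.

```java
import java.util.Optional;

// Hypothetical helper: read the key from the environment and never log it in full.
final class ApiKeySketch {

    private ApiKeySketch() {
    }

    // Returns the key if ANTHROPIC_API_KEY is set and non-blank.
    static Optional<String> fromEnvironment() {
        String value = System.getenv("ANTHROPIC_API_KEY");
        return (value == null || value.isBlank()) ? Optional.empty() : Optional.of(value);
    }

    // Shows only the last four characters, e.g. for diagnostic output.
    static String mask(String key) {
        if (key.length() <= 4) {
            return "****";
        }
        return "*".repeat(key.length() - 4) + key.substring(key.length() - 4);
    }
}
```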
7. Troubleshooting
7.1. Common Issues
7.1.1. Authentication Errors
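# Check that the API key is set (without printing its value)
[ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY is set" || echo "ANTHROPIC_API_KEY is not set"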
# Verify authentication
claude auth status
# Re-authenticate if needed
claude auth login
8. Next Steps
- Gemini Agent - Alternative agent option
- Custom Agents - Build your own agent integration
- Writing Benchmarks - Create Claude Code-specific benchmarks