Architecture
Spring AI Bench is a comprehensive execution framework for running AI agents in isolated environments with support for benchmarking, customization, and monitoring.
1. System Overview
┌─────────────────────────────────────────────────────────────────┐
│ Spring AI Bench │
├─────────────────┬───────────────────┬──────────────────────────┤
│ bench-core │ bench-agents │ bench-app │
│ │ │ │
│ • Execution │ • Agent Runners │ • CLI Interface │
│ • Sandboxes │ • Integration │ • Report Viewing │
│ • Verification │ • Auto-Config │ • Batch Processing │
│ • Specifications│ • Spring AI │ • Site Generation │
│ │ Agents │ │
└─────────────────┴───────────────────┴──────────────────────────┘
2. Core Architecture
2.1. Execution Framework
The system is built around a Sandbox abstraction that provides isolated execution environments:
Sandbox (interface)
├── LocalSandbox - Process exec implementation
├── DockerSandbox - TestContainers implementation
└── [Future: CloudSandbox - Distributed execution]
Key Components:
- ExecSpec - Command specification with timeout, environment variables, MCP config
- ExecResult - Execution results with exit codes, logs, duration
- TimeoutException - Timeout handling for long-running processes
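The relationship between these types can be sketched as follows. This is a minimal illustration, not the project's source: only `ExecSpec.of(...)`, `exitCode()` and `mergedLog()` appear in the usage section of this document. The remaining field names, the default timeout, and the `RecordingSandbox` stub are assumptions made for the example.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical shapes; the real classes live in bench-core and may differ.
record ExecSpec(List<String> command, Map<String, String> env, Duration timeout) {
    // Convenience factory: no extra environment, assumed one-minute timeout.
    static ExecSpec of(String... command) {
        return new ExecSpec(List.of(command), Map.of(), Duration.ofMinutes(1));
    }
}

record ExecResult(int exitCode, String mergedLog, Duration duration) {}

// Callers program against this interface, not against a concrete backend.
interface Sandbox extends AutoCloseable {
    ExecResult exec(ExecSpec spec);

    @Override
    void close(); // narrowed so try-with-resources needs no catch block
}

// Stub backend (an assumption for this sketch): records specs instead of running them.
final class RecordingSandbox implements Sandbox {
    final List<ExecSpec> seen = new ArrayList<>();

    @Override
    public ExecResult exec(ExecSpec spec) {
        seen.add(spec);
        return new ExecResult(0, String.join(" ", spec.command()), Duration.ZERO);
    }

    @Override
    public void close() {}
}
```

Because the stub satisfies the same interface as a real backend, code that drives a sandbox can be unit-tested without starting a process or a container.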
2.2. Execution Backends
2.2.1. Local Process Execution (LocalSandbox)
- Purpose: Execute commands in local processes within isolated directories
- Security: Directory isolation only; commands execute with the privileges of the JVM
- Features:
  - Customizable working directories
  - Environment variable support
  - Timeout handling
  - MCP (Model Context Protocol) integration
  - Automatic cleanup of temporary directories
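The mechanics listed above (temporary working directory, extra environment variables, timeout, cleanup) can be sketched with the JDK's `ProcessBuilder`. This is an illustration under stated assumptions, not the actual `LocalSandbox` code: the class name `LocalExecSketch`, the `Outcome` record, and the use of `java.util.concurrent.TimeoutException` as a stand-in for the project's own `TimeoutException` are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Stream;

final class LocalExecSketch {

    record Outcome(int exitCode, String mergedLog, Path workDir) {}

    static Outcome run(List<String> command, Map<String, String> env, Duration timeout)
            throws IOException, InterruptedException, TimeoutException {
        Path workDir = Files.createTempDirectory("bench-sandbox-"); // isolated directory
        Path log = Files.createTempFile("bench-log-", ".txt");      // kept outside workDir
        try {
            ProcessBuilder builder = new ProcessBuilder(command)
                    .directory(workDir.toFile())
                    .redirectErrorStream(true)      // stderr joins stdout
                    .redirectOutput(log.toFile());  // file sink, so the child never blocks on a full pipe
            builder.environment().putAll(env);      // added on top of the inherited environment
            Process process = builder.start();
            if (!process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                process.destroyForcibly();
                throw new TimeoutException("command exceeded " + timeout);
            }
            return new Outcome(process.exitValue(), Files.readString(log), workDir);
        } finally {
            deleteRecursively(workDir);
            Files.deleteIfExists(log);
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            for (Path path : paths.sorted(Comparator.reverseOrder()).toList()) {
                Files.deleteIfExists(path);
            }
        }
    }
}
```

Note that nothing here restricts what the child process can touch: the working directory is only a default location, which matches the "directory isolation only" caveat above.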
2.2.2. Docker/TestContainers (DockerSandbox)
- Purpose: Execute commands in Docker containers for strong isolation
- Features:
  - Uses TestContainers library (v1.21.0)
  - Long-lived containers with "sleep infinity" pattern
  - Multiple command executions within same container environment
  - Automatic container lifecycle management
  - Working directory: /work
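The "start once, exec many times" pattern can be illustrated with plain docker CLI argument lists. This sketch deliberately swaps TestContainers for the docker command line so that it stays dependency-free; it only builds the argument vectors and does not claim to mirror how `DockerSandbox` is written. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Builds docker CLI argument lists for a long-lived container that serves many commands.
final class DockerCommandSketch {
    static final String WORK_DIR = "/work";

    // Starts a detached container whose main process just sleeps, keeping it alive.
    static List<String> start(String image) {
        return List.of("docker", "run", "-d", "-w", WORK_DIR, image, "sleep", "infinity");
    }

    // Runs one command inside the already-running container.
    static List<String> exec(String containerId, List<String> command) {
        List<String> argv = new ArrayList<>(
                List.of("docker", "exec", "-w", WORK_DIR, containerId));
        argv.addAll(command);
        return List.copyOf(argv);
    }

    // Force-removes the container when the sandbox is closed.
    static List<String> remove(String containerId) {
        return List.of("docker", "rm", "-f", containerId);
    }
}
```

Since the container outlives each individual command, files written by one `exec` call under `/work` remain visible to the next one, which is what allows multi-step benchmark cases to share state.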
2.3. Customization Framework
ExecSpecCustomizer Pattern allows runtime modification of execution specifications:
- ExecSpecCustomizer (interface) - Base customization contract
- ClaudeCliCustomizer - Specialized for Claude CLI integration
  - Automatically injects MCP tools via the --tools flag
  - Transforms: ["claude-cli", "agent.py"] → ["claude-cli", "agent.py", "--tools=brave,filesystem"]
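The transformation above can be sketched as a small pure function. The types here are simplified stand-ins: `CommandSpec` replaces `ExecSpec`, and the `SpecCustomizer` signature and the rule "only touch commands that start with claude-cli" are assumptions for illustration rather than the documented contract.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for ExecSpec: just a command plus the MCP tool names.
record CommandSpec(List<String> command, List<String> mcpTools) {}

// Hypothetical contract: take a spec, return a (possibly) modified copy.
interface SpecCustomizer {
    CommandSpec customize(CommandSpec spec);
}

// Appends --tools=<a,b,...> when the command is a claude-cli invocation.
final class ClaudeToolsCustomizer implements SpecCustomizer {
    @Override
    public CommandSpec customize(CommandSpec spec) {
        boolean isClaude = !spec.command().isEmpty()
                && spec.command().get(0).equals("claude-cli");
        if (!isClaude || spec.mcpTools().isEmpty()) {
            return spec; // nothing to inject
        }
        List<String> command = new ArrayList<>(spec.command());
        command.add("--tools=" + String.join(",", spec.mcpTools()));
        return new CommandSpec(List.copyOf(command), spec.mcpTools());
    }
}
```

Returning a new spec instead of mutating the input keeps customizers composable: several can be applied in sequence without one observing another's partial changes.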
2.4. Agent Integration
2.4.1. Agent Implementations
Spring AI Bench currently supports:
- hello-world: Deterministic mock agent for infrastructure testing
- hello-world-ai: AI-powered agent via Spring AI Agents integration
  - Claude provider support
  - Gemini provider support
  - JBang launcher pattern
2.4.2. Spring AI Agents Integration
The integration with Spring AI Agents follows this pattern:
# spring-ai-bench → JBang → spring-ai-agents → AI provider
jbang /path/to/spring-ai-agents/jbang/launcher.java \
    hello-world-agent-ai \
    path=hello.txt \
    content="Hello World!" \
    provider=claude
Because the benchmark drives the exact CLI interface that end users would use, a passing benchmark is a good indicator of the end-user experience.
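A runner could assemble that invocation as an argument list, as in the sketch below. The helper name `JBangCommandSketch` and its signature are hypothetical; the launcher path is the same placeholder used in the shell example, and the only grounded detail is the `key=value` argument style.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class JBangCommandSketch {
    // Produces: jbang <launcher> <agentId> key=value ...
    // Pass an insertion-ordered map (e.g. LinkedHashMap) for a stable argument order.
    static List<String> build(String launcherPath, String agentId, Map<String, String> params) {
        List<String> argv = new ArrayList<>(List.of("jbang", launcherPath, agentId));
        params.forEach((key, value) -> argv.add(key + "=" + value));
        return List.copyOf(argv);
    }
}
```

When the list is handed to a process API as separate arguments rather than through a shell, a value such as `Hello World!` needs no quoting because it is never re-parsed.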
2.5. Benchmarking System
2.5.1. Benchmark Specifications
- BenchSpec - Top-level benchmark specification
- BenchCase - Individual benchmark case with:
  - ID, category ("coding", "project-mgmt", "version-upgrade")
  - Repository specification (RepoSpec)
  - Agent specification (AgentSpec)
  - Success criteria (SuccessSpec)
  - Timeout configuration
2.5.2. Agent Support
AgentSpec supports multiple agent types:
- "hello-world" - Deterministic mock agent
- "hello-world-ai" - AI-powered agent via Spring AI Agents
- Configurable models, prompts, generation parameters
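One way to picture how the specification types from 2.5.1 and 2.5.2 nest is as a set of Java records. The type names come from this document, but every field name below (`url`, `ref`, `kind`, `model`, `prompt`, `verifyCommand`) and the validation rules are assumptions made for the sketch, not the project's actual schema.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical field layouts; only the type names are taken from the document.
record RepoSpec(String url, String ref) {}

record AgentSpec(String kind, String model, String prompt) {}

record SuccessSpec(String verifyCommand) {}

record BenchCase(String id, String category, RepoSpec repo, AgentSpec agent,
                 SuccessSpec success, Duration timeout) {
    BenchCase {
        // Fail fast on specs that could never run.
        if (id == null || id.isBlank()) {
            throw new IllegalArgumentException("id is required");
        }
        if (timeout == null || timeout.isZero() || timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must be positive");
        }
    }
}

record BenchSpec(String name, List<BenchCase> cases) {
    BenchSpec {
        cases = List.copyOf(cases); // defensive, immutable copy
    }
}
```

A case for the AI-powered agent might then be built as `new BenchCase("hello-1", "coding", repo, new AgentSpec("hello-world-ai", "example-model", "Create hello.txt"), success, Duration.ofMinutes(5))`, where `"example-model"` is a placeholder rather than a real model identifier.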
2.5.3. Execution Harness
- BenchHarness - End-to-end benchmark execution
- AgentRunner - Agent execution interface
- HelloWorldAgentRunner - Deterministic implementation
- HelloWorldAIAgentRunner - AI-powered implementation
- SuccessVerifier - Validation of benchmark results (temporary implementation, evolving into the judge concept in spring-ai-agents)
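The division of labour between these components can be sketched as "prepare a workspace, run the agent, verify the result". The interface names `AgentRunner` and `SuccessVerifier` come from the list above, but their method signatures, the `FixedFileRunner` and `FileContentVerifier` classes, and the `HarnessSketch` flow are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical, simplified contracts; the real interfaces may take richer arguments.
interface AgentRunner {
    void run(Path workspace) throws IOException;
}

interface SuccessVerifier {
    boolean verify(Path workspace) throws IOException;
}

// Deterministic runner: always writes the same file, no model involved.
final class FixedFileRunner implements AgentRunner {
    @Override
    public void run(Path workspace) throws IOException {
        Files.writeString(workspace.resolve("hello.txt"), "Hello World!");
    }
}

// Passes only if the expected file exists with the expected content.
final class FileContentVerifier implements SuccessVerifier {
    @Override
    public boolean verify(Path workspace) throws IOException {
        Path file = workspace.resolve("hello.txt");
        return Files.exists(file) && Files.readString(file).equals("Hello World!");
    }
}

final class HarnessSketch {
    // Prepare workspace, run the agent, verify, then clean up regardless of outcome.
    static boolean runCase(AgentRunner runner, SuccessVerifier verifier) throws IOException {
        Path workspace = Files.createTempDirectory("bench-case-");
        try {
            runner.run(workspace);
            return verifier.verify(workspace);
        } finally {
            try (Stream<Path> paths = Files.walk(workspace)) {
                // Reverse order deletes files before the directory that holds them.
                for (Path path : paths.sorted(Comparator.reverseOrder()).toList()) {
                    Files.deleteIfExists(path);
                }
            }
        }
    }
}
```

Keeping verification separate from the runner means the same check applies unchanged whether the file was produced by the deterministic agent or by an AI-powered one.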
3. Integration Components
3.1. Spring Cloud Deployer
- SPI Integration: spring-cloud-deployer-spi (v2.9.5)
- Local Implementation: spring-cloud-deployer-local (v2.9.5)
- Purpose: Process management and distributed task execution
- Usage: LocalTaskLauncher for process orchestration
4. Key Design Decisions
4.1. Sandbox Abstraction
- Rationale: Support multiple execution environments (local, Docker, future cloud)
- Pattern: Interface-based design for extensibility
- Trade-offs: Abstraction overhead vs. flexibility
4.2. Merged Log Output
- Design: ExecResult combines stdout/stderr into mergedLog
- Rationale: Optimized for AI analysis; preserves temporal ordering
- Use Case: LLMs can analyze execution logs in chronological order
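One common way to obtain a chronologically ordered combined log in Java is `ProcessBuilder.redirectErrorStream(true)`, which sends stderr to the same destination as stdout. The sketch below shows that technique; whether the project produces `mergedLog` this way is an assumption, and `MergedLogSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class MergedLogSketch {
    // Runs a command with stderr redirected into stdout and returns the combined text.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // both streams share one pipe, so write order is kept
                .start();
        // Drain the pipe before waiting so a chatty process cannot block on a full buffer.
        String merged = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return merged;
    }
}
```

Capturing the two streams separately would lose the interleaving, for example which error message followed which progress line, and that ordering is the information the rationale above aims to preserve.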
5. Module Structure
spring-ai-bench/
├── bench-core/ # Core execution framework
│ ├── exec/ # Execution system (Sandbox, ExecSpec, etc.)
│ ├── spec/ # Benchmark specifications
│ ├── repo/ # Repository & workspace management
│ ├── run/ # Benchmark harness & execution
│ └── io/ # Configuration loading
├── bench-agents/ # Agent integration layer
│ ├── runner/ # Agent runners (Claude, Gemini, HelloWorld)
│ └── integration/ # Spring Boot auto-configuration
├── bench-app/ # Application CLI
├── bench-site/ # Static site generation
└── bench-tracks/ # Benchmark track definitions
└── hello-world/ # Hello world track (current)
6. Dependencies & Technology Stack
7. Development Timeline
September 2024 Implementation:
- Complete execution framework with sandbox isolation
- Spring AI Agents integration via JBang launcher
- Agent implementations (hello-world deterministic and AI-powered)
- Basic reporting and HTML generation
- Docker and local sandbox support
8. Testing Strategy
- Unit Tests: Individual component testing
- Integration Tests: End-to-end sandbox execution
- Smoke Tests: Basic functionality validation
- E2E Tests: Complete benchmark execution flows
9. Future Development Areas
9.1. Cloud Implementation
- Cloud-based sandbox implementations
- Auto-scaling execution clusters
- Distributed benchmark orchestration
- Cost optimization strategies
9.2. Enhanced Agent Support
- Additional agent integrations beyond current implementations
- Agent-specific optimizations and customizations
- Multi-agent benchmark scenarios
10. Getting Started
10.1. Prerequisites
- Java 17+
- Docker (for DockerSandbox)
- Maven 3.6+
- GitHub access token (for repository operations)
10.2. Basic Usage
// Local execution
try (var sandbox = LocalSandbox.builder().build()) {
    var spec = ExecSpec.of("echo", "Hello World");
    var result = sandbox.exec(spec);
    System.out.println("Exit code: " + result.exitCode());
    System.out.println("Output: " + result.mergedLog());
}

// Docker execution
try (var sandbox = new DockerSandbox("openjdk:17-jdk")) {
    var spec = ExecSpec.of("java", "-version");
    var result = sandbox.exec(spec);
    System.out.println("Java version: " + result.mergedLog());
}
This document reflects the current architecture as of September 2024.