Spring AI Agents

Table of Contents

1. What Is an Agent?
2. Why CLI Agents?
3. Key Features
4. Better Benchmarks for Java
- 4.1. The Benchmark Problem
- 4.2. Spring AI Bench
5. Agent Providers
6. Requirements
7. Getting Started
8. Documentation
9. Contributing
10. Resources

Spring AI Agents is the pragmatic integration layer for autonomous agents in Java enterprise development.

GitHub Repository: github.com/spring-ai-community/spring-ai-agents

1. What Is an Agent?

"An agent is AI-powered software that accomplishes a goal. Period." — Dharmesh Shah, HubSpot CTO and Agents.ai co-founder

At the core, every agent is software that pursues a goal. The common pattern is an LLM executing a loop: think → act (via tools) → observe → repeat until the goal is achieved. But building effective agents is far harder than this simple description suggests.

In Spring AI Agents, we model agents around these components:

Goals - Clear objectives that guide agent execution
Tools - Actions the agent can take (call APIs, run commands, read files)
Context - Information the agent needs to make decisions
Judges - Verification that the goal was achieved
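
To make the loop and the four components concrete, here is a minimal, self-contained sketch. It is not the Spring AI Agents API: `AgentLoopSketch`, `Action`, and `Planner` are hypothetical names, and a scripted planner stands in for the LLM so the example runs without an API key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Illustrative sketch only: every type here is hypothetical, not the Spring AI Agents API. */
final class AgentLoopSketch {

    /** What the "model" decides to do next: which tool to call and with what input. */
    record Action(String tool, String input) {}

    /** Stand-in for an LLM: given the goal and the observations so far, pick the next action. */
    interface Planner {
        Action next(String goal, List<String> observations);
    }

    /**
     * think -> act -> observe -> repeat, until the judge accepts or the step budget runs out.
     * Returns the observations (the accumulated context) collected along the way.
     */
    static List<String> run(String goal,
                            Planner planner,
                            Map<String, Function<String, String>> tools,
                            Predicate<List<String>> judge,
                            int maxSteps) {
        List<String> observations = new ArrayList<>();
        for (int step = 0; step < maxSteps && !judge.test(observations); step++) {
            Action action = planner.next(goal, observations);          // think
            Function<String, String> tool = tools.get(action.tool());
            String result = (tool == null)
                    ? "unknown tool: " + action.tool()
                    : tool.apply(action.input());                       // act
            observations.add(result);                                   // observe
        }
        return observations;
    }

    public static void main(String[] args) {
        // Scripted planner: read a file first, then run the tests.
        Planner scripted = (goal, obs) -> obs.isEmpty()
                ? new Action("read", "pom.xml")
                : new Action("test", "");
        Map<String, Function<String, String>> tools = Map.of(
                "read", path -> "read " + path,
                "test", ignored -> "tests passed");
        List<String> trace = run("make the build green", scripted, tools,
                obs -> obs.contains("tests passed"), 10);
        System.out.println(trace); // [read pom.xml, tests passed]
    }
}
```

The step budget matters as much as the judge: without `maxSteps`, a planner that never satisfies the judge would loop forever.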

1.1. The Paradigm Shift

After I spent hundreds of hours using agentic CLI tools and talking with engineers at Google, Amazon, and Netflix, a clear pattern emerged: these tools are incredibly effective at their jobs. The real paradigm shift isn’t any single feature; it is capability moving into the models themselves. As models evolved from completions → function calling → reasoning → planning, and as protocols like MCP (Model Context Protocol, named that for a reason) standardized and enriched tool and context capabilities, much of the scaffolding we built to compensate for weaker models became unnecessary. We’ve reached a new tipping point with these tools.

Before reasoning models, we built harnesses and scaffolding—complex multi-step systems to coax capabilities from weaker models. We coded workflows step-by-step because models couldn’t plan. Reasoning models now handle what used to require elaborate client-side engineering: planning which steps to take and in what order, capabilities that traditionally belonged to workflow engines in application code.

The shift: from imperative (code every workflow step) to declarative (describe the goal and let the model plan the steps). What remains critical: context engineering and tool design.

"Before the reasoning models emerged, there was all of this work that went into engineering these agentic systems that made a lot of calls to GPT-4… to get reasoning behavior. And then it turns out… we just created reasoning models and you don’t need this complex behavior. In fact, in many ways it makes it worse."

"There are a lot of things that people are building right now that will eventually be washed away by scale."

— Noam Brown, OpenAI Research Lead (Latent Space podcast)

Agentic Search > Semantic Search

"Semantic search is usually faster than agentic search, but less accurate, more difficult to maintain, and less transparent… we suggest starting with agentic search, and only adding semantic search if you need faster results."

— Anthropic, Building Agents with the Claude Agent SDK

"I actually found that llms.txt with good descriptions… just that passed to the code agent, with a simple tool just to grab files, is extremely effective… I actually personally don’t do vector store indexing."

— Lance Martin, LangChain (Latent Space podcast)

This upends the traditional RAG pattern: simple file-based search with agent tools often outperforms complex vector indexing.
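
As a rough illustration of how small a file-based search tool can be, the sketch below implements a grep-style lookup that an agent could call repeatedly while it narrows in on relevant files. `FileSearchTool` and its methods are hypothetical names for this example, not part of any SDK mentioned here, and the demo files are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Illustrative sketch only: a hypothetical grep-style tool, not part of any SDK. */
final class FileSearchTool {

    /** Returns "relative/path:lineNumber: line" for each line containing the keyword (case-insensitive). */
    static List<String> grep(Path root, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).sorted().toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            List<String> lines;
            try {
                lines = Files.readAllLines(file);
            } catch (IOException e) {
                continue; // skip binary or unreadable files
            }
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(root.relativize(file) + ":" + (i + 1) + ": " + lines.get(i).strip());
                }
            }
        }
        return hits;
    }

    /** Builds a throwaway directory with two made-up files and searches it. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("agentic-search");
            Files.writeString(dir.resolve("a.md"), "Build with Maven\nRun the tests\n");
            Files.writeString(dir.resolve("b.txt"), "coverage is low\nadd tests for GreetingController\n");
            return grep(dir, "tests");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

There is no index to build or keep in sync, and every hit carries a path and line number the agent (or a human reviewer) can check directly.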

Spring AI Agents leans into this direction: trust the model to plan and execute, validate through benchmarking.

1.2. The Journey: From Building to Using

I started wrapping the Claude Code CLI myself. Then I discovered Anthropic had created the Claude Code SDK (now Claude Agent SDK)—a Python wrapper providing a clean interface to their CLI tool. Similar SDKs emerged from Google (Gemini CLI), Amazon (Q Developer), and others.

As Anthropic explains in their blog:

The Claude Agent SDK enables developers to build powerful, flexible agents by giving Claude access to a computer where it can write files, run commands, and iterate on its work.

— Anthropic Engineering Blog

The realization: You can build custom agents with Spring AI’s @Tool annotations and MCP support. Mini-swe-agent proves a simple "LLM in a loop" works. And we will build custom agents for domain-specific needs.

But why reinvent? Building effective agents is hard. You’re solving problems that Anthropic, Google, and OpenAI invest heavily in: context management, error recovery, planning, tool selection, performance optimization. Why not leverage that R&D?

1.3. The Spring AI Agents Approach

The pattern looked familiar: Just like database access before JDBC—many powerful tools doing similar things, but all slightly different. Spring AI solved this for LLM completions with ChatClient, providing portability and a higher-level developer experience.

Spring AI Agents applies the same principle to autonomous agents:

Agent SDK portability layer - Java wrappers for leading agentic CLI tools: Claude Agent SDK, Gemini CLI Agent SDK, Amp CLI SDK, OpenAI Codex CLI SDK, Amazon Q Developer CLI SDK, mini-swe-agent, with planned support for Goose and GitHub Copilot Agent
Familiar Spring AI patterns - AgentClient API following ChatClient design
Advisor pattern - Extend agent behavior with context and judges as advisors, just like ChatClient
Agent sandbox - Isolated execution
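
For readers unfamiliar with the advisor pattern, the following self-contained sketch shows the general idea: advisors wrap an agent call, so one can enrich the goal with context before the call and another can judge the result afterwards. The types `AgentCall` and `Advisor` are hypothetical stand-ins and deliberately do not mirror the real AgentClient interfaces.

```java
import java.util.List;

/** Illustrative sketch only: hypothetical types, not the AgentClient or ChatClient API. */
final class AdvisorChainSketch {

    /** Any provider-specific agent, reduced to "take a goal, return a result". */
    interface AgentCall {
        String run(String goal);
    }

    /** An advisor may change the goal before the call and inspect the result after it. */
    interface Advisor {
        String around(String goal, AgentCall next);
    }

    /** Wraps the agent so that the first advisor in the list is the outermost one. */
    static AgentCall chain(AgentCall agent, List<Advisor> advisors) {
        AgentCall call = agent;
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advisor advisor = advisors.get(i);
            AgentCall next = call;
            call = goal -> advisor.around(goal, next);
        }
        return call;
    }

    public static void main(String[] args) {
        // Stub agent: a real implementation would delegate to a CLI agent.
        AgentCall agent = goal -> "done: " + goal;
        // Context advisor: adds information the agent needs.
        Advisor context = (goal, next) -> next.run(goal + " [context: Maven project]");
        // Judge advisor: verifies the outcome after the agent returns.
        Advisor judge = (goal, next) -> {
            String result = next.run(goal);
            return (result.startsWith("done") ? "PASS: " : "FAIL: ") + result;
        };
        System.out.println(chain(agent, List.of(judge, context)).run("fix build"));
        // PASS: done: fix build [context: Maven project]
    }
}
```

Because the agent sits behind a single narrow interface, swapping one provider for another leaves the advisors untouched, which is the portability argument in miniature.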

Example - Declarative agent execution:

CoverageJudge judge = new CoverageJudge(80.0);  // (1)

AgentClientResponse response = agentClient  // (2)
    .goal("Increase JaCoCo test coverage to 80%")  // (3)
    .workingDirectory(projectRoot)  // (4)
    .advisors(JudgeAdvisor.builder().judge(judge).build())  // (5)
    .run();  // (6)

// Real results: 0% → 71.4% coverage in 6 minutes

1	Judge - Automated verification of coverage target
2	Start with `AgentClient` instance (auto-configured by Spring Boot)
3	Goal - What you want to accomplish (the "what", not the "how")
4	Working directory - Where the agent executes (sandbox isolation)
5	Verification - JudgeAdvisor checks whether the 80% coverage target was reached
6	Execute - Run autonomously until the goal is achieved

Declarative approach: You describe the goal and provide context. The LLM plans the workflow, decides which tools to use, and adapts when things go wrong. No coding workflows, no predefined steps—just the goal and context.

The code coverage agent increased test coverage from 0% to 71.4% on Spring’s gs-rest-service tutorial. Claude Code followed all Spring WebMVC best practices (@WebMvcTest, jsonPath(), AssertJ) while Gemini achieved the same coverage but used slower patterns (@SpringBootTest). Same coverage, different quality—model choice matters for enterprise standards.
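
A deterministic judge for a coverage goal can be quite small. The sketch below is one possible shape, assuming the judge reads a JaCoCo CSV report and compares overall line coverage with a threshold; it is not the library's `CoverageJudge`, whose internals are not shown on this page. The class name `CoverageJudgeSketch` and the sample rows are made up, with numbers chosen to land near the 71.4% figure above.

```java
import java.util.List;

/** Illustrative sketch only: a hypothetical judge, not the library's CoverageJudge. */
final class CoverageJudgeSketch {

    /** Computes overall line coverage (0-100) from the lines of a JaCoCo CSV report. */
    static double lineCoverage(List<String> csvLines) {
        List<String> header = List.of(csvLines.get(0).split(","));
        int missedCol = header.indexOf("LINE_MISSED");
        int coveredCol = header.indexOf("LINE_COVERED");
        if (missedCol < 0 || coveredCol < 0) {
            throw new IllegalArgumentException("not a JaCoCo CSV report: missing LINE_* columns");
        }
        long missed = 0;
        long covered = 0;
        for (String row : csvLines.subList(1, csvLines.size())) {
            if (row.isBlank()) {
                continue;
            }
            // Assumes no commas inside cell values.
            String[] cells = row.split(",");
            missed += Long.parseLong(cells[missedCol]);
            covered += Long.parseLong(cells[coveredCol]);
        }
        long total = missed + covered;
        return total == 0 ? 0.0 : 100.0 * covered / total;
    }

    /** The verdict: did the agent reach the target? */
    static boolean passes(List<String> csvLines, double thresholdPercent) {
        return lineCoverage(csvLines) >= thresholdPercent;
    }

    public static void main(String[] args) {
        // Made-up report rows for illustration: 5 of 7 lines covered in total.
        List<String> report = List.of(
                "GROUP,PACKAGE,CLASS,INSTRUCTION_MISSED,INSTRUCTION_COVERED,BRANCH_MISSED,BRANCH_COVERED,"
                        + "LINE_MISSED,LINE_COVERED,COMPLEXITY_MISSED,COMPLEXITY_COVERED,METHOD_MISSED,METHOD_COVERED",
                "demo,com.example,Greeting,0,9,0,0,0,3,0,3,0,3",
                "demo,com.example,GreetingController,4,11,0,0,2,2,1,1,1,1");
        System.out.printf("coverage=%.1f%% passes80=%b%n", lineCoverage(report), passes(report, 80.0));
    }
}
```

A judge like this says nothing about test quality, which is why the comparison above between @WebMvcTest and @SpringBootTest still needs a human or an AI-powered judge.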

Or run agents directly with JBang - no build required:

jbang agents@springai coverage target_coverage=80

Zero setup - the agent runs on your local codebase, pulls context as needed, and achieves the goal. Once you see it working, tweak the configuration or create your own agents. [1]

See the Getting Started guide for complete examples.

2. Why CLI Agents?

Spring AI Agents focuses specifically on autonomous CLI agents - agents that execute goals by directly interacting with your computer through command-line interfaces.

CLI agents are uniquely effective because they:

Manage context through the file system - Write intermediate state to files, read when needed, avoiding context window limitations (see Context Engineering)
Execute bash commands - Run builds, tests, searches—anything you can type in a terminal
Iterate autonomously - Keep working until the goal is achieved, no human intervention required
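
The first point, using the file system as working memory, can be sketched in a few lines. The example below is a hypothetical scratchpad (the class `ScratchpadSketch` is invented for illustration): the agent appends progress notes to a file and later reads back only the most recent ones, so the prompt stays small while the full history remains on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Illustrative sketch only: a hypothetical file-backed scratchpad for agent notes. */
final class ScratchpadSketch {

    private final Path file;

    ScratchpadSketch(Path file) {
        this.file = file;
    }

    /** Appends one note as a new line, creating the file on first use. */
    void append(String note) {
        try {
            Files.writeString(file, note + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns at most the last n notes, which is what would be fed back into the prompt. */
    List<String> tail(int n) {
        try {
            if (!Files.exists(file)) {
                return List.of();
            }
            List<String> all = Files.readAllLines(file);
            return List.copyOf(all.subList(Math.max(0, all.size() - n), all.size()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes three made-up notes to a temp file and reads back the last two. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("agent-notes");
            ScratchpadSketch pad = new ScratchpadSketch(dir.resolve("progress.md"));
            pad.append("step 1: ran mvn test, 3 failures");
            pad.append("step 2: fixed GreetingControllerTest");
            pad.append("step 3: 1 failure left in AppTest");
            return pad.tail(2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```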

Human-in-the-Loop vs Autonomous: Chatbots like ChatGPT and code completion tools like Copilot excel at exploration and pair programming. Autonomous CLI agents excel at executing well-defined goals end-to-end without human intervention. Different tools for different needs.

The space is evolving. Both paths coexist: use agentic CLI tools (like Claude Agent SDK, Gemini CLI, Amp) for general development tasks, or build custom agents with Spring AI’s @Tool/MCP for specialized needs. Leading companies invest heavily in context engineering, planning strategies, and continuous model improvements—Spring AI Agents lets you leverage that R&D while maintaining flexibility to build custom solutions when appropriate.

Spring AI Agents makes autonomous agents as easy to use in Spring Boot as ChatClient is for conversational AI.

3. Key Features

Zero-Setup Quick Start - Try agents via JBang catalog without cloning or building
ChatClient-style API - Same fluent patterns Spring developers already know
JBang Agent Runner - Primary developer entry point for trying agents locally with LocalSandbox
Multiple agent providers - Claude Agent SDK, Gemini CLI, Amp, mini-swe-agent, Amazon Q Developer, and OpenAI Codex support (more to come!)
Fluent API design - Clean, intuitive interface following Spring patterns
Spring Boot ready - Auto-configuration and dependency injection support
Production essentials - Built-in error handling, timeouts, and metadata
Evaluation-first design - Judge API for deterministic and AI-powered verification

4. Better Benchmarks for Java

How do you know if your agent is effective?

The agent ecosystem has a Python bias. Most benchmarks, research, and tooling assume Python workflows. But enterprise software development is multi-language, and Java remains the backbone of mission-critical systems.

4.1. The Benchmark Problem

SWE-bench: Python-centric, curated dataset with inflated scores
SWE-bench-Live: More realistic fresh issues—scores drop significantly
Multi-SWE-bench & SWE-PolyBench (2025): Added Java, revealed Python bias—Java agents score lower not because they’re worse, but because benchmarks don’t reflect Java workflows

For a detailed analysis of these benchmarking issues, see the Spring AI Bench documentation.

4.2. Spring AI Bench

We’re building Spring AI Bench, an open-source benchmark suite for Java that evaluates agents on goal-directed, enterprise workflows. It follows Stanford’s BetterBench principles for reproducibility and contamination resistance.

Spring AI Bench and Spring AI Agents work hand-in-hand: Spring AI Agents provides the integration layer, making it easy to run different agents (Claude, Gemini, Amp, custom solutions). Spring AI Bench provides the measurement framework, evaluating agents across multiple dimensions.

Philosophy: Let the best agent per use case win. Benchmark ALL approaches—annotation-based tools, CLI agents, custom solutions—and measure what actually matters.

As Dharmesh Shah frames it on the Latent Space podcast, evaluating agents is like hiring for a job: effectiveness depends on your specific constraints and goals. Spring AI Bench measures across multiple axes:

Objective metrics:

Success rate - Can it achieve the goal?
Cost - Token usage, API costs
Speed - Execution time, latency
Reliability - Consistency across runs

Qualitative factors:

Quality vs. cost tradeoff - Is the premium model worth it for this task?
Time-to-value - How quickly does it deliver results?
Workflow fit - Does it integrate cleanly into your process?

Different scenarios optimize for different combinations:

Fastest at least cost - Routine tasks, CI/CD automation
Highest quality regardless of cost - Critical migrations, security audits
Balanced tradeoffs - Most development tasks
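
To show how the objective metrics could feed these scenarios, here is a small sketch that aggregates repeated runs per agent and then picks a winner under two different policies. It is not Spring AI Bench code: the record names, the selection rules, and the sample numbers are all assumptions made for illustration, and the agents are deliberately anonymous.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative sketch only: hypothetical types and made-up data, not Spring AI Bench. */
final class BenchScoreSketch {

    record Run(String agent, boolean success, double costUsd, Duration duration) {}

    record Summary(String agent, double successRate, double meanCostUsd, double meanSeconds) {}

    /** Aggregates repeated runs into one summary per agent, ordered by agent name. */
    static List<Summary> summarize(List<Run> runs) {
        Map<String, List<Run>> byAgent = runs.stream()
                .collect(Collectors.groupingBy(Run::agent, TreeMap::new, Collectors.toList()));
        List<Summary> summaries = new ArrayList<>();
        for (Map.Entry<String, List<Run>> entry : byAgent.entrySet()) {
            List<Run> agentRuns = entry.getValue();
            double successRate = agentRuns.stream().filter(Run::success).count() / (double) agentRuns.size();
            double meanCost = agentRuns.stream().mapToDouble(Run::costUsd).average().orElse(0.0);
            double meanSeconds = agentRuns.stream().mapToLong(r -> r.duration().toSeconds()).average().orElse(0.0);
            summaries.add(new Summary(entry.getKey(), successRate, meanCost, meanSeconds));
        }
        return summaries;
    }

    /** "Fastest at least cost": cheapest agent that clears a minimum success rate; ties go to the faster one. */
    static String cheapestReliable(List<Summary> summaries, double minSuccessRate) {
        return summaries.stream()
                .filter(s -> s.successRate() >= minSuccessRate)
                .min(Comparator.comparingDouble(Summary::meanCostUsd)
                        .thenComparingDouble(Summary::meanSeconds))
                .map(Summary::agent)
                .orElse("none");
    }

    /** "Highest quality regardless of cost": the agent with the best success rate. */
    static String highestQuality(List<Summary> summaries) {
        return summaries.stream()
                .max(Comparator.comparingDouble(Summary::successRate))
                .map(Summary::agent)
                .orElse("none");
    }

    /** Made-up runs: agent-a is cheap but fails one run in three, agent-b is pricier but always succeeds. */
    static List<Run> sampleRuns() {
        return List.of(
                new Run("agent-a", true, 0.10, Duration.ofSeconds(60)),
                new Run("agent-a", true, 0.12, Duration.ofSeconds(80)),
                new Run("agent-a", false, 0.08, Duration.ofSeconds(40)),
                new Run("agent-b", true, 0.50, Duration.ofSeconds(120)),
                new Run("agent-b", true, 0.60, Duration.ofSeconds(180)),
                new Run("agent-b", true, 0.40, Duration.ofSeconds(150)));
    }

    public static void main(String[] args) {
        List<Summary> summaries = summarize(sampleRuns());
        System.out.println("routine tasks:  " + cheapestReliable(summaries, 0.6)); // agent-a
        System.out.println("critical tasks: " + highestQuality(summaries));        // agent-b
    }
}
```

The same data produces different winners depending on the policy, which is the point made above: effectiveness depends on the constraints of the scenario.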

We’ll learn which agent wins for which scenario. That’s the point of benchmarking.

5. Agent Providers

Spring AI Agents provides Java integration for leading autonomous agentic CLI tools:

Provider	Status	Description
Claude Agent SDK	✅ Available	Agent SDK for Anthropic’s autonomous coding agent. Renamed from Claude Code SDK (Sept 2025) to reflect broader applications beyond coding.
Gemini CLI Agent SDK	✅ Available	Agent SDK for Google’s command-line coding agent with multimodal capabilities.
Amp CLI	✅ Available	Agent SDK for Sourcegraph’s autonomous coding agent. Full-featured CLI tool for code generation, refactoring, and debugging.
mini-swe-agent	✅ Available	Agent SDK for a lightweight 100-line autonomous agent used for benchmarking. A simpler alternative to the original SWE-agent (thousands of lines of Python).
Goose	🚧 Planned	Agent SDK for Block’s open-source extensible AI agent. Runs locally, automates engineering tasks from start to finish, builds entire projects autonomously.
GitHub Copilot Agent	🚧 Planned	Agent SDK for GitHub’s autonomous coding agent. Assign issues to Copilot and it creates PRs autonomously in a GitHub Actions environment.
Amazon Q Developer	✅ Available	Agent SDK for AWS’s autonomous /dev agent. Multi-file implementation with natural language, autonomous planning and execution across codebases.
OpenAI Codex	✅ Available	Agent SDK for OpenAI’s GPT-5-Codex optimized for agentic coding. Handles both quick sessions and long autonomous tasks.

6. Requirements

Java 17 or higher
Maven 3.6.3 or higher
Agent CLI tools installed (Claude, Gemini, Amp, etc.)
Valid API keys for your chosen providers

7. Getting Started

Get started using Spring AI Agents by following our Getting Started guide.

8. Documentation

JBang Agent Runner - Primary developer entry point for trying agents locally
AgentClient API - Learn the core API for running autonomous tasks
AgentClient vs ChatClient - See how AgentClient follows ChatClient patterns
Claude Agent SDK
Gemini CLI Agent SDK
Amp Agent SDK
Codex Agent SDK
Amazon Q Developer Agent SDK
Sample Agents - Real-world agent examples and patterns

9. Contributing

We welcome contributions to Spring AI Agents! Please see our Contribution Guidelines for more information on how to get involved.

10. Resources

Spring AI Agents
- GitHub: github.com/spring-ai-community/spring-ai-agents
- Documentation: This site
Spring AI Bench
- GitHub: github.com/spring-ai-community/spring-ai-bench
- Documentation: spring-ai-bench documentation

[1] The code coverage agent is coming soon to the JBang catalog.