JudgeAdvisor: Integrating Judges with AgentClient
JudgeAdvisor is the bridge between the Judge API and AgentClient. It integrates automated evaluation into the agent execution pipeline, allowing you to verify agent success programmatically.
1. Overview
JudgeAdvisor is an implementation of the AgentAdvisor interface that executes judges after agent task completion. It intercepts the agent response, evaluates it using the configured judge, and attaches the judgment to the response.
1. Agent executes task
↓
2. JudgeAdvisor intercepts response
↓
3. Judge evaluates result
↓
4. Judgment attached to response
↓
5. Your code checks judgment.pass()
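The flow above can be sketched without the library. The types below (Judgment, Response, Judge, withJudge) are hypothetical stand-ins written for this illustration, not the actual JudgeAdvisor implementation; they only show the shape of "run the agent, evaluate, attach the result".

```java
import java.util.function.Function;

// Hypothetical stand-ins for illustration; not the library's actual types.
class JudgeFlowSketch {
    record Judgment(boolean pass, String reasoning) {}
    record Response(String output, Judgment judgment) {}

    /** A judge maps the agent's output to a judgment. */
    interface Judge { Judgment judge(String output); }

    /** Wraps an agent call: run the agent, evaluate the result, attach the judgment. */
    static Function<String, Response> withJudge(Function<String, String> agent, Judge judge) {
        return goal -> {
            String output = agent.apply(goal);        // 1. agent executes the task
            Judgment judgment = judge.judge(output);  // 2-3. response is intercepted and evaluated
            return new Response(output, judgment);    // 4. judgment is attached to the response
        };
    }

    public static void main(String[] args) {
        Function<String, String> agent = goal -> "created output.txt";
        Judge judge = out -> new Judgment(out.contains("output.txt"), "output mentions output.txt");
        Response response = withJudge(agent, judge).apply("Create output.txt");
        // 5. the caller checks the attached judgment
        System.out.println(response.judgment().pass() + ": " + response.judgment().reasoning());
    }
}
```

The caller never invokes the judge directly; it only reads the judgment from the response, which is the same contract the examples in this page rely on.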
2. Basic Usage
2.1. Single Judge
The simplest pattern uses one judge:
import java.nio.file.Path;
import org.springaicommunity.agents.advisors.judge.JudgeAdvisor;
import org.springaicommunity.agents.judge.fs.FileExistsJudge;
import org.springaicommunity.agents.judge.result.Judgment;
// Create a judge
Judge fileJudge = new FileExistsJudge("output.txt");
// Attach to agent execution
AgentClientResponse response = agentClientBuilder
.goal("Create output.txt with system metrics")
.workingDirectory(Path.of("/tmp/reports"))
.advisors(JudgeAdvisor.builder()
.judge(fileJudge)
.build())
.call();
// Check judgment
Judgment judgment = response.getJudgment();
if (judgment.pass()) {
System.out.println("✓ Success: " + judgment.reasoning());
} else {
System.out.println("✗ Failed: " + judgment.reasoning());
}
2.2. Multiple Judges
You can attach multiple JudgeAdvisor instances for comprehensive evaluation:
AgentClientResponse response = agentClientBuilder
.goal("Build project and generate documentation")
.workingDirectory(projectRoot)
.advisors(
// Judge 1: Verify build
JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build(),
// Judge 2: Verify documentation
JudgeAdvisor.builder()
.judge(new FileExistsJudge("docs/README.md"))
.build(),
// Judge 3: Verify quality
JudgeAdvisor.builder()
.judge(new CorrectnessJudge(chatClient))
.build()
)
.call();
// Get the final judgment (from the last advisor only)
Judgment judgment = response.getJudgment();
boolean lastJudgePassed = judgment.pass();
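When using multiple JudgeAdvisor instances, each judge runs, but only the last judgment is accessible via response.getJudgment(). For ensemble evaluation where you want to aggregate multiple judgments into a single verdict, use Jury instead.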
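Because response.getJudgment() exposes a single judgment, one way to make that judgment reflect every check is to combine the judges before attaching them. The sketch below is a minimal illustration with hypothetical stand-in types (Judgment, Judge, allOf); it is not part of the library and is not a substitute for Jury.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the library's actual types.
class AllOfJudgeSketch {
    record Judgment(boolean pass, String reasoning) {}
    interface Judge { Judgment judge(String output); }

    /** Combines judges so that one judgment passes only if every judge passes. */
    static Judge allOf(List<Judge> judges) {
        return output -> {
            List<String> failures = new ArrayList<>();
            for (Judge judge : judges) {
                Judgment result = judge.judge(output);
                if (!result.pass()) {
                    failures.add(result.reasoning());  // keep every failure reason
                }
            }
            return failures.isEmpty()
                ? new Judgment(true, "all " + judges.size() + " checks passed")
                : new Judgment(false, String.join("; ", failures));
        };
    }

    public static void main(String[] args) {
        Judge build = out -> new Judgment(true, "build ok");
        Judge docs = out -> new Judgment(false, "docs/README.md missing");
        System.out.println(allOf(List.of(build, docs)).judge("agent output"));
    }
}
```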
3. JudgeAdvisor Builder
The JudgeAdvisor.Builder provides a fluent API for configuration:
JudgeAdvisor advisor = JudgeAdvisor.builder()
.judge(myJudge) // Required
.order(100) // Optional: execution order
.name("build-verification") // Optional: advisor name
.build();
3.1. Configuration Options
Method | Purpose | Default |
---|---|---|
judge(...) | Required. The judge to execute | None (required) |
order(...) | Execution order relative to other advisors | Ordered.LOWEST_PRECEDENCE - 100 |
name(...) | Human-readable advisor name | |
3.2. Execution Order
JudgeAdvisor executes after the agent task completes. The order() value controls when it runs relative to other advisors:
import org.springframework.core.Ordered;
// Run early (higher priority)
JudgeAdvisor earlyJudge = JudgeAdvisor.builder()
.judge(new FileExistsJudge("config.yml"))
.order(Ordered.HIGHEST_PRECEDENCE + 100)
.build();
// Run late (lower priority)
JudgeAdvisor lateJudge = JudgeAdvisor.builder()
.judge(new CorrectnessJudge(chatClient))
.order(Ordered.LOWEST_PRECEDENCE - 100)
.build();
Default order: Ordered.LOWEST_PRECEDENCE - 100 (runs near the end of the advisor chain)
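In Spring's Ordered contract, a lower numeric value means higher precedence: HIGHEST_PRECEDENCE is Integer.MIN_VALUE and LOWEST_PRECEDENCE is Integer.MAX_VALUE. The sketch below assumes a chain that simply sorts advisors by ascending order value; how the library applies the order around the agent call is not stated here, and NamedAdvisor and executionOrder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of order-based sorting; not the library's advisor chain.
class AdvisorOrderSketch {
    // Same values as Spring's Ordered.HIGHEST_PRECEDENCE and Ordered.LOWEST_PRECEDENCE.
    static final int HIGHEST_PRECEDENCE = Integer.MIN_VALUE;
    static final int LOWEST_PRECEDENCE = Integer.MAX_VALUE;

    record NamedAdvisor(String name, int order) {}

    /** Returns advisor names sorted by ascending order value (lower value runs first). */
    static List<String> executionOrder(List<NamedAdvisor> advisors) {
        List<NamedAdvisor> sorted = new ArrayList<>(advisors);
        // comparingInt uses Integer.compare, so extreme values do not overflow
        sorted.sort(Comparator.comparingInt(NamedAdvisor::order));
        return sorted.stream().map(NamedAdvisor::name).toList();
    }

    public static void main(String[] args) {
        List<NamedAdvisor> advisors = List.of(
            new NamedAdvisor("lateJudge", LOWEST_PRECEDENCE - 100),
            new NamedAdvisor("earlyJudge", HIGHEST_PRECEDENCE + 100));
        System.out.println(executionOrder(advisors));
    }
}
```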
4. Production Patterns
4.1. Pattern 1: Fail Fast on Critical Checks
Stop execution if critical checks fail:
@Service
public class DeploymentService {
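private final AgentClient.Builder agentClientBuilder;
// Constructor injection initializes the final field
public DeploymentService(AgentClient.Builder agentClientBuilder) {
this.agentClientBuilder = agentClientBuilder;
}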
public void deploy(Path projectRoot) {
// Step 1: Fix and build
AgentClientResponse buildResponse = agentClientBuilder
.goal("Fix any failing tests and run 'mvn clean install'")
.workingDirectory(projectRoot)
.advisors(JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build())
.call();
if (!buildResponse.getJudgment().pass()) {
throw new DeploymentException(
"Build failed: " + buildResponse.getJudgment().reasoning()
);
}
// Step 2: Security scan
AgentClientResponse securityResponse = agentClientBuilder
.goal("Run security scan and fix critical vulnerabilities")
.workingDirectory(projectRoot)
.advisors(JudgeAdvisor.builder()
.judge(new SecurityScanJudge())
.build())
.call();
if (!securityResponse.getJudgment().pass()) {
throw new DeploymentException(
"Security scan failed: " + securityResponse.getJudgment().reasoning()
);
}
// Both checks passed - safe to deploy
performDeployment(projectRoot);
}
}
4.2. Pattern 2: Conditional Logic Based on Judgment
Make decisions based on evaluation results:
AgentClientResponse response = agentClientBuilder
.goal("Refactor UserService for better maintainability")
.workingDirectory(projectRoot)
.advisors(
JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build(),
JudgeAdvisor.builder()
.judge(new CodeQualityJudge(chatClient, 0.8))
.build()
)
.call();
Judgment judgment = response.getJudgment();
if (judgment.pass()) {
// Quality threshold met - commit changes
gitCommit("Refactor UserService");
createPullRequest();
} else if (judgment.score() instanceof NumericalScore numerical) {
double quality = numerical.normalized();
if (quality >= 0.6) {
// Partial success - flag for review
createDraftPullRequest("Needs review: quality score " + quality);
} else {
// Poor quality - retry
logger.warn("Refactoring quality too low: {}. Retrying...", quality);
retryRefactoring();
}
}
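The branching above is easier to unit test when it is pulled out into a pure function. The sketch below is one way to do that; Action and decide are hypothetical names, the 0.6 cut-off is the one used in the example, and treating a missing score as a retry is an assumption (the example above takes no action in that case).

```java
import java.util.OptionalDouble;

// Hypothetical decision helper; isolates the branching so it can be tested without an agent.
class QualityGateSketch {
    enum Action { COMMIT, DRAFT_REVIEW, RETRY }

    // Same cut-off as the example: failed judgments at or above it are flagged for review.
    static final double REVIEW_THRESHOLD = 0.6;

    static Action decide(boolean passed, OptionalDouble normalizedScore) {
        if (passed) {
            return Action.COMMIT;        // quality threshold met
        }
        if (normalizedScore.isPresent() && normalizedScore.getAsDouble() >= REVIEW_THRESHOLD) {
            return Action.DRAFT_REVIEW;  // partial success
        }
        return Action.RETRY;             // assumption: low or missing score means retry
    }

    public static void main(String[] args) {
        System.out.println(decide(false, OptionalDouble.of(0.7)));
    }
}
```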
4.3. Pattern 3: Detailed Logging and Metrics
Track judge results for observability:
@Service
public class ObservableAgentService {
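// Logger is assumed to be SLF4J, matching the {} placeholders used below
private static final Logger logger = LoggerFactory.getLogger(ObservableAgentService.class);
private final AgentClient.Builder agentClientBuilder;
private final MeterRegistry meterRegistry;
// Constructor injection initializes the final fields
public ObservableAgentService(AgentClient.Builder agentClientBuilder, MeterRegistry meterRegistry) {
this.agentClientBuilder = agentClientBuilder;
this.meterRegistry = meterRegistry;
}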
public void executeTask(String goal, Path workspace) {
long startTime = System.currentTimeMillis();
AgentClientResponse response = agentClientBuilder
.goal(goal)
.workingDirectory(workspace)
.advisors(JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build())
.call();
Judgment judgment = response.getJudgment();
// Record metrics
meterRegistry.counter("agent.executions.total").increment();
if (judgment.pass()) {
meterRegistry.counter("agent.executions.success").increment();
} else {
meterRegistry.counter("agent.executions.failed").increment();
}
long duration = System.currentTimeMillis() - startTime;
meterRegistry.timer("agent.execution.duration").record(duration, TimeUnit.MILLISECONDS);
// Detailed logging
logger.info("Agent task completed: goal={}, status={}, duration={}ms",
goal,
judgment.status(),
duration
);
if (judgment.pass()) {
logger.debug("Success reasoning: {}", judgment.reasoning());
} else {
logger.error("Failure reasoning: {}", judgment.reasoning());
// Log only the individual checks that failed
judgment.checks().forEach(check -> {
if (!check.passed()) {
logger.error("Check '{}' failed: {}", check.name(), check.message());
}
});
}
}
}
4.4. Pattern 4: Retry with Judgment Feedback
Use judgment reasoning to improve retry attempts:
public class RetryableAgentService {
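// Logger is assumed to be SLF4J, matching the {} placeholders used below
private static final Logger logger = LoggerFactory.getLogger(RetryableAgentService.class);
private static final int MAX_RETRIES = 3;
private final AgentClient.Builder agentClientBuilder;
// Constructor injection initializes the final field
public RetryableAgentService(AgentClient.Builder agentClientBuilder) {
this.agentClientBuilder = agentClientBuilder;
}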
public AgentClientResponse executeWithRetry(String goal, Path workspace) {
AgentClientResponse response = null;
Judgment judgment = null;
for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
String enhancedGoal = goal;
// Enhance goal with previous failure reasoning
if (judgment != null && !judgment.pass()) {
enhancedGoal = goal + "\n\n" +
"Previous attempt failed:\n" +
judgment.reasoning() + "\n\n" +
"Please address these issues in this attempt.";
}
response = agentClientBuilder
.goal(enhancedGoal)
.workingDirectory(workspace)
.advisors(JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build())
.call();
judgment = response.getJudgment();
if (judgment.pass()) {
logger.info("Task succeeded on attempt {}", attempt);
return response;
}
logger.warn("Attempt {} failed: {}", attempt, judgment.reasoning());
}
throw new MaxRetriesExceededException(
"Task failed after " + MAX_RETRIES + " attempts. Last error: " + judgment.reasoning()
);
}
}
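The feedback loop can also be exercised in isolation, with the agent call replaced by a function. The sketch below uses hypothetical stand-ins (Judgment, enhance, runWithRetry). Note that each retry is built from the original goal plus the most recent failure only, so the prompt does not grow with every attempt.

```java
import java.util.function.Function;

// Hypothetical stand-ins for illustration; the agent call is modeled as a function.
class RetryFeedbackSketch {
    record Judgment(boolean pass, String reasoning) {}

    /** Appends the previous failure reasoning to the original goal, if there was a failure. */
    static String enhance(String goal, Judgment previous) {
        if (previous == null || previous.pass()) {
            return goal;
        }
        return goal + "\n\nPrevious attempt failed:\n" + previous.reasoning()
            + "\n\nPlease address these issues in this attempt.";
    }

    /** Returns the attempt number that passed, or -1 if every attempt failed. */
    static int runWithRetry(String goal, int maxRetries, Function<String, Judgment> attempt) {
        Judgment previous = null;
        for (int i = 1; i <= maxRetries; i++) {
            previous = attempt.apply(enhance(goal, previous));
            if (previous.pass()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated agent: succeeds only once it has seen failure feedback.
        Function<String, Judgment> agent =
            g -> new Judgment(g.contains("Previous attempt failed"), "tests failing");
        System.out.println(runWithRetry("Build the project", 3, agent));
    }
}
```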
5. Spring Boot Integration
Define judges and advisors as Spring beans:
@Configuration
public class JudgeConfiguration {
@Bean
public JudgeAdvisor buildVerificationAdvisor() {
return JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.name("build-verification")
.build();
}
@Bean
public JudgeAdvisor fileVerificationAdvisor() {
return JudgeAdvisor.builder()
.judge(new FileExistsJudge("output.txt"))
.name("file-verification")
.build();
}
@Bean
public JudgeAdvisor correctnessAdvisor(ChatClient.Builder chatClientBuilder) {
return JudgeAdvisor.builder()
.judge(new CorrectnessJudge(chatClientBuilder.build()))
.name("correctness-check")
.build();
}
}
// Inject and use
@Service
public class MyService {
private final AgentClient.Builder agentClientBuilder;
private final JudgeAdvisor buildVerificationAdvisor;
private final JudgeAdvisor correctnessAdvisor;
public MyService(
AgentClient.Builder agentClientBuilder,
JudgeAdvisor buildVerificationAdvisor,
JudgeAdvisor correctnessAdvisor) {
this.agentClientBuilder = agentClientBuilder;
this.buildVerificationAdvisor = buildVerificationAdvisor;
this.correctnessAdvisor = correctnessAdvisor;
}
public void buildAndVerify(Path projectRoot) {
agentClientBuilder
.goal("Build and test the application")
.workingDirectory(projectRoot)
.advisors(buildVerificationAdvisor, correctnessAdvisor)
.call();
}
}
6. Accessing Judgment Results
6.1. Via AgentClientResponse
The primary way to access judgment results:
AgentClientResponse response = agentClientBuilder
.goal("Create a REST API")
.advisors(JudgeAdvisor.builder().judge(myJudge).build())
.call();
// Get judgment
Judgment judgment = response.getJudgment();
// Check status
if (judgment.pass()) {
System.out.println("✓ " + judgment.reasoning());
}
// Access score
Score score = judgment.score();
if (score instanceof NumericalScore numerical) {
System.out.println("Quality: " + numerical.normalized());
}
// Examine checks
judgment.checks().forEach(check -> {
System.out.println(check.name() + ": " + check.passed());
});
6.2. Convenience Methods
AgentClientResponse provides convenience methods:
AgentClientResponse response = agentClientBuilder
.goal("Build project")
.advisors(JudgeAdvisor.builder().judge(myJudge).build())
.call();
// Convenience method
if (response.isJudgmentPassed()) {
deploy();
}
// Equivalent to:
if (response.getJudgment() != null && response.getJudgment().pass()) {
deploy();
}
6.3. No Judgment Case
If no JudgeAdvisor is used, getJudgment() returns null:
AgentClientResponse response = agentClientBuilder
.goal("Create a file")
.call(); // No judge
Judgment judgment = response.getJudgment(); // null
// Safe checking
if (response.getJudgment() != null && response.getJudgment().pass()) {
// Handle success
}
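If the null check is repeated in many places, it can be wrapped once with Optional. This is a minimal sketch with hypothetical stand-in types (Judgment, Response, passed); treating a missing judgment as "not verified" mirrors the safe check above.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; not the library's actual types.
class NullSafeJudgmentSketch {
    record Judgment(boolean pass, String reasoning) {}
    record Response(Judgment judgment) {}  // judgment is null when no judge ran

    /** True only when a judgment exists and it passed. */
    static boolean passed(Response response) {
        return Optional.ofNullable(response.judgment())
            .map(Judgment::pass)
            .orElse(false);  // no judgment is treated as "not verified"
    }

    public static void main(String[] args) {
        System.out.println(passed(new Response(null)));
        System.out.println(passed(new Response(new Judgment(true, "file exists"))));
    }
}
```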
7. Error Handling
7.1. Judge Execution Errors
If a judge throws an exception during evaluation, the judgment has ERROR status:
AgentClientResponse response = agentClientBuilder
.goal("Complex task")
.advisors(JudgeAdvisor.builder()
.judge(myJudge)
.build())
.call();
Judgment judgment = response.getJudgment();
switch (judgment.status()) {
case PASS -> {
deploy();
}
case FAIL -> {
logger.error("Task failed: {}", judgment.reasoning());
rollback();
}
case ERROR -> {
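// error() is unwrapped with orElse(null), as in the error-handling example under Best Practices
logger.error("Judge error: {}", judgment.reasoning(), judgment.error().orElse(null));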
alertOps("Judge execution failed");
}
case ABSTAIN -> {
logger.warn("Judge abstained: {}", judgment.reasoning());
requestManualReview();
}
}
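One way an exception can become an ERROR judgment instead of propagating to the caller is a try/catch around the judge call. The sketch below illustrates that idea with hypothetical stand-in types (Status, Judgment, Judge, evaluateSafely); it is not the library's implementation.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; not the library's actual types.
class SafeJudgeSketch {
    enum Status { PASS, FAIL, ERROR, ABSTAIN }
    record Judgment(Status status, String reasoning, Optional<Throwable> error) {}
    interface Judge { Judgment judge(String output); }

    /** Runs the judge and converts any runtime exception into an ERROR judgment. */
    static Judgment evaluateSafely(Judge judge, String output) {
        try {
            return judge.judge(output);
        } catch (RuntimeException e) {
            // The caller sees a judgment with ERROR status rather than an exception.
            return new Judgment(Status.ERROR, "Judge threw: " + e.getMessage(), Optional.of(e));
        }
    }

    public static void main(String[] args) {
        Judge broken = out -> { throw new IllegalStateException("scanner unavailable"); };
        Judgment judgment = evaluateSafely(broken, "agent output");
        System.out.println(judgment.status() + ": " + judgment.reasoning());
    }
}
```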
7.2. Handling Missing Context
Judges may abstain if required context is missing:
// Judge expects agent output
Judge correctnessJudge = new CorrectnessJudge(chatClient);
AgentClientResponse response = agentClientBuilder
.goal("Some task")
.advisors(JudgeAdvisor.builder()
.judge(correctnessJudge)
.build())
.call();
Judgment judgment = response.getJudgment();
if (judgment.status() == JudgmentStatus.ABSTAIN) {
// Judge couldn't evaluate - missing output or context
logger.warn("Judge abstained: {}", judgment.reasoning());
}
8. JudgeAdvisor vs Jury
Understanding when to use each:
Aspect | JudgeAdvisor | Jury |
---|---|---|
Use Case | Single judge or independent judges | Ensemble evaluation with aggregation |
Result | Individual Judgment | Aggregated Verdict |
Aggregation | No aggregation - each judge independent | Multiple voting strategies (majority, weighted, etc.) |
Composition | Multiple advisors execute sequentially | Single jury executes judges in parallel |
Integration | Passed directly to .advisors(...) | Wrapped in a JudgeAdvisor via .judge(jury) |
Example comparison:
// Multiple JudgeAdvisors - independent judgments
AgentClientResponse response1 = agentClientBuilder
.goal("Task")
.advisors(
JudgeAdvisor.builder().judge(judgeA).build(),
JudgeAdvisor.builder().judge(judgeB).build(),
JudgeAdvisor.builder().judge(judgeC).build()
)
.call();
// Each judge runs, but only the last judgment is accessible via response.getJudgment()
// Jury - aggregated verdict
Jury jury = Juries.builder()
.addJudge("A", judgeA)
.addJudge("B", judgeB)
.addJudge("C", judgeC)
.votingStrategy(VotingStrategies.majority())
.build();
AgentClientResponse response2 = agentClientBuilder
.goal("Task")
.advisors(JudgeAdvisor.builder().judge(jury).build())
.call();
// Get aggregated verdict
Judgment aggregated = response2.getJudgment();
// Access jury-specific result
if (jury.getLastVerdict() != null) {
Verdict verdict = jury.getLastVerdict();
verdict.individual().forEach(System.out::println); // All judgments
}
See Jury Pattern for ensemble evaluation details.
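To make the idea of majority voting concrete, the sketch below aggregates individual judgments into one verdict. It uses hypothetical stand-in types (Judgment, majority) and is not the library's VotingStrategies implementation; in particular, treating a tie as a failure is an assumption.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not the library's voting strategies.
class MajorityVoteSketch {
    record Judgment(boolean pass, String reasoning) {}

    /** Passes only when strictly more than half of the judgments pass. */
    static Judgment majority(List<Judgment> individual) {
        long passes = individual.stream().filter(Judgment::pass).count();
        boolean pass = passes * 2 > individual.size();  // strict majority; ties fail (assumption)
        return new Judgment(pass, passes + " of " + individual.size() + " judges passed");
    }

    public static void main(String[] args) {
        List<Judgment> votes = List.of(
            new Judgment(true, "A: build ok"),
            new Judgment(true, "B: file exists"),
            new Judgment(false, "C: quality too low"));
        System.out.println(majority(votes));
    }
}
```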
9. Best Practices
9.1. Always Use Judges in Production
// ❌ No verification - dangerous
agentClientBuilder
.goal("Deploy to production")
.call();
// ✅ Verified deployment - safe
agentClientBuilder
.goal("Deploy to production")
.advisors(JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build())
.call();
9.2. Choose the Right Judge Type
// Fast deterministic check for build
JudgeAdvisor buildCheck = JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build();
// Expensive LLM check for correctness
JudgeAdvisor correctnessCheck = JudgeAdvisor.builder()
.judge(new CorrectnessJudge(chatClient))
.build();
// Run the deterministic check first; add correctnessCheck only once the build passes
agentClientBuilder
.goal("Implement feature")
.advisors(buildCheck) // Fast fail if build breaks
.call();
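The "cheap check first, expensive check only if needed" idea can be expressed as a short-circuit over two lazily evaluated judges. This is a minimal sketch with hypothetical names (Judgment, cheapThenExpensive); the suppliers stand in for a deterministic judge and an LLM-backed judge.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration; not the library's actual types.
class TieredJudgeSketch {
    record Judgment(boolean pass, String reasoning) {}

    /** Runs the expensive judge only when the cheap one passes. */
    static Judgment cheapThenExpensive(Supplier<Judgment> cheap, Supplier<Judgment> expensive) {
        Judgment first = cheap.get();
        // Fail fast: a failed cheap check is returned without invoking the expensive judge.
        return first.pass() ? expensive.get() : first;
    }

    public static void main(String[] args) {
        Judgment result = cheapThenExpensive(
            () -> new Judgment(false, "build broken"),
            () -> new Judgment(true, "LLM review ok"));  // not invoked in this run
        System.out.println(result.reasoning());
    }
}
```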
9.3. Log Judgment Details
Judgment judgment = response.getJudgment();
logger.info("Judgment status: {}", judgment.status());
logger.info("Reasoning: {}", judgment.reasoning());
logger.info("Score: {}", judgment.score());
if (!judgment.pass()) {
judgment.checks().forEach(check -> {
// Report only the checks that actually failed
if (!check.passed()) {
logger.error("Failed check: {}", check.name());
logger.error(" Message: {}", check.message());
}
});
}
9.4. Use Appropriate Error Handling
try {
AgentClientResponse response = agentClientBuilder
.goal("Critical task")
.advisors(JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build())
.call();
Judgment judgment = response.getJudgment();
if (judgment.status() == JudgmentStatus.ERROR) {
throw new JudgeExecutionException(
"Judge failed to execute",
judgment.error().orElse(null)
);
}
if (!judgment.pass()) {
throw new TaskFailedException(judgment.reasoning());
}
} catch (JudgeExecutionException | TaskFailedException e) {
// Handle errors appropriately
logger.error("Task failed", e);
alertTeam(e);
}
10. Next Steps
- Deterministic Judges: File, Command, and Build judges
- LLM-Powered Judges: AI-based evaluation
- Agent as Judge: Agents evaluating agents
- Jury Pattern: Ensemble evaluation
- Judge API Overview: Complete Judge API documentation
11. Further Reading
- Your First Judge - Practical introduction
- Agent Advisors - Complete advisor pattern documentation
- AgentClient - High-level client API
JudgeAdvisor is the primary integration point for automated agent evaluation. Every production agent task should use at least one judge to verify success.