Deterministic Judges
- 1. What Are Deterministic Judges?
- 2. Judge Categories
- 3. Basic Usage Pattern
- 4. Comparison with LLM Judges
- 5. Common Patterns
- 6. Performance Characteristics
- 7. Error Handling
- 8. Creating Custom Deterministic Judges
- 9. Best Practices
- 10. Detailed Judge Documentation
- 11. Next Steps
- 12. Further Reading
Deterministic judges use rule-based evaluation without AI. They provide fast, reliable, and cost-free verification of agent execution through file system checks, command execution, and assertions.
1. What Are Deterministic Judges?
Deterministic judges evaluate agent outcomes using predefined rules rather than AI inference. They:
- ✅ Are fast - No LLM inference overhead (milliseconds vs. seconds)
- ✅ Are predictable - The same input always produces the same output
- ✅ Are free - No API costs
- ✅ Are reliable - No rate limits, no API downtime
- ✅ Are precise - Exact checks, no hallucinations
When to use deterministic judges:
- File creation/modification verification
- Build and test execution
- Command success checking
- Structured output validation
- Boolean pass/fail criteria
2. Judge Categories
Spring AI Agents provides three categories of deterministic judges:
2.1. File System Judges
Verify file and directory operations:
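Judge | Purpose | Example |
---|---|---|
FileExistsJudge | Verify file/directory exists | new FileExistsJudge("output.txt") |
FileContentJudge | Verify file contents | new FileContentJudge("README.md", content -> content.contains("# Usage")) |
FileNotExistsJudge | Verify file/directory does NOT exist | |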
See File Judges for complete details.
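To make the three file-system rules concrete, here is a minimal plain-Java sketch using only `java.nio.file`. It is not the library's implementation: `FileCheckResult` and `FileChecks` are hypothetical names, and paths are assumed to be relative to the agent's working directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

// Hypothetical result type for this sketch; not the library's Judgment class.
record FileCheckResult(boolean pass, String reasoning) {}

final class FileChecks {

    private FileChecks() {}

    // FileExists-style rule: passes when the path exists (file or directory) under the workspace.
    static FileCheckResult exists(Path workspace, String relativePath) {
        boolean present = Files.exists(workspace.resolve(relativePath));
        return new FileCheckResult(present, present
                ? "File '" + relativePath + "' exists"
                : "File '" + relativePath + "' does not exist in workspace");
    }

    // FileNotExists-style rule: passes when the path is absent under the workspace.
    static FileCheckResult notExists(Path workspace, String relativePath) {
        boolean present = Files.exists(workspace.resolve(relativePath));
        return new FileCheckResult(!present, present
                ? "File '" + relativePath + "' should not exist but does"
                : "File '" + relativePath + "' is absent as expected");
    }

    // FileContent-style rule: passes when the file exists and its text satisfies the predicate.
    static FileCheckResult content(Path workspace, String relativePath, Predicate<String> rule) {
        Path file = workspace.resolve(relativePath);
        if (!Files.isRegularFile(file)) {
            return new FileCheckResult(false, "File '" + relativePath + "' does not exist in workspace");
        }
        try {
            boolean ok = rule.test(Files.readString(file));
            return new FileCheckResult(ok, ok
                    ? "Content of '" + relativePath + "' satisfies the rule"
                    : "Content of '" + relativePath + "' does not satisfy the rule");
        } catch (IOException e) {
            return new FileCheckResult(false, "Could not read '" + relativePath + "': " + e.getMessage());
        }
    }
}
```

Each rule is a pure function of the workspace state, which is what makes the result repeatable: run it twice on the same directory and you get the same verdict.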
2.2. Command Execution Judges
Verify command success and output:
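Judge | Purpose | Example |
---|---|---|
CommandJudge | Verify command exits successfully | new CommandJudge("mvn test") |
BuildSuccessJudge | Verify build command success | new BuildSuccessJudge() |
TestSuccessJudge | Verify test execution success | new TestSuccessJudge() |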
See Command Judges for complete details.
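At its core, a command judge runs a process and checks its exit code. The sketch below shows that idea with only `java.lang.ProcessBuilder`; `CommandCheck` and `CommandCheckResult` are hypothetical names, not the library's API, and storing the exit code under an `"exitCode"` metadata key simply mirrors the example in section 7.2.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Hypothetical result type for this sketch; not the library's Judgment class.
record CommandCheckResult(boolean pass, String reasoning, Map<String, Object> metadata) {}

final class CommandCheck {

    private CommandCheck() {}

    // Runs the command in the workspace directory; passes only when it exits with code 0.
    static CommandCheckResult run(Path workspace, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command)
                .directory(workspace.toFile())
                .redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = pb.start();
            // Drain output before waiting so a full pipe buffer cannot stall the child process.
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            boolean pass = exitCode == 0;
            return new CommandCheckResult(pass,
                    pass ? "Command succeeded" : "Command exited with code " + exitCode,
                    Map.of("exitCode", exitCode, "output", output));
        } catch (IOException e) {
            return new CommandCheckResult(false, "Could not start command: " + e.getMessage(), Map.of());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return new CommandCheckResult(false, "Interrupted while waiting for command", Map.of());
        }
    }
}
```

For example, `CommandCheck.run(projectRoot, List.of("mvn", "test"))` would pass only if Maven exits with code 0. This sketch waits indefinitely; for long builds you would likely want `Process.waitFor(long, TimeUnit)` with a timeout.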
2.3. Assertion-Based Judges
Leverage AssertJ for rich validation:
AssertJJudge judge = AssertJJudge.create(context -> assertions -> {
String output = context.agentOutput().get().asText();
assertions.assertThat(output)
.contains("Hello")
.hasLineCount(10)
.startsWith("Success:");
});
AssertJ provides 2000+ assertions covering strings, collections, files, dates, JSON, XML, and more.
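To see exactly what the AssertJ example verifies, here is a rough plain-Java equivalent of its three checks. `OutputChecks` is a hypothetical name, and counting lines with `String.lines()` is assumed to approximate AssertJ's `hasLineCount` for typical output. Like soft assertions, it reports every failure instead of stopping at the first.

```java
import java.util.ArrayList;
import java.util.List;

final class OutputChecks {

    private OutputChecks() {}

    // Applies the same three checks as the AssertJ example and collects every failure.
    static List<String> check(String output) {
        List<String> failures = new ArrayList<>();
        if (!output.contains("Hello")) {
            failures.add("expected output to contain \"Hello\"");
        }
        long lines = output.lines().count(); // a trailing newline does not add an extra line
        if (lines != 10) {
            failures.add("expected 10 lines but found " + lines);
        }
        if (!output.startsWith("Success:")) {
            failures.add("expected output to start with \"Success:\"");
        }
        return failures; // empty list means the judgment would pass
    }
}
```

Collecting all failures makes the judgment's reasoning more useful: the agent (or a human) sees every violated rule in one pass.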
3. Basic Usage Pattern
All deterministic judges follow the same pattern:
// 1. Create judge
Judge judge = new FileExistsJudge("output.txt");
// 2. Attach to agent execution
AgentClientResponse response = agentClientBuilder
.goal("Create output.txt with system metrics")
.workingDirectory(Path.of("/tmp/reports"))
.advisors(JudgeAdvisor.builder()
.judge(judge)
.build())
.call();
// 3. Check judgment
Judgment judgment = response.getJudgment();
if (judgment.pass()) {
System.out.println("✓ Success");
} else {
System.out.println("✗ Failed: " + judgment.reasoning());
}
4. Comparison with LLM Judges
Understanding when to use deterministic vs LLM judges:
Aspect | Deterministic Judges | LLM Judges |
---|---|---|
Speed | Milliseconds (command judges take as long as the command) | Seconds (LLM inference) |
Cost | Free | API costs per judgment |
Reliability | 100% deterministic | Non-deterministic (variance) |
Use Cases | File checks, build success, exact validation | Semantic correctness, quality assessment, subjective criteria |
Precision | Exact matches only | Semantic understanding |
Setup | Simple - no API keys | Requires LLM API access |
Examples | File exists, tests pass, output matches regex | Code quality, correctness, naturalness |
Best practice: Start with deterministic judges for objective criteria, add LLM judges for subjective evaluation.
5. Common Patterns
5.1. Pattern 1: Build Verification
Verify build and tests pass before proceeding:
AgentClientResponse response = agentClientBuilder
.goal("Fix failing tests in UserServiceTest")
.workingDirectory(projectRoot)
.advisors(JudgeAdvisor.builder()
.judge(new BuildSuccessJudge())
.build())
.call();
if (response.isJudgmentPassed()) {
deploy();
} else {
alert("Build still failing after agent fix attempt");
}
5.2. Pattern 2: File Creation Verification
Verify required files were created:
AgentClientResponse response = agentClientBuilder
.goal("Generate project documentation")
.workingDirectory(projectRoot)
.advisors(
JudgeAdvisor.builder()
.judge(new FileExistsJudge("README.md"))
.build(),
JudgeAdvisor.builder()
.judge(new FileExistsJudge("docs/installation.md"))
.build(),
JudgeAdvisor.builder()
.judge(new FileExistsJudge("docs/api.md"))
.build()
)
.call();
boolean allFilesCreated = response.isJudgmentPassed();
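Pattern 2 attaches one `FileExistsJudge` per file. If you would rather produce a single result that names every missing file, the check is small enough to sketch in plain `java.nio`. `RequiredFiles` is a hypothetical helper, not part of Spring AI Agents, and this page does not specify how the library combines several advisors into one response.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class RequiredFiles {

    private RequiredFiles() {}

    // Returns the required paths (relative to the workspace) that are not regular files.
    static List<String> missing(Path workspace, List<String> required) {
        return required.stream()
                .filter(rel -> !Files.isRegularFile(workspace.resolve(rel)))
                .toList();
    }

    // Builds one reasoning string that names every missing file.
    static String reasoning(List<String> missing) {
        return missing.isEmpty()
                ? "All required files were created"
                : "Missing required files: " + String.join(", ", missing);
    }
}
```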
5.3. Pattern 3: Multi-Criteria Validation
Combine multiple deterministic checks:
// Check 1: Build succeeds
Judge buildJudge = new BuildSuccessJudge();
// Check 2: README created
Judge readmeJudge = new FileExistsJudge("README.md");
// Check 3: README has required content
Judge contentJudge = new FileContentJudge("README.md", content ->
content.contains("# Installation") &&
content.contains("# Usage") &&
content.contains("# License")
);
AgentClientResponse response = agentClientBuilder
.goal("Create Spring Boot project with documentation")
.workingDirectory(projectRoot)
.advisors(
JudgeAdvisor.builder().judge(buildJudge).build(),
JudgeAdvisor.builder().judge(readmeJudge).build(),
JudgeAdvisor.builder().judge(contentJudge).build()
)
.call();
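Multi-criteria validation is usually read as "all checks must pass". The sketch below shows that AND semantics in plain Java, concatenating the reasoning of each failed check. `Verdict`, `NamedCheck`, and `AllOf` are hypothetical types for illustration; they are not the library's `Judge` or `Judgment` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical types for this sketch; not the library's Judge or Judgment API.
record Verdict(boolean pass, String reasoning) {}

record NamedCheck(String name, Supplier<Verdict> check) {}

final class AllOf {

    private AllOf() {}

    // Runs every check and passes only if all of them pass; failure reasons are joined.
    static Verdict evaluate(List<NamedCheck> checks) {
        List<String> failures = new ArrayList<>();
        for (NamedCheck c : checks) {
            Verdict v = c.check().get();
            if (!v.pass()) {
                failures.add(c.name() + ": " + v.reasoning());
            }
        }
        return failures.isEmpty()
                ? new Verdict(true, "All " + checks.size() + " checks passed")
                : new Verdict(false, String.join("; ", failures));
    }
}
```

Running every check (rather than stopping early) is a deliberate choice here: cheap deterministic checks cost little, and a complete list of failures is easier to act on.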
5.4. Pattern 4: Hybrid Deterministic + LLM
Fast deterministic checks first, then expensive LLM evaluation:
// Fast check: Build must succeed
Judge buildJudge = new BuildSuccessJudge();
// Expensive check: Code quality assessment
Judge qualityJudge = new CodeQualityJudge(chatClient);
AgentClientResponse response = agentClientBuilder
.goal("Refactor UserService for better maintainability")
.workingDirectory(projectRoot)
.advisors(
// Cheap check: detects a broken build quickly
JudgeAdvisor.builder()
.judge(buildJudge)
.order(100) // Run first
.build(),
// Expensive check: runs after the build judge
JudgeAdvisor.builder()
.judge(qualityJudge)
.order(200) // Run second
.build()
)
.call();
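This page does not state whether the advisor chain skips the quality judge when the build judge fails (the Fail Fast practice in section 9.3 checks the result explicitly). The sketch below shows the intended behavior, ordered execution that stops at the first failure, in plain Java. `Stage` and `OrderedJudging` are hypothetical names, and `order` only imitates the advisor order values above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical stage type for this sketch; 'order' plays the role of the advisor order value.
record Stage(String name, int order, BooleanSupplier check) {}

final class OrderedJudging {

    private OrderedJudging() {}

    // Runs stages from lowest to highest order and stops at the first failure,
    // so a later (typically more expensive) stage never runs after a cheap check fails.
    // Returns the name of the first failing stage, or empty if every stage passed.
    static Optional<String> firstFailure(List<Stage> stages) {
        List<Stage> sorted = stages.stream()
                .sorted(Comparator.comparingInt(Stage::order))
                .toList();
        for (Stage stage : sorted) {
            if (!stage.check().getAsBoolean()) {
                return Optional.of(stage.name());
            }
        }
        return Optional.empty();
    }
}
```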
6. Performance Characteristics
Most deterministic judges complete in milliseconds; judges that run commands take as long as the command itself:
Judge Type | Typical Duration | Notes |
---|---|---|
FileExistsJudge | < 5ms | File system check |
FileContentJudge | < 50ms | File read + predicate |
CommandJudge | Varies | Command execution time |
BuildSuccessJudge / TestSuccessJudge | Varies (10s - 60s) | Build/test duration |
AssertJJudge | < 10ms | In-memory assertions |
Recommendation: Use deterministic judges liberally. They are free, and file and assertion checks add almost no latency.
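The durations above are typical values, not guarantees; they depend on disk speed, file size, and, for command-based judges, the command itself. To measure them on your own machine, a small timing helper can wrap any check. `JudgeTiming` is a hypothetical helper; it uses `System.nanoTime` and reports the median so that one slow run (for example, a cold file-system cache) does not dominate.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

final class JudgeTiming {

    private JudgeTiming() {}

    // Runs the check 'runs' times and returns the median elapsed wall-clock time
    // (the upper median when 'runs' is even).
    static Duration medianDuration(BooleanSupplier check, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            check.getAsBoolean(); // result ignored; only the elapsed time matters here
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return Duration.ofNanos(samples[runs / 2]);
    }
}
```

For example, `JudgeTiming.medianDuration(() -> java.nio.file.Files.exists(java.nio.file.Path.of("output.txt")), 101)` estimates the cost of a plain existence check, without the JIT warm-up a proper benchmark harness would add.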
7. Error Handling
Deterministic judges handle common error cases:
7.1. File Not Found
Judge judge = new FileExistsJudge("missing.txt");
Judgment judgment = judge.judge(context);
// Status: FAIL
// Reasoning: "File 'missing.txt' does not exist in workspace"
assertThat(judgment.pass()).isFalse();
7.2. Command Execution Failure
Judge judge = new CommandJudge("mvn test");
Judgment judgment = judge.judge(context);
if (!judgment.pass()) {
// Command failed
System.out.println("Build failed: " + judgment.reasoning());
// Check metadata for exit code
Integer exitCode = (Integer) judgment.metadata().get("exitCode");
System.out.println("Exit code: " + exitCode);
}
8. Creating Custom Deterministic Judges
Extend DeterministicJudge for custom rules:
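import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.springaicommunity.agents.judge.DeterministicJudge;
import org.springaicommunity.agents.judge.result.Judgment;
import org.springaicommunity.agents.judge.result.JudgmentContext;
import org.springaicommunity.agents.judge.result.Score;
// JudgmentStatus and BooleanScore must also be imported from the judge API
// (see the Judge API Overview for their packages).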
public class CustomFileCountJudge extends DeterministicJudge {
private final int expectedCount;
public CustomFileCountJudge(int expectedCount) {
this.expectedCount = expectedCount;
}
@Override
public Judgment judge(JudgmentContext context) {
Path workspace = context.workspace();
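try (var files = Files.list(workspace)) { // try-with-resources closes the directory stream
// Count regular files at the top level only; subdirectories are not traversed
long count = files.filter(Files::isRegularFile).count();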
boolean pass = count == expectedCount;
return Judgment.builder()
.status(pass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
.score(new BooleanScore(pass))
.reasoning(String.format(
"Expected %d files, found %d files in workspace",
expectedCount, count
))
.metadata(Map.of("fileCount", count))
.build();
} catch (IOException e) {
return Judgment.error(e, "Failed to count files in workspace");
}
}
}
Usage:
Judge judge = new CustomFileCountJudge(5);
AgentClientResponse response = agentClientBuilder
.goal("Create 5 data files in workspace")
.workingDirectory(Path.of("/tmp/data"))
.advisors(JudgeAdvisor.builder().judge(judge).build())
.call();
boolean correctFileCount = response.isJudgmentPassed();
9. Best Practices
9.1. Use Deterministic Judges for Objective Criteria
// ✅ Good: Objective check
new BuildSuccessJudge()
// ❌ Overkill: LLM for simple check
new LLMJudge(chatClient, "Did the build succeed?")
9.2. Combine Multiple Checks
// Verify build, tests, and documentation
agentClientBuilder
.goal("Complete feature implementation")
.advisors(
JudgeAdvisor.builder().judge(new BuildSuccessJudge()).build(),
JudgeAdvisor.builder().judge(new TestSuccessJudge()).build(),
JudgeAdvisor.builder().judge(new FileExistsJudge("docs/feature.md")).build()
)
.call();
9.3. Fail Fast with Deterministic Checks
// Fast deterministic check first
Judge buildJudge = new BuildSuccessJudge();
AgentClientResponse response = agentClientBuilder
.goal("Implement new feature")
.advisors(JudgeAdvisor.builder().judge(buildJudge).build())
.call();
if (!response.isJudgmentPassed()) {
// Stop here - don't proceed to expensive LLM evaluation
return;
}
// Only run expensive LLM judge if build passed
Judge qualityJudge = new CodeQualityJudge(chatClient);
// ... continue with LLM evaluation
9.4. Use Meaningful Error Messages
// ✅ Good: Clear reasoning
return Judgment.builder()
.status(JudgmentStatus.FAIL)
.reasoning("Expected file 'output.txt' but found 'output.csv'")
.build();
// ❌ Poor: Vague reasoning
return Judgment.builder()
.status(JudgmentStatus.FAIL)
.reasoning("Failed")
.build();
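A good reasoning string often comes from a little extra investigation at failure time. The sketch below produces messages like the "good" example above by looking for near misses: files with the same base name but a different extension. `MissingFileReasoning` is a hypothetical helper, and it assumes the expected file sits directly in the workspace directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class MissingFileReasoning {

    private MissingFileReasoning() {}

    // "output.txt" -> "output"; names without a dot (or starting with one) are returned unchanged.
    private static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Explains the outcome for an expected file at the top level of the workspace,
    // naming near misses (same base name, different extension) when the file is absent.
    static String explain(Path workspace, String expected) {
        if (Files.exists(workspace.resolve(expected))) {
            return "File '" + expected + "' exists";
        }
        String base = baseName(expected);
        try (Stream<Path> entries = Files.list(workspace)) {
            List<String> nearMisses = entries
                    .map(p -> p.getFileName().toString())
                    .filter(name -> baseName(name).equals(base))
                    .sorted()
                    .toList();
            if (nearMisses.isEmpty()) {
                return "Expected file '" + expected + "' but it was not found in " + workspace;
            }
            return "Expected file '" + expected + "' but found '" + String.join("', '", nearMisses) + "'";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A message that names `output.csv` tells the agent exactly what to fix on a retry, whereas "Failed" gives it nothing to work with.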
10. Detailed Judge Documentation
Explore specific deterministic judge types:
- File Judges - FileExists, FileContent, FileNotExists
- Command Judges - Command, BuildSuccess, TestSuccess
- Custom Judges (coming soon) - Creating your own deterministic judges
11. Next Steps
- File Judges: Complete file system verification
- Command Judges: Build and test verification
- LLM Judges: AI-based evaluation
- Judge Advisor: Integration with AgentClient
12. Further Reading
- Judge API Overview - Complete Judge API documentation
- Your First Judge - Practical introduction
Deterministic judges provide fast, reliable, cost-free verification of agent execution. They should be the first line of defense in any production agent evaluation strategy.