Deterministic Judges

Deterministic judges use rule-based evaluation without AI. They provide fast, reliable, and cost-free verification of agent execution through file system checks, command execution, and assertions.

1. What Are Deterministic Judges?

Deterministic judges evaluate agent outcomes using predefined rules rather than AI inference. They:

  • Are fast - No LLM inference overhead (milliseconds vs seconds)

  • Are predictable - Same input always produces same output

  • Are free - No API costs

  • Are reliable - No rate limits, no API downtime

  • Are precise - Exact checks, no hallucinations

When to use deterministic judges:

  • File creation/modification verification

  • Build and test execution

  • Command success checking

  • Structured output validation

  • Boolean pass/fail criteria
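
Each of these criteria reduces to a pure function of the agent's output or workspace. As a minimal sketch in plain Java, a structured-output check might look like the following. The class RuleCheckSketch and its method are hypothetical names used for illustration only; they are not part of Spring AI Agents.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not a Spring AI Agents class.
final class RuleCheckSketch {

    // Rule: the output must be exactly a MAJOR.MINOR.PATCH version string.
    private static final Pattern VERSION = Pattern.compile("\\d+\\.\\d+\\.\\d+");

    // Pure function: no network and no randomness, so the same input
    // always yields the same verdict.
    static boolean isVersionString(String agentOutput) {
        return VERSION.matcher(agentOutput.strip()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVersionString("1.4.2"));       // true
        System.out.println(isVersionString("version one")); // false
    }
}
```

Because the rule is a regular expression rather than a model call, the verdict is repeatable and costs nothing to re-run.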

2. Judge Categories

Spring AI Agents provides three categories of deterministic judges:

2.1. File System Judges

Verify file and directory operations:

| Judge | Purpose | Example |
| FileExistsJudge | Verify file/directory exists | new FileExistsJudge("report.txt") |
| FileContentJudge | Verify file contents | new FileContentJudge("pom.xml", content -> content.contains("<version>1.0</version>")) |
| FileNotExistsJudge | Verify file/directory does NOT exist | new FileNotExistsJudge("temp.log") |

See File Judges for complete details.
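
To make the idea concrete, here is a rough sketch of what such checks amount to using only java.nio.file. It is not the library's implementation: FileCheckSketch and its methods are hypothetical, and resolving paths against a workspace directory is an assumption made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

// Hypothetical illustration only; not the library's implementation.
final class FileCheckSketch {

    // Existence check: resolve the relative path against the workspace.
    static boolean exists(Path workspace, String relativePath) {
        return Files.exists(workspace.resolve(relativePath));
    }

    // Content check: read the file and apply a caller-supplied rule.
    static boolean contentMatches(Path workspace, String relativePath, Predicate<String> rule) {
        Path file = workspace.resolve(relativePath);
        if (!Files.isRegularFile(file)) {
            return false; // a missing file cannot satisfy a content rule
        }
        try {
            return rule.test(Files.readString(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway workspace and runs the three kinds of check.
    static boolean demo() {
        try {
            Path workspace = Files.createTempDirectory("judge-sketch");
            Files.writeString(workspace.resolve("pom.xml"), "<version>1.0</version>");
            return exists(workspace, "pom.xml")
                && contentMatches(workspace, "pom.xml", c -> c.contains("<version>1.0</version>"))
                && !exists(workspace, "temp.log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```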

2.2. Command Execution Judges

Verify command success and output:

| Judge | Purpose | Example |
| CommandJudge | Verify command exits successfully | new CommandJudge("mvn test") |
| BuildSuccessJudge | Verify build command success | new BuildSuccessJudge() |
| TestSuccessJudge | Verify test execution success | new TestSuccessJudge() |

See Command Judges for complete details.
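
A command check typically comes down to running a process and inspecting its exit code. The sketch below shows that idea with ProcessBuilder. CommandCheckSketch, its methods, the 60-second timeout, and the use of -1 for "could not run" are all assumptions made for illustration; the source does not describe how CommandJudge is implemented.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the library's implementation.
final class CommandCheckSketch {

    // Runs the command and returns its exit code, or -1 if it could not
    // be started, was interrupted, or exceeded the timeout.
    static int exitCode(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return -1;
            }
            return process.exitValue();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    // Convention followed by most CLIs: exit code 0 means success.
    static boolean succeeds(List<String> command) {
        return exitCode(command, 60) == 0;
    }

    // Path to the JVM running this code, so the demo needs no extra tools.
    static String javaBinary() {
        return Path.of(System.getProperty("java.home"), "bin", "java").toString();
    }

    public static void main(String[] args) {
        System.out.println(succeeds(List.of(javaBinary(), "-version")));
        System.out.println(succeeds(List.of(javaBinary(), "--no-such-option")));
    }
}
```

The first call should report success and the second failure, since the JVM rejects unrecognized options with a non-zero exit code.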

2.3. Assertion-Based Judges

Leverage AssertJ for rich validation:

AssertJJudge judge = AssertJJudge.create(context -> assertions -> {
    String output = context.agentOutput().get().asText();

    assertions.assertThat(output)
        .contains("Hello")
        .hasLineCount(10)
        .startsWith("Success:");
});

AssertJ provides 2000+ assertions covering strings, collections, files, dates, JSON, XML, and more.

3. Basic Usage Pattern

All deterministic judges follow the same pattern:

// 1. Create judge
Judge judge = new FileExistsJudge("output.txt");

// 2. Attach to agent execution
AgentClientResponse response = agentClientBuilder
    .goal("Create output.txt with system metrics")
    .workingDirectory(Path.of("/tmp/reports"))
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();

// 3. Check judgment
Judgment judgment = response.getJudgment();

if (judgment.pass()) {
    System.out.println("✓ Success");
} else {
    System.out.println("✗ Failed: " + judgment.reasoning());
}

4. Comparison with LLM Judges

Use the comparison below to choose between deterministic and LLM judges:

| Aspect | Deterministic Judges | LLM Judges |
| Speed | Milliseconds | Seconds (LLM inference) |
| Cost | Free | API costs per judgment |
| Reliability | 100% deterministic | Non-deterministic (variance) |
| Use Cases | File checks, build success, exact validation | Semantic correctness, quality assessment, subjective criteria |
| Precision | Exact matches only | Semantic understanding |
| Setup | Simple - no API keys | Requires LLM API access |
| Examples | File exists, tests pass, output matches regex | Code quality, correctness, naturalness |

Best practice: Start with deterministic judges for objective criteria, add LLM judges for subjective evaluation.

5. Common Patterns

5.1. Pattern 1: Build Verification

Verify build and tests pass before proceeding:

AgentClientResponse response = agentClientBuilder
    .goal("Fix failing tests in UserServiceTest")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(new BuildSuccessJudge())
        .build())
    .call();

if (response.isJudgmentPassed()) {
    deploy();
} else {
    alert("Build still failing after agent fix attempt");
}

5.2. Pattern 2: File Creation Verification

Verify required files were created:

AgentClientResponse response = agentClientBuilder
    .goal("Generate project documentation")
    .workingDirectory(projectRoot)
    .advisors(
        JudgeAdvisor.builder()
            .judge(new FileExistsJudge("README.md"))
            .build(),
        JudgeAdvisor.builder()
            .judge(new FileExistsJudge("docs/installation.md"))
            .build(),
        JudgeAdvisor.builder()
            .judge(new FileExistsJudge("docs/api.md"))
            .build()
    )
    .call();

boolean allFilesCreated = response.isJudgmentPassed();

5.3. Pattern 3: Multi-Criteria Validation

Combine multiple deterministic checks:

// Check 1: Build succeeds
Judge buildJudge = new BuildSuccessJudge();

// Check 2: README created
Judge readmeJudge = new FileExistsJudge("README.md");

// Check 3: README has required content
Judge contentJudge = new FileContentJudge("README.md", content ->
    content.contains("# Installation") &&
    content.contains("# Usage") &&
    content.contains("# License")
);

AgentClientResponse response = agentClientBuilder
    .goal("Create Spring Boot project with documentation")
    .workingDirectory(projectRoot)
    .advisors(
        JudgeAdvisor.builder().judge(buildJudge).build(),
        JudgeAdvisor.builder().judge(readmeJudge).build(),
        JudgeAdvisor.builder().judge(contentJudge).build()
    )
    .call();
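
The source does not state how several judgments combine into one overall result. Assuming every criterion must pass, the aggregation can be sketched in plain Java as below. AllOfSketch and failures are hypothetical names, and this is not how JudgeAdvisor is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not how JudgeAdvisor is implemented.
final class AllOfSketch {

    // Runs every named check and returns the names of those that failed.
    // An empty result means all criteria passed.
    static List<String> failures(Map<String, BooleanSupplier> checks) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> check : checks.entrySet()) {
            if (!check.getValue().getAsBoolean()) {
                failed.add(check.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        String readme = "# Installation\n# Usage\n";

        // LinkedHashMap keeps the checks in insertion order.
        Map<String, BooleanSupplier> checks = new LinkedHashMap<>();
        checks.put("has Installation section", () -> readme.contains("# Installation"));
        checks.put("has Usage section", () -> readme.contains("# Usage"));
        checks.put("has License section", () -> readme.contains("# License"));

        System.out.println(failures(checks)); // [has License section]
    }
}
```

Collecting every failure, rather than stopping at the first, gives a more useful report when all the checks are cheap.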

5.4. Pattern 4: Hybrid Deterministic + LLM

Fast deterministic checks first, then expensive LLM evaluation:

// Fast check: Build must succeed
Judge buildJudge = new BuildSuccessJudge();

// Expensive check: Code quality assessment
Judge qualityJudge = new CodeQualityJudge(chatClient);

AgentClientResponse response = agentClientBuilder
    .goal("Refactor UserService for better maintainability")
    .workingDirectory(projectRoot)
    .advisors(
        // Fast fail if build breaks
        JudgeAdvisor.builder()
            .judge(buildJudge)
            .order(100) // Run first
            .build(),

        // Only run if build passed
        JudgeAdvisor.builder()
            .judge(qualityJudge)
            .order(200) // Run second
            .build()
    )
    .call();
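
The value of ordering comes from short-circuiting: once a cheap check fails, the expensive one never runs. The sketch below illustrates that principle in isolation. FailFastSketch is a hypothetical name, and the lambda standing in for the LLM judge is an assumption; whether JudgeAdvisor itself skips later advisors after a failure is not confirmed by the source beyond the comments in the example above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not how JudgeAdvisor is implemented.
final class FailFastSketch {

    // Evaluates checks in order and stops at the first failure, so later
    // (more expensive) checks never run once an earlier one has failed.
    static boolean allPassInOrder(List<BooleanSupplier> orderedChecks) {
        for (BooleanSupplier check : orderedChecks) {
            if (!check.getAsBoolean()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger expensiveCalls = new AtomicInteger();

        BooleanSupplier buildPasses = () -> false; // cheap check that fails
        BooleanSupplier qualityOk = () -> {        // stands in for an LLM call
            expensiveCalls.incrementAndGet();
            return true;
        };

        boolean passed = allPassInOrder(List.of(buildPasses, qualityOk));
        // prints "false, expensive calls: 0"
        System.out.println(passed + ", expensive calls: " + expensiveCalls.get());
    }
}
```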

6. Performance Characteristics

Most deterministic judges complete in milliseconds; command-based judges take as long as the command they run:

| Judge Type | Typical Duration | Notes |
| FileExistsJudge | < 5ms | File system check |
| FileContentJudge | < 50ms | File read + predicate |
| CommandJudge | Varies | Command execution time |
| BuildSuccessJudge | Varies (10s - 60s) | Build/test duration |
| AssertJJudge | < 10ms | In-memory assertions |

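Recommendation: Use file and assertion judges liberally, since they are fast and free. Command and build judges also incur no API costs, but they take as long as the command they run.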

7. Error Handling

Deterministic judges handle common error cases:

7.1. File Not Found

Judge judge = new FileExistsJudge("missing.txt");

Judgment judgment = judge.judge(context);

// Status: FAIL
// Reasoning: "File 'missing.txt' does not exist in workspace"
assertThat(judgment.pass()).isFalse();

7.2. Command Execution Failure

Judge judge = new CommandJudge("mvn test");

Judgment judgment = judge.judge(context);

if (!judgment.pass()) {
    // Command failed
    System.out.println("Build failed: " + judgment.reasoning());

    // Check metadata for exit code
    Integer exitCode = (Integer) judgment.metadata().get("exitCode");
    System.out.println("Exit code: " + exitCode);
}

7.3. Permission Errors

// /etc/shadow is typically readable only by root, so an unprivileged read fails
Judge judge = new FileContentJudge("/etc/shadow", content -> true);

Judgment judgment = judge.judge(context);

if (judgment.status() == JudgmentStatus.ERROR) {
    // Permission denied or other I/O error
    System.err.println("Error: " + judgment.error());
}
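
The three cases above separate a negative verdict (FAIL) from the inability to reach a verdict (ERROR). A minimal sketch of that distinction in plain Java follows. OutcomeSketch and its Outcome enum are hypothetical stand-ins for the library's status type, and treating a missing file as FAIL rather than ERROR is an assumption modeled on the File Not Found example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical illustration only; Outcome stands in for the library's status type.
final class OutcomeSketch {

    enum Outcome { PASS, FAIL, ERROR }

    // PASS and FAIL answer the question "does the file contain the marker?".
    // ERROR means the question could not be answered at all.
    static Outcome containsMarker(Path file, String marker) {
        try {
            return Files.readString(file).contains(marker) ? Outcome.PASS : Outcome.FAIL;
        } catch (NoSuchFileException e) {
            return Outcome.FAIL;  // a missing file is a definite negative
        } catch (IOException e) {
            return Outcome.ERROR; // e.g. permission denied: no verdict possible
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("outcome-sketch");
        Path file = Files.writeString(dir.resolve("out.txt"), "Success: done");

        System.out.println(containsMarker(file, "Success:"));            // PASS
        System.out.println(containsMarker(file, "Failure:"));            // FAIL
        System.out.println(containsMarker(dir.resolve("missing"), "x")); // FAIL
        System.out.println(containsMarker(dir, "x"));                    // ERROR: a directory cannot be read as text
    }
}
```

Keeping ERROR distinct lets callers retry or alert on infrastructure problems without counting them as agent failures.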

8. Creating Custom Deterministic Judges

Extend DeterministicJudge for custom rules:

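import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import org.springaicommunity.agents.judge.DeterministicJudge;
import org.springaicommunity.agents.judge.result.Judgment;
import org.springaicommunity.agents.judge.result.JudgmentContext;
import org.springaicommunity.agents.judge.result.Score;
// JudgmentStatus and BooleanScore must also be imported; their packages are not shown here.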

public class CustomFileCountJudge extends DeterministicJudge {

    private final int expectedCount;

    public CustomFileCountJudge(int expectedCount) {
        this.expectedCount = expectedCount;
    }

    @Override
    public Judgment judge(JudgmentContext context) {
        Path workspace = context.workspace();

        try (var files = Files.list(workspace)) {
            long count = files.filter(Files::isRegularFile).count();

            boolean pass = count == expectedCount;

            return Judgment.builder()
                .status(pass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
                .score(new BooleanScore(pass))
                .reasoning(String.format(
                    "Expected %d files, found %d files in workspace",
                    expectedCount, count
                ))
                .metadata(Map.of("fileCount", count))
                .build();

        } catch (IOException e) {
            return Judgment.error(e, "Failed to count files in workspace");
        }
    }
}

Usage:

Judge judge = new CustomFileCountJudge(5);

AgentClientResponse response = agentClientBuilder
    .goal("Create 5 data files in workspace")
    .workingDirectory(Path.of("/tmp/data"))
    .advisors(JudgeAdvisor.builder().judge(judge).build())
    .call();

boolean correctFileCount = response.isJudgmentPassed();

9. Best Practices

9.1. Use Deterministic Judges for Objective Criteria

// ✅ Good: Objective check
new BuildSuccessJudge()

// ❌ Overkill: LLM for simple check
new LLMJudge(chatClient, "Did the build succeed?")

9.2. Combine Multiple Checks

// Verify build, tests, and documentation
agentClientBuilder
    .goal("Complete feature implementation")
    .advisors(
        JudgeAdvisor.builder().judge(new BuildSuccessJudge()).build(),
        JudgeAdvisor.builder().judge(new TestSuccessJudge()).build(),
        JudgeAdvisor.builder().judge(new FileExistsJudge("docs/feature.md")).build()
    )
    .call();

9.3. Fail Fast with Deterministic Checks

// Fast deterministic check first
Judge buildJudge = new BuildSuccessJudge();

AgentClientResponse response = agentClientBuilder
    .goal("Implement new feature")
    .advisors(JudgeAdvisor.builder().judge(buildJudge).build())
    .call();

if (!response.isJudgmentPassed()) {
    // Stop here - don't proceed to expensive LLM evaluation
    return;
}

// Only run expensive LLM judge if build passed
Judge qualityJudge = new CodeQualityJudge(chatClient);
// ... continue with LLM evaluation

9.4. Use Meaningful Error Messages

// ✅ Good: Clear reasoning
return Judgment.builder()
    .status(JudgmentStatus.FAIL)
    .reasoning("Expected file 'output.txt' but found 'output.csv'")
    .build();

// ❌ Poor: Vague reasoning
return Judgment.builder()
    .status(JudgmentStatus.FAIL)
    .reasoning("Failed")
    .build();

10. Detailed Judge Documentation

Explore specific deterministic judge types:

  • File Judges - FileExists, FileContent, FileNotExists

  • Command Judges - Command, BuildSuccess, TestSuccess

  • Custom Judges (coming soon) - Creating your own deterministic judges

Deterministic judges provide fast, reliable, cost-free verification of agent execution. They should be the first line of defense in any production agent evaluation strategy.