Command Judges: Build and Command Verification

Table of Contents

1. CommandJudge
2. BuildSuccessJudge
3. Production Patterns
4. Error Handling
5. Performance Considerations
6. Best Practices
7. Next Steps
8. Further Reading

Command judges verify command execution and build success. They provide deterministic evaluation of shell commands, build tools, and test suites.

1. CommandJudge

Executes a shell command and judges success based on exit code.

1.1. Basic Usage

import org.springaicommunity.agents.judge.exec.CommandJudge;

// Simple command - expects exit code 0
Judge judge = new CommandJudge("mvn compile");

AgentClientResponse response = agentClientBuilder
    .goal("Fix compilation errors in UserService")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();

Judgment judgment = response.getJudgment();

if (judgment.pass()) {
    System.out.println("✓ Compilation successful");
} else {
    System.out.println("✗ Compilation failed");
    System.out.println("Output: " + judgment.metadata().get("output"));
}

1.2. Constructor Options

// Default: exit code 0, 2-minute timeout
new CommandJudge("mvn test");

// Custom exit code and timeout
new CommandJudge("my-script.sh", 0, Duration.ofMinutes(5));

Parameters:

command - Shell command to execute
expectedExitCode - Expected exit code for success (default: 0)
timeout - Maximum execution duration (default: 2 minutes)

1.3. Exit Code Evaluation

A command passes when its exit code matches the expected exit code:

// Expect success (exit code 0)
Judge successJudge = new CommandJudge("mvn clean install", 0, Duration.ofMinutes(10));

// Expect exit code 1 - useful for negative checks: grep exits with 1 when
// no line matches, so this judge passes only if build.log contains no 'ERROR'
Judge failureJudge = new CommandJudge("grep 'ERROR' build.log", 1, Duration.ofSeconds(30));
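
The negative check above relies on grep's exit codes (0 = match, 1 = no match, 2 or higher = error). The following is a minimal standalone sketch of that comparison, not CommandJudge's actual implementation; the class and method names (NegativeGrepCheck, grepExitCode, passes, tempLog) are hypothetical, and it assumes grep is available on the PATH.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NegativeGrepCheck {

    /** Returns grep's exit code: 0 = match found, 1 = no match, 2 or higher = error. */
    static int grepExitCode(String pattern, Path file) {
        try {
            Process process = new ProcessBuilder("grep", "-q", pattern, file.toString())
                .redirectErrorStream(true)
                // Discard output so the process can never block on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for grep", e);
        }
    }

    /** The check passes only when the actual exit code equals the expected one. */
    static boolean passes(int actualExitCode, int expectedExitCode) {
        return actualExitCode == expectedExitCode;
    }

    /** Writes the given text to a temporary log file (helper for the demo). */
    static Path tempLog(String content) {
        try {
            Path log = Files.createTempFile("build", ".log");
            Files.writeString(log, content);
            return log;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path clean = tempLog("INFO build ok\n");
        Path broken = tempLog("ERROR something broke\n");
        // Expected exit code 1 means "no ERROR lines were found".
        System.out.println(passes(grepExitCode("ERROR", clean), 1));  // prints true
        System.out.println(passes(grepExitCode("ERROR", broken), 1)); // prints false
    }
}
```

Note that an expected exit code of 1 also treats a missing build.log as a failure, because grep reports errors with exit code 2.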

1.4. Common Commands

1.4.1. Maven

// Compile
new CommandJudge("mvn compile")

// Run tests
new CommandJudge("mvn test")

// Full build
new CommandJudge("mvn clean install", 0, Duration.ofMinutes(10))

// Verify phase
new CommandJudge("mvn verify")

1.4.2. Gradle

// Build
new CommandJudge("gradle build")

// Run tests
new CommandJudge("gradle test")

// Clean build
new CommandJudge("gradle clean build", 0, Duration.ofMinutes(10))

1.4.3. npm/Node.js

// Install dependencies
new CommandJudge("npm install", 0, Duration.ofMinutes(5))

// Run tests
new CommandJudge("npm test")

// Build
new CommandJudge("npm run build", 0, Duration.ofMinutes(5))

// Lint
new CommandJudge("npm run lint")

1.4.4. Custom Scripts

// Shell script
new CommandJudge("./scripts/verify-deployment.sh", 0, Duration.ofMinutes(3))

// Python script
new CommandJudge("python validate.py")

// Docker build
new CommandJudge("docker build -t myapp:latest .", 0, Duration.ofMinutes(15))

1.5. Judgment Structure

When command succeeds:

Judgment {
    status = PASS
    score = BooleanScore(true)
    reasoning = "Command succeeded with exit code 0"
    checks = [
        Check(name="command_execution", passed=true, message="Command executed successfully")
    ]
    metadata = {
        "command": "mvn test",
        "exitCode": 0,
        "expectedExitCode": 0,
        "output": "[maven output...]",
        "duration": "PT45.2S"
    }
}

When command fails:

Judgment {
    status = FAIL
    score = BooleanScore(false)
    reasoning = "Command failed. Expected exit code 0 but got 1"
    checks = [
        Check(name="command_execution", passed=false, message="Command execution failed")
    ]
    metadata = {
        "command": "mvn test",
        "exitCode": 1,
        "expectedExitCode": 0,
        "output": "[maven error output...]",
        "duration": "PT12.5S"
    }
}
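
The duration values shown above ("PT45.2S", "PT12.5S") are in ISO-8601 duration notation, which java.time.Duration can parse. The sketch below assumes the metadata entry is either such a string or a Duration object; the listing does not say which, so the helper accepts both. DurationMetadata and durationMillis are hypothetical names.

```java
import java.time.Duration;
import java.util.Map;

public class DurationMetadata {

    /** Reads the "duration" entry and converts it to milliseconds. */
    static long durationMillis(Map<String, ?> metadata) {
        Object raw = metadata.get("duration");
        // Assumption: the value is a Duration or its ISO-8601 string form.
        Duration duration = (raw instanceof Duration)
            ? (Duration) raw
            : Duration.parse(String.valueOf(raw));
        return duration.toMillis();
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = Map.of("duration", "PT45.2S");
        System.out.println(durationMillis(metadata)); // prints 45200
    }
}
```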

1.6. Accessing Command Output

The judgment metadata includes command output:

Judgment judgment = response.getJudgment();

if (!judgment.pass()) {
    // Get command output for debugging
    String output = (String) judgment.metadata().get("output");
    Integer exitCode = (Integer) judgment.metadata().get("exitCode");

    logger.error("Command failed with exit code {}", exitCode);
    logger.error("Output:\n{}", output);

    // Parse output for specific errors
    if (output.contains("BUILD FAILURE")) {
        logger.error("Maven build failed");
    }
}
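
When the output is long, it can help to log only the lines Maven marks as errors. The following sketch filters on the "[ERROR]" prefix that Maven puts on error lines; ErrorLines and errorLines are hypothetical names, and the sample output in main is made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ErrorLines {

    /** Returns the lines of build output that start with Maven's [ERROR] prefix. */
    static List<String> errorLines(String output) {
        if (output == null) {
            return List.of(); // metadata entry may be absent
        }
        return output.lines()
            .map(String::strip)
            .filter(line -> line.startsWith("[ERROR]"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample output, for illustration only.
        String output = String.join("\n",
            "[INFO] Compiling 12 source files",
            "[ERROR] UserService.java:[42,17] cannot find symbol",
            "[INFO] BUILD FAILURE",
            "[ERROR] Failed to execute goal");
        errorLines(output).forEach(System.out::println);
    }
}
```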

1.7. Timeout Handling

Commands that exceed the timeout are terminated:

// Long-running command with appropriate timeout
Judge judge = new CommandJudge(
    "npm install",
    0,
    Duration.ofMinutes(10) // Allow sufficient time
);

Judgment judgment = judge.judge(context);

if (judgment.status() == JudgmentStatus.FAIL) {
    String output = (String) judgment.metadata().get("output");
    if (output.contains("timeout") || output.contains("killed")) {
        logger.error("Command timed out after 10 minutes");
    }
}
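
For readers curious how a timeout of this kind can be enforced, here is a minimal standalone sketch using the JDK's Process API. It is not CommandJudge's implementation; TimedCommand and run are hypothetical names, and the sketch assumes a POSIX sh is available.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

public class TimedCommand {

    /**
     * Runs a command through `sh -c`. Returns the exit code, or an empty
     * OptionalInt if the process was killed because it exceeded the timeout.
     */
    static OptionalInt run(String command, Duration timeout) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                .redirectErrorStream(true)
                // Discard output so a chatty command cannot block on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
            boolean finished = process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS);
            if (!finished) {
                // Kill child processes first, then the shell itself.
                process.descendants().forEach(ProcessHandle::destroyForcibly);
                process.destroyForcibly();
                return OptionalInt.empty();
            }
            return OptionalInt.of(process.exitValue());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("exit 0", Duration.ofSeconds(5)).isPresent());     // finished in time
        System.out.println(run("sleep 30", Duration.ofMillis(200)).isPresent()); // killed on timeout
    }
}
```

Returning an explicit "timed out" signal, rather than searching the output for words like "timeout", avoids depending on the wording of any particular message.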

2. BuildSuccessJudge

A specialized judge for build verification, with automatic wrapper detection and a longer default timeout.

2.1. Maven Build Verification

BuildSuccessJudge.maven() auto-detects Maven wrapper:

import org.springaicommunity.agents.judge.exec.BuildSuccessJudge;

// Auto-detects ./mvnw or falls back to mvn
Judge judge = BuildSuccessJudge.maven("clean", "install");

AgentClientResponse response = agentClientBuilder
    .goal("Fix failing tests and build the project")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();

if (response.isJudgmentPassed()) {
    System.out.println("✓ Build successful - safe to deploy");
    deploy();
} else {
    System.out.println("✗ Build failed");
}

Wrapper Detection:

Checks for ./mvnw in workspace
Uses ./mvnw if found and executable
Falls back to mvn on PATH otherwise

Workspace contains mvnw → executes: ./mvnw clean install
Workspace lacks mvnw    → executes: mvn clean install
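
The detection rules above can be sketched with java.nio.file.Files. This is an illustration of the described behavior under stated assumptions, not the library's code; WrapperDetection, resolveExecutable, mavenCommand, and tempWorkspace are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WrapperDetection {

    /** Picks "./<wrapper>" when it exists and is executable in the workspace, else the fallback. */
    static String resolveExecutable(Path workspace, String wrapperName, String fallback) {
        Path wrapper = workspace.resolve(wrapperName);
        boolean usable = Files.isRegularFile(wrapper) && Files.isExecutable(wrapper);
        return usable ? "./" + wrapperName : fallback;
    }

    /** Builds the full command line, e.g. "./mvnw clean install". */
    static String mavenCommand(Path workspace, String... goals) {
        return resolveExecutable(workspace, "mvnw", "mvn") + " " + String.join(" ", goals);
    }

    /** Creates a temporary workspace, optionally containing an executable mvnw (demo helper). */
    static Path tempWorkspace(boolean withWrapper) {
        try {
            Path workspace = Files.createTempDirectory("workspace");
            if (withWrapper) {
                Path wrapper = Files.createFile(workspace.resolve("mvnw"));
                wrapper.toFile().setExecutable(true);
            }
            return workspace;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mavenCommand(tempWorkspace(false), "clean", "install"));
        System.out.println(mavenCommand(tempWorkspace(true), "clean", "install"));
    }
}
```

The same helper covers the Gradle case by passing "gradlew" and "gradle" to resolveExecutable.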

2.2. Gradle Build Verification

BuildSuccessJudge.gradle() auto-detects Gradle wrapper:

// Auto-detects ./gradlew or falls back to gradle
Judge judge = BuildSuccessJudge.gradle("build");

AgentClientResponse response = agentClientBuilder
    .goal("Build the Gradle project")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();

Wrapper Detection:

Checks for ./gradlew in workspace
Uses ./gradlew if found and executable
Falls back to gradle on PATH otherwise

2.3. Custom Build Commands

For non-Maven/Gradle builds:

// npm build
Judge npmJudge = new BuildSuccessJudge("npm run build");

// Cargo (Rust)
Judge cargoJudge = new BuildSuccessJudge("cargo build --release");

// Make
Judge makeJudge = new BuildSuccessJudge("make all");

// Custom script
Judge customJudge = new BuildSuccessJudge("./build.sh");

2.4. Build-Specific Timeout

BuildSuccessJudge uses a 10-minute default timeout (vs. 2 minutes for CommandJudge):

// BuildSuccessJudge - 10 minute timeout (default)
BuildSuccessJudge.maven("clean", "install");
// Allows time for dependency downloads, compilation, tests

// CommandJudge - 2 minute timeout (default)
new CommandJudge("mvn clean install");
// May timeout on large builds

Recommendation: Use BuildSuccessJudge for build commands, CommandJudge for quick verifications.

2.5. Common Maven Goals

// Compile only
BuildSuccessJudge.maven("compile");

// Run tests
BuildSuccessJudge.maven("test");

// Clean and compile
BuildSuccessJudge.maven("clean", "compile");

// Full build with tests
BuildSuccessJudge.maven("clean", "install");

// Verify (includes integration tests)
BuildSuccessJudge.maven("verify");

// Skip tests
BuildSuccessJudge.maven("clean", "install", "-DskipTests");

2.6. Common Gradle Tasks

// Build
BuildSuccessJudge.gradle("build");

// Test
BuildSuccessJudge.gradle("test");

// Clean build
BuildSuccessJudge.gradle("clean", "build");

// Assemble (no tests)
BuildSuccessJudge.gradle("assemble");

3. Production Patterns

3.1. Pattern 1: CI/CD Build Verification

Verify builds before deployment:

@Service
public class ContinuousDeployment {

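    private final AgentClient.Builder agentClientBuilder;

    // Constructor injection: a final field must be assigned in the constructor
    public ContinuousDeployment(AgentClient.Builder agentClientBuilder) {
        this.agentClientBuilder = agentClientBuilder;
    }
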
    public void fixAndDeploy(Path projectRoot) {
        // Step 1: Fix failing tests
        AgentClientResponse fixResponse = agentClientBuilder
            .goal("Fix all failing unit tests")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("test"))
                .build())
            .call();

        if (!fixResponse.isJudgmentPassed()) {
            throw new CIException("Tests still failing after fix attempt");
        }

        // Step 2: Full build
        AgentClientResponse buildResponse = agentClientBuilder
            .goal("Build the project")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("clean", "install"))
                .build())
            .call();

        if (!buildResponse.isJudgmentPassed()) {
            throw new CIException("Build failed");
        }

        // Safe to deploy
        deploy(projectRoot);
    }
}

3.2. Pattern 2: Multi-Stage Build Verification

Verify each build stage independently:

public class MultiStageBuild {

    public void buildProject(Path projectRoot) {
        // Stage 1: Compilation
        verifyStage(
            projectRoot,
            "Fix compilation errors",
            BuildSuccessJudge.maven("compile")
        );

        // Stage 2: Unit tests
        verifyStage(
            projectRoot,
            "Fix unit test failures",
            BuildSuccessJudge.maven("test")
        );

        // Stage 3: Integration tests
        verifyStage(
            projectRoot,
            "Fix integration test failures",
            BuildSuccessJudge.maven("verify")
        );

        System.out.println("✓ All build stages passed");
    }

    private void verifyStage(Path projectRoot, String goal, Judge judge) {
        AgentClientResponse response = agentClientBuilder
            .goal(goal)
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder().judge(judge).build())
            .call();

        if (!response.isJudgmentPassed()) {
            throw new BuildException("Build stage failed: " + goal);
        }
    }
}

3.3. Pattern 3: Quality Gates

Combine build with quality checks:

public class QualityGate {

    public void enforceQuality(Path projectRoot) {
        AgentClientResponse response = agentClientBuilder
            .goal("Ensure code meets quality standards")
            .workingDirectory(projectRoot)
            .advisors(
                // Build must succeed
                JudgeAdvisor.builder()
                    .judge(BuildSuccessJudge.maven("clean", "install"))
                    .build(),

                // Code coverage check
                JudgeAdvisor.builder()
                    .judge(new CommandJudge("mvn jacoco:check"))
                    .build(),

                // Code style check
                JudgeAdvisor.builder()
                    .judge(new CommandJudge("mvn checkstyle:check"))
                    .build(),

                // Security scan (assumes the OWASP dependency-check plugin is configured in the POM)
                JudgeAdvisor.builder()
                    .judge(new CommandJudge("mvn dependency-check:check"))
                    .build()
            )
            .call();

        if (!response.isJudgmentPassed()) {
            throw new QualityException("Quality gate failed");
        }
    }
}

3.4. Pattern 4: Build Output Analysis

Parse build output for specific issues:

public class BuildAnalyzer {

    public void analyzeAndFix(Path projectRoot) {
        AgentClientResponse response = agentClientBuilder
            .goal("Build the project")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("clean", "install"))
                .build())
            .call();

        Judgment judgment = response.getJudgment();

        if (!judgment.pass()) {
            String output = (String) judgment.metadata().get("output");

            // Maven prints markers such as "COMPILATION ERROR" in upper case,
            // so match against a lower-cased copy of the output
            String normalized = output == null ? "" : output.toLowerCase();

            if (normalized.contains("compilation error")) {
                logger.error("Compilation errors detected");
                handleCompilationErrors(output);
            }
            else if (normalized.contains("test failures")) {
                logger.error("Test failures detected");
                handleTestFailures(output);
            }
            else if (normalized.contains("could not resolve dependencies")) {
                logger.error("Dependency issues detected");
                handleDependencyIssues(output);
            }

            throw new BuildException("Build failed - see analysis above");
        }
    }
}
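
The branching in this pattern can be factored into a small classifier that is easy to unit test. The sketch below is one possible shape, not part of the judge API: BuildFailureClassifier, FailureKind, and classify are hypothetical names, and the matched phrases are the messages Maven typically prints ("COMPILATION ERROR", "There are test failures", "Could not resolve dependencies"), which may vary with plugin versions.

```java
import java.util.Locale;

public class BuildFailureClassifier {

    enum FailureKind { COMPILATION, TEST, DEPENDENCY, UNKNOWN }

    /** Maps raw build output to a coarse failure category. */
    static FailureKind classify(String output) {
        if (output == null) {
            return FailureKind.UNKNOWN;
        }
        // Compare case-insensitively; Maven mixes upper and lower case markers.
        String text = output.toLowerCase(Locale.ROOT);
        if (text.contains("compilation error")) {
            return FailureKind.COMPILATION;
        }
        if (text.contains("there are test failures")) {
            return FailureKind.TEST;
        }
        if (text.contains("could not resolve dependencies")) {
            return FailureKind.DEPENDENCY;
        }
        return FailureKind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("[ERROR] COMPILATION ERROR : "));
        System.out.println(classify("[ERROR] There are test failures."));
        System.out.println(classify("[INFO] BUILD SUCCESS"));
    }
}
```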

3.5. Pattern 5: Incremental Build Verification

Verify builds incrementally during development:

public class IncrementalBuild {

    public void developFeature(Path projectRoot, String featureName) {
        // Quick compile check
        quickCheck(projectRoot, "Ensure code compiles",
            BuildSuccessJudge.maven("compile"));

        // Unit test check
        quickCheck(projectRoot, "Ensure unit tests pass",
            BuildSuccessJudge.maven("test"));

        // Full verification before commit
        AgentClientResponse response = agentClientBuilder
            .goal("Complete " + featureName + " feature")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("clean", "verify"))
                .build())
            .call();

        if (response.isJudgmentPassed()) {
            gitCommit(featureName);
            createPullRequest(featureName);
        }
    }

    private void quickCheck(Path projectRoot, String goal, Judge judge) {
        AgentClientResponse response = agentClientBuilder
            .goal(goal)
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder().judge(judge).build())
            .call();

        if (!response.isJudgmentPassed()) {
            throw new BuildException("Quick check failed: " + goal);
        }
    }
}

4. Error Handling

4.1. Command Not Found

Judge judge = new CommandJudge("nonexistent-command");
Judgment judgment = judge.judge(context);

// Status: FAIL
// Reasoning: "Command execution failed: command not found"
assertThat(judgment.pass()).isFalse();

4.2. Timeout Exceeded

// Short timeout for demonstration
Judge judge = new CommandJudge(
    "sleep 300",
    0,
    Duration.ofSeconds(5)
);

Judgment judgment = judge.judge(context);

// Command terminated after timeout
assertThat(judgment.pass()).isFalse();

4.3. Permission Denied

Judge judge = new CommandJudge("./non-executable-script.sh");
Judgment judgment = judge.judge(context);

// Status: FAIL
// Exit code: 126 (permission denied)
assertThat(judgment.pass()).isFalse();
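
The exit codes in this section follow POSIX shell conventions: 126 means the command was found but could not be executed, 127 means it was not found, and values above 128 usually indicate termination by a signal. A small helper along these lines can make log messages easier to read; ExitCodes and describe are hypothetical names, and whether the judge reports exactly these codes depends on how it launches the command.

```java
public class ExitCodes {

    /** Translates a shell exit code into a short human-readable description. */
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "success";
        }
        if (exitCode == 126) {
            return "command found but not executable (permission denied)";
        }
        if (exitCode == 127) {
            return "command not found";
        }
        if (exitCode > 128) {
            // Shell convention: 128 + signal number, e.g. 137 = SIGKILL (9).
            return "terminated by signal " + (exitCode - 128);
        }
        return "command-specific failure";
    }

    public static void main(String[] args) {
        for (int code : new int[] {0, 1, 126, 127, 137}) {
            System.out.println(code + ": " + describe(code));
        }
    }
}
```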

5. Performance Considerations

Command execution times vary significantly:

Command Type	Typical Duration	Timeout Recommendation
Quick compile	10-30 seconds	2 minutes (default)
Unit tests	30-120 seconds	5 minutes
Full build (Maven)	2-5 minutes	10 minutes (BuildSuccessJudge default)
Integration tests	3-10 minutes	15 minutes
Docker builds	5-15 minutes	20 minutes

Best practice: Set timeout to 2-3x expected duration to account for variability.
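
The rule of thumb above is simple arithmetic on java.time.Duration. A minimal sketch, with the hypothetical names TimeoutBudget and recommendedTimeout:

```java
import java.time.Duration;

public class TimeoutBudget {

    /** Multiplies the expected duration by a safety factor (2 or 3 per the rule of thumb). */
    static Duration recommendedTimeout(Duration expected, int safetyFactor) {
        if (safetyFactor < 1) {
            throw new IllegalArgumentException("safetyFactor must be at least 1");
        }
        return expected.multipliedBy(safetyFactor);
    }

    public static void main(String[] args) {
        // A full Maven build expected to take about 5 minutes, with a 2x margin.
        System.out.println(recommendedTimeout(Duration.ofMinutes(5), 2)); // prints PT10M
        // Unit tests expected to take about 2 minutes, with a 3x margin.
        System.out.println(recommendedTimeout(Duration.ofMinutes(2), 3)); // prints PT6M
    }
}
```

The result can be passed as the timeout argument of the three-argument CommandJudge constructor shown in section 1.2.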

6. Best Practices

6.1. Use BuildSuccessJudge for Builds

// ✅ Good: Auto-detects wrapper, appropriate timeout
BuildSuccessJudge.maven("clean", "install");

// ❌ Manual: Must specify wrapper path, default timeout too short
new CommandJudge("./mvnw clean install");

6.2. Set Appropriate Timeouts

// ✅ Good: Generous timeout for build
new CommandJudge("npm install", 0, Duration.ofMinutes(10));

// ❌ Risky: Default 2-minute timeout may be insufficient
new CommandJudge("npm install");

6.3. Capture Output for Debugging

Judgment judgment = response.getJudgment();

if (!judgment.pass()) {
    // Log full output for debugging
    logger.error("Build output:\n{}", judgment.metadata().get("output"));
}

6.4. Fail Fast with Quick Checks

// Quick compile check first (fast fail)
BuildSuccessJudge.maven("compile");

// Then expensive full build if compile succeeds
BuildSuccessJudge.maven("clean", "install");
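
The ordering logic behind fail-fast checks can be sketched independently of the judge API: run the cheap stage first and stop at the first failure so the expensive stages never start. FailFastStages and firstFailure are hypothetical names, and the BooleanSupplier stages stand in for real judge invocations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

public class FailFastStages {

    /** Runs stages in insertion order and returns the name of the first failing stage, if any. */
    static Optional<String> firstFailure(Map<String, BooleanSupplier> stages) {
        for (Map.Entry<String, BooleanSupplier> stage : stages.entrySet()) {
            if (!stage.getValue().getAsBoolean()) {
                return Optional.of(stage.getKey()); // later stages never run
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order: cheapest stage first.
        Map<String, BooleanSupplier> stages = new LinkedHashMap<>();
        stages.put("compile", () -> true);
        stages.put("test", () -> false);
        stages.put("install", () -> {
            throw new IllegalStateException("should not run after a failure");
        });
        System.out.println(firstFailure(stages)); // prints Optional[test]
    }
}
```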

7. Next Steps

File Judges: File verification and content checks
LLM Judges: AI-based evaluation
Judge Advisor: Integration with AgentClient
Deterministic Overview: All deterministic judge types

8. Further Reading

Judge API Overview - Complete Judge API documentation
Your First Judge - Practical introduction
CLI Agents - Understanding autonomous agents

Command judges provide fast, reliable verification of shell commands and builds. They’re essential for CI/CD pipelines and production agent workflows.