Command Judges: Build and Command Verification
Command judges verify command execution and build success. They provide deterministic evaluation of shell commands, build tools, and test suites.
1. CommandJudge
Executes a shell command and judges success based on exit code.
1.1. Basic Usage
import org.springaicommunity.agents.judge.exec.CommandJudge;
// Simple command - expects exit code 0
Judge judge = new CommandJudge("mvn compile");
AgentClientResponse response = agentClientBuilder
    .goal("Fix compilation errors in UserService")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();
Judgment judgment = response.getJudgment();
if (judgment.pass()) {
    System.out.println("✓ Compilation successful");
} else {
    System.out.println("✗ Compilation failed");
    System.out.println("Output: " + judgment.metadata().get("output"));
}
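To make the exit-code rule concrete, here is a minimal standalone sketch that uses only the JDK's ProcessBuilder. It is not the CommandJudge source: the class ExitCodeCheck, its Result record, and launching the command through "sh -c" (which assumes a POSIX shell) are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical sketch of exit-code judging with ProcessBuilder; not CommandJudge's source.
final class ExitCodeCheck {

    // Outcome of one run: pass flag, actual exit code, and combined stdout/stderr.
    record Result(boolean pass, int exitCode, String output) {}

    private ExitCodeCheck() {}

    static Result run(String command, Path workingDirectory, int expectedExitCode) {
        // "sh -c" lets the command use pipes, quoting and globs as in a terminal.
        ProcessBuilder builder = new ProcessBuilder("sh", "-c", command)
                .directory(workingDirectory.toFile())
                .redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = builder.start();
            // Drain output before waiting so a full pipe buffer cannot stall the child.
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            return new Result(exitCode == expectedExitCode, exitCode, output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running: " + command, e);
        }
    }
}
```

The pass decision is a single comparison of the actual exit code against the expected one; everything else (output, exit code) is kept so a caller can explain a failure.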
1.2. Constructor Options
// Default: exit code 0, 2-minute timeout
new CommandJudge("mvn test");
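// Custom exit code and timeout (import java.time.Duration)
// Use ./ for scripts in the working directory, which is usually not on PATH
new CommandJudge("./my-script.sh", 0, Duration.ofMinutes(5));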
Parameters:
- command: Shell command to execute
- expectedExitCode: Exit code that counts as success (default: 0)
- timeout: Maximum execution duration (default: 2 minutes)
1.3. Exit Code Evaluation
A command passes when its exit code matches the expected exit code:
// Expect success (exit code 0)
Judge successJudge = new CommandJudge("mvn clean install", 0, Duration.ofMinutes(10));
// Expect exit code 1: grep returns 1 when no line matches, so this judge
// passes only if build.log contains no "ERROR" lines (a missing file gives 2 and fails)
Judge failureJudge = new CommandJudge("grep 'ERROR' build.log", 1, Duration.ofSeconds(30));
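The grep example relies on grep's exit-status convention: 0 when a line matches, 1 when no line matches, and greater than 1 on error (POSIX; GNU grep uses 2). The hypothetical GrepExitCodes class below is a standalone demo of that convention, assuming a grep binary on PATH; it does not use the judge API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo of grep's exit-status convention; not part of the judge API.
final class GrepExitCodes {

    private GrepExitCodes() {}

    // Runs `grep ERROR <file>`: 0 = a line matched, 1 = no line matched, >1 = error (e.g. missing file).
    static int grepErrors(Path logFile) {
        try {
            Process process = new ProcessBuilder("grep", "ERROR", logFile.toString())
                    .redirectErrorStream(true)
                    .start();
            process.getInputStream().readAllBytes(); // discard matched lines and diagnostics
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Writes content to a temporary log file, greps it, then deletes the file.
    static int grepContent(String content) {
        try {
            Path log = Files.createTempFile("build", ".log");
            try {
                Files.writeString(log, content);
                return grepErrors(log);
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A clean log yields 1, which is exactly what the expectedExitCode of 1 above treats as success.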
1.4. Common Commands
1.4.1. Maven
// Compile
new CommandJudge("mvn compile");
// Run tests
new CommandJudge("mvn test");
// Full build
new CommandJudge("mvn clean install", 0, Duration.ofMinutes(10));
// Verify phase
new CommandJudge("mvn verify");
1.4.2. Gradle
// Build
new CommandJudge("gradle build");
// Run tests
new CommandJudge("gradle test");
// Clean build
new CommandJudge("gradle clean build", 0, Duration.ofMinutes(10));
1.5. Judgment Structure
When the command succeeds:
Judgment {
    status = PASS
    score = BooleanScore(true)
    reasoning = "Command succeeded with exit code 0"
    checks = [
        Check(name="command_execution", passed=true, message="Command executed successfully")
    ]
    metadata = {
        "command": "mvn test",
        "exitCode": 0,
        "expectedExitCode": 0,
        "output": "[maven output...]",
        "duration": "PT45.2S"
    }
}
When the command fails:
Judgment {
    status = FAIL
    score = BooleanScore(false)
    reasoning = "Command failed. Expected exit code 0 but got 1"
    checks = [
        Check(name="command_execution", passed=false, message="Command execution failed")
    ]
    metadata = {
        "command": "mvn test",
        "exitCode": 1,
        "expectedExitCode": 0,
        "output": "[maven error output...]",
        "duration": "PT12.5S"
    }
}
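As a rough model of how those fields relate, the sketch below derives the pass flag, the reasoning text, and the metadata map from one command run. CommandJudgmentSketch and its Judgment record are hypothetical stand-ins, not the library's classes; the metadata keys and reasoning strings are copied from the examples above, and storing the duration as its ISO-8601 string is an assumption based on the quoted "PT45.2S".

```java
import java.time.Duration;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch that rebuilds the documented judgment shape; not the library's classes.
final class CommandJudgmentSketch {

    record Judgment(boolean pass, String reasoning, Map<String, Object> metadata) {}

    private CommandJudgmentSketch() {}

    static Judgment of(String command, int exitCode, int expectedExitCode,
                       String output, Duration duration) {
        boolean pass = exitCode == expectedExitCode;
        String reasoning = pass
                ? "Command succeeded with exit code " + exitCode
                : "Command failed. Expected exit code " + expectedExitCode + " but got " + exitCode;
        Map<String, Object> metadata = new LinkedHashMap<>(); // preserves the documented key order
        metadata.put("command", command);
        metadata.put("exitCode", exitCode);
        metadata.put("expectedExitCode", expectedExitCode);
        metadata.put("output", output);
        metadata.put("duration", duration.toString()); // ISO-8601 text, e.g. "PT45.2S"
        return new Judgment(pass, reasoning, Collections.unmodifiableMap(metadata));
    }
}
```

Duration.toString() produces the ISO-8601 form shown in the examples, so 45.2 seconds renders as PT45.2S.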
1.6. Accessing Command Output
The judgment metadata includes the command output and exit code:
Judgment judgment = response.getJudgment();
if (!judgment.pass()) {
    // Get command output for debugging
    String output = (String) judgment.metadata().get("output");
    Integer exitCode = (Integer) judgment.metadata().get("exitCode");
    logger.error("Command failed with exit code {}", exitCode);
    logger.error("Output:\n{}", output);
    // Parse output for specific errors (guard against missing output)
    if (output != null && output.contains("BUILD FAILURE")) {
        logger.error("Maven build failed");
    }
}
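The casts above throw ClassCastException if a value has an unexpected type and leave null handling to the caller. A small typed accessor avoids both. MetadataAccess is a hypothetical helper, and it assumes metadata() returns a Map keyed by String.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: typed, null-safe reads from a judgment's metadata map.
final class MetadataAccess {

    private MetadataAccess() {}

    // Returns the value under key if it is present and of the requested type, otherwise empty.
    static <T> Optional<T> get(Map<String, ?> metadata, String key, Class<T> type) {
        Object value = metadata == null ? null : metadata.get(key);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }
}
```

With it, the output lookup could read MetadataAccess.get(judgment.metadata(), "output", String.class).orElse(""), which never returns null.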
1.7. Timeout Handling
Commands that exceed the timeout are terminated:
// Long-running command with appropriate timeout
Judge judge = new CommandJudge(
    "npm install",
    0,
    Duration.ofMinutes(10) // Allow sufficient time
);
// context: the judgment context for the agent run (not shown here)
Judgment judgment = judge.judge(context);
if (!judgment.pass()) {
    String output = (String) judgment.metadata().get("output");
    // Heuristic: the exact timeout wording in the output depends on the implementation
    if (output != null && (output.contains("timeout") || output.contains("killed"))) {
        logger.error("Command did not finish within the 10-minute timeout");
    }
}
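Searching output text for words like "timeout" is fragile. A runner can instead record the timeout as an explicit flag. TimedCommand below is a hypothetical standalone sketch built on Process.waitFor(long, TimeUnit), assuming a POSIX "sh"; it is not how CommandJudge is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: surface a timeout as an explicit flag instead of searching output text.
final class TimedCommand {

    // exitCode is null when the command was killed for exceeding the timeout.
    record Result(boolean timedOut, Integer exitCode, String output) {}

    private TimedCommand() {}

    static Result run(String command, Duration timeout) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true)
                    .start();
            // Read output on another thread so waiting with a timeout is never blocked by a full pipe.
            CompletableFuture<String> output = CompletableFuture.supplyAsync(() -> {
                try {
                    return new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            if (!process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                // Kill child processes started by the shell first, then the shell itself.
                process.descendants().forEach(ProcessHandle::destroyForcibly);
                process.destroyForcibly().waitFor();
                return new Result(true, null, output.join());
            }
            return new Result(false, process.exitValue(), output.join());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running: " + command, e);
        }
    }
}
```

Killing the descendants matters because a shell's child (here, sleep) can otherwise keep the output pipe open after the shell itself is gone.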
2. BuildSuccessJudge
A specialized judge for build verification, with build-tool wrapper detection and a longer default timeout.
2.1. Maven Build Verification
BuildSuccessJudge.maven() auto-detects the Maven wrapper:
import org.springaicommunity.agents.judge.exec.BuildSuccessJudge;
// Auto-detects ./mvnw or falls back to mvn
Judge judge = BuildSuccessJudge.maven("clean", "install");
AgentClientResponse response = agentClientBuilder
    .goal("Fix failing tests and build the project")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();
if (response.isJudgmentPassed()) {
    System.out.println("✓ Build successful - safe to deploy");
    deploy();
} else {
    System.out.println("✗ Build failed");
}
Wrapper Detection:
- Checks for ./mvnw in the workspace
- Uses ./mvnw if it is found and executable
- Falls back to mvn on the PATH otherwise
Workspace contains mvnw → executes: ./mvnw clean install
Workspace lacks mvnw → executes: mvn clean install
2.2. Gradle Build Verification
BuildSuccessJudge.gradle() auto-detects the Gradle wrapper:
// Auto-detects ./gradlew or falls back to gradle
Judge judge = BuildSuccessJudge.gradle("build");
AgentClientResponse response = agentClientBuilder
    .goal("Build the Gradle project")
    .workingDirectory(projectRoot)
    .advisors(JudgeAdvisor.builder()
        .judge(judge)
        .build())
    .call();
Wrapper Detection:
- Checks for ./gradlew in the workspace
- Uses ./gradlew if it is found and executable
- Falls back to gradle on the PATH otherwise
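The wrapper rules for Maven and Gradle differ only in file and tool names, so one resolver can cover both. WrapperResolver is a hypothetical sketch of the detection rules listed above, not the BuildSuccessJudge source; it only builds the command string and does not run it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of wrapper detection; the real BuildSuccessJudge logic may differ.
final class WrapperResolver {

    private WrapperResolver() {}

    // Prefer ./<wrapper> when it exists in the workspace and is executable, else the tool on PATH.
    static String resolve(Path workspace, String wrapper, String fallback, String... args) {
        Path candidate = workspace.resolve(wrapper);
        String executable = Files.isRegularFile(candidate) && Files.isExecutable(candidate)
                ? "./" + wrapper
                : fallback;
        List<String> parts = new ArrayList<>();
        parts.add(executable);
        parts.addAll(List.of(args));
        return String.join(" ", parts);
    }

    static String maven(Path workspace, String... goals) {
        return resolve(workspace, "mvnw", "mvn", goals);
    }

    static String gradle(Path workspace, String... tasks) {
        return resolve(workspace, "gradlew", "gradle", tasks);
    }
}
```

Requiring the executable bit mirrors the "found and executable" rule: a wrapper checked out without execute permission falls back to the tool on PATH instead of failing with "permission denied".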
2.3. Custom Build Commands
For builds that use neither Maven nor Gradle, pass the complete build command to the constructor:
// npm build
Judge npmJudge = new BuildSuccessJudge("npm run build");
// Cargo (Rust)
Judge cargoJudge = new BuildSuccessJudge("cargo build --release");
// Make
Judge makeJudge = new BuildSuccessJudge("make all");
// Custom script
Judge customJudge = new BuildSuccessJudge("./build.sh");
2.4. Build-Specific Timeout
BuildSuccessJudge uses a 10-minute default timeout (vs. 2 minutes for CommandJudge):
// BuildSuccessJudge - 10 minute timeout (default)
BuildSuccessJudge.maven("clean", "install");
// Allows time for dependency downloads, compilation, tests
// CommandJudge - 2 minute timeout (default)
new CommandJudge("mvn clean install");
// May time out on large builds
Recommendation: Use BuildSuccessJudge for build commands and CommandJudge for quick verifications.
2.5. Common Maven Goals
// Compile only
BuildSuccessJudge.maven("compile");
// Run tests
BuildSuccessJudge.maven("test");
// Clean and compile
BuildSuccessJudge.maven("clean", "compile");
// Full build with tests
BuildSuccessJudge.maven("clean", "install");
// Verify (also runs integration tests when a plugin such as Failsafe is bound to the lifecycle)
BuildSuccessJudge.maven("verify");
// Skip tests
BuildSuccessJudge.maven("clean", "install", "-DskipTests");
3. Production Patterns
3.1. Pattern 1: CI/CD Build Verification
Verify builds before deployment:
@Service
public class ContinuousDeployment {
    private final AgentClient.Builder agentClientBuilder;
    // Constructor injection initializes the final field
    public ContinuousDeployment(AgentClient.Builder agentClientBuilder) {
        this.agentClientBuilder = agentClientBuilder;
    }
    public void fixAndDeploy(Path projectRoot) {
        // Step 1: Fix failing tests
        AgentClientResponse fixResponse = agentClientBuilder
            .goal("Fix all failing unit tests")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("test"))
                .build())
            .call();
        if (!fixResponse.isJudgmentPassed()) {
            throw new CIException("Tests still failing after fix attempt");
        }
        // Step 2: Full build
        AgentClientResponse buildResponse = agentClientBuilder
            .goal("Build the project")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("clean", "install"))
                .build())
            .call();
        if (!buildResponse.isJudgmentPassed()) {
            throw new CIException("Build failed");
        }
        // Safe to deploy (deploy() and CIException are application-specific)
        deploy(projectRoot);
    }
}
3.2. Pattern 2: Multi-Stage Build Verification
Verify each build stage independently:
public class MultiStageBuild {
    private final AgentClient.Builder agentClientBuilder;
    public MultiStageBuild(AgentClient.Builder agentClientBuilder) {
        this.agentClientBuilder = agentClientBuilder;
    }
    public void buildProject(Path projectRoot) {
        // Stage 1: Compilation
        verifyStage(projectRoot, "Fix compilation errors",
            BuildSuccessJudge.maven("compile"));
        // Stage 2: Unit tests
        verifyStage(projectRoot, "Fix unit test failures",
            BuildSuccessJudge.maven("test"));
        // Stage 3: Integration tests
        verifyStage(projectRoot, "Fix integration test failures",
            BuildSuccessJudge.maven("verify"));
        System.out.println("✓ All build stages passed");
    }
    private void verifyStage(Path projectRoot, String goal, Judge judge) {
        AgentClientResponse response = agentClientBuilder
            .goal(goal)
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder().judge(judge).build())
            .call();
        if (!response.isJudgmentPassed()) {
            // Fail fast: later stages are skipped
            throw new BuildException("Build stage failed: " + goal);
        }
    }
}
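The staging logic itself is independent of agents: run checks in order and stop at the first failure. StagedVerification is a hypothetical sketch of that fail-fast pattern; in MultiStageBuild each BooleanSupplier would wrap one agent call and return its isJudgmentPassed() result.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of fail-fast staging; stages run in the order they are added.
final class StagedVerification {

    private final Map<String, BooleanSupplier> stages = new LinkedHashMap<>(); // insertion order = run order

    StagedVerification stage(String name, BooleanSupplier check) {
        stages.put(name, check);
        return this;
    }

    // Runs stages in order; returns the first failing stage name, or empty if all pass.
    Optional<String> run() {
        for (Map.Entry<String, BooleanSupplier> entry : stages.entrySet()) {
            if (!entry.getValue().getAsBoolean()) {
                return Optional.of(entry.getKey()); // later stages are never evaluated
            }
        }
        return Optional.empty();
    }
}
```

Stopping early saves time: there is little point running integration tests while the code does not compile.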
3.3. Pattern 3: Quality Gates
Combine build with quality checks:
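public class QualityGate {
    private final AgentClient.Builder agentClientBuilder;
    public QualityGate(AgentClient.Builder agentClientBuilder) {
        this.agentClientBuilder = agentClientBuilder;
    }
    public void enforceQuality(Path projectRoot) {
        AgentClientResponse response = agentClientBuilder
            .goal("Ensure code meets quality standards")
            .workingDirectory(projectRoot)
            .advisors(
                // Build must succeed
                JudgeAdvisor.builder()
                    .judge(BuildSuccessJudge.maven("clean", "install"))
                    .build(),
                // Code coverage check (jacoco-maven-plugin declared in the POM)
                JudgeAdvisor.builder()
                    .judge(new CommandJudge("mvn jacoco:check"))
                    .build(),
                // Code style check (maven-checkstyle-plugin)
                JudgeAdvisor.builder()
                    .judge(new CommandJudge("mvn checkstyle:check"))
                    .build(),
                // Security scan (OWASP dependency-check-maven plugin declared in the POM);
                // the first run downloads vulnerability data, so allow far more than 2 minutes
                JudgeAdvisor.builder()
                    .judge(new CommandJudge("mvn dependency-check:check", 0, Duration.ofMinutes(30)))
                    .build()
            )
            .call();
        if (!response.isJudgmentPassed()) {
            throw new QualityException("Quality gate failed");
        }
    }
}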
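Unlike staged verification, a quality gate is more useful when it reports every failing check at once. QualityGateReport is a hypothetical sketch of that aggregation with plain BooleanSuppliers; how the advisors above actually combine multiple judgments is not specified here.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: evaluate every gate and report all failures together.
final class QualityGateReport {

    private QualityGateReport() {}

    // Runs each check exactly once (no short-circuit) and returns failing names in map order.
    // Pass a LinkedHashMap to keep a predictable order.
    static List<String> failures(Map<String, BooleanSupplier> checks) {
        return checks.entrySet().stream()
                .filter(entry -> !entry.getValue().getAsBoolean())
                .map(Map.Entry::getKey)
                .toList();
    }

    static String summary(List<String> failed) {
        return failed.isEmpty()
                ? "Quality gate passed"
                : "Quality gate failed: " + String.join(", ", failed);
    }
}
```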
3.4. Pattern 4: Build Output Analysis
Parse build output for specific issues:
public class BuildAnalyzer {
    // SLF4J logger (org.slf4j.Logger / LoggerFactory)
    private static final Logger logger = LoggerFactory.getLogger(BuildAnalyzer.class);
    private final AgentClient.Builder agentClientBuilder;
    public BuildAnalyzer(AgentClient.Builder agentClientBuilder) {
        this.agentClientBuilder = agentClientBuilder;
    }
    public void analyzeAndFix(Path projectRoot) {
        AgentClientResponse response = agentClientBuilder
            .goal("Build the project")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("clean", "install"))
                .build())
            .call();
        Judgment judgment = response.getJudgment();
        if (!judgment.pass()) {
            Object rawOutput = judgment.metadata().get("output");
            String output = rawOutput instanceof String s ? s : "";
            // Markers match common Maven console messages; adjust them for your build
            if (output.contains("COMPILATION ERROR")) {
                logger.error("Compilation errors detected");
                handleCompilationErrors(output);
            } else if (output.contains("There are test failures")) {
                logger.error("Test failures detected");
                handleTestFailures(output);
            } else if (output.contains("Could not resolve dependencies")) {
                logger.error("Dependency issues detected");
                handleDependencyIssues(output);
            }
            throw new BuildException("Build failed - see analysis above");
        }
    }
}
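Pulling the classification into a pure function makes it testable against saved build logs. MavenFailureClassifier is a hypothetical sketch; its markers are typical Maven console messages (compiler "COMPILATION ERROR", Surefire "There are test failures", resolver "Could not resolve dependencies"), which should be checked against your own Maven and plugin versions.

```java
// Hypothetical sketch: classify Maven console output into a failure category.
// The marker strings are assumptions based on typical Maven messages.
final class MavenFailureClassifier {

    enum Failure { COMPILATION, TEST, DEPENDENCY, UNKNOWN }

    private MavenFailureClassifier() {}

    static Failure classify(String output) {
        if (output == null) {
            return Failure.UNKNOWN;
        }
        // Checked in build order: compilation fails before tests run.
        if (output.contains("COMPILATION ERROR")) {
            return Failure.COMPILATION;
        }
        if (output.contains("There are test failures")) {
            return Failure.TEST;
        }
        if (output.contains("Could not resolve dependencies")) {
            return Failure.DEPENDENCY;
        }
        return Failure.UNKNOWN;
    }
}
```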
3.5. Pattern 5: Incremental Build Verification
Verify builds incrementally during development:
public class IncrementalBuild {
    private final AgentClient.Builder agentClientBuilder;
    public IncrementalBuild(AgentClient.Builder agentClientBuilder) {
        this.agentClientBuilder = agentClientBuilder;
    }
    public void developFeature(Path projectRoot, String featureName) {
        // Quick compile check
        quickCheck(projectRoot, "Ensure code compiles",
            BuildSuccessJudge.maven("compile"));
        // Unit test check
        quickCheck(projectRoot, "Ensure unit tests pass",
            BuildSuccessJudge.maven("test"));
        // Full verification before commit
        AgentClientResponse response = agentClientBuilder
            .goal("Complete " + featureName + " feature")
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder()
                .judge(BuildSuccessJudge.maven("clean", "verify"))
                .build())
            .call();
        if (response.isJudgmentPassed()) {
            // gitCommit() and createPullRequest() are application-specific helpers
            gitCommit(featureName);
            createPullRequest(featureName);
        }
    }
    private void quickCheck(Path projectRoot, String goal, Judge judge) {
        AgentClientResponse response = agentClientBuilder
            .goal(goal)
            .workingDirectory(projectRoot)
            .advisors(JudgeAdvisor.builder().judge(judge).build())
            .call();
        if (!response.isJudgmentPassed()) {
            throw new BuildException("Quick check failed: " + goal);
        }
    }
}
4. Error Handling
4.1. Command Not Found
Judge judge = new CommandJudge("nonexistent-command");
// context: the judgment context for the agent run (not shown here)
Judgment judgment = judge.judge(context);
// Status: FAIL
// Reasoning describes the failure, e.g. "Command execution failed: command not found"
assertThat(judgment.pass()).isFalse();
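A missing command can surface in two ways, depending on how it is launched. Through a POSIX shell, the shell itself starts and reports exit status 127; launched directly, ProcessBuilder.start() throws an IOException. This page does not say which path CommandJudge takes, so the hypothetical CommandNotFoundDemo below shows both; either way the judgment should not pass.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo: two ways a missing command can surface, depending on how it is launched.
final class CommandNotFoundDemo {

    private CommandNotFoundDemo() {}

    // Through a POSIX shell: the shell starts fine and reports the missing command as exit status 127.
    static int exitStatusViaShell(String command) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true)
                    .start();
            process.getInputStream().readAllBytes(); // discard the "not found" diagnostic
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // only if "sh" itself cannot be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Launched directly: ProcessBuilder.start() throws IOException because no executable exists.
    static boolean canStartDirectly(String program) {
        try {
            Process process = new ProcessBuilder(program).start();
            process.destroy();
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```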
5. Performance Considerations
Command execution times vary significantly:
Command Type | Typical Duration | Timeout Recommendation |
---|---|---|
Quick compile | 10-30 seconds | 2 minutes (default) |
Unit tests | 30-120 seconds | 5 minutes |
Full build (Maven) | 2-5 minutes | 10 minutes (BuildSuccessJudge default) |
Integration tests | 3-10 minutes | 15 minutes |
Docker builds | 5-15 minutes | 20 minutes |
Best practice: Set the timeout to 2-3x the expected duration to account for variability.
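As a worked example of the rule, a full Maven build with an expected upper bound of 5 minutes gets 5 × 2 = 10 minutes, which matches the BuildSuccessJudge default. TimeoutPolicy is a hypothetical helper that applies the rule and rejects factors outside the 2-3x range.

```java
import java.time.Duration;

// Hypothetical helper applying the "2-3x expected duration" rule of thumb.
final class TimeoutPolicy {

    private TimeoutPolicy() {}

    // Scales the expected duration by a factor in [2, 3]; rejects factors outside that range.
    static Duration recommend(Duration expected, double factor) {
        if (factor < 2.0 || factor > 3.0) {
            throw new IllegalArgumentException("factor should be between 2 and 3: " + factor);
        }
        long millis = Math.round(expected.toMillis() * factor);
        return Duration.ofMillis(millis);
    }
}
```

The result can be passed straight to the CommandJudge constructor as its timeout argument.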
6. Best Practices
6.1. Use BuildSuccessJudge for Builds
// ✅ Good: Auto-detects wrapper, appropriate timeout
BuildSuccessJudge.maven("clean", "install");
// ❌ Manual: Must specify wrapper path, default timeout too short
new CommandJudge("./mvnw clean install");
6.2. Set Appropriate Timeouts
// ✅ Good: Generous timeout for build
new CommandJudge("npm install", 0, Duration.ofMinutes(10));
// ❌ Risky: Default 2-minute timeout may be insufficient
new CommandJudge("npm install");
7. Next Steps
- File Judges: File verification and content checks
- LLM Judges: AI-based evaluation
- Judge Advisor: Integration with AgentClient
- Deterministic Overview: All deterministic judge types
8. Further Reading
- Judge API Overview - Complete Judge API documentation
- Your First Judge - Practical introduction
- CLI Agents - Understanding autonomous agents
Command judges provide fast, reliable verification of shell commands and builds. They’re essential for CI/CD pipelines and production agent workflows.