Voting Strategies: Aggregating Judge Opinions
Voting strategies determine how a jury aggregates individual judge opinions into a single verdict. Spring AI Agents provides five built-in strategies, each optimized for different evaluation scenarios.
1. Overview
A VotingStrategy implements the aggregation logic:
public interface VotingStrategy {
Judgment aggregate(List<Judgment> judgments, Map<String, Double> weights);
String getName();
}
Key responsibilities:
- Convert scores to common format (boolean, numerical)
- Apply weights if provided
- Handle edge cases (ties, errors, abstentions)
- Return aggregated Judgment with reasoning
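The four responsibilities can be seen together in a small custom strategy. The following is a sketch only: `SimpleJudgment`, `SimpleVerdict`, `SimpleVotingStrategy` and the two-thirds `SUPERMAJORITY` rule are hypothetical stand-ins invented for illustration, not types or strategies shipped with Spring AI Agents, and scores are assumed to be already normalized to [0.0, 1.0].

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these nested types are simplified stand-ins,
// not the library's Judgment or VotingStrategy types.
class CustomStrategySketch {

    record SimpleJudgment(String judge, double score) {}

    record SimpleVerdict(boolean pass, String reasoning) {}

    interface SimpleVotingStrategy {
        SimpleVerdict aggregate(List<SimpleJudgment> judgments, Map<String, Double> weights);
    }

    // Passes when judges holding at least two thirds of the total weight pass.
    static final SimpleVotingStrategy SUPERMAJORITY = (judgments, weights) -> {
        double passWeight = 0.0;
        double totalWeight = 0.0;
        for (SimpleJudgment j : judgments) {
            double w = weights.getOrDefault(j.judge(), 1.0); // apply weights if provided
            totalWeight += w;
            if (j.score() >= 0.5) {                          // convert score to a vote
                passWeight += w;
            }
        }
        if (totalWeight <= 0.0) {                            // edge case: nothing to count
            return new SimpleVerdict(false, "No weighted judgments to aggregate");
        }
        boolean pass = passWeight / totalWeight >= 2.0 / 3.0;
        return new SimpleVerdict(pass,
                "Passing weight " + passWeight + " of " + totalWeight); // reasoning
    };

    public static void main(String[] args) {
        List<SimpleJudgment> judgments = List.of(
                new SimpleJudgment("build", 1.0),
                new SimpleJudgment("quality", 0.7),
                new SimpleJudgment("docs", 0.1));
        System.out.println(SUPERMAJORITY.aggregate(judgments, Map.of()).pass());            // true
        System.out.println(SUPERMAJORITY.aggregate(judgments, Map.of("docs", 5.0)).pass()); // false
    }
}
```

With equal weights, two of three judges pass and the two-thirds bar is met; giving the failing judge a weight of 5.0 drops the passing share to 2 of 7.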
2. Built-in Strategies
Spring AI Agents provides five voting strategies:
Strategy | Use Case | Pass Condition |
---|---|---|
Majority | Binary decisions | More than 50% of judges pass |
Average | Equal importance | Average score >= 0.5 |
Weighted Average | Unequal importance | Weighted average >= 0.5 |
Median | Outlier resistance | Median score >= 0.5 |
Consensus | Strict agreement | All judges must pass |
3. Majority Voting
Purpose: Democratic voting where the majority opinion wins.
3.1. Basic Usage
import org.springaicommunity.agents.judge.jury.MajorityVotingStrategy;
VotingStrategy strategy = new MajorityVotingStrategy();
Jury jury = Juries.builder()
.addJudge("build", new BuildSuccessJudge())
.addJudge("files", new FileExistsJudge("README.md"))
.addJudge("quality", new CorrectnessJudge(chatClient))
.votingStrategy(strategy)
.build();
Verdict verdict = jury.vote(context);
// Verdict passes if at least 2 of the 3 judges pass
3.2. How It Works
// From MajorityVotingStrategy.java
@Override
public Judgment aggregate(List<Judgment> judgments, Map<String, Double> weights) {
// Count pass/fail (excluding abstentions)
long passCount = judgments.stream()
.filter(j -> j.status() == JudgmentStatus.PASS)
.count();
long failCount = judgments.stream()
.filter(j -> j.status() == JudgmentStatus.FAIL)
.count();
// Check for tie
if (passCount == failCount) {
return applyTiePolicy(passCount, failCount);
}
// Determine majority
boolean majorityPass = passCount > failCount;
return Judgment.builder()
.score(new BooleanScore(majorityPass))
.status(majorityPass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
.reasoning(String.format("Majority vote: %d passed, %d failed",
passCount, failCount))
.build();
}
Score conversion:
- BooleanScore(true) → PASS
- BooleanScore(false) → FAIL
- NumericalScore >= 0.5 → PASS
- NumericalScore < 0.5 → FAIL
3.3. Tie and Error Policies
Majority voting supports configurable policies:
// Default: ties fail, errors treated as failures
VotingStrategy defaultStrategy = new MajorityVotingStrategy();
// Custom: ties pass, errors ignored
VotingStrategy optimistic = new MajorityVotingStrategy(
TiePolicy.PASS,
ErrorPolicy.IGNORE
);
// Custom: ties abstain, errors treated as abstentions
VotingStrategy neutral = new MajorityVotingStrategy(
TiePolicy.ABSTAIN,
ErrorPolicy.TREAT_AS_ABSTAIN
);
TiePolicy options:
- PASS - Optimistic (benefit of the doubt)
- FAIL - Pessimistic (safest, default)
- ABSTAIN - Neutral (defer decision)
ErrorPolicy options:
- TREAT_AS_FAIL - Safest (default)
- TREAT_AS_ABSTAIN - Neutral (ignore in vote)
- IGNORE - Skip errored judgments entirely
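How the two policies interact with the vote count can be sketched without the library. The enums below (`Vote`, `Tie`, `OnError`) are local stand-ins with hypothetical names, and the sketch assumes that only TREAT_AS_FAIL changes the tally while the other two error policies leave errored judgments out of it; the real `MajorityVotingStrategy` may differ in detail.

```java
import java.util.List;

// Hypothetical, self-contained sketch; these enums are local stand-ins,
// not the library's TiePolicy / ErrorPolicy types.
class TiePolicySketch {

    enum Vote { PASS, FAIL, ABSTAIN, ERROR }
    enum Tie { PASS, FAIL, ABSTAIN }
    enum OnError { TREAT_AS_FAIL, TREAT_AS_ABSTAIN, IGNORE }

    static Vote tally(List<Vote> votes, Tie tie, OnError onError) {
        long pass = votes.stream().filter(v -> v == Vote.PASS).count();
        long fail = votes.stream().filter(v -> v == Vote.FAIL).count();
        long errors = votes.stream().filter(v -> v == Vote.ERROR).count();

        // Assumption: only TREAT_AS_FAIL changes the counts; the other two
        // policies keep errored judgments out of the pass/fail tally.
        if (onError == OnError.TREAT_AS_FAIL) {
            fail += errors;
        }

        if (pass == fail) {
            // Tie: defer to the configured tie policy.
            return switch (tie) {
                case PASS -> Vote.PASS;
                case FAIL -> Vote.FAIL;
                case ABSTAIN -> Vote.ABSTAIN;
            };
        }
        return pass > fail ? Vote.PASS : Vote.FAIL;
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(Vote.PASS, Vote.FAIL, Vote.ERROR);
        System.out.println(tally(votes, Tie.FAIL, OnError.TREAT_AS_FAIL));       // FAIL (1 pass, 2 fail)
        System.out.println(tally(votes, Tie.PASS, OnError.IGNORE));              // PASS (1-1 tie, ties pass)
        System.out.println(tally(votes, Tie.ABSTAIN, OnError.TREAT_AS_ABSTAIN)); // ABSTAIN (1-1 tie)
    }
}
```

The same three judgments produce three different outcomes, which is why the policy choice is worth making explicit for gates that guard deployments.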
3.4. When to Use
// ✅ Good: Binary decisions
Jury deploymentGate = Juries.builder()
.addJudge("build", new BuildSuccessJudge())
.addJudge("tests", new CommandJudge("npm test"))
.addJudge("security", new CommandJudge("npm audit"))
.votingStrategy(new MajorityVotingStrategy())
.build();
// ❌ Poor: When judges have unequal importance
// Use WeightedAverage instead
4. Average Voting
Purpose: Simple average of all scores (equal importance).
4.1. Basic Usage
import org.springaicommunity.agents.judge.jury.AverageVotingStrategy;
VotingStrategy strategy = new AverageVotingStrategy();
Jury jury = Juries.builder()
.addJudge("quality", new CodeQualityJudge(chatClient)) // Returns 8.0/10
.addJudge("maintainability", new CustomJudge()) // Returns 7.0/10
.addJudge("performance", new CustomJudge()) // Returns 6.0/10
.votingStrategy(strategy)
.build();
// Average: (0.8 + 0.7 + 0.6) / 3 = 0.7 (PASS)
4.2. How It Works
// From AverageVotingStrategy.java
@Override
public Judgment aggregate(List<Judgment> judgments, Map<String, Double> weights) {
double sum = judgments.stream()
.mapToDouble(j -> toNumerical(j.score()))
.sum();
double average = sum / judgments.size();
boolean pass = average >= 0.5; // Threshold: 0.5
return Judgment.builder()
.score(new NumericalScore(average, 0.0, 1.0))
.status(pass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
.reasoning(String.format("Average score: %.2f (threshold: 0.5)", average))
.build();
}
private double toNumerical(Score score) {
if (score instanceof BooleanScore bs) {
return bs.value() ? 1.0 : 0.0;
}
else if (score instanceof NumericalScore ns) {
return ns.normalized(); // Normalized to [0.0, 1.0]
}
return 0.0;
}
Score normalization:
- BooleanScore(true) → 1.0
- BooleanScore(false) → 0.0
- NumericalScore(8.0, 0, 10) → 0.8
- NumericalScore(3.5, 0, 5) → 0.7
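The mappings above are consistent with plain min-max normalization, (value - min) / (max - min). The sketch below reproduces them under that assumption; `NormalizationSketch`, `normalize` and `fromBoolean` are hypothetical names, and the library's `NumericalScore.normalized()` is only assumed to behave this way.

```java
import java.util.Locale;

// Hypothetical sketch of min-max normalization; not the library's implementation.
class NormalizationSketch {

    // Maps a value from [min, max] onto [0.0, 1.0].
    static double normalize(double value, double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        return (value - min) / (max - min);
    }

    // Boolean scores sit at the two ends of the normalized range.
    static double fromBoolean(boolean value) {
        return value ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%.2f%n", normalize(8.0, 0, 10)); // 0.80
        System.out.printf(Locale.ROOT, "%.2f%n", normalize(3.5, 0, 5));  // 0.70
        System.out.printf(Locale.ROOT, "%.2f%n", fromBoolean(true));     // 1.00
    }
}
```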
4.3. When to Use
// ✅ Good: All judges equally important
Jury qualityJury = Juries.builder()
.addJudge("readability", new ReadabilityJudge(chatClient))
.addJudge("maintainability", new MaintainabilityJudge(chatClient))
.addJudge("testability", new TestabilityJudge(chatClient))
.votingStrategy(new AverageVotingStrategy())
.build();
// ❌ Poor: Judges have different importance
// Use WeightedAverage instead
5. Weighted Average
Purpose: Average with configurable importance per judge.
5.1. Basic Usage
import org.springaicommunity.agents.judge.jury.WeightedAverageStrategy;
Jury jury = Juries.builder()
.addJudge("build", new BuildSuccessJudge(), 0.5) // 50% weight
.addJudge("quality", new CorrectnessJudge(chatClient), 0.3) // 30% weight
.addJudge("docs", new CustomJudge(), 0.2) // 20% weight
.votingStrategy(new WeightedAverageStrategy())
.build();
// Weighted average: (1.0 * 0.5) + (0.8 * 0.3) + (0.6 * 0.2) = 0.86 (PASS)
5.2. How It Works
// From WeightedAverageStrategy.java
@Override
public Judgment aggregate(List<Judgment> judgments, Map<String, Double> weights) {
// If no weights, fall back to simple average
if (weights == null || weights.isEmpty()) {
return new AverageVotingStrategy().aggregate(judgments, weights);
}
double weightedSum = 0.0;
double weightSum = 0.0;
for (int i = 0; i < judgments.size(); i++) {
String key = String.valueOf(i);
double weight = weights.getOrDefault(key, 1.0);
double score = toNumerical(judgments.get(i).score());
weightedSum += score * weight;
weightSum += weight;
}
double weightedAverage = weightedSum / weightSum;
boolean pass = weightedAverage >= 0.5;
return Judgment.builder()
.score(new NumericalScore(weightedAverage, 0.0, 1.0))
.status(pass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
.reasoning(String.format("Weighted average: %.2f", weightedAverage))
.build();
}
Weight normalization:
Weights do NOT need to sum to 1.0. Because the weighted sum is divided by the total weight, they are normalized automatically:
// These are equivalent:
.addJudge("a", judgeA, 0.5)
.addJudge("b", judgeB, 0.3)
.addJudge("c", judgeC, 0.2)
// Same as:
.addJudge("a", judgeA, 5.0)
.addJudge("b", judgeB, 3.0)
.addJudge("c", judgeC, 2.0)
// Both normalize to 50%, 30%, 20%
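The equivalence follows from dividing the weighted sum by the sum of the weights, as the aggregate method in section 5.2 does. A standalone sketch, with `weightedAverage` as a hypothetical helper rather than a library method, reusing the scores from the basic usage example:

```java
import java.util.Locale;

// Hypothetical sketch; weightedAverage is an illustrative helper,
// not a method of WeightedAverageStrategy.
class WeightNormalizationSketch {

    // Dividing by the sum of the weights makes the result independent of their scale.
    static double weightedAverage(double[] scores, double[] weights) {
        if (scores.length == 0 || scores.length != weights.length) {
            throw new IllegalArgumentException("scores and weights must be non-empty and equal in length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            weightedSum += scores[i] * weights[i];
            weightSum += weights[i];
        }
        if (weightSum <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 0.8, 0.6};
        double fractional = weightedAverage(scores, new double[] {0.5, 0.3, 0.2});
        double scaled = weightedAverage(scores, new double[] {5.0, 3.0, 2.0});
        System.out.printf(Locale.ROOT, "%.2f%n", fractional); // 0.86
        System.out.printf(Locale.ROOT, "%.2f%n", scaled);     // 0.86
    }
}
```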
5.3. Production Example
@Service
public class ProductionDeploymentService {
public void validateDeployment(Path projectRoot) {
Jury deploymentJury = Juries.builder()
// Critical: Build must succeed (50%)
.addJudge("build", BuildSuccessJudge.maven("clean", "install"), 0.5)
// Important: Tests must pass (30%)
.addJudge("tests", BuildSuccessJudge.maven("test"), 0.3)
// Nice-to-have: Documentation exists (20%)
.addJudge("docs", new FileExistsJudge("README.md"), 0.2)
.votingStrategy(new WeightedAverageStrategy())
.build();
AgentClientResponse response = agentClientBuilder
.goal("Prepare application for production")
.workingDirectory(projectRoot)
.advisors(JuryAdvisor.builder()
.jury(deploymentJury)
.build())
.call();
Verdict verdict = response.getVerdict();
if (verdict.aggregated().pass()) {
deploy(projectRoot);
} else {
logger.error("Deployment blocked: {}",
verdict.aggregated().reasoning());
}
}
}
5.4. When to Use
// ✅ Good: Critical vs nice-to-have checks
Jury prioritizedJury = Juries.builder()
.addJudge("security", AgentJudge.securityAudit(agentClient), 0.6) // Critical
.addJudge("quality", new CorrectnessJudge(chatClient), 0.3) // Important
.addJudge("style", new CustomJudge(), 0.1) // Minor
.votingStrategy(new WeightedAverageStrategy())
.build();
// ❌ Poor: All judges equally important
// Use AverageVotingStrategy instead
6. Median Voting
Purpose: Robust to outliers and extreme scores.
6.1. Basic Usage
import org.springaicommunity.agents.judge.jury.MedianVotingStrategy;
VotingStrategy strategy = new MedianVotingStrategy();
Jury jury = Juries.builder()
.addJudge("judge1", judgeA) // Returns 9.0/10
.addJudge("judge2", judgeB) // Returns 8.0/10
.addJudge("judge3", judgeC) // Returns 2.0/10 (outlier)
.votingStrategy(strategy)
.build();
// Average would be: (0.9 + 0.8 + 0.2) / 3 = 0.63
// Median is: 0.8 (middle value, outlier ignored)
6.2. How It Works
// From MedianVotingStrategy.java
@Override
public Judgment aggregate(List<Judgment> judgments, Map<String, Double> weights) {
List<Double> scores = judgments.stream()
.map(j -> toNumerical(j.score()))
.sorted()
.toList();
double median;
int size = scores.size();
if (size % 2 == 0) {
// Even number: average of two middle values
median = (scores.get(size / 2 - 1) + scores.get(size / 2)) / 2.0;
} else {
// Odd number: middle value
median = scores.get(size / 2);
}
boolean pass = median >= 0.5;
return Judgment.builder()
.score(new NumericalScore(median, 0.0, 1.0))
.status(pass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
.reasoning(String.format("Median score: %.2f", median))
.build();
}
Median calculation examples:
// Odd number of judges (3)
Scores: [0.2, 0.8, 0.9]
Median: 0.8 (middle value)
// Even number of judges (4)
Scores: [0.3, 0.7, 0.8, 0.9]
Median: (0.7 + 0.8) / 2 = 0.75 (average of two middle values)
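To see the outlier effect numerically, the sketch below computes both the average and the median for the score sets used in this section. It works on plain doubles that are assumed to be already normalized; `MedianVsAverageSketch` and its helpers are hypothetical names.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch comparing the two aggregations on raw doubles.
class MedianVsAverageSketch {

    static double average(double... scores) {
        return Arrays.stream(scores).average().orElseThrow();
    }

    static double median(double... scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double[] sorted = scores.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 0
                ? (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0 // even: mean of the two middle values
                : sorted[n / 2];                            // odd: the middle value
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%.2f%n", average(0.9, 0.8, 0.2));     // 0.63
        System.out.printf(Locale.ROOT, "%.2f%n", median(0.9, 0.8, 0.2));      // 0.80
        System.out.printf(Locale.ROOT, "%.2f%n", median(0.3, 0.7, 0.8, 0.9)); // 0.75
    }
}
```

The single low score pulls the average down by about 0.17 but leaves the median where the two agreeing judges put it.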
6.3. When to Use
Median voting is inspired by statistical robustness techniques used in evaluation frameworks like ragas. When combining multiple LLM judges, median aggregation prevents a single outlier judgment (due to model hallucination or prompt misinterpretation) from skewing the entire verdict.
// ✅ Good: Multiple LLM judges (outlier risk)
Jury llmJury = Juries.builder()
.addJudge("gpt4", new CorrectnessJudge(gpt4Client))
.addJudge("claude", new CorrectnessJudge(claudeClient))
.addJudge("gemini", new CorrectnessJudge(geminiClient))
.votingStrategy(new MedianVotingStrategy()) // Robust to outliers
.build();
// ✅ Good: Self-consistency with multiple runs
Jury selfConsistentJury = Juries.builder()
.addJudge("run1", new CorrectnessJudge(chatClient))
.addJudge("run2", new CorrectnessJudge(chatClient))
.addJudge("run3", new CorrectnessJudge(chatClient))
.addJudge("run4", new CorrectnessJudge(chatClient))
.addJudge("run5", new CorrectnessJudge(chatClient))
.votingStrategy(new MedianVotingStrategy())
.build();
// ❌ Poor: All judges deterministic (no outliers)
// Use AverageVotingStrategy instead
7. Consensus Voting
Purpose: Strictest strategy requiring unanimous agreement.
7.1. Basic Usage
import org.springaicommunity.agents.judge.jury.ConsensusStrategy;
VotingStrategy strategy = new ConsensusStrategy();
Jury securityJury = Juries.builder()
.addJudge("sql_injection", new CustomJudge())
.addJudge("xss", new CustomJudge())
.addJudge("csrf", new CustomJudge())
.votingStrategy(strategy)
.build();
// Passes only if ALL judges pass (unanimous)
7.2. How It Works
// From ConsensusStrategy.java
@Override
public Judgment aggregate(List<Judgment> judgments, Map<String, Double> weights) {
long passCount = judgments.stream()
.filter(j -> toBoolean(j.score()))
.count();
long failCount = judgments.size() - passCount;
// Consensus requires all judges to agree
boolean consensus = (passCount == judgments.size())
|| (failCount == judgments.size());
boolean pass = consensus && passCount == judgments.size();
String reasoning;
if (!consensus) {
reasoning = String.format("No consensus: %d passed, %d failed",
passCount, failCount);
} else {
reasoning = String.format("Unanimous consensus: all %d judges %s",
judgments.size(), pass ? "passed" : "failed");
}
return Judgment.builder()
.score(new BooleanScore(pass))
.status(pass ? JudgmentStatus.PASS : JudgmentStatus.FAIL)
.reasoning(reasoning)
.build();
}
Consensus logic:
// Scenario 1: All pass → PASS
Judges: [PASS, PASS, PASS]
Result: PASS (unanimous consensus)
// Scenario 2: All fail → FAIL
Judges: [FAIL, FAIL, FAIL]
Result: FAIL (unanimous consensus)
// Scenario 3: Mixed → FAIL
Judges: [PASS, PASS, FAIL]
Result: FAIL (no consensus)
// Scenario 4: Mixed → FAIL
Judges: [PASS, FAIL, FAIL]
Result: FAIL (no consensus)
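The four scenarios reduce to an all-match check over the individual votes. The sketch below uses plain booleans and hypothetical names. One deliberate difference from the listing in 7.2 is labeled in the code: an empty jury is treated as a failure here, whereas the listed check `passCount == judgments.size()` would also hold for an empty list.

```java
import java.util.List;

// Hypothetical sketch of the pass condition only; not the library's ConsensusStrategy.
class ConsensusSketch {

    static boolean unanimousPass(List<Boolean> votes) {
        // Assumption for this sketch: an empty jury does not pass.
        if (votes.isEmpty()) {
            return false;
        }
        return votes.stream().allMatch(v -> v);
    }

    public static void main(String[] args) {
        System.out.println(unanimousPass(List.of(true, true, true)));    // true  (scenario 1)
        System.out.println(unanimousPass(List.of(false, false, false))); // false (scenario 2)
        System.out.println(unanimousPass(List.of(true, true, false)));   // false (scenario 3)
        System.out.println(unanimousPass(List.of(true, false, false)));  // false (scenario 4)
    }
}
```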
7.3. When to Use
// ✅ Good: Security checks (zero tolerance)
Jury securityGate = Juries.builder()
.addJudge("sql_injection", new SecurityJudge("SQL Injection"))
.addJudge("xss", new SecurityJudge("XSS"))
.addJudge("csrf", new SecurityJudge("CSRF"))
.addJudge("secrets", new SecurityJudge("Hardcoded Secrets"))
.votingStrategy(new ConsensusStrategy())
.build();
// All security checks must pass (no exceptions)
// ✅ Good: Compliance requirements
Jury complianceJury = Juries.builder()
.addJudge("gdpr", new ComplianceJudge("GDPR"))
.addJudge("hipaa", new ComplianceJudge("HIPAA"))
.addJudge("sox", new ComplianceJudge("SOX"))
.votingStrategy(new ConsensusStrategy())
.build();
// ❌ Poor: Most situations (too strict)
// Use MajorityVotingStrategy instead
8. Choosing a Strategy
Decision tree for selecting the right voting strategy:
Do all judges have equal importance?
├─ Yes: Are you concerned about outliers?
│ ├─ Yes: MedianVotingStrategy
│ └─ No: AverageVotingStrategy
└─ No: WeightedAverageStrategy
Is unanimous agreement required?
└─ Yes: ConsensusStrategy
Is it a binary decision (pass/fail)?
└─ Yes: MajorityVotingStrategy
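The tree can also be written as a small selection function. The order in which the questions are asked below (unanimity first, then binary decisions, then importance and outliers) is an assumption of this sketch, since the tree lists them as separate questions; `StrategyChooserSketch` and `choose` are hypothetical names, and the method returns strategy class names as strings to stay independent of the library.

```java
// Hypothetical sketch; the question order is an assumption, not documented behavior.
class StrategyChooserSketch {

    static String choose(boolean unanimityRequired, boolean binaryDecision,
                         boolean equalImportance, boolean outlierRisk) {
        if (unanimityRequired) {
            return "ConsensusStrategy";
        }
        if (binaryDecision) {
            return "MajorityVotingStrategy";
        }
        if (!equalImportance) {
            return "WeightedAverageStrategy";
        }
        return outlierRisk ? "MedianVotingStrategy" : "AverageVotingStrategy";
    }

    public static void main(String[] args) {
        // Several LLM judges of equal importance, with a risk of outlier scores.
        System.out.println(choose(false, false, true, true)); // MedianVotingStrategy
        // A security gate where every check must pass.
        System.out.println(choose(true, true, true, false));  // ConsensusStrategy
    }
}
```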
8.1. Strategy Comparison
Strategy | Sensitivity | Outlier Handling | Best For | Avoid For |
---|---|---|---|---|
Majority | Medium | Not applicable | Binary decisions | Numerical scores |
Average | High | Sensitive | Equal importance | Outliers present |
Weighted Average | High | Sensitive | Unequal importance | Outliers present |
Median | Low | Robust | Outlier resistance | Need every opinion |
Consensus | Extreme | Not applicable | Zero tolerance | Most situations |
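The strategies can disagree on the same input, which is easiest to see by applying each pass condition to one set of scores. The sketch below does this for three normalized scores chosen for illustration (0.9, 0.4, 0.3); the helper methods are hypothetical simplifications of the pass conditions described earlier, with equal weights and no abstentions or errors.

```java
import java.util.Arrays;

// Hypothetical sketch: simplified pass conditions applied to raw normalized scores.
class StrategyComparisonSketch {

    static boolean majority(double[] s) {
        long pass = Arrays.stream(s).filter(x -> x >= 0.5).count();
        return pass > s.length - pass; // more passes than fails
    }

    static boolean average(double[] s) {
        return Arrays.stream(s).average().orElse(0.0) >= 0.5;
    }

    static boolean median(double[] s) {
        if (s.length == 0) {
            return false;
        }
        double[] sorted = s.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double m = n % 2 == 0 ? (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0 : sorted[n / 2];
        return m >= 0.5;
    }

    static boolean consensus(double[] s) {
        return s.length > 0 && Arrays.stream(s).allMatch(x -> x >= 0.5);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.4, 0.3};
        System.out.println("majority:  " + majority(scores));  // false (1 pass, 2 fail)
        System.out.println("average:   " + average(scores));   // true  (mean is about 0.53)
        System.out.println("median:    " + median(scores));    // false (median is 0.4)
        System.out.println("consensus: " + consensus(scores)); // false (not unanimous)
    }
}
```

Here one enthusiastic judge lifts the average over the threshold while every other strategy rejects, which matches the "Sensitive" entry for Average in the table.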
9. Production Patterns
9.1. Pattern 1: Layered Evaluation
Combine strategies for different criteria:
@Service
public class LayeredEvaluation {
public void evaluateFeature(Path workspace) {
// Layer 1: Security (consensus required)
Jury securityLayer = Juries.builder()
.addJudge("sql", new SecurityJudge("SQL"))
.addJudge("xss", new SecurityJudge("XSS"))
.votingStrategy(new ConsensusStrategy())
.build();
// Layer 2: Quality (weighted)
Jury qualityLayer = Juries.builder()
.addJudge("build", BuildSuccessJudge.maven("test"), 0.5)
.addJudge("quality", new CorrectnessJudge(chatClient), 0.5)
.votingStrategy(new WeightedAverageStrategy())
.build();
AgentClientResponse response = agentClientBuilder
.goal("Implement new feature")
.advisors(
// Security must pass (consensus)
JuryAdvisor.builder()
.jury(securityLayer)
.order(100)
.build(),
// Quality evaluated after (weighted)
JuryAdvisor.builder()
.jury(qualityLayer)
.order(200)
.build()
)
.call();
}
}
9.2. Pattern 2: Self-Consistency
Multiple runs with median aggregation:
@Service
public class SelfConsistentEvaluation {
public Verdict evaluateWithConsistency(JudgmentContext context, int runs) {
// Run same judge N times
Juries.Builder juryBuilder = Juries.builder()
.votingStrategy(new MedianVotingStrategy());
for (int i = 0; i < runs; i++) {
juryBuilder.addJudge("run" + i, new CorrectnessJudge(chatClientBuilder));
}
Jury jury = juryBuilder.build();
return jury.vote(context);
}
}
This pattern is used in ragas for robust LLM evaluation.
9.3. Pattern 3: Progressive Strictness
Start lenient, get stricter over time:
@Service
public class ProgressiveEvaluation {
public void evaluateWithProgress(Path workspace, int iteration) {
VotingStrategy strategy = switch(iteration) {
case 1 -> new MajorityVotingStrategy(); // Lenient (iteration 1)
case 2 -> new AverageVotingStrategy(); // Moderate (iteration 2)
case 3 -> new WeightedAverageStrategy(); // Stricter (iteration 3); without per-judge weights this behaves like a simple average
default -> new ConsensusStrategy(); // Strictest (final)
};
Jury jury = Juries.builder()
.addJudge("build", new BuildSuccessJudge())
.addJudge("quality", new CorrectnessJudge(chatClient))
.addJudge("security", AgentJudge.securityAudit(agentClient))
.votingStrategy(strategy)
.build();
AgentClientResponse response = agentClientBuilder
.goal("Improve code quality")
.advisors(JuryAdvisor.builder()
.jury(jury)
.build())
.call();
}
}
9.4. Pattern 4: Dynamic Weight Adjustment
Adjust weights based on context:
@Service
public class DynamicWeightEvaluation {
public Verdict evaluate(Path workspace, Environment env) {
// Production: security critical (70%)
// Development: speed critical (30%)
double securityWeight = env == Environment.PRODUCTION ? 0.7 : 0.3;
double speedWeight = 1.0 - securityWeight;
Jury jury = Juries.builder()
.addJudge("security", AgentJudge.securityAudit(agentClient), securityWeight)
.addJudge("speed", new CommandJudge("time npm test"), speedWeight)
.votingStrategy(new WeightedAverageStrategy())
.build();
return jury.vote(JudgmentContext.builder()
.goal("Deploy application")
.workspace(workspace)
.build());
}
}
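The effect of switching weights can be checked with plain arithmetic. In the sketch below the two scores (0.3 for security, 0.9 for speed) are made-up inputs chosen for illustration, and `DynamicWeightSketch` is a hypothetical helper; only the 0.7 / 0.3 weights and the 0.5 threshold come from the text above.

```java
import java.util.Locale;

// Hypothetical sketch: the same two scores judged under production and development weights.
class DynamicWeightSketch {

    static double weightedScore(double securityScore, double speedScore, boolean production) {
        double securityWeight = production ? 0.7 : 0.3;
        double speedWeight = 1.0 - securityWeight;
        // The weights sum to 1.0, so no further normalization is needed.
        return securityScore * securityWeight + speedScore * speedWeight;
    }

    static boolean passes(double securityScore, double speedScore, boolean production) {
        return weightedScore(securityScore, speedScore, production) >= 0.5;
    }

    public static void main(String[] args) {
        double security = 0.3; // illustrative: weak security result
        double speed = 0.9;    // illustrative: fast test run
        System.out.printf(Locale.ROOT, "%.2f%n", weightedScore(security, speed, true));  // 0.48
        System.out.printf(Locale.ROOT, "%.2f%n", weightedScore(security, speed, false)); // 0.72
        System.out.println(passes(security, speed, true));  // false
        System.out.println(passes(security, speed, false)); // true
    }
}
```

The same evidence blocks a production deployment and passes in development, so the environment-to-weight mapping deserves the same review as the judges themselves.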
10. Best Practices
10.1. Match Strategy to Use Case
// ✅ Good: Security → Consensus
new ConsensusStrategy()
// ✅ Good: Quality → Weighted Average
new WeightedAverageStrategy()
// ✅ Good: Multiple LLMs → Median
new MedianVotingStrategy()
// ❌ Poor: Security → Average (too lenient)
new AverageVotingStrategy()
10.2. Handle Edge Cases
// Configure policies for robustness
VotingStrategy robust = new MajorityVotingStrategy(
TiePolicy.FAIL, // Conservative on ties
ErrorPolicy.TREAT_AS_FAIL // Conservative on errors
);
VotingStrategy optimistic = new MajorityVotingStrategy(
TiePolicy.PASS, // Optimistic on ties
ErrorPolicy.IGNORE // Ignore errors
);
10.3. Document Weight Rationale
Jury jury = Juries.builder()
// Critical: Must compile (40%)
.addJudge("compile", BuildSuccessJudge.maven("compile"), 0.4)
// Critical: Tests must pass (40%)
.addJudge("tests", BuildSuccessJudge.maven("test"), 0.4)
// Nice-to-have: Documentation exists (20%)
.addJudge("docs", new FileExistsJudge("README.md"), 0.2)
.votingStrategy(new WeightedAverageStrategy())
.build();
10.4. Log Detailed Reasoning
Verdict verdict = jury.vote(context);
logger.info("Voting Strategy: {}", jury.getVotingStrategy().getName());
logger.info("Aggregated: {}", verdict.aggregated().reasoning());
logger.info("Individual judgments:");
verdict.individual().forEach(j -> {
logger.info(" - Judge: {} → {} (score: {})",
j.metadata().get("judge_name"),
j.status(),
j.score());
});
11. Next Steps
- Jury Overview: Ensemble evaluation pattern
- JudgeAdvisor: Integration with AgentClient
- LLM Judges: AI-powered evaluation
- Agent as Judge: Agent evaluating agent
12. Further Reading
- Judge API Overview - Complete Judge API documentation
- Your First Judge - Practical introduction
Voting strategies provide flexible aggregation logic for combining multiple judge opinions. Choose the strategy that matches your evaluation requirements for optimal results.