Core Concepts
This section explains the fundamental concepts and architecture of Spring AI Watsonx.ai integration.
Overview
Spring AI Watsonx.ai provides a Spring-native way to integrate IBM Watsonx.ai foundation models into your applications. It follows Spring AI's unified API design while leveraging Watsonx.ai's powerful AI capabilities.
Architecture
Component Overview
┌─────────────────────────────────────────────────────────┐
│                   Spring Application                    │
├─────────────────────────────────────────────────────────┤
│                  Spring AI Interfaces                   │
│            (ChatModel, EmbeddingModel, etc.)            │
├─────────────────────────────────────────────────────────┤
│           Spring AI Watsonx.ai Implementation           │
│      (WatsonxAiChatModel, WatsonxAiEmbeddingModel)      │
├─────────────────────────────────────────────────────────┤
│                  Watsonx.ai API Client                  │
│              (HTTP Client, Authentication)              │
├─────────────────────────────────────────────────────────┤
│                 IBM Watsonx.ai Service                  │
└─────────────────────────────────────────────────────────┘
Module Structure
The integration consists of three main modules:
watsonx-ai-core
- Core API clients and model implementations
- Request/response objects
- Authentication handling
- No Spring dependencies
spring-ai-autoconfigure-model-watsonx-ai
- Spring Boot auto-configuration
- Configuration properties
- Bean definitions
- Conditional configuration
spring-ai-starter-model-watsonx-ai
- Spring Boot starter
- Dependency aggregation
- Quick setup for applications
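For reference, a typical Maven setup pulls in only the starter. The groupId shown here assumes the standard Spring AI coordinates; verify both coordinates against the Spring AI version you are using:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-watsonx-ai</artifactId>
</dependency>
```

The starter transitively brings in the autoconfigure and core modules, so no other Spring AI Watsonx.ai dependencies are needed.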
Key Concepts
Chat Models
Chat models enable conversational AI through a request-response pattern:
Prompt: User input or instruction
Prompt prompt = new Prompt("What is Spring AI?");
Response: Model-generated output
ChatResponse response = chatModel.call(prompt);
String content = response.getResult().getOutput().getContent();
Streaming: Real-time response generation
Flux<ChatResponse> stream = chatModel.stream(prompt);
Embeddings
Embeddings convert text into numerical vectors for semantic operations:
Vector Representation: Text → Numbers
List<Double> embedding = embeddingModel.embed("Spring AI is great");
// [0.123, -0.456, 0.789, ...]
Similarity: Compare semantic meaning
double similarity = cosineSimilarity(embedding1, embedding2);
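The snippet above assumes a cosineSimilarity helper. Spring AI also provides vector stores with similarity search built in, but a minimal standalone version looks like this:

```java
public class CosineSimilarity {

    // cos(theta) = (a . b) / (|a| * |b|)
    // 1.0 = same direction (semantically similar), 0.0 = orthogonal (unrelated).
    public static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] v1 = {0.123, -0.456, 0.789};
        double[] v2 = {0.120, -0.450, 0.800};
        System.out.printf("similarity = %.4f%n", cosineSimilarity(v1, v2));
    }
}
```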
Use Cases:
- Semantic search
- Document clustering
- Recommendation systems
- Duplicate detection
Messages
Messages represent conversation turns:
UserMessage: User input
Message userMsg = new UserMessage("Hello!");
AssistantMessage: AI response
Message assistantMsg = new AssistantMessage("Hi! How can I help?");
SystemMessage: System instructions
Message systemMsg = new SystemMessage("You are a helpful assistant.");
Options
Options control model behavior:
Temperature: Randomness (0.0 - 2.0)
- Low (0.0-0.3): Deterministic, focused
- Medium (0.4-0.7): Balanced
- High (0.8-2.0): Creative, diverse
Max Tokens: Response length limit
Top-P: Nucleus sampling threshold
Top-K: Token selection pool size
Stop Sequences: Generation terminators
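To make Top-K and Top-P concrete, here is a toy sketch (not how watsonx.ai implements sampling) of how each option narrows the candidate-token pool before one token is drawn:

```java
import java.util.*;

public class SamplingOptionsDemo {

    // Top-K: keep only the k highest-probability candidates.
    static LinkedHashMap<String, Double> topK(Map<String, Double> probs, int k) {
        LinkedHashMap<String, Double> kept = new LinkedHashMap<>();
        probs.entrySet().stream()
                .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                .limit(k)
                .forEach(e -> kept.put(e.getKey(), e.getValue()));
        return kept;
    }

    // Top-P (nucleus sampling): keep the smallest set of candidates whose
    // cumulative probability reaches p.
    static LinkedHashMap<String, Double> topP(Map<String, Double> probs, double p) {
        LinkedHashMap<String, Double> kept = new LinkedHashMap<>();
        double cumulative = 0.0;
        for (var e : probs.entrySet().stream()
                .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                .toList()) {
            kept.put(e.getKey(), e.getValue());
            cumulative += e.getValue();
            if (cumulative >= p) break;
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> nextToken = Map.of(
                "the", 0.50, "a", 0.30, "an", 0.15, "zebra", 0.05);
        System.out.println(topK(nextToken, 2).keySet());   // [the, a]
        System.out.println(topP(nextToken, 0.9).keySet()); // [the, a, an]
    }
}
```

Temperature then reshapes the probabilities within the surviving pool, which is why low temperature combined with a small Top-K gives near-deterministic output.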
Function Calling
Enable models to use external tools:
@Bean
public FunctionCallback weatherFunction() {
    return FunctionCallback.builder()
            .function("getWeather", this::getWeather)
            .description("Get current weather")
            .inputType(WeatherRequest.class)
            .build();
}
The model can decide when to call functions based on user input.
Streaming
Stream responses for better UX:
Benefits:
- Immediate feedback
- Progressive rendering
- Lower perceived latency
- Better user experience
Implementation:
Flux<ChatResponse> stream = chatModel.stream(prompt);
stream.subscribe(
        chunk -> System.out.print(chunk.getResult().getOutput().getContent()),
        error -> handleError(error),
        () -> System.out.println("Complete!")
);
Observability
Built-in observability support:
Metrics: Track usage and performance
- Request count
- Response time
- Token usage
- Error rates
Tracing: Distributed tracing support
- Request flow
- Latency breakdown
- Error tracking
Logging: Detailed operation logs
- Request/response logging
- Error logging
- Debug information
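Assuming Spring Boot Actuator is on the classpath, one way to expose the metrics above is through the standard management properties:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics
```

Metrics then become available under the /actuator/metrics endpoint and can be shipped to any Micrometer-supported backend.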
Design Patterns
Dependency Injection
Use Spring's DI for clean architecture:
@Service
public class ChatService {

    private final ChatModel chatModel;

    public ChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
}
Configuration Management
Externalize configuration:
spring:
  ai:
    watsonx:
      ai:
        api-key: ${WATSONX_AI_API_KEY}
        url: ${WATSONX_AI_URL}
        project-id: ${WATSONX_AI_PROJECT_ID}
Error Handling
Handle errors gracefully:
try {
    String response = chatModel.call(prompt);
    return response;
} catch (WatsonxAiAuthenticationException e) {
    log.error("Authentication failed", e);
    throw new ServiceException("Unable to authenticate");
} catch (WatsonxAiRateLimitException e) {
    log.warn("Rate limit exceeded", e);
    return "Service temporarily unavailable";
}
Retry Logic
Implement retry for transient failures. This example uses Spring Retry, so the spring-retry dependency and an @EnableRetry-annotated configuration class are required:
@Retryable(
        value = WatsonxAiApiException.class,
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2)
)
public String callWithRetry(String prompt) {
    return chatModel.call(prompt);
}
Best Practices
Security
- Never hardcode API keys
- Use environment variables
- Rotate credentials regularly
- Implement rate limiting
- Validate user input
Performance
- Cache embeddings when possible
- Use streaming for long responses
- Implement connection pooling
- Monitor token usage
- Optimize prompt length
Reliability
- Implement retry logic
- Handle rate limits gracefully
- Provide fallback responses
- Monitor error rates
- Set appropriate timeouts
Cost Optimization
- Choose appropriate models
- Limit max tokens
- Cache common responses
- Batch embedding requests
- Monitor usage metrics
Integration Patterns
RAG (Retrieval Augmented Generation)
Combine retrieval with generation:
User Query → Vector Search → Retrieve Documents →
Build Context → Generate Response
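The flow above can be sketched end to end. This toy version uses keyword overlap in place of real embeddings; a production pipeline would use the EmbeddingModel and a vector store for the retrieval step, and the ChatModel for generation:

```java
import java.util.*;

public class RagSketch {

    record Scored(String doc, long score) {}

    // Retrieve: rank documents by how many query terms they share
    // (stand-in for embedding-based vector search).
    static List<String> retrieve(String query, List<String> docs, int topK) {
        Set<String> terms = new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
        return docs.stream()
                .map(d -> new Scored(d, Arrays.stream(d.toLowerCase().split("\\s+"))
                        .filter(terms::contains).count()))
                .sorted((x, y) -> Long.compare(y.score, x.score))
                .limit(topK)
                .map(Scored::doc)
                .toList();
    }

    // Build context: stitch the retrieved documents into the prompt.
    static String buildPrompt(String query, List<String> context) {
        return "Answer using only this context:\n"
                + String.join("\n", context)
                + "\n\nQuestion: " + query;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Spring AI provides a unified API for chat models",
                "Watsonx.ai hosts IBM foundation models",
                "Cats are popular pets");
        String prompt = buildPrompt("What is Spring AI?",
                retrieve("What is Spring AI", docs, 2));
        System.out.println(prompt);
        // Generate: pass the prompt to chatModel.call(new Prompt(prompt)).
    }
}
```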
Multi-Agent Systems
Coordinate multiple AI agents:
User Input → Router Agent → Specialized Agents →
Aggregator → Final Response
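A minimal routing sketch of this pattern: each agent here is a plain function, whereas a real system would wrap a specialized model call (its own system prompt and tools) per agent, possibly using the model itself as the router:

```java
import java.util.*;
import java.util.function.Function;

public class RouterSketch {

    // Specialized "agents" keyed by the topic they handle.
    static final Map<String, Function<String, String>> AGENTS = Map.of(
            "weather", q -> "[weather-agent] forecast for: " + q,
            "billing", q -> "[billing-agent] invoice lookup for: " + q);

    static final Function<String, String> FALLBACK =
            q -> "[general-agent] answering: " + q;

    // Router: dispatch to the first agent whose keyword appears in the input.
    static String route(String input) {
        String lower = input.toLowerCase();
        return AGENTS.entrySet().stream()
                .filter(e -> lower.contains(e.getKey()))
                .findFirst()
                .map(e -> e.getValue().apply(input))
                .orElseGet(() -> FALLBACK.apply(input));
    }

    public static void main(String[] args) {
        System.out.println(route("What is the weather in Paris?"));
        System.out.println(route("Explain my billing statement"));
        System.out.println(route("Tell me a joke"));
    }
}
```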
Conversational Memory
Maintain conversation context:
Session → Message History → Context Window →
Model Call → Update History
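The loop above can be sketched as a sliding window over the message history. Real implementations trim by token count rather than message count, but the eviction logic has the same shape:

```java
import java.util.*;

public class ConversationMemory {

    private final Deque<String> history = new ArrayDeque<>();
    private final int maxTurns;

    public ConversationMemory(int maxTurns) {
        this.maxTurns = maxTurns;
    }

    // Update history: add a turn, evicting the oldest when the window is full.
    public void add(String turn) {
        if (history.size() == maxTurns) {
            history.removeFirst();
        }
        history.addLast(turn);
    }

    // Context window: the turns to prepend to the next model call.
    public List<String> contextWindow() {
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        ConversationMemory memory = new ConversationMemory(2);
        memory.add("user: Hello");
        memory.add("assistant: Hi! How can I help?");
        memory.add("user: What is Spring AI?");
        // The oldest turn ("user: Hello") has been evicted.
        System.out.println(memory.contextWindow());
    }
}
```

Summarizing evicted turns into a single system message is a common refinement when older context still matters.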