Core Concepts
This section explains the fundamental concepts and architecture of the Spring AI Watsonx.ai integration.
Overview
Spring AI Watsonx.ai provides a Spring-native way to integrate IBM Watsonx.ai foundation models into your applications. It follows Spring AI's unified API design, so code written against the portable ChatModel and EmbeddingModel interfaces runs against Watsonx.ai without vendor-specific changes.
Architecture
Component Overview
┌─────────────────────────────────────────────────────────┐
│                   Spring Application                    │
├─────────────────────────────────────────────────────────┤
│                  Spring AI Interfaces                   │
│            (ChatModel, EmbeddingModel, etc.)            │
├─────────────────────────────────────────────────────────┤
│           Spring AI Watsonx.ai Implementation           │
│      (WatsonxAiChatModel, WatsonxAiEmbeddingModel)      │
├─────────────────────────────────────────────────────────┤
│                  Watsonx.ai API Client                  │
│              (HTTP Client, Authentication)              │
├─────────────────────────────────────────────────────────┤
│                 IBM Watsonx.ai Service                  │
└─────────────────────────────────────────────────────────┘
Module Structure
The integration consists of three main modules:
watsonx-ai-core
- Core API clients and model implementations
- Request/response objects
- Authentication handling
- No Spring dependencies

spring-ai-autoconfigure-model-watsonx-ai
- Spring Boot auto-configuration
- Configuration properties
- Bean definitions
- Conditional configuration

spring-ai-starter-model-watsonx-ai
- Spring Boot starter
- Dependency aggregation
- Quick setup for applications
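Applications typically depend only on the starter, which pulls in the other two modules transitively. A minimal Maven sketch; the group ID and version below are assumptions, so verify the coordinates against your project's dependency management:
<!-- Hypothetical coordinates; confirm the group ID and version in your build -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-watsonx-ai</artifactId>
    <version>${spring-ai.version}</version>
</dependency>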
Key Concepts
Chat Models
Chat models enable conversational AI through a request-response pattern:
Prompt: User input or instruction
Prompt prompt = new Prompt("What is Spring AI?");
Response: Model-generated output
ChatResponse response = chatModel.call(prompt);
String content = response.getResult().getOutput().getContent();
Streaming: Real-time response generation
Flux<ChatResponse> stream = chatModel.stream(prompt);
Embeddings
Embeddings convert text into numerical vectors for semantic operations:
Vector Representation: Text → Numbers
List<Double> embedding = embeddingModel.embed("Spring AI is great");
// [0.123, -0.456, 0.789, ...]
Similarity: Compare semantic meaning
double similarity = cosineSimilarity(embedding1, embedding2);
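The cosineSimilarity helper is not part of the snippet above; a minimal sketch in plain Java, assuming both vectors are the same length and non-zero:
// Plain-Java cosine similarity; assumes equal-length, non-zero vectors.
static double cosineSimilarity(List<Double> a, List<Double> b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.size(); i++) {
        dot += a.get(i) * b.get(i);
        normA += a.get(i) * a.get(i);
        normB += b.get(i) * b.get(i);
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}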
Use Cases:
- Semantic search
- Document clustering
- Recommendation systems
- Duplicate detection
Messages
Messages represent conversation turns:
UserMessage: User input
Message userMsg = new UserMessage("Hello!");
AssistantMessage: AI response
Message assistantMsg = new AssistantMessage("Hi! How can I help?");
SystemMessage: System instructions
Message systemMsg = new SystemMessage("You are a helpful assistant.");
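Individual messages are combined into a single Prompt so the model sees the whole conversation:
// System instructions first, then the conversation turns.
Prompt prompt = new Prompt(List.of(systemMsg, userMsg));
ChatResponse response = chatModel.call(prompt);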
Options
Options control model behavior:
Temperature: Randomness (0.0 - 2.0)
- Low (0.0-0.3): Deterministic, focused
- Medium (0.4-0.7): Balanced
- High (0.8-2.0): Creative, diverse
Max Tokens: Response length limit
Top-P: Nucleus sampling threshold
Top-K: Token selection pool size
Stop Sequences: Generation terminators
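A sketch of applying these options per request; the builder method names below are assumptions that may differ by version, so check the options class shipped with your release:
// Hypothetical builder calls; verify exact method names in your version.
WatsonxAiChatOptions options = WatsonxAiChatOptions.builder()
    .withTemperature(0.7)               // balanced randomness
    .withMaxNewTokens(512)              // cap response length
    .withTopP(0.9)                      // nucleus sampling threshold
    .withTopK(50)                       // token selection pool size
    .withStopSequences(List.of("\n\n")) // stop generating at a blank line
    .build();

Prompt prompt = new Prompt("What is Spring AI?", options);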
Function Calling
Enable models to use external tools:
@Bean
public FunctionCallback weatherFunction() {
    return FunctionCallback.builder()
        .function("getWeather", this::getWeather)
        .description("Get current weather")
        .inputType(WeatherRequest.class)
        .build();
}
The model can decide when to call functions based on user input.
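The WeatherRequest type and getWeather handler referenced above are application code, not framework API; a hypothetical sketch:
// Hypothetical input type and handler backing the callback above.
public record WeatherRequest(String city) {}

private String getWeather(WeatherRequest request) {
    // A real implementation would call a weather service here.
    return "Sunny, 22°C in " + request.city();
}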
Streaming
Stream responses for better UX:
Benefits:
- Immediate feedback
- Progressive rendering
- Lower perceived latency
- Better user experience
Implementation:
Flux<ChatResponse> stream = chatModel.stream(prompt);
stream.subscribe(
    chunk -> System.out.print(chunk.getResult().getOutput().getContent()),
    error -> handleError(error),
    () -> System.out.println("Complete!")
);
Observability
Built-in observability support:
Metrics: Track usage and performance
- Request count
- Response time
- Token usage
- Error rates

Tracing: Distributed tracing support
- Request flow
- Latency breakdown
- Error tracking

Logging: Detailed operation logs
- Request/response logging
- Error logging
- Debug information
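Metrics and traces surface through Spring Boot Actuator and Micrometer. A minimal configuration sketch, assuming the actuator and a Micrometer tracing bridge are on the classpath:
management:
  endpoints:
    web:
      exposure:
        include: health, metrics
  tracing:
    sampling:
      probability: 1.0  # trace every request; lower this in production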
Design Patterns
Dependency Injection
Use Spring’s DI for clean architecture:
@Service
public class ChatService {

    private final ChatModel chatModel;

    public ChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
}
Configuration Management
Externalize configuration:
spring:
  ai:
    watsonx:
      ai:
        api-key: ${WATSONX_AI_API_KEY}
        url: ${WATSONX_AI_URL}
        project-id: ${WATSONX_AI_PROJECT_ID}
Error Handling
Handle errors gracefully:
try {
    return chatModel.call(prompt);
} catch (WatsonxAiAuthenticationException e) {
    log.error("Authentication failed", e);
    throw new ServiceException("Unable to authenticate");
} catch (WatsonxAiRateLimitException e) {
    log.warn("Rate limit exceeded", e);
    return "Service temporarily unavailable";
}
Retry Logic
Implement retry for transient failures:
@Retryable(
    value = WatsonxAiApiException.class,
    maxAttempts = 3,
    backoff = @Backoff(delay = 1000, multiplier = 2)
)
public String callWithRetry(String prompt) {
    return chatModel.call(prompt);
}
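@Retryable is provided by Spring Retry, which must be on the classpath (along with AOP support) and switched on explicitly:
// Enables processing of @Retryable; requires spring-retry and spring-boot-starter-aop.
@Configuration
@EnableRetry
public class RetryConfig {
}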
Best Practices
Security
- Never hardcode API keys
- Use environment variables
- Rotate credentials regularly
- Implement rate limiting
- Validate user input
Performance
- Cache embeddings when possible (see the sketch after this list)
- Use streaming for long responses
- Implement connection pooling
- Monitor token usage
- Optimize prompt length
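A minimal embedding-cache sketch using Spring's cache abstraction; the service and cache name are illustrative, and @EnableCaching plus a CacheManager must be configured elsewhere:
@Service
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    public EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    // Identical inputs hit the cache instead of the remote model.
    @Cacheable("embeddings")
    public List<Double> embed(String text) {
        return embeddingModel.embed(text);
    }
}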
Reliability
- Implement retry logic
- Handle rate limits gracefully
- Provide fallback responses
- Monitor error rates
- Set appropriate timeouts
Cost Optimization
- Choose appropriate models
- Limit max tokens
- Cache common responses
- Batch embedding requests
- Monitor usage metrics
Integration Patterns
RAG (Retrieval Augmented Generation)
Combine retrieval with generation:
User Query → Vector Search → Retrieve Documents →
Build Context → Generate Response
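A condensed sketch of that pipeline, assuming a configured VectorStore bean; the prompt template is illustrative, and accessor names such as getContent can vary between Spring AI versions:
// Retrieve relevant documents, fold them into the prompt, then generate.
List<Document> docs = vectorStore.similaritySearch(query);

String context = docs.stream()
    .map(Document::getContent)
    .collect(Collectors.joining("\n"));

Prompt prompt = new Prompt("""
    Answer using only the context below.

    Context:
    %s

    Question: %s
    """.formatted(context, query));

ChatResponse response = chatModel.call(prompt);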
Multi-Agent Systems
Coordinate multiple AI agents:
User Input → Router Agent → Specialized Agents →
Aggregator → Final Response
Conversational Memory
Maintain conversation context:
Session → Message History → Context Window →
Model Call → Update History
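A minimal in-memory sketch of that loop; a production implementation should trim the history so it stays within the model's context window:
// Naive per-session history: append each turn and resend the full conversation.
List<Message> history = new ArrayList<>();
history.add(new SystemMessage("You are a helpful assistant."));

history.add(new UserMessage(userInput));
ChatResponse response = chatModel.call(new Prompt(history));
history.add(response.getResult().getOutput()); // keep the reply as future context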