Core Concepts
This section explains the fundamental concepts and architecture of Spring AI Watsonx.ai integration.
Overview
Spring AI Watsonx.ai provides a Spring-native way to integrate IBM Watsonx.ai foundation models into your applications. It follows Spring AI's unified API design while leveraging Watsonx.ai's powerful AI capabilities.
Architecture
Component Overview
┌─────────────────────────────────────────────────────────┐
│                   Spring Application                    │
├─────────────────────────────────────────────────────────┤
│                  Spring AI Interfaces                   │
│            (ChatModel, EmbeddingModel, etc.)            │
├─────────────────────────────────────────────────────────┤
│           Spring AI Watsonx.ai Implementation           │
│      (WatsonxAiChatModel, WatsonxAiEmbeddingModel)      │
├─────────────────────────────────────────────────────────┤
│                  Watsonx.ai API Client                  │
│              (HTTP Client, Authentication)              │
├─────────────────────────────────────────────────────────┤
│                 IBM Watsonx.ai Service                  │
└─────────────────────────────────────────────────────────┘
Module Structure
The integration consists of three main modules:
watsonx-ai-core
- Core API clients and model implementations
- Request/response objects
- Authentication handling
- No Spring dependencies
spring-ai-autoconfigure-model-watsonx-ai
- Spring Boot auto-configuration
- Configuration properties
- Bean definitions
- Conditional configuration
spring-ai-starter-model-watsonx-ai
- Spring Boot starter
- Dependency aggregation
- Quick setup for applications
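For reference, a typical Maven setup pulls in only the starter. The groupId shown here assumes the standard Spring AI coordinates; verify both coordinates against the Spring AI version you are using:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-watsonx-ai</artifactId>
</dependency>
```

The starter transitively brings in the autoconfigure and core modules, so no other Spring AI Watsonx.ai dependencies are needed.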
Key Concepts
Chat Models
Chat models enable conversational AI through a request-response pattern:
Prompt: User input or instruction
Prompt prompt = new Prompt("What is Spring AI?");
Response: Model-generated output
ChatResponse response = chatModel.call(prompt);
String content = response.getResult().getOutput().getContent();
Streaming: Real-time response generation
Flux<ChatResponse> stream = chatModel.stream(prompt);
Embeddings
Embeddings convert text into numerical vectors for semantic operations:
Vector Representation: Text → Numbers
List<Double> embedding = embeddingModel.embed("Spring AI is great");
// [0.123, -0.456, 0.789, ...]
Similarity: Compare semantic meaning
double similarity = cosineSimilarity(embedding1, embedding2);
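The snippet above assumes a cosineSimilarity helper. Spring AI also provides vector stores with similarity search built in, but a minimal standalone version looks like this:

```java
public class CosineSimilarity {

    // cos(theta) = (a . b) / (|a| * |b|)
    // 1.0 = same direction (semantically similar), 0.0 = orthogonal (unrelated).
    public static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] v1 = {0.123, -0.456, 0.789};
        double[] v2 = {0.120, -0.450, 0.800};
        System.out.printf("similarity = %.4f%n", cosineSimilarity(v1, v2));
    }
}
```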
Use Cases:
- Semantic search
- Document clustering
- Recommendation systems
- Duplicate detection
Messages
Messages represent conversation turns:
UserMessage: User input
Message userMsg = new UserMessage("Hello!");
AssistantMessage: AI response
Message assistantMsg = new AssistantMessage("Hi! How can I help?");
SystemMessage: System instructions
Message systemMsg = new SystemMessage("You are a helpful assistant.");
Options
Options control model behavior:
Temperature: Randomness (0.0 - 2.0)
- Low (0.0-0.3): Deterministic, focused
- Medium (0.4-0.7): Balanced
- High (0.8-2.0): Creative, diverse
Max Tokens: Response length limit
Top-P: Nucleus sampling threshold
Top-K: Token selection pool size
Stop Sequences: Generation terminators
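To make Top-K and Top-P concrete, here is a toy sketch (not how watsonx.ai implements sampling) of how each option narrows the candidate-token pool before one token is drawn:

```java
import java.util.*;

public class SamplingOptionsDemo {

    // Top-K: keep only the k highest-probability candidates.
    static LinkedHashMap<String, Double> topK(Map<String, Double> probs, int k) {
        LinkedHashMap<String, Double> kept = new LinkedHashMap<>();
        probs.entrySet().stream()
                .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                .limit(k)
                .forEach(e -> kept.put(e.getKey(), e.getValue()));
        return kept;
    }

    // Top-P (nucleus sampling): keep the smallest set of candidates whose
    // cumulative probability reaches p.
    static LinkedHashMap<String, Double> topP(Map<String, Double> probs, double p) {
        LinkedHashMap<String, Double> kept = new LinkedHashMap<>();
        double cumulative = 0.0;
        for (var e : probs.entrySet().stream()
                .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                .toList()) {
            kept.put(e.getKey(), e.getValue());
            cumulative += e.getValue();
            if (cumulative >= p) break;
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> nextToken = Map.of(
                "the", 0.50, "a", 0.30, "an", 0.15, "zebra", 0.05);
        System.out.println(topK(nextToken, 2).keySet());   // [the, a]
        System.out.println(topP(nextToken, 0.9).keySet()); // [the, a, an]
    }
}
```

Temperature then reshapes the probabilities within the surviving pool, which is why low temperature combined with a small Top-K gives near-deterministic output.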
Function Calling
Enable models to use external tools:
@Bean
public FunctionCallback weatherFunction() {
    return FunctionCallback.builder()
            .function("getWeather", this::getWeather)
            .description("Get current weather")
            .inputType(WeatherRequest.class)
            .build();
}
The model can decide when to call functions based on user input.
Streaming
Stream responses for better UX:
Benefits:
- Immediate feedback
- Progressive rendering
- Lower perceived latency
- Better user experience
Implementation:
Flux<ChatResponse> stream = chatModel.stream(prompt);
stream.subscribe(
        chunk -> System.out.print(chunk.getResult().getOutput().getContent()),
        error -> handleError(error),
        () -> System.out.println("Complete!")
);
Observability
Built-in observability support:
Metrics: Track usage and performance
- Request count
- Response time
- Token usage
- Error rates
Tracing: Distributed tracing support
- Request flow
- Latency breakdown
- Error tracking
Logging: Detailed operation logs
- Request/response logging
- Error logging
- Debug information
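Assuming Spring Boot Actuator is on the classpath, one way to expose the metrics above is through the standard management properties:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics
```

Metrics then become available under the /actuator/metrics endpoint and can be shipped to any Micrometer-supported backend.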
Design Patterns
Dependency Injection
Use Spring's DI for clean architecture:
@Service
public class ChatService {

    private final ChatModel chatModel;

    public ChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
}
Configuration Management
Externalize configuration:
spring:
  ai:
    watsonx:
      ai:
        api-key: ${WATSONX_AI_API_KEY}
        url: ${WATSONX_AI_URL}
        project-id: ${WATSONX_AI_PROJECT_ID}
Error Handling
Handle errors gracefully:
try {
    String response = chatModel.call(prompt);
    return response;
} catch (WatsonxAiAuthenticationException e) {
    log.error("Authentication failed", e);
    throw new ServiceException("Unable to authenticate");
} catch (WatsonxAiRateLimitException e) {
    log.warn("Rate limit exceeded", e);
    return "Service temporarily unavailable";
}
Retry Logic
Implement retry for transient failures. This example uses Spring Retry, so the spring-retry dependency and an @EnableRetry-annotated configuration class are required:
@Retryable(
        value = WatsonxAiApiException.class,
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2)
)
public String callWithRetry(String prompt) {
    return chatModel.call(prompt);
}
Best Practices
Security
- Never hardcode API keys
- Use environment variables
- Rotate credentials regularly
- Implement rate limiting
- Validate user input
Performance
- Cache embeddings when possible
- Use streaming for long responses
- Implement connection pooling
- Monitor token usage
- Optimize prompt length
Reliability
- Implement retry logic
- Handle rate limits gracefully
- Provide fallback responses
- Monitor error rates
- Set appropriate timeouts
Cost Optimization
- Choose appropriate models
- Limit max tokens
- Cache common responses
- Batch embedding requests
- Monitor usage metrics
Integration Patterns
RAG (Retrieval Augmented Generation)
Combine retrieval with generation:
User Query → Vector Search → Retrieve Documents →
Build Context → Generate Response
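The flow above can be sketched end to end. This toy version uses keyword overlap in place of real embeddings; a production pipeline would use the EmbeddingModel and a vector store for the retrieval step, and the ChatModel for generation:

```java
import java.util.*;

public class RagSketch {

    record Scored(String doc, long score) {}

    // Retrieve: rank documents by how many query terms they share
    // (stand-in for embedding-based vector search).
    static List<String> retrieve(String query, List<String> docs, int topK) {
        Set<String> terms = new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
        return docs.stream()
                .map(d -> new Scored(d, Arrays.stream(d.toLowerCase().split("\\s+"))
                        .filter(terms::contains).count()))
                .sorted((x, y) -> Long.compare(y.score, x.score))
                .limit(topK)
                .map(Scored::doc)
                .toList();
    }

    // Build context: stitch the retrieved documents into the prompt.
    static String buildPrompt(String query, List<String> context) {
        return "Answer using only this context:\n"
                + String.join("\n", context)
                + "\n\nQuestion: " + query;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Spring AI provides a unified API for chat models",
                "Watsonx.ai hosts IBM foundation models",
                "Cats are popular pets");
        String prompt = buildPrompt("What is Spring AI?",
                retrieve("What is Spring AI", docs, 2));
        System.out.println(prompt);
        // Generate: pass the prompt to chatModel.call(new Prompt(prompt)).
    }
}
```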
Multi-Agent Systems
Coordinate multiple AI agents:
User Input → Router Agent → Specialized Agents →
Aggregator → Final Response
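A minimal routing sketch of this pattern: each agent here is a plain function, whereas a real system would wrap a specialized model call (its own system prompt and tools) per agent, possibly using the model itself as the router:

```java
import java.util.*;
import java.util.function.Function;

public class RouterSketch {

    // Specialized "agents" keyed by the topic they handle.
    static final Map<String, Function<String, String>> AGENTS = Map.of(
            "weather", q -> "[weather-agent] forecast for: " + q,
            "billing", q -> "[billing-agent] invoice lookup for: " + q);

    static final Function<String, String> FALLBACK =
            q -> "[general-agent] answering: " + q;

    // Router: dispatch to the first agent whose keyword appears in the input.
    static String route(String input) {
        String lower = input.toLowerCase();
        return AGENTS.entrySet().stream()
                .filter(e -> lower.contains(e.getKey()))
                .findFirst()
                .map(e -> e.getValue().apply(input))
                .orElseGet(() -> FALLBACK.apply(input));
    }

    public static void main(String[] args) {
        System.out.println(route("What is the weather in Paris?"));
        System.out.println(route("Explain my billing statement"));
        System.out.println(route("Tell me a joke"));
    }
}
```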
Conversational Memory
Maintain conversation context:
Session → Message History → Context Window →
Model Call → Update History
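The loop above can be sketched as a sliding window over the message history. Real implementations trim by token count rather than message count, but the eviction logic has the same shape:

```java
import java.util.*;

public class ConversationMemory {

    private final Deque<String> history = new ArrayDeque<>();
    private final int maxTurns;

    public ConversationMemory(int maxTurns) {
        this.maxTurns = maxTurns;
    }

    // Update history: add a turn, evicting the oldest when the window is full.
    public void add(String turn) {
        if (history.size() == maxTurns) {
            history.removeFirst();
        }
        history.addLast(turn);
    }

    // Context window: the turns to prepend to the next model call.
    public List<String> contextWindow() {
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        ConversationMemory memory = new ConversationMemory(2);
        memory.add("user: Hello");
        memory.add("assistant: Hi! How can I help?");
        memory.add("user: What is Spring AI?");
        // The oldest turn ("user: Hello") has been evicted.
        System.out.println(memory.contextWindow());
    }
}
```

Summarizing evicted turns into a single system message is a common refinement when older context still matters.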