Chat Models
Watsonx.ai Chat Models provide powerful conversational AI capabilities for building intelligent applications. The Spring AI Watsonx.ai integration supports various foundation models available in IBM’s Watsonx.ai platform.
Supported Models
The following foundation models are supported:
- IBM Granite models - IBM's enterprise-focused language models
- Meta Llama models - Meta's open-source language models
- Mistral AI models - Mistral's efficient language models
- Other foundation models available in your Watsonx.ai deployment
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Watsonx.ai Chat Model. To enable it, add the following dependency to your project’s Maven pom.xml file:
<dependency>
    <groupId>org.springaicommunity</groupId>
    <artifactId>spring-ai-starter-model-watsonx-ai</artifactId>
    <version>1.0.0</version>
</dependency>
Or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springaicommunity:spring-ai-starter-model-watsonx-ai:1.0.0'
}
Configuration Properties
The prefix spring.ai.watsonx.ai is the property prefix that lets you configure the connection to Watsonx.ai, and spring.ai.watsonx.ai.chat.options configures the chat model defaults.
| Property | Default | Required | Description |
|---|---|---|---|
| spring.ai.watsonx.ai.api-key | - | true | Your Watsonx.ai API key |
| spring.ai.watsonx.ai.url | - | true | Your Watsonx.ai service URL |
| spring.ai.watsonx.ai.project-id | - | true | Your Watsonx.ai project ID |
| spring.ai.watsonx.ai.chat.options.model | ibm/granite-13b-chat-v2 | false | The model to use for chat completions |
| spring.ai.watsonx.ai.chat.options.temperature | 0.7 | false | Controls randomness in the response |
| spring.ai.watsonx.ai.chat.options.max-new-tokens | 1024 | false | Maximum number of tokens to generate |
| spring.ai.watsonx.ai.chat.options.top-p | 1.0 | false | Controls diversity via nucleus sampling |
| spring.ai.watsonx.ai.chat.options.top-k | 50 | false | Controls diversity by limiting vocabulary |
| spring.ai.watsonx.ai.chat.options.repetition-penalty | 1.0 | false | Penalty for repeating tokens |
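For example, a minimal application.properties sketch. The credential values are placeholders, and the regional endpoint URL is only an example; check your IBM Cloud console for the correct values:
spring.ai.watsonx.ai.api-key=<your-watsonx-api-key>
spring.ai.watsonx.ai.url=https://us-south.ml.cloud.ibm.com
spring.ai.watsonx.ai.project-id=<your-project-id>
spring.ai.watsonx.ai.chat.options.model=ibm/granite-13b-chat-v2
spring.ai.watsonx.ai.chat.options.temperature=0.7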
All properties prefixed with spring.ai.watsonx.ai.chat.options can be overridden at runtime by adding a request-specific WatsonxAiChatOptions to the Prompt call.
Runtime Options
The WatsonxAiChatOptions.java provides model configurations, such as the model to use, the temperature, max tokens, etc.
On start-up, the default options can be configured with the WatsonxAiChatModel(api, options) constructor or the spring.ai.watsonx.ai.chat.options.* properties.
At runtime, you can override the default options by passing new ones, built with the WatsonxAiChatOptions.Builder, to a Prompt call. For example, to override the default temperature for a specific request:
ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        WatsonxAiChatOptions.builder()
            .withTemperature(0.4)
            .build()
    ));
In addition to the model-specific WatsonxAiChatOptions, you can use a portable ChatOptions instance, created with ChatOptionsBuilder#builder().
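For example, a minimal sketch using the portable builder. This assumes the ChatOptionsBuilder API from the Spring AI core module; exact signatures (e.g. whether temperature is a Float or Double) vary between Spring AI versions:
ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        ChatOptionsBuilder.builder()
            .withTemperature(0.4) // may need 0.4f on versions where temperature is a Float
            .build()
    ));
Because ChatOptions is portable, the same call works unchanged against other ChatModel implementations.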
Sample Controller
@RestController
public class ChatController {

    private final WatsonxAiChatModel chatModel;

    public ChatController(WatsonxAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return chatModel.stream(prompt);
    }
}
Manual Configuration
The WatsonxAiChatModel implements the ChatModel and StreamingChatModel interfaces and uses the low-level WatsonxAiChatApi client (described below) to connect to the Watsonx.ai service.
Add the watsonx-ai-core dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springaicommunity</groupId>
    <artifactId>watsonx-ai-core</artifactId>
    <version>1.0.0</version>
</dependency>
Refer to the Getting Started guide for information about adding dependencies to your build file.
Next, create a WatsonxAiChatModel and use it for text generations:
var watsonxAiApi = new WatsonxAiChatApi(apiKey, url, projectId);

var chatModel = new WatsonxAiChatModel(watsonxAiApi,
    WatsonxAiChatOptions.builder()
        .withModel("ibm/granite-13b-chat-v2")
        .withTemperature(0.4)
        .withMaxNewTokens(200)
        .build());

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Or with streaming
Flux<ChatResponse> streamingResponse = chatModel.stream(
    new Prompt("Generate the names of 5 famous pirates."));
The WatsonxAiChatOptions provides the configuration information for the chat requests. The WatsonxAiChatOptions.Builder is a fluent options builder.
Low-level WatsonxAiChatApi
The WatsonxAiChatApi is a lightweight Java client on top of the Watsonx.ai Chat Completions API.
Here is a simple snippet showing how to use the api programmatically:
WatsonxAiChatApi watsonxAiApi = new WatsonxAiChatApi(apiKey, url, projectId);

WatsonxAiChatRequest request = WatsonxAiChatRequest.builder()
    .withModel("ibm/granite-13b-chat-v2")
    .withMessages(List.of(new WatsonxAiChatRequest.Message(
        "Tell me about 3 famous pirates from the Golden Age of Piracy and why they were famous.", Role.USER)))
    .withTemperature(0.8)
    .withMaxNewTokens(300)
    .build();
ChatCompletionResponse response = watsonxAiApi.chatCompletionEntity(request).getBody();
Refer to the WatsonxAiChatApi.java JavaDoc for further information.
WatsonxAiChatOptions
The WatsonxAiChatOptions class provides various options for configuring chat requests:
| Option | Default | Description |
|---|---|---|
| model | ibm/granite-13b-chat-v2 | The foundation model to use for chat completions |
| temperature | 0.7 | Controls randomness in the response. Higher values make output more random |
| maxNewTokens | 1024 | Maximum number of tokens to generate in the completion |
| topP | 1.0 | Controls diversity via nucleus sampling. Lower values focus on more likely tokens |
| topK | 50 | Controls diversity by limiting the vocabulary to the top K tokens |
| repetitionPenalty | 1.0 | Penalty for repeating tokens. Values > 1.0 discourage repetition |
| stopSequences | null | List of strings that will stop generation when encountered |
| presencePenalty | 0.0 | Penalty for new tokens based on their presence in the text so far |
| frequencyPenalty | 0.0 | Penalty for new tokens based on their frequency in the text so far |
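As an illustration, here is a sketch that sets several of these options through the builder. The withX method names follow the convention used elsewhere on this page and should be verified against WatsonxAiChatOptions.java:
WatsonxAiChatOptions options = WatsonxAiChatOptions.builder()
    .withModel("ibm/granite-13b-chat-v2")
    .withTemperature(0.7)
    .withMaxNewTokens(1024)
    .withTopP(0.9)
    .withTopK(50)
    .withRepetitionPenalty(1.1)           // values > 1.0 discourage repetition
    .withStopSequences(List.of("Human:")) // stop generation when this string is produced
    .build();

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates.", options));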
Function Calling
You can register custom Java functions with the WatsonxAiChatModel and have the Watsonx.ai model intelligently choose to output a JSON object containing arguments to call one or more of the registered functions. This allows you to connect the LLM capabilities with external tools and APIs.
The Watsonx.ai models will intelligently choose when to call functions based on the input provided. Here’s a complete example:
@Component
public class MockWeatherService implements Function<MockWeatherService.Request, MockWeatherService.Response> {

    public enum Unit { C, F }

    public record Request(String location, Unit unit) {}

    public record Response(double temp, Unit unit, String location) {}

    @Override
    public Response apply(Request request) {
        return new Response(30.0, request.unit(), request.location());
    }
}
@RestController
public class WeatherController {

    private final WatsonxAiChatModel chatModel;

    public WeatherController(WatsonxAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/weather")
    public String weather(String location) {
        UserMessage userMessage = new UserMessage("What's the weather like in " + location + "?");

        var promptOptions = WatsonxAiChatOptions.builder()
            .withFunction("currentWeather") // enable the function by its bean name
            .build();

        ChatResponse response = chatModel.call(new Prompt(List.of(userMessage), promptOptions));
        return response.getResult().getOutput().getContent();
    }
}
Register the function as a bean:
@Configuration
public class FunctionConfiguration {

    @Bean
    @Description("Get the weather in location") // function description
    public Function<MockWeatherService.Request, MockWeatherService.Response> currentWeather() {
        return new MockWeatherService();
    }
}
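Alternatively, core Spring AI supports registering a function programmatically per request via a FunctionCallbackWrapper, so no bean registration is needed. Whether WatsonxAiChatOptions exposes a matching withFunctionCallbacks builder method is an assumption to verify against the class:
var promptOptions = WatsonxAiChatOptions.builder()
    .withFunctionCallbacks(List.of(                        // assumed builder method; verify it exists
        FunctionCallbackWrapper.builder(new MockWeatherService())
            .withName("currentWeather")                    // name the model uses to reference the function
            .withDescription("Get the weather in location")
            .build()))
    .build();

ChatResponse response = chatModel.call(
    new Prompt("What's the weather like in Paris?", promptOptions));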