
Observability

The observability layer is the visibility arm of Spring AI Playground's safety model — the user-facing surface that answers what the agent did, in what order, against which integration, at what cost. Where the sandbox prevents unsafe actions at the call boundary, this layer captures every action that did happen and presents it through twelve dashboards in the desktop app.

The pages under this section document the user surface. For the trace pipeline, storage tiers, configuration, and external export paths, see AI Agent Observability Architecture.

Who uses these dashboards

The dashboards are designed around three roles, all of which can be the same person on a desktop deployment:

  • Builder — authoring tools, iterating on a prompt, debugging why an agent picked the wrong tool. Lives in Traces, Agentic Chat, Tool Studio.
  • Operator — monitoring a deployment over time: cost trends, error rates, MCP server health, system load. Lives in Overview, Tokens & Cost, MCP Servers, Host.
  • Investigator — drilling into a specific incident from a log line back to the originating model decision. Lives in Logs, Traces, Trace Detail dialog.

Every dashboard is read-only and passive — opening it never alters trace data or model behaviour.

The twelve dashboards are grouped into four sections in the left sidebar. Each group answers a different category of question:

```mermaid
flowchart TB
    OV["Overview"]
    subgraph U["AI Usage"]
        direction TB
        TC["Tokens & Cost"]
        AM["AI Models"]
    end
    subgraph S["AI Stack"]
        direction TB
        TS["Tool Studio"]
        MS["MCP Servers"]
        MI["MCP Inspector"]
        VD["Vector Database"]
        AC["Agentic Chat"]
    end
    subgraph R["Runtime"]
        direction TB
        HO["Host"]
        WA["Web Application"]
        LG["Logs"]
        TR["Traces"]
    end
    OV --> U
    OV --> S
    OV --> R
```

The Overview tab is the landing surface — every group has its own dedicated tabs for depth, but Overview shows one panel from each so an operator can spot anomalies at a glance and drill in from there.

Global settings

Every dashboard shares one ObservabilityGlobalSettings singleton, so picking Last 1h on the Tokens & Cost tab and then switching to AI Models shows the same hour, and changing the refresh interval applies everywhere at once. Three surfaces touch this state:

  • Header time-window picker: six chips at the top right of every dashboard, Last 5m · 10m · 20m · 30m · 1h · 3h (default 30m). Clicking a chip switches the sliding window and re-renders the charts.
  • Header refresh chip — beside the time-window picker. Quick-pick Off · 1s · 2s · 5s · 10s · 30s · 60s (default 5s). When Off, charts only update on manual refresh.
  • Cog drawer (Observability settings): opened by the gear icon, with three sections:

| Section | What it does |
|---|---|
| Refresh interval | Wider preset chips (3s · 5s · 10s · 30s) plus a numeric Custom field. Shares its state with the header chip; editing either one changes the same value. |
| Time range | Toggle between Sliding window (the same six presets as the header) and Fixed range (From and To DateTimePickers). A fixed range is capped at 180 minutes; values outside that window are clipped server-side. While a fixed range is active, the header chips become read-only and auto-refresh pauses, because the data window is static. |
| Per-tab settings | Optional: the current dashboard injects its own panel here. Example: Logs adds a "Reset to live tail" button. |

An Apply button at the bottom commits staged changes; closing the drawer without Apply discards them.
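To make the Time range rules concrete, here is a minimal sketch of how a query window could be resolved from either mode. Everything in it is an assumption for illustration: the `TimeRangeResolver` class, its `Window` record, and the choice to keep the `To` end and move `From` forward when a fixed range exceeds the 180-minute cap (this page only says out-of-range values are clipped server-side).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: resolves a dashboard query window from either a
// sliding preset or a fixed From/To range. Names are illustrative only.
final class TimeRangeResolver {

    // Fixed ranges are capped at 180 minutes, as stated on this page.
    static final Duration MAX_FIXED_RANGE = Duration.ofMinutes(180);

    /** Immutable query window; autoRefresh is false for fixed ranges. */
    record Window(Instant from, Instant to, boolean autoRefresh) {}

    /** Sliding window: always ends at "now", so auto-refresh stays on. */
    static Window sliding(Duration length, Instant now) {
        return new Window(now.minus(length), now, true);
    }

    /**
     * Fixed window: the data is static, so auto-refresh is paused.
     * Assumption: an over-long range keeps its end and moves its start forward.
     */
    static Window fixed(Instant from, Instant to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        Instant earliest = to.minus(MAX_FIXED_RANGE);
        Instant clippedFrom = from.isBefore(earliest) ? earliest : from;
        return new Window(clippedFrom, to, false);
    }
}
```

A sliding window is recomputed from the current time on every refresh tick, whereas a fixed window never moves, which is why pausing auto-refresh for it loses nothing.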

Code:

  • ObservabilityGlobalSettings: src/main/java/.../webui/observability/components/ObservabilityGlobalSettings.java (window enum + refresh choices + listener fan-out)
  • ObservabilitySettingsPanel: .../webui/observability/components/ObservabilitySettingsPanel.java (the drawer body)
  • TimeWindowPicker / RefreshIntervalPicker — header chips
  • ObservabilityView.installSettingsDrawer(...) — drawer mount
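The shared-state behaviour described above (one time window, one refresh interval, every change fanned out to every open dashboard) can be sketched as a singleton with a listener list. This is a hypothetical illustration, not the contents of ObservabilityGlobalSettings.java: `GlobalSettingsSketch`, its `TimeWindow` enum, `addListener`, and the use of `Duration.ZERO` to mean Off are invented names. Only the preset values and defaults come from this page.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of a shared settings holder with listener fan-out.
// Class, enum and method names are illustrative, not the project's actual API.
final class GlobalSettingsSketch {

    /** Sliding-window presets shown as header chips (default: 30m). */
    enum TimeWindow {
        LAST_5M(5), LAST_10M(10), LAST_20M(20), LAST_30M(30), LAST_1H(60), LAST_3H(180);

        final Duration length;

        TimeWindow(int minutes) {
            this.length = Duration.ofMinutes(minutes);
        }
    }

    /** Header refresh choices; Duration.ZERO stands for "Off" (default: 5s). */
    static final List<Duration> REFRESH_CHOICES = List.of(
            Duration.ZERO, Duration.ofSeconds(1), Duration.ofSeconds(2), Duration.ofSeconds(5),
            Duration.ofSeconds(10), Duration.ofSeconds(30), Duration.ofSeconds(60));

    private static final GlobalSettingsSketch INSTANCE = new GlobalSettingsSketch();

    static GlobalSettingsSketch get() {
        return INSTANCE;
    }

    private volatile TimeWindow window = TimeWindow.LAST_30M;
    private volatile Duration refresh = Duration.ofSeconds(5);
    private final List<Consumer<GlobalSettingsSketch>> listeners = new CopyOnWriteArrayList<>();

    private GlobalSettingsSketch() {}

    TimeWindow window() { return window; }

    Duration refresh() { return refresh; }

    /** Each dashboard registers once; the returned handle unregisters it on detach. */
    Runnable addListener(Consumer<GlobalSettingsSketch> listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    void setWindow(TimeWindow newWindow) {
        this.window = newWindow;
        fireChanged();
    }

    void setRefresh(Duration newRefresh) {
        this.refresh = newRefresh;
        fireChanged();
    }

    // Fan-out: every registered dashboard is notified of the same change.
    private void fireChanged() {
        listeners.forEach(l -> l.accept(this));
    }
}
```

In this sketch, `CopyOnWriteArrayList` iterates over a snapshot, so a dashboard can unregister from inside a callback without a `ConcurrentModificationException`; that is a choice made for the sketch, not a documented property of the real class.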

Host and Web Application ignore the time window. Host shows always-live metrics, with rolling history retained by the dedicated SystemMetricsRingBuffer. Web Application reads MeterRegistry gauges as live values and counters as lifetime totals. Both still honor the refresh interval.
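Rolling history that does not depend on the time window is the classic job of a fixed-capacity ring buffer: new samples are appended, the oldest one is overwritten once the buffer is full, and a chart reads a snapshot. The sketch below illustrates that idea only; `MetricsRingBuffer`, its `Sample` fields, and the capacity are assumptions, and the real SystemMetricsRingBuffer may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fixed-capacity ring buffer for host samples.
// It illustrates the rolling-history idea; the real class's API may differ.
final class MetricsRingBuffer {

    /** One host sample; the fields chosen here are assumptions. */
    record Sample(long epochMillis, double cpuLoad, long heapUsedBytes) {}

    private final Sample[] slots;
    private int next;   // index the next sample will be written to
    private int size;   // number of valid samples (never exceeds capacity)

    MetricsRingBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
        this.slots = new Sample[capacity];
    }

    /** Adds a sample, overwriting the oldest one once the buffer is full. */
    synchronized void add(Sample sample) {
        slots[next] = sample;
        next = (next + 1) % slots.length;
        if (size < slots.length) size++;
    }

    /** Returns the retained history, oldest first, as a copy safe to chart. */
    synchronized List<Sample> snapshot() {
        List<Sample> out = new ArrayList<>(size);
        int start = (next - size + slots.length) % slots.length;
        for (int i = 0; i < size; i++) {
            out.add(slots[(start + i) % slots.length]);
        }
        return out;
    }
}
```

Memory stays bounded by the capacity no matter how long the process runs, which is what lets a host panel keep history without querying stored traces.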

What feeds each dashboard

Dashboards are scoped by the kind of action that produced the data, not by whether a chat happened. Each surface in the app (Agentic Chat, Tool Studio, MCP Server (Inspector), Vector Database, and the running JVM itself) emits its own observation stream, and each dashboard draws on a different subset of those streams:

```mermaid
flowchart LR
    subgraph SRC["Where data comes from"]
        direction TB
        S1["Chat turn"]
        S2["Tool Studio<br/>Run test"]
        S3["MCP Inspector<br/>browse · invoke"]
        S4["Vector Database<br/>index · search"]
        S5["JVM running"]
        S6["Any logger"]
    end
    Trace["TraceRecord"]
    Prim["MCP primitive<br/>observations"]
    Met["MeterRegistry +<br/>SystemMetrics"]
    Log["Rolling app log"]
    S1 --> Trace
    S2 --> Trace
    S4 --> Trace
    S3 --> Prim
    S5 --> Met
    S6 --> Log
    subgraph TDASH["Trace-fed dashboards (8)"]
        direction TB
        D1["Overview"]
        D2["Tokens & Cost"]
        D3["AI Models"]
        D4["Tool Studio"]
        D5["MCP Servers"]
        D6["Vector DB"]
        D7["Agentic Chat"]
        D8["Traces"]
    end
    Trace --> TDASH
    Prim --> MI["MCP Inspector"]
    Met --> RUN["Host ·<br/>Web Application"]
    Log --> LG["Logs"]
```

The four streams are independent and the dashboards mix them differently:

  • TraceRecord stream — every chat turn becomes one TraceRecord, but so does every Tool Studio test run and every Vector Database operation that fires through Spring AI. That single record carries gen_ai.* / spring.ai.tool / db.vector.client.operation child spans and surfaces across Overview, Tokens & Cost, AI Models, Tool Studio, MCP Servers, Vector Database, Agentic Chat, and Traces. A Tool Studio test that never touches chat still populates the Tool Studio dashboard plus Overview / Traces.
  • MCP primitive observations: when you browse or invoke through the MCP Inspector (list tools, read resources, get prompts, sampling, elicitation), separate observations fire and feed only the MCP Inspector dashboard. This stream is independent of the trace stream.
  • MeterRegistry + system metrics — JVM heap, GC, threads, CPU, HTTP, Tomcat sessions, logback level counts are always live (no user action needed) and feed Host and Web Application.
  • Application log stream — anything any code logs is tailed live and feeds Logs.

So clicking through MCP Inspector primitives, running a Tool Studio test, or uploading a document in Vector Database all generate data on their own dashboards even without sending a single chat message. Conversely, only the chat surface generates the conversation-level aggregates on Agentic Chat.
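The routing rules in the list above can be sketched as a function that inspects a record's child span names. The span-name families (`gen_ai.*`, `spring.ai.tool`, `db.vector.client.operation`) come from this page; the `TraceRouting` class, the `origin` field, and the exact span-to-dashboard mapping are assumptions for illustration. MCP Servers is left out because this page does not say which span drives it.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decide which trace-fed dashboards a record touches,
// based on its child span names. The mapping is an illustrative assumption.
final class TraceRouting {

    enum Dashboard { OVERVIEW, TRACES, TOKENS_AND_COST, AI_MODELS, TOOL_STUDIO, VECTOR_DATABASE, AGENTIC_CHAT }

    /** Minimal stand-in for a trace record: where it came from and its span names. */
    record TraceRecordSketch(String origin, List<String> spanNames) {}

    static Set<Dashboard> dashboardsFor(TraceRecordSketch trace) {
        // Every record, whatever produced it, appears on Overview and Traces.
        Set<Dashboard> out = EnumSet.of(Dashboard.OVERVIEW, Dashboard.TRACES);
        for (String span : trace.spanNames()) {
            if (span.startsWith("gen_ai.")) {
                out.add(Dashboard.TOKENS_AND_COST);
                out.add(Dashboard.AI_MODELS);
            } else if (span.equals("spring.ai.tool")) {
                out.add(Dashboard.TOOL_STUDIO);
            } else if (span.equals("db.vector.client.operation")) {
                out.add(Dashboard.VECTOR_DATABASE);
            }
        }
        // Conversation-level aggregates come only from the chat surface.
        if ("chat".equals(trace.origin())) {
            out.add(Dashboard.AGENTIC_CHAT);
        }
        return out;
    }
}
```

With this mapping, a Tool Studio test run carrying only a `spring.ai.tool` span lands on Overview, Traces and Tool Studio, in line with the behaviour described above, while only records whose origin is chat reach Agentic Chat.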

Reference pages

  • Overview: single-page summary of every other dashboard's headline number, with eight KPI cards, sixteen charts across five sections, and a recent activity grid.
  • AI Usage (Tokens & Cost · AI Models): what the agent spent in tokens and money, and which models and providers it routed through.
  • AI Stack (Tool Studio · MCP Servers · MCP Inspector · Vector Database · Agentic Chat): what the agent integrated with, split by integration kind.
  • Runtime (Host · Web Application · Logs · Traces): whether the JVM process itself is healthy, plus the raw trace stream behind every aggregate.

Cross-references