Safe Tool Specification

Version 1.0 · Status: stable for the 0.2.x line.

The Safe Tool Specification (this document) defines the on-disk JSON document format for a tool that Spring AI Playground's Safe Local Execution Layer will load, validate, sandbox, and publish to Model Context Protocol clients. It is the artifact a tool author writes (directly or through Tool Studio's form), the artifact the runtime reads to compute an enforced safety posture, and the artifact the audit log records on every invocation.

This document complements but does not replace:

1. Introduction

1.1 Scope

A Safe Tool Spec is a self-contained JSON document. It declares:

  • Identity the LLM sees (name, description, params)
  • Code the playground executes (code, codeType, staticVariables)
  • Safety posture the sandbox enforces (sandboxOverrides, toolSafety, draft)
  • Cataloging metadata (category, tags, toolId, timestamps)

The spec is not concerned with how a tool is invoked through MCP, only with how a tool is defined. Invocation semantics belong to the MCP tools/list and tools/call schemas.

1.2 Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119 and RFC 8174 when, and only when, they appear in all capitals.

Throughout this document:

  • Spec refers to a single conforming JSON document.
  • Resolver refers to the implementation that turns sandboxOverrides into toolSafety. The reference resolver is SandboxPostureCalculator in the Spring AI Playground codebase.
  • Runtime refers to the JavaScript executor that runs the tool's code after the posture is resolved.
  • Catalog refers to the bundled set of specs shipped with the playground (src/main/resources/tool/default-tool-specs-*.json).

1.3 Conformance

A document conforms to this specification if:

  1. It parses as JSON (RFC 8259).
  2. Every field present validates against § 16 JSON Schema.
  3. Every cross-field invariant defined in this document holds (notably the allow/deny disjointness in § 10.1 and the env-var grammar in § 7.2).

A resolver conforms if, given a conforming spec, it produces a toolSafety block that matches § 10.3 and a Risk Level that matches the algorithm in § 10.6.

A runtime conforms if it enforces the policy described by toolSafety — never more permissive, possibly less — and records what was actually enforced (see § 11 audit contract).

1.4 Relationship to existing tool specs

Several schemas exist today to declare a tool an LLM can call: MCP tools/list, OpenAI function calling, Anthropic tool use, Google function declarations, and framework-internal formats like LangChain's BaseTool or LlamaIndex's FunctionTool. They are all narrower than this specification — they declare what the model is allowed to ask for, but leave how the tool runs and what guarantees apply outside the document. The Safe Tool Spec is built to carry both halves in one artifact.

| Schema | name + JSON-schema args | Code body | Safety posture | Test value | Persisted on disk |
| --- | --- | --- | --- | --- | --- |
| MCP tools/list | | | | | - (runtime emission only) |
| OpenAI function calling | | | | | |
| Anthropic tool use | | | | | |
| Google function declarations | | | | | |
| LangChain BaseTool | | ✅ (Python class) | partial (rate-limit / auth args) | | - (the code is the spec) |
| LlamaIndex FunctionTool | | ✅ (Python callable) | | | - (the code is the spec) |
| Safe Tool Spec (this doc) | | ✅ (JS string) | ✅ (sandboxOverrides → toolSafety) | ✅ (testValue + Local Pass) | ✅ (JSON file) |

The pattern the other formats share: declare a function signature the model invokes, leave the implementation to host application code or framework conventions. The signature is the wire format the LLM consumes; the implementation lives outside the spec — in compiled code, in a framework's registry, or in a hand-written request handler.

The gap they leave open:

  • Where is the tool body? In MCP, OpenAI, Anthropic, and Google function specs, the body is application code, not part of the document. Two engineers receiving the same spec write two different implementations.
  • What enforcement runs around the body? None of the wire-format specs has a field that says "this tool needs filesystem read access" or "this tool's egress is restricted to api.example.com." Safety is something the host application implements separately — if it does at all.
  • How is the tool validated before publish? None of the others define a publish gate. The Safe Tool Spec's testValue + Local Pass turns the spec into its own validation artifact: a spec that does not pass its own declared test does not reach the wire.
  • What does the audit log record? Wire-format specs are silent on this. The Safe Tool Spec writes the resolved toolSafety block into the audit log on every invocation, so "what was actually enforced at this call" is a property of the spec, not of out-of-band instrumentation.

1.4.1 How Safe Tool Spec composes with the wire formats

The Safe Tool Spec is not a replacement for MCP or function-calling schemas. It is a superset that the playground's runtime projects down to those wire formats on the way out:

flowchart LR
    A["Safe Tool Spec<br/>(JSON on disk)"]
    R["SandboxPostureCalculator<br/>+ Local Pass gate"]
    M["MCP tools/list entry<br/>(name · description · JSON Schema)"]
    L["LLM tool call"]
    X["Runtime executes code<br/>under resolved toolSafety"]

    A -- "publish" --> R
    R -- "non-draft only" --> M
    M -- "wire" --> L
    L -- "tools/call" --> X
    X -- "audit toolSafety" --> A

What flows through each boundary:

  • Spec → MCP entry: the playground's MCP server emits each non-draft Safe Tool Spec as an MCP tools/list entry containing exactly the model-visible subset — name, description, and params lowered into JSON Schema. code, staticVariables, sandboxOverrides, toolSafety, testValue, and draft are stripped. The model never sees them.
  • MCP entry → LLM: identical to any other MCP-served tool. The LLM treats it as an opaque named function with typed arguments. A Safe Tool Spec is indistinguishable from any other tool at this layer.
  • tools/call → runtime: when the LLM invokes the tool, the playground executes code under the resolved toolSafety. From the LLM's perspective this is a normal MCP tools/call; from the runtime's perspective it is a sandboxed JS invocation with the audit trail described in § 11.4.
  • Audit ← runtime: every invocation records the resolved toolSafety block alongside the request. Operators reading the audit log can answer "what posture was active when this tool was called" from the spec itself, without re-running the resolver or correlating across logs.
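
The Spec → MCP entry projection above can be sketched as follows. This is a minimal illustration, not the playground's implementation: `toMcpToolEntry` and the sample spec are hypothetical, the JSON Schema is placed under `inputSchema` (the field MCP uses for tool arguments), and the sketch assumes only an explicit `draft: false` marks a spec as publishable.

```javascript
// Hypothetical projection of a Safe Tool Spec onto its model-visible subset.
// Spec type names are lowered to their JSON Schema spelling (see § 6.2).
const TYPE_MAP = {
  STRING: "string", INTEGER: "integer", NUMBER: "number",
  BOOLEAN: "boolean", OBJECT: "object", ARRAY: "array",
};

function toMcpToolEntry(spec) {
  // Assumption of this sketch: anything other than draft === false is a draft.
  if (spec.draft !== false) return null;
  const properties = {};
  const required = [];
  for (const p of spec.params ?? []) {
    properties[p.name] = { type: TYPE_MAP[p.type], description: p.description ?? "" };
    if (p.required) required.push(p.name);
  }
  // code, staticVariables, sandboxOverrides, toolSafety, testValue, and draft
  // are never copied into the entry.
  return {
    name: spec.name,
    description: spec.description ?? "",
    inputSchema: { type: "object", properties, required },
  };
}

const entry = toMcpToolEntry({
  name: "get-weather",
  description: "Returns current weather for a city.",
  draft: false,
  code: "return city;",
  codeType: "Javascript",
  staticVariables: [{ apiKey: "${WEATHER_API_KEY}" }],
  params: [
    { name: "city", description: "Name of the city", required: true, type: "STRING", testValue: "Seoul" },
  ],
});
console.log(JSON.stringify(entry, null, 2));
```

Because the entry is built from an explicit list of fields rather than by deleting fields from a copy, a new top-level field added to the spec cannot leak to the model by accident.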

The pattern is the same separation MCP itself draws: protocol vs. execution. MCP standardizes the wire; the Safe Tool Spec standardizes the on-disk artifact that produces the wire output, gates publication on Local Pass, and writes the enforcement record back into the audit log when the wire call returns.

1.4.2 What this specification is not

  • Not a protocol. Safe Tool Spec is a document format; it does not define a transport, a handshake, or a capability-negotiation pass. MCP fills that role.
  • Not a function-calling schema replacement. A Safe Tool Spec is projected to a JSON Schema when published; it does not compete with OpenAI / Anthropic / Google function declarations at the wire layer.
  • Not a framework binding. Unlike LangChain or LlamaIndex tools, a Safe Tool Spec is portable across hosts that implement this specification — the artifact is JSON, the runtime contract is what executes it. A spec written for Spring AI Playground can be loaded by any conformant runtime.

2. Document structure at a glance

A Safe Tool Spec is a JSON object that groups its fields into three conceptual blocks plus bookkeeping. The diagram below shows how the top-level fields cluster; § 3 catalogues them in a single table.

flowchart TB
    SPEC["Safe Tool Spec<br/>(JSON document)"]

    subgraph IDENTITY["① Identity — what the model sees"]
        direction LR
        I1["toolId"]
        I2["name"]
        I3["description"]
        I4["params[]"]
        I5["category · tags[]"]
    end

    subgraph CODE["② Code — what the runtime executes"]
        direction LR
        C1["code"]
        C2["codeType"]
        C3["staticVariables[]<br/>(${ENV_VAR} placeholders)"]
    end

    subgraph SAFETY["③ Safety — what the sandbox enforces"]
        direction LR
        S1["sandboxOverrides<br/>(author intent)"]
        S2["toolSafety<br/>(resolved posture)"]
        S3["draft"]
    end

    BK["createTimestamp · updateTimestamp"]

    SPEC --- IDENTITY
    SPEC --- CODE
    SPEC --- SAFETY
    SPEC --- BK

The three blocks correspond to three of the four product-positioning words from § 1.1: Identity is "for AI Agent Tools" (the model-visible surface), Code is "Execution Layer" (the JS the runtime actually runs), Safety is "Safe" (what the sandbox guarantees). § 4–9 cover Identity and Code, § 10 is the entire Safety block, § 11–12 cover lifecycle and bookkeeping.

The literal JSON shape:

{
  "toolId":            "<UUID v5 derived from name>",
  "name":              "<slug>",
  "description":       "<model-visible description>",
  "category":          "<category enum>",
  "tags":              ["<cohort label>", "..."],
  "params":            [ /* ToolParamSpec, see § 6 */ ],
  "staticVariables":   [ /* {key: value} entries, see § 7 */ ],
  "code":              "<JavaScript action body>",
  "codeType":          "Javascript",
  "sandboxOverrides":  { /* author intent, see § 10.1 */ },
  "toolSafety":        { /* resolved posture, see § 10.3 */ },
  "draft":             true,
  "createTimestamp":   <epoch ms>,
  "updateTimestamp":   <epoch ms>
}

All fields listed above except code, name, and codeType MAY be omitted; defaults are defined per § 3 below.

3. Top-level object

The spec is a JSON object. Each field is defined in its own section. Defaults in this table govern serialization; consumers reading a spec MUST apply the same defaults when a field is absent or null.

| Field | Type | Required | Default | Section |
| --- | --- | --- | --- | --- |
| toolId | string (UUID) | SHOULD | derived (§ 4.1) | § 4 |
| name | string | MUST | | § 4.2 |
| description | string | SHOULD | empty string | § 5 |
| category | string | SHOULD | null | § 9.1 |
| tags | array of string | MAY | [] | § 9.2 |
| params | array of ToolParamSpec | MAY | [] | § 6 |
| staticVariables | array of single-entry objects | MAY | [] | § 7 |
| code | string | MUST | | § 8 |
| codeType | enum string | MUST | none (only "Javascript" today) | § 8.1 |
| sandboxOverrides | object | MAY | empty overrides (baseline) | § 10.1 |
| toolSafety | object | SHOULD | empty {} | § 10.3 |
| draft | boolean | MAY | true (catalog), false after Local Pass | § 11 |
| createTimestamp | integer (epoch ms) | SHOULD | now | § 12.2 |
| updateTimestamp | integer (epoch ms) | SHOULD | now | § 12.2 |

Unknown top-level fields MUST be preserved on round-trip (load → save) and MUST NOT cause validation failure. This is the extension point for future minor versions; see § 14.
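
A minimal sketch of a loader step that applies these defaults and preserves unknown fields, assuming a hypothetical `applyDefaults` function. The `toolId` and `draft` defaults are left out because they depend on the derivation scheme (§ 4.1) and the lifecycle state (§ 11); the field `xVendorNote` is an invented example of an unknown field.

```javascript
// Hypothetical loader step: fill defaults for absent or null fields and carry
// unknown top-level fields over untouched, so load -> save preserves them.
function applyDefaults(raw, now = Date.now()) {
  for (const field of ["name", "code", "codeType"]) {
    if (typeof raw[field] !== "string") {
      throw new Error(`missing required field: ${field}`);
    }
  }
  if (raw.name.length === 0) throw new Error("name MUST be non-empty");

  const defaults = {
    description: "",
    category: null,
    tags: [],
    params: [],
    staticVariables: [],
    sandboxOverrides: {},
    toolSafety: {},
    createTimestamp: now,
    updateTimestamp: now,
  };
  const spec = { ...raw }; // unknown fields survive this copy
  for (const [key, value] of Object.entries(defaults)) {
    if (spec[key] === undefined || spec[key] === null) spec[key] = value;
  }
  return spec;
}

const loaded = applyDefaults(
  { name: "echo", code: "return text;", codeType: "Javascript", tags: null, xVendorNote: "kept" },
  1700000000000,
);
console.log(loaded.tags, loaded.xVendorNote, loaded.createTimestamp);
```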

4. Identity

4.1 toolId

A stable string identifier, normally a UUID v5 derived deterministically from name against a fixed namespace defined by the implementation.

  • toolId MUST be unique within a catalog.
  • toolId SHOULD remain stable across renames so that audit logs, presets, and catalog overrides can refer to a tool by identity rather than by display name.
  • Implementations MAY accept opaque non-UUID strings if internal identity is provided by another mechanism, but UUID v5 from name is the reference scheme.

4.2 name

The MCP tool name. This is what models see in tools/list and what they invoke in tools/call.

  • name MUST be a non-empty string.
  • name SHOULD be a slug: lowercase alphanumeric plus - or _, no whitespace, no path separators.
  • name MUST be unique within the set of published specs that share an MCP server. Drafts (§ 11) are exempt.

Implementations MAY enforce a stricter slug regex; consumers reading a foreign spec MUST NOT reject a non-empty string solely on slug grounds.
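
The difference between the MUST and the SHOULD can be sketched as a check that rejects only empty names and merely warns on non-slugs. `checkName` and the `SLUG` pattern are hypothetical; the pattern is one reading of "lowercase alphanumeric plus - or _".

```javascript
// Hypothetical name check. A non-slug name is flagged, never rejected, so a
// foreign spec with an unusual name still loads.
const SLUG = /^[a-z0-9_-]+$/;

function checkName(name) {
  if (typeof name !== "string" || name.length === 0) {
    return { valid: false, warnings: ["name MUST be a non-empty string"] };
  }
  const warnings = SLUG.test(name) ? [] : ["name SHOULD be a slug"];
  return { valid: true, warnings };
}

console.log(checkName("get-weather"));
console.log(checkName("Get Weather"));
console.log(checkName(""));
```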

5. Description

description is the model-visible prose attached to the tool. It is the primary signal a model uses for tool selection and SHOULD therefore describe (in order of decreasing importance):

  1. What the tool does in one clause.
  2. Which arguments are required and what they mean.
  3. The shape of the response.

Descriptions in the bundled catalog follow conventions worth borrowing:

  • Locale prefix: tools that target a specific non-English locale prefix their description with "<Locale> tool — <description of locale requirements>.". The placeholder takes any ISO-style locale name and the trailing clause describes what the locale binding implies (response language, parameter language, regional API surface, …). Examples: "Korea-locale (KR) tool — Korean responses; some parameters require Korean input.", "Japan-locale (JP) tool — responses in Japanese; queries SHOULD be Japanese for relevance.", "China-locale (CN) tool — Simplified Chinese responses; mainland-China API surface only.". The operational paragraph follows.
  • Return-shape literal: closing the description with a literal JSON-ish sketch of the return value (e.g. Returns an array of { market, tradePrice, openingPrice, … }) measurably improves model tool selection on small open-weight models.
  • Auth signal in prose: tools requiring env-backed credentials describe both injection paths inline ("set NAVER_CLIENT_ID + NAVER_CLIENT_SECRET on the tool's staticVariables, or inject as env var").

A description MUST NOT contain secrets, host names with embedded credentials, or environment-variable values; the audit log captures description verbatim.

6. Parameters

params is an ordered array of ToolParamSpec objects. Order is preserved by the catalog reader, by the persistence layer (see § 12), and on the wire when the MCP server emits the tool's JSON Schema.

6.1 ToolParamSpec shape

{
  "name":        "city",
  "description": "Name of the city",
  "required":    true,
  "type":        "STRING",
  "testValue":   "Seoul"
}

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | MUST | Slug; identifies the argument in the model's tools/call payload |
| description | string | SHOULD | Model-visible argument hint |
| required | boolean | MUST | If true, the runtime refuses to execute without this argument |
| type | STRING · INTEGER · NUMBER · BOOLEAN · OBJECT · ARRAY | MUST | Stored uppercase; see § 6.2 |
| testValue | string | MUST when required=true | Sample value the Local Pass executes the tool with |

6.2 Type enum and JSON Schema mapping

The type field is serialized in the spec document in uppercase ("STRING"). When the MCP server emits the tool's JSON Schema for a model, it lowers the value to its JSON Schema spelling ("string"). The asymmetry is intentional: the spec document is the authoring artifact, and uppercase names match the Java enum that backs them; the JSON Schema is the wire format the LLM consumes.

| Spec value | JSON Schema value |
| --- | --- |
| STRING | string |
| INTEGER | integer |
| NUMBER | number |
| BOOLEAN | boolean |
| OBJECT | object |
| ARRAY | array |

OBJECT and ARRAY MAY be used. Models sometimes serialize an object as a JSON-string into a STRING-typed param when the agent loop does not support nested schemas; tools accepting structured input SHOULD document both call patterns in description.

6.3 testValue contract

testValue is not metadata: it is the value the Local Pass actually runs the tool with. A spec whose testValues are placeholder garbage publishes a tool whose only validated execution path is garbage.

  • testValue MUST be a string. For non-string types the runtime parses the string into the declared type before invoking code.
  • testValue MUST be a representative sample that exercises the same code path the model will hit in production. Pick "Seoul", not "abc".
  • testValue MUST NOT contain a secret. If the tool needs a secret, declare it in staticVariables (§ 7) and let the Local Pass resolve it from the environment.
  • For tools bound to a non-English locale, testValue MAY use that locale's script (e.g. Korean '스프링 AI', Japanese '東京駅', Simplified Chinese '北京天安门', Arabic 'مرحبا') even though the rest of the spec is English. testValue is the only field where non-English content is normative; see § 9.3.
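
The string-to-type step in the first bullet can be sketched as follows. The exact parsing rules of the reference runtime are not given in this document, so `parseTestValue` and its accepted literals (for example, only `"true"` and `"false"` for BOOLEAN, JSON text for OBJECT and ARRAY) are assumptions of this sketch.

```javascript
// Hypothetical conversion of a testValue string into the declared param type.
function parseTestValue(type, testValue) {
  if (typeof testValue !== "string") {
    throw new TypeError("testValue MUST be a string");
  }
  switch (type) {
    case "STRING":
      return testValue;
    case "INTEGER":
      if (!/^-?\d+$/.test(testValue.trim())) throw new Error(`not an integer: ${testValue}`);
      return Number.parseInt(testValue, 10);
    case "NUMBER": {
      const n = Number(testValue);
      if (testValue.trim() === "" || Number.isNaN(n)) throw new Error(`not a number: ${testValue}`);
      return n;
    }
    case "BOOLEAN":
      if (testValue === "true") return true;
      if (testValue === "false") return false;
      throw new Error(`not a boolean: ${testValue}`);
    case "OBJECT":
    case "ARRAY": {
      const parsed = JSON.parse(testValue); // throws SyntaxError on malformed JSON
      const isArray = Array.isArray(parsed);
      const isObject = parsed !== null && typeof parsed === "object" && !isArray;
      if (type === "ARRAY" ? !isArray : !isObject) {
        throw new Error(`testValue is not a JSON ${type.toLowerCase()}`);
      }
      return parsed;
    }
    default:
      throw new Error(`unknown type: ${type}`);
  }
}

console.log(parseTestValue("STRING", "Seoul"));
console.log(parseTestValue("INTEGER", "42"));
console.log(parseTestValue("ARRAY", '["KRW-BTC", "KRW-ETH"]'));
```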

7. Static variables

staticVariables is the spec's mechanism for server-side configuration: values the tool reads at execution time but the model never sees. It is the right place for API keys, account IDs, base URLs, and any other input the author controls but the agent does not.

7.1 Shape and ordering

"staticVariables": [
  { "naverClientId":     "${NAVER_CLIENT_ID}" },
  { "naverClientSecret": "${NAVER_CLIENT_SECRET}" }
]

staticVariables is an ordered list of single-entry objects, not a JSON object. Order is preserved on disk, in memory, and when the runtime constructs the variable bag passed to code. The ordered-list shape exists to permit duplicate keys (rare but legal — later wins on read), to keep deterministic diffs when specs are edited, and to make ${ENV_VAR} audit trails reproducible.
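
A small sketch of how the ordered list might be flattened into the variable bag, with the "later wins" rule for duplicate keys. `buildVariableBag` is a hypothetical name, and rejecting multi-key entries is an assumption drawn from the "single-entry objects" wording.

```javascript
// Hypothetical helper: applies entries in list order, so a duplicate key
// resolves to the value of the later entry.
function buildVariableBag(staticVariables) {
  const bag = new Map();
  for (const entry of staticVariables ?? []) {
    const keys = Object.keys(entry);
    if (keys.length !== 1) {
      throw new Error(`expected a single-entry object, got ${keys.length} keys`);
    }
    bag.set(keys[0], entry[keys[0]]);
  }
  return bag;
}

const bag = buildVariableBag([
  { baseUrl: "https://api.example.com/v1" },
  { naverClientId: "${NAVER_CLIENT_ID}" },
  { baseUrl: "https://api.example.com/v2" }, // duplicate key: this one wins
]);
console.log([...bag.entries()]);
```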

7.2 ${ENV_VAR} placeholder grammar

A value MAY embed environment variable references using the placeholder grammar \$\{([A-Z_]+[A-Z0-9_]*)}:

  • Placeholder names match [A-Z_]+[A-Z0-9_]* — one-or-more uppercase letters or underscores, then any combination of uppercase / digits / underscore. The reference resolver MUST NOT resolve lowercase placeholders.
  • A value MAY mix literal text and placeholders: "https://${API_HOST}/v2" is legal. The reference grammar distinguishes anchored references (the whole value is a single placeholder, e.g. "${API_KEY}") from embedded references (placeholder appears inside literal text).
  • A spec MAY declare more than one placeholder per value; resolution applies to every match.

Resolution order (EnvVarResolver):

  1. System.getenv(name) — process environment.
  2. System.getProperty(name) — JVM system properties (fallback).
  3. Unresolved — the literal ${NAME} is left in place and the spec transitions to MISSING_REQUIREMENTS (§ 11.2).

The resolver MUST treat unset, empty, or whitespace-only values as missing. Implementations MAY layer additional resolution sources (a project-local secret store, a vault) ahead of the OS env, but the contract above is the floor: every conforming resolver MUST consult the OS env at minimum.
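
The grammar and resolution order above can be sketched as follows. This is an illustration, not EnvVarResolver itself: `resolvePlaceholders` is hypothetical, and the two lookup functions stand in for System.getenv and System.getProperty.

```javascript
// Placeholder grammar from this section: ${NAME}, NAME = [A-Z_]+[A-Z0-9_]*.
const PLACEHOLDER = /\$\{([A-Z_]+[A-Z0-9_]*)\}/g;

// `sources` is an ordered list of lookup functions, consulted first to last.
function resolvePlaceholders(value, sources) {
  const missing = [];
  const resolved = value.replace(PLACEHOLDER, (match, name) => {
    for (const lookup of sources) {
      const found = lookup(name);
      // Unset, empty, and whitespace-only values all count as missing.
      if (typeof found === "string" && found.trim() !== "") return found;
    }
    missing.push(name);
    return match; // leave the literal ${NAME} in place
  });
  return { resolved, missing };
}

const env = { API_HOST: "api.example.com", EMPTY_ONE: "   " }; // stands in for the process environment
const props = { API_TOKEN: "tok-12345" };                      // stands in for JVM system properties
const sources = [(n) => env[n], (n) => props[n]];

const first = resolvePlaceholders("https://${API_HOST}/v2?t=${API_TOKEN}", sources);
const second = resolvePlaceholders("${EMPTY_ONE}/${lowercase}", sources);
console.log(first, second);
```

A non-empty `missing` list is the signal a caller would use to move the spec to MISSING_REQUIREMENTS. The lowercase placeholder in the second call is not reported as missing because it never matches the grammar; it is treated as literal text.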

7.3 Secret storage

The Safe Tool Spec defines a resolution contract (§ 7.2), not a storage contract. The on-disk storage of resolved static-variable values is constrained to one rule:

| Secret surface | Storage model | Encryption at rest | Decryption scope |
| --- | --- | --- | --- |
| Static ${ENV_VAR} secrets (this section) | OS environment / JVM properties | None (the playground does not persist them) | n/a: value is only in memory while the process holds it |

Static-variable secrets are deliberately not persisted by the playground. The resolution model places trust at the host boundary: if the OS env (or JVM properties) holds the value, the playground reads it for the lifetime of one tool invocation, masks it on output (§ 7.4), and forgets it when the process exits. A spec's staticVariables block records only the placeholder, never the resolved value.

Implementations of this specification SHOULD adopt the same posture: do not persist static-variable secrets at all, and if persisting other credentials (OAuth tokens, MCP-connection bearer tokens, …) on a separate surface, encrypt them with a host-bound or user-bound key so that disk-copy alone is not sufficient to recover plaintext. The reference runtime's OAuth-token storage is documented at safety-architecture → Encrypted OAuth token storage — it is a separate surface and outside this specification.

7.4 Secret masking pipeline

Once resolved, a static-variable value is treated as a secret for the rest of its lifetime in the process. Masking is value-based, not placeholder-based — the runtime tracks the resolved string and substring-replaces every occurrence of it with *** on the way to any text egress.

The contract has two operations:

| Operation | Behavior |
| --- | --- |
| Collect | Walk every ${NAME} reference in the template, resolve each via the env-var resolver (§ 7.2), and collect values of length ≥ 4 into a Set<String> of secrets. Values shorter than 4 characters MUST be excluded from the set to avoid masking incidental words. |
| Mask | Substring-replace each member of the secret set with *** on the egress text. The replacement MUST be plain string substitution: no regex, no partial-prefix matching, no structural awareness of the surrounding text. |

Properties of this pipeline that implementations MUST preserve:

  • Egress-only: masking is applied at every text-egress point, not at resolution time. The resolved value is what gets passed into code, and code is allowed to use it for outbound network calls / FS writes — the spec does not censor the value while it is still inside the sandbox.
  • Value identity, not placeholder identity: a secret that is set via ${API_TOKEN} and one that is set via ${OTHER_NAME} to the same string are both masked once that string appears in any output. The resolver tracks resolved values, not placeholder names.
  • Minimum length guard: values shorter than 4 characters MUST NOT be added to the mask set. A spec author SHOULD NOT assume a 3-character secret will be masked.
  • No structural understanding of the output: masking is substring replacement on the final text. JSON, YAML, log lines, error messages, MCP tools/call results — all are masked the same way.
  • Per-call collection: the secret set MUST be rebuilt per call from the spec's staticVariables (and equivalents for MCP-connection params). A change to the env between calls is picked up on the next invocation without restart.
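
The Collect and Mask operations can be sketched as two small functions. The names `collectSecrets` and `mask` are hypothetical, and `lookup` stands in for the env-var resolver of § 7.2. Masking longer secrets first is a choice of this sketch (so that one secret containing another does not leave a readable suffix), not something this section requires.

```javascript
const PLACEHOLDER = /\$\{([A-Z_]+[A-Z0-9_]*)\}/g;
const MIN_SECRET_LENGTH = 4;

// Collect: resolve every ${NAME} reference and keep values of length >= 4.
function collectSecrets(staticVariables, lookup) {
  const secrets = new Set();
  for (const entry of staticVariables) {
    for (const template of Object.values(entry)) {
      for (const [, name] of String(template).matchAll(PLACEHOLDER)) {
        const value = lookup(name);
        if (typeof value === "string" && value.trim() !== "" && value.length >= MIN_SECRET_LENGTH) {
          secrets.add(value);
        }
      }
    }
  }
  return secrets;
}

// Mask: plain substring replacement on the final text, no regex on the secret.
function mask(text, secrets) {
  const ordered = [...secrets].sort((a, b) => b.length - a.length);
  let out = text;
  for (const secret of ordered) out = out.split(secret).join("***");
  return out;
}

const env = { API_TOKEN: "tok-12345", PIN: "123" };
const secrets = collectSecrets(
  [{ token: "${API_TOKEN}" }, { pin: "${PIN}" }, { region: "eu-west" }],
  (name) => env[name],
);
console.log(mask('{"auth":"tok-12345","pin":"123","region":"eu-west"}', secrets));
// prints {"auth":"***","pin":"123","region":"eu-west"}
```

The 3-character PIN and the literal `eu-west` pass through unmasked, which is the behavior the minimum-length guard and the placeholder-only collection imply.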

Egress points a conformant implementation MUST cover:

  • Every published MCP tool-call log line
  • Every MCP client connection / error / event log line
  • Every UI surface that renders an MCP connection's JSON
  • Every audit log entry (§ 11.4)
  • console.log output from inside the tool's JavaScript code

A conformant runtime that adds new text-egress channels (Slack notifier, error reporter, telemetry sink) MUST extend the masking call to those channels as well. For the reference runtime's wiring of these egress points (class names, call sites, mermaid), see safety-architecture → Secret masking.

7.5 Both injection paths are first-class

A spec may declare a staticVariables entry with a literal value ("clientId": "12345-abc") for a tool that does not need a secret, or with a ${ENV_VAR} placeholder for a tool that does. Catalog conventions strongly prefer placeholders for any value that looks like a secret — both because of the storage posture above and because masking only applies to values that came through a placeholder. A hard-coded secret literal is not automatically masked, since the masking pipeline has no way to distinguish "secret hard-coded in spec" from "URL fragment hard-coded in spec." Consumers MUST NOT assume the placeholder vs literal distinction beyond what the value itself declares.

8. Code

code is the JavaScript action body. The runtime evaluates it in a sandboxed GraalVM Polyglot Context with all variables from params, staticVariables, and the host-injected safety.* helpers in scope.

8.1 codeType

codeType is an enum with a single accepted value today:

| Value | Meaning |
| --- | --- |
| Javascript | The body in code is JavaScript executed by GraalJS, with ECMAScript 2024 syntax support. |

codeType is enumerated rather than free-form to leave the door open for future runtimes (Python, Wasm) without ambiguous content sniffing.

8.2 Runtime contract

  • The runtime MUST execute code in a sandboxed context that enforces the resolved toolSafety posture from § 10.3.
  • Within code, params are bound to their declared names as top-level identifiers.
  • staticVariables entries are bound to their declared keys as top-level identifiers (with ${ENV_VAR} placeholders pre-resolved).
  • The host injects (subject to toolSafety.runtime.helpers): console, fetch, URL, URLSearchParams, atob, btoa, crypto, and safety.* helpers.
  • The runtime MUST enforce a wall-clock timeout and a statement-count limit. Defaults: 30 s timeout, 500 000 statements. Implementations MAY tune these.
  • Resource breaches (timeout, statement-limit, helper exception) MUST surface as deterministic errors to the audit log.

8.3 The safety.* helper surface

When the resolved posture grants the corresponding capability, the runtime exposes the following helpers. The version tag in toolSafety.runtime.helpers[] (§ 10.3) records which helpers the spec was authored against; any new major version (e.g. safety.fs/v2) is a breaking change at the helper level and MUST trigger a spec version bump.

| Helper | Required posture | Purpose |
| --- | --- | --- |
| safety.fs/v1 (read group) | capabilities.fileRead = true | readText, list, exists, stat, grep, lineCount, slice, cut, sort, find; all rooted at fsBasePath with path-escape protection |
| safety.fs/v1 (write) | capabilities.fileWrite = true | writeText only |
| safety.parser/v1 (or tool-safety-helpers/v1#parser) | always available | Jsoup HTML, SnakeYAML load, RFC 4180 CSV, DTD/XXE-hardened XML; see § 8.4 for the per-helper contract and known security caveats |
| safety.http/v1 | capabilities.network.mode != "blocked" | Outbound HTTP via fetch with the SSRF four-layer guard active in strict and allowlist modes |
| tool-safety-helpers/v1#crypto | always available | The crypto.subtle API and related primitives |
| tool-safety-helpers/v1#encoding | always available | atob / btoa plus TextEncoder / TextDecoder |

Two helper-string conventions are in active use. Both are normative and may be mixed within a single spec:

  • Namespaced: <namespace>/v<n>, e.g. safety.http/v1, safety.fs/v1. Used for helpers that gate on a runtime capability (network, FS).
  • Anchor-suffixed: <bundle>/v<n>#<group>, e.g. tool-safety-helpers/v1#crypto. Used for grouped utility helpers that share a single bundle version but expose distinct call surfaces.

Tools authored against v1 MUST list every helper group they use in toolSafety.runtime.helpers; a runtime MAY refuse to publish a spec that references a helper version it cannot provide.
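
Both helper-string conventions fit one pattern, which a publish-time check could use to find helpers a runtime cannot provide. This is a sketch under assumptions: `parseHelper` and `unsupportedHelpers` are hypothetical, and the allowed characters in the pattern are inferred from the examples in this section rather than stated by it.

```javascript
// Namespaced:      <namespace>/v<n>         e.g. safety.http/v1
// Anchor-suffixed: <bundle>/v<n>#<group>    e.g. tool-safety-helpers/v1#crypto
const HELPER = /^([A-Za-z0-9._-]+)\/v(\d+)(?:#([A-Za-z0-9_-]+))?$/;

function parseHelper(id) {
  const m = HELPER.exec(id);
  if (!m) throw new Error(`unrecognized helper string: ${id}`);
  return { base: m[1], major: Number(m[2]), group: m[3] ?? null };
}

// Returns the declared helper strings that the runtime does not provide.
function unsupportedHelpers(declared, provided) {
  const available = new Set(provided);
  return declared.filter((id) => {
    parseHelper(id); // reject malformed strings early
    return !available.has(id);
  });
}

const runtimeHelpers = ["safety.fs/v1", "safety.http/v1", "tool-safety-helpers/v1#crypto"];
console.log(parseHelper("tool-safety-helpers/v1#crypto"));
console.log(unsupportedHelpers(["safety.http/v1", "safety.fs/v2"], runtimeHelpers));
```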

8.4 Parser helpers

The four parser entry points live under safety.parser.* and are exposed whenever the runtime declares safety.parser/v1 (or tool-safety-helpers/v1#parser) in its helper set:

| Call | Behavior |
| --- | --- |
| safety.parser.html(input) | Jsoup parse with default settings. ⚠ Returns the host org.jsoup.nodes.Document directly (not wrapped in a plain proxy tree like XML / CSV / YAML); JS code can call jsoup methods on the returned object. Implementations MAY wrap the return to match the proxy-tree convention. See safety-architecture → safety.parser.html returns host Document. |
| safety.parser.yaml(input) | SnakeYAML load. ⚠ Reference runtime uses default Constructor (not SafeConstructor), so !!class.name tags trigger class instantiation; implementations SHOULD use SafeConstructor, and consumers MUST treat untrusted YAML input as security-relevant. See safety-architecture → safety.parser.yaml constructor choice. |
| safety.parser.csv(input, opts?) | RFC 4180 CSV with optional {header, delimiter} |
| safety.parser.xml(input) | DTD/XXE-hardened DocumentBuilder |

9. Categorization

9.1 category

category is a single-string label used for UI grouping in the catalog browser. It is not enforced as an enum at the document level — consumers MUST accept arbitrary string values — but the bundled catalog defines and uses the following 13 values:

WEB · FILE · CRYPTO · DATETIME · TEXT · ENCODING · DATA · SECURITY · MATH · NETWORK · SYSTEM · UTILS · OTHER

Catalog-conformant authors SHOULD pick from the list above. Authors publishing private specs MAY introduce new categories; consumers presenting an unknown category MUST render it as a string verbatim.

9.2 tags

tags are cohort labels distinct from category. Where category answers "what does the tool do?", tags answer "what cohort does it belong to?"

  • tags MUST contain at most 2 values per spec. Catalog tooling rejects specs that exceed this on import.
  • tags are drawn from a controlled vocabulary in the bundled catalog: korea · example · util · pipeline · github · search · finance · weather · geo. Future minor versions of this spec MAY enlarge the vocabulary; vocabulary additions are non-breaking.
  • tags MUST NOT carry capability or auth signals. Capability lives in sandboxOverrides; secret-backing lives in staticVariables. Encoding the same fact in two places is a maintenance hazard.
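
An import-time check for the first two rules might look like the sketch below. `checkTags` is a hypothetical name; the vocabulary is copied from the list above. Whether out-of-vocabulary tags are rejected or only reported is not stated here, so the sketch simply returns a list of problems.

```javascript
// Controlled vocabulary used by the bundled catalog.
const TAG_VOCABULARY = new Set([
  "korea", "example", "util", "pipeline", "github",
  "search", "finance", "weather", "geo",
]);
const MAX_TAGS = 2;

// Hypothetical check; an empty result means no problems were found.
function checkTags(tags) {
  const problems = [];
  if (tags.length > MAX_TAGS) {
    problems.push(`too many tags: ${tags.length} > ${MAX_TAGS}`);
  }
  for (const tag of tags) {
    if (!TAG_VOCABULARY.has(tag)) problems.push(`tag outside catalog vocabulary: ${tag}`);
  }
  return problems;
}

console.log(checkTags(["korea", "finance"]));
console.log(checkTags(["korea", "finance", "search"]));
console.log(checkTags(["needs-api-key"]));
```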

9.3 Locale rule

Specs published in a multilingual catalog MUST follow these locale rules. The rules apply uniformly to every non-English locale (Korean, Japanese, Chinese, Arabic, Hebrew, Thai, …) so that machine-readable fields stay English while human-targeted examples can carry locale-bound content:

  • name, slug-like identifiers, JSON keys, and JSON values that look like identifiers MUST be ASCII English.
  • description is English prose, possibly with quoted non-English fragments inside it. Quote the fragment in the locale the upstream API or end-user actually uses — e.g. "Korean queries typical (e.g. '스프링 AI'); other languages also accepted." or "Japanese station names typical (e.g. '東京駅').". The base prose is English; locale-bound examples are quoted.
  • params[].testValue MAY be in any locale required by the upstream API. This is the only field where non-English content is normative.
  • JavaScript code in code MUST follow English-only naming; // comments MAY be in any locale.

10. Safety

The two safety-related blocks are the core of this specification. They look similar but serve opposite directions:

| Block | Direction | Editable by | Stored verbatim |
| --- | --- | --- | --- |
| sandboxOverrides | Author intent (declarative) | Tool Studio's Sandbox & Capabilities pane | Yes |
| toolSafety | Runtime enforcement (resolved) | Computed by the resolver | Yes (informational) |

Implementations MUST treat sandboxOverrides as the author's declared widening of the baseline; the resolver MUST compute toolSafety from sandboxOverrides + the configured baseline policy.

10.1 sandboxOverrides shape

"sandboxOverrides": {
  "addAllowClasses":    [],
  "removeAllowClasses": [],
  "addDenyClasses":     [],
  "removeDenyClasses":  [],
  "networkMode":        "allowlist",
  "hostsAllow":         ["api.upbit.com"],
  "fileRead":           null,
  "fileWrite":          null,
  "fsBasePath":         null
}

| Field | Type | Tristate? | Meaning of absent / null |
| --- | --- | --- | --- |
| addAllowClasses | array of Java class names | no | empty array (baseline allowlist unchanged) |
| removeAllowClasses | array of Java class names | no | empty array (baseline allowlist unchanged) |
| addDenyClasses | array of Java class names | no | empty array (baseline denylist unchanged) |
| removeDenyClasses | array of Java class names | no | empty array (baseline denylist unchanged) |
| networkMode | enum (§ 10.4) | yes | inherit baseline (default = blocked) |
| hostsAllow | array of hostnames | no | empty (no hosts); ["*"] is the wildcard sentinel |
| fileRead | boolean OR null | yes | inherit baseline (default = false) |
| fileWrite | boolean OR null | yes | inherit baseline (default = false) |
| fsBasePath | string OR null | yes | inherit baseline path |

Notes:

  • For networkMode, fileRead, fileWrite, fsBasePath the distinction between null (inherit) and an explicit value (override) is significant. Setting fileRead: false explicitly is different from omitting the field — explicit false MUST clear any baseline that would have granted read access.
  • addAllowClasses ∩ addDenyClasses MUST be empty after merge with the baseline; that is, the effective allowlist and the effective denylist MUST be disjoint. A resolver detecting overlap MUST raise a deterministic resolver error rather than silently picking one.
  • An empty sandboxOverrides block (all fields null/empty) is equivalent to no block at all; consumers MUST treat them interchangeably.

10.2 Resolution algorithm

The reference resolver (SandboxPostureCalculator) computes the enforced posture from sandboxOverrides plus the configured baseline. The two inputs flow through merge and tristate-coalesce steps and emerge as the toolSafety block:

flowchart LR
    OV["sandboxOverrides<br/>(author intent)"]
    BL["baseline policy<br/>(application.yaml)"]
    CALC["Resolver<br/>(compute toolSafety)"]

    subgraph STEPS["Resolution"]
        direction TB
        M1["1 · merge allow/deny<br/>(baseline ∪ add) − remove"]
        M2["2 · disjointness check<br/>allow ∩ deny = ∅"]
        M3["3 · tristate coalesce<br/>networkMode · fileRead · fileWrite · fsBasePath"]
        M4["4 · resolve hosts<br/>(when networkMode = allowlist)"]
    end

    TS["toolSafety block<br/>(audit-logged on every call)"]

    OV --> CALC
    BL --> CALC
    CALC --> STEPS
    STEPS --> TS

Pseudocode:

input:  overrides : SandboxOverrides
        baseline  : { allowClasses, denyClasses, fsBasePath, networkMode, allowedHosts, fileRead, fileWrite }

step 1  effectiveAllow = (baseline.allow ∪ overrides.addAllow) − overrides.removeAllow
step 2  effectiveDeny  = (baseline.deny  ∪ overrides.addDeny ) − overrides.removeDeny
step 3  if effectiveAllow ∩ effectiveDeny ≠ ∅ → reject (resolver error)
step 4  effectiveNetwork = overrides.networkMode ?? baseline.networkMode      (tristate)
step 5  effectiveHosts   = overrides.hostsAllow ∪ baseline.allowedHosts        when network=allowlist; else []
step 6  effectiveFileR   = overrides.fileRead   ?? baseline.fileRead           (tristate)
step 7  effectiveFileW   = overrides.fileWrite  ?? baseline.fileWrite          (tristate)
step 8  effectiveBase    = overrides.fsBasePath ?? baseline.fsBasePath
step 9  populate toolSafety = {
            version: "1.0",
            runtime: { id, minVersion, ecmaVersion, javaInterop, helpers, console },
            category: { source, id },
            capabilities: {
              network: { mode: effectiveNetwork, hosts: effectiveHosts },
              fileRead: effectiveFileR,
              fileWrite: effectiveFileW
            }
        }

The algorithm is monotonic with respect to risk: nothing in sandboxOverrides can make the baseline less permissive than its already-allowed reach (that would be a no-op or a reduction). Removals from the baseline denylist are escalations; removals from the baseline allowlist are restrictions. See § 10.6 for how this drives Risk Level.
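The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the reference SandboxPostureCalculator: the `Baseline` dataclass, the `ResolverError` name, and the shape of the returned dictionary are assumptions made for the example, and the `runtime` and `category` blocks of toolSafety are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


class ResolverError(ValueError):
    """Hypothetical error type for the step 3 rejection."""


@dataclass
class Baseline:
    # Hypothetical container for the configured baseline policy.
    allow_classes: set = field(default_factory=set)
    deny_classes: set = field(default_factory=set)
    network_mode: str = "blocked"
    allowed_hosts: list = field(default_factory=list)
    file_read: bool = False
    file_write: bool = False
    fs_base_path: Optional[str] = None


def coalesce(override, inherited):
    """Tristate coalesce: only None inherits; an explicit False still overrides."""
    return inherited if override is None else override


def resolve(overrides: dict, baseline: Baseline) -> dict:
    get = overrides.get
    # Steps 1-2: (baseline | add) - remove, for the allowlist and the denylist.
    allow = (baseline.allow_classes | set(get("addAllowClasses") or [])) \
        - set(get("removeAllowClasses") or [])
    deny = (baseline.deny_classes | set(get("addDenyClasses") or [])) \
        - set(get("removeDenyClasses") or [])
    # Step 3: reject any overlap instead of silently picking a side.
    overlap = allow & deny
    if overlap:
        raise ResolverError(f"allow/deny overlap: {sorted(overlap)}")
    # Step 4: network mode is tristate.
    mode = coalesce(get("networkMode"), baseline.network_mode)
    # Step 5: hosts only matter in allowlist mode.
    hosts = []
    if mode == "allowlist":
        hosts = sorted(set(get("hostsAllow") or []) | set(baseline.allowed_hosts))
    # Steps 6-8: remaining tristate fields.
    return {
        "allowClasses": sorted(allow),
        "denyClasses": sorted(deny),
        "fsBasePath": coalesce(get("fsBasePath"), baseline.fs_base_path),
        "capabilities": {
            "network": {"mode": mode, "hosts": hosts},
            "fileRead": coalesce(get("fileRead"), baseline.file_read),
            "fileWrite": coalesce(get("fileWrite"), baseline.file_write),
        },
    }
```

Under these assumptions, `resolve({"fileRead": False}, Baseline(file_read=True))` yields `fileRead: False`, while `resolve({}, Baseline(file_read=True))` inherits `True`, which is the null-versus-explicit distinction described in § 10.1.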

10.3 toolSafety shape

"toolSafety": {
  "version": "1.0",
  "runtime": {
    "id":            "spring-ai-playground/polyglot-js",
    "minVersion":    "0.2.0",
    "ecmaVersion":   "2024",
    "javaInterop":   false,
    "helpers":       ["safety.http/v1"],
    "console":       true
  },
  "category": {
    "source": "builtin",
    "id":     "WEB"
  },
  "capabilities": {
    "network": { "mode": "allowlist", "hosts": ["api.upbit.com"] },
    "fileRead":  false,
    "fileWrite": false
  }
}

| Path | Type | Notes |
| --- | --- | --- |
| version | string | Spec-schema version this block was written against. Today: "1.0". |
| runtime.id | string | Stable runtime identifier. Today: "spring-ai-playground/polyglot-js". |
| runtime.minVersion | string (semver) | Minimum Spring AI Playground version that can execute the tool |
| runtime.ecmaVersion | string | "2024" for v1 |
| runtime.javaInterop | boolean | Whether the tool reaches into host JVM classes |
| runtime.helpers | array of "<namespace>/v<n>" strings | Versioned helper surface the spec relies on |
| runtime.console | boolean | Whether console.log is bound (output still passes env-var masking) |
| category.source | string | "builtin" for catalog specs, "user" for Tool Studio specs, or a custom origin |
| category.id | string | Resolved category (see § 9.1) |
| capabilities.network.mode | enum (§ 10.4) | Resolved network mode |
| capabilities.network.hosts | array of hostnames | Resolved egress allow list |
| capabilities.fileRead | boolean | Resolved read capability |
| capabilities.fileWrite | boolean | Resolved write capability |

toolSafety is the auditable record of what the runtime is committed to enforce. The audit log records this block per invocation; downstream consumers SHOULD treat it as authoritative for "what posture was active at this call."

Implementation note. In the reference Spring AI Playground runtime (v0.2.x), toolSafety is written by Tool Studio at publish time and is not re-derived on every load: the persisted block is the writer's last snapshot. Downstream consumers that need the current policy MUST re-run the resolver against sandboxOverrides rather than trusting toolSafety for enforcement decisions on a foreign spec.

10.4 Network mode behavioral table

capabilities.network.mode takes one of four values. Each defines a distinct fetch behavior; the SSRF four-layer guard (DNS pinning, IP-range filter, redirect-chain pinning, response-body size cap) is active in strict and allowlist, and bypassed in open:

| Mode | fetch exposed? | Host gate | SSRF guard | When to use |
| --- | --- | --- | --- | --- |
| blocked | no | n/a | n/a | Tool makes no network calls. This is the safe default; the fetch global is not installed at all. |
| allowlist | yes | only hosts in capabilities.network.hosts | active | Tool talks to one or more known vendor APIs. Recommended for catalog publication. |
| strict | yes | any public host | active | Tool talks to arbitrary public hosts but the playground enforces SSRF guards on every request. |
| open | yes | any host including private networks | bypassed | Strongly discouraged; should never appear in a published catalog spec. Use only when authoring private tools on a trusted host. |

The default at the baseline level is blocked. Authors who do not declare networkMode in sandboxOverrides publish a tool that cannot reach the network.
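As a rough illustration of the host-gate column, the following Python sketch decides whether a request to a given host would pass the gate. It is an assumption-laden simplification: `host_gate` and `is_private_literal` are hypothetical names, only literal IP addresses are range-checked, and the DNS pinning, redirect pinning, and body-size layers of the SSRF guard are not modelled.

```python
import ipaddress


def is_private_literal(host: str) -> bool:
    """True when host is a literal IP address in a non-public range."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; DNS-based checks are out of scope here
    return ip.is_private or ip.is_loopback or ip.is_link_local


def host_gate(mode: str, hosts: list, host: str) -> bool:
    """Return True when the mode's host gate would let the request through."""
    host = host.lower()
    if mode == "blocked":
        return False  # fetch is not installed at all
    if mode == "open":
        return True   # guard bypassed, private networks included
    # strict and allowlist keep the IP-range filter active (assumed here to
    # apply even when a private address is listed in hosts).
    if is_private_literal(host):
        return False
    if mode == "strict":
        return True
    if mode == "allowlist":
        return "*" in hosts or host in {h.lower() for h in hosts}
    raise ValueError(f"unknown network mode: {mode}")
```

For example, `host_gate("allowlist", ["api.upbit.com"], "example.com")` is refused, while the same host passes in strict mode.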

10.5 File access behavioral table

| fileRead | fileWrite | safety.fs/v1 exposed? | Notes |
| --- | --- | --- | --- |
| false (or null inheriting false) | false | not exposed | The helper is not even installed in the runtime's safety object. |
| true | false | exposed, read-only group | writeText throws; all other fs.* work, scoped under fsBasePath. |
| false | true | exposed, write-only | Only writeText works; all other fs.* throw. |
| true | true | exposed, full | All fs.* work. |

fsBasePath is the root the helper enforces. Any path argument the tool passes to fs.* is resolved relative to fsBasePath and then re-normalized; arguments that escape (.. traversal) MUST be refused with a SECURITY JsHelperException.
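The containment rule can be sketched as below, assuming POSIX-style paths. `resolve_under_base` and `SecurityError` are hypothetical names standing in for the helper's internal check and the SECURITY-classified JsHelperException; resolving symlinks with `realpath` is an extra precaution of this sketch, not something the text above requires.

```python
import os


class SecurityError(Exception):
    """Stand-in for the SECURITY-classified JsHelperException."""


def resolve_under_base(fs_base_path: str, user_path: str) -> str:
    """Resolve user_path relative to fs_base_path and refuse anything outside it."""
    base = os.path.realpath(fs_base_path)
    # join() lets an absolute user_path replace the base entirely, so the
    # containment check below also catches arguments such as "/etc/passwd".
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise SecurityError(f"path escapes fsBasePath: {user_path!r}")
    return candidate
```

With this sketch, `docs/../README.md` normalizes to a path inside the base and is accepted, whereas `../secret.txt` is refused.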

10.6 Risk Level

toolSafety is human-readable; Risk Level is the UI-friendly distillation. Levels run from L0 (no detected risk) to L5 (escape-class allowed). The reference resolver computes Risk Level as a monotonic max-merge:

risk := L0
if capabilities.network.mode == "allowlist":
    risk := max(risk, hosts contains "*" ? L4 : L3)
elif capabilities.network.mode == "strict":  risk := max(risk, L3)
elif capabilities.network.mode == "open":    risk := max(risk, L4)
if fileWrite:                                risk := max(risk, L4)
elif fileRead:                               risk := max(risk, L3)

for cls in (baseline.deny ∩ sandboxOverrides.removeDenyClasses):
    if cls matches System|Runtime|Process|ProcessBuilder:  risk := max(risk, L5)
if |removed-from-baseline-deny| ≥ 3:                       risk := max(risk, L4)
elif |removed-from-baseline-deny| ≥ 1:                     risk := max(risk, L3)

for cls in (sandboxOverrides.addAllowClasses − baseline.allow):
    if cls is critical (System / Runtime / Process):       risk := max(risk, L5)
    elif cls is FileWrite-related:                         risk := max(risk, L5)
    elif cls is reflection / network / FileRead-related:   risk := max(risk, L4)
    else:                                                  risk := max(risk, L3)

The Risk Level is computed for UI badging and audit-log decoration. Implementations MUST NOT store the computed level in toolSafety itself — Risk Level is a view on the posture, not a property of it. If the algorithm changes, recomputing yields a different answer from the same toolSafety; this is intentional.
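A partial Python sketch of the max-merge follows. It covers the network, file, and denylist-removal branches only; the addAllowClasses branch is omitted because the class groupings it relies on (FileWrite-related, reflection, and so on) are not enumerated in this section. `risk_level` is a hypothetical function name, `removed_from_deny` is assumed to hold the classes an override removed from the baseline denylist, and matching critical classes by simple class name is an assumption.

```python
# Simple class names treated as critical when removed from the denylist.
CRITICAL = ("System", "Runtime", "Process", "ProcessBuilder")


def simple_name(cls: str) -> str:
    """Return the last segment of a fully qualified class name."""
    return cls.rsplit(".", 1)[-1]


def risk_level(capabilities: dict, removed_from_deny: set) -> int:
    """Return the level as an integer 0..5 (L0..L5)."""
    risk = 0
    net = capabilities["network"]
    if net["mode"] == "allowlist":
        risk = max(risk, 4 if "*" in net["hosts"] else 3)
    elif net["mode"] == "strict":
        risk = max(risk, 3)
    elif net["mode"] == "open":
        risk = max(risk, 4)

    if capabilities["fileWrite"]:
        risk = max(risk, 4)
    elif capabilities["fileRead"]:
        risk = max(risk, 3)

    if any(simple_name(c) in CRITICAL for c in removed_from_deny):
        risk = max(risk, 5)
    if len(removed_from_deny) >= 3:
        risk = max(risk, 4)
    elif len(removed_from_deny) >= 1:
        risk = max(risk, 3)
    return risk
```

Because every branch only ever raises the level through `max`, the order in which the branches run does not affect the result.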

11. Lifecycle

11.1 States

A spec is in exactly one of the following states at any time:

stateDiagram-v2
    [*] --> DRAFT : new spec / import from catalog

    DRAFT --> ACTIVE : Local Pass earned<br/>+ env vars resolved
    DRAFT --> MISSING_REQUIREMENTS : draft cleared<br/>but env vars missing
    DRAFT --> TEST_FAILED : Local Pass attempted<br/>and failed *(reserved)*

    MISSING_REQUIREMENTS --> ACTIVE : env vars set
    MISSING_REQUIREMENTS --> DRAFT : draft flag re-raised

    ACTIVE --> DRAFT : draft flag re-raised<br/>(e.g. spec edit)
    ACTIVE --> MISSING_REQUIREMENTS : env var unset at runtime

    TEST_FAILED --> DRAFT : edit + retry

    DRAFT : not exposed via MCP
    MISSING_REQUIREMENTS : not exposed via MCP
    TEST_FAILED : not exposed via MCP
    ACTIVE : exposed via built-in MCP server

| State | Condition | MCP exposure |
| --- | --- | --- |
| DRAFT | draft == true (or spec == null) | not exposed |
| MISSING_REQUIREMENTS | any ${ENV_VAR} referenced by staticVariables resolves to unset / empty / whitespace-only | not exposed |
| ACTIVE | draft == false AND every env-var reference resolves | exposed via the built-in MCP server |
| TEST_FAILED | reserved | not exposed |

TEST_FAILED is reserved for future use; the reference resolver never returns it from the current calculator. Drafts MAY exist with arbitrary or empty toolSafety — the runtime does not enforce posture invariants until the spec is published.

11.2 Env-var requirement check

Before publishing, the runtime walks every staticVariables value, extracts each ${VAR} placeholder, and verifies the OS environment defines a non-blank value for it. The check uses the placeholder grammar from § 7.2.

  • Implementations MUST treat unset, empty, and whitespace-only environment values as missing.
  • Implementations MAY consult a project-local secret store before the OS env; the result of the lookup is what the requirement check inspects.
  • A spec with any missing requirement is transitioned to MISSING_REQUIREMENTS and is not exposed.
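The requirement check and the resulting state can be sketched as follows. The placeholder regex is an assumption (the normative grammar is in § 7.2), `missing_requirements` and `lifecycle_state` are hypothetical names, and treating an absent draft flag as a draft is a conservative guess rather than something this document states. The `env` argument stands for whatever lookup the implementation uses, whether the OS environment or a secret store consulted first.

```python
import re

# Assumed placeholder shape; the normative grammar lives in § 7.2.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_requirements(static_variables: list, env: dict) -> list:
    """Return placeholder names whose value is unset, empty, or whitespace-only."""
    missing = []
    for entry in static_variables:          # each entry is a one-key object
        for value in entry.values():
            for name in PLACEHOLDER.findall(str(value)):
                blank = not (env.get(name) or "").strip()
                if blank and name not in missing:
                    missing.append(name)
    return missing


def lifecycle_state(spec, env: dict) -> str:
    """Derive DRAFT / MISSING_REQUIREMENTS / ACTIVE (TEST_FAILED is reserved)."""
    # Assumption: a spec without an explicit draft flag is treated as a draft.
    if spec is None or spec.get("draft", True):
        return "DRAFT"
    if missing_requirements(spec.get("staticVariables", []), env):
        return "MISSING_REQUIREMENTS"
    return "ACTIVE"
```

In practice `env` could simply be `dict(os.environ)`; passing it in explicitly keeps the check easy to test.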

11.3 Local Pass — the publish gate

draft flips from true to false only when the spec earns its Local Pass: a successful test run with the declared testValues, executed in the same sandbox the published tool will run in, with the resolved toolSafety posture in effect.

  • The Local Pass MUST execute code with every required param set from its testValue.
  • The Local Pass MUST be repeatable; non-deterministic tools (random, time-sensitive) MUST choose testValues that exercise the deterministic path.
  • A passing Local Pass updates the in-memory state and persists draft: false.
  • A failing Local Pass leaves draft: true and surfaces a structured error to the audit log.
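A minimal sketch of the gate, under stated assumptions: `run` is a hypothetical callable standing in for sandboxed execution of the tool's code, optional params are passed whenever they declare a testValue, and the audit-log side effect is left out.

```python
def build_test_args(params: list) -> dict:
    """Collect testValues into the argument object for a Local Pass run."""
    args = {}
    for p in params:
        has_value = p.get("testValue") is not None
        if p.get("required") and not has_value:
            raise ValueError(f"required param {p['name']!r} has no testValue")
        if has_value:
            args[p["name"]] = p["testValue"]
    return args


def local_pass(run, params: list) -> bool:
    """Return the new value of the draft flag after one Local Pass attempt."""
    try:
        run(build_test_args(params))
    except Exception:
        return True   # a failing pass leaves draft: true
    return False      # a passing pass persists draft: false
```

A spec whose required param lacks a testValue never reaches `run`, so it stays a draft.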

11.4 Audit contract

Every invocation MUST record (at minimum):

  • toolId, name, category.id
  • Resolved toolSafety block (verbatim)
  • The Risk Level computed from § 10.6
  • Parameters as received (post-validation, pre-execution); secrets MUST be masked
  • Outcome: OK / ERROR with structured cause
  • Elapsed duration

The audit record is the source of truth for "what was actually enforced." Implementations MAY append additional fields (cid, request id, MCP client metadata).
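One possible shape for such a record is sketched below. The key names (`categoryId`, `riskLevel`, `elapsedMs`, `cause`), the `***` mask, and the substring-based masking strategy are all assumptions for illustration; this document fixes only which pieces of information must be present.

```python
import copy

MASK = "***"


def mask(value, secrets: set):
    """Replace every occurrence of a known secret inside a string value."""
    if isinstance(value, str):
        for secret in secrets:
            if secret:
                value = value.replace(secret, MASK)
    return value


def audit_record(spec: dict, tool_safety: dict, risk_level: int, params: dict,
                 secrets: set, elapsed_ms: int, error=None) -> dict:
    """Build the minimum audit record for one invocation."""
    return {
        "toolId": spec.get("toolId"),
        "name": spec["name"],
        "categoryId": tool_safety.get("category", {}).get("id"),
        # Deep copy so later edits to the spec cannot rewrite history.
        "toolSafety": copy.deepcopy(tool_safety),
        "riskLevel": f"L{risk_level}",
        "params": {k: mask(v, secrets) for k, v in params.items()},
        "outcome": "OK" if error is None else "ERROR",
        "cause": error,
        "elapsedMs": elapsed_ms,
    }
```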

12. Persistence

12.1 File layout

The reference implementation persists user-authored specs into a single bundle file under the playground's home directory:

~/spring-ai-playground/tool/save/toolSpecsMcpSetting.json

The bundle file contains both the spec list and the MCP server settings:

{
  "toolSpecs": [ /* spec, spec, ... */ ],
  "toolMcpServerSetting": { /* MCP transport + autoAdd flag */ }
}

Specs that originate from the bundled catalog (src/main/resources/tool/default-tool-specs-*.json) are excluded from the bundle on save — they are reloaded from the classpath on startup, with user overrides matched by toolId and merged on top.

Implementations are free to choose a different file layout (one file per spec, sharded by category, database-backed) as long as the round-trip JSON shape of each spec conforms to this specification.

12.2 Atomic write contract

Writers MUST commit changes atomically:

  1. Serialize the bundle to a sibling temp file (toolSpecsMcpSetting.json.tmp).
  2. renameSync the temp file over the target. POSIX rename guarantees atomicity within the same filesystem.

Readers MUST read after the rename completes — a writer that crashes mid-rename leaves the previous bundle intact.

createTimestamp is set once when the spec is first written; updateTimestamp is updated on every subsequent persist. Both are epoch milliseconds.
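The write-then-rename contract might look like this in Python. `save_bundle` and `touch_timestamps` are hypothetical names; `os.replace` plays the role of the rename step, and the `fsync` call is an extra durability precaution of this sketch rather than a requirement stated above.

```python
import json
import os
import time


def save_bundle(path: str, bundle: dict) -> None:
    """Serialize to a sibling temp file, then rename it over the target."""
    tmp = path + ".tmp"            # same directory, hence same filesystem
    try:
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(bundle, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())   # push the bytes to disk before renaming
        os.replace(tmp, path)      # atomic swap on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)         # never leave a half-written temp file
        raise


def touch_timestamps(spec: dict, now_ms=None) -> dict:
    """Set createTimestamp once and refresh updateTimestamp (epoch millis)."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    spec.setdefault("createTimestamp", now_ms)
    spec["updateTimestamp"] = now_ms
    return spec
```

If the process dies while writing the temp file, the target is untouched, so a reader still sees the previous bundle.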

12.3 Catalog mirror invariant (build-time)

The Spring AI Playground build ships the catalog twice:

  • src/main/resources/tool/default-tool-specs-*.json — the JVM resource classpath
  • electron/resources/catalog/default-tool-specs-*.json — the Electron-bundled mirror

The two MUST be byte-identical. The build is responsible for enforcing this (the reference build uses prepare-resources.mjs); the spec format itself is silent on it. Catalog publishers consuming this spec independently MAY omit the mirror requirement.

13. Versioning policy

The version namespace lives in toolSafety.version (today: "1.0"). The bump rules:

  • Patch (1.0 → 1.0.1): editorial clarification, additional examples, new tag vocab entries, new category enum values. No behavioral change. Patch bumps are not visible in the version field — the field captures major + minor only.
  • Minor (1.0 → 1.1): backward-compatible additive change. New optional fields, new network mode values, new helper versions added to the vocabulary (e.g. safety.http/v2 alongside v1). Existing conforming specs continue to parse and resolve identically.
  • Major (1.0 → 2.0): backward-incompatible change. Field renames, removed enum values, semantics changes to existing fields. Documents written against v1 MUST continue to be readable for at least one major-version transition window.

Helper-level versions (safety.fs/v1 → safety.fs/v2) are independent of the spec version; they bump the helper-namespace number when their JS API surface changes. A spec MAY mix v1 and v2 helpers from different namespaces.
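One way a reader could apply these rules is sketched below. The policy in `can_read` is an interpretation, not normative text: it accepts documents of the same major version with an equal or lower minor, plus documents from the immediately preceding major as the transition window. All three function names are hypothetical.

```python
def parse_spec_version(text: str) -> tuple:
    """Split a "major.minor" string such as "1.0" into integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def can_read(reader: str, document: str) -> bool:
    """Decide whether a reader at one version should accept a document."""
    r_major, r_minor = parse_spec_version(reader)
    d_major, d_minor = parse_spec_version(document)
    if d_major == r_major:
        # Minor bumps are additive, so older documents always resolve.
        return d_minor <= r_minor
    # Assumed transition window: exactly one major version back.
    return d_major == r_major - 1


def parse_helper(ref: str) -> tuple:
    """Split "safety.http/v1" into ("safety.http", 1)."""
    namespace, _, version = ref.rpartition("/v")
    if not namespace or not version.isdigit():
        raise ValueError(f"not a <namespace>/v<n> reference: {ref!r}")
    return namespace, int(version)
```

Helper references are parsed separately because their version numbers move independently of the spec version.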

14. Extension points

Unknown top-level fields MUST be preserved on round-trip. This is the dedicated extension surface — a future minor version can introduce new fields without invalidating today's documents.

Implementations adding their own fields SHOULD:

  • Prefix custom field names with a vendor identifier (x-acme-cost-cap) to avoid collision with future standard fields.
  • Document the field's semantics in their own docs and link this spec for the surrounding shape.
  • Treat unknown vendor-prefixed fields with the same round-trip rule — do not strip them on save.

Custom additions inside sandboxOverrides, toolSafety, or params[] are out of scope for this version — those blocks have closed shapes today. Future minor versions may open named extension sub-objects within them.
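The round-trip rule is easy to violate when a spec is mapped onto a closed model class. A minimal sketch of one way to avoid that, assuming a hypothetical `Spec` model that knows only two fields and parks everything else in an `extras` dictionary:

```python
import json
from dataclasses import dataclass, field


@dataclass
class Spec:
    name: str
    draft: bool = True
    # Every top-level field this model does not know about, kept verbatim.
    extras: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "Spec":
        data = json.loads(text)
        name = data.pop("name")
        draft = data.pop("draft", True)
        return cls(name, draft, data)   # the remainder becomes extras

    def to_json(self) -> str:
        merged = {"name": self.name, "draft": self.draft, **self.extras}
        return json.dumps(merged, ensure_ascii=False)
```

Loading a document that carries `x-acme-cost-cap`, editing `draft`, and saving it again leaves the vendor field in place.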

15. Validation and error model

Validation has three layers:

  1. Document validation: does the spec parse and conform to the JSON Schema (§ 16)?
  2. Cross-field validation: do the invariants in § 6 (required ⇒ testValue present), § 7 (env-var grammar), and § 10.1 (allow/deny disjointness) hold?
  3. Runtime validation: does the resolver accept the spec, and does the Local Pass succeed?

Validation errors SHOULD be reported with at least:

  • A stable error code (SPEC_PARSE, SPEC_INVARIANT, RESOLVER_REJECT, LOCAL_PASS_FAILED)
  • A pointer to the offending field (params[2].testValue)
  • A human-readable message
  • For runtime errors, the resolved toolSafety block under which the failure occurred

The reference runtime classifies helper errors as INVALID_INPUT, HELPER_RUNTIME, or SECURITY; the spec layer adds the four codes above.
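A structured error carrying the recommended fields could be modelled as below, together with one cross-field check (required implies testValue present) that produces it. `SpecError` and `check_required_test_values` are hypothetical names, and the pointer syntax simply follows the `params[2].testValue` example above.

```python
from dataclasses import dataclass
from typing import Optional

SPEC_CODES = {"SPEC_PARSE", "SPEC_INVARIANT", "RESOLVER_REJECT", "LOCAL_PASS_FAILED"}


@dataclass(frozen=True)
class SpecError:
    code: str                           # stable error code
    pointer: str                        # offending field, e.g. params[2].testValue
    message: str                        # human-readable explanation
    tool_safety: Optional[dict] = None  # resolved posture, for runtime errors

    def __post_init__(self):
        if self.code not in SPEC_CODES:
            raise ValueError(f"unknown error code: {self.code}")


def check_required_test_values(spec: dict) -> list:
    """Cross-field invariant: every required param declares a testValue."""
    errors = []
    for i, p in enumerate(spec.get("params", [])):
        if p.get("required") and p.get("testValue") is None:
            errors.append(SpecError(
                "SPEC_INVARIANT",
                f"params[{i}].testValue",
                f"required param {p.get('name')!r} has no testValue",
            ))
    return errors
```

Returning a list rather than raising on the first failure lets a caller report every offending field in one pass.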

16. JSON Schema

A normative JSON Schema 2020-12 document is bundled alongside this page:

safe-tool-spec.schema.json

Validate any candidate spec by loading the schema and checking it with a 2020-12 compatible validator (ajv, jsonschema, python-jsonschema).

17. Canonical examples

The bundled catalog ships every example variant below. Each is shown abbreviated; the full version is in src/main/resources/tool/default-tool-specs-*.json.

17.1 Pure compute — base64

No network, no filesystem, no env. The baseline sandboxOverrides (all-null) is sufficient.

{
  "toolId": "e30d037d-20cf-55f2-b43a-1b89560417da",
  "name": "base64",
  "description": "Encodes UTF-8 text to base64, or decodes base64 back to UTF-8 text. Use mode='encode' (default) or 'decode'.",
  "category": "ENCODING",
  "tags": ["util"],
  "params": [
    { "name": "text", "type": "STRING", "required": true,  "testValue": "hello world", "description": "Text to encode/decode" },
    { "name": "mode", "type": "STRING", "required": false, "testValue": "encode",      "description": "encode | decode" }
  ],
  "staticVariables": [],
  "code": "/* ... */",
  "codeType": "Javascript",
  "sandboxOverrides": {},
  "toolSafety": {
    "version": "1.0",
    "runtime": { "id": "spring-ai-playground/polyglot-js", "javaInterop": false, "helpers": [], "console": true },
    "capabilities": { "network": { "mode": "blocked", "hosts": [] }, "fileRead": false, "fileWrite": false }
  },
  "draft": false
}

Risk Level: L0.

17.2 Single-host network — getUpbitTicker

allowlist mode with one host. No env-backed secret; the upstream API is unauthenticated.

{
  "name": "getUpbitTicker",
  "category": "WEB",
  "tags": ["korea"],
  "params": [{ "name": "markets", "type": "STRING", "required": true, "testValue": "KRW-BTC,KRW-ETH",
              "description": "Comma-separated KRW markets (e.g. 'KRW-BTC,KRW-ETH')" }],
  "staticVariables": [],
  "sandboxOverrides": {
    "networkMode": "allowlist",
    "hostsAllow":  ["api.upbit.com"]
  },
  "toolSafety": {
    "version": "1.0",
    "runtime":  { "id": "spring-ai-playground/polyglot-js", "javaInterop": false, "helpers": ["safety.http/v1"], "console": true },
    "capabilities": { "network": { "mode": "allowlist", "hosts": ["api.upbit.com"] }, "fileRead": false, "fileWrite": false }
  }
}

Risk Level: L3 (non-wildcard allowlist).

17.3 Env-backed multi-secret — searchNaver

Two env-backed credentials, allowlist mode.

{
  "name": "searchNaver",
  "category": "WEB",
  "tags": ["korea"],
  "params": [
    { "name": "query", "type": "STRING", "required": true, "testValue": "스프링 AI",
      "description": "Korean queries typical (e.g. '스프링 AI'); other languages also accepted." }
  ],
  "staticVariables": [
    { "naverClientId":     "${NAVER_CLIENT_ID}" },
    { "naverClientSecret": "${NAVER_CLIENT_SECRET}" }
  ],
  "sandboxOverrides": {
    "networkMode": "allowlist",
    "hostsAllow":  ["openapi.naver.com"]
  }
}

Without both env vars set, the spec sits in MISSING_REQUIREMENTS and is not exposed (§ 11.2).

17.4 Strict-mode external HTTP — extractPageContent

Tool fetches arbitrary user-supplied URLs; SSRF guard runs in strict mode.

{
  "name": "extractPageContent",
  "category": "WEB",
  "tags": ["util"],
  "params": [
    { "name": "url", "type": "STRING", "required": true, "testValue": "https://example.com" }
  ],
  "sandboxOverrides": { "networkMode": "strict" }
}

Risk Level: L3 (strict).

17.5 Filesystem read — readTextFile

No network, scoped read access to the configured fsBasePath.

{
  "name": "readTextFile",
  "category": "FILE",
  "params": [
    { "name": "path", "type": "STRING", "required": true, "testValue": "README.md" }
  ],
  "sandboxOverrides": { "fileRead": true }
}

Risk Level: L3 (read only).

17.6 Filesystem write — writeTextFile

Write access. Highest risk level the bundled catalog ships.

{
  "name": "writeTextFile",
  "category": "FILE",
  "params": [
    { "name": "path",    "type": "STRING", "required": true, "testValue": "notes.txt" },
    { "name": "content", "type": "STRING", "required": true, "testValue": "hello" }
  ],
  "sandboxOverrides": { "fileWrite": true }
}

Risk Level: L4 (write).

17.7 Object-typed argument — evalExpression

Demonstrates OBJECT parameter type. Models that cannot pass nested JSON drop down to STRING and pre-serialize.

{
  "name": "evalExpression",
  "category": "MATH",
  "params": [
    { "name": "expr",      "type": "STRING", "required": true,  "testValue": "x + 2 * y" },
    { "name": "variables", "type": "OBJECT", "required": false, "testValue": "{\"x\":3,\"y\":4}",
      "description": "Variable bindings (JSON-stringified object: {\"x\":3,\"y\":4})" }
  ]
}

17.8 Draft — unpublished

A spec freshly imported from the catalog ships with draft: true. Until activation (by preset + rules), it remains invisible to MCP.

{ "name": "experimentalThing", "code": "/* ... */", "codeType": "Javascript", "draft": true }

18. References

19. Document history

| Version | Date | Notes |
| --- | --- | --- |
| 1.0 | 2026-05-20 | Initial publication. Codifies the shape shipping in Spring AI Playground 0.2.0-M7. |