SmartWebFetchTool

AI-powered web content fetching and summarization tool with intelligent caching and safety features. Fetches web pages, converts HTML to Markdown, and uses AI to extract relevant information based on a user prompt.

Features:

  • HTML to Markdown conversion for clean content processing
  • 15-minute TTL cache for faster repeated access to the same URLs
  • Automatic retry with exponential backoff on network failures and 5xx errors
  • Optional domain safety checking via Claude's domain info API
  • Configurable content length limits with automatic truncation
  • Fail-open/fail-closed security modes for safety check errors
  • Proper charset detection and handling
  • Thread-safe concurrent cache access

Overview

The SmartWebFetchTool retrieves content from URLs and processes it using an AI model for intelligent summarization. Unlike simple HTTP clients, it:

  1. Fetches HTML content using HTTP GET
  2. Converts HTML to clean Markdown format
  3. Uses AI to extract information relevant to your specific prompt
  4. Caches results for 15 minutes to avoid redundant requests
  5. Automatically retries on transient failures

This tool implements AutoCloseable for proper resource cleanup.

Basic Usage

// Build with required ChatClient
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build();

// Fetch and summarize web content
String result = webFetch.webFetch(
    "https://docs.spring.io/spring-ai/reference/",
    "What are the key features of Spring AI?"
);

System.out.println(result);
// Output: "Spring AI provides integration with various AI models including..."

Builder Configuration

Required Parameters

chatClient - The ChatClient instance used for AI-powered summarization

SmartWebFetchTool.builder(chatClient)
    .build();

Optional Parameters

Option                       Default   Description
maxContentLength             100,000   Maximum characters to process; longer content is truncated and a warning is logged
domainSafetyCheck            true      Enable domain safety verification before fetching
failOpenOnSafetyCheckError   true      If the safety check itself errors, allow the fetch (true) or block it (false)
maxCacheSize                 100       Maximum number of URL+prompt combinations to cache
maxRetries                   2         Maximum retry attempts for transient network failures

Example with all options:

SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
    .maxContentLength(150_000)           // Process up to 150,000 characters
    .domainSafetyCheck(true)             // Check domain safety
    .failOpenOnSafetyCheckError(true)    // Allow fetch if safety check errors
    .maxCacheSize(200)                   // Cache up to 200 entries
    .maxRetries(3)                       // Retry up to 3 times
    .build();

Configuration Details

Max Content Length

Controls the maximum number of characters processed from the fetched content. Content exceeding this limit is truncated and a warning is logged.

SmartWebFetchTool.builder(chatClient)
    .maxContentLength(50_000)  // For small articles
    .build();

SmartWebFetchTool.builder(chatClient)
    .maxContentLength(200_000)  // For long documentation
    .build();

Use Cases:

  • Smaller limits (50K-100K): Blog posts, news articles
  • Medium limits (100K-150K): Technical documentation
  • Larger limits (150K-200K): Comprehensive guides, API references
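
The truncation rule described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the class name ContentTruncator and the method truncate are hypothetical, and the warning is printed to stderr in place of whatever logger the tool uses.

```java
final class ContentTruncator {

    /**
     * Returns the content unchanged if it fits within the limit,
     * otherwise only its first maxContentLength characters.
     * (Hypothetical helper; the real tool's internals are not shown in this document.)
     */
    static String truncate(String content, int maxContentLength) {
        if (content.length() <= maxContentLength) {
            return content;
        }
        // Stand-in for the logged warning mentioned in the documentation.
        System.err.println("Content too long, truncating to " + maxContentLength + " characters");
        return content.substring(0, maxContentLength);
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdefghij", 4));
        System.out.println(truncate("abc", 4));
    }
}
```

Because the limit counts characters rather than bytes, a page dominated by multi-byte text can be considerably larger on the wire than its maxContentLength suggests.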

Domain Safety Check

Verifies domain safety using Claude's domain info API before fetching content.

// Enable safety checks (default)
SmartWebFetchTool.builder(chatClient)
    .domainSafetyCheck(true)
    .build();

// Disable for trusted internal URLs
SmartWebFetchTool.builder(chatClient)
    .domainSafetyCheck(false)
    .build();

When to disable:

  • Internal company documentation
  • Localhost development servers
  • Known trusted domains in controlled environments

Fail-Open vs Fail-Closed

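Controls what happens when the domain safety check itself fails to run, for example because the safety service is down. This is distinct from a check that completes and reports the domain as unsafe; such a domain is blocked regardless of this setting.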

// Fail-open: Allow fetch if safety check errors (default, more permissive)
SmartWebFetchTool.builder(chatClient)
    .failOpenOnSafetyCheckError(true)
    .build();

// Fail-closed: Block fetch if safety check errors (more secure)
SmartWebFetchTool.builder(chatClient)
    .failOpenOnSafetyCheckError(false)
    .build();

Security Trade-offs:

  • Fail-open (true): Better availability, accepts risk if safety service is down
  • Fail-closed (false): Better security, blocks all fetches if safety service fails
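
To make the distinction between an unsafe result and a check error concrete, here is a small sketch of the decision. The enum CheckOutcome and the method shouldFetch are hypothetical names chosen for illustration; only the failOpenOnSafetyCheckError flag comes from the builder options above.

```java
final class SafetyDecision {

    /** Hypothetical outcomes of one domain safety check. */
    enum CheckOutcome { SAFE, UNSAFE, ERROR }

    /**
     * Decides whether to proceed with the fetch.
     * Only the ERROR case consults the fail-open flag; a completed
     * check always wins, whichever mode is configured.
     */
    static boolean shouldFetch(CheckOutcome outcome, boolean failOpenOnSafetyCheckError) {
        switch (outcome) {
            case SAFE:
                return true;
            case UNSAFE:
                return false;
            default: // ERROR: the check could not be performed
                return failOpenOnSafetyCheckError;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldFetch(CheckOutcome.ERROR, true));
        System.out.println(shouldFetch(CheckOutcome.ERROR, false));
    }
}
```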

Max Retries

Configures automatic retry attempts for network failures and 5xx server errors with exponential backoff.

SmartWebFetchTool.builder(chatClient)
    .maxRetries(0)  // No retries, fail immediately
    .build();

SmartWebFetchTool.builder(chatClient)
    .maxRetries(2)  // Default: retry twice (3 total attempts)
    .build();

SmartWebFetchTool.builder(chatClient)
    .maxRetries(5)  // Aggressive retries for unreliable networks
    .build();

Backoff Strategy:

  • Attempt 1: Immediate
  • Attempt 2: Wait 1 second
  • Attempt 3: Wait 2 seconds
  • Attempt 4: Wait 4 seconds
  • Attempt N (N ≥ 2): Wait 2^(N-2) seconds
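
The schedule can be expressed as a one-line doubling rule. The sketch below reproduces the listed waits of 1, 2, and 4 seconds before attempts 2, 3, and 4; the class BackoffSchedule and its method are hypothetical, and the tool's real timing code may differ (for example, it might add jitter or a cap).

```java
final class BackoffSchedule {

    /**
     * Delay in milliseconds to wait before the given attempt (1-based).
     * Attempt 1 runs immediately; each later attempt doubles the previous wait,
     * starting from one second before attempt 2.
     */
    static long delayMillisBeforeAttempt(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        if (attempt == 1) {
            return 0L;
        }
        return 1000L << (attempt - 2);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.println("Attempt " + attempt + ": wait "
                + delayMillisBeforeAttempt(attempt) + " ms");
        }
    }
}
```

With the default maxRetries of 2, a request that fails every time therefore spends about 3 seconds waiting in total (1 s + 2 s), on top of the time taken by the requests themselves.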

Caching Behavior

The tool caches results to improve performance and avoid redundant network requests.

Cache Key Structure

Cache keys include both the URL and the prompt:

url::prompt::promptHashCode

Example:

// These create DIFFERENT cache entries
webFetch.webFetch("https://example.com", "What is the main topic?");
webFetch.webFetch("https://example.com", "List all features");

// This reuses the FIRST cache entry (same URL + prompt)
webFetch.webFetch("https://example.com", "What is the main topic?");
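
A key with the url::prompt::promptHashCode shape could be assembled as shown below. This is a sketch under assumptions: the helper name cacheKey is hypothetical, and it assumes the hash component is the prompt's String.hashCode(), which the documentation does not state explicitly.

```java
final class CacheKeys {

    /**
     * Builds a key of the form url::prompt::promptHashCode.
     * Assumption: the hash part is String.hashCode() of the prompt.
     */
    static String cacheKey(String url, String prompt) {
        return url + "::" + prompt + "::" + prompt.hashCode();
    }

    public static void main(String[] args) {
        String first = cacheKey("https://example.com", "What is the main topic?");
        String second = cacheKey("https://example.com", "List all features");
        String repeat = cacheKey("https://example.com", "What is the main topic?");

        System.out.println(first.equals(second)); // different prompts, different entries
        System.out.println(first.equals(repeat)); // same URL + prompt, same entry
    }
}
```

Note that keys built this way are compared as exact strings, so prompts that differ only in whitespace or capitalization would produce separate cache entries.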

Time-To-Live (TTL)

  • TTL: 15 minutes per cache entry
  • Cleanup: Automatic when cache size exceeds maxCacheSize
  • Thread Safety: Concurrent access is safe
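
The following sketch shows one way a 15-minute TTL cache with size-triggered cleanup can be built on ConcurrentHashMap. It is not the tool's implementation: the class TtlCache, its injectable clock, and the choice to evict the oldest entries once expired ones are gone are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class TtlCache {

    private static final long TTL_MILLIS = Duration.ofMinutes(15).toMillis();

    /** A cached result plus the time it was stored. */
    private static final class CachedValue {
        final String value;
        final long storedAtMillis;

        CachedValue(String value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, CachedValue> entries = new ConcurrentHashMap<>();
    private final int maxCacheSize;
    private final LongSupplier clock; // injectable so expiry can be tested without waiting

    TtlCache(int maxCacheSize, LongSupplier clock) {
        this.maxCacheSize = maxCacheSize;
        this.clock = clock;
    }

    /** Returns the cached value, or null if it is absent or older than the TTL. */
    String get(String key) {
        CachedValue cached = entries.get(key);
        if (cached == null) {
            return null;
        }
        if (isExpired(cached)) {
            entries.remove(key, cached);
            return null;
        }
        return cached.value;
    }

    /** Stores a value, then cleans up only if the cache has grown past its limit. */
    void put(String key, String value) {
        entries.put(key, new CachedValue(value, clock.getAsLong()));
        if (entries.size() > maxCacheSize) {
            entries.values().removeIf(this::isExpired);
        }
        // Assumed policy: if still over the limit, drop the oldest entries first.
        while (entries.size() > maxCacheSize) {
            entries.entrySet().stream()
                .min(Comparator.comparingLong(
                    (Map.Entry<String, CachedValue> e) -> e.getValue().storedAtMillis))
                .ifPresent(e -> entries.remove(e.getKey(), e.getValue()));
        }
    }

    int size() {
        return entries.size();
    }

    private boolean isExpired(CachedValue cached) {
        return clock.getAsLong() - cached.storedAtMillis >= TTL_MILLIS;
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache(100, System::currentTimeMillis);
        cache.put("key", "summary");
        System.out.println(cache.get("key"));
    }
}
```

Passing System::currentTimeMillis as the clock gives wall-clock behavior; a test can pass a counter instead and advance it past 15 minutes to observe expiry.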

Cache Management

// Configure cache size
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
    .maxCacheSize(500)  // Cache up to 500 URL+prompt combinations
    .build();

// Cache is automatically cleared on close
try (SmartWebFetchTool tool = SmartWebFetchTool.builder(chatClient).build()) {
    // Use tool
}
// Cache cleared here

Error Handling

In the scenarios below, the tool reports failures as descriptive messages in the string returned by webFetch.

Common Error Scenarios

Invalid URL:

webFetch.webFetch("not-a-url", "Summarize");
// Returns: "Error: Invalid URL format. Please provide a fully-formed URL (e.g., https://example.com)"

Empty URL:

webFetch.webFetch("", "Summarize");
// Returns: "Error: URL cannot be empty or null"
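
The two URL checks above could be implemented along these lines. This is a sketch only: the class UrlValidator is hypothetical, and treating "fully-formed" as "has an http or https scheme and a host" is an assumption about what the tool accepts. The error strings are copied from the examples above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

final class UrlValidator {

    static final String EMPTY_URL = "Error: URL cannot be empty or null";
    static final String INVALID_URL =
        "Error: Invalid URL format. Please provide a fully-formed URL (e.g., https://example.com)";

    /** Returns an error message for a bad URL, or an empty Optional if the URL looks usable. */
    static Optional<String> validate(String url) {
        if (url == null || url.trim().isEmpty()) {
            return Optional.of(EMPTY_URL);
        }
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            // Assumption: only absolute http(s) URLs with a host are accepted.
            boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            if (!isHttp || uri.getHost() == null) {
                return Optional.of(INVALID_URL);
            }
        } catch (URISyntaxException e) {
            return Optional.of(INVALID_URL);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("not-a-url").orElse("ok"));
        System.out.println(validate("").orElse("ok"));
        System.out.println(validate("https://example.com").orElse("ok"));
    }
}
```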

Network Error:

webFetch.webFetch("https://nonexistent-domain-xyz123.com", "Summarize");
// Returns: "Error fetching URL: Network error while fetching URL: ..."

HTTP Error:

webFetch.webFetch("https://example.com/404-page", "Summarize");
// Returns: "Error: Failed to fetch URL. HTTP status code: 404"

Domain Safety Failure:

SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
    .domainSafetyCheck(true)
    .build();

webFetch.webFetch("https://unsafe-domain.com", "Summarize");
// Returns: "Domain safety check failed for URL 'https://unsafe-domain.com': The domain is not safe to fetch content from."

Retry Behavior

The tool automatically retries on:

  • Network errors (IOException)
  • Server errors (5xx status codes)

It does NOT retry on:

  • 4xx client errors (bad request, not found, unauthorized, etc.)
  • Invalid URL format
  • Failed domain safety checks
  • Interrupted requests
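
The retry rules can be sketched as a small loop around a single fetch attempt. The names RetryPolicy, Attempt, and runWithRetries are hypothetical, and the backoff sleeps are left out so the control flow stays visible; the sketch only shows which outcomes trigger another attempt.

```java
import java.io.IOException;

final class RetryPolicy {

    /** One fetch attempt: returns an HTTP status code or throws on a network error. */
    interface Attempt {
        int run() throws IOException;
    }

    /** Only 5xx responses are treated as retryable statuses. */
    static boolean isRetryableStatus(int statusCode) {
        return statusCode >= 500 && statusCode <= 599;
    }

    /**
     * Runs at most maxRetries + 1 attempts and returns how many were made.
     * Backoff delays between attempts are omitted in this sketch.
     */
    static int runWithRetries(Attempt attempt, int maxRetries) {
        int attempts = 0;
        while (attempts <= maxRetries) {
            attempts++;
            try {
                int status = attempt.run();
                if (!isRetryableStatus(status)) {
                    return attempts; // success or 4xx: retrying would not help
                }
            } catch (IOException e) {
                // Network error: treated as transient, so loop again if retries remain.
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(runWithRetries(() -> 404, 2));
        System.out.println(runWithRetries(() -> 503, 2));
    }
}
```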

Resource Management

The tool implements AutoCloseable for proper cleanup.

try (SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build()) {
    String result = webFetch.webFetch(url, prompt);
    System.out.println(result);
}
// Cache automatically cleared, resources released

Manual Cleanup

SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build();
try {
    String result = webFetch.webFetch(url, prompt);
} finally {
    webFetch.close();  // Clear cache
}

Integration Examples

Spring Boot Configuration

@Configuration
public class ToolsConfig {

    @Bean
    public SmartWebFetchTool smartWebFetchTool(ChatClient.Builder chatClientBuilder) {
        ChatClient chatClient = chatClientBuilder.build();

        return SmartWebFetchTool.builder(chatClient)
            .maxContentLength(150_000)
            .domainSafetyCheck(true)
            .failOpenOnSafetyCheckError(true)
            .maxCacheSize(100)
            .maxRetries(2)
            .build();
    }
}

ChatClient Integration

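// The tool needs its own ChatClient for summarization; it cannot reference
// the client that is still being built. clone() keeps the two builders independent
// (assumes ChatClient.Builder#clone() is available in your Spring AI version).
ChatClient summarizerClient = chatClientBuilder.clone().build();

ChatClient chatClient = chatClientBuilder
    .defaultTools(SmartWebFetchTool.builder(summarizerClient)
        .domainSafetyCheck(false)  // Disable for internal docs
        .maxRetries(3)             // More retries for reliability
        .build())
    .build();

// AI can now use web fetch automatically
String response = chatClient.prompt()
    .user("Fetch https://docs.spring.io/spring-ai/reference/ and tell me about vector stores")
    .call()
    .content();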

Custom Prompts

SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build();

// Extract specific information
String features = webFetch.webFetch(
    "https://spring.io/projects/spring-ai",
    "List all supported AI model providers"
);

// Compare content
String comparison = webFetch.webFetch(
    "https://example.com/product-a",
    "What are the pricing tiers and features for each tier?"
);

// Technical analysis
String analysis = webFetch.webFetch(
    "https://github.com/spring-projects/spring-ai",
    "What programming languages and frameworks are used in this project?"
);

Advanced Use Cases

Multiple Sources

SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
    .maxCacheSize(500)  // Large cache for multiple URLs
    .build();

String[] urls = {
    "https://docs.spring.io/spring-ai/reference/",
    "https://docs.spring.io/spring-boot/reference/",
    "https://docs.spring.io/spring-framework/reference/"
};

for (String url : urls) {
    String summary = webFetch.webFetch(url, "What are the main features?");
    System.out.println("Summary for " + url + ":\n" + summary + "\n");
}

Different Prompts, Same URL

String url = "https://example.com/api-docs";

// Cache miss - fetches and caches
String overview = webFetch.webFetch(url, "Provide an overview");

// Cache miss - different prompt, fetches again
String endpoints = webFetch.webFetch(url, "List all API endpoints");

// Cache hit - same URL and prompt
String overview2 = webFetch.webFetch(url, "Provide an overview");

Internal Documentation

// Optimized for internal trusted sources
SmartWebFetchTool internalWebFetch = SmartWebFetchTool.builder(chatClient)
    .domainSafetyCheck(false)        // Skip safety for internal URLs
    .maxRetries(1)                   // Fewer retries for fast network
    .maxContentLength(200_000)       // Large docs expected
    .build();

String docs = internalWebFetch.webFetch(
    "http://internal-docs.company.local/api-spec",
    "Summarize the authentication requirements"
);

Security Considerations

Domain Safety API

The tool uses Claude's domain info API (https://claude.ai/api/web/domain_info) to verify domain safety before fetching.

Safety Check Process:

  1. Extract domain from URL
  2. Query Claude's API with domain
  3. Receive can_fetch boolean response
  4. Block or allow based on response and configuration
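
The first step of the process, extracting the domain, can be done with java.net.URI. The sketch below is illustrative: DomainExtractor is a hypothetical name, and lowercasing the host is an assumption rather than documented behavior. The API query itself is not shown because its request format is not described here.

```java
import java.net.URI;
import java.util.Locale;

final class DomainExtractor {

    /** Returns the lowercase host of an absolute URL, without scheme, port, or path. */
    static String extractDomain(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return host.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extractDomain("https://docs.spring.io/spring-ai/reference/"));
        System.out.println(extractDomain("http://localhost:8080/health"));
    }
}
```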

Disable if:

  • Fetching from trusted internal domains
  • Behind corporate firewall with controlled access
  • Using allowlist of known-safe domains

Read-Only Operations

The tool only performs HTTP GET requests and does not:

  • Modify any local files
  • Send data to fetched URLs (except HTTP headers)
  • Execute JavaScript or active content
  • Store credentials or sensitive data

User-Agent and Headers

Standard browser headers are sent for compatibility:

  • User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)...
  • Accept: text/html,application/xhtml+xml,application/xml
  • Accept-Language: en-US,en;q=0.5
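
For readers who want to reproduce a comparable request outside the tool, the sketch below builds a GET with these headers using the JDK's java.net.http API (Java 11+). Whether the tool itself uses this client is not stated, so treat that as an assumption; BrowserLikeRequest is a hypothetical name, and the User-Agent value contains only the portion shown above because the rest is elided in this document.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class BrowserLikeRequest {

    /** Builds a GET request carrying the browser-style headers listed above. */
    static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            // Only the visible part of the documented User-Agent; the remainder is elided in the source.
            .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
            .header("Accept", "text/html,application/xhtml+xml,application/xml")
            .header("Accept-Language", "en-US,en;q=0.5")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://example.com");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```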

Performance Tips

Optimize Cache Size

// Small application, limited URLs
SmartWebFetchTool.builder(chatClient).maxCacheSize(50).build();

// Large application, many URLs/prompts
SmartWebFetchTool.builder(chatClient).maxCacheSize(500).build();

Content Length Limits

// Fast responses for short content
SmartWebFetchTool.builder(chatClient).maxContentLength(50_000).build();

// Comprehensive extraction for long content
SmartWebFetchTool.builder(chatClient).maxContentLength(200_000).build();

Retry Strategy

// Fast-fail for time-sensitive operations
SmartWebFetchTool.builder(chatClient).maxRetries(0).build();

// Resilient for unreliable networks
SmartWebFetchTool.builder(chatClient).maxRetries(5).build();

Limitations

  • Read-only: Only HTTP GET requests supported
  • No authentication: Basic auth, OAuth, or API keys not supported in headers
  • No cookies: Stateless requests, no session management
  • No JavaScript: Static HTML only, no dynamic content rendering
  • No redirects to different hosts: Automatically follows same-host redirects only
  • Text content: Optimized for HTML/text, binary content not supported
  • English-focused: AI summarization works best with English content
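
The same-host redirect limitation amounts to comparing the host of the original URL with the host of the redirect target. Here is one hedged way to express that check; RedirectPolicy and isSameHostRedirect are hypothetical names, and the tool's actual comparison (for example, whether it also compares ports or schemes) is not documented here.

```java
import java.net.URI;

final class RedirectPolicy {

    /**
     * Returns true if the Location header points at the same host as the original request.
     * Relative locations are resolved against the original URL first.
     */
    static boolean isSameHostRedirect(String requestUrl, String locationHeader) {
        URI original = URI.create(requestUrl);
        URI target = original.resolve(locationHeader);
        String originalHost = original.getHost();
        return originalHost != null && originalHost.equalsIgnoreCase(target.getHost());
    }

    public static void main(String[] args) {
        System.out.println(isSameHostRedirect("https://example.com/old", "/new"));
        System.out.println(isSameHostRedirect("https://example.com/old", "https://other.example.org/new"));
    }
}
```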

Troubleshooting

"Domain safety check failed"

  • Disable safety checks if fetching internal/trusted URLs
  • Set failOpenOnSafetyCheckError(true) to allow fetch on check errors

"Content too long, truncating"

  • Increase maxContentLength if you need more content
  • Or refine your prompt to extract specific information

"Failed after N attempts"

  • Check network connectivity
  • Verify URL is accessible
  • Increase maxRetries for unreliable connections

Cache not working as expected

  • Remember cache includes both URL AND prompt
  • Check if 15-minute TTL has expired
  • Verify cache hasn't exceeded maxCacheSize (causing eviction)

See Also