SmartWebFetchTool¶
AI-powered web content fetching and summarization tool with intelligent caching and safety features. Fetches web pages, converts HTML to Markdown, and uses AI to extract relevant information based on a user prompt.
Features:
- HTML to Markdown conversion for clean content processing
- 15-minute TTL cache for faster repeated access to the same URLs
- Automatic retry with exponential backoff on network failures and 5xx errors
- Optional domain safety checking via Claude's domain info API
- Configurable content length limits with automatic truncation
- Fail-open/fail-closed security modes for safety check errors
- Proper charset detection and handling
- Thread-safe concurrent cache access
Overview¶
The SmartWebFetchTool retrieves content from URLs and processes it using an AI model for intelligent summarization. Unlike simple HTTP clients, it:
1. Fetches HTML content using HTTP GET
2. Converts HTML to clean Markdown format
3. Uses AI to extract information relevant to your specific prompt
4. Caches results for 15 minutes to avoid redundant requests
5. Automatically retries on transient failures
This tool implements AutoCloseable for proper resource cleanup.
Basic Usage¶
// Build with required ChatClient
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build();
// Fetch and summarize web content
String result = webFetch.webFetch(
"https://docs.spring.io/spring-ai/reference/",
"What are the key features of Spring AI?"
);
System.out.println(result);
// Output: "Spring AI provides integration with various AI models including..."
Builder Configuration¶
Required Parameters¶
chatClient - The ChatClient instance used for AI-powered summarization
Optional Parameters¶
| Option | Default | Description |
|---|---|---|
| maxContentLength | 100,000 | Maximum characters to process; longer content is truncated and a warning is logged |
| domainSafetyCheck | true | Enable domain safety verification before fetching |
| failOpenOnSafetyCheckError | true | Allow the fetch (true) or block it (false) when the safety check itself errors |
| maxCacheSize | 100 | Maximum number of URL+prompt combinations to cache |
| maxRetries | 2 | Maximum retry attempts for transient network failures and 5xx errors |
Example with all options:
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
.maxContentLength(150_000) // Process up to 150,000 characters
.domainSafetyCheck(true) // Check domain safety
.failOpenOnSafetyCheckError(true) // Allow fetch if safety check errors
.maxCacheSize(200) // Cache up to 200 entries
.maxRetries(3) // Retry up to 3 times
.build();
Configuration Details¶
Max Content Length¶
Controls the maximum number of characters processed from the fetched content. Content exceeding this limit is truncated and a warning is logged.
SmartWebFetchTool.builder(chatClient)
.maxContentLength(50_000) // For small articles
.build();
SmartWebFetchTool.builder(chatClient)
.maxContentLength(200_000) // For long documentation
.build();
Use Cases:
- Smaller limits (50K-100K): Blog posts, news articles
- Medium limits (100K-150K): Technical documentation
- Larger limits (150K-200K): Comprehensive guides, API references
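The truncation step can be pictured with a minimal sketch. This is not the tool's actual implementation: the class `TruncationSketch` and its `truncate` method are hypothetical, and `System.err` stands in for whatever logger the tool really uses.

```java
/** Illustrative only: shows length-based truncation with a logged warning. */
final class TruncationSketch {

    /**
     * Returns the content unchanged when it fits, otherwise its first
     * maxContentLength characters (counted as Java chars, i.e. UTF-16 units).
     */
    static String truncate(String content, int maxContentLength) {
        if (maxContentLength < 0) {
            throw new IllegalArgumentException("maxContentLength must be >= 0");
        }
        if (content.length() <= maxContentLength) {
            return content;
        }
        // Stand-in for a logger warning.
        System.err.println("Content too long, truncating from "
                + content.length() + " to " + maxContentLength + " characters");
        return content.substring(0, maxContentLength);
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdef", 3));
    }
}
```

Because the limit is measured in characters rather than bytes or tokens, pages with a lot of markup may lose more visible text than the number suggests.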
Domain Safety Check¶
Verifies domain safety using Claude's domain info API before fetching content.
// Enable safety checks (default)
SmartWebFetchTool.builder(chatClient)
.domainSafetyCheck(true)
.build();
// Disable for trusted internal URLs
SmartWebFetchTool.builder(chatClient)
.domainSafetyCheck(false)
.build();
When to disable:
- Internal company documentation
- Localhost development servers
- Known trusted domains in controlled environments
Fail-Open vs Fail-Closed¶
Controls what happens when the domain safety check itself errors, for example because the safety service is unavailable. This is distinct from a check that completes and reports the domain as unsafe.
// Fail-open: Allow fetch if safety check errors (default, more permissive)
SmartWebFetchTool.builder(chatClient)
.failOpenOnSafetyCheckError(true)
.build();
// Fail-closed: Block fetch if safety check errors (more secure)
SmartWebFetchTool.builder(chatClient)
.failOpenOnSafetyCheckError(false)
.build();
Security Trade-offs:
- Fail-open (true): Better availability, accepts risk if safety service is down
- Fail-closed (false): Better security, blocks all fetches if safety service fails
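The distinction between a failed check and an errored check can be sketched as a small decision function. The enum `CheckOutcome` and the method `shouldFetch` are hypothetical names used only for illustration; the real tool may structure this differently.

```java
/** Illustrative only: how fail-open and fail-closed differ in effect. */
final class SafetyDecisionSketch {

    /** Possible results of a safety check (hypothetical names). */
    enum CheckOutcome { SAFE, UNSAFE, ERROR }

    /**
     * A completed check decides on its own; the fail-open flag only
     * matters when the check could not be performed.
     */
    static boolean shouldFetch(CheckOutcome outcome, boolean failOpenOnSafetyCheckError) {
        switch (outcome) {
            case SAFE:
                return true;
            case UNSAFE:
                return false; // an unsafe verdict blocks in both modes
            case ERROR:
                return failOpenOnSafetyCheckError;
            default:
                throw new IllegalStateException("Unknown outcome: " + outcome);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldFetch(CheckOutcome.ERROR, true));
        System.out.println(shouldFetch(CheckOutcome.ERROR, false));
    }
}
```

The flag never overrides an unsafe verdict, which is why fail-open trades away protection only while the safety service is unavailable.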
Max Retries¶
Configures automatic retry attempts for network failures and 5xx server errors with exponential backoff.
SmartWebFetchTool.builder(chatClient)
.maxRetries(0) // No retries, fail immediately
.build();
SmartWebFetchTool.builder(chatClient)
.maxRetries(2) // Default: retry twice (3 total attempts)
.build();
SmartWebFetchTool.builder(chatClient)
.maxRetries(5) // Aggressive retries for unreliable networks
.build();
Backoff Strategy:
- Attempt 1: Immediate
- Attempt 2: Wait 1 second
- Attempt 3: Wait 2 seconds
- Attempt 4: Wait 4 seconds
- Attempt N (N >= 2): Wait 2^(N-2) seconds
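The per-attempt waits listed above can be reproduced with a short helper. This is a sketch under assumptions, not the tool's code: `BackoffSketch` and `delayMillisBeforeAttempt` are hypothetical names, and the cap on the exponent is an added safeguard the documentation does not mention.

```java
/** Illustrative only: mirrors the documented wait times per attempt. */
final class BackoffSketch {

    /**
     * Delay before the given 1-based attempt: attempt 1 runs immediately,
     * attempt 2 waits 1s, attempt 3 waits 2s, attempt 4 waits 4s, and so on.
     */
    static long delayMillisBeforeAttempt(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        if (attempt == 1) {
            return 0L;
        }
        // Assumed cap so very large attempt numbers cannot overflow.
        int exponent = Math.min(attempt - 2, 20);
        return 1000L << exponent;
    }

    public static void main(String[] args) {
        int maxRetries = 2; // documented default: 3 attempts in total
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + delayMillisBeforeAttempt(attempt) + " ms");
        }
    }
}
```

With the default of 2 retries, the worst case adds about 3 seconds of waiting (1 s + 2 s) on top of the request time itself.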
Caching Behavior¶
The tool caches results to improve performance and reduce redundant network requests.
Cache Key Structure¶
Cache keys include both the URL and the prompt, so the same URL fetched with a different prompt is cached separately.
Example:
// These create DIFFERENT cache entries
webFetch.webFetch("https://example.com", "What is the main topic?");
webFetch.webFetch("https://example.com", "List all features");
// This reuses the FIRST cache entry (same URL + prompt)
webFetch.webFetch("https://example.com", "What is the main topic?");
Time-To-Live (TTL)¶
- TTL: 15 minutes per cache entry
- Cleanup: Automatic when cache size exceeds maxCacheSize
- Thread Safety: Concurrent access is safe
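To make the URL+prompt keying and the TTL concrete, here is a minimal sketch of such a cache. It is an assumption-laden illustration rather than the tool's implementation: `TtlCacheSketch` is a hypothetical class, the current time is passed in explicitly so the behaviour is easy to test, and size-based cleanup is left out.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a thread-safe cache keyed by URL and prompt with a TTL. */
final class TtlCacheSketch {

    /** Cached value plus the time it was stored. */
    private static final class Entry {
        final String value;
        final long storedAtMillis;

        Entry(String value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<List<String>, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCacheSketch(Duration ttl) {
        this.ttlMillis = ttl.toMillis();
    }

    /** Composite key: the same URL with a different prompt is a different entry. */
    private static List<String> key(String url, String prompt) {
        return List.of(url, prompt);
    }

    void put(String url, String prompt, String value, long nowMillis) {
        entries.put(key(url, prompt), new Entry(value, nowMillis));
    }

    /** Returns the cached value, or null if absent or at least as old as the TTL. */
    String get(String url, String prompt, long nowMillis) {
        List<String> key = key(url, prompt);
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis - entry.storedAtMillis >= ttlMillis) {
            entries.remove(key, entry); // drop the expired entry
            return null;
        }
        return entry.value;
    }
}
```

In real use the caller would pass `System.currentTimeMillis()` and construct the cache with `Duration.ofMinutes(15)` to match the documented TTL.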
Cache Management¶
// Configure cache size
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
.maxCacheSize(500) // Cache up to 500 URL+prompt combinations
.build();
// Cache is automatically cleared on close
try (SmartWebFetchTool tool = SmartWebFetchTool.builder(chatClient).build()) {
// Use tool
}
// Cache cleared here
Error Handling¶
In the scenarios below, the tool reports failures as descriptive messages in the returned string.
Common Error Scenarios¶
Invalid URL:
webFetch.webFetch("not-a-url", "Summarize");
// Returns: "Error: Invalid URL format. Please provide a fully-formed URL (e.g., https://example.com)"
Empty URL:
Network Error:
webFetch.webFetch("https://nonexistent-domain-xyz123.com", "Summarize");
// Returns: "Error fetching URL: Network error while fetching URL: ..."
HTTP Error:
webFetch.webFetch("https://example.com/404-page", "Summarize");
// Returns: "Error: Failed to fetch URL. HTTP status code: 404"
Domain Safety Failure:
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
.domainSafetyCheck(true)
.build();
webFetch.webFetch("https://unsafe-domain.com", "Summarize");
// Returns: "Domain safety check failed for URL 'https://unsafe-domain.com': The domain is not safe to fetch content from."
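Since the examples above show failures arriving as ordinary return values, a caller may want to tell them apart from summaries. The helper below is a hypothetical sketch that relies only on the message prefixes shown in this section; the tool does not document a stable error format, so treat the prefixes as an assumption.

```java
/** Illustrative only: prefix-based detection of the error messages shown above. */
final class FetchResultSketch {

    /**
     * Returns true when the result looks like one of the documented
     * error messages rather than a summary.
     */
    static boolean looksLikeError(String result) {
        return result == null
                || result.startsWith("Error")
                || result.startsWith("Domain safety check failed");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeError("Error: Failed to fetch URL. HTTP status code: 404"));
    }
}
```

A summary that itself begins with the word "Error" would be misclassified, so this check is a heuristic at best.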
Retry Behavior¶
The tool automatically retries on:
- Network errors (IOException)
- Server errors (5xx status codes)
It does NOT retry on:
- 4xx client errors (bad request, not found, unauthorized, etc.)
- Invalid URL format
- Failed domain safety checks
- Interrupted requests
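The retry rules above amount to a simple classification, sketched here with hypothetical names (`RetryPolicySketch`, `isRetryableStatus`, `isRetryableFailure`). It assumes that "interrupted requests" surface as `InterruptedException`; the tool's real checks may be more detailed.

```java
import java.io.IOException;

/** Illustrative only: classifies failures as retryable or not. */
final class RetryPolicySketch {

    /** Only 5xx server errors are retried; 4xx client errors are final. */
    static boolean isRetryableStatus(int statusCode) {
        return statusCode >= 500 && statusCode <= 599;
    }

    /**
     * Network-level I/O failures are retried. InterruptedException is not
     * an IOException, so interrupted requests fall through to false.
     */
    static boolean isRetryableFailure(Throwable failure) {
        return failure instanceof IOException;
    }

    public static void main(String[] args) {
        System.out.println(isRetryableStatus(503));
        System.out.println(isRetryableStatus(404));
    }
}
```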
Resource Management¶
The tool implements AutoCloseable for proper cleanup.
Try-with-Resources (Recommended)¶
try (SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build()) {
String result = webFetch.webFetch(url, prompt);
System.out.println(result);
}
// Cache automatically cleared, resources released
Manual Cleanup¶
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build();
try {
String result = webFetch.webFetch(url, prompt);
} finally {
webFetch.close(); // Clear cache
}
Integration Examples¶
Spring Boot Configuration¶
@Configuration
public class ToolsConfig {
@Bean
public SmartWebFetchTool smartWebFetchTool(ChatClient.Builder chatClientBuilder) {
ChatClient chatClient = chatClientBuilder.build();
return SmartWebFetchTool.builder(chatClient)
.maxContentLength(150_000)
.domainSafetyCheck(true)
.failOpenOnSafetyCheckError(true)
.maxCacheSize(100)
.maxRetries(2)
.build();
}
}
ChatClient Integration¶
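// The tool needs its own ChatClient for summarization; it cannot reference
// the chatClient variable that is still being initialized
ChatClient summarizerClient = chatClientBuilder.clone().build();
ChatClient chatClient = chatClientBuilder
.defaultTools(SmartWebFetchTool.builder(summarizerClient)
.domainSafetyCheck(false) // Disable for internal docs
.maxRetries(3) // More retries for reliability
.build())
.build();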
// AI can now use web fetch automatically
String response = chatClient.prompt()
.user("Search for Spring AI documentation and tell me about vector stores")
.call()
.content();
Custom Prompts¶
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient).build();
// Extract specific information
String features = webFetch.webFetch(
"https://spring.io/projects/spring-ai",
"List all supported AI model providers"
);
// Compare content
String comparison = webFetch.webFetch(
"https://example.com/product-a",
"What are the pricing tiers and features for each tier?"
);
// Technical analysis
String analysis = webFetch.webFetch(
"https://github.com/spring-projects/spring-ai",
"What programming languages and frameworks are used in this project?"
);
Advanced Use Cases¶
Multiple Sources¶
SmartWebFetchTool webFetch = SmartWebFetchTool.builder(chatClient)
.maxCacheSize(500) // Large cache for multiple URLs
.build();
String[] urls = {
"https://docs.spring.io/spring-ai/reference/",
"https://docs.spring.io/spring-boot/reference/",
"https://docs.spring.io/spring-framework/reference/"
};
for (String url : urls) {
String summary = webFetch.webFetch(url, "What are the main features?");
System.out.println("Summary for " + url + ":\n" + summary + "\n");
}
Different Prompts Same URL¶
String url = "https://example.com/api-docs";
// Cache miss - fetches and caches
String overview = webFetch.webFetch(url, "Provide an overview");
// Cache miss - different prompt, fetches again
String endpoints = webFetch.webFetch(url, "List all API endpoints");
// Cache hit - same URL and prompt
String overview2 = webFetch.webFetch(url, "Provide an overview");
Internal Documentation¶
// Optimized for internal trusted sources
SmartWebFetchTool internalWebFetch = SmartWebFetchTool.builder(chatClient)
.domainSafetyCheck(false) // Skip safety for internal URLs
.maxRetries(1) // Fewer retries for fast network
.maxContentLength(200_000) // Large docs expected
.build();
String docs = internalWebFetch.webFetch(
"http://internal-docs.company.local/api-spec",
"Summarize the authentication requirements"
);
Security Considerations¶
Domain Safety API¶
The tool uses Claude's domain info API (https://claude.ai/api/web/domain_info) to verify domain safety before fetching.
Safety Check Process:
1. Extract domain from URL
2. Query Claude's API with domain
3. Receive can_fetch boolean response
4. Block or allow based on response and configuration
Disable if:
- Fetching from trusted internal domains
- Behind corporate firewall with controlled access
- Using allowlist of known-safe domains
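Step 1 of the safety check process, extracting the domain from the URL, can be sketched with `java.net.URI`. The class `DomainExtractionSketch` and its method are hypothetical, and lower-casing the host is an assumption made here for consistent comparison, not documented behaviour.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Illustrative only: pulls the host out of a URL string. */
final class DomainExtractionSketch {

    /** Returns the lower-cased host, or null when the input has no parseable host. */
    static String extractDomain(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(extractDomain("https://docs.spring.io/spring-ai/reference/"));
    }
}
```

Inputs such as `not-a-url` parse as relative references with no host, so they yield null here instead of reaching the safety service.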
Read-Only Operations¶
The tool only performs HTTP GET requests and does not:
- Modify any local files
- Send data to fetched URLs (except HTTP headers)
- Execute JavaScript or active content
- Store credentials or sensitive data
User-Agent and Headers¶
Standard browser headers are sent for compatibility:
- User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)...
- Accept: text/html,application/xhtml+xml,application/xml
- Accept-Language: en-US,en;q=0.5
Performance Tips¶
Optimize Cache Size¶
// Small application, limited URLs
SmartWebFetchTool.builder(chatClient).maxCacheSize(50).build();
// Large application, many URLs/prompts
SmartWebFetchTool.builder(chatClient).maxCacheSize(500).build();
Content Length Limits¶
// Fast responses for short content
SmartWebFetchTool.builder(chatClient).maxContentLength(50_000).build();
// Comprehensive extraction for long content
SmartWebFetchTool.builder(chatClient).maxContentLength(200_000).build();
Retry Strategy¶
// Fast-fail for time-sensitive operations
SmartWebFetchTool.builder(chatClient).maxRetries(0).build();
// Resilient for unreliable networks
SmartWebFetchTool.builder(chatClient).maxRetries(5).build();
Limitations¶
- Read-only: Only HTTP GET requests supported
- No authentication: Basic auth, OAuth, or API keys not supported in headers
- No cookies: Stateless requests, no session management
- No JavaScript: Static HTML only, no dynamic content rendering
- No redirects to different hosts: Automatically follows same-host redirects only
- Text content: Optimized for HTML/text, binary content not supported
- English-focused: AI summarization works best with English content
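The same-host redirect limitation can be illustrated with a small check. This is a guess at the shape of such a rule, not the tool's code: `RedirectPolicySketch` and `isSameHostRedirect` are hypothetical, and the sketch compares host names only, ignoring scheme and port.

```java
import java.net.URI;

/** Illustrative only: decides whether a redirect stays on the original host. */
final class RedirectPolicySketch {

    /**
     * Resolves the Location value against the original URL (so relative
     * redirects work) and compares hosts case-insensitively.
     */
    static boolean isSameHostRedirect(URI original, URI location) {
        URI target = original.resolve(location);
        String originalHost = original.getHost();
        return originalHost != null && originalHost.equalsIgnoreCase(target.getHost());
    }

    public static void main(String[] args) {
        URI original = URI.create("https://example.com/old");
        System.out.println(isSameHostRedirect(original, URI.create("/new")));
        System.out.println(isSameHostRedirect(original, URI.create("https://other.example.org/new")));
    }
}
```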
Troubleshooting¶
"Domain safety check failed"¶
- Disable safety checks if fetching internal/trusted URLs
- Set failOpenOnSafetyCheckError(true) to allow fetch on check errors
"Content too long, truncating"¶
- Increase maxContentLength if you need more content
- Or refine your prompt to extract specific information
"Failed after N attempts"¶
- Check network connectivity
- Verify URL is accessible
- Increase maxRetries for unreliable connections
Cache not working as expected¶
- Remember cache includes both URL AND prompt
- Check if 15-minute TTL has expired
- Verify cache hasn't exceeded maxCacheSize (causing eviction)
See Also¶
- FileSystemTools - For file operations
- ShellTools - For shell command execution
- BraveWebSearchTool - For web search capabilities