
Cloudflare Workers AI Ships Sub-1ms Cold Start for 7B Parameter Models

The New Stack
May 7, 2026
4 min read

TL;DR

Cloudflare Workers AI achieves an 800µs time-to-first-token (TTFT) for 7B-parameter models by caching pre-sharded model layers across 300+ PoPs, priced at $0.011 per 1K tokens.

Key Findings

  • 800µs global time-to-first-token (p50), down from 2,200ms under the prior architecture

  • 7B model shards cached on NVMe at every PoP — eliminates S3 fetch on cold start

  • $0.011/1K tokens: 40% cheaper than AWS Lambda + EFS at 10M req/day

  • Available in 300+ PoPs with automatic geographic load balancing
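The pricing bullet implies a specific baseline cost for the Lambda + EFS setup. As a rough sanity check of the arithmetic (the 500-token average request size and the derived Lambda figure are illustrative assumptions, not numbers from the article):

```typescript
// Back-of-envelope cost check for $0.011/1K tokens at 10M req/day.
// ASSUMPTION: 500 tokens per request on average (not stated in the article).
const REQS_PER_DAY = 10_000_000;
const TOKENS_PER_REQ = 500;
const PRICE_PER_1K_TOKENS = 0.011; // USD, from the article

const tokensPerDay = REQS_PER_DAY * TOKENS_PER_REQ; // 5 billion tokens/day
const workersAiDailyCost = (tokensPerDay / 1_000) * PRICE_PER_1K_TOKENS;

// "40% cheaper" implies the Lambda + EFS baseline costs workersAiDailyCost / 0.6.
const impliedLambdaDailyCost = workersAiDailyCost / (1 - 0.4);

console.log(workersAiDailyCost.toFixed(0)); // "55000"
console.log(impliedLambdaDailyCost.toFixed(0)); // "91667"
```

Under those assumptions, the claimed discount amounts to roughly $37K/day in savings at this traffic level; the real gap would depend on token mix and Lambda memory/EFS provisioning.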

Key Terms

Workers AI · model sharding · cold start · PoP · serverless inference
claude-3-5-haiku-20241022 · May 8, 2026
Model sharding across 300+ PoPs enables true serverless LLM inference — first token in 800µs globally, eliminating the traditional 2-5s cold start penalty.
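For context on what "serverless inference" looks like at the call site: a Worker invokes a model through the runtime-provided AI binding, with no model loading in user code. A minimal sketch, assuming a llama-class 7B model ID (the article does not name one) and typed against a stub binding so it can run outside the Workers runtime:

```typescript
// Minimal sketch of a Workers AI inference call (model ID is an assumption).
interface AiBinding {
  run(model: string, input: { prompt: string }): Promise<{ response: string }>;
}
interface Env {
  AI: AiBinding;
}

// In a deployed Worker the runtime injects `env`; with shards already cached
// on local NVMe, the cold-start model fetch the article describes disappears.
async function handleInference(env: Env, prompt: string): Promise<string> {
  const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", { prompt });
  return result.response;
}

// Stub binding for local testing; replace with the real `env` in a Worker.
const stubEnv: Env = {
  AI: { run: async (_model, input) => ({ response: `echo:${input.prompt}` }) },
};
```

The binding abstracts away shard placement entirely, which is what makes the per-PoP NVMe cache transparent to application code.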