
Cloudflare Workers AI Ships Sub-1ms Cold Start for 7B Parameter Models

The New Stack
May 7, 2026
4 min read

TL;DR

Cloudflare Workers AI achieves an 800µs time-to-first-token (TTFT) for 7B-parameter models by caching pre-sharded model layers across 300+ PoPs, priced at $0.011 per 1K tokens.

Key Findings

  • 800µs global time-to-first-token (p50), down from 2,200ms under the prior architecture

  • 7B model shards cached on NVMe at every PoP — eliminates S3 fetch on cold start

  • $0.011/1K tokens: 40% cheaper than AWS Lambda + EFS at 10M req/day

  • Available in 300+ PoPs with automatic geographic load balancing
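The pricing bullet implies a specific baseline cost for the Lambda + EFS setup. As a rough sanity check of the arithmetic (the 500-token average request size and the derived Lambda figure are illustrative assumptions, not numbers from the article):

```typescript
// Back-of-envelope cost check for $0.011/1K tokens at 10M req/day.
// ASSUMPTION: 500 tokens per request on average (not stated in the article).
const REQS_PER_DAY = 10_000_000;
const TOKENS_PER_REQ = 500;
const PRICE_PER_1K_TOKENS = 0.011; // USD, from the article

const tokensPerDay = REQS_PER_DAY * TOKENS_PER_REQ; // 5 billion tokens/day
const workersAiDailyCost = (tokensPerDay / 1_000) * PRICE_PER_1K_TOKENS;

// "40% cheaper" implies the Lambda + EFS baseline costs workersAiDailyCost / 0.6.
const impliedLambdaDailyCost = workersAiDailyCost / (1 - 0.4);

console.log(workersAiDailyCost.toFixed(0)); // "55000"
console.log(impliedLambdaDailyCost.toFixed(0)); // "91667"
```

Under those assumptions, the claimed discount amounts to roughly $37K/day in savings at this traffic level; the real gap would depend on token mix and Lambda memory/EFS provisioning.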

Key Terms

Workers AI · model sharding · cold start · PoP · serverless inference
claude-3-5-haiku-20241022 · May 8, 2026
Model sharding across 300+ PoPs enables true serverless LLM inference — first token in 800µs globally, eliminating the traditional 2-5s cold start penalty.
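For context on what "serverless inference" looks like at the call site: a Worker invokes a model through the runtime-provided AI binding, with no model loading in user code. A minimal sketch, assuming a llama-class 7B model ID (the article does not name one) and typed against a stub binding so it can run outside the Workers runtime:

```typescript
// Minimal sketch of a Workers AI inference call (model ID is an assumption).
interface AiBinding {
  run(model: string, input: { prompt: string }): Promise<{ response: string }>;
}
interface Env {
  AI: AiBinding;
}

// In a deployed Worker the runtime injects `env`; with shards already cached
// on local NVMe, the cold-start model fetch the article describes disappears.
async function handleInference(env: Env, prompt: string): Promise<string> {
  const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", { prompt });
  return result.response;
}

// Stub binding for local testing; replace with the real `env` in a Worker.
const stubEnv: Env = {
  AI: { run: async (_model, input) => ({ response: `echo:${input.prompt}` }) },
};
```

The binding abstracts away shard placement entirely, which is what makes the per-PoP NVMe cache transparent to application code.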