Google Announces GKE Agent Sandbox and Hypercluster at Next '26, Positioning Kubernetes as AI Agent Runtime

InfoQ
Steef-Jan Wiggers
May 7, 2026
3 min read
Google has announced several major updates to Google Kubernetes Engine (GKE) at Cloud Next '26, headlined by GKE Agent Sandbox for secure agent code execution and GKE hypercluster for managing up to a million accelerator chips from a single control plane. Drew Bradstock, senior director of orchestration and Kubernetes product management, and Gari Singh, GKE group product manager, write:

"Kubernetes has rapidly become the operating system for the AI era, with GKE now powering AI workloads for all of our top 50 customers on the platform, including the largest frontier model builders."

The framing reflects a broader industry trend. Multi-agent AI workflows have surged 327% in recent months, according to Databricks, and 66% of organizations now rely on Kubernetes to power generative AI applications and agents, per CNCF data.

GKE Agent Sandbox provides kernel-level isolation for untrusted agent code execution using gVisor, the same sandboxing technology that secures Gemini. Google claims 300 sandboxes per second at sub-second latency and up to 30% better price-performance when running on Axion compared to other hyperscale clouds. Agent Sandbox launched as a Kubernetes SIG Apps subproject at KubeCon NA 2025 and introduces three new Kubernetes primitives: Sandbox (the core workload resource), SandboxTemplate (the security blueprint), and SandboxClaim (a transactional resource through which higher-level frameworks such as ADK or LangChain request execution environments); a sketch of how these primitives compose appears below. Warm pools of pre-provisioned pods keep cold-start latency under one second.

Lovable, whose platform supports 200,000+ new AI-generated projects daily, is running production workloads on Agent Sandbox. Fabian Hedin, co-founder of Lovable, noted:

"GKE's cutting-edge sandboxing capabilities allow us to reliably scale to hundreds of secure sandboxes per second, ensuring we can seamlessly empower builders, even during massive, unpredictable demand."

The agent sandbox space is becoming a three-way competition among isolation approaches. Cloudflare recently shipped Sandboxes GA using container-based isolation on its edge network, alongside V8 isolate-based Dynamic Workers for lighter workloads, while E2B uses Firecracker microVMs. Notably, as Alex Gkiouros, a Google Cloud Ambassador and staff architect, observed, GKE Agent Sandbox is currently the only native agent sandbox offering among the three major hyperscalers. Google's broader bet is that Kubernetes itself should be the agent runtime, with gVisor providing isolation as an open-source Kubernetes primitive rather than a proprietary platform feature. That open-source angle is the key differentiator: any Kubernetes cluster can run Agent Sandbox, not just GKE.

GKE hypercluster, now in private GA, addresses a different scaling problem. As AI training demands grow, organizations fragment their infrastructure into hundreds of disconnected clusters, creating operational overhead. Hypercluster lets a single, conformant GKE control plane manage a million chips distributed across 256,000 nodes spanning multiple regions. Security relies on Google's Titanium Intelligence Enclave, a hardware-attested, "no-admin-access" model in which proprietary model weights and prompts remain cryptographically sealed from platform administrators. Gkiouros flagged a practical concern worth weighing:

"A single GKE control plane managing a million chips across regions sounds wonderful until you think through blast radius and change management. Private GA is the right place for it."
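To make the claim flow concrete, here is a minimal sketch using the official Kubernetes Python client to request a sandbox the way a framework integration might. The API group and version (agents.x-k8s.io/v1alpha1), the resource plural, and the spec fields are assumptions for illustration only; the Agent Sandbox subproject defines the actual schema.

```python
# Hypothetical sketch: requesting an agent execution environment via a
# SandboxClaim, as a higher-level framework (e.g., ADK or LangChain) might.
# ASSUMPTIONS: the group/version "agents.x-k8s.io/v1alpha1", the plural
# "sandboxclaims", and the spec fields below are illustrative, not the
# published schema.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
api = client.CustomObjectsApi()

claim = {
    "apiVersion": "agents.x-k8s.io/v1alpha1",
    "kind": "SandboxClaim",
    "metadata": {"name": "untrusted-code-run-1", "namespace": "agents"},
    "spec": {
        # Reference the security blueprint; a template would pin the gVisor
        # runtime class, resource limits, and network policy (assumed fields).
        "templateRef": {"name": "gvisor-python-tool"},
    },
}

# Create the claim; the controller binds it to a pre-warmed Sandbox pod,
# which is how cold starts stay under a second.
api.create_namespaced_custom_object(
    group="agents.x-k8s.io",
    version="v1alpha1",
    namespace="agents",
    plural="sandboxclaims",
    body=claim,
)
```

If the schema follows this shape, the claim/template split would echo the familiar PersistentVolumeClaim pattern: callers request an environment declaratively, while cluster operators control the isolation policy in the template.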
On the inference side, two improvements ship concrete performance gains. Predictive Latency Boost in GKE Inference Gateway uses ML-driven routing to cut time-to-first-token (TTFT) latency by up to 70%, replacing heuristic guesswork with real-time, capacity-aware scheduling (a toy sketch of the routing idea follows below). The capability is built on llm-d, which recently became an official CNCF Sandbox project. Automatic KV cache storage tiering across RAM, Local SSD, and Google Cloud Storage addresses long-context memory bottlenecks, with Google reporting a 50% throughput gain for 10K-token prompts offloaded to RAM and a nearly 70% throughput improvement for 50K-token prompts offloaded to SSD. Additional updates include RL Scheduler for optimizing reinforcement learning workloads, RL Sandbox for kernel-isolated reward evaluation, and intent-based autoscaling on custom metrics, which cuts HPA reaction times from 25 seconds to 5 seconds by sourcing metrics directly from pods rather than from external monitoring stacks.
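As a toy illustration of what capacity-aware, latency-predictive routing means in practice (not Google's implementation; the replica signals and the linear cost model are invented for illustration), consider a router that scores each backend by predicted TTFT and sends the request to the cheapest one:

```python
# Toy sketch of capacity-aware, latency-predictive routing. NOT Google's
# implementation: the signals and hand-tuned coefficients are invented.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    queue_depth: int             # requests queued ahead of ours
    kv_cache_utilization: float  # fraction of KV-cache memory in use, 0..1

def predict_ttft_ms(r: Replica, prompt_tokens: int) -> float:
    """Predict time-to-first-token from live replica signals.

    A real router would refresh a learned model from recent observations;
    this linear stand-in just makes the scoring concrete.
    """
    queue_cost = 40.0 * r.queue_depth    # waiting behind queued requests
    prefill_cost = 0.25 * prompt_tokens  # prefill work scales with prompt size
    # Stalls grow sharply once the KV cache is nearly full.
    cache_penalty = 3000.0 * max(0.0, r.kv_cache_utilization - 0.9)
    return queue_cost + prefill_cost + cache_penalty

def route(replicas: list[Replica], prompt_tokens: int) -> Replica:
    # Pick the lowest predicted TTFT instead of using round-robin or
    # least-connections heuristics.
    return min(replicas, key=lambda r: predict_ttft_ms(r, prompt_tokens))

pool = [
    Replica("vllm-0", queue_depth=3, kv_cache_utilization=0.95),
    Replica("vllm-1", queue_depth=5, kv_cache_utilization=0.40),
]
print(route(pool, prompt_tokens=10_000).name)  # -> vllm-1 (lower predicted TTFT)
```

A production router like the Inference Gateway would learn its latency model from live traffic rather than hard-coding coefficients; the point is that requests are routed on predicted latency, not connection counts.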