12 January 2026
Shwartz predicts a shift from AI model training to large-scale inference, along with increased adoption of autonomous cloud operations and a rising demand for specialized site reliability engineering (SRE) practices tailored to AI.
Shwartz explains that traditional web and microservices infrastructure is now being challenged by the unique demands of AI in production environments. AI models require sustained, predictable access to GPU resources, higher throughput, and tighter cost management. These pressures are already manifesting within large Kubernetes clusters at major cloud providers, especially as enterprises layer AI and generative AI services onto existing systems.
The operational impact, Shwartz notes, will primarily fall on SRE and platform engineering teams. As large Kubernetes estates scale to support AI workloads, talent shortages and competitive pressures are prompting organizations to develop "AI SRE" roles: small teams working alongside automated systems and ML models to manage routine operations. This approach aims to maintain agility and innovation while reducing human overhead.
To succeed, organizations will need standardized operational data, including consistent telemetry, shared event formats, and APIs that enable safe automation. Shwartz anticipates a gradual shift from human-in-the-loop automation towards more autonomous cloud operations, as trust in AI-assisted tooling grows. Enterprises that embrace automation policies, audit trails, and governance frameworks will be better positioned to scale these capabilities responsibly.
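As a rough illustration of what that standardization could look like in practice (not anything Shwartz specifies), the Python sketch below models a shared event format and a policy-gated remediation step that writes an audit trail; every name in it (OpsEvent, AutomationPolicy, remediate) is hypothetical.

```python
# Hypothetical sketch: a normalized operational event plus a policy gate
# that decides whether a remediation runs autonomously or waits for a human.
# All names are illustrative, not a real API.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import json

@dataclass
class OpsEvent:
    source: str          # emitting system, e.g. "cluster-autoscaler"
    kind: str            # shared vocabulary, e.g. "gpu.saturation.high"
    severity: str        # "info" | "warn" | "critical"
    resource: str        # affected object, e.g. "nodepool/gpu-a100"
    attributes: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

@dataclass
class AutomationPolicy:
    # Event kinds the automation may act on without human approval.
    autonomous_kinds: set

audit_log = []  # in production this would be durable, append-only storage

def remediate(event: OpsEvent, policy: AutomationPolicy) -> str:
    """Act autonomously if policy allows, else queue for human review."""
    decision = ("auto" if event.kind in policy.autonomous_kinds
                else "needs-human-approval")
    audit_log.append({"event": event.__dict__, "decision": decision})
    return decision

policy = AutomationPolicy(autonomous_kinds={"pod.restart.crashloop"})
evt = OpsEvent("sre-agent", "gpu.saturation.high", "warn", "nodepool/gpu-a100")
print(remediate(evt, policy))        # -> needs-human-approval
print(json.dumps(audit_log[-1], indent=2))
```

The point of the gate is the trust gradient Shwartz describes: the set of autonomously handled event kinds can grow over time while every decision, automated or deferred, lands in the same auditable record.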
New job scheduling systems are also on the horizon. Existing distributed schedulers are ill-equipped for the bursty, GPU-centric workloads of AI, HPC, and emerging quantum computing tasks. Shwartz highlights the rise of cloud-native job queueing solutions like Kueue, which are designed to handle high-performance, multi-tenant environments more effectively, and expects their adoption to accelerate as demand for AI/ML and HPC resources intensifies.
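Kueue itself is configured through Kubernetes custom resources such as ClusterQueue and LocalQueue rather than application code, but the core admission idea is easy to sketch. The toy Python below is not Kueue's API; it just shows quota-aware, multi-tenant queueing in which a job waits until its tenant's GPU quota can absorb it and is then admitted whole:

```python
# Toy sketch of quota-aware job queueing in the spirit of Kueue (not its
# actual API): hold jobs until the submitting tenant's GPU quota can
# accommodate them, then admit each job in full, never partially.
from collections import deque
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    tenant: str
    gpus: int  # GPUs the job needs for its entire lifetime

class QuotaQueue:
    def __init__(self, quotas: dict[str, int]):
        self.quotas = quotas                   # per-tenant GPU quota
        self.in_use = {t: 0 for t in quotas}   # GPUs currently granted
        self.pending = deque()

    def submit(self, job: Job):
        self.pending.append(job)

    def admit(self) -> list[Job]:
        """Admit pending jobs whose tenant still has free quota (FIFO)."""
        admitted, still_pending = [], deque()
        while self.pending:
            job = self.pending.popleft()
            if self.in_use[job.tenant] + job.gpus <= self.quotas[job.tenant]:
                self.in_use[job.tenant] += job.gpus
                admitted.append(job)
            else:
                still_pending.append(job)  # stays queued; no partial admission
        self.pending = still_pending
        return admitted

q = QuotaQueue({"team-a": 8, "team-b": 4})
for j in [Job("train-1", "team-a", 8), Job("tune-1", "team-b", 2),
          Job("train-2", "team-a", 4)]:
    q.submit(j)
print([j.name for j in q.admit()])  # -> ['train-1', 'tune-1']; train-2 waits
```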
The Kubernetes scheduler itself will need to evolve to support complex AI workflows. Currently, pods are scheduled individually, but AI training and inference often require groups of pods to start simultaneously and share GPU and network resources, a concept known as "gang scheduling." Ongoing community efforts, such as Kubernetes Enhancement Proposal 4671, aim to embed native support for this in future Kubernetes versions, enabling more efficient and workload-specific scheduling.
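The gang-scheduling idea can be sketched in a few lines; the Python below uses hypothetical names and bears no relation to the actual KEP API. A pod group is placed only if every member fits at once; otherwise nothing is bound and the whole group keeps waiting, which avoids deadlocks where a half-started job holds GPUs it cannot use:

```python
# Minimal sketch of gang scheduling: bind a group of pods only if every
# member can be placed at once; otherwise bind nobody and leave cluster
# state untouched. Hypothetical helper, not the KEP-4671 design.
def gang_schedule(pod_gpu_needs: list[int], node_free_gpus: dict[str, int]):
    """Return {pod_index: node} if the whole gang fits, else None."""
    free = dict(node_free_gpus)          # tentative copy; commit only on success
    placement = {}
    for i, need in enumerate(pod_gpu_needs):
        # first-fit: pick any node with enough free GPUs for this pod
        node = next((n for n, g in free.items() if g >= need), None)
        if node is None:
            return None                  # one pod can't fit -> admit nobody
        free[node] -= need
        placement[i] = node
    node_free_gpus.update(free)          # all fit: commit the reservations
    return placement

nodes = {"node-a": 4, "node-b": 4}
print(gang_schedule([4, 4], nodes))      # -> {0: 'node-a', 1: 'node-b'}
print(gang_schedule([4, 4], nodes))      # -> None (cluster now full)
print(nodes)                             # failed gang left capacity untouched
```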
Shwartz also points to increasing pressure on GPU capacity and utilization. As organizations seek greater efficiency, GPU overprovisioning will become a more visible operational challenge. He recommends that platform teams treat GPU efficiency as a reliability concern, setting clear service level objectives, monitoring fragmentation and saturation, and integrating these metrics into autoscalers and admission controls to optimize resource utilization.
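One way to make such metrics concrete, under assumed definitions that are not from Shwartz: treat saturation as allocated GPUs over total GPUs, and fragmentation as the share of free GPUs stranded on nodes with too few free devices to host a typical job, then compare both against SLO thresholds before feeding them to autoscaling or admission logic.

```python
# Illustrative sketch of GPU efficiency as a reliability signal. The
# metric definitions and SLO thresholds here are assumptions for the
# sake of the example, not prescribed anywhere in the source.
def gpu_efficiency(node_total: dict[str, int], node_alloc: dict[str, int],
                   typical_job_gpus: int) -> dict[str, float]:
    total = sum(node_total.values())
    allocated = sum(node_alloc.values())
    free_per_node = {n: node_total[n] - node_alloc[n] for n in node_total}
    # free capacity stuck in slivers too small to host a typical job
    stranded = sum(f for f in free_per_node.values()
                   if 0 < f < typical_job_gpus)
    free = total - allocated
    return {
        "saturation": allocated / total,
        "fragmentation": stranded / free if free else 0.0,
    }

SLO = {"saturation_max": 0.85, "fragmentation_max": 0.30}

m = gpu_efficiency({"n1": 8, "n2": 8, "n3": 8}, {"n1": 7, "n2": 7, "n3": 0},
                   typical_job_gpus=4)
print(m)  # saturation ~0.58; fragmentation 0.2 (2 of 10 free GPUs stranded)
breaches = [k for k in ("saturation", "fragmentation")
            if m[k] > SLO[f"{k}_max"]]
print(breaches or "within SLO")  # breaches would gate autoscaling/admission
```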
Finally, Shwartz foresees consolidation across cloud infrastructure tooling. As with cloud security, where multiple point products are merging into unified platforms, he predicts similar trends in operational tools for observability, cost management, tracing, and troubleshooting. This consolidation aims to reduce cognitive load for teams, streamline workflows, and improve overall operational coherence.
Looking ahead, organizations that standardize telemetry, experiment with innovative scheduling approaches, and embed GPU efficiency into their SRE practices will be better equipped to manage the explosive growth of AI workloads across Kubernetes and cloud environments by 2026.



