SUNYA Energy

OpenAI partners with Cerebras

January 15, 2026
SUNYA SUMMARY
- OpenAI partners with Cerebras to add 750 MW of ultra-low-latency AI compute to its platform.
- Cerebras builds purpose-built AI systems with massive compute, memory, and bandwidth on a single chip, eliminating traditional hardware bottlenecks.
- The integration aims to significantly speed up AI responses for tasks like question answering, code generation, image creation, and AI agent operations.
- Low-latency capacity will be incorporated into OpenAI’s inference stack in phases, expanding across workloads.
- The partnership improves real-time AI responsiveness, enabling more natural interactions and higher-value applications.
- OpenAI’s strategy is to build a resilient compute portfolio that matches systems to workloads, adding Cerebras as a dedicated low-latency inference solution.
- Cerebras’ CEO compares the advance to how broadband transformed the internet, emphasizing real-time inference’s potential to transform AI.
- Capacity will come online in multiple tranches through 2028, reinforcing OpenAI’s scaling efforts.
PRESS RELEASE
OpenAI partners with Cerebras

OpenAI is partnering with Cerebras to add 750 MW of ultra-low-latency AI compute to our platform.

January 14, 2026

Cerebras builds AI systems purpose-built to accelerate long outputs from AI models. Its unique speed comes from putting massive compute, memory, and bandwidth together on a single giant chip and eliminating the bottlenecks that slow inference on conventional hardware.

Integrating Cerebras into our mix of compute solutions is all about making our AI respond much faster. When you ask a hard question, generate code, create an image, or run an AI agent, there is a loop happening behind the scenes: you send a request, the model thinks, and it sends something back. When AI responds in real time, users do more with it, stay longer, and run higher-value workloads.
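To make that loop concrete, here is a minimal sketch of how a developer might measure it using the official OpenAI Python SDK with streaming enabled. The model name and prompt are illustrative placeholders only; nothing here is drawn from the announcement itself.

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.perf_counter()
first_token_at = None

# Stream the response so latency is visible token by token.
stream = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name, not specified in the announcement
    messages=[{"role": "user", "content": "Summarize wafer-scale inference in one paragraph."}],
    stream=True,
)

for chunk in stream:
    # Some chunks carry no text (e.g., role headers), so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # request sent -> first visible output
        print(chunk.choices[0].delta.content, end="", flush=True)

end = time.perf_counter()
if first_token_at is not None:
    print(f"\n\ntime to first token: {first_token_at - start:.2f}s")
print(f"total response time: {end - start:.2f}s")
```

Time to first token is the part of the loop users feel most directly, and it is the figure that dedicated low-latency inference hardware is meant to shrink.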

We will integrate this low-latency capacity into our inference stack in phases, expanding across workloads.

“OpenAI’s compute strategy is to build a resilient portfolio that matches the right systems to the right workloads. Cerebras adds a dedicated low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation to scale real-time AI to many more people,” said Sachin Katti of OpenAI.

“We are delighted to partner with OpenAI, bringing the world’s leading AI models to the world’s fastest AI processor. Just as broadband transformed the internet, real-time inference will transform AI, enabling entirely new ways to build and interact with AI models,” said Andrew Feldman, co-founder and CEO of Cerebras.

The capacity will come online in multiple tranches through 2028.