OpenAI's Codex-Spark hits 1,000 tokens per second
OpenAI's Codex-Spark hits 1,000+ tokens per second on Cerebras' wafer-scale chips. Plus: China's AI labs unleash a Spring Festival model blitz, Samsung ships the first commercial HBM4, and xAI loses half its founding team.
From The Bit Baker newsletter — February 14, 2026
Good morning, Dave. OpenAI just showed what happens when you ditch the Nvidia playbook. Its new Codex-Spark model, running on Cerebras' wafer-scale chips, fires out code at over 1,000 tokens per second — fast enough to build a working snake game before you finish reading this sentence.
But the speed is only half the story. OpenAI is branching out its chip strategy, and Cerebras just landed its biggest customer yet. Meanwhile, China's AI labs went on a tear during Lunar New Year, Samsung started shipping next-gen memory, and Elon Musk's xAI is hemorrhaging talent faster than it's shipping products.
In today's Bit Baker:
- OpenAI's Codex-Spark hits 1,000 tokens/sec on Cerebras chips
- China's AI labs unleash a Spring Festival model blitz
- Samsung ships the world's first commercial HBM4
- xAI loses half its founding team
OpenAI's Codex-Spark Hits 1,000 Tokens Per Second on Cerebras Chips
The Bit Baker: OpenAI dropped GPT-5.3-Codex-Spark, a stripped-down coding model running on Cerebras' Wafer-Scale Engine 3. It cranks out over 1,000 tokens per second — enough to build a snake game in nine seconds flat.
Unpacked:
- Codex-Spark is a distilled version of GPT-5.3-Codex that trades reasoning depth for raw speed, hitting 50% faster time-to-first-token and 30% lower per-token costs compared to the full model.
- This is OpenAI's first custom chip partnership — Cerebras' WSE-3 keeps everything in massive on-chip SRAM instead of shuttling data to external GPU memory, which is what makes the latency so absurdly low for targeted edits and syntax fixes.
- The research preview is rolling out to ChatGPT Pro subscribers through the Codex app, CLI, and IDE extensions — 128k context window, text only.
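The snake-game claim is simple arithmetic: nine seconds at 1,000 tokens per second is roughly 9,000 tokens of code. A quick sketch of what that throughput means in practice (the 100 tokens/sec comparison rate is an illustrative assumption for a conventionally served model, not a measured figure):

```python
# Back-of-envelope math on streaming speed. Rates below are assumptions
# for illustration, not OpenAI or Cerebras benchmarks.

def generation_time(total_tokens: int, tokens_per_sec: float,
                    time_to_first_token: float = 0.0) -> float:
    """Seconds to stream a completion of `total_tokens` tokens."""
    return time_to_first_token + total_tokens / tokens_per_sec

snake_game_tokens = 9_000      # ~9 s of output at 1,000 tok/s
spark_speed = 1_000.0          # reported Codex-Spark rate on Cerebras
assumed_gpu_speed = 100.0      # hypothetical rate for a large GPU-served model

print(f"At 1,000 tok/s: {generation_time(snake_game_tokens, spark_speed):.0f} s")
print(f"At   100 tok/s: {generation_time(snake_game_tokens, assumed_gpu_speed):.0f} s")
```

A 9-second wait feels interactive; a 90-second wait is a coffee break, which is why the same task reads as "real-time pair programming" at the higher rate.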
Bottom line: OpenAI just proved it can build outside the Nvidia box. And at 1,000 tokens per second, AI coding stops being a "generate and wait" workflow and starts looking like real-time pair programming. Cerebras calls this a "first milestone," which means more custom silicon deals are coming.
China's AI Labs Unleash a Spring Festival Model Blitz
The Bit Baker: Chinese tech giants turned the Lunar New Year holiday into a product launch window, led by Alibaba's RynnBrain — a 30-billion-parameter robotics AI that set 16 new records on embodied AI benchmarks, beating Google's Gemini Robotics-ER and Nvidia's Cosmos-Reason2.
Unpacked:
- RynnBrain runs on a mixture-of-experts architecture with only 3 billion active parameters, fusing vision, language, and physical action so robots can build spatial memory and learn from watching — and the whole thing is open-source on Hugging Face.
- ByteDance shipped Seedance 2.0 for video generation alongside a massive AI giveaway blitz tied to the CCTV Spring Festival Gala, while Zhipu AI released GLM-5, an open-source model touting record-low hallucination rates.
- iFlytek's Spark X2 was trained entirely on Chinese-made chips for education and healthcare, and Alibaba's Qwen-Image 2.0 is powering the IOC's coverage of the 2026 Milan Winter Olympics.
Bottom line: A year after DeepSeek rattled the industry, China's AI labs are shipping at a pace you can't wave away. An open-source robotics model from Alibaba that beats Western benchmarks? That tells you something about both ambition and execution.
Samsung Ships the Industry's First Commercial HBM4
The Bit Baker: Samsung began mass production and shipping of the world's first commercial HBM4 memory chips — delivering 3.3 TB/s of bandwidth per stack, 2.7 times more than the HBM3E powering most AI data centers today.
Unpacked:
- Each stack runs at an 11.7 Gbps per-pin transfer speed (scalable to 13 Gbps) with 24-48GB capacity through 12- to 16-layer stacking, plus 40% better power efficiency and 10% improved thermal resistance over HBM3E.
- The chips target Nvidia's upcoming Vera Rubin GPUs, due in Q2 2026, and Samsung expects its HBM sales to triple this year versus 2025.
- Samsung isn't running unopposed — Micron started shipping its own HBM4 within days, creating a two-horse race in AI memory while SK Hynix readies its own entry.
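The per-stack bandwidth follows from pin speed times interface width. Assuming the 2048-bit per-stack interface defined in the JEDEC HBM4 standard, 11.7 Gbps works out to about 3.0 TB/s, which suggests the headline 3.3 TB/s figure reflects the scaled 13 Gbps speed:

```python
# Bandwidth arithmetic for an HBM4 stack. The 2048-bit bus width is an
# assumption based on the JEDEC HBM4 spec, not a Samsung-confirmed figure.

def stack_bandwidth_tbps(pin_gbps: float, bus_width_bits: int = 2048) -> float:
    """Per-stack bandwidth in TB/s from per-pin speed in Gbps."""
    return pin_gbps * bus_width_bits / 8 / 1000  # Gbit/s -> GB/s -> TB/s

print(f"{stack_bandwidth_tbps(11.7):.1f} TB/s")  # shipping per-pin speed
print(f"{stack_bandwidth_tbps(13.0):.1f} TB/s")  # scaled to 13 Gbps
```

The same arithmetic explains the generational jump: HBM4 doubled the bus width over HBM3E's 1024 bits, so bandwidth grows even before pin speeds do.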
Bottom line: AI models keep getting bigger and hungrier, and memory bandwidth has been the chokepoint. HBM4 shipping ahead of Nvidia's next-gen GPUs means the infrastructure is finally keeping pace with what the models need. And with three memory makers competing, pricing pressure should benefit everyone building AI systems.
xAI's Founding Team Is Falling Apart
The Bit Baker: Elon Musk's xAI has lost half its original cofounders, with at least 11 engineers walking out in the past week — including reasoning team lead Tony Wu and research/safety head Jimmy Ba.
Unpacked:
- Six of xAI's original 12 cofounders are now gone, with Wu and Ba announcing exits on February 10 and 11; other departures include former OpenAI researcher Vahid Kazemi plus engineers Simon Zhai, Shayan Salehian, and Andrew Ma.
- At an all-hands meeting, Musk framed the exits as a deliberate reorganization "to improve speed of execution." Some departing staff painted a different picture, pointing to friction over autonomy and strategic direction.
- The wave follows xAI's all-stock merger with SpaceX, a deal valuing SpaceX at roughly $1 trillion and xAI at $250 billion — widely viewed as prep work for a potential $1.5 trillion IPO.
Bottom line: Losing half your founding team right after a mega-merger doesn't square with "planned reorganization." Whether these exits clear the path for faster execution or signal something more troubling, the timing throws a wrench into Musk's ambitions for the biggest IPO in history.
The Shortlist
Meta broke ground on a $10 billion AI data center in Indiana as it scrambles to match OpenAI and Google's compute capacity for training and serving large models.
Germany approved its implementation of the EU AI Act, becoming one of the first big economies to formally adopt the regulation — setting up enforcement across industries that deploy high-risk AI.
ByteDance is building its own AI inference chips and aims to produce 100,000 units this year, joining Alibaba and Baidu in a broader drive to cut Chinese tech's dependence on foreign chipmakers.
Software companies are scrambling to rebrand as AI companies — SaaStr changed its name, executives are swapping titles to "Chief AI Officer," and the entire enterprise software industry is pivoting hard.
The Trump administration is drafting voluntary pacts with OpenAI, Microsoft, Amazon, Google, Meta, and Oracle to make sure AI data centers cover their full power and water costs without squeezing household utility supplies.
References
- OpenAI — Introducing GPT-5.3-Codex-Spark
- Cerebras Blog — OpenAI Codex-Spark
- TechCrunch — A new version of OpenAI's Codex is powered by a new dedicated chip
- Tom's Hardware — OpenAI launches GPT-5.3-Codex-Spark on Cerebras chips
- ServeTheHome — OpenAI GPT-5.3-Codex-Spark on Cerebras
- Gadgets360 — OpenAI GPT-5.3-Codex-Spark details
- AI News — Alibaba RynnBrain Physical AI Robots
- Interesting Engineering — Alibaba RynnBrain Humanoid Robot AI
- Techzine — Alibaba open-source RynnBrain
- SCMP — Alibaba Unveils RynnBrain
- CNBC — New China AI models
- Global Times — China AI models Spring Festival
- Xinhua — Qwen-Image 2.0 Olympics
- GuruFocus — ByteDance AI Giveaway Blitz
- WinCountry — Chinese AI models Spring Festival factbox
- Samsung Newsroom — Samsung ships industry-first commercial HBM4
- The Register — Samsung and Micron start shipping HBM4
- KED Global — Korean chipmakers HBM4
- TechXplore — Samsung mass production AI memory
- Business Insider — xAI key departures list
- TechCrunch — Musk suggests xAI exits were push not pull
- Observer — xAI founding member exodus
- Fortune — Half of xAI founding team has left
- Morningstar — Meta $10B data center Indiana
- City News Service — ByteDance AI chips
- NYT — Software companies AI rebrand
- FGS Global — AI data center energy pacts