Anthropic may explore custom AI inference chips with UK startup Fractile

Justin Nelon
Published on May 4, 2026

Anthropic is reportedly in early talks with UK startup Fractile, a company working on a new chip architecture designed to speed up AI inference while reducing cost. Nothing has been finalized, and Fractile has not yet produced test chips, so this is a developing story rather than a product announcement.

Fractile’s technology is called Memory Compute Fusion Architecture. The basic idea is to reduce how much data needs to move back and forth to external DRAM. Instead, more of the work happens inside the chip using SRAM, which can reduce latency and improve efficiency.

AI companies want faster inference without depending on one chip supplier

Anthropic already uses chips from NVIDIA, Google, and Amazon to run Claude. It has also been linked to Broadcom and AMD as AI compute demand grows. That strategy makes sense because relying on one chipmaker can create supply, pricing, and scaling problems.

Fractile is interesting because it is focused on inference, the stage where a trained model is run to produce outputs. As AI apps become more widely used, inference cost becomes a major issue. Every prompt, tool call, and generated response needs compute.

Company or tech	Role in the report
Anthropic	Reportedly in early talks with Fractile
Fractile	UK startup working on SRAM-based inference architecture
Memory Compute Fusion	Designed to reduce DRAM movement and speed up inference
NVIDIA Groq	Current comparison point for SRAM-based inference acceleration
Claude	Anthropic’s AI model family that needs large-scale compute

Fractile claims its design could deliver up to 100x faster AI inference and reduce costs by 10x compared with NVIDIA’s Groq technology. Those numbers are very ambitious, but they remain claims until silicon exists and independent testing confirms them.

The comparison with NVIDIA Groq matters because Groq-style inference accelerators use large amounts of SRAM and high bandwidth to reduce latency. NVIDIA’s Groq 3 LPU is described as having 500 MB of SRAM, 150 TB per second of SRAM bandwidth, and 2.5 TB per second of scale-up bandwidth. Fractile appears to be pursuing a similar idea of keeping memory close to compute, but with its own architecture.
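To put the quoted Groq 3 LPU figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only divides the numbers reported above, assuming decimal units and peak bandwidth. The 70-billion-parameter model at one byte per parameter is a hypothetical example, not a figure from the report, and the chip count ignores activations, caches, and any real-world overhead.

```python
import math

# Figures quoted in the article for NVIDIA's Groq 3 LPU (decimal units assumed).
SRAM_BYTES = 500e6             # 500 MB of on-chip SRAM
SRAM_BW_BYTES_PER_S = 150e12   # 150 TB/s of SRAM bandwidth

# Hypothetical model size, NOT from the article:
# 70 billion parameters stored at 1 byte each.
HYPOTHETICAL_MODEL_BYTES = 70e9


def sram_sweep_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Time to read the entire SRAM once at peak bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


def chips_needed(model_bytes, capacity_bytes):
    """Minimum chips to hold the weights entirely in SRAM (weights only)."""
    return math.ceil(model_bytes / capacity_bytes)


sweep_us = sram_sweep_seconds(SRAM_BYTES, SRAM_BW_BYTES_PER_S) * 1e6
chips = chips_needed(HYPOTHETICAL_MODEL_BYTES, SRAM_BYTES)

print(f"One full SRAM sweep: {sweep_us:.2f} microseconds")
print(f"Chips to hold the hypothetical model: {chips}")
```

Under these assumptions, a chip can sweep its whole SRAM in a few microseconds, but a large model would have to be spread across many chips. That is why the scale-up bandwidth figure matters alongside the SRAM numbers.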

The bigger trend is clear. AI companies are looking beyond general-purpose GPUs because inference costs grow with every user and every request. Training gets most of the attention, but serving models to millions of people every day can end up mattering even more financially.

For Anthropic, a custom or semi-custom inference path could help lower long-term operating costs and reduce dependence on external suppliers. But this will take time. Fractile still needs test chips, validation, software support, and real deployment data.

For now, this is a sign of where the AI hardware race is heading. The next big fight may not only be who trains the largest model. It may be who can run powerful models cheaply, quickly, and at massive scale.
