Anthropic is reportedly in early talks with UK startup Fractile, a company working on a new chip architecture designed to speed up AI inference while reducing cost. Fractile has not yet produced test chips, so this is a developing story about an unproven design, not a finished product.
Fractile’s technology is called Memory Compute Fusion Architecture. The basic idea is to reduce how much data needs to move back and forth to external DRAM. Instead, more of the work happens inside the chip using SRAM, which can reduce latency and improve efficiency.
AI companies want faster inference without depending on one chip supplier
Anthropic already uses chips from NVIDIA, Google, and Amazon to run Claude. It has also been linked to Broadcom and AMD as AI compute demand grows. That strategy makes sense because relying on one chipmaker can create supply, pricing, and scaling problems.
Fractile is interesting because it is focused on inference, the stage where an already trained model is run to produce outputs. As AI apps become more widely used, inference cost becomes a major issue, because every prompt, tool call, and generated response needs compute.
| Company or tech | Role in the report |
|---|---|
| Anthropic | Reportedly in early talks with Fractile |
| Fractile | UK startup working on SRAM-based inference architecture |
| Memory Compute Fusion | Designed to reduce DRAM movement and speed up inference |
| NVIDIA Groq | Current comparison point for SRAM-based inference acceleration |
| Claude | Anthropic’s AI model family that needs large scale compute |

Fractile claims its design could deliver up to 100x faster AI inference at one tenth the cost of NVIDIA’s Groq technology. Those numbers are very ambitious, and they remain claims until silicon exists and independent testing confirms them.

The comparison with NVIDIA Groq matters because Groq-style inference accelerators pair large amounts of on-chip SRAM with very high memory bandwidth to reduce latency. NVIDIA’s Groq 3 LPU is described as having 500 MB of SRAM, 150 TB/s of SRAM bandwidth, and 2.5 TB/s of scale-up bandwidth. Fractile appears to be pursuing a similar idea of keeping memory close to compute, but with its own architecture.
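To put the quoted Groq 3 LPU figures in perspective, here is a rough back-of-envelope sketch in Python. It uses only the three numbers cited in the report. The 70-billion-parameter model stored at one byte per parameter is a hypothetical example that does not come from the report, and the function names are illustrative. The sketch ignores compute time, activations, caches, and interconnect topology, so the results are order-of-magnitude estimates only.

```python
import math

# Figures quoted in the report for NVIDIA's Groq 3 LPU.
SRAM_CAPACITY_BYTES = 500e6       # 500 MB of on-chip SRAM
SRAM_BANDWIDTH_BPS = 150e12       # 150 TB/s SRAM bandwidth
SCALE_UP_BANDWIDTH_BPS = 2.5e12   # 2.5 TB/s scale-up bandwidth

# ASSUMPTION (not from the report): a hypothetical model with
# 70 billion parameters stored at 1 byte per parameter.
HYPOTHETICAL_MODEL_BYTES = 70e9


def transfer_time_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move num_bytes at a given bandwidth, ignoring all overheads."""
    return num_bytes / bandwidth_bytes_per_s


def chips_to_hold(model_bytes: float, sram_bytes_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM can hold the model weights."""
    return math.ceil(model_bytes / sram_bytes_per_chip)


# Time to read one chip's entire SRAM once at full SRAM bandwidth.
sram_sweep = transfer_time_seconds(SRAM_CAPACITY_BYTES, SRAM_BANDWIDTH_BPS)

# Time to move the same amount of data over the scale-up link instead.
link_sweep = transfer_time_seconds(SRAM_CAPACITY_BYTES, SCALE_UP_BANDWIDTH_BPS)

# Chips needed to keep the hypothetical model entirely in SRAM.
chips = chips_to_hold(HYPOTHETICAL_MODEL_BYTES, SRAM_CAPACITY_BYTES)

print(f"Full SRAM read at SRAM bandwidth: {sram_sweep * 1e6:.1f} microseconds")
print(f"Same data over scale-up link:     {link_sweep * 1e6:.1f} microseconds")
print(f"Chips to hold hypothetical model: {chips}")
```

Under these assumptions, reading a full 500 MB of SRAM takes about 3.3 microseconds, moving the same data between chips takes about 200 microseconds, and the hypothetical model would span 140 chips. That gap is why SRAM-heavy designs try to keep data next to the compute units, and also why they need many chips to serve a large model.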
The bigger trend is clear. AI companies are looking beyond general-purpose GPUs because inference costs keep growing with usage. Training gets most of the attention, but it happens occasionally, while serving a model to millions of people is a cost paid every day, so inference can end up mattering even more financially.
For Anthropic, a custom or semi-custom inference path could help lower long-term operating costs and reduce dependence on any single external supplier. But this will take time. Fractile still needs test chips, validation, software support, and real deployment data.
For now, this is a sign of where the AI hardware race is heading. The next big fight may not only be who trains the largest model. It may be who can run powerful models cheaply, quickly, and at massive scale.


