Skymizer’s HTX301 PCIe AI card claims 700B model inference with 384GB memory at 240W

Justin Nelon
Published on May 8, 2026


Skymizer has announced the HTX301, a PCIe AI accelerator card designed to run large language models locally without needing a large GPU cluster.

The Taiwan-based company says the HTX301 can run inference for 700B-parameter models on a single PCIe card. If the claims hold up in independent testing, the card could appeal to companies that want on-premises AI without the cost, power draw, and complexity of large accelerator systems.

The HTX301 is built on Skymizer’s HyperThought platform and uses the company’s next-generation LPU IP. It is aimed at LLM inference rather than broad GPU compute, with a focus on decode acceleration, prefill and decode orchestration, low latency, and fixed infrastructure costs.

Each PCIe card uses six HTX301 chips and includes up to 384GB of memory. Skymizer is not using HBM, GDDR, or LPDDR5X here. The card uses standard LPDDR4 and LPDDR5 memory, which helps keep power and cost lower.
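The 700B and 384GB figures together imply a tight budget for how many bits each weight can occupy. The sketch below works through that arithmetic. It assumes decimal gigabytes (1GB = 10^9 bytes) and a dense model whose weights all sit in card memory, neither of which Skymizer has confirmed, and the function names are illustrative only.

```python
# Back-of-envelope check: how many bits per weight fit in the card's memory?
# Assumptions (not from Skymizer): 1 GB = 1e9 bytes, dense model, all weights resident.

def bits_per_param_budget(memory_gb: float, params_billion: float) -> float:
    """Upper bound on average bits per weight if weights alone filled memory."""
    memory_bits = memory_gb * 1e9 * 8
    return memory_bits / (params_billion * 1e9)


def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone at a given precision, in GB."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    budget = bits_per_param_budget(memory_gb=384, params_billion=700)
    print(f"Budget: {budget:.2f} bits per parameter")

    for bits in (16, 8, 4):
        need = weight_footprint_gb(700, bits)
        verdict = "fits" if need <= 384 else "does not fit"
        print(f"{bits:>2}-bit weights: {need:.0f} GB ({verdict} in 384 GB)")
```

Under these assumptions the budget comes to roughly 4.4 bits per parameter, so a 700B model would need about 4-bit weights (around 350GB), leaving only a few tens of gigabytes for the KV cache and activations.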

The power figure is one of the biggest claims. Skymizer says the card runs at around 240W, which is much lower than high-end PCIe AI accelerators such as AMD’s Instinct MI350P or NVIDIA’s RTX PRO 6000 Blackwell server card.

Here is a quick look at the HTX301:

Feature	Details
Product	Skymizer HTX301
Type	PCIe AI accelerator
Main use	Local LLM inference
Maximum model class	Up to 700B parameters
Chips per card	6 HTX301 chips
Memory	Up to 384GB
Memory type	LPDDR4 and LPDDR5
Power	Around 240W
Platform	HyperThought with next-generation LPU IP
Target market	On-premises AI and enterprise inference

Skymizer also says its LPU design is efficient enough to reach 30 tokens per second with only 0.5 TOPS and 100GB/s of bandwidth. For Llama 2 7B prefill, the company claims an octa-core LPU can reach 240 tokens per second, with multi-chip scaling reaching up to 1,200 tokens per second.
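A rough way to sanity-check the decode figure is to treat token generation as bandwidth-bound, where each generated token requires reading the model weights once. The sketch below applies that simplification to the quoted numbers. Skymizer does not say which model the 30 tokens per second figure refers to, so the 7B model size used here is an assumption, as are the function names.

```python
# Bandwidth-bound decode estimate.
# Assumptions (not from Skymizer): each token reads all weights once,
# 1 GB = 1e9 bytes, and the 30 tokens/s figure refers to a 7B-parameter model.

def bytes_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Bytes of memory traffic available per generated token."""
    return bandwidth_gb_s * 1e9 / tokens_per_s


def implied_bits_per_param(bandwidth_gb_s: float, tokens_per_s: float,
                           params_billion: float) -> float:
    """Average bits per weight implied if that traffic is all weight reads."""
    return bytes_per_token(bandwidth_gb_s, tokens_per_s) * 8 / (params_billion * 1e9)


def scaling_factor(multi_chip_tps: float, single_tps: float) -> float:
    """Ratio between the multi-chip and single-LPU prefill claims."""
    return multi_chip_tps / single_tps


if __name__ == "__main__":
    traffic = bytes_per_token(bandwidth_gb_s=100, tokens_per_s=30)
    print(f"Traffic per token: {traffic / 1e9:.2f} GB")

    bits = implied_bits_per_param(100, 30, params_billion=7)
    print(f"Implied precision for a 7B model: {bits:.2f} bits per parameter")

    print(f"Prefill scaling: {scaling_factor(1200, 240):.1f}x")
```

Under these assumptions, 100GB/s at 30 tokens per second leaves about 3.3GB of traffic per token, which would correspond to a 7B model stored at a little under 4 bits per weight. The prefill claims imply a 5x gain from multi-chip scaling, though the announcement does not say how many chips that figure uses.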

The company is also using compression to reduce memory and bandwidth pressure. Its weight compression is claimed to outperform llama.cpp by 9 to 17.8 percent, while KV cache compression is said to keep perplexity loss low.

The main appeal is clear. Many businesses want to run AI locally for privacy, predictable latency, and control over data. A single PCIe card that can handle very large models at 240W would be much easier to deploy than a rack full of high-power GPUs.

There is still reason to be cautious. These are company claims, and the HTX301 needs independent testing before it can be judged against established GPU-based systems. It also appears focused on inference, not full-scale model training.

Skymizer plans to show the HTX301 at Computex. Until real benchmarks arrive, it is best viewed as a promising on-premises AI accelerator that could matter if its 700B model, memory, and power claims are proven in practice.
