Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.
Shh you’ll pop the bubble if you start talking sensibly. It’s not an ASIC—it’s a specialized piece of hardware optimized to execute a model with unparalleled performance. Now buy my entire stock of them and all the supply for the next two years please.
(Figuring out the compose combination for an emdash took longer than I’d like to admit lol)