Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • TehPers@beehaw.org
    2 hours ago

    Shh you’ll pop the bubble if you start talking sensibly. It’s not an ASIC—it’s a specialized piece of hardware optimized to execute a model with unparalleled performance. Now buy my entire stock of them and all the supply for the next two years please.

    (Figuring out the compose combination for an emdash took longer than I’d like to admit lol)