Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • boonhet@sopuli.xyz
    2 hours ago

    Can’t be that cheap, unfortunately, if they maxed out the die area. Though it is on an older node, so maybe it's not as expensive as flagship GPU chips and shit.