Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • BlameThePeacock@lemmy.ca · 11 hours ago

    A lot of AI hallucinations can be resolved by automatically running the results through additional prompts, then checking the various results against each other or against reference material.

    Many agentic systems already do that with a limited number of follow-up/check steps, but they’re often restricted by acceptable response times or just sheer cost.
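
    The sample-and-cross-check idea above can be sketched as a simple self-consistency vote: query the model several times and keep the majority answer, using the agreement ratio as a rough hallucination signal. This is a minimal illustration, not any particular product's pipeline; `generate` stands in for whatever model call you actually use.

```python
import collections

def self_consistency(generate, prompt, n=5):
    """Sample `generate(prompt)` n times and return the majority answer.

    Returns (answer, agreement) where agreement is the fraction of
    samples that agreed; low agreement flags a result worth re-checking
    against reference material or a follow-up prompt.
    """
    answers = [generate(prompt) for _ in range(n)]
    best, votes = collections.Counter(answers).most_common(1)[0]
    return best, votes / n

# Deterministic stand-in for a flaky model, for demonstration only.
responses = iter(["42", "42", "17", "42", "42"])
answer, agreement = self_consistency(lambda p: next(responses), "What is 6 * 7?")
print(answer, agreement)
```

    In a real agent loop you would gate on the agreement score: accept high-agreement answers directly, and spend extra prompts (the "check steps" above) only on the low-agreement ones.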

    I managed to get Copilot in Excel to run a 43-prompt chain in just a little under 10 minutes the other day. The result was exactly what I needed.

    If you have 73 times the throughput, you can potentially afford that kind of processing at an acceptable response time and cost.
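
    A rough back-of-envelope for that claim, taking the headline numbers and the 10-minute chain at face value, and assuming chain latency scales linearly with token throughput (real chains also include tool calls and network time, so this is optimistic):

```python
chain_seconds = 10 * 60        # the 43-prompt chain took ~10 minutes
speedup = 17000 / 233          # HC1 vs H200 tokens/sec from the headline
projected = chain_seconds / speedup
print(f"~{speedup:.0f}x speedup -> same chain in ~{projected:.1f} s")
```

    Under those assumptions the 10-minute chain drops to roughly 8 seconds, which is the difference between a batch job and an interactive feature.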