• nitroemdash@lemmy.wtf
    18 hours ago

    Tokens are well-defined groups of bytes, ranked by how frequently they occur in text, so that text can be efficiently translated into a sequence of 32- or 64-bit integers: an LLM-optimised form of compression. They are well known, and you can play with them here: https://gpt-tokenizer.dev/
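
To make the "groups of bytes ranked by frequency" idea concrete, here is a minimal sketch of byte-pair encoding (BPE), the general scheme GPT-style tokenizers are based on. It is a toy, not the tokenizer behind the site above: the function names (`train_bpe`, `merge`, `encode`, `decode`) and the sample corpus are made up for illustration, and real tokenizers add pre-splitting rules and far larger vocabularies.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(data: bytes, num_merges: int):
    """Learn up to `num_merges` merges; returns {(left_id, right_id): new_id}."""
    ids = list(data)   # start from raw bytes, i.e. token IDs 0..255
    merges = {}        # dict order doubles as merge priority
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so merging gains nothing
            break
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return merges


def encode(text: str, merges):
    """Turn text into token IDs by applying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token IDs back into the original text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    corpus = ("the cat and the hat and the bat " * 10).encode("utf-8")
    merges = train_bpe(corpus, num_merges=20)
    ids = encode("the cat", merges)
    print(ids, "->", decode(ids, merges))
```

The most frequent byte pairs get their own integer IDs first, so common text shrinks to fewer integers than it has bytes, and decoding is lossless. That is the sense in which it behaves like compression.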