• Blue_Morpho@lemmy.world
    23 hours ago

    have undoubtedly improved but LLMs are using the same open source libraries and tools available to anyone…

    I read a surprising article on Lemmy just a week ago explaining that this isn’t how LLMs do OCR. LLMs convert images into tokens and then treat them like text input. I can’t see how it works, but it does. It’s why they are better than classic OCR neural nets, but at the trade-off of an enormously larger computation cost.
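
For anyone wondering what "convert images into tokens" might look like, here is a rough sketch, assuming a ViT-style patch scheme. I don't know what any particular LLM actually does; `image_to_patch_tokens`, the patch size, and the random projection matrix are all made up for illustration (a real model would learn the projection).

```python
import random


def image_to_patch_tokens(image, patch_size, embed_dim, seed=0):
    """Split a grayscale image (a list of pixel rows) into square patches
    and project each flattened patch to an embed_dim-long vector."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")

    rng = random.Random(seed)
    patch_len = patch_size * patch_size
    # Hypothetical stand-in for a learned projection matrix (embed_dim x patch_len).
    weights = [[rng.uniform(-1, 1) for _ in range(patch_len)]
               for _ in range(embed_dim)]

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch into a single list of pixel values.
            patch = [image[top + dy][left + dx]
                     for dy in range(patch_size)
                     for dx in range(patch_size)]
            # One "image token" = linear projection of the flattened patch.
            tokens.append([sum(w * p for w, p in zip(row, patch))
                           for row in weights])
    return tokens


# Toy 8x8 image with 4x4 patches: 2 x 2 = 4 tokens, each of length 3.
image = [[(x + y) % 256 for x in range(8)] for y in range(8)]
tokens = image_to_patch_tokens(image, patch_size=4, embed_dim=3)
print(len(tokens), len(tokens[0]))
```

In a scheme like this the token count grows with image area: a hypothetical 1024×1024 page cut into 16×16 patches would become 64 × 64 = 4096 tokens before any text is produced, which fits with the compute cost being much higher than a dedicated OCR net.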