• Captain Aggravated@sh.itjust.works
        3 hours ago

        Than an entire word?

        Take “cactus” for example. Each letter in the word “cactus” is one unicode character, for a total of six. 🌵 is one unicode character, U+1F335.

        Unicode characters are 4 bytes long, so “cactus” takes 24 bytes to transmit, where “🌵” takes 4. Unless something something UTF-8?

        • onlyhalfminotaur@lemmy.world
          2 hours ago

          You’re close. Unicode characters don’t imply a number of bytes; it’s how they’re encoded that does (UTF-8 most commonly). A character in UTF-8 can take as little as one byte or as many as four, depending on the specific character. I don’t know about emojis, but I imagine they’re in the four-byte section, whereas “asdf” is also four bytes in UTF-8.
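
A quick way to check the one-to-four-byte claim is to encode a few characters and count the bytes. This is only a sketch: the sample characters and the helper name `utf8_len` are picked for illustration and are not from the thread.

```python
# Count how many bytes a string occupies once encoded as UTF-8.
# `utf8_len` is a hypothetical helper name used only for this example.
def utf8_len(text: str) -> int:
    return len(text.encode("utf-8"))

# One illustrative character from each UTF-8 length class.
samples = ["a", "é", "€", "🌵"]

for ch in samples:
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_len(ch)} byte(s)")

# Four ASCII letters cost the same as one four-byte emoji.
print("asdf:", utf8_len("asdf"), "bytes")
```

Code points up to U+007F take one byte, and code points above U+FFFF (where the cactus emoji sits) take four.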

          • Captain Aggravated@sh.itjust.works
            2 hours ago

            So I just looked it up, the UTF-8 encoding for the cactus emoji is 4 bytes long: 0xF0 0x9F 0x8C 0xB5

            The basic Latin alphabet, by contrast, is in the 1-byte region.

            So it takes 6 bytes to transmit “cactus” in UTF-8, and only 4 to transmit “🌵”. So any emoji that replaces 5 or more letters is more efficient. 🍆 breaks even with “dick” or “cock”, is more efficient than “penis”, is twice as compact as “eggplant”, and is more than twice as compact as “aubergine”.
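
The arithmetic above can be checked with a few lines of Python. This is a rough sketch, and `utf8_savings` is a made-up helper name. It assumes the emoji is a single code point; emoji built from several code points (variation selectors, joined sequences) take more than 4 bytes, so the break-even point shifts for those.

```python
# Bytes saved by sending an emoji instead of a word, both encoded as UTF-8.
# `utf8_savings` is a hypothetical helper, not part of any library.
def utf8_savings(word: str, emoji: str) -> int:
    return len(word.encode("utf-8")) - len(emoji.encode("utf-8"))

# Confirm the byte sequence quoted above for the cactus emoji (U+1F335).
cactus_bytes = "🌵".encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in cactus_bytes))

# Word/emoji pairs taken from the discussion.
pairs = [("cactus", "🌵"), ("eggplant", "🍆"), ("aubergine", "🍆")]
for word, emoji in pairs:
    print(f"{word!r} -> {emoji}: saves {utf8_savings(word, emoji)} byte(s)")
```

A result of zero means the word and the emoji break even; a negative result would mean the word is the shorter of the two.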

          • Captain Aggravated@sh.itjust.works
            2 hours ago

            Yes, to be clear I meant the example I gave where the word was replaced with the emoji was compression, not where they give the word and its emoji. That’s as long-handed as possible.