• phiresky@lemmy.world
    link
    fedilink
    arrow-up
    1
    ·
    3 days ago

    compressed size is more important than speed of compression

    Yes, but decompression speed is even more important, no? My internet connection gets 40MByte/s and my ssd 500+MB/s, so if my decompressor runs at <40MB/s it’s slowing down my updates / boot time and it would be better to use a worse compression.

    Arch - since 2021 for kernel images https://archlinux.org/news/moving-to-zstandard-images-by-default-on-mkinitcpio/ and since 2019 for packages https://lists.archlinux.org/pipermail/arch-dev-public/2019-December/029739.html

    brotli is mainly good because it basically has a huge dictionary that includes common http headers and html structures so those don’t need to be part of the compressed file. I would assume without testing that zstd would more clearly win against brotli if you’d train a similar dictionary for it or just include a random WARC file into --patch-from.

    Cloudflare started supporting zstd and is using it as the default since 2024 https://blog.cloudflare.com/new-standards/ citing compression speed as the main reason (since it does this on the fly). It’s been in chrome since 2021 https://chromestatus.com/feature/6186023867908096

    The RFC mentions dictionaries but they are not currently used:

    Actually this is already considered in RFC-8878 [0]. The RFC reserves zstd frame dictionary ids in the ranges: <= 32767 and >= (1 << 31) for a public IANA dictionary registry, but there are no such dictionaries published for public use yet. [0]: https://datatracker.ietf.org/doc/html/rfc8878#iana_dict

    And there is a proposed standard for how zstd dictionaries could be served from a domain https://datatracker.ietf.org/doc/rfc9842/

    it’s better in every metric

    Let me revise that statement to - it’s better in every metric (compression speed, compressed size, feature set, most importantly decompression speed) compared to all other compressors I’m aware of, apart from xz and bz2 and potentially other non-lz compressors in the best compression ratio aspect. And I’m not sure whether it beats lzo/lz4 in the very fast levels (negative numbers on zstd).

    that struck me as weird about what you were saying

    What struck me as weird about what you were kind of calling it AI hype crap, when they are developing this for their own use and publishing it (not to make money). I’m kind of assuming this based on how much work they put into open sourcing the zstd format and how deeply it is now used in much FOSS which does not care at all for facebook. The format they are introducing uses explicitly structured data formats to guide a compressor - a structure which can be generated from a struct or class definition, and yes potentially much easier by an LLM, but I don’t think that is hooey. So I assumed you had no idea what you were talking about.

    • PhilipTheBucket@piefed.social
      link
      fedilink
      English
      arrow-up
      1
      ·
      3 days ago

      Let me revise that statement to - it’s better in every metric (compression speed, compressed size, feature set, most importantly decompression speed) compared to all other compressors I’m aware of, apart from xz and bz2 and potentially other non-lz compressors in the best compression ratio aspect.

      Your Cloudflare post literally says “a new compression algorithm that we have found compresses data 42% faster than Brotli while maintaining almost the same compression levels.” Yes, I get that in some circumstances where compression speed is important, this might be very useful. I don’t see the point in talking further in circles anymore, thank you for the information.

      • phiresky@lemmy.world
        link
        fedilink
        arrow-up
        1
        ·
        edit-2
        3 days ago

        like I said, brotli contains a large dictionary for web content / http which means you can’t compare it directly to other compressors when looking at web content. the reason they do a comparison like that is because hardcoded dictionaries are not part of the zstd compression content-encoding because it is iffy.