Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, another programmer released a Discord tool called “Searchcord,” built on a different data set, that shows non-anonymized chat histories.

  • Gibibit@lemmy.world
    1 day ago

    Yeah, the fact that this is just as easy on BB forums, or on literally any webpage with a public comment section, was my first thought as well…

    Isn’t most of the internet scraped anyway, by the Internet Archive? The concerning part is that this is 100% going to be used to train some coomer-brained AI. Scraping, botting, scamming: all of those things are going to happen in large public communities.

    • Melvin_Ferd@lemmy.world
      23 hours ago

      Yeah, a lot of this push is about ushering in new laws to prevent data scraping.

      Propaganda spreads easily through fake accounts—but how do we detect large-scale operations if they’re constantly creating and deleting accounts or trying to blend in with the rest of us? We’d need access to massive data sets to mine for patterns and expose coordinated behavior.

      But the powers that benefit from shaping the narrative are the same ones pushing the idea that all scraping is bad. They want people to hate it, so they can justify laws that lock down access. That’s the end game.