• Yeller_king@reddthat.com
    1 day ago

    It’s more a case of a model overfitting the data than p-hacking. But yeah, if you add enough variables to your model, the explained variation in y shoots up. The problem is that none of it is causal.
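
The claim that in-sample explained variation keeps rising as you add variables can be checked with a small simulation. This is a minimal sketch under assumed conditions: the response y and every predictor are pure Gaussian noise with no relationship at all, and the model is plain ordinary least squares with an intercept. The function names, the 30 observations, and the predictor counts are illustrative choices, not anything from the comment.

```python
import random


def ols_r_squared(X, y):
    """Fit y = b0 + X @ b by ordinary least squares and return in-sample R^2.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan
    elimination and partial pivoting (adequate for a small demo).
    """
    n = len(y)
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in A]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r_squared_vs_predictors(n_obs=30, counts=(1, 5, 10, 20, 25), seed=0):
    """R^2 of nested models using the first p columns of pure-noise predictors."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # response is pure noise
    max_p = max(counts)
    X_full = [[rng.gauss(0, 1) for _ in range(max_p)] for _ in range(n_obs)]
    return {p: ols_r_squared([row[:p] for row in X_full], y) for p in counts}


if __name__ == "__main__":
    for p, r2 in r_squared_vs_predictors().items():
        print(f"{p:2d} noise predictors -> R^2 = {r2:.2f}")
```

Under these assumptions R² can only stay flat or rise as noise predictors are added to a nested model (with 30 observations, 29 predictors plus an intercept would fit the data exactly), even though none of them has anything to do with y. Checking the model on held-out data is the usual way to expose this kind of overfit.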

  • Sibbo@sopuli.xyz
    3 days ago

    P-hacking is the academically problematic practice of attempting to come up with a question for which the data offers a significant p-value (probability value), as opposed to correct scientific analysis in which a question is formulated clearly and then answered with data.

    It took a while to parse this comic, but with the explanation it’s probably much easier to understand for anyone who doesn’t know what P-hacking is.

    • Agent641@lemmy.world
      2 days ago

      Reduce the sample size by increasing qualifying parameters until you find a dataset that matches your hypothesis in such a way that the research grant will be approved.

      • psycotica0@lemmy.ca
        2 days ago

        Sometimes it’s even worse: you collect a raft of data to test one hypothesis, realize it all came up empty, and then go looking for any new hypothesis that matches the data you already have.

    • Grail@multiverse.soulism.net
      2 days ago

      One thing p-hacking can be used for: if you want to prove vaccines are bad, give a bunch of kids vaccines and measure 20 different vital indicators. Then theorise that whichever vital indicator got worse was caused by the vaccines.
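
The arithmetic behind the 20-indicator example: if each indicator is tested at the conventional 0.05 significance level and the vaccine truly has no effect, the chance that at least one indicator looks "significant" by luck alone is 1 - 0.95^20, roughly 64%, assuming the 20 tests are independent. A minimal sketch that computes this and checks it by simulation; it relies on the fact that a valid p-value is uniformly distributed when the null hypothesis is true, so each null test is modelled as a uniform random number. The function names are illustrative.

```python
import random


def family_wise_error_rate(m, alpha=0.05):
    """P(at least one false positive among m independent null tests at level alpha)."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_fwer(m, alpha=0.05, trials=50_000, seed=1):
    """Monte Carlo check: under the null, p-values are Uniform(0, 1), so draw
    m of them per trial and count the trials where any falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print(f"exact:     {family_wise_error_rate(20):.3f}")
    print(f"simulated: {simulate_fwer(20):.3f}")
```

With one test the false-positive rate is the nominal 5%; measuring 20 things and reporting whichever moved turns it into better-than-even odds of a spurious finding.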

    • Windex007@lemmy.world
      2 days ago

      Thanks for that. I’d never heard the term before.

      It sounds a little subjective, though? Are there features that can be used to quantify how “P-hacky” something is?

      I feel like a sports stat such as “a team tends to lose if their top-scoring player in the first quarter is injured before the end of the first half” has a lot of specific weirdness, but my intuition says that this particular one could be a very legitimate observation.

      How do you draw the line?
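
One standard, if conservative, way to draw the line is to account for how many hypotheses were tried. The Bonferroni correction tests each of m hypotheses at alpha / m instead of alpha, which by the union bound keeps the chance of any false positive at or below alpha whatever the dependence between tests. A minimal sketch; the p-values in the example are made up for illustration, and `bonferroni_significant` is a hypothetical helper name.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of hypotheses that survive a Bonferroni correction.

    Each of the m p-values is compared with alpha / m, which bounds the
    family-wise error rate by alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


if __name__ == "__main__":
    # Hypothetical: 20 stats were checked and one has p = 0.03.
    p_values = [0.03] + [0.4] * 19
    naive = [i for i, p in enumerate(p_values) if p <= 0.05]
    print("naive (p <= 0.05):", naive)
    print("Bonferroni:", bonferroni_significant(p_values))
```

The catch for sports trivia is that m should be the number of stats that were looked at, not the number that made it on air, and that number is rarely disclosed.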

      • 42firehawk@fedinsfw.app
        2 days ago

        Usually p-hacking doesn’t come from one constraint, especially a well-explained one. Instead it comes from adding a couple of constraints, often completely unexplained ones (like a team losing more depending on which section of the stands the coach’s wife sits in), because at that point you’re decreasing the number of samples (the games you have as a reference) to force a significant result.

        So in sports, p-hacking is usually a stat about one team only, rather than a general stat about the sport. Preferably with a restriction on the opposing team, then a follow-up restriction based on the game, so it seems plausible to the viewer.
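
The "keep adding restrictions until the sample is small enough" pattern can be simulated directly. This is a minimal sketch under hypothetical assumptions: every game is a fair coin flip, and each game also carries a handful of random yes/no attributes standing in for things like "day game" or "coach's wife in section B", none of which affects the result. The code searches all combinations of up to three attributes for the most lopsided win rate.

```python
import itertools
import random


def most_extreme_subgroup(n_games=162, n_conditions=8, max_filters=3,
                          min_games=5, seed=2):
    """Search filter combinations for the most lopsided win rate.

    Hypothetical data: outcomes are fair coin flips and the n_conditions
    attributes per game are random, so any pattern found is pure chance.
    Returns (overall_rate, (deviation, filters, subgroup_size, subgroup_rate)).
    """
    rng = random.Random(seed)
    games = [
        {"win": rng.random() < 0.5,
         "conds": [rng.random() < 0.5 for _ in range(n_conditions)]}
        for _ in range(n_games)
    ]
    overall = sum(g["win"] for g in games) / n_games
    best = (0.0, None, 0, overall)
    for k in range(1, max_filters + 1):
        for idxs in itertools.combinations(range(n_conditions), k):
            for wanted in itertools.product([True, False], repeat=k):
                subset = [g for g in games
                          if all(g["conds"][i] == w for i, w in zip(idxs, wanted))]
                if len(subset) < min_games:
                    continue  # too tiny even to quote on air
                rate = sum(g["win"] for g in subset) / len(subset)
                if abs(rate - 0.5) > best[0]:
                    best = (abs(rate - 0.5), list(zip(idxs, wanted)),
                            len(subset), rate)
    return overall, best


if __name__ == "__main__":
    overall, (_, filters, size, rate) = most_extreme_subgroup()
    print(f"overall win rate: {overall:.2f}")
    print(f"most lopsided slice {filters}: {rate:.2f} over {size} games")
```

Because the outcomes are coin flips, the most extreme slice is typically a small subgroup carved out by several filters, which is exactly the shrinking-sample effect described in the comment.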

        • Windex007@lemmy.world
          2 days ago

          Ok, that helps. I think you’re saying the issue arises when the set of constraints limit the observed events to a number too small to draw appropriate conclusions from.

          I’m hesitant to shy away from “bizarre” constraints. If there are enough data points for that scenario to show some statistical correlation… then that’s just the reality, even if we can’t explain it (yet).

          If the coach’s wife sits in a different section for 20% of the games, and they disproportionately lose when she sits there… that’s the correlation.

          It could be that she sits in a farther-away section when she’s pissed after a fight with her husband the night before, which is a sign that the coach also had a bad night and is fatigued and unfocused during the game.

          But yeah, you need enough observed instances.
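
To put a number on "enough observed instances", here is a one-sided exact binomial test for the coach's-wife scenario. All figures are hypothetical: an 80-game season, she sits in the other section for 16 of them (20%), the team loses 11 of those 16, and its usual loss rate is assumed to be 50%. The same 11/16 loss rate over ten times as many games is included for comparison.

```python
from math import comb


def binomial_tail(k, n, p):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical: 11 losses in 16 games against a 50% baseline loss rate
    print(f"16 games:  p = {binomial_tail(11, 16, 0.5):.3f}")
    # Same 68.75% loss rate, but over 160 games
    print(f"160 games: p = {binomial_tail(110, 160, 0.5):.2g}")
```

Under these assumptions, 11 losses out of 16 is quite plausible as luck (p is about 0.11), while 110 out of 160 is not. The test also assumes the baseline rate is known and that this was the only stat examined; if it was found by trawling through many stats, the multiple-comparisons problem still applies.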

  • shoo@lemmy.world
    2 days ago

    The reaction to sports pseudo-stats is what really separates casual viewers from real fans. It’s the only way to raise stakes on otherwise forgettable games.

    “This team is on a 5 game win streak”: 🥱

    “This player has never lost an away game in June”: 😯🍿

  • schnurrito@discuss.tchncs.de
    2 days ago

    Here on Austrian TV there was an excellent example of this just yesterday, during the match against Argentina. At some random point in the second half, the commentary helpfully told us that in the last four World Cup matches Austria played, they scored a goal during extra time, the implication being that it would probably happen again now (it didn’t)…

    The last four World Cup matches Austria played were… one in 2026 and three in 1998.

  • Crusty@lemmy.world
    2 days ago

    Cricket stats are getting so stupidly specific it has become a meme at this point.

    It basically boils down to something like "Most runs scored by an Indian batsman on a Tuesday while batting first in overcast conditions."
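
The cricket meme is essentially a search over qualifiers: every extra condition ("on a Tuesday", "batting first", "overcast") multiplies the number of categories, and each category gets its own record holder. A minimal sketch with entirely made-up innings data (random players, random scores, random conditions, none related to skill) counts how many different players hold at least one "highest score" record as more qualifiers are allowed. The `QUALIFIERS` table and all parameters are hypothetical.

```python
import itertools
import random

# Hypothetical qualifiers an on-air stat could be sliced by
QUALIFIERS = {
    "weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "batting": ["first", "second"],
    "weather": ["sunny", "overcast"],
    "venue": ["home", "away"],
}


def record_holders(n_players=30, n_innings=600, max_qualifiers=3, seed=3):
    """Map k -> number of distinct players holding a 'highest score' record
    in at least one category that uses at most k qualifiers."""
    rng = random.Random(seed)
    innings = [
        {"player": rng.randrange(n_players),
         "runs": rng.randrange(0, 150),
         **{q: rng.choice(vals) for q, vals in QUALIFIERS.items()}}
        for _ in range(n_innings)
    ]
    results = {}
    holders = set()
    for k in range(max_qualifiers + 1):
        for names in itertools.combinations(QUALIFIERS, k):
            for values in itertools.product(*(QUALIFIERS[n] for n in names)):
                pool = [inn for inn in innings
                        if all(inn[n] == v for n, v in zip(names, values))]
                if pool:
                    holders.add(max(pool, key=lambda inn: inn["runs"])["player"])
        results[k] = len(holders)  # cumulative over categories with <= k qualifiers
    return results


if __name__ == "__main__":
    for k, count in record_holders().items():
        print(f"up to {k} qualifiers: {count} different record holders")
```

Because the scores are pure noise, every additional record holder in this sketch is produced by slicing alone.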

    • Captain Aggravated@sh.itjust.works
      2 days ago

      Reminds me of the show Air Crash Investigation, which you might know as Mayday.

      They did the episode on the collision of two 747s on Tenerife fairly early in the show’s run, so they’d shot the “worst aviation accident in history” wad. And yet the format demanded they quantify the subject’s exact place in history. So they start going “It was the worst aviation accident involving an American-manufactured plane flown by a non-American crew in American airspace to occur on a summer Tuesday.”

        • Captain Aggravated@sh.itjust.works
          1 day ago

          Correct.

          The episode on Tenerife proclaimed it to be “The worst aviation accident in history.”

          Later episodes of the show about different, unrelated accidents had to start undershooting that mark, like “The worst aviation accident in American history” or “The worst aviation accident in Swiss history” until they start talking about a rough landing in a Beech 1900 where everyone got hangnails and it’s “The worst aviation accident at an uncontrolled field to take place during the daytime over a federal holiday weekend involving one twin-engine propeller plane.”

      • Protoknuckles@lemmy.world
        2 days ago

        Not a perfect example, but stuff like this: Tony Gwynn did not strike out in back to back games during the 1992 season

        If you watch a minor league baseball game, the announcers will discuss the game and give out hilariously specific information relevant to it. When a batter comes up, they’ll talk about the last time he faced this pitcher, how he does against left-handed pitchers in general, how this pitcher does against this team, or how this batter does later in the week. All of it could be relevant, but it becomes hyper-specific.

        BTW, I mention minor league baseball because you can watch it for free, as opposed to major league baseball, which costs a crapton of money, but you get the same experience in the major leagues. If you want to try a game, download the MiLB app and check it out. https://play.google.com/store/apps/details?id=com.bamnetworks.mobile.android.gameday.milb

  • tyler@programming.dev
    3 days ago

    I’ve complained about this for years. This is only in America btw. In other countries they just watch and don’t care about the statistics.

    • SeductiveTortoise@piefed.social
      3 days ago

      That’s not entirely true; we Germans love statistics, but more like goal-to-attempt rate or something like that. Received passes, etc. Not whatever they’re trying to do here.

      • yermaw@sh.itjust.works
        3 days ago

        Here in the UK we get a bunch of stats every game, but it’s pretty much the same stats every time, with no fuckery: shots on target (that would have gone in if it wasn’t for that pesky keeper) and stuff.

      • tyler@programming.dev
        2 days ago

        Sorry, what I said was an exaggeration, but yeah, keeping track of shots on goal, passes, etc. is just a basic part of the game, essentially the same as keeping score. Americans don’t do that. They literally keep track of exactly the kind of stuff the xkcd shows. It’s not even an exaggeration.

        • gajahmada@awful.systems
          1 day ago

          Americans don’t do that. They literally keep track of exactly stuff like xkcd shows

          Do they do it with most sports? I’ve only seen it in baseball and American football, probably also basketball, huh.

    • Krauerking@lemy.lol
      2 days ago

      This is only in America btw.

      Literally comments all over this thread talking about Austria and Australian Cricket and Argentina.

      Must be that it is only countries that begin with A and the stats are just working through alphabetically.

          • tyler@programming.dev
            16 hours ago

            … how were there “literally comments all over this thread” talking about other countries when I first commented, if I was the first to comment.

            • Krauerking@lemy.lol
              8 hours ago

              Sure. But it still wasn’t only in America. You came in with an assumption that turned out to be incorrect, and you haven’t acknowledged that.

              You are still trying to keep your opinion while being presented with anecdotes that dispel it.

              I ask if there is more because I don’t think this is a conversation where you should just double down on the part that artificially limits its relatability to something only your select group does, when others are pointing out their personal experience with it. Humans do human things. It’s nice to compare rather than pretend it’s exclusive.

  • Postmortal_Pop@lemmy.world
    2 days ago

    This kind of data recognition is wasted on sports. I’d kill for commentary that tracks a player’s top-deck:whiff ratio during a mid-game dig in Commander. I don’t care about optimized turns to win; I wanna see storm/cascade failure rate and scoop-on-sight %.