• Sibbo@sopuli.xyz
    138 upvotes · 3 days ago

    P-hacking is the academically problematic practice of searching for a question to which the data gives a statistically significant p-value (probability value), as opposed to sound scientific analysis, in which a question is formulated clearly beforehand and then answered with data.

    It took a while to parse this comic, but with the explanation it’s probably much easier to understand for anyone who doesn’t know what P-hacking is.

    • Agent641@lemmy.world
      30 upvotes · 2 days ago

      Reduce the sample size by increasing qualifying parameters until you find a dataset that matches your hypothesis in such a way that the research grant will be approved.

      • psycotica0@lemmy.ca
        14 upvotes · edited · 2 days ago

        Sometimes it's even worse: you collect a raft of data to test one hypothesis, realize it all came up empty, and then go looking for any new hypothesis that happens to match the data you already have.

    • Grail@multiverse.soulism.net
      18 upvotes · 2 days ago

      One use of p-hacking: if you want to prove vaccines are bad, give a bunch of kids vaccines and measure 20 different vital indicators. Then theorise that whichever vital indicator got worse was caused by the vaccines.
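
      A rough sketch of why the 20-indicator trick tends to work. It assumes the conventional 0.05 significance threshold and that the 20 tests are independent (real vital indicators are probably correlated, so treat the number as an approximation). Under a true "no effect" hypothesis each p-value is uniformly distributed, which is what the simulation draws. The names `ALPHA`, `N_INDICATORS`, and `simulate` are made up for illustration.

```python
import random

ALPHA = 0.05        # conventional significance threshold (an assumption)
N_INDICATORS = 20   # number of outcomes measured, as in the comment above

# Analytic chance that at least one of 20 independent null tests
# comes out "significant" purely by luck.
family_wise_rate = 1 - (1 - ALPHA) ** N_INDICATORS


def simulate(trials=20_000, seed=0):
    """Estimate the same chance by simulation.

    Under a true null hypothesis each p-value is uniform on [0, 1],
    so each simulated study draws 20 uniform numbers and checks
    whether the smallest one falls below ALPHA.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_values = [rng.random() for _ in range(N_INDICATORS)]
        if min(p_values) < ALPHA:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"analytic:  {family_wise_rate:.3f}")
    print(f"simulated: {simulate():.3f}")
```

      Under those assumptions the analytic figure is 1 - 0.95^20, about 64%: even if the vaccine does nothing, there is a better-than-even chance that at least one indicator looks "significantly" changed.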

    • Windex007@lemmy.world
      5 upvotes · 2 days ago

      Thanks for that. I’d never heard the term before.

      It sounds a little subjective though? Are there features that can be used to quantify how “P-Hacky” something is?

      I feel like a sports stat of “a team tends to lose if their top-scoring player in the first quarter is injured before the end of the first half” has a lot of specific weirdness, but my intuition says that this specifically could be a very legitimate observation.

      How do you draw the line?

      • 42firehawk@fedinsfw.app
        7 upvotes · 2 days ago

        Usually p-hacking doesn’t come from one constraint, especially a well-explained one. It comes from adding a couple of completely unexplained constraints (like a team losing more if their coach’s wife is in one section of the stands or another), because at that point you’re decreasing the number of samples (the games you have as a reference) to force a significant result.

        So usually for sports, p-hacking is a stat about one team only, rather than a general stat about the sport: preferably a restriction on the other team, then a follow-up game-based restriction so it seems plausible to the viewer.
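
        A minimal sketch of that constraint-stacking, not a real analysis. Every game here is a coin flip, and each game carries eight meaningless yes/no flags (stand-ins for things like "wife sat in section B"). The search then tries every combination of up to three flags and keeps whichever subset makes "the team loses more than half the time" look most significant. The function names, the flag count, and the minimum subset size of 5 are all assumptions for illustration.

```python
import itertools
import math
import random


def binom_tail(k, n, p=0.5):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hunt_for_pattern(n_games=200, n_flags=8, seed=1):
    """Search arbitrary constraint combinations for the 'best' losing streak.

    Games are pure coin flips, so any pattern found is noise by construction.
    Returns the most 'significant' subset and the number of subsets tested.
    """
    rng = random.Random(seed)
    games = [
        {
            "lost": rng.random() < 0.5,
            "flags": [rng.random() < 0.5 for _ in range(n_flags)],
        }
        for _ in range(n_games)
    ]
    best = None
    tested = 0
    for size in (1, 2, 3):
        for combo in itertools.combinations(range(n_flags), size):
            # Keep only the games where every chosen flag is true.
            subset = [g for g in games if all(g["flags"][i] for i in combo)]
            if len(subset) < 5:
                continue  # too few games to bother testing
            losses = sum(g["lost"] for g in subset)
            p = binom_tail(losses, len(subset))
            tested += 1
            if best is None or p < best["p"]:
                best = {
                    "constraints": combo,
                    "games": len(subset),
                    "losses": losses,
                    "p": p,
                }
    return best, tested


if __name__ == "__main__":
    best, tested = hunt_for_pattern()
    print(f"tested {tested} constraint sets; best: {best}")
```

        Each added constraint roughly halves the number of games in the subset, and with up to 92 constraint sets to choose from, the smallest p-value can look impressive even though nothing real is going on.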

        • Windex007@lemmy.world
          1 upvote · 2 days ago

          Ok, that helps. I think you’re saying the issue arises when the set of constraints limits the observed events to a number too small to draw appropriate conclusions from.

          I’m hesitant to shy away from “bizarre” constraints. If there are enough data points for that scenario to draw some statistical correlation… then that just is the reality even if we can’t explain it (yet).

          If the coach’s wife sits in a different section for 20% of the games, and they disproportionately lose when she sits there… that’s the correlation.

          Could be she sits in a further away section if she’s pissed after a fight with her husband the night before, which is a signal the coach also had a bad night, and is fatigued and unfocused during the game now.

          But yeah, you need enough observed instances.