In grad school I worked with MRI data (hence the username). I had to upload ~500GB to our supercomputing cluster, somewhere around 100,000 MRI images, and wrote 20 or so different machine learning algorithms to process them. When all was said and done, I ended up with about 2.5TB on the supercomputer. About 500MB ended up being useful and made it into my thesis.
Don’t stay in school, kids.
Agreed, seems like a no-brainer. Typically this stuff is handled at an institutional level, with bad professors losing or failing to achieve tenure. But some results have much bigger implications than just “Uh oh, I cited that paper and it was a bad one.” Often, entire clinical pipelines are built on bad research, which wastes millions of dollars.
See also the recent scandals in Alzheimer’s research: https://www.science.org/content/article/potential-fabrication-research-images-threatens-key-theory-alzheimers-disease