Epstein Files Jan 30, 2026
Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.
Please seed all torrent files to distribute and preserve this data.
Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK
Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK
Epstein Files Data Set 9 (Incomplete). Only contains 49 GB of 180 GB. Multiple reports of the DOJ server cutting off downloads at offset 48995762176.
ORIGINAL JUSTICE DEPARTMENT LINK
SHA1: 6ae129b76fddbba0776d4a5430e71494245b04c4
/u/susadmin’s More Complete Data Set 9 (96.25 GB)
De-duplicated merger of the 45.63 GB and 86.74 GB versions
- TORRENT MAGNET LINK (removed due to reports of CSAM)
Epstein Files Data Set 10 (78.64 GB)
ORIGINAL JUSTICE DEPARTMENT LINK
SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502
Epstein Files Data Set 11 (25.55 GB)
ORIGINAL JUSTICE DEPARTMENT LINK
SHA1: 574950c0f86765e897268834ac6ef38b370cad2a
Epstein Files Data Set 12 (114.1 MB)
ORIGINAL JUSTICE DEPARTMENT LINK
SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
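If you want to double-check a download against the hashes above, a minimal sketch in Python (using Data Set 12 as the example; the local filename is a placeholder):

```python
import hashlib

# Placeholder path - substitute whatever the downloaded zip is named on your disk.
PATH = "DataSet 12.zip"

# Hashes published above for Data Set 12.
EXPECTED = {
    "sha1": "20f804ab55687c957fd249cd0d417d5fe7438281",
    "md5": "b1206186332bb1af021e86d68468f9fe",
    "sha256": "b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2",
}

hashers = {name: hashlib.new(name) for name in EXPECTED}
with open(PATH, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):  # stream in 1 MB chunks
        for h in hashers.values():
            h.update(chunk)

for name, expected in EXPECTED.items():
    actual = hashers[name].hexdigest()
    print(f"{name}: {'OK' if actual == expected else 'MISMATCH'} ({actual})")
```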
This list will be edited as more data becomes available, particularly with regard to Data Set 9.
For those curious, here’s the NYTimes article where they report on the CSAM in the publicly-released files: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html (behind a paywall).
NYTimes says that they discovered the CSAM on Friday and notified the DOJ on Saturday, and the DOJ was diligent in removing the files NYTimes had flagged.
NYTimes does not say that the material is in Dataset 9 specifically, but we observed that the DOJ was removing files from Dataset 9 on Saturday and not other datasets, so the server behavior would be consistent with CSAM in Dataset 9.
That sounds bad and it must be awful for the victims. Still, the evidence must be preserved. The administration can’t be trusted to do so. The stakes are too high.
And even though removing CSAM may be the official rationale, I have my doubts that it’s the only material getting redacted or removed.
The site epsteinfilez.com claims to have the full Dataset 9. Can’t find a way to download it directly from them, since the site is only set up for searching. Perhaps if we asked nicely?
Hi, I am the admin of epsteinfilez.com. I have never claimed that I have the full Dataset 9. The banner says that I have 101GB of Dataset 9, the one that is also shared here with the magnet link.
The flashing banner at the top says that it includes 101GB of Data Set 9. Unfortunately, I think they just grabbed the larger of the two torrents.

Exactly! It clearly says that I used the 101GB magnet link :)
DOJ Just Removed Epstein Flight Log + Contact book in the last 30 minutes


MIRRORS:
This should be the flight log: https://epsteinfilez.com/?doc=f2f8c7628ddc9cbc1cf6a7532b847ae45eec1165cb400fccea19213982956d3d.pdf&p=1
and this is the contact book: https://epsteinfilez.com/?doc=b395c578ed9394206eaae4f724f99b094d81a5fce45006b247150433b38016c6.pdf&p=1
I’m not sure if it is useful to anyone, but the partial Data Set 9 zip from the DOJ website does contain the eDiscovery index files, VOL00009.DAT and VOL00009.OPT, which are conveniently at the very start of the zip. They are text files, and it’s easy to parse out which files they thought were included in the massive zip. I don’t know if you have one from zero hour, but I have the first few GB from the one the CDN occasionally spits out saved, if anyone wants to see what files may be missing from the “index”.
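For anyone who wants to pull the document list out of those load files, here’s a rough sketch. It assumes VOL00009.DAT is a Concordance-style load file using the usual 0xFE text qualifier and 0x14 field delimiter, and that the first row is a header; adjust the delimiters and encoding if these particular files differ.

```python
import csv

DAT_PATH = "VOL00009.DAT"  # eDiscovery load file from the start of the partial zip

# Assumption: Concordance-style delimiters (0xFE as quote char, 0x14 as field separator).
with open(DAT_PATH, encoding="utf-8", errors="replace", newline="") as f:
    reader = csv.reader(f, delimiter="\x14", quotechar="\xfe")
    header = next(reader)   # first row is typically the field names
    rows = list(reader)

print(f"{len(rows)} records, columns: {header}")
# Dump the first column (typically the Bates/document ID) so it can be diffed
# against what actually made it into the zip.
for row in rows[:10]:
    print(row[0])
```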
Just advising you that there is confirmed CSAM in dataset9-more-complete.tar.zst and probably the other partial Dataset 9s.
This is very concerning. DOJ has stated explicitly that any CSAM was removed before releasing the files. Should I remove the magnet link to the merged Data Set 9 torrent?
I haven’t looked inside any of these sets myself. My primary goal has been to get the DOJ data distributed.
Is there any grunt work that needs to be done? I would like to help out but I’m not sure how to make sure my work isn’t redundant. I mean like looking through individual files etc. Is there an organized effort to comb through everything?
DM me your matrix account, we’re looking to get more people to uncover what’s missing from dataset 9, see https://lemmy.world/post/42440468/21884671
I don’t have a matrix account currently, but would be willing to get one.
yea lmk
Do you have a recommendation on provider choice?
we’re on element
I’ve been working on a structured inventory of the datasets with a slightly different angle: rather than maximizing scrape coverage, I’m focusing on understanding what’s present vs. what appears to be structurally missing based on filename patterns, numeric continuity, file sizes, and anchor adjacency.
For Dataset 9 specifically, collapsing hundreds of thousands of files down into a small number of high-confidence “missing blocks” has been useful for auditing completeness once large merged sets (like yours) exist. The goal isn’t to assume missing content, but to identify ranges where the structure strongly suggests attachments or exhibits likely existed.
If anyone else here is doing similar inventory or diff work, I’d be interested in comparing methodology and sanity-checking assumptions. No requests for files (yet), just notes on structure and verification.
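In case it helps with comparing methodology, here is a stripped-down sketch of the numeric-continuity part: pull the numbers out of the filenames and collapse the gaps into candidate missing blocks. The EFTA prefix and 8-digit padding are assumptions based on the example names quoted further down the thread.

```python
import re

def missing_blocks(filenames, prefix="EFTA", width=8):
    """Collapse numeric gaps in EFTA00039025.pdf-style names into ranges."""
    nums = sorted(
        int(m.group(1))
        for name in filenames
        if (m := re.match(rf"{prefix}(\d{{{width}}})", name))
    )
    gaps = []
    for prev, cur in zip(nums, nums[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))  # inclusive range of absent numbers
    return gaps

# Example with the names mentioned further down the thread:
names = ["EFTA00039025.pdf", "EFTA00039026.pdf", "EFTA00039152.pdf"]
print(missing_blocks(names))  # [(39027, 39151)] - a real gap, or pages bundled into one PDF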
Keep in mind when looking at the file names: the file name is the number of the first page of the document, and each page in the document is part of the numbering scheme.
EFTA00039025.pdf
EFTA00039026 …
… EFTA00039152
Just tested whether numeric gaps represent missing files or page-level numbering. In at least one major Dataset 9 block, the adjacent PDF’s page count exactly matches the numeric span, indicating page bundling rather than missing documents. I’m incorporating page counts into the audit model to distinguish the two.
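A minimal sketch of that check, assuming the third-party pypdf package and EFTA-style names; it just tests whether a gap’s width matches the page count of the PDF that opens the span.

```python
import re
from pypdf import PdfReader  # pip install pypdf

def gap_explained_by_pages(pdf_path, next_number):
    """Return True if the numeric gap after this PDF matches its page count,
    i.e. the 'missing' numbers are just pages bundled into this document."""
    start = int(re.search(r"EFTA(\d+)", pdf_path).group(1))
    pages = len(PdfReader(pdf_path).pages)
    # The document occupies numbers start .. start + pages - 1,
    # so the next document should begin at start + pages.
    return start + pages == next_number

# e.g. EFTA00039025.pdf with 127 pages would explain the jump to EFTA00039152.
print(gap_explained_by_pages("EFTA00039025.pdf", 39152))
```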
Thanks so much for setting that straight.
Take a minute to look at the eDiscovery database in the zip, it lays out each page.
Bro is about to be deported by ICE
I am seeding sets 1-8, 10-12, and the larger set 9. Seedbox is outside the US and has a very fast connection.
I will keep an eye on this post for other sets. 👍
deleted by creator
You might try merging with the set below to see if you’ve scraped files that aren’t in it?
/u/susadmin’s More Complete Data Set 9 (96.25 GB)
De-duplicated merger of the 45.63 GB and 86.74 GB versions
I bet you’ve grabbed a bunch of missing pieces of the puzzle.
deleted by creator
cool dashboard!
deleted by creator
Thx for posting, seed if you can ppl.
Funny how a rag-tag ad-hoc group can seed data so much better than the DOJ. Beautiful to see in action.
The DOJ could do better; they are ordered not to.
Heads up that the DOJ site is a tar pit: it’s going to return 50 files on the page regardless of the page number you’re on. It seems like somewhere between 2k and 5k pages it just wraps around right now.
Testing page 2000... ✓ 50 new files (out of 50)
Testing page 5000... ○ 0 new files - all duplicates
Testing page 10000... ○ 0 new files - all duplicates
Testing page 20000... ○ 0 new files - all duplicates
Testing page 50000... ○ 0 new files - all duplicates
Testing page 100000... ○ 0 new files - all duplicates
The last page I got a non-duplicate URL from was 10853, which curiously had only 36 URLs on the page. When I browsed directly to page 10853, 36 URLs were displayed, but then moving back and forth in the page count the tar-pit logic must have re-looped there, and it went back to 50 displayed. I ended with 224751 URLs.
I saw this too; yesterday I tried manually accessing the page to explore just how many there are. Seems like some of the pages are duplicates (I was simply comparing the last listed file name and content between some of the first 10 pages, and even found 1-2 duplications).
As far as the maximum page number goes, if you use the query parameter ?page=200000000 it will still resolve a list of files — actually crazy.
https://www.justice.gov/epstein/doj-disclosures/data-set-9-files?page=200000000
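If anyone wants to probe the wrap-around themselves, here is a rough sketch: walk the ?page= parameter and stop once pages stop yielding new PDF hrefs. The href regex is an assumption about the listing markup, so adjust it if the links look different.

```python
import re
import time
import requests

BASE = "https://www.justice.gov/epstein/doj-disclosures/data-set-9-files"
seen = set()
page = 0
stale_pages = 0

while stale_pages < 50:  # give up after 50 consecutive pages with nothing new
    html = requests.get(BASE, params={"page": page}, timeout=60).text
    # Assumption: file links appear as hrefs ending in .pdf in the listing HTML.
    urls = set(re.findall(r'href="([^"]+\.pdf)"', html))
    new = urls - seen
    seen |= urls
    print(f"page {page}: {len(new)} new / {len(urls)} listed")
    stale_pages = 0 if new else stale_pages + 1
    page += 1
    time.sleep(1)  # be polite; the site already behaves like a tar pit

print(f"collected {len(seen)} unique URLs")
```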
I’m working on a different method of obtaining a complete dataset zip for dataset 9. For those who are unaware, for a time yesterday there was an official zip available from the DOJ. To my knowledge no one was able to fully grab it, but I believe the 49 GB zip is a partial copy of that from before downloads got cut. My thought is that this original zip likely contained incriminating information and that’s why it got halted.
What I’ve observed is that Akamai still serves that zip sporadically in small chunks. It’s really strange and I’m not sure why it does, but I have verified with strings that there are PDF file names in the zip data. I’ve been able to use a script to pull small chunks from the CDN across the entire span of the file’s byte range.
Using the 49 GB file as a starting point, I’m working on piecing the file together; however, progress is extremely slow. If anyone is willing to team up on this and combine the chunks, please let me know.
How to grab the chunked data:
Script link: https://pastebin.com/sjMBCnzm
For the script you will probably have to:
pip install rich
Grab DATASET 9, INCOMPLETE AT ~48GB:
magnet:?xt=urn:btih:0a3d4b84a77bd982c9c2761f40944402b94f9c64&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Then name the downloaded file 0-(the last byte the file spans).bin
So, for example, for the 48 GB file it would be:
0-48995762175.bin
Next to the Python script make a directory called:
DataSet 9.zip.chunks
Move the renamed first-byte-range 48 GB file into that directory.
Make a new file next to the script called:
cookies.txt
Install the cookie editor browser extension (https://cookie-editor.com/)
With the browser extension open go to: https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip
The download should start in your browser, cancel it.
Export the cookies in Netscape Format. They will copy to your clipboard.
Paste those into your cookies.txt, save and close it.
You can run the script like so:
python3 script.py \
  'https://www.justice.gov/epstein/files/DataSet%209.zip' \
  -o 'DataSet 9.zip' \
  --cookies cookies.txt \
  --retries 3 \
  --backoff 5.0 \
  --referer 'https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip' \
  -t auto -c auto
Script options:
- -t - The number of concurrent threads to use, which results in trying that many byte ranges at the same time. Setting this to auto will auto-calculate based on your CPU but will cap at 8 to be safe and avoid getting banned by Akamai.
- -c - The chunk size to request from the server in MB. This is not always respected by the server and you may get a smaller or larger chunk, but the script should handle that. Setting this to auto scales with the file size, though feel free to try different sizes.
- --backoff - The backoff factor between failures; helps prevent Akamai from throttling your requests.
- --retries - The number of times to retry a byte range for that iteration before moving on to the next byte range. If it moves on, it will come back to it again on the next loop.
- --cookies - The path to the file containing your Netscape-formatted cookies.
- -o - The final file name. The chunks directory is derived from this, so make sure it matches the name of the chunk directory that you primed with the torrent chunk.
- --referer - Just leave this; it sets the Referer HTTP header for Akamai.
There are more options if you run the script with the --help option.
If you start to receive HTML and/or HTTP 200 responses, then you need to refresh your cookie.
If you start to receive HTTP 400 responses, then you need to refresh your cookie in a different browser; Akamai is very fussy.
A VPN and multiple browsers might be useful to change your cookie and location combo.
Edit: I tested the script on Dataset 8 and it was able to stitch together a valid zip, so assuming we’re getting valid data from Dataset 9, it should work.
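If you just want to see what one of those chunk pulls looks like without reading the full pastebin script, here is a minimal sketch: load the Netscape-format cookies.txt, send a single Range request, and save whatever span Akamai actually returns. The 8 MB chunk size and starting offset are illustrative, not tuned.

```python
import http.cookiejar
import requests

URL = "https://www.justice.gov/epstein/files/DataSet%209.zip"
REFERER = "https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip"

jar = http.cookiejar.MozillaCookieJar("cookies.txt")
jar.load(ignore_discard=True, ignore_expires=True)

start = 48995762176            # first byte we still need (right after the 48 GB partial)
size = 8 * 1024 * 1024         # ask for 8 MB; the server may return more or less
headers = {"Range": f"bytes={start}-{start + size - 1}", "Referer": REFERER}

r = requests.get(URL, headers=headers, cookies=jar, timeout=90)
if r.status_code == 206:       # partial content - the CDN honoured the range
    returned = r.headers.get("Content-Range", "?")
    with open(f"{start}-{start + len(r.content) - 1}.bin", "wb") as f:
        f.write(r.content)
    print(f"saved {len(r.content)} bytes ({returned})")
else:
    print(f"got HTTP {r.status_code} - probably time to refresh cookies.txt")
```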
Awesome, I don’t really understand what’s happening but I’m also running it (also doing it for the presumably exact same 48GB torrent, but I’m supposed to do that right?)
this method is not working for me anymore
Yeah :/ I haven’t been able to pull anything in a while now.
I was just able to pull 6 chunks, the data is still out there!
I messaged you on the other site; I’m currently getting a “Could not determine Content-Length (got None)” error.
What happens when you go to https://www.justice.gov/epstein/files/DataSet%209.zip in your browser?
I also was getting the same error. Going to the link successfully downloads.
Updating the cookies fixed the issue.
Can also confirm, receiving more chunks again.
EDIT: Someone should play around with the retry and backoff settings to see if a certain configuration can avoid being blocked for a longer period of time. IP rotating is too much trouble.
Updated the script to display information better: https://pastebin.com/S4gvw9q1
It has one library dependency, so you’ll have to do:
pip install rich
I haven’t been getting blocked with this:
python script.py 'https://www.justice.gov/epstein/files/DataSet%209.zip' -o 'DataSet 9.zip' --cookies cookie.txt --retries 2 --referer 'https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip' --ua '<set-this>' --timeout 90 -t 16 -c auto
The new script can auto-set threads and chunks; I updated the main comment with more info about those.
I’m setting the --ua option, which lets you override the User-Agent header. I’m making sure it matches the browser that I use to request the cookie.
age gate > page not found
Yeah when I run into this I’ve switched browsers and it’s helped. I’ve also switched IP addresses and it’s helped.
alrighty, I’m currently in the middle of the archive.org upload but I can transfer the chunks I already have over to a different machine and do it there with a new IP
I would be interested in obtaining the chunks that you gathered and stitching them together with what I gathered.
Nor I. I got a single chunk back and then never got anything again.
I’m using a partial download I already had and not the 48gb version but I will be gathering as many chunks as I can as well. Thanks for making this
how big is the partial that you managed to get?
about 25gb
I’m in the process of downloading both dataset 9 torrents (45.63 GB + 86.74 GB). I will then compare the filenames in both versions (the 45.63GB version has 201,358 files alone), note any duplicates, and merge all unique files into one folder. I’ll upload that as a torrent once it’s done so we can get closer to a complete dataset 9 as one file.
- Edit 31Jan2026 816pm EST - Making progress. I finished downloading both dataset 9s (45.6 GB and the 86.74 GB). The 45.6GB set is 200,000 files and the 86GB set is 500,000 files. I have a .csv of the filenames and sizes of all files in the 45.6GB version. I’m creating the same .csv for the 86GB version now.
-
Edit 31Jan2026 845pm EST -
- dataset 9 (45.63 GB) = 201357 files
- dataset 9 (86.74 GB) = 531257 files
I did an exact filename combined with an exact file size comparison between the two dataset9 versions. I also did an exact filename combined with a fuzzy file size comparison (tolerance of +/- 1KB) between the two dataset9 versions. There were:
- 201330 exact matches
- 201330 fuzzy matches (+/- 1KB)
Meaning there are 201330 duplicate files between the two dataset9 versions.
These matches were written to a duplicates file. Then, from each dataset9 version, all files/sizes matching the file and size listed in the duplicates file will be moved to a subfolder. Then I’ll merge both parent folders into one enormous folder containing all unique files and a folder of duplicates. Finally, compress it, make a torrent, and upload it.
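For anyone reproducing the exact-match pass, here is a minimal sketch of the filename + size comparison (the fuzzy ±1 KB pass is omitted, and the extracted directory names are placeholders):

```python
from pathlib import Path

def inventory(root):
    """Map (filename, size_in_bytes) -> path for every file under root."""
    return {(p.name, p.stat().st_size): p for p in Path(root).rglob("*") if p.is_file()}

a = inventory("dataset9_45gb")   # placeholder names for the extracted torrent dirs
b = inventory("dataset9_86gb")

dupes = a.keys() & b.keys()
print(f"{len(dupes)} exact filename+size matches")
print(f"{len(a.keys() - b.keys())} only in the 45 GB set")
print(f"{len(b.keys() - a.keys())} only in the 86 GB set")
```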
-
Edit 31Jan2026 945pm EST -
Still moving duplicates into subfolders.
-
Edit 31Jan2026 1027pm EST -
Going off of xodoh74984’s comment (https://lemmy.world/post/42440468/21884588), I’m increasing the rigor of my determination of whether the files that share a filename and size between both versions of dataset9 are in fact duplicates. This will be identical to rsync --checksum: verifying bit-for-bit that the files are the same by calculating their MD5 hashes. This will take a while but is the best way.
-
Edit 01Feb2026 1227am EST -
Checksum comparison complete. 73 files found that have the same file name and size but different content. Total number of duplicate files = 201257. Merging both dataset versions now, while keeping one subfolder of the duplicates, so nothing is deleted.
-
Edit 01Feb2026 1258am EST -
Creating the .tar.zst file now. 531285 total files, which includes all unique files between dataset9 (45.6 GB) and dataset9 (86.7 GB), as well as a subfolder containing the files that were found in both dataset9 versions.
-
Edit 01Feb2026 215am EST -
I was using wayyyy too high a compression value for no reason (zstd --ultra -22). Restarted the .tar.zst file creation (with zstd -12) and it’s going 100x faster now. Should be finished within the hour.
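If anyone prefers to do that step from Python instead of shelling out to zstd, a rough sketch using the third-party zstandard package (the directory name is a placeholder; level 12 with multithreading matches the trade-off above):

```python
import tarfile
import zstandard  # pip install zstandard

SRC = "dataset9_merged"            # placeholder for the merged directory
OUT = "dataset9_merged.tar.zst"

cctx = zstandard.ZstdCompressor(level=12, threads=-1)  # -1 = use all logical cores
with open(OUT, "wb") as raw, cctx.stream_writer(raw) as compressed:
    # Stream the tar straight into the compressor; no intermediate tarball on disk.
    with tarfile.open(fileobj=compressed, mode="w|") as tar:
        tar.add(SRC)
```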
-
Edit 01Feb2026 311am EST -
.tar.zstfile creation is taking very long. I’m going to let it run overnight - will check back in a few hours. I’m tired boss.
- EDIT 01Feb2026 831am EST -
COMPLETE!
And then I doxxed myself in the torrent. One moment please while I fix that…
Final magnet link is HERE. GO GO GOOOOOO
I’m seeding @ 55 MB/s. I’m also trying to get into the new r/EpsteinPublicDatasets subreddit to share the torrent there.
Here are the file contents w/ SHA-256 hashes: [deleted this]
The original post on reddit was deleted after sharing this: https://old.reddit.com/r/DataHoarder/comments/1qsfv3j/epstein_9_10_11_12_reddit_keeps_nuking_thread_we/o2vqgoc/
Thank you so much for re-archiving it in a better format
Have a good night. I’ll be waiting to download it, seed it, make hardcopies and redistribute it.
Please check back in with us
Superb, I have 1-8 and 11-12.
Only 10 remaining (to complete - downloading from Archive.org now)
Dataset 9 is the biggest. I ended up writing a parser to go through every page on justice.gov and make an index list.
Current estimate of files list is:
- ~1,022,500 files (50 files/page × 20,450 pages)
- My scraped index so far: 528,586 files / 634,573 URLs
- Currently downloading individual files: 24,371 files (29GB)
- Download rate ~1 file/sec to avoid getting blocked = ~12 days continuous for full set
Your merged 45GB + 86GB torrents (~500K-700K files) would be a huge help. Happy to cross-reference with my scraped URL list to find any gaps.
UPDATE DATASET 9 Files List:
Progress:
- Scraped 529,334 file URLs from justice.gov (pages 0-18333, ~89% of index)
- Downloading individual files: 30K files / 41GB so far
- Also grabbed the 86GB DataSet_9.tar.xz torrent (~500K files) - extracting now
Uploaded my URL index to Archive.org - 529K file URLs in JSON format if anyone wants to help download the remaining files.
link: https://archive.org/details/epstein-dataset9-index
The link is live and shows the 75.7MB JSON file available for download.
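If anyone wants to use that index to work out what they’re still missing locally, here is a rough sketch. It assumes the JSON is a flat list of URL strings (I haven’t confirmed the exact schema) and that the local filename is the last path segment of each URL; adjust the loading step if the format differs.

```python
import json
from pathlib import Path
from urllib.parse import urlparse, unquote

# Assumption: the Archive.org JSON is a flat list of URL strings.
with open("epstein-dataset9-index.json") as f:
    urls = json.load(f)

# Placeholder directory containing whatever you've downloaded so far.
have = {p.name for p in Path("dataset9_downloads").rglob("*") if p.is_file()}

missing = [u for u in urls if unquote(Path(urlparse(u).path).name) not in have]
print(f"{len(missing)} of {len(urls)} indexed files not present locally")

with open("still_missing.txt", "w") as f:
    f.write("\n".join(missing))
```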
UPDATE Dataset Size Sanity Check:
Dataset Report
Generated: 2026-01-31T23:28:29.198691
Base Path: /mnt/epstein-doj-2026-01-30
Summary
| Dataset | Files | Extracted | ZIP | Types |
|---|---|---|---|---|
| DataSet_1 | 6,326 | 2.48 GB | 1.23 GB | .pdf, .opt, .dat |
| DataSet_1_incomplete | 3,158 | 1.24 GB | N/A | .pdf, .opt, .dat |
| DataSet_2 | 577 | 631.66 MB | 630.79 MB | .pdf, .dat, .opt |
| DataSet_3 | 69 | 598.51 MB | 595.00 MB | .pdf, .dat, .opt |
| DataSet_4 | 154 | 358.43 MB | 351.52 MB | .pdf, .opt, .dat |
| DataSet_5 | 122 | 61.60 MB | 61.48 MB | .pdf, .dat, .opt |
| DataSet_6 | 15 | 53.02 MB | 51.28 MB | .pdf, .opt, .dat |
| DataSet_7 | 19 | 98.29 MB | 96.98 MB | .pdf, .dat, .opt |
| DataSet_8 | 11,042 | 10.68 GB | 9.95 GB | .pdf, .mp4, .xlsx |
| DataSet_9_files | 35,480 | 40.44 GB | 45.63 GB | .pdf, .mp4, .m4a |
| DataSet_9_45GB_unique | 28 | 84.18 MB | N/A | .pdf, .dat, .opt |
| DataSet_9_extracted | 531,256 | 94.51 GB | N/A | .pdf |
| DataSet_9_45GB_extracted | 201,357 | 47.45 GB | N/A | .pdf, .dat, .opt |
| DataSet_10_extracted | 504,030 | 81.15 GB | 78.64 GB | .pdf, .mp4, .mov |
| DataSet_11 | 14,045 | 1.17 GB | 25.56 GB | .pdf |
| DataSet_12 | 154 | 119.89 MB | 114.09 MB | .pdf, .dat, .opt |
| TOTAL | 1,307,832 | 281.07 GB | 162.87 GB | |
Here is a little script that can generate the above report if you have your dir set up something like this:
# Minimum working example:
my_directory/
├── DataSet_1/
│   └── (any files)
├── DataSet_2/
│   └── (any files)
└── DataSet 2.zip (optional - will be matched)
Would love to help from my PC on dataset 9 specifically. Any way we can exchange progress so I won’t start by downloading files you already have downloaded?
E: just started scraping from page 18330 (as you mentioned you ended around 18333), hoping I can fill in the remaining 4000-ish pages
Update 2 (1715UTC): just finished scraping up until the page 20500 limit you set in the code. There are 0 new files in the range between 18330-20500 compared to the ones you already found. So unless I did something wrong, either your list is complete or the DOJ has been scrambling their shit (considering the large number of duplicate pages, I’m going with the second explanation).
Either way, I’m gonna extract the 48GB and 100GB torrent directories now and try to mark down which of the files already exist within those torrents, so we can make an (intermediate) list of which files are still missing from them
Thank you so much for keeping us updated!!
I’m downloading 8-11 now, I’m seeding 1-7+12 now. I’ve tried checking up on reddit, but every other time i check in the post is nuked or something. My home server never goes down and I’m outside USA. I’m working on the 100GB+ #9 right now and I’ll seed whatever you can get up here too.
looking forward to your torrent, will seed.
I have several incomplete sets of files from dataset 9 that I downloaded with a scraped set of urls - should I try to get them to you to compare as well?
Yes! I’m not sure the best way to do that - upload them to MEGA and message me a download link?
maybe archive.org? that way they can be torrented if others want to attempt their own merging techniques? either way it will be a long upload, my speed is not especially good. I’m still churning through one set of urls that is 1.2M lines, most are failing but I have 65k from that batch so far.
archive.org is a great idea. Post the link here when you can!
I’ll get the first set (42k files in 31G) uploading as soon as I get it zipped up. it’s the one least likely to have any new files in it since I started at the beginning like others but it’s worth a shot
edit 01FEB2026 1208AM EST - 6.4/30gb uploaded to archive.org
edit 01FEB2026 0430AM EST - 13/30gb uploaded to archive.org; scrape using a different url set going backwards is currently at 75.4k files
edit 01FEB2026 1233PM EST - had an internet outage overnight and lost all progress on the archive.org upload, currently back to 11/30gb. the scrape using a previous url set seems to be getting very few new files now, sitting at 77.9k at the moment
When merging versions of Data Set 9, is there any risk of loss with simply using rsync --checksum to dump all files into one directory?
rsync --checksum is better than my file name + file size comparison, since you are calculating the hash of each file and comparing it to the hashes of all other files. For example, if there is a file called data1.pdf with size 1024 bytes in dataset9-v1, and another file called data1.pdf with size 1024 bytes in dataset9-v2, but their content is different, my method will still detect them as identical files.
I’m going to modify my script to calculate and compare the hashes of all files that I previously determined to be duplicates. If the hashes of the duplicates in dataset9 (45GB torrent) match the hashes of the duplicates in dataset9 (86GB torrent), then they are in fact duplicates between the two datasets.
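A sketch of that checksum pass, assuming you already have the (path-in-45GB-set, path-in-86GB-set) candidate pairs from the filename + size comparison; MD5 is fine here since we’re only checking that two copies we already hold are byte-identical.

```python
import hashlib

def md5sum(path, block=1024 * 1024):
    """Stream a file through MD5 in 1 MB blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_pairs(candidate_pairs):
    """candidate_pairs: iterable of (path_a, path_b) that matched on name + size."""
    true_dupes, conflicts = [], []
    for a, b in candidate_pairs:
        (true_dupes if md5sum(a) == md5sum(b) else conflicts).append((a, b))
    return true_dupes, conflicts

# Pairs with identical names/sizes but different hashes land in `conflicts`
# and should be kept from both sets rather than deduplicated.
```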
Amazing, thank you. That was my thought: check hashes while merging the files to keep any copies that might have been modified by the DOJ, and discard duplicates even if they have different metadata, e.g. timestamps.
Be prepared to wait a while… idk why this person chose xz, it is so slow. I’ve just been trying to get the tarball out for an hour.



