I'm currently paginating through a specific forum, extracting links, and filtering them with Python before feeding them to fichub-cli to download all of its stories. The workflow is detailed in this post: https://piefed.zip/post/1151173. I'm looking for a more streamlined, ideally one-command solution that could crawl the forum, extract thread links, and download the stories automatically. Any suggestions?

  • tal@lemmy.today
    14 hours ago
    • Start with the comprehensive link collection from Cyb3rNexus’s GitHub Gist – it already contains hundreds of pre-filtered thread links!
    • For more recent stories, navigate to NSFW Creative Writing

    If your interest is in bulk download of erotic stories and you don’t specifically care about that forum (which I assume is the case, if you just want to dump the entire thing) — like, you’re looking for a training corpus to fine-tune an LLM to generate material along those lines or something in that neighborhood — I suspect that there are considerably-more-substantial archives than “hundreds”.

    checks

    It looks like ftp.asstr.org is still running an anonymous-access public FTP server. They’ll have years of archives from the relevant text erotica Usenet groups. You won’t need to screen-scrape that; just use any client that can recursively download from an FTP server.
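As a rough illustration of what "recursively download from an FTP server" involves, here is a minimal sketch using Python's standard `ftplib`. The function name `mirror` is made up for this example, and it assumes the server answers a plain `NLST` with bare entry names and rejects `CWD` on regular files, which is common but not guaranteed. Symlinks, retries, and resuming are not handled, so a dedicated FTP client is likely the more robust choice for a large archive.

```python
import os
import posixpath
from ftplib import FTP, error_perm


def mirror(ftp, remote_dir, local_dir):
    """Recursively copy remote_dir (an absolute path) into local_dir.

    Returns the number of files saved. Hypothetical helper, not part of ftplib.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    try:
        names = ftp.nlst()
    except error_perm:
        # Some servers answer 550 when a directory is empty.
        names = []

    count = 0
    for entry in names:
        name = posixpath.basename(entry)
        if name in ("", ".", ".."):
            continue
        remote_path = posixpath.join(remote_dir, name)
        try:
            # Assumption: CWD succeeds only for directories.
            ftp.cwd(remote_path)
        except error_perm:
            # Not a directory, so treat it as a file and fetch it.
            with open(os.path.join(local_dir, name), "wb") as fh:
                ftp.retrbinary("RETR " + remote_path, fh.write)
            count += 1
        else:
            count += mirror(ftp, remote_path, os.path.join(local_dir, name))
    return count
```

To try it, one would connect with `ftp = FTP("ftp.asstr.org")`, call `ftp.login()` (no arguments means an anonymous login), and then run something like `mirror(ftp, "/", "asstr")`. The starting path `/` and the local directory name are placeholders; the actual layout of that server is not something this sketch knows about.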