Is anyone aware of an existing project that can do something like this:
- Access an RSS feed.
- Parse the contents of the items in the feed, and fetch linked images.
- Take the new feed elements and add them to previously fetched elements.
- Store all of the content in a merged RSS/XML file, or something like a SQLite DB.
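For comparison, here is a minimal Python sketch of steps 2, 4 and 5 (parse the items, merge them with earlier ones, store them in SQLite) using only the standard library. It assumes a plain RSS 2.0 feed and uses each item's `<guid>` (falling back to `<link>`) as the deduplication key. The table layout and the function names `parse_items` and `merge_items` are hypothetical, not taken from any existing tool.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Return one dict per <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Assumption: <guid> identifies an item; fall back to <link>.
        guid = item.findtext("guid") or item.findtext("link")
        if not guid:
            continue  # nothing to deduplicate on, so skip it
        items.append({
            "guid": guid,
            "link": item.findtext("link", default=""),
            "pub_date": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def merge_items(db, items):
    """Insert items whose guid is not stored yet; return how many were new."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS items (
               guid TEXT PRIMARY KEY,
               link TEXT,
               pub_date TEXT,
               description TEXT)"""
    )
    before = db.total_changes
    # INSERT OR IGNORE silently skips rows whose guid already exists.
    db.executemany(
        "INSERT OR IGNORE INTO items VALUES (:guid, :link, :pub_date, :description)",
        items,
    )
    db.commit()
    return db.total_changes - before
```

Calling `merge_items(sqlite3.connect("archive.db"), parse_items(feed_text))` on every scheduled run only adds items it has not seen before, so the database keeps accumulating history even after older posts drop out of the feed.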
Context: I’d like to automatically archive an account’s Mastodon posts. I’d prefer a script or binary I can run on Linux, since I’d likely put it in a GitHub Action and commit the resulting output to the git repo.
I could probably whip something together but I’m lazy and I’d prefer to use something that already exists.
I use Miniflux, and you can configure it to modify feed items. As far as I know, it does not purge anything by default.
Honestly, pulling an RSS feed, parsing it, and storing the results is probably 50 lines of bash, and fewer in a general-purpose scripting language.
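To give a sense of scale, fetching a feed and its linked images takes only the Python standard library. This sketch assumes image attachments show up either as RSS `<enclosure>` elements or as Media RSS `<media:content>` elements (check what your instance's feed actually contains); `fetch`, `image_urls` and `download_images` are hypothetical helper names.

```python
import hashlib
import pathlib
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by Media RSS <media:content> elements.
MEDIA_NS = "{http://search.yahoo.com/mrss/}"


def fetch(url):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def image_urls(rss_xml):
    """Collect image URLs from <enclosure> and <media:content> in each item."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        for el in item.findall("enclosure") + item.findall(f"{MEDIA_NS}content"):
            url = el.get("url")
            mime = el.get("type", "")
            medium = el.get("medium", "")
            if url and (mime.startswith("image/") or medium == "image"):
                urls.append(url)
    return urls


def download_images(urls, out_dir="images"):
    """Save each image once, named by a hash of its URL plus its extension."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        suffix = pathlib.PurePosixPath(urllib.parse.urlparse(url).path).suffix
        target = out / (hashlib.sha256(url.encode()).hexdigest()[:16] + suffix)
        if not target.exists():  # skip images saved on a previous run
            target.write_bytes(fetch(url))
```

Usage would be something like `download_images(image_urls(fetch(FEED_URL)))`, where `FEED_URL` is the account's feed (Mastodon typically serves it at `https://<instance>/@<username>.rss`). Naming files by a hash of the URL keeps reruns idempotent, which suits a scheduled job whose output gets committed to git.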
https://github.com/mreid/feed-bag
I’m not sure it does everything you want, but the basics are there, and it wouldn’t be much of a stretch to adapt something like this to your needs. The code is pretty clean.