tl;dr - By using this very strange file format, you can functionally have access to the vast power of a vector database, but with the local simplicity of sqlite.
If I’m understanding this correctly: if you wanted to do a simple search for exact text strings, and that was all that you needed, then yes, you should probably use something like an sqlite database to index and query from.
However, if you are working with very large data sets and you need a vector database (for contextual or semantic searches), well, that’s a whole other tier of complexity. At that point, you need a vector database server.
What this thing does, however, is format your data into what they call “video” (but realistically would probably look like static if you were to actually play it in VLC). Then…
… I think it’s hooking into some similarities between vector databases and video processing, and then using the mature video processing technology to process the “video” at lightning-fast speeds. And you get all of that contextual power without relying on a cloud-based vector database server.
(To be clear, I’m doing a lot of hand-waving over the “similarities between vector databases and video processing” here - perhaps somebody with a computer science degree, or an autistic savant, can explain why this works the way that it does.)
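For anyone wondering what the "contextual or semantic search" part actually means: as far as I understand it, it boils down to a nearest-neighbour lookup over embedding vectors. Here is a minimal sketch of that idea in plain Python. The three-number vectors are made up by hand for illustration (a real setup would get them from an embedding model), and `cosine` and `search` are hypothetical helper names; none of this is memvid's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# ASSUMPTION: hand-made toy "embeddings", not output from any real model.
chunks = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recipe for tomato soup": [0.0, 0.2, 0.9],
    "recovering a locked account": [0.8, 0.3, 0.1],
}

def search(query_vec, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        chunks,
        key=lambda text: cosine(query_vec, chunks[text]),
        reverse=True,
    )
    return ranked[:k]

# Pretend this is the embedding of "I forgot my login".
query = [0.85, 0.2, 0.05]
top = search(query)
print(top)
```

Note that neither of the two account-related chunks shares a word with "I forgot my login", which is the difference from an exact-string search. This brute-force loop runs locally with no server at all; it just scans every vector, which is the part that dedicated indexes speed up.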
Thank you for that, but I still don’t understand where the benefits come from.
Realistically what you wrote sounds like hogwash.
This dude is perf-checking the solution in https://github.com/Olow304/memvid/issues/51 and his findings so far are interesting: data getting malformed during processing (the byte sizes don’t match), and 16KB of text turning into 640KB of video.
Oh, and right now people are calling out the bullshit of the repo
https://github.com/janekm/retrieval_comparison/blob/main/memvid_critique.md
https://github.com/Olow304/memvid/issues/52
https://github.com/Olow304/memvid/issues/49
Thank you for taking the time to look into it!
Um… Yeah… Using 3 gigs to store 19 megs of text is… suboptimal.
Maybe something neat will come out of this down the road, but right now it doesn’t seem very practical.
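To put the two size figures quoted in this thread side by side (16KB of text becoming 640KB of video, and 19 megs of text taking 3 gigs), here is the arithmetic as a quick sketch. It assumes binary units (1 KB = 1024 bytes) and treats the quoted sizes as exact, which they almost certainly are not; `overhead` is just a hypothetical helper name.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def overhead(raw_bytes, stored_bytes):
    """How many times larger the stored form is than the raw text."""
    return stored_bytes / raw_bytes

# Figures as quoted in the thread, assumed to be exact for the sake of the math.
small = overhead(16 * KB, 640 * KB)
large = overhead(19 * MB, 3 * GB)

print(f"16KB -> 640KB: {small:.0f}x")   # 40x
print(f"19MB -> 3GB:   {large:.0f}x")   # roughly 162x
```

So on those rough numbers the blow-up is somewhere between 40x and about 160x the size of the original text.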
TBH this looks like a scam (false claims) made for clout, especially given how heavily the repo is advertised on social media.
Edit: the author is a uni student, so never mind. I guess he’s just inexperienced, not malevolent.