The way these tools are being marketed by tech companies is completely wrong and practically guarantees disasters like this. It’s a tool; it’s like selling a knife as a fruit-only knife and letting customers believe it can’t cut anything else (until someone inevitably cuts themselves on it). I agree Google bears some responsibility here if this really happened (his story seems a bit fishy tbh, but that’s not really the point), and it’s also why OSes bake in protective measures like user permissions. It’s also why people have been telling each other to make backups for years, even though nobody does it lol. About 10 years ago Steam shipped a bug that could wipe Linux drives.
I see from his video that Antigravity obfuscates the chain-of-thought and the command outputs - it’s a proprietary model so they don’t want to share that, but it makes troubleshooting impossible. He also had it set to ‘turbo’ mode, which skips asking permission before running commands - users should be heavily discouraged from doing that, imo to the point of making them edit a config file by hand. It shouldn’t just be a nice-sounding toggle, because then people think “turbo means it goes fast, of course I want it to go fast”.
They want to market agents as a do-everything app, but it’s still software under the hood. I don’t trust Google to ship a good product anyway, though obviously that’s not how Google markets itself. And of course you’re stuck with expensive Google models if you use Antigravity.
People are also right that this should run in a container with no way to escape it, and even crush (the one I use) is not great about this - though it should be possible to containerize it yourself. Coming from a company like Google, this kind of thing should come set up out of the box with the software. This is also one of the many reasons I switched away from Windows: the moment they announced integrated agentic AI, I knew you’d never be able to fully remove it.
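To give an idea of what “containerize it yourself” could look like - a rough sketch, assuming you’ve built a container image with your agent of choice installed (`my-agent-image` is just a placeholder):

```bash
# only the current project folder is visible inside the container, so a
# runaway rm can at worst take that folder with it. add --network none too
# if you run a local model (an API-backed agent still needs to reach its API).
docker run --rm -it \
  -v "$PWD":/work \
  -w /work \
  my-agent-image
```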
I can believe what happened is possible - if anything it serves as a PSA not to trust software blindly. When I was a kid, the most hilarious thing you could do on the internet was telling someone to delete system32, so this is nothing new. From one of OP’s comments, it seems the problem was a space in a folder name that the Windows rmdir command parsed incorrectly? No way to tell for sure since Gemini obfuscates the output, and of course that’s just what OP thinks the problem was.
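For what it’s worth, the shape of that bug is easy to show in shell terms (Windows rmdir has its own quoting rules, but the idea is the same - using echo here so nothing actually gets deleted):

```bash
# an unquoted path with a space splits into two arguments, so a delete
# would hit two targets instead of the one you meant
dir="/home/me/my project"
echo rm -rf $dir     # rm would get two args: /home/me/my  and  project
echo rm -rf "$dir"   # rm would get one arg:  /home/me/my project
```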
Someone tried to reproduce this with more locked-down permissions and Antigravity’s output (pic) was just as concerning. It said its “instructions” prevented it from running the command, when it should say “the agent prevents this command from being run” (and DeepSeek does say this in crush). I.e. the check should be hard-coded in the tool, but it seems to be delegated to the LLM instead.
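By hard-coded I mean something like this - a made-up sketch of a harness-side gate, not how any particular tool actually implements it:

```bash
# the deny list runs BEFORE anything reaches a shell, so the model never
# gets a say in whether it applies
run_agent_command() {
  local cmd="$1"
  case "$cmd" in
    *'rm -rf'*|*mkfs*|*'dd if='*)
      echo "agent: command blocked by harness policy (not by the model)"
      return 1
      ;;
  esac
  bash -c "$cmd"
}
```

A string match like this is obviously trivial to bypass; the point is just that the refusal lives in code instead of in a prompt.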
And as much as it sucks, you live and learn. People have been accidentally wiping their drives for decades at this point; I’ve probably done it myself when I was younger. If anything, software got better about preventing this sort of thing in the 2010s - the 2000s were wild lol, they gave you buttons that could reformat everything without even a confirmation prompt or an explanation of what the button did.
This is why I always branch a repo before letting AI anywhere near it.
Sometimes you get fantastic results (like a day’s worth of code-monkey grind in 5 mins) and sometimes they’re just preposterous. Either way, you want to be able to review everything before anything touches main.
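For anyone who hasn’t tried it, the whole workflow is a couple of git commands (the branch name is arbitrary):

```bash
git switch -c ai-scratch          # older git: git checkout -b ai-scratch
# ...let the agent do its thing on this branch...
git diff main...ai-scratch        # review every change it made
git switch main                   # merge only if the diff looks sane
```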
I think this is the way, yeah. For extra protection you can also make physical backups of the project (plain copy-pastes) at various points, because even if the LLM doesn’t know your project is in git, it may still decide to run git commands on its own. The newer DeepSeek is strongly biased towards doing this: I wrote “commit your findings to a file” and it wanted to git commit it. There’s always the possibility it squashes or erases all your commits (much like anyone can type rm -rf in any terminal!), but this is why we invented prod/dev redundancy and RAID backups lol. You don’t necessarily have to be this paranoid when using agentic AI, but it’s extra security and some peace of mind.
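The “copy paste” backup can be as dumb as a dated snapshot kept outside the folder the agent can see (paths here are just examples):

```bash
# lives OUTSIDE the project folder, so a squashed git history or a stray rm
# inside the repo can't take it with it
mkdir -p ~/backups
cp -a ~/code/myproject ~/backups/myproject-$(date +%F)
```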
I also checked, and crush is completely able to write and run bash commands (incl. rm) on files outside the folder you opened it in. Definitely something to look into; I’ll check whether there’s a way to containerize it better and make a post for [email protected]. Yog and I brainstormed the idea of making a separate Linux user just for crush, then adding your main account to that user’s group, but not adding the crush user to your main account’s group. That way crush only has permission to touch files owned by crush:crush, though it can still try to run any bash command it wants, and your main account still has access to crush’s files, which is more convenient. But I don’t know much about how Linux users and groups work yet, so I’ll look into it and make a post if I find something.
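Roughly what we had in mind, for anyone who wants to poke at it before I write that post - usernames are placeholders and I haven’t battle-tested this, so treat it as a sketch:

```bash
# a dedicated user owns the agent's working area; your main account joins
# its group, but the agent user never joins yours
sudo useradd -m agentuser                 # dedicated user (gets its own group)
sudo usermod -aG agentuser "$USER"        # your account joins agentuser's group
sudo mkdir -p /home/agentuser/projects
sudo chown agentuser:agentuser /home/agentuser/projects
sudo chmod 2770 /home/agentuser/projects  # group rw; setgid keeps new files in the group
# log out/in (or run `newgrp agentuser`) for the group change to apply, then:
sudo -iu agentuser                        # and launch crush from that shell
```

It’s damage limitation rather than a real sandbox - the agent user can still read anything world-readable and run whatever it wants as itself.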
I think crush also has config files you can edit to blacklist or auto deny some commands.