Anthropic tested Claude's(LLM, AI Chatbot) ability to manage a physical “storefront” to mixed results, as the AI struggled with pricing strategy and inventory management

Pro@programming.dev · 9 hours ago

Anthropic tested Claude's(LLM, AI Chatbot) ability to manage a physical “storefront” to mixed results, as the AI struggled with pricing strategy and inventory management

dhork@lemmy.world · edit-2 5 hours ago

It is an interesting article, even if it’s conclusions are entirely too rosy. The “storefront” was a single vending machine, and the bot was instructed to interact with Anthropic employees (with an hourly cost attached) to do all physical interactions. While the bot did a decent job managing the stock most of the time, it made a lot of bad decisions based on trying to be too helpful to it’s customers. It also frequently hallucinated, with some hilarious results I wont spoil here. But as anyone who owns a small business knows, one bad decision could put it under, so saying that an AI can manage a vending machine well “most of the time” is equivalent to saying it cant do the job at all.

Their conclusion is that with a bit more work, Claude might be able to perform as a middle-manager. To me, that says more about how useless middle-management is than how capable their AI is.

Anthropic tested Claude's(LLM, AI Chatbot) ability to manage a physical “storefront” to mixed results, as the AI struggled with pricing strategy and inventory management

Anthropic tested Claude's(LLM, AI Chatbot) ability to manage a physical “storefront” to mixed results, as the AI struggled with pricing strategy and inventory management

Project Vend: Can Claude run a small shop? (And why does that matter?)