There is a limit to how long an AI company can keep subsidizing the tokens, eventually the financial ruin ensues. Every chat request that goes to the LLM has an actual physical compute and RAM cost, which scales to hilarious levels as you keep chatting in the same thread and the context size widens. It scales to astronomical levels when the query requires high-level analytical or reasoning skills like thinking..., pondering..., bloviating..., etc. - which is exactly what enterprise users intend on doing. The Uber story made it quite clear - even some of the big techs don’t have the stomach for that kind of unlimited resource drain.
There is a limit to how long an AI company can keep subsidizing the tokens, eventually the financial ruin ensues. Every chat request that goes to the LLM has an actual physical compute and RAM cost, which scales to hilarious levels as you keep chatting in the same thread and the context size widens. It scales to astronomical levels when the query requires high-level analytical or reasoning skills like
thinking...,pondering...,bloviating..., etc. - which is exactly what enterprise users intend on doing. The Uber story made it quite clear - even some of the big techs don’t have the stomach for that kind of unlimited resource drain.