Everyone’s getting their knickers in a twist over nothing here.
Of course an AI can track time, if it’s given access to a timer MCP server.
Can we track time without tools, just in our heads? Certainly not very accurately. We can, however, track it reasonably accurately if given access to a quartz stopwatch (typically ±15 s/month).
A language model is based around language and reasoning by words/symbols. It’s not a surprise it doesn’t have timing capability.
What Altman SHOULD be embarrassed about is that the model lies about its capabilities. That implies the context is still not right: it should be adequately trained and given context to prevent the lying. That points to a much more worrying issue, and it’s something Anthropic handles far better, IMHO (when asked if it can track time, Claude says “no, not on my own”, and then proceeds to build a JavaScript timer that it offers up to track time).
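For illustration, here’s a minimal sketch of the kind of JavaScript timer an assistant might write and hand over. This is hypothetical, not Claude’s actual output: the point is that the model itself has no sense of elapsed time, so it delegates to the runtime’s clock via `setTimeout`.

```javascript
// The model can't sense time, but the JavaScript runtime can:
// setTimeout fires the callback after the requested delay,
// using the host's system clock.
function startTimer(seconds, onDone) {
  console.log(`Timer started for ${seconds} s`);
  return setTimeout(onDone, seconds * 1000);
}

startTimer(1, () => console.log("Time's up!"));
```

Which is exactly why the honest answer to “can you track time?” is “not on my own, but I can write you something that can.”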
I don’t use them, but I follow the news about them loosely. The reason for this is epistemic humility. Claude has a pretty good idea of what its capabilities are and where the ceiling is. ChatGPT has no clue what its limits are, so it believes it can do everything. Basically, ChatGPT has a lot of info and no idea where the gaps live, while Claude has a fair idea of when to search or use some external function to handle something. Gemini has less than Claude but more than ChatGPT. Grok has little to no epistemic humility, but it did manage to accurately portray Musk as a world champion piss drinker, something none of the others were able to do.
I say that, but it’s been a few months since I looked. That could have changed, because shit moves fast. By the looks of what it’s trying to do with the timer, ChatGPT has less than it used to. Possibly because of the way the model is trained to be helpful and confident.
Well, messages are clearly not stateless (otherwise there would be no context), but in general, yes: the issue is not a lack of capability, it’s the complete unawareness of it and the insistence on lying about it.
THIS time it is ridiculously obvious, but what if it does this after checking a very large data set, where there would be no (good) way to verify its answer?
This is why AI, in its current form, is basically useless. If you cannot trust it NOT to lie, and must/should verify everything yourself, you might as well skip the useless step of asking.
To call AI useless is quite a strong statement.
There’s a million places to use it!
The problem is that the market thinks there are a billion places to use it. And right now we’re funding 999 million places that shouldn’t be using AI, but have the funding to do that dumb thing, so we can figure out the one million places where it makes fantastic sense.
I get your point, and yes, I was exaggerating for effect, but… are there a million places to use AI where you can blindly trust its output/work?
I do not really think it’s completely useless; however, I do think the uses are very, very limited (compared to the hype), and the cost of running these models for the benefit they provide makes them even less practical.
It could simply save a timestamp of the “begin timer” message and compare it to the timestamp of the “end” message. It’s not that complicated, and writing a script and executing it is overkill… It just needs access to a calculator skill.
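A rough sketch of that timestamp approach. Assuming the platform stamps each message on arrival (an assumption about the plumbing, not a documented feature), the “timer” reduces to one subtraction, with no script execution needed:

```javascript
// Hypothetical: diff the arrival timestamps of the "begin timer"
// and "end" messages. Pure arithmetic, no running timer required.
function elapsedSeconds(beginMs, endMs) {
  return (endMs - beginMs) / 1000;
}

// e.g. timestamps the platform attached to the two messages
const begin = Date.parse("2024-05-01T12:00:00Z");
const end = Date.parse("2024-05-01T12:01:05Z");
console.log(`Elapsed: ${elapsedSeconds(begin, end)} s`); // prints "Elapsed: 65 s"
```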
Yes, it handles it better, but it’s still a dumb approach and waste of energy.
Aren’t we saying exactly the same thing? Give it an MCP server or a native skill that CAN track time.