does it give the full history to the LLM each time?
Last time I tried implementing something like this, it suggested to have a rolling window of history so that it takes into account your last X messages but not the entire conversation.
(I guess this is what ollama calls “context length”?)
You send the entire history for that conversation every time and likely more if its getting info from tools. If its not in the context the model dose not see it unless you have a memory system that dose something like feeding in summaries of past conversations that also takes up tokens and context. Rolling drops old messages to not reach context limits but you can lose important info or get odd results. If the history gets bigger than the context things break or slow way down.
presumably this is why Claude periodically writes its conclusions so far into a text file that it can read later instead of having to remember everything. Sounds like an interesting approach.
does it give the full history to the LLM each time?
Last time I tried implementing something like this, it suggested to have a rolling window of history so that it takes into account your last X messages but not the entire conversation.
(I guess this is what ollama calls “context length”?)
Most agent harnesses do something called “compaction.” For example, here’s how Pi does compaction
You send the entire history for that conversation every time and likely more if its getting info from tools. If its not in the context the model dose not see it unless you have a memory system that dose something like feeding in summaries of past conversations that also takes up tokens and context. Rolling drops old messages to not reach context limits but you can lose important info or get odd results. If the history gets bigger than the context things break or slow way down.
presumably this is why Claude periodically writes its conclusions so far into a text file that it can read later instead of having to remember everything. Sounds like an interesting approach.