Folks,
I’m setting up Hermes Agent on my Mac with Ollama hosting a local model. But I’m on the fence about whether I should go with Hermes or OpenClaw. Hermes makes some pretty bold claims about “growing with you” and “self improvement”.
Anyone have any insight into whether it’s as good as promised?
IMO it’s too eager to change its own code.
On multiple occasions I’ve just asked questions like, “what checks do you do on work from delegated workers,” and it’ll decide to rewrite some of its code and break itself.
It just happened to me! I’ve been using Telegram to chat with it all this time. I finally downloaded their desktop app and tried it out. It worked for one day, and now, after an update, it’s throwing a “No LLM provider configured” error for local Ollama.
I asked the Telegram chat to figure this out and it’s currently reading its own internal code to “figure out when No LLM provider configured is called”. 😂😂
😂😂😂😂
Hermes has a built-in agent orchestration layer, which seems cool on paper; I’ve never tried it. Other small, nice things that are unique to Hermes and that other agents really don’t have, which I have actually tried, include: switchable agent personalities, a pretty decent thread suspension mechanism, decent webhook subscriptions, and human delay mode. The biggest thing, at least in my opinion, is certainly self-improving skills with patching, with an entire slew of caveats… It’s useful, but I strongly recommend a manual human review process. Otherwise, the agent has the potential to “teach itself wrong”.
Yeah I’ve been trying for hours to get it to make a simple token counting skill. It keeps getting it wrong.
It has webhook subscriptions? That’s cool I guess. How do you use the switchable agent personalities?
You can try asking the agent to talk like a pirate, for example.
I have only very recently set it up, and I’m running it with a cloud model (Gemini 3.5 Flash), but I do think its marquee feature (learning skills based on what you ask it to do and then being able to use those in the future) is pretty valuable. I haven’t used OpenClaw, but my understanding is that it has a massive library of skills (essentially repeatable functions or tasks), while Hermes has a somewhat smaller library of shared skills. Crucially, though, you can teach it your own skills by just asking it to do something and giving it notes on how to do it.
What’s the secret sauce? How it writes the skills? When it decides to do it within a conversation? Sounds like it creates skill files (markdown?) and saves them to a library, then an agent pulls in relevant skills during future prompts?
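To make my guess concrete, here is roughly the pattern I’m picturing. This is only a sketch of the idea in the question above, not how Hermes actually does it: the function names (save_skill, relevant_skills), the one-markdown-file-per-skill layout, and the word-overlap scoring are all my own assumptions.

```python
from pathlib import Path
import re


def save_skill(library: Path, name: str, body: str) -> Path:
    """Write one skill as a markdown file in the library folder (assumed layout)."""
    library.mkdir(parents=True, exist_ok=True)
    path = library / f"{name}.md"
    path.write_text(body, encoding="utf-8")
    return path


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_skills(library: Path, prompt: str, top_k: int = 2) -> list[str]:
    """Return the names of the skills sharing the most words with the prompt."""
    prompt_words = _words(prompt)
    scored = []
    for path in sorted(library.glob("*.md")):
        overlap = len(prompt_words & _words(path.read_text(encoding="utf-8")))
        if overlap:
            scored.append((overlap, path.stem))
    # Highest overlap first; ties broken alphabetically by skill name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]
```

A real agent would presumably use something smarter than word overlap (embeddings, or letting the model pick from a list of skill descriptions), but the save-then-retrieve loop would have the same shape.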
I can see this being obsolete very soon. Couldn’t you make Codex or Code do this for you? Build your own agent(s), even. Maybe it’ll be included in future releases of major commercial LLMs.
Thanks for your reply. Yeah, that seems to be a powerful feature, especially if it’s actually able to make skills easily.
I’ve been using it with Opencode Go, Ollama, and Claude Code (it can delegate tasks to models through all those, so you can have Claude plan and Deepseek Flash build); I really like it.
I ran into that problem with the agent reporting that subagents succeeded, or work had been done, where it hadn’t (“I said I tested that, but I didn’t. That’s on me. Won’t happen again”), so I built a self-check enforcement system for it. You or your agent can set up the system by reading this: https://github.com/obelisk-complex/hermes-agent/blob/main/self-check-enforcement-system-v15.md
It includes the source patch, which adds an on_output hook; this allows you to intercept text sent directly from the LLM to the user, which in vanilla is unblockable. So, this system ensures that if something remains unfinished, the LLM can’t say it’s done; it has to acknowledge what it didn’t do before it can send you a message to close the conversation loop. I’ve built the fork to automatically merge upstream changes around this patch daily at 0400 Pacific time, so I should stay up to date (ish). I also put in a feature request to get this added upstream. Feature request here: https://github.com/NousResearch/hermes-agent/issues/45881
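If it helps to picture the idea, here is a stripped-down sketch of what an output hook like this can do. It is not the code from the fork or the real Hermes hook signature: the Verdict type, the DONE_PHRASES list, and the shape of the on_output function here are hypothetical, made up purely to illustrate the gating logic.

```python
from dataclasses import dataclass

# Phrases treated as "I'm finished" claims (illustrative list, an assumption).
DONE_PHRASES = ("all done", "task complete", "everything is finished")


@dataclass
class Verdict:
    allowed: bool        # True if the message may be sent to the user
    feedback: str = ""   # Sent back to the LLM when the message is blocked


def on_output(text: str, unfinished: list[str]) -> Verdict:
    """Hypothetical hook: block a completion claim while work is still open."""
    lowered = text.lower()
    claims_done = any(phrase in lowered for phrase in DONE_PHRASES)
    if not claims_done or not unfinished:
        return Verdict(allowed=True)
    # Every open item has to be acknowledged by name before the loop can close.
    missing = [item for item in unfinished if item.lower() not in lowered]
    if not missing:
        return Verdict(allowed=True)
    return Verdict(allowed=False,
                   feedback="Acknowledge before closing: " + ", ".join(missing))
```

With this sketch, on_output("All done, tests pass!", ["run integration tests"]) comes back blocked, while the same message with an empty unfinished list goes through. The real system has to decide what counts as "unfinished" in the first place, which is the hard part and is covered in the linked document.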
What’s Opencode Go, and how is it different from OpenCode?
I’ll check out the subagent reporting issue. I did run into it with Gemma-4 but Qwen3.5 and 3.6 both work well in completing tasks. Local models aren’t perfect, but they’re damn close!
The harness helps a lot even with local models. In fact, I just found this this morning and cherrypicked it: https://github.com/DietrichGebert/ponytail
Recommend doing the same, and for superpowers if you don’t have 'em already: https://github.com/obra/superpowers
Opencode Go is the $10/month cloud model subscription from the same group maintaining the OpenCode software. Opencode Zen is a pay-as-you-go version which gives you access to Claude models as well. Keeping pay-as-you-go to subagents only (e.g. telling your agent to launch an Opus subagent via your Opencode Zen key) is actually surprisingly economical: when you’re not going turn after turn with hundreds of thousands of tokens of context, Claude is pretty reasonably priced.
What I’m doing is spreading out my usage over multiple cheap subscriptions, and augmenting with the occasional pay-as-you-go frontier agent, to get quality in line with what you get out of Claude, at usage that would require the $200/month level, for a lot less money than that.
I’m a little surprised to hear you say PAYG for Opus sub agents is economical. Maybe the superpowers and ponytail really do have a massive impact on things. I’ll send these to three people I know building heavy production apps right now. And integrate them into my own Hermes setup.
Thank you for the recommendations!
Any time, I hope they’re helpful! (☞゚ヮ゚)☞
I’m a little surprised to hear you say PAYG for Opus sub agents is economical
I did say it was surprising! 😂 To give you an idea what I mean by “economical”, it’s never more than a few bucks a day, even on days of heavy use and development with “loop until clean” instructions on QA (for which I use Opus). I accidentally blew through my Opencode Go quota really early in the first month, so I ended up on PAYG; here’s the usage graph:

And here’s the numbers breakdown for the highest day (I was evaluating GLM5.1 for general tasks - don’t use it for that, it’s really token hungry)

That includes a lot of experimentation too, while I figured out which models were best for what. I hid Fable because it crushed the rest of the table. It’s really expensive, but I found it worth it for one-shotting very long tasks on the Anthropic subscription.
Damn those are good numbers!


