Talk to your context


This is for the polymaths. Especially the shy ones.

I LOVE using realtime voice AI for learning out loud. With voice AI like ChatGPT's Advanced Voice Mode or Gemini Live, it's easy to casually explore any topic within the corpus of human knowledge, which is incredible. However, each of those chats is sealed inside its own snowglobe: no real inputs, no real outputs, just a transcript.

When I'm brainstorming I want to come away with something I can build on. A product spec my coding harness can read. A research report generated during the conversation, refined on the spot through discussion, and saved into my knowledge base for the next call to pick up. Perhaps a vibe-coded prototype, or simply a drafted email. Those connections into the real world, and access to a curated knowledge base, are where the existing tools fall apart.

So I added the tools and the context. I've been talking to it nonstop for a month. It's called Hermano — because I use it with Hermes, my agent. Hermano is the little brother, the voice that talks to him.

The impossibly good brainstorming partner

Ideas arrive on the trail, in the shower, behind the wheel. Anywhere I don't have a keyboard and the patience to type them out before they dissolve.

What if — wherever you happened to be — you had a brainstorming partner who already knew the work? Knew the open loops, the calendar, the people, the half-formed argument you've been losing sleep over? And who would also run ahead and execute on whatever you decided, while you kept walking? That's what Hermano does.

And here's the kicker: Hermano inherits all your agent's tools. A recursive knowledge base (LLM-wiki), a fancy new memory system, a skill that drafts client emails the way you'd write them — every upgrade you make at the keyboard, your voice chat gets too. The voice is a thin layer. The substance is the agent underneath.

It's clobbering time... how I use Hermano

The Hermano project was originally called "Talk to your context" — since that's exactly what it enables.

To understand something is to demystify it. So I clobber Hermano with my most embarrassing ignorance: shameless run-on sentences, probing the mushy swamp at the core of my understanding. Why electrical sockets need a ground. The etymology of a word that's been bugging me. I also have it run custom skills in my coding harness, like gstack, to examine the wedge in some product idea or to get a sharply skeptical perspective for debate prep.

The magical part here is that, as a self-respecting person with a reputation for intelligence, I would never permit myself to be this dumb in front of a fellow human being. Personal AI agents have infinite patience and never judge a dumb question. I shake the understanding out of it until it meets me where I am, and I come away having actually learned something — my way.

That fear of looking dumb was a real blocker for me: as a product builder, as a curious person, as a professional. I worked around it for years. Not anymore.

What didn't exist when I started

I started building a couple of months ago. You could absolutely build something like this — the infra is there. Vapi and Retell as hosted platforms, LiveKit Agents and Pipecat as frameworks for assembling voice agents on your own terms. What none of them shipped in April 2026 was an opinionated reference for a personal-context voice agent: a structured dossier of standing context, a narrow-toolkit pattern hitting your own backends, a post-call extraction loop so each conversation compounds into the next. The plumbing existed. The opinionated artifact didn't. So I built it.

The architecture

The split-brain is the whole thing:

voice ──► Realtime (fast brain + dossier)
          ├── narrow tools (100–500 ms):
          │     calendar, email, notes, cards
          └── deep_research (30 s – 10+ min):
                full agent + machine actions

each call ──► extractor ──► dossier ──► warm

The realtime model handles the conversation. The agent handles the thinking. A dossier of standing context — open loops, today's calendar, relevant stakeholders, last call's working state — is baked into the session at mint, so the model is opinionated about my world from second one, not generic-until-deep-dive. Narrow tools answer the known shapes in well under a second. deep_research reaches a local agent loop with full memory, skills, and tool surface for the novel reasoning and the side-effecting work. After every call, an extractor folds transcripts, decisions, and learnings back into the dossier. Calls compound across weeks.
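Here's a rough Python sketch of that dossier loop. It illustrates the idea and is not Hermano's actual code: the `Dossier` fields, `render_instructions`, `fold_call`, and the shape of the extractor's output are all names and formats assumed for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Dossier:
    """Standing context baked into each voice session (assumed shape)."""
    open_loops: list[str] = field(default_factory=list)
    calendar: list[str] = field(default_factory=list)
    stakeholders: list[str] = field(default_factory=list)
    working_state: str = ""


def render_instructions(d: Dossier) -> str:
    """Flatten the dossier into the text handed to the model at session mint."""
    sections = [
        ("Open loops", d.open_loops),
        ("Today's calendar", d.calendar),
        ("Stakeholders", d.stakeholders),
    ]
    lines = []
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in (items or ["(none)"]))
    lines.append(f"Last call's working state: {d.working_state or '(none)'}")
    return "\n".join(lines)


def fold_call(d: Dossier, extracted: dict) -> Dossier:
    """Post-call step: merge the extractor's output into a new dossier.

    `extracted` is an assumed format with optional keys
    "closed_loops", "new_loops", and "summary".
    """
    closed = set(extracted.get("closed_loops", []))
    loops = [loop for loop in d.open_loops if loop not in closed]
    for loop in extracted.get("new_loops", []):
        if loop not in loops:
            loops.append(loop)
    return Dossier(
        open_loops=loops,
        calendar=list(d.calendar),
        stakeholders=list(d.stakeholders),
        working_state=extracted.get("summary", d.working_state),
    )
```

The point of returning a fresh `Dossier` from `fold_call` is that each call starts from the state the last one left behind, which is what lets conversations compound instead of resetting.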

Yeah, it's slow. Deep context and agentic work take time

Look... Look. OK. There's lag. Real lag. A deep question takes anywhere from thirty seconds to ten minutes — depending on what the agent is actually doing out there. You'll be sure it's stuck — frozen out there in the agentic underbrush — and then it comes back with something that makes you sit down.

The lag turns out to be a feature too: I think during the wait, queue the next prompt before the answer lands, interrupt and redirect. And the voice frees me from the keyboard. I'm walking, driving, pacing the living room while the agent does the work.
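One way to get that behavior is to run the slow work as a background task so the conversation never blocks on it. The sketch below uses Python's `asyncio` under my own assumptions: `CallSession`, `handle`, the `redirect` flag, and the stub tools are hypothetical, and the real agent loop is replaced by a short sleep.

```python
import asyncio
from typing import Optional


async def calendar_today() -> str:
    """Hypothetical narrow tool: a fast lookup against your own backend."""
    return "10:00 standup"


async def deep_research(question: str) -> str:
    """Stand-in for the slow agent loop; real runs take seconds to minutes."""
    await asyncio.sleep(0.05)
    return f"report on: {question}"


NARROW_TOOLS = {"calendar_today": calendar_today}


class CallSession:
    """Routes tool calls so slow work never blocks the conversation."""

    def __init__(self) -> None:
        self.in_flight: Optional[asyncio.Task] = None
        self.tasks: list[asyncio.Task] = []
        self.reports: list[str] = []

    async def handle(self, name: str, arg: str = "", redirect: bool = False) -> str:
        if name in NARROW_TOOLS:
            # Known shapes are answered inline.
            return await NARROW_TOOLS[name]()
        if name != "deep_research":
            raise ValueError(f"unknown tool: {name}")
        if redirect and self.in_flight and not self.in_flight.done():
            # Redirect: drop the question that is still running.
            self.in_flight.cancel()
        # Start the slow work in the background and acknowledge immediately.
        self.in_flight = asyncio.create_task(self._research(arg))
        self.tasks.append(self.in_flight)
        return "On it. Keep talking."

    async def _research(self, question: str) -> None:
        self.reports.append(await deep_research(question))

    async def drain(self) -> None:
        """Wait for all background work; cancelled tasks are tolerated."""
        await asyncio.gather(*self.tasks, return_exceptions=True)


async def demo() -> tuple:
    session = CallSession()
    quick = await session.handle("calendar_today")
    ack = await session.handle("deep_research", "why sockets need a ground")
    await session.handle("deep_research", "etymology of 'clobber'", redirect=True)
    await session.drain()
    return quick, ack, session.reports
```

Calling `asyncio.run(demo())` answers the calendar question inline, acknowledges the first deep question right away, then cancels it when the redirect arrives. Leaving `redirect` off would let both questions run side by side, which is closer to queueing the next prompt.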

Two options

Garry Tan, president of Y Combinator, shipped his version the same week — Mars + Venus inside his agent harness called gbrain. Mars is the introspective thought-partner; Venus is the logistics EA. Same category as Hermano, different design instinct. Here's a comparison of the two tools at the time of their release. I'm sure they'll evolve. I already have updates to Hermano in mind based on how Garry built Mars and Venus.

| | Hermano | Mars + Venus (gbrain) |
| --- | --- | --- |
| Shape | Standalone Python sidecar; wires into your agent over HTTP | Skillpack you install into your gbrain agent repo |
| Voice personality | One voice, opinionated about your context | Two personas: Mars (thought partner), Venus (logistics) |
| Context loading | Dossier baked at session mint; post-call extractor | Operator implements a context-builder; static persona prompts |
| Stack | You bring your own agent | You're already running gbrain |

A note on the code

One housekeeping note, at the end where it belongs: Hermano is open source under the MIT license. If you want to poke around, fork it, or wire the sidecar into your own agent, the repo is below.

Technology stack links

Tools and repos referenced in this post.
