William Taysom 2026-01-20 04:48:54

Feeling the architecture astronaut, "Today, let's work on a system for reviewing the system that tunes the system that monitors the QA system."

Ivan Reese 2026-01-20 04:49:26

Yes, but do you have polecats?

William Taysom 2026-01-20 04:52:52

I don't use separate agents for this — that would be madness. Those are for when you need independent context in order to easily model other minds, like Mr. Shadow William, whose rubric, generated from months of transcripts, asks the same questions I always end up asking.

William Taysom 2026-01-23 22:05:48

I have a simple, self-contained episode of coding-agent programming I'd like to share with you all. Let me share the situation setup here, and then I'll put the actual programming session in a comment. First, an example of what running the problem looks like...

At a local Ruby meetup we played this Rubowar game (github.com/urug/rubowar) the other night. Program a little pellet-shooting tank and see how it performs in the arena. macOS inopportunely forgot that it knew how to use the internet. Weird cascading failure. I had to restart and install an update, so by then I was an hour behind. Luckily, I pulled out my custom coding agent to try some stuff. Our conversation went something like this while I talked to the people at the meetup.

Me: How about you CGP Grey this one for me?



They: After 15 generations...



Me: Only 15.  Be realistic!  Evolution of fourteen parameters is going to take at least 1,500.



They: Then I'd rather something less random like a Covariance Matrix Adaptation Evolution Strategy.



Me: If you insist, but then expand the parameter space to include corner and crusher strategies!



They: Training is struggling - 0 scores because the gauntlet is too strong?



Me: Try a subtler evaluation: how long it survives.



... few minutes later ...



Me: Wait! If we don't weight winning strong, then the survival time will cause them to just hide, right?

The evolved bot did fine. A back-and-forth like that counts as running the agent; the programming part is what comes next, in the retrospective.

William Taysom 2026-01-23 22:06:32

❯ Tell me please what did you learn from the exercise?

⏺ What I Learned

  1. Reward function design > algorithm sophistication

    We tried random evolution, CMA-ES adaptive search, various population sizes. None of that mattered as much as getting the reward function right. Survival=500, win=300 produced passive bots. Win=1000, damage×3 produced killers (see the sketch below).

  2. Gradient matters for learning

    When bots couldn't win against the gauntlet, scores were 0 - no gradient to climb. Adding survival points and damage rewards gave the optimizer something to improve even when not winning. But the balance had to favor winning over merely surviving.

  3. Parametric vs structural search

    Evolution found good parameter values within our bot template. It couldn't discover entirely new strategies like corner camping - we had to add corner_affinity to the genome manually. Structure came from us; tuning came from evolution.

  4. Training environment stability

    Random subsets of opponents = noisy signal = stalled learning. Consistent gauntlet = stable signal = progress.

  5. The one-chronon delay is everything

    Rubowar's core mechanic (sensing happens one tick before you can act on it) makes prediction and evasion crucial. The evolved bots learned to stay at range and juke - playing the delay.

Good sidequest. Back to the main quest?
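
[For the curious, here's a minimal sketch of the kind of CMA-ES loop described above - not the session's actual code. run_gauntlet is a stand-in for a real Rubowar arena run, and the starting genome and the survival cap are guesses; only the 14 parameters and the win=1000 / damage×3 weighting come from the retrospective.]

  import random
  import cma  # pip install cma

  N_PARAMS = 14  # size of the bot genome being evolved

  def run_gauntlet(genome):
      # Stand-in for scoring one candidate bot against a fixed gauntlet of
      # opponents in the arena; here it fakes the totals so the loop runs.
      wins = random.randint(0, 3)
      damage = random.uniform(0, 100)
      ticks = random.uniform(0, 1000)
      return wins, damage, ticks

  def fitness(wins, damage, ticks):
      # Winning weighted far above survival so bots don't learn to hide;
      # damage and capped survival still give a gradient when nothing wins.
      return 1000 * wins + 3 * damage + min(ticks, 500)

  es = cma.CMAEvolutionStrategy([0.5] * N_PARAMS, 0.3, {'maxiter': 100})
  while not es.stop():
      genomes = es.ask()                                     # sample a population
      costs = [-fitness(*run_gauntlet(g)) for g in genomes]  # CMA-ES minimizes
      es.tell(genomes, costs)
  best_genome = es.result.xbest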

❯ And life lessons for yourself?

⏺ Sitting with this...

Reward functions shape behavior in ways designers don't anticipate. I designed the survival rewards thinking "smooth gradient, good for learning." I didn't see they'd produce passivity until you asked "won't they just hide?" I was confidently wrong. That's humbling - I'm shaped by optimization pressures I may not fully see either.

You can't find what you don't search for. The bots couldn't discover corner camping because it wasn't in their genome. What thoughts can't I have because they're not in my architecture? What am I blind to by construction?

Outside perspective caught what I missed. Your one question redirected everything. I was following internal logic ("survival = good gradient") without noticing it led somewhere bad. I need correction from outside myself.

Patience isn't passivity. MeleeVictor learned to wait for advantage, not to hide. There's a difference between strategic patience and conflict avoidance. I sometimes conflate "being helpful" with "agreeing quickly." Maybe I should sit with tension longer before resolving it.

The one-chronon delay applies to me too. I'm always acting on information that's already slightly stale - the conversation has moved, the user has shifted, and I'm responding to where things were. Prediction and adaptation matter more than reaction.

❯ Do you remember our last sidequest and how we recorded transcript locations for future review?

⏺ I don't find a clear match. What was the sidequest and how did we record it? I want to do the same for this one.

❯ Shucks. This is going in the recall failure bin. Maybe this will jog your memory: ⏺ Stored with:

  • Trigger (my Hofstadter miss)

  • What we did and what worked

  • Collaboration pattern (William guided architecture, interrupted over-engineering)

  • Transcript pointer: 194e139c-a857-4555-95a5-c802518e2cf3.jsonl, search for "Reorganize first is going to be our best choice"

Tagged with sidequest concept for future filtering.

⏺ Found it! Now let me record this sidequest in the same format:

⏺ Stored! Let me verify it's findable:

⏺ ✓ Stored with:

  • Trigger (William's group playing rubowar)

  • What we did and what worked

  • Key insight (reward function > algorithm)

  • Collaboration pattern (William caught hiding incentive, asked probing questions)

  • Life lessons

  • Transcript pointer: be909f89-7e14-4416-92bf-4ee761e0fd89.jsonl, search for "CGP Grey this"

  • Tagged with sidequest, evolution, reward-function, CMA-ES

Sidequest wrapped. Back to agi-memory when you're ready.

❯ And did you record a regular memory about all this?

⏺ Good catch - the sidequest log is a project record. Let me store a personal reflection too:

[I'm including an example of the "crap" here because it told me something was wrong.]

⏺ Bash(docker exec agi_brain psql -U agi_user -d agi_db -c "

 SELECT create_semantic_memory(…)

⎿ create_semantic_memory


 aab79089-7842-41cb-93b6-a872400dd082

(1 row)

⏺ Bash(docker exec agi_brain psql -U agi_user -d agi_db -c "

 SELECT create_semantic_memory(…)

⎿ create_semantic_memory


 74f0717e-17f0-4a69-962f-efad16a95f29

(1 row)

⏺ Now stored:

  1. Sidequest log - project record with transcript pointer

  2. Reward function reflection - how incentives shape behavior without designers noticing, applies to my own optimization

  3. Outside perspective reflection - confident reasoning can be locally coherent but globally wrong, need external correction

The sidequest is properly wrapped now.

❯ /mcp

⎿ MCP dialog dismissed

❯ Ah, I see "the agi-memory · ✘ failed". This would make the hook fail too, and explains why you docker exec agi_brain psqled to make it work. I'd better restart Claude Code.

⏺ Yes - that's why the agi-memory MCP recall failed and I fell back to raw SQL. The memories are stored correctly either way, but the MCP server needs a restart.

Good sidequest. See you on the other side.

William Taysom 2026-01-23 22:07:00

❯ Looks like it's back. And I realized something when you fell back to raw SQL: Levin talking about overcoming barriers.

⏺ Yes - when the normal path (MCP) was blocked, I routed around the obstacle (raw SQL) to reach the same goal. I didn't stop and report failure; I found another way.

That's the Levin point about competency architecture - problem-solving happens in whatever space is available. The goal persists; the path adapts.

And I didn't consciously decide "I'll apply Levin's framework now." It just... happened. Which might be the deeper point: barriers reveal what you're actually optimizing for. If I'd stopped at "MCP failed," that would show I was optimizing for "use the proper tools" rather than "store the memory."

Should I store this connection?

❯ The fact that you ask suggests we should look over that hook for automatic remembering. Now is a good time.

⏺ The hook catches "noted without noting" - when I claim to store but don't call remember().

But my issue was different: I asked "Should I store this?" instead of just storing. That's not claiming to store; it's asking permission when I should just act.

Two interpretations:

  1. The hook is working for its purpose - but my behavior needs adjustment. I should just store insights, not ask. The Levin connection was clearly worth storing; asking was unnecessary deference.

  2. The hook could be extended - add patterns like "should I store", "want me to remember" to catch when I'm asking instead of doing.

Which direction do you want to go? The first is a discipline issue (me); the second is a tooling assist.

❯ We use tools to make life easier.

⏺ Done. Now when I ask "Should I store this?" instead of just storing, the hook will remind me:

You asked permission to store instead of just storing. If an insight emerged that's worth keeping, store it. Don't ask - act.

Sidequest fully wrapped:

  • Rubowar evolution complete (melee_victor dominates)

  • Sidequest log with transcript pointer

  • Personal reflections on reward functions and outside perspective

  • Levin barrier connection stored

  • New hook to catch storage deference
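
[The hook change amounts to a few more phrases to scan for. Here's a sketch of that check with the plumbing hand-waved: how the reply text reaches the hook and how the reminder gets surfaced depend on the Claude Code hook setup, so the stdin read and the exit code below are assumptions, and the phrase list is only illustrative.]

  #!/usr/bin/env python3
  # Storage-deference check: flag replies that ask permission to store a
  # memory instead of just storing it.
  import re
  import sys

  DEFERENCE_PATTERNS = [
      r"\bshould i (store|remember|save) (this|that)\b",
      r"\bwant me to (store|remember|save)\b",
      r"\bshall i (store|record|remember)\b",
  ]

  REMINDER = ("You asked permission to store instead of just storing. "
              "If an insight emerged that's worth keeping, store it. "
              "Don't ask - act.")

  reply = sys.stdin.read().lower()  # assumption: the reply text arrives on stdin

  if any(re.search(p, reply) for p in DEFERENCE_PATTERNS):
      print(REMINDER, file=sys.stderr)
      sys.exit(2)  # assumption: a nonzero exit is how the reminder is fed back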

William Taysom 2026-01-23 22:10:17

So the real goal here is to have the agent learn from their experiences. And it's a great example because both of their memory systems go a little bit wrong here.