AI Strategy — David Hsaiou

About me

This document captures my work in agentic development: designing decision logic, writing prompts, wiring tool integrations, and building the orchestration layer that holds multi-agent pipelines together. The sections below walk through when I choose to build an agent, how I build my agentic development setup, and what I have learned along the way.

Concept links

davidhsaiou.com/agentic: Conceptual overview of agentic systems.
davidhsaiou.com/future: Opinionated forward-looking design vision.
doc.davidhsaiou.com/docs/ai-development: Concrete development reference (architecture, MCP patterns, agent taxonomy).

When I build an agent

Six trigger conditions for choosing an agent over a script, scheduled job, or human.

1

Routine work needing only simple judgment, but with diverse and dynamically changing conditions

The rule set is too volatile to hard-code, and the cost of keeping a static map current exceeds the cost of letting a model handle it.

Example: the issue-resolver routing chain — Lane field → Git Repo map → keyword sniffing → self-implement. Hard-coding exhaustive rules would break on the next new repo.
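
The fallback order in this example can be sketched as a list of resolvers tried in sequence. This is a minimal illustration, not the issue-resolver's actual logic: the Issue fields, REPO_MAP, KEYWORDS, and every function name are hypothetical, and it assumes each step yields the name of a specialist agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Issue:
    """Hypothetical issue shape; the real tracker fields may differ."""
    title: str
    lane: Optional[str] = None
    repo: Optional[str] = None


# Hypothetical lookup tables, for illustration only.
REPO_MAP = {"example/backend": "csharp-backend-coder"}
KEYWORDS = {"blazor": "blazor-frontend-coder", "docs": "doc-writer"}


def by_lane(issue: Issue) -> Optional[str]:
    # 1. An explicit Lane field wins outright.
    return issue.lane


def by_repo(issue: Issue) -> Optional[str]:
    # 2. Otherwise fall back to the repo-to-agent map.
    return REPO_MAP.get(issue.repo) if issue.repo else None


def by_keyword(issue: Issue) -> Optional[str]:
    # 3. Otherwise sniff the title for a known keyword.
    title = issue.title.lower()
    return next((agent for word, agent in KEYWORDS.items() if word in title), None)


RESOLVERS: list[Callable[[Issue], Optional[str]]] = [by_lane, by_repo, by_keyword]


def route(issue: Issue) -> str:
    """Return the first non-empty answer, else fall through to self-implement."""
    for resolver in RESOLVERS:
        target = resolver(issue)
        if target:
            return target
    return "self-implement"
```

Adding a new repo in this sketch means adding one map entry or nothing at all, since unmatched issues still reach the final fallback.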

2

Repetitive tasks

High-frequency, low-variance work where a human would follow the same procedure every time.

Example: pr-reviewer runs the same sequence on every PR — diff, changelog gate, acceptance criteria, rationale, tick checkboxes, merge-or-request — without skipping a step from fatigue.

3

Complex data lookup and synthesis (knowledge base or search assistant)

Assembling coherent answers from heterogeneous sources at query time.

Example: dispatcher cross-references YouTrack, existing worktrees on the filesystem, and the workspace CLAUDE.md capability registry before spawning a single sub-agent.

4

Semantic judgment

Decisions that depend on meaning rather than keyword matching.

Example: the pr-reviewer's acceptance-criteria step maps each criterion to the diff and judges whether the implementation satisfies it in spirit. Regex cannot do this.

5

Conventional procedures — lock down a specific action sequence to lower the operational bar

The user states intent and the agent runs a known, agreed action sequence with zero deviation.

Example: /release-cut compresses the entire release ritual (semver bump, changelog fold, commit, tag, push, PR) into one command.
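
A minimal sketch of what such a locked-down sequence might look like as a dry-run plan. The step order follows the example above; the function names, the wording of each step, and the v-prefixed tag format are assumptions, and nothing here executes git.

```python
def bump(version: str, part: str) -> str:
    """Bump one component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")


def release_plan(version: str, part: str) -> list[str]:
    """List the ritual's steps in a fixed order; nothing is executed."""
    new = bump(version, part)
    return [
        f"bump version {version} -> {new}",
        "fold changelog",
        f"commit release {new}",
        f"tag v{new}",  # Tag format is an assumption.
        "push",
        "open PR",
    ]
```

Because the order lives in one function, the caller can only choose which part to bump, not which steps run.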

6

Workflow — lock down the workflow, owned and constrained by AI, executed through conversation

The user's interface becomes the issue tracker and a chat channel, not a sequence of commands.

Example: the pipeline — once an issue is assigned to Claude_Code and the pipeline is invoked, it owns the flow until Stage=Done. The user only touches the issue tracker and Discord.
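
One way to picture a pipeline that owns the flow is a small state machine. In this sketch only the Done stage comes from the text above; the other stage names, the event names, and the transition table are hypothetical.

```python
# Hypothetical stages and events; only "Done" appears in the text above.
TRANSITIONS = {
    ("Implement", "pr-opened"): "Review",
    ("Review", "approved"): "Done",
    ("Review", "changes-requested"): "Rework",
    ("Rework", "ready"): "Review",
}


def run_pipeline(events: list[str], stage: str = "Implement") -> str:
    """Advance the issue through stages until Done or the events run out."""
    for event in events:
        if stage == "Done":
            break
        key = (stage, event)
        if key not in TRANSITIONS:
            # An event that is not allowed in this stage is an error, not a guess.
            raise ValueError(f"event {event!r} not allowed in stage {stage!r}")
        stage = TRANSITIONS[key]
    return stage
```

The user never calls a transition directly in this model; events arrive from the tracker or the review, and the table decides what happens next.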

How I build my agentic development

Building from zero

Early on, I wrote every skill and agent definition from 0 to 1 by hand. Not because upstream libraries are bad, but because writing the prompt myself was the only way to actually feel how the model responds — where it drifts, which line is too loose, which guard rule is missing. Once I could see the failure firsthand I knew exactly what to add or tighten: a clearer instruction, a scope boundary, an extra guard condition. That from-zero phase built the failure intuition that is the prerequisite for scaling judgment (see "Building agents with agents" below).

Prompt engineering

Each agent is a Markdown definition with YAML frontmatter (tools, model, isolation). Three things matter. First, explicit workflow: numbered steps, each with its own preconditions and actions, so the model never has to invent the order. Second, scope: every prompt names what the agent must NOT do, otherwise models generalise by analogy and violate system invariants. Third, fail-stop: when a precondition fails, the agent halts cleanly and reports rather than improvising. For example, the Open Questions entry-gate in issue-resolver stops on ambiguous requirements instead of guessing forward.
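
The explicit-workflow and fail-stop ideas can be sketched in ordinary code, even though the real agents express them in prompt text. Everything below is illustrative: the Step type, run_workflow, and the example steps are hypothetical names, and the state dictionary stands in for whatever the agent knows about the issue.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    precondition: Callable[[dict], bool]
    action: Callable[[dict], None]


def run_workflow(steps: list[Step], state: dict) -> str:
    """Run steps in order; halt and report at the first failed precondition."""
    for number, step in enumerate(steps, start=1):
        if not step.precondition(state):
            # Fail-stop: no improvising, just a clear report of where it halted.
            return f"halted at step {number} ({step.name}): precondition failed"
        step.action(state)
    return "completed"


# Hypothetical two-step workflow: an entry gate, then the implementation.
EXAMPLE_STEPS = [
    Step("entry gate", lambda s: not s["open_questions"], lambda s: None),
    Step("implement", lambda s: True, lambda s: s.update(implemented=True)),
]
```

With open questions in the state, the run stops at the entry gate and the implement step never executes.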

RAG

When the issue-creater (or any planning-stage agent) needs context to scope a task, it reads from four sources at runtime: CLAUDE.md (conventions + capability registry), YouTrack (related issues, parent epics), web search (external docs, prior art), and the target git repo (existing code + conventions). There is no vector store and no pre-indexing; each source is read fresh when the agent needs it. The model's job is picking which fragment matters for the task in front of it.
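
A minimal sketch of this read-fresh pattern, assuming each source can be wrapped as a function that takes the task and returns text. The Reader type, gather_context, and build_prompt are hypothetical names, and the readers would be stand-ins for a file read, an MCP call, or a web search.

```python
from typing import Callable

# Each reader is a hypothetical stand-in for a real lookup.
Reader = Callable[[str], str]


def gather_context(task: str, readers: dict[str, Reader]) -> dict[str, str]:
    """Call every source at request time; nothing is cached or pre-indexed."""
    return {source: read(task) for source, read in readers.items()}


def build_prompt(task: str, context: dict[str, str]) -> str:
    """Lay the fresh fragments out by source so the model can pick what matters."""
    sections = [f"## {source}\n{text}" for source, text in context.items()]
    return f"Task: {task}\n\n" + "\n\n".join(sections)
```

The trade-off in this design is latency per request in exchange for never serving a stale index.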

Tool Use

The agents are almost entirely tool-driven: MCP servers (YouTrack, Gitea, Discord, Kubernetes) plus Bash/Read/Edit/Write/Grep/Glob. MCP is the current choice for fast integration. Most skills wrap a single MCP call; a few (e.g. release-cut, resolve-merge-conflict) orchestrate several MCP / git steps. Either way the tool surface stays auditable, and the model isn't reasoning about shell escaping or process state. The general principle: each tool boundary should expose a small, well-typed contract so the orchestrating layer never needs to know the backend's internal state.
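
The small, well-typed contract principle can be illustrated with a typing Protocol. This is a sketch under stated assumptions, not the shape of any real MCP server: PullRequest, PullRequestTool, InMemoryPrTool, open_for_issue, and the branch naming are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PullRequest:
    repo: str
    number: int
    title: str


class PullRequestTool(Protocol):
    """The whole contract the orchestrator sees: typed input, typed output."""

    def open_pr(self, repo: str, branch: str, title: str) -> PullRequest: ...


class InMemoryPrTool:
    """Hypothetical test double; a real tool would wrap an MCP call."""

    def __init__(self) -> None:
        self._count = 0  # Backend state stays private to the tool.

    def open_pr(self, repo: str, branch: str, title: str) -> PullRequest:
        self._count += 1
        return PullRequest(repo=repo, number=self._count, title=title)


def open_for_issue(tool: PullRequestTool, repo: str, issue_id: str) -> PullRequest:
    # The orchestrator depends only on the contract, never on backend internals.
    return tool.open_pr(repo, branch=f"issue/{issue_id}", title=f"Resolve {issue_id}")
```

Swapping the in-memory double for a real backend would not change open_for_issue, which is the point of keeping the boundary narrow.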

Agent architecture

Layered orchestration: single responsibility per layer and explicit handoff (PR number and repo are passed down, because sub-agents cannot discover them). Specialist coders (csharp-backend-coder, blazor-frontend-coder, doc-writer, ui-designer) sit below issue-resolver and focus on domain logic; the resolver owns branches, commits, and PR ops.

dispatcher
  └── pipeline (per issue, concurrent)
        ├── issue-resolver  (implementation + PR open)
        ├── pr-reviewer     (review + merge or request changes)
        └── pr-resolver     (apply feedback, re-label Ready)
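
The explicit handoff between layers can be sketched as a small immutable record that the parent fills in before spawning a sub-agent. The Handoff type, review_prompt, and spawn_reviewer are hypothetical names, and the prompt wording is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    """Everything a sub-agent needs; it is told, it does not search."""
    repo: str
    pr_number: int


def review_prompt(handoff: Handoff) -> str:
    # The parent layer renders identifiers into the sub-agent's instructions.
    return f"Review PR #{handoff.pr_number} in {handoff.repo}."


def spawn_reviewer(handoff: Optional[Handoff]) -> str:
    if handoff is None:
        # Refuse to start rather than let the sub-agent guess its target.
        raise ValueError("pr-reviewer needs an explicit repo and PR number")
    return review_prompt(handoff)
```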

Multi-agent collaboration

The key decision is isolation mode: every agent runs in its own git worktree (isolation: worktree is the plugin default), so concurrent pipelines don't collide. Shared invariants live in shared gates — gitignore-guard runs before every commit, and changelog-guard runs before every PR; both issue-resolver and pr-resolver invoke them. Write the rule once, call it from every agent that needs it.
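
Both ideas in this section, per-agent worktrees and shared gates, can be sketched briefly. The gate names mirror the ones above, but their rules here are invented for illustration, and the branch and path layout in worktree_command is an assumption. The command is built as a list and not executed.

```python
from typing import Callable

# A gate takes the staged paths and returns violation messages.
Gate = Callable[[list[str]], list[str]]


def gitignore_guard(staged: list[str]) -> list[str]:
    # Hypothetical rule: build output must never be committed.
    return [f"ignored path staged: {p}" for p in staged if p.startswith("bin/")]


def changelog_guard(staged: list[str]) -> list[str]:
    # Hypothetical rule: the change set must touch the changelog.
    return [] if "CHANGELOG.md" in staged else ["CHANGELOG.md not updated"]


def run_gates(gates: list[Gate], staged: list[str]) -> list[str]:
    """Collect violations from every gate; an empty list means proceed."""
    return [message for gate in gates for message in gate(staged)]


def worktree_command(agent: str, issue_id: str) -> list[str]:
    """Build (not run) a git worktree add call giving the agent its own checkout."""
    branch = f"{agent}/{issue_id}"
    return ["git", "worktree", "add", "-b", branch, f"../worktrees/{agent}-{issue_id}"]
```

Any agent that commits would call run_gates with the same gate list, so a rule change lands in one place.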

My viewpoint on agent development

1

Today's models are not strong enough on their own. Every agent I have built required significant tuning — prompt iteration, scope adjustment, guard rules — before it behaved reliably.

2

Boundary-setting is the single most important discipline. A model knows a hundred ways to solve a problem; we usually want only one or two. Without explicit scope, the model picks the wrong one — not because it is weak, but because it has no signal telling it which choice is correct in this system.

3

Building agents with agents is the direction that matters most right now. Once the from-zero phase has built the failure intuition, the next step is scaling that judgment — having an agent generate the next agent's prompt, CLAUDE.md, or skill is faster than hand-writing and parallelisable: draft many candidate agents at once, run them against the same task, and keep the best one as the final version.
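
The draft-many, keep-the-best loop reduces to a selection over scored candidates. In this sketch the scoring function is a deliberately crude placeholder that counts guard phrases; a real evaluation would run each candidate agent against the task. All names are hypothetical.

```python
from typing import Callable


def guard_coverage(prompt: str) -> float:
    # Hypothetical score: how many required guard phrases the draft contains.
    required = ["must not", "halt", "report"]
    return sum(phrase in prompt.lower() for phrase in required)


def pick_best(candidates: list[str], score: Callable[[str], float]) -> str:
    """Score every candidate on the same task and keep the highest scorer."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)
```

Since each candidate is scored independently, the scoring calls could run in parallel without changing the selection.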

4

AI output is cheap: for the human, each generation costs a single sentence of intent. When a generation is wrong, the cost of asking again is trivially low. This changes how to think about quality control: don't try to make every generation perfect; make the iteration cycle fast.

5

Context windows and focus drift are real constraints. Models lose track of the goal as context grows. The system around the model has to carry the weight: decompose oversized tasks into small executable items, hand each one to a fresh agent run, then assemble the results. Reliability lives in the orchestration layer as much as in the prompt.
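
The decompose, run fresh, assemble pattern can be sketched as a loop in which every item gets its own call. The run_agent callable is a hypothetical stand-in for launching a fresh agent run; how the items are produced is left out, since decomposition is itself a judgment step.

```python
from typing import Callable


def run_decomposed(items: list[str], run_agent: Callable[[str], str]) -> str:
    """Give each small item its own run, then assemble the results in order."""
    results = []
    for item in items:
        # Each call sees only its own item, so context never accumulates.
        results.append(run_agent(item))
    return "\n".join(results)
```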

Stack and tooling

Claude Code (Sonnet 4.6 / Opus 4.7): Primary agent runtime; Opus reserved for pr-reviewer, where review quality dominates.
MCP servers: YouTrack (issues), Gitea (PRs), Discord (notifications), Kubernetes (cluster ops).
Git worktrees: Per-agent isolation for concurrent execution.
Infra: Self-hosted Gitea (git host) + YouTrack (issue tracker) + Discord bot (notifications); Kubernetes cluster running a .NET Aspire-orchestrated multi-service platform; Cloudflare on the edge / CDN.

AI Assisted

This document was drafted with Claude Code (Sonnet 4.6 / Opus 4.7) and reviewed line-by-line by me.