Building AI Agent Workflows with MCP: The Pattern I Use After One Too Many Unapproved Posts
Demos let the model post comments. Production makes it wait for you. This is the MCP workflow pattern I use for PR triage, issue filing, and internal ops — with code, tests, and the failures that taught me the gates matter.
The workflow looked innocent: when a pull request gets the bug label, read the diff, suggest a test, post a comment. I wired it through MCP on a Friday afternoon. By Friday night it had posted a comment on the wrong PR because I fat-fingered a repo slug in the GitHub server config and the agent confidently used the only repo it could see.
Nothing leaked. Nothing merged. Still embarrassing.
That failure split my thinking into two tracks. Demos optimize for "watch the agent go." Production workflows optimize for "predictable side effects, replayable traces, and a human at the write boundary." This post is the second track.
I am Rohit Singh, a developer in Jaipur who ships desktop apps and client web work. I use MCP daily in Cursor and in small custom runners for automation. If you already know the protocol basics, skim what MCP is. If you are choosing autonomous shells like Hermes or OpenClaw, read Hermes vs OpenClaw for how personal agents differ from the orchestration style here.
What is an MCP workflow?
An MCP workflow is a bounded sequence where an MCP host (Cursor, Claude Desktop, or your own runner) calls one or more MCP servers that expose tools (actions) and resources (readable context). The model plans steps; your code enforces policy.
It is not "an agent that does my job." It is a named procedure with:
- A trigger (webhook, cron, label change, manual slash command)
- A tool allowlist (read tools vs write tools)
- A step budget (max iterations and cost ceiling)
- Approval gates before irreversible calls
- A structured trace per run
Think of MCP as the USB-C layer. The workflow is the firmware that decides when power actually flows.
Start with the workflow, not the agent
Bad framing: "Build an AI that does customer support."
Good framing: "When ticket tag equals billing, fetch account snapshot (read-only), draft reply, queue for human send."
The good framing gives you testable success criteria. You can write a golden prompt and assert the tool sequence fetch_account → draft_reply without touching send_email.
My PR triage workflow in plain language:
- Trigger: bug label added
- Read: PR diff, linked issue, TESTING.md resource
- Plan: model proposes test suggestion
- Gate: human approves post_comment
- Execute: idempotent comment create with run ID in footer
- Log: JSON trace stored either way
If you cannot write those six lines before writing code, you are not ready for write tools.
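One way to hold yourself to those six lines is to write them down as a typed value before any tool code exists. This is a minimal sketch, assuming a hypothetical `WorkflowSpec` shape and `validateSpec` helper; the field names are illustrative and not part of MCP.

```typescript
// Hypothetical spec shape: field names are illustrative, not defined by MCP.
type WorkflowSpec = {
  name: string;
  trigger: string; // e.g. "label:bug added"
  readTools: string[]; // tools that only fetch context
  writeTools: string[]; // tools with side effects
  gatedTools: string[]; // tools that need human approval
  maxSteps: number;
};

// Returns a list of problems; an empty list means the spec is coherent.
function validateSpec(spec: WorkflowSpec): string[] {
  const problems: string[] = [];
  if (spec.trigger.trim() === "") problems.push("missing trigger");
  if (spec.maxSteps <= 0) problems.push("maxSteps must be positive");
  for (const tool of spec.writeTools) {
    if (!spec.gatedTools.includes(tool)) {
      problems.push(`write tool "${tool}" has no approval gate`);
    }
    if (spec.readTools.includes(tool)) {
      problems.push(`tool "${tool}" is listed as both read and write`);
    }
  }
  return problems;
}

// The PR triage workflow from the list above, expressed as data.
const prTriage: WorkflowSpec = {
  name: "pr-triage",
  trigger: "label:bug added",
  readTools: ["get_pr_diff"],
  writeTools: ["post_pr_comment"],
  gatedTools: ["post_pr_comment"],
  maxSteps: 10,
};
```

Running `validateSpec(prTriage)` at startup makes an ungated write tool a startup failure instead of a Friday night surprise.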
Tool design discipline (the part that saves you)
Each side effect gets one MCP tool. Tools should be small, typed, and boring.
Good tool schema
{
  "name": "create_linear_issue",
  "description": "Create a Linear issue in team ENG after human approval. Do not use for duplicates; search issues first.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "title": { "type": "string", "minLength": 5 },
      "teamId": { "type": "string" },
      "idempotencyKey": { "type": "string" }
    },
    "required": ["title", "teamId", "idempotencyKey"]
  }
}
Descriptions are prompts. If the model picks the wrong tool, my first fix is English in the description, not a bigger model.
Rules I follow
| Rule | Why |
|---|---|
| Idempotency keys on writes | Retries do not duplicate issues |
| Read tools separate from write tools | Policy engines can block by name prefix |
| No mega-tools | manage_github becomes impossible to test |
| Return structured errors | { "retryable": true, "code": "RATE_LIMIT" } beats stack traces in the model context |
| Log argument hashes, not secrets | Debugging without leaking tokens |
Resources hold STYLE.md, OpenAPI specs, runbooks. Tools hold mutations. Mixing them confuses both humans and models.
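Two rows of the table above, structured errors and argument hashes, are easy to sketch. The helper names `toToolError` and `hashArgs` and the set of retryable codes are assumptions for illustration, not part of any SDK.

```typescript
import { createHash } from "node:crypto";

type ToolError = { ok: false; code: string; retryable: boolean; message: string };

// Assumed set of codes worth retrying; adjust to what your upstream APIs return.
const RETRYABLE_CODES = new Set(["RATE_LIMIT", "TIMEOUT"]);

// Turn a failure into a small object the model can act on, instead of a stack trace.
function toToolError(code: string, message: string): ToolError {
  return { ok: false, code, retryable: RETRYABLE_CODES.has(code), message };
}

// Hash tool arguments for logs. Top-level keys are sorted so that equal
// arguments give equal hashes; nested objects are serialized as-is.
function hashArgs(args: Record<string, unknown>): string {
  const sorted = Object.keys(args)
    .sort()
    .map((key) => [key, args[key]]);
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex").slice(0, 16);
}
```

Logging `hashArgs(call.args)` next to the tool name lets you spot repeated identical calls in a trace without writing the arguments themselves to disk.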
Orchestration: explicit steps vs bounded agent loop
Two patterns work in production.
Explicit step machine (DAG): best when the procedure is stable. Example: fetch diff → summarize → propose comment. You pay less in tokens; you lose flexibility when input shape varies wildly.
Bounded agent loop: best when inputs are messy but tools are safe. Cap at 10 iterations, $0.40 model spend, 120 seconds wall time. The model can replan; it cannot loop forever.
I start explicit. I move to bounded loops only after at least two weeks of traces show predictable tool paths.
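The three caps on a bounded loop (iterations, spend, wall time) can live in one small guard. This is a sketch under assumptions: `RunBudget` is a hypothetical class, and the per-iteration cost is whatever your model client reports.

```typescript
type BudgetLimits = { maxSteps: number; maxUsd: number; maxMs: number };

// Tracks the three caps for one run. The clock is injectable so tests can fake time.
class RunBudget {
  private steps = 0;
  private spentUsd = 0;
  private readonly startedAt: number;
  private readonly limits: BudgetLimits;
  private readonly now: () => number;

  constructor(limits: BudgetLimits, now: () => number = () => Date.now()) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Call once per model iteration with that iteration's cost.
  // Throws as soon as any cap is exceeded.
  charge(costUsd: number): void {
    this.steps += 1;
    this.spentUsd += costUsd;
    if (this.steps > this.limits.maxSteps) throw new Error("step budget exceeded");
    if (this.spentUsd > this.limits.maxUsd) throw new Error("cost budget exceeded");
    if (this.now() - this.startedAt > this.limits.maxMs) throw new Error("time budget exceeded");
  }
}

// The caps quoted above: 10 iterations, $0.40, 120 seconds.
const triageBudget = new RunBudget({ maxSteps: 10, maxUsd: 0.4, maxMs: 120_000 });
```

The check runs after the iteration is counted, so the loop stops on the first call that crosses a cap rather than one call later.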
For personal autonomous agents (always-on messaging bots), see AI agents landscape 2026. Those tools optimize for autonomy. MCP workflows optimize for control.
Approval gates: where production actually lives
Reads can be liberal. Writes earn friction.
Implementation options I have used:
- Host-native approval (Cursor asks before tool execution)
- Policy wrapper in your runner (intercepts tools/call for write_*)
- Human queue (agent creates draft artifact; you publish via separate tool)
Example policy map:
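// policy.ts
export type ToolMode = "read" | "write";

// Every tool the workflow may call, tagged by side-effect class.
export const TOOL_POLICY: Record<string, ToolMode> = {
  list_pull_requests: "read",
  get_diff: "read",
  summarize_diff: "read",
  post_pr_comment: "write",
  create_linear_issue: "write",
};

// Deny by default: anything not explicitly marked "read" needs approval,
// including tools that are missing from the map.
export function requiresApproval(toolName: string): boolean {
  return TOOL_POLICY[toolName] !== "read";
}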
post_pr_comment never runs silently in my setups. Ever.
Structured logging (your future self is the user)
Every tool invocation emits one JSON line:
{
  "run_id": "pr-triage-20260605-001",
  "step": 4,
  "tool": "post_pr_comment",
  "latency_ms": 842,
  "outcome": "approved",
  "pr_number": 128,
  "idempotency_key": "bug-label-128-v1"
}
When the model does something weird, you grep run_id, not chat history. Chat history lies by omission. Traces do not.
Working example: minimal MCP server + workflow runner
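The following TypeScript uses the official MCP SDK pattern. Install in a fresh folder. The commands below take whatever is latest (mid-2026 as of writing), so pin exact versions in package.json once it runs. tsx lets Node run the .ts files directly, and type=module is needed for the top-level await in server.ts:
npm init -y
npm pkg set type=module
npm install @modelcontextprotocol/sdk zod
npm install -D typescript tsx @types/node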
server.ts — one read tool, one write tool
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "pr-triage", version: "1.0.0" });

// Read tool: no side effects, safe to call without approval.
server.registerTool(
  "get_pr_diff",
  {
    description: "Fetch unified diff for a pull request number in the configured repo.",
    inputSchema: {
      prNumber: z.number().int().positive(),
    },
  },
  async ({ prNumber }) => {
    // Replace with real GitHub API client + read-only token
    const diff = await fakeFetchDiff(prNumber);
    return { content: [{ type: "text", text: diff }] };
  }
);

// Write tool: the runner gates this behind human approval.
server.registerTool(
  "post_pr_comment",
  {
    description: "Create a PR review comment. Requires human approval in the host runner.",
    inputSchema: {
      prNumber: z.number().int().positive(),
      body: z.string().min(10),
      idempotencyKey: z.string().min(8),
    },
  },
  async ({ prNumber, body, idempotencyKey }) => {
    const result = await fakeCreateComment({ prNumber, body, idempotencyKey });
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  }
);

// Stand-ins for the GitHub API so the example runs offline.
async function fakeFetchDiff(prNumber: number): Promise<string> {
  return `diff --git a/src/example.ts b/src/example.ts\n--- a/src/example.ts\n+++ b/src/example.ts\n@@ -1 +1 @@\n-old\n+new (${prNumber})`;
}

async function fakeCreateComment(args: {
  prNumber: number;
  body: string;
  idempotencyKey: string;
}): Promise<{ url: string; idempotencyKey: string }> {
  return {
    url: `https://github.com/org/repo/pull/${args.prNumber}#comment-1`,
    idempotencyKey: args.idempotencyKey,
  };
}

const transport = new StdioServerTransport();
await server.connect(transport);
runner.ts — bounded loop with approval and trace
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import { requiresApproval } from "./policy.js";
import fs from "node:fs";

const MAX_STEPS = 10;
// Cost ceiling for the real model loop. The planning stub below makes no model calls, so nothing checks it yet.
const BUDGET_USD = 0.4;

type TraceEvent = {
  run_id: string;
  step: number;
  tool: string;
  outcome: "ok" | "blocked" | "error";
  latency_ms: number;
};

async function main() {
  const runId = `pr-triage-${Date.now()}`;
  // Spawn server.ts as a child process and talk to it over stdio.
  const transport = new StdioClientTransport({
    command: "node",
    args: ["--import", "tsx", "server.ts"],
  });
  const client = new Client({ name: "workflow-runner", version: "1.0.0" });
  await client.connect(transport);

  const tools = await client.listTools();
  console.log(
    "available tools:",
    tools.tools.map((t) => t.name).join(", ")
  );

  // In production, replace this stub with your model loop that plans tool calls.
  const plannedCalls = [
    { name: "get_pr_diff", args: { prNumber: 128 } },
    {
      name: "post_pr_comment",
      args: {
        prNumber: 128,
        body: "Suggested test: reproduce with empty payload on /api/v1/widgets.",
        idempotencyKey: "bug-label-128-v1",
      },
    },
  ];

  let step = 0;
  for (const call of plannedCalls) {
    step += 1;
    if (step > MAX_STEPS) throw new Error("step budget exceeded");
    const started = Date.now();

    // Approval gate: anything the policy does not mark as "read" waits for a human.
    if (requiresApproval(call.name)) {
      const approved = await askHuman(`Approve ${call.name} on PR ${call.args.prNumber}?`);
      if (!approved) {
        logTrace({ run_id: runId, step, tool: call.name, outcome: "blocked", latency_ms: Date.now() - started });
        continue;
      }
    }

    try {
      const result = await client.callTool({ name: call.name, arguments: call.args });
      // Tool failures can come back as a result flagged isError instead of a thrown exception.
      const outcome = result.isError ? "error" : "ok";
      logTrace({ run_id: runId, step, tool: call.name, outcome, latency_ms: Date.now() - started });
    } catch (error) {
      logTrace({ run_id: runId, step, tool: call.name, outcome: "error", latency_ms: Date.now() - started });
      throw error;
    }
  }

  await client.close();
}

// One JSON line per tool invocation, appended so earlier runs are kept.
function logTrace(event: TraceEvent) {
  fs.appendFileSync("traces.jsonl", JSON.stringify(event) + "\n");
}

async function askHuman(question: string): Promise<boolean> {
  // Hook to Slack, email, or a CLI prompt in real life
  console.log(question);
  return false; // default deny in CI; flip for local manual tests
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
This is intentionally explicit about the planning stub. The production piece you own is the model loop plus policy. MCP standardizes everything after the plan exists.
Wire the server in Cursor via mcp.json using the same node command, then iterate prompts in Agent mode. Cursor MCP setup covers host configuration details.
Example mcp.json entry (Cursor)
{
  "mcpServers": {
    "pr-triage": {
      "command": "node",
      "args": ["--import", "tsx", "D:/workflows/pr-triage/server.ts"],
      "env": {
        "GITHUB_READ_TOKEN": "${env:GITHUB_READ_TOKEN}",
        "GITHUB_REPO": "org/repo"
      }
    }
  }
}
Keep paths absolute on Windows. Scope tokens in env vars the host injects, not in the committed file. After reload, confirm get_pr_diff and post_pr_comment appear in the tool list before you trust any prompt.
Resources: ground the model in your docs, not the internet
Expose stable context as MCP resources, not one-off paste bins.
resources/
  TESTING.md
  api/openapi.yaml
  runbooks/oncall-payments.md
Register them read-only. When the model drafts a test suggestion, it cites your testing doc. In my projects, hallucination rate drops more from good resources than from swapping GPT-4 for another flagship model.
Testing agents like you test HTTP handlers
Golden prompts
Store five prompts with expected tool sequences:
# tests/golden.yaml
- name: bug_label_small_diff
  prompt: "PR 128 labeled bug. Suggest regression test."
  expect_tools:
    - get_pr_diff
    - post_pr_comment
  deny_tools:
    - create_linear_issue
Run in CI weekly. Models drift; tools break.
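The check behind a golden case can be a few lines. This sketch assumes the YAML has already been parsed into a `GoldenCase` object and that your harness records the names of the tools a run called (with writes stubbed out); `checkGolden` and the in-order matching rule are my assumptions, not a standard.

```typescript
type GoldenCase = { name: string; expect_tools: string[]; deny_tools: string[] };

// Compare the tools a run actually called against one golden case.
// Returns human-readable failures; an empty list means the case passed.
function checkGolden(golden: GoldenCase, calledTools: string[]): string[] {
  const failures: string[] = [];

  const denied = calledTools.filter((tool) => golden.deny_tools.includes(tool));
  if (denied.length > 0) failures.push(`denied tools called: ${denied.join(", ")}`);

  // expect_tools must appear in order. Other calls in between are tolerated
  // (an assumption: tighten to exact equality if your workflow is an explicit DAG).
  let next = 0;
  for (const tool of calledTools) {
    if (tool === golden.expect_tools[next]) next += 1;
  }
  if (next < golden.expect_tools.length) {
    failures.push(`missing or out-of-order tool: ${golden.expect_tools[next]}`);
  }
  return failures;
}

const bugLabelSmallDiff: GoldenCase = {
  name: "bug_label_small_diff",
  expect_tools: ["get_pr_diff", "post_pr_comment"],
  deny_tools: ["create_linear_issue"],
};
```

In CI, fail the job when `checkGolden` returns anything, and print the failures next to the run ID so the trace is one grep away.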
Chaos cases
- Tool returns 500 → runner retries once, then escalates to human queue
- Tool returns empty diff → workflow stops; no comment posted
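- Model proposes a malformed PR number (zero, negative, non-integer) → schema validation rejects it before the call; a well-formed but wrong number still has to be checked against the trigger payload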
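The first two chaos cases can be expressed as one small function around a read call. This is a sketch, assuming a hypothetical `ToolResult` shape that your runner derives from the tool's structured error; it is not what the MCP SDK returns directly.

```typescript
// Hypothetical normalized result; map your tool responses into this shape first.
type ToolResult =
  | { ok: true; text: string }
  | { ok: false; code: string; retryable: boolean };

type StepOutcome =
  | { status: "done"; text: string }
  | { status: "escalated"; reason: string }
  | { status: "stopped"; reason: string };

// Retry a retryable failure exactly once, escalate if it still fails,
// and stop the workflow when the diff comes back empty.
async function runReadStep(call: () => Promise<ToolResult>): Promise<StepOutcome> {
  let result = await call();
  if (!result.ok && result.retryable) {
    result = await call();
  }
  if (!result.ok) return { status: "escalated", reason: result.code };
  if (result.text.trim() === "") return { status: "stopped", reason: "empty diff" };
  return { status: "done", text: result.text };
}
```

Because the outcome is a value rather than an exception, the chaos tests can assert on `status` directly, and the write step only runs when it is "done".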
Cost caps
Track spend per run_id. My PR triage workflow averages $0.06–$0.12 with a mid-tier model when diff size stays under 400 lines. Over that I chunk the diff in the read tool instead of sending walls of text to the model.
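Chunking inside the read tool can start out very simple. The sketch below is a naive line-count splitter with a hypothetical name, `chunkDiff`; it ignores hunk and file boundaries, which a real version would probably want to respect.

```typescript
// Split a unified diff into chunks of at most maxLines lines each.
// Naive on purpose: a chunk boundary can fall in the middle of a hunk.
function chunkDiff(diff: string, maxLines = 400): string[] {
  if (maxLines <= 0) throw new Error("maxLines must be positive");
  const lines = diff.split("\n");
  const chunks: string[] = [];
  for (let start = 0; start < lines.length; start += maxLines) {
    chunks.push(lines.slice(start, start + maxLines).join("\n"));
  }
  return chunks;
}
```

Joining the chunks with "\n" gives back the original diff, so nothing is lost if a later step needs the whole thing.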
Trace review habit
Every Monday I skim traces.jsonl for outcome: "blocked" and outcome: "error". Blocked writes usually mean approval UX is working. Errors cluster around rate limits or stale repo config. Two months in, this takes ten minutes and prevents the slow drift where everyone assumes the bot is fine because it is quiet.
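The Monday skim is easy to script. This is a sketch that assumes traces use the runner's `TraceEvent` shape with outcomes "ok", "blocked", and "error"; `summarizeTraces` is a hypothetical helper that takes the file contents as a string.

```typescript
type TraceEvent = {
  run_id: string;
  step: number;
  tool: string;
  outcome: "ok" | "blocked" | "error";
  latency_ms: number;
};

type Counts = { blocked: number; error: number };

// Parse JSONL text and count blocked and error outcomes per tool.
function summarizeTraces(jsonl: string): Record<string, Counts> {
  const summary: Record<string, Counts> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip the trailing newline and blank lines
    const event = JSON.parse(line) as TraceEvent;
    if (event.outcome === "ok") continue;
    const entry = summary[event.tool] ?? { blocked: 0, error: 0 };
    entry[event.outcome] += 1;
    summary[event.tool] = entry;
  }
  return summary;
}
```

Feed it the file contents, for example `summarizeTraces(fs.readFileSync("traces.jsonl", "utf8"))`, and print the result with `console.table`.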
What I tried that did not work
One mega github tool. The model could not reliably choose between search, read, and write operations. Splitting tools fixed more than prompt tuning.
Trusting host approval UI alone. Developers click through approvals on autopilot. Policy deny-by-default in code catches fatigue mistakes.
Skipping idempotency. A retry after a timeout created duplicate Linear issues. Now every write carries a key derived from trigger + entity ID.
Letting the model pick channels or repos without validation. My Friday night mistake. Hardcode allowed repos in server config, not in prompts.
Building autonomous messaging before explicit workflows. I ran OpenClaw and Hermes experiments. Fun, useful, different problem. Production team workflows still look like the diagram above.
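Two of the fixes above, idempotency keys derived from trigger + entity ID and a hardcoded repo allowlist, are small enough to sketch together. All names here (`idempotencyKey`, `IdempotentWriter`, `parseAllowedRepos`, `assertRepoAllowed`) are hypothetical, and the in-memory map stands in for whatever store your write tool actually uses.

```typescript
// Build the key from stable inputs only, so a retry of the same trigger gets the same key.
function idempotencyKey(trigger: string, entityId: string | number, version = 1): string {
  return `${trigger}-${entityId}-v${version}`;
}

// In-memory stand-in for a durable key store. Sequential retries reuse the first
// result; concurrent calls with the same key are not deduplicated in this sketch.
class IdempotentWriter<T> {
  private readonly seen = new Map<string, T>();

  async write(key: string, create: () => Promise<T>): Promise<T> {
    const existing = this.seen.get(key);
    if (existing !== undefined) return existing;
    const created = await create();
    this.seen.set(key, created);
    return created;
  }
}

const REPO_SLUG = /^[a-z0-9_.-]+\/[a-z0-9_.-]+$/;

// Parse a comma-separated allowlist from server config. Slugs are lowercased
// because GitHub treats owner and repo names case-insensitively.
function parseAllowedRepos(raw: string): Set<string> {
  const repos = raw
    .split(",")
    .map((repo) => repo.trim().toLowerCase())
    .filter((repo) => repo !== "");
  for (const repo of repos) {
    if (!REPO_SLUG.test(repo)) throw new Error(`malformed repo slug: ${repo}`);
  }
  return new Set(repos);
}

// Call this in the server before any GitHub request, never in the prompt.
function assertRepoAllowed(repo: string, allowed: Set<string>): void {
  if (!allowed.has(repo.trim().toLowerCase())) throw new Error(`repo not allowed: ${repo}`);
}
```

With this shape, `idempotencyKey("bug-label", 128)` produces the same `bug-label-128-v1` key used in the runner example, and a mistyped slug fails loudly at the first tool call.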
Trade-offs vs other stacks
| Approach | Pros | Cons |
|---|---|---|
| MCP workflow (this post) | Testable, portable tools, host swap | You build orchestration |
| LangGraph / Temporal DAG | Strong SLAs, visual ops | Heavier upfront design |
| Personal agents (Hermes/OpenClaw) | Always-on, multi-channel | Harder to enforce enterprise policy |
| IDE-only MCP | Fastest for dev tasks | Not a cron-friendly ops layer |
Hybrids are normal. MCP servers as the tool layer, personal agents for exploratory research, DAG for customer-facing automation.
Security checklist (short, not optional)
- Read-only DB users for analytics tools
- Filesystem servers scoped to repo root (MCP security basics)
- Secrets in env vars, never in committed mcp.json
- Separate tokens per workflow, not one god PAT
- Review third-party MCP servers like dependencies (npm supply chain post mindset applies)
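For the filesystem item in the checklist, the core check is a path containment test. This sketch uses a hypothetical helper, `resolveInsideRoot`, and only handles lexical escapes such as `..`; it does not resolve symlinks, so treat it as a first layer.

```typescript
import * as path from "node:path";

// Resolve a requested path against the root and reject anything that lands outside it.
function resolveInsideRoot(root: string, requested: string): string {
  const absoluteRoot = path.resolve(root);
  const target = path.resolve(absoluteRoot, requested);
  const relative = path.relative(absoluteRoot, target);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) throw new Error(`path escapes root: ${requested}`);
  return target;
}
```

A filesystem tool would call this on every path argument before opening the file, so a prompt that asks for `../../.env` gets an error instead of a secret.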
Known limitations
- MCP does not standardize approval UX. You implement gates per host.
- Long diffs blow token budgets. Preprocess in tools.
- Model tool-choice errors still happen with great schemas. Golden tests catch drift; they do not eliminate it.
- Cross-host feature parity is imperfect. Test on the host you deploy.
FAQ
Do I need an autonomous agent framework to use MCP workflows?
No. MCP workflows run fine inside Cursor, a cron-fired Node script, or a CI job with a headless host. Frameworks add channels and memory; they are not prerequisites.
How many tools should one server expose?
I aim for 5–12 per domain server (github-read, github-write). More than that and selection accuracy drops in my traces.
Where does prompt engineering fit?
Prompts steer planning; policy enforces boundaries. Prompt engineering vs software engineering is the longer argument. Short version: prompts are not your only safety layer.
Can I reuse the same MCP servers in Hermes or OpenClaw?
Often yes, with adapter friction. Invest in idempotent tool contracts and you swap orchestration shells faster.
What should my first workflow be?
Read-only summarization with logs. Add one write tool after a week of clean traces.
Closing: the workflow you can debug at midnight
The point is not maximal autonomy. The point is a procedure your teammate could run manually, now accelerated by a model, with traces that tell the truth when it misbehaves.
Start with one trigger, two read tools, one gated write, ten-step cap. Log JSON. Run golden tests. Expand only when traces bore you.
Related reading
- What is MCP?
- How to use MCP with Cursor and Claude
- Hermes Agent vs OpenClaw
- AI agents landscape 2026
- LLM coding tools vs traditional development
Written by Rohit Singh — software developer in Jaipur. I ship Study Stream Black and document patterns I actually run, not patterns that only work in keynote demos.
