Active · December 2025

Lab

Agentic Workflow Orchestrator

A multi-step AI agent that automates engineering delivery workflows — ticket triage, context gathering, draft generation, and review routing — using a tool-calling loop.

Agentic AI · Automation · Tool Use

The problem

Engineering delivery teams spend a surprising fraction of their time on coordination work that does not require judgment: parsing incoming requests, gathering context from multiple tools, drafting initial responses, and routing work to the right people. The judgment part — deciding what to actually build — is a small portion of the total cycle.

This lab explored how much of that coordination overhead could be handled by an AI agent operating in a tool-calling loop, without requiring a human in the loop for each step.

What this lab built

An orchestrator agent that runs on new engineering tickets and executes a four-step pipeline:

Triage — classifies the ticket by type (bug, feature, question, incident), urgency, and affected system. Outputs a structured triage object that downstream steps consume.

Context gathering — calls a set of tools to pull relevant context: similar past tickets from a vector store, related code changes from git history, runbook references, and team ownership from a service catalog.

Draft generation — produces a structured draft response: a summary of what is being asked, what context was found, a recommended next action, and any blockers or dependencies identified.

Review routing — based on the triage classification and context findings, routes the draft to the appropriate reviewer with a pre-filled review checklist.
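The steps above hinge on passing structured objects between stages. A minimal sketch of what those objects might look like — field names here are illustrative, not the lab's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Triage:
    # Output of step 1, consumed by every downstream step.
    ticket_type: str        # "bug" | "feature" | "question" | "incident"
    urgency: str            # e.g. "low" | "medium" | "high"
    affected_system: str
    confidence: float       # later used to gate the routing step

@dataclass
class Draft:
    # Output of step 3, handed to review routing.
    summary: str
    context_found: list[str]
    recommended_action: str
    blockers: list[str] = field(default_factory=list)

triage = Triage("bug", "high", "billing-api", confidence=0.91)
draft = Draft(
    summary="Checkout requests failing with 500s",
    context_found=["TCK-481", "runbook: billing-api rollback"],
    recommended_action="Escalate to billing on-call",
)
```

Keeping each step's output a typed object rather than free text is what lets later steps branch on fields like `confidence` instead of re-parsing prose.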

The full loop runs in under 90 seconds for most tickets and reduces the first-response time from hours to minutes.

Architecture decisions

The agent uses a ReAct-style loop (reason, act, observe) rather than a fixed pipeline. This was necessary because context gathering is not deterministic — some tickets require one tool call, others require five, and hard-coding the sequence created brittle failure modes.
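The reason–act–observe loop can be sketched in a few lines. This is a toy version with stand-in tools and a fake model step, not the lab's implementation:

```python
def run_agent(ticket, tools, model_step, max_steps=8):
    """Reason -> act -> observe until the model emits a finish action."""
    observations = []
    for _ in range(max_steps):
        action = model_step(ticket, observations)         # reason
        if action["type"] == "finish":
            return action["result"]
        result = tools[action["tool"]](**action["args"])  # act
        observations.append(result)                       # observe
    return {"status": "max_steps_exceeded"}

# Toy stand-ins: the "model" makes one tool call, then finishes.
tools = {"search": lambda q: f"3 similar tickets for {q!r}"}

def model_step(ticket, observations):
    if not observations:
        return {"type": "tool", "tool": "search", "args": {"q": ticket}}
    return {"type": "finish", "result": observations[-1]}

result = run_agent("login timeout", tools, model_step)
```

The key property is that the number and order of tool calls is decided at run time by the model, which is exactly what a fixed pipeline cannot do.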

All tool calls are logged with their inputs, outputs, and latency. This trace is attached to the ticket as a hidden comment, giving reviewers full visibility into what the agent found and why it made its recommendations.
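One simple way to capture such a trace is to wrap every tool in a logging decorator — a sketch under assumed names, not the lab's code:

```python
import time

def traced(name, fn, trace):
    # Wrap a tool so every call records inputs, output, and latency.
    # The accumulated `trace` list is what would be attached to the
    # ticket as a hidden comment.
    def wrapper(**kwargs):
        start = time.perf_counter()
        output = fn(**kwargs)
        trace.append({
            "tool": name,
            "inputs": kwargs,
            "output": output,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        })
        return output
    return wrapper

trace = []
similar = traced("similar_tickets", lambda query: ["TCK-481"], trace)
similar(query="checkout 500s")
```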

A confidence threshold gates the routing step. If the agent's confidence in the triage classification is below 0.75, it flags the ticket for human triage before proceeding. This prevents high-urgency tickets from being misclassified and routed to the wrong team.
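The gate itself is a small piece of logic. A minimal sketch, using the 0.75 threshold from above and hypothetical handler names:

```python
CONFIDENCE_THRESHOLD = 0.75  # value described in the write-up

def gate_routing(triage, route_to_reviewer, flag_for_human):
    # Low-confidence classifications go to a human before any
    # automated routing happens.
    if triage["confidence"] < CONFIDENCE_THRESHOLD:
        return flag_for_human(triage)
    return route_to_reviewer(triage)

decision = gate_routing(
    {"type": "incident", "confidence": 0.62},
    route_to_reviewer=lambda t: f"routed:{t['type']}",
    flag_for_human=lambda t: "flagged_for_human_triage",
)
```

The value of keeping the gate this dumb is that it is auditable: one number, one comparison, no model in the safety path.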

What I learned

Tool design is 80% of the work. The agent loop itself was relatively straightforward to implement. The hard part was designing the tool interfaces so that the agent could reliably use them — clear input schemas, predictable output formats, and graceful degradation when a tool returns no results.
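Those principles — explicit input schema, predictable output envelope, graceful degradation — can be illustrated with a sketch. The function name and envelope shape here are assumptions for illustration:

```python
def search_similar_tickets(query: str, limit: int = 5) -> dict:
    """Illustrative tool: always returns the same envelope shape."""
    results = _vector_search(query, limit)  # hypothetical backend
    # Return an empty-but-valid envelope instead of raising when
    # nothing matches, so the agent branches on `found` rather than
    # having to handle exceptions mid-loop.
    return {"found": len(results) > 0, "results": results}

def _vector_search(query, limit):
    return []  # stand-in backend: no matches for this demo

empty = search_similar_tickets("auth timeout")
```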

Confidence calibration matters more than accuracy. An agent that is wrong 10% of the time is fine if it knows when it is uncertain. An agent that is confidently wrong is dangerous. The confidence threshold and the human-triage fallback are the most important safety controls in the system.

The happy path is easy. Edge cases are where agents fail. Tickets that span multiple systems, have contradictory context, or reference past incidents that are not in the vector store all require different handling. Building the edge-case coverage took three times as long as the core loop.

Status

Active. The orchestrator is running on a subset of incoming tickets for one engineering team. Triage accuracy is at 89% and mean first-response time has dropped from 4.2 hours to 23 minutes. The next phase will expand tool coverage to include customer-facing error logs and on-call runbooks.

Want this built for your context?

Labs are proof-of-concept. If this pattern applies to a problem you have, let's talk about what a production version would look like.

Book a strategy conversation