Most AI agencies talk about what they build for clients. We think the more useful test is what they run themselves.
At 38shift, we run core internal operations on the same kinds of agents we build for clients, and we have learned more from operating those systems day to day than we ever could from demos alone. Based on our internal operating data from the last six months, agent-driven workflows now handle roughly 70% of our repetitive internal operations. Measured against our previous manual model, that has reduced internal ops costs by about 40% while preserving human review wherever judgment still matters.
That timing matters. McKinsey's November 5, 2025 State of AI survey found that 88% of organizations report regular AI use in at least one business function, but only about one-third say they have started scaling AI across the enterprise. Agent adoption is moving too, but it is still early: 23% report scaling an agentic AI system somewhere in the business, while 39% are still experimenting. The gap is no longer interest. It is operationalization.
At a glance
- Content agent produces 8-12 draft-ready blog posts per month with minimal editing. What used to take 3 freelancers now takes 1 editor plus 1 agent.
- Sales outreach agent manages 200+ lead interactions weekly, qualifying prospects and booking demos once a lead meets our qualification threshold.
- Research agent compiles client competitive analysis in under 2 hours. Previously, that took about 2 days of manual work.
- Finance ops agent processes invoice-to-ledger matching with zero false positives across 247 matches in our last six months of internal tracking.
- Document agent reduces project handoff preparation from roughly 4 hours to 30 minutes.
- McKinsey's November 5, 2025 survey found 88% of organizations use AI in at least one business function, but only about one-third have begun scaling AI across the enterprise.
The roles we automated
The simplest way to understand our internal org chart is that humans still own outcomes, while agents own repeatable execution inside clearly defined lanes.
38SHIFT
|
+-- Leadership / Strategy (Human)
| |
| +-- sets priorities, decides where autonomy is allowed
| +-- reviews exceptions, edge cases, and high-stakes outputs
|
+-- Growth Function
| |
| +-- SDR / Editor / Strategist (Human owners)
| | |
| | +-- Sales Outreach Agent -> researches leads, drafts outreach, routes qualified replies
| | +-- Content Agent -> turns briefs, notes, and research into first drafts
| |
| +-- Human checkpoint -> approves messaging, takes live calls, publishes final copy
|
+-- Delivery Function
| |
| +-- Analyst / Operator / PM (Human owners)
| | |
| | +-- Research Agent -> compiles market scans, audits, comparisons
| | +-- Document Agent -> converts calls and notes into handoffs and project memory
| |
| +-- Human checkpoint -> synthesizes findings, shapes recommendations, owns client delivery
|
+-- Finance / Ops Function
| |
| +-- Ops Lead / Finance Review (Human owners)
| | |
| | +-- Finance Agent -> matches invoices, flags discrepancies, prepares reconciliation
| |
| +-- Human checkpoint -> approves exceptions, payments, and vendor issues
|
+-- Shared operating rule
    |
    +-- Agent handles structured repetitive work
    +-- Human handles judgment, approvals, and anything ambiguous
    +-- Escalation path: agent -> owner -> leadership if risk or uncertainty increases

The Content Agent generates first drafts of blog posts, case studies, and email sequences. It pulls from client work, our internal research base, and the editorial calendar. It does not produce final copy on its own. What it does well is turn 90 minutes of scattered thinking into a structured 1,200-word first draft that needs about 30 minutes of editorial revision.
The Sales Outreach Agent handles the first layer of outbound work. It researches companies, identifies relevant signals, drafts personalized messages, tracks replies, and routes qualified leads forward. Over the last six months, it has supported more than 8,000 lead interactions. The practical result is that our SDR spends less time on repetitive email production and more time on qualified conversations.
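To make the qualification gate concrete, here is a minimal sketch of how that routing can work. The scoring fields, weights, and threshold are illustrative assumptions for this post, not our production logic.

```python
from dataclasses import dataclass

# Illustrative sketch only: the fields, weights, and threshold below are
# assumptions for this post, not our production qualification logic.

@dataclass
class Lead:
    replied: bool        # prospect responded to outreach
    intent_signals: int  # e.g. pricing-page visits, event triggers
    icp_match: bool      # fits the ideal customer profile

QUALIFICATION_THRESHOLD = 3  # assumed value

def score(lead: Lead) -> int:
    points = 0
    if lead.replied:
        points += 2  # a reply is the strongest single signal
    if lead.icp_match:
        points += 1
    if lead.intent_signals >= 2:
        points += 1
    return points

def route(lead: Lead) -> str:
    # The agent drafts and tracks outreach on its own; only a qualified
    # lead crosses the human checkpoint for a live conversation.
    if score(lead) >= QUALIFICATION_THRESHOLD:
        return "book demo -> human SDR"
    return "keep nurturing -> agent"

print(route(Lead(replied=True, intent_signals=2, icp_match=True)))  # book demo -> human SDR
```

The design choice that matters is the hard boundary: the agent never books a call on a hunch, and the human never writes a first-touch email by hand.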
The Research Agent takes a client brief and assembles competitive analysis, feature comparisons, and technology audits. For a client exploring a new product direction, what used to take two analyst-days now takes about two hours of structured review and synthesis.
The Finance Agent processes invoices, matches them to purchase orders, and flags discrepancies for review. In one recent case, it identified a vendor overcharge equal to roughly 3% of a $50,000 contract, about $1,500. In our internal tracking over the last six months, it processed 247 invoice matches with zero false positives.
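For illustration, here is a minimal sketch of conservative invoice-to-PO matching. The exact rule shown, a PO-number lookup plus a small amount tolerance, is an assumption, but the bias it demonstrates is the real point: the agent flags ambiguity instead of force-matching it, which is what makes a zero-false-positive record sustainable.

```python
from decimal import Decimal
from typing import Optional

# Sketch under assumptions: the matching rule (exact PO number plus a
# small amount tolerance) is illustrative; the post does not spell out
# our production logic.

AMOUNT_TOLERANCE = Decimal("0.01")  # anything beyond a rounding cent gets flagged

def match_invoice(invoice: dict, purchase_orders: dict[str, Decimal]) -> str:
    po_amount: Optional[Decimal] = purchase_orders.get(invoice["po_number"])
    if po_amount is None:
        return "flag: no matching purchase order"
    diff = abs(Decimal(invoice["amount"]) - po_amount)
    if diff <= AMOUNT_TOLERANCE:
        return "match: post to ledger"
    # The agent never force-matches a discrepancy; it quantifies the gap
    # and escalates. Ambiguity goes to a human, not a guess.
    return f"flag: amount differs by {diff} -> human review"

# The overcharge case above: a $51,500 invoice against a $50,000 PO
# surfaces as a flag, not a match.
pos = {"PO-1042": Decimal("50000.00")}
print(match_invoice({"po_number": "PO-1042", "amount": "51500.00"}, pos))
```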
The Document Agent turns calls, notes, and implementation context into cleaner handoff material. It reduces the friction between project delivery and operational continuity, which is one of the easiest places for small teams to lose time without noticing.
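Across all five agents, the shared operating rule from the org chart reduces to a small piece of routing logic. This is a sketch under assumptions: the confidence threshold and the definition of high stakes are illustrative, not production values.

```python
from enum import Enum

# The tiers mirror the org chart above; the 0.8 confidence threshold and
# the high-stakes test are illustrative assumptions, not production values.

class Handler(Enum):
    AGENT = "agent"
    OWNER = "human owner"
    LEADERSHIP = "leadership"

def escalate(structured: bool, confidence: float, high_stakes: bool) -> Handler:
    if high_stakes:
        return Handler.LEADERSHIP  # payments, vendor disputes, client-facing risk
    if not structured or confidence < 0.8:
        return Handler.OWNER       # ambiguity defaults to a human
    return Handler.AGENT           # repeatable execution inside a defined lane

assert escalate(structured=True, confidence=0.95, high_stakes=False) is Handler.AGENT
assert escalate(structured=False, confidence=0.95, high_stakes=False) is Handler.OWNER
assert escalate(structured=True, confidence=0.95, high_stakes=True) is Handler.LEADERSHIP
```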
Why this matters beyond our team
Our internal stack maps closely to where public AI adoption is already strongest. McKinsey's November 5, 2025 survey found that AI use is most commonly reported in IT, marketing and sales, and knowledge management. That fits the reality we see too. The first workflows worth automating are usually not the most glamorous ones. They are the ones with clear structure, high repetition, and measurable drag.
Microsoft's April 23, 2025 Work Trend Index makes the same broader point from the leadership side. It found that 82% of leaders say 2025 is a pivotal year to rethink strategy and operations, while 81% expect agents to be at least moderately integrated into their AI strategy within the next 12 to 18 months. In other words, the market is moving from experimentation toward redesign.
The challenge is that redesign is harder than adoption. It is relatively easy to add an LLM to a task. It is much harder to build a repeatable operating layer around that model and keep the workflow trustworthy over time.
Numbers that matter
- Content: cost per post moved from roughly $600 to roughly $120, and monthly output increased from 4 posts to 8-12.
- Sales: the system supports 200+ lead interactions weekly and shifts human time away from repetitive email production toward qualified calls and follow-up.
- Research: competitive scan time moved from about 2 days to about 2 hours, with the agent reviewing 15+ documents before human synthesis.
- Finance: invoice matching tracked over the last six months reached 247 matches, while reconciliation time fell from about 4 hours per week to about 45 minutes for exceptions.
- Documentation: handoff preparation moved from about 4 hours to about 30 minutes, with less project memory loss between build, delivery, and follow-through.
What surprised us
Consistency was the biggest win. Humans have bad days. Agents do not get tired, distracted, or demoralized by repetitive work. The content agent produces the same level of draft quality on Monday as it does on Friday. The sales agent does not lose energy after a long sequence of rejections. That sounds boring, but operationally it compounds.
Agents also surfaced patterns we might have missed manually. In one research cycle, the system flagged that 8 of 15 target competitors had shifted toward more vertical-specific positioning. That changed how we framed the opportunity for the client.
What did not work
A fully autonomous support chatbot was not a good fit for us. It handled routine inquiries well enough, but broke down on the cases where context, nuance, or judgment mattered most. The lesson was not that agents fail. The lesson was that not every workflow should be pushed toward the same level of autonomy.
We also underestimated prompt design early on. The first versions of the content agent were technically correct and editorially dead. It took eight rounds of system-prompt refinement to get something that sounded closer to us. The model was not the whole problem. The operating logic around it mattered just as much.
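To show what we mean by operating logic, here is a minimal sketch of the loop around the model: a draft gets checked against an editorial rubric and regenerated with concrete feedback before a human editor ever sees it. The rubric checks and the generate() stand-in are illustrative assumptions, not our actual pipeline.

```python
# Sketch of the operating logic around the model, not the model itself.
# The rubric checks and the generate() stand-in are illustrative
# assumptions; a real editorial rubric is longer and more specific.

def rubric_problems(draft: str) -> list[str]:
    problems = []
    if len(draft.split()) < 900:
        problems.append("too short for a full post")
    if "in today's fast-paced world" in draft.lower():
        problems.append("stock filler phrasing")
    return problems

def produce_draft(generate, brief: str, max_attempts: int = 3) -> str:
    # generate() stands in for whatever model call the pipeline wraps.
    draft = generate(brief)
    for _ in range(max_attempts - 1):
        problems = rubric_problems(draft)
        if not problems:
            break
        # Feed the concrete failures back instead of regenerating blind.
        draft = generate(brief + "\n\nFix these issues: " + "; ".join(problems))
    return draft  # still goes to a human editor, never straight to publish
```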
Human review is part of the system
We do not run agents without human review loops where accuracy matters. That is not caution for its own sake. McKinsey's November 5, 2025 survey found that 51% of organizations using AI reported at least one negative consequence in the past year, and nearly one-third reported consequences tied to AI inaccuracy. The more operationally important the workflow is, the more review design matters.
That is why we think of agents as force multipliers, not permission to remove judgment from the process. Good internal systems reduce repetitive work. They do not erase accountability.
The practical takeaway
The point of running internal agents is not to say every business should automate everything. It is to prove, in our own operations, which kinds of workflows create real leverage and which ones still depend on human judgment.
That proof matters more than theory. It shapes how we design systems for clients, how we scope automation safely, and how we decide where autonomy is actually useful.
If you are trying to figure out which internal workflow is worth automating first, look for the same traits we did: clear structure, high repetition, and measurable drag.