1

The problem

Research is upstream of every outbound email and it's the part that doesn't scale: matching a fuzzy ICP, finding the right person, and grounding the message in something real. The email is the cheap step; the homework is the cost. Off-the-shelf tools — Apollo, Clay — couldn't run the kind of multi-step research I needed, and they couldn't be tuned to my ICP. Apollo is a contact database with enrichment templates; Clay is a spreadsheet of HTTP calls and prompt cells. Neither chains a Common Crawl scan into a pricing-page fetch into a Pydantic-validated verdict, then routes low-confidence rows to a deterministic fallback. And fuzzy criteria like “compliance-heavy SaaS doing annual audits” don't fit a templated job-title filter.

2

What this does

Five LangGraph stages (discover, enrich, contacts, qa, outreach) turn an ICP description into verified, personalized first-touch emails. Discovery merges results from the Common Crawl CDX index and Brave Search, then passes them through a slug-dedup gate so each company enters the pipeline once. Enrichment is one DeepSeek V4-Pro thinking-mode call per company at temperature 0.2, returning a JSON verdict validated against a Pydantic model. The whole graph runs on Cloudflare Containers with AsyncPostgresSaver-checkpointed threads on Neon, so every state is inspectable, resumable, and auditable.
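A slug-dedup gate of the kind described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: `company_slug`, `dedup_candidates`, and the host-based normalization rule are all assumptions, since the source does not say how slugs are derived.

```python
import re
from urllib.parse import urlparse


def company_slug(url: str) -> str:
    """Reduce a URL or bare hostname to a coarse dedup key.

    Assumed rule (not from the source): lowercase host, drop a leading
    "www.", and replace every non-alphanumeric run with a hyphen.
    """
    # urlparse only fills in the host when a scheme or "//" is present.
    parsed = urlparse(url if "://" in url else f"//{url}")
    host = (parsed.hostname or "").removeprefix("www.")
    return re.sub(r"[^a-z0-9]+", "-", host).strip("-")


def dedup_candidates(*sources):
    """Merge several discovery feeds, keeping the first URL seen per slug."""
    seen = {}
    for source in sources:
        for url in source:
            slug = company_slug(url)
            if slug and slug not in seen:
                seen[slug] = url
    return seen


# Hypothetical discovery output from two feeds.
common_crawl_hits = ["https://www.acme.io/pricing", "http://beta.dev"]
brave_hits = ["acme.io", "https://Gamma.com/about"]
unique = dedup_candidates(common_crawl_hits, brave_hits)
```

Keying on the host means `https://www.acme.io/pricing` and `acme.io` collapse into one candidate, so the per-company enrichment call is not paid twice.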

3

Why it's interesting

Three gates carry the quality. Schema-constrained JSON: DeepSeek's verdict is parsed against a Pydantic model, and a malformed reply falls through to a deterministic heuristic, so no half-parsed industry tag ever reaches scoring. Confidence floor: classifications scoring below 0.4 are discarded in favor of the heuristic, and re-classification is skipped only when the existing record is both confident (≥ 0.6) and fresh (< 30 days), so stale or shaky verdicts don't propagate. Plan-approval gate: outreach_queue calls interrupt() and waits for a resume carrying explicit approved_ids, so no draft is sent without a human in the loop.
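The first two gates can be sketched as follows. The thresholds (0.4, 0.6, 30 days) come from the text; everything else is an assumption made for illustration. The sketch uses stdlib validation as a stand-in for the Pydantic model, and the `Verdict` fields, the keyword heuristic, and all function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

CONFIDENCE_FLOOR = 0.4           # below this, the LLM verdict is discarded
SKIP_CONFIDENCE = 0.6            # existing record must be at least this confident
FRESHNESS = timedelta(days=30)   # ...and younger than this to skip a re-run


@dataclass
class Verdict:
    industry: str
    confidence: float
    source: str  # "llm" or "heuristic"


def heuristic_verdict(company_text: str) -> Verdict:
    """Hypothetical keyword fallback; the real heuristic is not described."""
    industry = "compliance-saas" if "audit" in company_text.lower() else "unknown"
    return Verdict(industry, 0.5, "heuristic")  # 0.5 is an arbitrary placeholder


def parse_verdict(raw_reply: str) -> Optional[Verdict]:
    """Return a verdict only if the reply is fully well-formed, else None."""
    try:
        data = json.loads(raw_reply)
        industry = data["industry"]
        confidence = data["confidence"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if not isinstance(industry, str) or not industry:
        return None
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return Verdict(industry, float(confidence), "llm")


def classify(raw_reply: str, company_text: str) -> Verdict:
    """Gates 1 and 2: malformed or low-confidence replies use the heuristic."""
    verdict = parse_verdict(raw_reply)
    if verdict is None or verdict.confidence < CONFIDENCE_FLOOR:
        return heuristic_verdict(company_text)
    return verdict


def should_skip_reclassification(confidence: float, classified_at: datetime,
                                 now: datetime) -> bool:
    """Skip only when the stored record is both confident and fresh."""
    return confidence >= SKIP_CONFIDENCE and (now - classified_at) < FRESHNESS


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
good = classify('{"industry": "fintech", "confidence": 0.9}', "payments API")
shaky = classify('{"industry": "fintech", "confidence": 0.3}', "annual audit tooling")
broken = classify('{"industry": "fin', "no keywords here")
```

Parsing is all-or-nothing: `parse_verdict` never returns a partially filled record, which is the property the schema gate relies on.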

4

By the numbers

69 LangGraph graphs in production and eight external integrations live: Resend (transactional sends + Svix-verified reply webhooks), NeverBounce (SMTP RCPT-TO verification), LinkedIn Voyager + GitHub GraphQL (contact signals), Common Crawl (discovery seed), Ashby + Greenhouse (ATS job postings), and Cloudflare Containers + R2 (compute substrate). Each one has a webhook signature check or a rate limiter on the read side. Same graph code in dev and prod — `langgraph dev` on :8002 locally, `AsyncPostgresSaver`-checkpointed threads on Neon when it ships.
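For the read-side rate limiting mentioned above, a token bucket is one common shape. The source does not say which algorithm or limits each integration uses, so the class below, its parameters, and the injected clock are assumptions for illustration only.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock              # injectable so behavior can be checked offline
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Deterministic demo with a fake clock: 2 requests/second, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2.0, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]   # third call exceeds the burst
fake_now[0] = 0.5                                  # half a second refills one token
after_wait = bucket.try_acquire()
```

A caller that gets `False` can sleep and retry or defer the fetch; in production the default `time.monotonic` clock would replace the fake one.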

At a glance
5 stages · 30d freshness gate · Sonnet 4.6 cache: 10% read · NeverBounce verified