Agentic Lead Gen
open source
discovery agent: scanning 820 domains
35 cited papers

autonomous AI agents that
discover, enrich, and close B2B leads

five specialized AI agents work autonomously to find companies, enrich profiles, discover decision-maker contacts, and craft personalized outreach -- end to end, without human intervention. your agents work 24/7 so you don't have to.

agents active -- last run: today
50,000+
pages discovered
autonomous crawl agents (RL + UCB1)
300+
leads qualified
multi-agent enrichment pipeline
92%
contact accuracy
AI-verified email + LinkedIn
24x7
agent uptime
fully autonomous, zero manual work
820 domains discovered → 4,200 companies enriched → 1,100 contacts verified → 300 personalized outreach campaigns
pipeline modules

from raw web pages to scored B2B leads with generated reports. seven modules, zero cloud dependencies.

00
orchestrate

system overview

SQLite WAL + LanceDB HNSW + ChromaDB hybrid storage in ~15 GB footprint

01
crawl

rl crawler

DQN agent with 448-dim state + UCB1 multi-armed bandit explores 820 domains, achieving 3× harvest rate

02
extract

ner extraction

BERT-base-cased + spaCy + BERTopic extract entities at 92.3% F1, processing ~100 pages/sec

03
resolve

entity resolution

Siamese 128-dim embeddings with SQLite CTEs deduplicate entities, with ANN queries answered in <1ms

04
score

lead scoring

XGBoost 50% + LogReg 25% + RF 25% ensemble scores leads with 89.7% precision

05
report

report generation

Ollama + SQLite/ChromaDB RAG generates reports with 97% factual accuracy in 10-30s

06
evaluate

evaluation

SHAP explanations + cascade error tracking monitor pipeline health (CER ~0.15)
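
the resolve step above can be illustrated with a minimal sketch. it assumes each company record already has an embedding (toy 3-dim vectors here instead of the 128-dim Siamese output) and swaps the ANN index for a brute-force cosine comparison, which is fine for a demo but not for a large corpus. the record names, the `dedupe` helper, and the 0.9 threshold are hypothetical, not taken from the project.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dedupe(embeddings, threshold=0.9):
    """Group record ids whose embeddings are near-duplicates (union-find)."""
    ids = list(embeddings)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pairwise comparison; an ANN index would replace this loop.
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records: two spellings of one company plus an unrelated one.
records = {
    "acme-inc": [1.0, 0.0, 0.1],
    "acme-incorporated": [0.98, 0.02, 0.12],
    "globex": [0.0, 1.0, 0.0],
}
clusters = dedupe(records)
print(clusters)
```

the two acme records land in one cluster and globex stays on its own. union-find keeps the grouping transitive, so a chain of near-duplicates collapses into a single entity.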

pipeline benchmarks

every number measured, every claim paper-backed. see BENCHMARKS.md for methodology.

300
pages to leads
50K pages → 300 leads (99.4% reduction)
15%
harvest rate
3× baseline via RL crawler
92%
NER F1 score
BERT-base + spaCy extraction
<1ms
ANN latency
siamese 128-dim entity resolution
89%
scoring precision
89.7% precision / 86.5% recall
97%
factual accuracy
RAG report generation via ollama
182ms
per-lead latency
end-to-end without LLM step
$1,500
annual cost
$1,500 local vs $13,200 cloud
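
two of the figures above can be checked with plain arithmetic. this is a small worked example, not the project's benchmark code: the reduction follows directly from 50K pages and 300 leads, and the F1 value is derived here from the quoted precision and recall (the page itself does not state a scoring F1).

```python
# Funnel reduction: share of crawled pages that do not become leads.
pages, leads = 50_000, 300
reduction = 1 - leads / pages

# F1 is the harmonic mean of precision and recall (derived, not quoted).
precision, recall = 0.897, 0.865
f1 = 2 * precision * recall / (precision + recall)

print(f"reduction: {reduction:.1%}")
print(f"scoring F1: {f1:.3f}")
```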

built by vadim nicolai — an AI engineer who got tired of paying $10K+/year for cloud CRMs that don't understand his ICP. agentic lead gen deploys autonomous AI agents that crawl, extract, score, and enrich B2B prospects end-to-end — no manual steps, no babysitting, just agents working 24/7 on your pipeline.

backed by 35 cited papers since 2023
view source

why local-first B2B lead gen

cloud CRMs are optimized for their margins, not your pipeline. agentic lead gen reverses that: it runs on your hardware.

reinforcement learning finds what keyword crawlers miss

DQN with 448-dimensional state space and UCB1 multi-armed bandit learns which domains yield the best leads. 3× harvest rate over baseline random crawling.

448-dim state encodes page structure, link density, and domain history

UCB1 bandit balances exploration vs exploitation across 820 domains

you get 3× more relevant pages per crawl cycle, automatically
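
the explore/exploit trade-off above can be sketched with the textbook UCB1 rule: pick the domain with the highest mean reward plus a `sqrt(2 ln t / n)` bonus. this is a minimal illustration, not the project's crawler: the domain names, the simulated relevance rates, and the 0/1 reward are assumptions, and the DQN part is left out entirely.

```python
import math
import random


class UCB1DomainBandit:
    """Textbook UCB1 over a fixed set of domains (arms)."""

    def __init__(self, domains):
        self.counts = {d: 0 for d in domains}   # pulls per domain
        self.means = {d: 0.0 for d in domains}  # running mean reward
        self.total = 0                          # total pulls so far

    def select(self):
        # Try every domain once before applying the UCB1 formula.
        for domain, n in self.counts.items():
            if n == 0:
                return domain
        return max(
            self.counts,
            key=lambda d: self.means[d]
            + math.sqrt(2 * math.log(self.total) / self.counts[d]),
        )

    def update(self, domain, reward):
        self.total += 1
        self.counts[domain] += 1
        # Incremental mean update.
        self.means[domain] += (reward - self.means[domain]) / self.counts[domain]


# Hypothetical probability that a fetched page from each domain is relevant.
true_rates = {"a.example": 0.05, "b.example": 0.50, "c.example": 0.10}

rng = random.Random(0)
bandit = UCB1DomainBandit(true_rates)
for _ in range(3000):
    domain = bandit.select()
    reward = 1.0 if rng.random() < true_rates[domain] else 0.0
    bandit.update(domain, reward)

print(bandit.counts)
```

in this simulation most pulls go to the domain with the highest relevance rate, while the weaker domains still get sampled occasionally because their bonus term grows as they go unvisited.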

ML ensemble, not a single model

XGBoost handles 50% of scoring weight, logistic regression 25%, random forest 25%. each model catches what the others miss — 89.7% precision, 86.5% recall.

ensemble outperforms any single model by 4-7% on precision-recall AUC

SHAP explanations show why each lead scored high or low

conformal prediction gives calibrated confidence intervals on every score
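
the 50/25/25 weighting above amounts to a weighted average of per-model probabilities. a minimal sketch, assuming each model has already produced a probability that the lead is qualified; the `ensemble_score` helper and the example probabilities are hypothetical, and no models are trained here.

```python
# Weights as described on this page: XGBoost 50%, LogReg 25%, RF 25%.
WEIGHTS = {"xgboost": 0.50, "logreg": 0.25, "random_forest": 0.25}


def ensemble_score(probabilities, weights=WEIGHTS):
    """Weighted average of per-model probabilities for one lead."""
    if set(probabilities) != set(weights):
        raise ValueError("expected one probability per weighted model")
    return sum(weights[name] * probabilities[name] for name in weights)


# Hypothetical per-model outputs for a single lead.
lead = {"xgboost": 0.92, "logreg": 0.80, "random_forest": 0.88}
score = ensemble_score(lead)
print(round(score, 4))  # 0.5*0.92 + 0.25*0.80 + 0.25*0.88 = 0.88
```

because the weights sum to 1, the combined score stays in the same 0 to 1 range as the inputs.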

your data never leaves your machine

SQLite graph + LanceDB vectors + ChromaDB embeddings — all local. no API calls to score leads. $1,500/year total cost vs $5,400-13,200 for cloud alternatives.

~15 GB footprint for the entire pipeline with all indexes

182ms per-lead end-to-end latency without LLM generation

72-89% cost savings ($1,500/year vs $5,400-13,200/year): commodity hardware vs cloud CRM subscriptions
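
to make the local storage idea concrete, here is a minimal sketch using only Python's built-in `sqlite3` module: it opens a file-backed database in WAL mode and walks a small company graph with a recursive CTE. the `edges` table, its columns, and the company names are hypothetical and do not reflect the project's actual schema; LanceDB and ChromaDB are not shown.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; in-memory databases do not support it.
db_path = os.path.join(tempfile.mkdtemp(), "leads.db")
conn = sqlite3.connect(db_path)
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical graph: parent company -> related entity.
conn.executescript(
    """
    CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO edges VALUES
        ('acme', 'acme-labs'),
        ('acme-labs', 'acme-robotics'),
        ('globex', 'globex-eu');
    """
)

# Recursive CTE: everything reachable from a starting company.
rows = conn.execute(
    """
    WITH RECURSIVE reachable(name) AS (
        SELECT ?
        UNION
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.name
    )
    SELECT name FROM reachable ORDER BY name
    """,
    ("acme",),
).fetchall()
related = [name for (name,) in rows]
conn.close()

print(journal_mode, related)
```

WAL mode lets readers keep querying while a writer appends, which suits a pipeline where crawling and scoring run at the same time. `UNION` (rather than `UNION ALL`) stops the traversal from looping if the graph contains cycles.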

ready to deploy your own pipeline?

300 qualified leads per crawl cycle. fully local. backed by 35 cited papers.

architecture

storage

hybrid graph + vector + document store

sqlite wal, lancedb hnsw, chromadb

ML / RL

RL crawling + ensemble scoring

dqn, ucb1, xgboost, bert ner, siamese

generation

local LLM report generation

ollama, rag, bertopic

evaluation

cascade error tracking + drift detection

shap, evidently
fully open source — fork it, self-host it, extend it for your ICP

stop managing pipelines. let agents do it.

autonomous agents discover, enrich, score, and deliver 300+ qualified B2B leads per cycle. fully local. $1,500/year total cost.

not ready yet? get agentic pipeline updates

one email per month. new agents, benchmarks, and autonomy upgrades.

Agentic Lead Gen

agentic B2B lead generation platform. AI agents handle discovery, enrichment, scoring, and outreach — end to end.

source code

built by one person who got tired of paying cloud CRMs $10K/year to own their own leads. this is not a startup. there is no pricing page. it is a pipeline that does one thing well: generate qualified B2B leads on commodity hardware.
