autonomous AI agents that discover, enrich, and close B2B leads
five specialized AI agents work autonomously to find companies, enrich profiles, discover decision-maker contacts, and craft personalized outreach -- end to end, without human intervention. your agents work 24/7 so you don't have to.
- pages discovered: 50,000+ (autonomous crawl agents, RL + UCB1)
- leads qualified: 300+ (multi-agent enrichment pipeline)
- contact accuracy: 92% (AI-verified email + LinkedIn)
- agent uptime: 24x7 (fully autonomous, zero manual work)
from raw web pages to scored B2B leads with generated reports. seven modules, zero cloud dependencies.
system overview
SQLite WAL + LanceDB HNSW + ChromaDB hybrid storage in a ~15 GB footprint
rl crawler
DQN agent with 448-dim state + UCB1 multi-armed bandit explores 820 domains, achieving 3× the baseline harvest rate
ner extraction
BERT-base-cased + spaCy + BERTopic extract entities at 92.3% F1, processing ~100 pages/sec
entity resolution
Siamese 128-dim embeddings + SQLite CTEs deduplicate entities, with ANN queries under 1ms
lead scoring
XGBoost 50% + LogReg 25% + RF 25% ensemble scores leads with 89.7% precision
report generation
Ollama + SQLite/ChromaDB RAG generates reports with 97% factual accuracy in 10-30s
evaluation
SHAP explanations + cascade error tracking monitor pipeline health (CER ~0.15)
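to make the entity resolution step concrete, here is a minimal sketch of how a recursive SQLite CTE could fold pairwise duplicate matches into clusters. it is an illustration under assumptions, not the pipeline's actual schema: the `entity` and `dup_pair` tables, the sample rows, and the rule "the lowest id in a cluster is canonical" are all hypothetical. in a real run the pairs would come from the Siamese embedding search; here they are hard-coded.

```python
import sqlite3

# In-memory database with a hypothetical schema (not the project's real one).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Pairs judged to be the same company, e.g. by embedding distance.
    CREATE TABLE dup_pair (a INTEGER NOT NULL, b INTEGER NOT NULL);
    INSERT INTO entity VALUES
        (1, 'Acme Inc'), (2, 'ACME Incorporated'), (3, 'Acme, Inc.'),
        (4, 'Globex'), (5, 'Globex Corp');
    INSERT INTO dup_pair VALUES (1, 2), (2, 3), (4, 5);
""")

CLUSTER_SQL = """
WITH RECURSIVE
-- Treat every duplicate pair as an undirected edge.
edges(a, b) AS (
    SELECT a, b FROM dup_pair
    UNION
    SELECT b, a FROM dup_pair
),
-- Everything reachable from each entity; UNION drops repeats,
-- so the recursion terminates even when the graph has cycles.
reach(id, other) AS (
    SELECT id, id FROM entity
    UNION
    SELECT reach.id, edges.b
    FROM reach JOIN edges ON edges.a = reach.other
)
-- Assumed convention: the smallest reachable id is the canonical record.
SELECT id, MIN(other) AS canonical_id
FROM reach
GROUP BY id
ORDER BY id
"""

canonical = dict(conn.execute(CLUSTER_SQL).fetchall())
print(canonical)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

entities 1 and 3 are never matched directly, yet they land in the same cluster through entity 2. that transitive closure is what the recursive CTE adds over a plain join.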
every number measured, every claim paper-backed. see BENCHMARKS.md for methodology.
- pages to leads: 300 (50K pages → 300 leads, a 99.4% reduction)
- harvest rate: 15% (3× baseline via RL crawler)
- NER F1 score: 92% (BERT-base + spaCy extraction)
- ANN latency: 1ms (Siamese 128-dim entity resolution)
- scoring precision: 89.7% (86.5% recall)
- factual accuracy: 97% (RAG report generation via Ollama)
- per-lead latency: 182ms (end-to-end, without LLM step)
- annual cost: $1,500 (local, vs $13,200 cloud)
built by vadim nicolai — an AI engineer who got tired of paying $10K+/year for cloud CRMs that don't understand his ICP. agentic lead gen deploys autonomous AI agents that crawl, extract, score, and enrich B2B prospects end-to-end — no manual steps, no babysitting, just agents working 24/7 on your pipeline.
why local-first B2B lead gen
cloud CRMs are optimized for their margins, not your pipeline. Lead-gen reverses that — it works on your hardware.
reinforcement learning finds what keyword crawlers miss
a DQN with a 448-dimensional state space and a UCB1 multi-armed bandit learn which domains yield the best leads, reaching 3× the harvest rate of baseline random crawling.
448-dim state encodes page structure, link density, and domain history
UCB1 bandit balances exploration vs exploitation across 820 domains
you get 3× more relevant pages per crawl cycle, automatically
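the exploration vs exploitation trade-off above can be sketched with a textbook UCB1 bandit. this is a toy illustration, not the crawler's code: the three `.example` domains, their hit rates, the binary reward (1 when a fetched page is relevant), and the 2,000-round simulation are all assumptions, and the DQN side is left out entirely.

```python
import math
import random


class UCB1:
    """Textbook UCB1: pick the arm with the highest mean reward plus bonus."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}    # pulls per arm
        self.totals = {arm: 0.0 for arm in arms}  # summed reward per arm
        self.t = 0                                # total pulls so far

    def select(self):
        # Try every arm once before trusting any estimate.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        return max(self.counts, key=self._upper_bound)

    def _upper_bound(self, arm):
        n = self.counts[arm]
        mean = self.totals[arm] / n
        # Exploration bonus shrinks as an arm is pulled more often.
        return mean + math.sqrt(2.0 * math.log(self.t) / n)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical domains with made-up probabilities of yielding a relevant page.
HIT_RATES = {
    "domain-a.example": 0.05,
    "domain-b.example": 0.15,
    "domain-c.example": 0.40,
}

rng = random.Random(0)
bandit = UCB1(HIT_RATES)
for _ in range(2000):
    domain = bandit.select()
    reward = 1.0 if rng.random() < HIT_RATES[domain] else 0.0
    bandit.update(domain, reward)

most_crawled = max(bandit.counts, key=bandit.counts.get)
print(most_crawled, bandit.counts)
```

in this simulation the bandit spends most of its budget on the highest-yield domain while still revisiting the others occasionally, which is the behaviour the bullets describe.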
ML ensemble, not a single model
XGBoost handles 50% of scoring weight, logistic regression 25%, random forest 25%. each model catches what the others miss — 89.7% precision, 86.5% recall.
ensemble outperforms any single model by 4-7% on precision-recall AUC
SHAP explanations show why each lead scored high or low
conformal prediction gives calibrated confidence intervals on every score
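one plausible reading of the 50/25/25 split is a weighted average of each model's predicted probability (soft voting). the sketch below assumes that reading. the model keys, the example probabilities, and the 0.8 qualification threshold are hypothetical, and the SHAP and conformal prediction steps are not shown.

```python
# Weights taken from the text: XGBoost 50%, logistic regression 25%, random forest 25%.
WEIGHTS = {"xgboost": 0.50, "logreg": 0.25, "random_forest": 0.25}


def ensemble_score(probs: dict[str, float]) -> float:
    """Weighted average of per-model lead probabilities (assumed soft voting)."""
    if set(probs) != set(WEIGHTS):
        raise ValueError(f"expected probabilities for {sorted(WEIGHTS)}")
    return sum(WEIGHTS[name] * probs[name] for name in WEIGHTS)


def is_qualified(score: float, threshold: float = 0.8) -> bool:
    """Hypothetical cut-off; the real threshold is not stated in the text."""
    return score >= threshold


# Made-up per-model outputs for a single lead.
lead_probs = {"xgboost": 0.9, "logreg": 0.7, "random_forest": 0.8}
score = ensemble_score(lead_probs)
print(round(score, 3), is_qualified(score))  # 0.825 True
```

because XGBoost carries half the weight, a lead it rates highly can still qualify when the two lighter models are lukewarm, but it cannot outvote both of them on its own.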
your data never leaves your machine
SQLite graph + LanceDB vectors + ChromaDB embeddings — all local. no API calls to score leads. $1,500/year total cost vs $5,400-13,200 for cloud alternatives.
~15 GB footprint for the entire pipeline with all indexes
182ms per-lead end-to-end latency without LLM generation
72-89% cost savings: commodity hardware ($1,500/year) vs cloud CRM subscriptions ($5,400-13,200/year)
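the savings range follows directly from the annual costs quoted above. a quick check, using only those three figures (the function name is just for illustration):

```python
LOCAL_COST = 1_500                    # $/year, local pipeline
CLOUD_LOW, CLOUD_HIGH = 5_400, 13_200  # $/year, cloud alternatives


def savings_pct(local: float, cloud: float) -> float:
    """Percentage saved by paying `local` instead of `cloud`."""
    return 100.0 * (1.0 - local / cloud)


low = savings_pct(LOCAL_COST, CLOUD_LOW)
high = savings_pct(LOCAL_COST, CLOUD_HIGH)
print(f"{low:.1f}% to {high:.1f}%")  # 72.2% to 88.6%
```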
ready to deploy your own pipeline?
300 qualified leads per crawl cycle. fully local. backed by 35 cited papers.
architecture
storage
hybrid graph + vector + document store
ML / RL
RL crawling + ensemble scoring
generation
local LLM report generation
evaluation
cascade error tracking + drift detection
stop managing pipelines. let agents do it.
autonomous agents discover, enrich, score, and deliver 300+ qualified B2B leads per cycle. fully local. $1,500/year total cost.
one email per month. new agents, benchmarks, and autonomy upgrades.