
Waterfall Enrichment + Visitor Identity: Full Signal Stack

Single-provider enrichment covers 50-70%. Waterfall hits 85-95%. Add visitor ID as Layer 0 to go from anonymous traffic to fully enriched leads.

Elene Marjanidze · 10 min read

Your enrichment provider says it covers 70% of your contacts. So you feed it a list, run the waterfall, and sure enough - most records come back filled. Emails, phone numbers, firmographics. Feels productive.

But step back for a second.

Where did that list come from? Form fills? A scraped LinkedIn export? A conference badge scan from six months ago? Whatever the source, you’re enriching contacts you already know about. The 97% of website visitors who never filled out a form? Your enrichment stack has no idea they exist.

That’s the gap. And it’s enormous.

Single-provider enrichment covers 50-70% of records you give it. Waterfall enrichment - cascading through multiple providers - pushes that to 85-95%. Impressive numbers. But both approaches share the same blind spot: they only work on known contacts.

What about the thousands of anonymous visitors hitting your pricing page, reading your case studies, and bouncing every single day? Those are your highest-intent prospects, and your enrichment stack can’t touch them because it doesn’t know they’re there.

This guide shows you how to fix that by building the complete signal stack - from anonymous visitor to fully enriched, scored, researched lead - using visitor identification as the foundation layer.


Table of Contents

  1. Why Single-Provider Enrichment Fails
  2. The Waterfall Model Explained
  3. The Missing Layer: Visitor Identification as Layer 0
  4. The Complete Signal Stack
  5. Layer-by-Layer Breakdown
  6. Coverage Improvements: The Math
  7. Cost Per Fully Enriched Lead
  8. Implementation Options
  9. Pro Tips for Credit Efficiency
  10. FAQ

Why Single-Provider Enrichment Fails

No single data provider has everything. Each one maintains a different database, collects data through different methods, and covers different segments of the market. The result: predictable coverage gaps.

Here’s a rough picture of how the major providers stack up:

| Provider | Strong On | Weak On | Typical Fill Rate |
|---|---|---|---|
| Apollo | Email addresses, B2B contacts | Phone numbers, SMB coverage | 55-65% |
| Lusha | Direct dials, phone numbers | Smaller overall database | 50-60% |
| Clearbit (now Breeze) | Company data, firmographics | Person-level contact info | 45-55% |
| ZoomInfo | Enterprise contacts, org charts | SMB, international | 60-70% |
| PeopleDataLabs | Breadth, developer-friendly API | Data freshness, accuracy | 50-60% |

See the pattern? Apollo might nail 60% of your email lookups but whiff on half the phone numbers. Lusha fills in the phones but has a smaller total database. Clearbit gives you beautiful company profiles but won’t reliably produce direct emails for the people at those companies.

When you rely on a single provider, you’re accepting a 30-50% gap across your records. That’s not a rounding error. That’s up to half your pipeline walking around with missing phone numbers, outdated titles, or no email at all.

And the data decay problem makes it worse. People change jobs. Companies get acquired. Phone numbers go stale. Even a provider with 70% initial coverage starts degrading the moment you pull the data.

The industry figured out a solution: don’t pick one provider. Pick all of them.


The Waterfall Model Explained

Waterfall enrichment is simple in concept. Instead of querying one provider and accepting whatever comes back, you query Provider A first. For any fields that come back empty, you query Provider B. Still missing data? Try Provider C. And so on.

Record: jane@acme.com
        │
        ▼
   ┌──────────┐
   │ Apollo   │ → Found email ✓, phone ✗, title ✗
   └────┬─────┘
        │ (phone + title still missing)
        ▼
   ┌──────────┐
   │ Lusha    │ → Found phone ✓, title ✗
   └────┬─────┘
        │ (title still missing)
        ▼
   ┌──────────┐
   │ PDL      │ → Found title ✓
   └──────────┘

Result: email ✓  phone ✓  title ✓  (all three fields filled)
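The cascade above can be sketched in Python. This is a minimal illustration under stated assumptions, not any vendor’s API: `apollo_stub`, `lusha_stub`, and `pdl_stub` are hypothetical stand-ins for real provider calls, and the values they return are made up. The point is the control flow: each provider is asked only for the fields that are still empty, and the loop stops as soon as nothing is missing.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
# A provider takes the lookup key plus the fields still missing, and returns what it found.
Provider = Callable[[str, List[str]], Dict[str, str]]


# Hypothetical stubs standing in for Apollo, Lusha, and PDL API calls.
def apollo_stub(email: str, missing: List[str]) -> Dict[str, str]:
    return {"email": email}


def lusha_stub(email: str, missing: List[str]) -> Dict[str, str]:
    return {"phone": "+1-555-0100"}


def pdl_stub(email: str, missing: List[str]) -> Dict[str, str]:
    return {"title": "VP Marketing"}


def waterfall(email: str, wanted: List[str], providers: List[Provider]) -> Record:
    """Query providers in order, asking each only for fields still missing."""
    record: Record = {name: None for name in wanted}
    for provider in providers:
        missing = [name for name in wanted if record[name] is None]
        if not missing:
            break  # every field is filled, so skip the remaining paid lookups
        found = provider(email, missing)
        for name in missing:
            if found.get(name):
                record[name] = found[name]  # only empty fields are written
    return record


print(waterfall("jane@acme.com", ["email", "phone", "title"],
                [apollo_stub, lusha_stub, pdl_stub]))
```

Provider order is a design choice: putting the cheapest provider with acceptable coverage first tends to spend the fewest credits, because later providers only see the leftovers.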

The math stacks up fast:

  • Provider A alone: 60% fill rate
  • Provider A + B: B fills half of the 40% that A missed (+20 points) = ~80% fill rate
  • Provider A + B + C: C fills half of the 20% still missing (+10 points) = ~90% fill rate
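The bullet math follows a simple rule: if each provider closes a fixed share of the gap left by the providers before it, the cumulative fill rate is one minus the product of the remaining gaps. A minimal sketch, assuming those shares are independent. In practice, contacts one provider misses are likely harder for the next one too, so real gains may taper faster.

```python
from math import prod
from typing import Sequence


def cumulative_fill(gap_share_closed: Sequence[float]) -> float:
    """Fill rate after a waterfall.

    Each entry is the share of the *remaining* gap that provider closes;
    the first entry is simply the first provider's fill rate.
    """
    remaining_gap = prod(1 - share for share in gap_share_closed)
    return 1 - remaining_gap


# Provider A fills 60%; B and C each close half of whatever is still missing.
shares = [0.60, 0.50, 0.50]
for n in range(1, len(shares) + 1):
    print(f"{n} provider(s): {cumulative_fill(shares[:n]):.0%}")
```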

This is why tools like Clay have exploded in popularity. Clay automates the waterfall across 150+ data providers so you don’t have to manually orchestrate API calls. You set up the cascade logic once, and every record gets run through the chain automatically.

The results are real. OpenAI reportedly used Clay’s waterfall model to double their enrichment rates from 40% to 80%. Across Clay’s customer base, teams regularly hit 85-95% fill rates when waterfalling through three or more providers.

But there’s a catch that almost nobody talks about.

Waterfall enrichment makes your known contacts more complete. It does nothing for contacts you don’t know about.

You still need a starting input. A name. An email. A domain. The waterfall enriches what you feed it. If your input is a list of 300 form fills from last month on a site with 10,000 monthly visitors, you’ll get 270 beautifully enriched records. Meanwhile, the other 9,700 anonymous visitors who browsed your site and left? The waterfall never saw them.

That’s where the entire model breaks.


The Missing Layer: Visitor Identification as Layer 0

Waterfall enrichment takes known contacts and makes them more complete. But the real leverage is in what happens before the waterfall - identifying who those anonymous visitors are in the first place.

This is what we call Layer 0: Identification.

Think about where your highest-intent buyers actually are right now. They’re not sitting in your CRM. They’re not on your email list. They’re on your website, right now, reading your pricing page, browsing your integrations, comparing you against alternatives. And 97% of them will leave without ever telling you who they are.

Visitor identification solves this by resolving anonymous website sessions into real contact records - person-level data including name, email, company, job title, LinkedIn URL, and behavioral signals like pages visited and session duration.

Here’s why Layer 0 changes the economics of everything downstream:

  • Without Layer 0: Your waterfall only enriches form fills (3% of traffic). You’re waterfalling cold lists.
  • With Layer 0: Your waterfall enriches identified visitors (30-40% of traffic). You’re waterfalling warm, high-intent contacts.

That’s the difference between enriching 300 contacts from form fills vs. 3,000-4,000 contacts from visitor identification. Same traffic. Same waterfall. 10-13x more leads entering the enrichment pipeline.

And because these contacts were actively browsing your site when they were identified, they’re inherently higher intent than any scraped list or purchased database. The enrichment data you add downstream gets applied to people who already showed buying signals.


The Complete Signal Stack

Here’s the full architecture. Seven layers, numbered 0 through 6. Each one transforms the data from the layer above and feeds it to the layer below.

┌─────────────────────────────────────────────────────────┐
│                THE COMPLETE SIGNAL STACK                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Layer 0: IDENTIFY     Leadpipe → Who is this visitor?  │
│           ↓            Name, email, company, pages      │
│                                                         │
│  Layer 1: VALIDATE     ZeroBounce / NeverBounce         │
│           ↓            Is the email deliverable?        │
│                                                         │
│  Layer 2: ENRICH       Apollo → Lusha → PeopleDataLabs  │
│           ↓            Phone, LinkedIn, firmographics   │
│                                                         │
│  Layer 3: INTENT       Leadpipe Orbit API               │
│           ↓            What are they researching?       │
│                                                         │
│  Layer 4: RESEARCH     Claygent / Perplexity            │
│           ↓            Company news, challenges, fit    │
│                                                         │
│  Layer 5: SCORE        ICP formula                      │
│           ↓            Qualified? Priority tier?        │
│                                                         │
│  Layer 6: ACT          CRM / AI SDR / Slack             │
│                        Outreach, sequence, notify       │
└─────────────────────────────────────────────────────────┘

Each layer multiplies the value of the one before it. Identification without enrichment gives you partial records. Enrichment without validation wastes credits on bad emails. Validation without intent scoring treats every lead equally. Intent without research produces generic outreach.

The stack only works when all layers are connected. Miss one, and you lose the compounding effect.

Most teams have some of these layers. Almost nobody has all of them wired together. The good news: the tooling has matured to the point where you can build this entire stack in an afternoon for under $650/month.
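One way to wire the layers together in your own code is to treat each layer as a function that returns the updated lead, or `None` to drop it. The sketch below is hypothetical: `validate`, `enrich`, and `score` are toy stand-ins (the `@` check is not real email validation, and the company size is a placeholder), included only to show how a dropped lead stops consuming credits in every later layer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Lead:
    email: str
    data: dict = field(default_factory=dict)


# Each layer takes a lead and returns it (possibly updated) or None to drop it.
Stage = Callable[[Lead], Optional[Lead]]


def run_stack(lead: Lead, stages: List[Stage]) -> Optional[Lead]:
    """Pass a lead through each layer in order, stopping at the first drop."""
    for stage in stages:
        result = stage(lead)
        if result is None:
            return None  # later (paid) layers never see this lead
        lead = result
    return lead


# Toy stand-ins for Layers 1, 2, and 5; real stages would call external services.
def validate(lead: Lead) -> Optional[Lead]:
    return lead if "@" in lead.email else None  # NOT real validation


def enrich(lead: Lead) -> Optional[Lead]:
    lead.data.setdefault("employees", 250)  # placeholder for waterfall output
    return lead


def score(lead: Lead) -> Optional[Lead]:
    lead.data["tier"] = 1 if lead.data.get("employees", 0) >= 100 else 3
    return lead


stack: List[Stage] = [validate, enrich, score]
print(run_stack(Lead("jane@acme.com"), stack))
print(run_stack(Lead("not-an-email"), stack))
```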


Layer-by-Layer Breakdown

Let’s walk through each layer - what it does, the recommended tool, the cost, and what it adds to your lead record.

Layer 0: Identify

Purpose: Turn anonymous website visitors into known contacts.

A JavaScript pixel on your site resolves anonymous visitors using deterministic matching against a proprietary identity graph. When someone visits your pricing page, you get their name, email (personal and professional), company, job title, LinkedIn URL, and full behavioral data - pages visited, session duration, return visits.

Why it matters: Everything downstream depends on having a contact to work with. Without this layer, you’re limited to the 3% who fill out forms.

Layer 1: Validate

Purpose: Confirm email deliverability before spending enrichment credits.

Tools like ZeroBounce or NeverBounce check whether the identified email addresses are actually deliverable. This filters out invalid and disposable addresses, and flags catch-all domains whose individual mailboxes can’t be confirmed, before you waste money enriching them.

Why it matters: Enriching an invalid email is burning money. Validation typically costs $0.005-0.01 per check and saves you from wasting $0.10-0.50 in enrichment credits on dead addresses. It also protects your sender reputation if you plan to email these contacts.
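The trade-off is easy to check: validating every contact costs the validation fee up front and saves the enrichment fee on the invalid share. A back-of-the-envelope sketch using the per-check and per-enrichment prices quoted above and an assumed 15% invalid rate (it ignores the sender-reputation benefit, which only strengthens the case):

```python
def net_saving_per_contact(invalid_rate: float, enrich_cost: float,
                           validate_cost: float) -> float:
    """Expected dollars saved per contact by validating before enriching.

    Validation is paid on every contact; enrichment is skipped on the invalid share.
    """
    return invalid_rate * enrich_cost - validate_cost


def break_even_invalid_rate(enrich_cost: float, validate_cost: float) -> float:
    """Invalid share above which validation pays for itself."""
    return validate_cost / enrich_cost


for enrich_cost in (0.10, 0.50):
    for validate_cost in (0.005, 0.01):
        saving = net_saving_per_contact(0.15, enrich_cost, validate_cost)
        breakeven = break_even_invalid_rate(enrich_cost, validate_cost)
        print(f"enrich ${enrich_cost:.2f}, validate ${validate_cost:.3f}: "
              f"save ${saving:.3f}/contact, break-even at {breakeven:.0%} invalid")
```

Across the quoted price ranges, validation breaks even once somewhere between 1% and 10% of addresses are invalid.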

Layer 2: Enrich

Purpose: Fill in the gaps - phone numbers, LinkedIn profiles, firmographics, tech stack data.

This is where the waterfall lives. Your identified contact goes through Apollo, then Lusha, then PeopleDataLabs (or whatever provider cascade you’ve configured). Each provider fills in what the previous one missed.

Why it matters: Leadpipe identifies the person and gives you an email. The enrichment waterfall adds the phone number, verifies the LinkedIn profile, appends company size, industry, revenue, tech stack, and job seniority. Your SDR (human or AI) now has a complete picture.

Layer 3: Intent

Purpose: Layer buying intent signals on top of contact data.

Leadpipe’s Orbit API tracks cross-site research behavior and assigns intent scores from 1-100. This tells you not just WHO the visitor is, but WHAT they’re actively researching across the web. Someone with an intent score of 85 for “visitor identification software” is a very different lead than someone who stumbled onto your blog from a random Google search.

Why it matters: Intent scoring lets you prioritize which leads get immediate attention vs. which go into a nurture sequence. Without it, every identified visitor looks the same.

Layer 4: Research

Purpose: Generate personalized context for outreach.

AI research tools like Claygent or Perplexity analyze the enriched company data and produce summaries: recent funding rounds, product launches, hiring patterns, competitive landscape, challenges the company is likely facing. This becomes the raw material for personalized emails and talk tracks.

Why it matters: “Hi Jane, I saw you visited our pricing page” is lazy outreach. “Hi Jane, I noticed Acme Corp just expanded into EMEA and you’re scaling the marketing team - here’s how companies in similar growth stages handle visitor identification” is the kind of message that gets replies.

Layer 5: Score

Purpose: Qualify leads against your ICP and assign priority tiers.

Using the enriched data (company size, industry, revenue, job title) plus intent signals, you score each lead against your ideal customer profile. Tier 1 leads go straight to your AI SDR or sales team. Tier 2 enters a nurture sequence. Tier 3 gets dropped.

Why it matters: Your sales team’s time is finite. Scoring ensures they spend it on the leads most likely to convert - not on every person who happened to visit your site.
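A minimal ICP scoring sketch, assuming a hypothetical ICP (software companies with 50-1,000 employees, director level and above) and made-up weights and tier thresholds. Replace every constant with your own; the structure, firmographic fit plus seniority plus a scaled intent score mapped to three tiers, is the part that carries over.

```python
from dataclasses import dataclass


@dataclass
class EnrichedLead:
    employees: int
    industry: str
    seniority: str  # e.g. "vp", "director", "manager"
    intent: int     # Orbit-style intent score, 1-100


# Hypothetical ICP definition and weights.
TARGET_INDUSTRIES = {"software", "saas"}
SENIORITY_POINTS = {"c_level": 25, "vp": 25, "director": 20, "manager": 10}


def icp_score(lead: EnrichedLead) -> int:
    """Return a 0-100 score: 40 pts firmographics, 25 seniority, 35 intent."""
    points = 0
    if 50 <= lead.employees <= 1000:
        points += 20
    if lead.industry.lower() in TARGET_INDUSTRIES:
        points += 20
    points += SENIORITY_POINTS.get(lead.seniority, 0)
    points += round(35 * lead.intent / 100)
    return points


def tier(score: int) -> int:
    """Map a score to a priority tier (thresholds are assumptions)."""
    if score >= 70:
        return 1  # straight to AI SDR or sales
    if score >= 40:
        return 2  # nurture sequence
    return 3      # dropped


jane = EnrichedLead(employees=250, industry="SaaS", seniority="vp", intent=85)
print(icp_score(jane), tier(icp_score(jane)))
```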

Layer 6: Act

Purpose: Trigger outreach, create CRM records, notify sales.

The scored, enriched, researched lead gets routed to the right destination. That could be a CRM deal, an AI SDR sequence, a Slack alert, or a manual task for your sales team. The routing depends on the lead’s score and tier.
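Routing can be kept as data rather than scattered if/else logic, which makes it easy to change where each tier goes. A sketch with hypothetical action names and a mapping that follows the Layer 5 rules above (Tier 1 to sales or an AI SDR, Tier 2 to nurture, Tier 3 dropped):

```python
from typing import Dict, List

# Hypothetical action names; each would map to a CRM, AI SDR, or Slack call.
ROUTES: Dict[int, List[str]] = {
    1: ["create_crm_deal", "enroll_ai_sdr", "notify_slack"],
    2: ["create_crm_contact", "enroll_nurture"],
    3: [],  # dropped: no CRM record, no outreach
}


def actions_for(tier: int) -> List[str]:
    """Return a copy of the actions for a tier; an unknown tier triggers nothing."""
    return list(ROUTES.get(tier, []))


for tier in (1, 2, 3):
    print(tier, actions_for(tier))
```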

Here’s what each layer adds to the total cost and coverage:

| Layer | Tool | Monthly Cost | What It Adds | Coverage Effect |
|---|---|---|---|---|
| 0: Identify | Leadpipe | $299 | Name, email, company from anonymous traffic | 30-40% of visitors |
| 1: Validate | ZeroBounce | ~$50 | Email deliverability check | Filters to valid emails |
| 2: Enrich | Clay waterfall | $185 | Phone, LinkedIn, firmographics | 85-95% fill on identified |
| 3: Intent | Leadpipe Orbit | Included | Topic research signals, score 1-100 | Adds buying intent layer |
| 4: Research | Claygent | ~$100 | Company analysis, personalization fuel | Adds context for outreach |
| 5: Score | Clay formulas | Included | ICP qualification, priority tiers | Filters to qualified leads |
| Total | | ~$634/mo | Anonymous to fully enriched, scored, researched | |

Coverage Improvements: The Math

This is where the signal stack earns its keep. Let’s run the numbers for a site with 10,000 monthly visitors.

Without Layer 0 (No Visitor Identification)

Your enrichment waterfall only processes form fills:

| Metric | Count |
|---|---|
| Monthly visitors | 10,000 |
| Form fill rate | 3% |
| Form fills (known contacts) | 300 |
| Enrichment fill rate (waterfall) | 85% |
| Fully enriched leads | 255 |
| ICP qualification rate | 20-30% |
| Qualified, enriched leads | 50-80 |

Fifty to eighty qualified leads from 10,000 visitors. That’s a 0.5-0.8% yield on your traffic.

With Layer 0 (Visitor Identification Added)

Now add Leadpipe as Layer 0 before the enrichment waterfall:

| Metric | Count |
|---|---|
| Monthly visitors | 10,000 |
| Leadpipe identification rate | 30-40% |
| Identified visitors | 3,000-4,000 |
| Email validation pass rate | ~85% |
| Valid, identified contacts | 2,550-3,400 |
| Enrichment fill rate (waterfall) | 85% |
| Fully enriched leads | 2,170-2,890 |
| ICP qualification rate | 20-30% |
| Qualified, enriched leads | 430-870 |

That’s roughly 6-17x more qualified leads from the exact same traffic, or 8.5-11x if you hold the ICP qualification rate constant on both sides. No additional ad spend. No new content. No new campaigns. Just a Layer 0 that captures the intent signals your enrichment stack was blind to.

The takeaway: Waterfall enrichment is an optimization. Visitor identification is a category shift. Waterfalling gets you from 60% to 90% fill rates on known contacts. Adding Layer 0 gets you from 300 contacts to 3,000+ contacts. The leverage is in the identification, not the enrichment.

And it compounds. Those 430-870 qualified leads didn’t come from a purchased list or a scraped database. They came from your own website traffic - people who were actively researching your product. The conversion rates downstream are dramatically higher because the intent signal is already baked in.
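Both tables are the same multiplication chain with different first steps. A sketch that reproduces them from the stated rates (before the tables round the results), which also shows where the multiplier comes from:

```python
from math import prod
from typing import Sequence

VISITORS = 10_000


def funnel(start: float, rates: Sequence[float]) -> float:
    """Apply each stage's pass rate to a starting count, in order."""
    return start * prod(rates)


# Without Layer 0: form fill 3% -> waterfall 85% -> ICP 20% (low) or 30% (high).
without_low = funnel(VISITORS, [0.03, 0.85, 0.20])
without_high = funnel(VISITORS, [0.03, 0.85, 0.30])

# With Layer 0: identify 30-40% -> validate 85% -> waterfall 85% -> ICP 20-30%.
with_low = funnel(VISITORS, [0.30, 0.85, 0.85, 0.20])
with_high = funnel(VISITORS, [0.40, 0.85, 0.85, 0.30])

print(f"without Layer 0: {without_low:.1f}-{without_high:.1f} qualified leads")
print(f"with Layer 0:    {with_low:.1f}-{with_high:.1f} qualified leads")
print(f"across ranges:   {with_low / without_high:.1f}x-{with_high / without_low:.1f}x")
# Same ICP rate on both sides: only identification + validation vs. form fills differ.
same_rate = [funnel(VISITORS, [i, 0.85]) / funnel(VISITORS, [0.03]) for i in (0.30, 0.40)]
print(f"same ICP rate:   {same_rate[0]:.1f}x-{same_rate[1]:.1f}x")
```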


Cost Per Fully Enriched Lead

One of the most common questions: how does the cost stack up compared to other approaches?

| Stack Configuration | Monthly Cost | Leads/Month | Cost per Lead |
|---|---|---|---|
| Leadpipe + Clay + Orbit | ~$484 | 500-1,500 | $0.32-0.97 |
| Clay only (from lists) | $185 | Depends on input | $0.50-2.00 |
| ZoomInfo + Clay | $1,400+ | Depends on input | $2.00-5.00 |
| Manual enrichment | $2,000+ (time cost) | 50-100 | $20-40 |

The Leadpipe + Clay combination hits a sweet spot because Leadpipe handles the highest-cost step - identification - at a flat monthly rate. You’re not paying per API call for identity resolution, so your per-lead cost falls as your traffic grows: more visitors means more identifications at the same price, spreading the fixed cost across more leads, at least until you reach your plan’s monthly identification credits.

Compare that to ZoomInfo, where you’re paying $15,000-25,000/year for seat licenses before you even start enriching. Or to manual research, where a junior SDR spending 15 minutes per lead at $25/hour is burning $6.25 per record and still missing half the data.
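The per-lead figures in the cost table are a flat monthly cost divided by lead volume. A quick sketch reproducing the Leadpipe + Clay + Orbit row ($299 + $185, with Orbit included) and the manual-research estimate. It assumes the monthly price stays flat across these volumes, which holds only within your plan’s limits.

```python
def cost_per_lead(monthly_cost: float, leads: float) -> float:
    """Spread a flat monthly cost across the leads it produces."""
    return monthly_cost / leads


core_stack = 299 + 185  # Leadpipe + Clay; Orbit is included
for leads in (500, 1_000, 1_500):
    print(f"{leads} leads/month: ${cost_per_lead(core_stack, leads):.2f} per lead")

# Manual research: 15 minutes per lead at $25/hour.
manual = 25 * 15 / 60
print(f"manual research: ${manual:.2f} per record")
```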

For teams evaluating data providers for AI SDRs, the cost-per-enriched-lead metric is especially critical because AI agents chew through leads at high volume. A $5/lead input cost destroys your unit economics when the AI is processing thousands of contacts per month. RevOps teams that want to feed this enriched data directly into their warehouse or CDP should see our guide, Leadpipe for RevOps: Programmatic Data for Your Stack, for the integration patterns.


Implementation Options

There are three paths to building this stack. Pick the one that matches your team’s technical capacity and existing tools.

Option 1: Clay Waterfall

Best for: Teams already using or evaluating Clay. Most automated path.

Leadpipe webhook → Clay webhook table → Clay waterfall → CRM export

Leadpipe fires a webhook every time it identifies a visitor. Clay receives the webhook into a table, then automatically runs the enrichment waterfall, validation, scoring, and research steps. Qualified leads get pushed to your CRM or AI SDR.

The full setup is covered step-by-step in our Clay waterfall integration guide. If you’re also using HubSpot as your CRM, the Leadpipe + Clay + HubSpot guide covers the end-to-end pipeline.

Setup time: 30-60 minutes.

Option 2: Custom Pipeline

Best for: Engineering teams that want full control. Most flexible.

Leadpipe webhook → Your backend → API calls to enrichment providers → Database → CRM

You receive the Leadpipe webhook in your own backend, then orchestrate API calls to Apollo, Lusha, PeopleDataLabs, or any other provider directly. This gives you complete control over the waterfall logic, deduplication, error handling, and data storage.

This is the path that platforms and agencies take when building visitor identification into their own products via the Leadpipe API. It’s also the right choice if you need to keep all data in your own infrastructure for compliance reasons.

Setup time: 2-5 days depending on complexity.
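For the custom path, the first piece is a webhook receiver. A minimal sketch using only Python’s standard library, with loud assumptions: the payload field names (`email`, `company`, `pages`) are hypothetical, so check the real payload shape in the Leadpipe developer guide, and the in-memory set and list stand in for your database and job queue. It shows two of the chores mentioned above, deduplication and rejecting malformed input, before any enrichment credits are spent.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Set

SEEN_EMAILS: Set[str] = set()  # in production, dedupe against your database
QUEUE: List[Dict] = []         # stand-in for a job queue feeding the waterfall


def handle_payload(body: bytes) -> str:
    """Parse one identified-visitor event and queue it for enrichment.

    Field names are hypothetical; verify them against the actual payload.
    """
    event = json.loads(body)
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    email = str(event.get("email") or "").strip().lower()
    if not email:
        return "skipped: no email"
    if email in SEEN_EMAILS:
        return "skipped: duplicate"
    SEEN_EMAILS.add(email)
    QUEUE.append({"email": email, "company": event.get("company"),
                  "pages": event.get("pages", [])})
    return "queued"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            status, code = handle_payload(self.rfile.read(length)), 200
        except ValueError:  # covers malformed JSON and non-object payloads
            status, code = "bad payload", 400
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(status.encode())


def serve(port: int = 8000) -> None:
    """Block forever, accepting webhook POSTs on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; a separate worker would then pull records off the queue and run them through your provider cascade. The port and storage here are placeholders.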

Option 3: Zapier/Make

Best for: Non-technical teams that want something working today.

Leadpipe webhook → Zapier/Make → Enrichment steps → CRM

Leadpipe’s native integrations work with Zapier and Make out of the box. You can build a Zap that receives identified visitors, runs them through enrichment steps (many providers have native Zapier integrations), and pushes qualified leads to your CRM.

It’s not as powerful or cost-efficient as the Clay-based approach, but it works without touching a single API or writing a line of code. For teams under 5,000 monthly visitors, the identity-data-as-a-service approach with Zapier may be the fastest path to value.

Setup time: 1-2 hours.


Pro Tips for Credit Efficiency

Building the stack is one thing. Running it efficiently is another. Here are the tactics that separate teams burning money from teams printing pipeline.

1. Only enrich high-intent visitors.

Not every identified visitor is worth enriching. Someone who bounced off your homepage after 5 seconds is not the same as someone who spent 4 minutes on your pricing page. Use Leadpipe’s page-level filtering to only trigger enrichment webhooks for high-intent pages: pricing, demo, case studies, comparison pages, and integrations.
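Leadpipe’s page-level filtering is configured in the product itself. If you also filter inside your own pipeline, a check like this works. It is a sketch: the path prefixes mirror the pages listed above, while the page-record shape and the 30-second threshold are assumptions.

```python
from typing import Dict, List

# Path prefixes mirroring the high-intent pages above; adjust to your own URLs.
HIGH_INTENT_PREFIXES = ("/pricing", "/demo", "/case-studies", "/compare", "/integrations")


def is_high_intent(pages: List[Dict], min_seconds: int = 30) -> bool:
    """True if the visitor spent at least min_seconds on any high-intent page.

    Each page record is assumed to look like {"path": "/pricing", "seconds": 240}.
    """
    return any(
        page["path"].startswith(HIGH_INTENT_PREFIXES) and page["seconds"] >= min_seconds
        for page in pages
    )


print(is_high_intent([{"path": "/", "seconds": 5}, {"path": "/pricing", "seconds": 240}]))
print(is_high_intent([{"path": "/", "seconds": 5}]))
```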

2. Use excluded paths to skip low-value traffic.

Leadpipe’s exclusion list feature lets you block identification on pages where visitors are unlikely to be buyers - support docs, blog posts about general topics, careers pages. This conserves your monthly identification credits for the traffic that actually matters.

3. Gate phone enrichment behind email validation.

Phone number lookups are typically the most expensive enrichment credit. Don’t waste them on contacts with invalid email addresses. Run email validation (Layer 1) first, and only send validated contacts into the phone number enrichment step. This alone can cut enrichment costs by 15-20%.
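The gate itself is a filter that runs between validation and the phone step. A sketch assuming you already have a validation status per email; the status labels are assumptions modeled on typical validator output, and whether catch-all addresses earn phone credits is a policy choice.

```python
from typing import Dict, List

# Assumed status labels; map them from whatever your validator actually returns.
PHONE_ELIGIBLE = {"valid"}


def phone_lookup_queue(contacts: List[Dict], statuses: Dict[str, str]) -> List[Dict]:
    """Keep only contacts whose email passed validation (Layer 1), so no phone
    credits are spent on invalid, catch-all, or never-validated addresses."""
    return [c for c in contacts if statuses.get(c["email"]) in PHONE_ELIGIBLE]


contacts = [{"email": "jane@acme.com"}, {"email": "old@example.com"},
            {"email": "info@example.org"}]
statuses = {"jane@acme.com": "valid", "old@example.com": "invalid",
            "info@example.org": "catch-all"}
queue = phone_lookup_queue(contacts, statuses)
print(f"{len(queue)} of {len(contacts)} contacts sent to phone enrichment")
```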

4. Use intent scores to tier your enrichment.

Not every lead deserves the full-stack treatment. Use Leadpipe Orbit’s intent scores to create tiers:

  • Intent score 70-100: Full enrichment + AI research + immediate SDR outreach
  • Intent score 40-69: Basic enrichment + nurture sequence
  • Intent score 1-39: Identification only, no enrichment spend

This approach can reduce your enrichment costs by 40-60% while focusing budget on the leads most likely to convert.
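The tiers above translate directly into a lookup that decides which steps a lead earns before any credits are spent. A sketch using the thresholds from the list; the step names and the sample scores are hypothetical.

```python
from typing import List


def enrichment_plan(intent_score: int) -> List[str]:
    """Map an Orbit-style intent score (1-100) to the steps it earns."""
    if not 1 <= intent_score <= 100:
        raise ValueError(f"intent score out of range: {intent_score}")
    if intent_score >= 70:
        return ["full_enrichment", "ai_research", "sdr_outreach"]
    if intent_score >= 40:
        return ["basic_enrichment", "nurture_sequence"]
    return []  # identification only: no enrichment spend


# Hypothetical batch of scores, to show how many leads reach each tier.
scores = [92, 75, 55, 41, 38, 12]
full = sum(1 for s in scores if "full_enrichment" in enrichment_plan(s))
print(f"{full} of {len(scores)} leads get full enrichment")
```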

5. Run lightweight enrichment on all, full enrichment on qualified.

There’s a two-pass approach that works well at scale: run a cheap enrichment step (just company data + email validation) on all identified visitors. Then score them against your ICP. Only the leads that pass ICP qualification get the full waterfall treatment with phone numbers, AI research, and personalization.
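The two-pass pattern is mostly about where the expensive call sits in the loop. A sketch with the three steps passed in as functions; the stubs below (`cheap`, `icp`, `full`) are hypothetical, and the counter shows that full enrichment only runs for leads that clear the ICP check.

```python
from typing import Callable, Dict, List


def two_pass(visitors: List[Dict],
             cheap_enrich: Callable[[Dict], Dict],
             passes_icp: Callable[[Dict], bool],
             full_enrich: Callable[[Dict], Dict]) -> List[Dict]:
    """Pass 1 runs on every visitor; pass 2 only on leads that clear the ICP check."""
    qualified = []
    for visitor in visitors:
        lead = cheap_enrich(visitor)             # company data + email validation
        if passes_icp(lead):
            qualified.append(full_enrich(lead))  # phones, AI research, personalization
    return qualified


full_calls = 0


def cheap(visitor: Dict) -> Dict:  # hypothetical cheap firmographic lookup
    return {**visitor, "employees": visitor.get("employees", 0)}


def icp(lead: Dict) -> bool:
    return lead["employees"] >= 50


def full(lead: Dict) -> Dict:  # hypothetical expensive waterfall
    global full_calls
    full_calls += 1
    return {**lead, "phone": "+1-555-0100"}


leads = two_pass([{"email": "a@example.com", "employees": 500},
                  {"email": "b@example.org", "employees": 3}], cheap, icp, full)
print(len(leads), full_calls)
```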

Try Leadpipe free with 500 leads to test how many of your anonymous visitors are identifiable before committing to the full stack.


FAQ

How is visitor identification different from enrichment?

Enrichment takes a contact you already know and adds more data to their record. Visitor identification discovers contacts you didn’t know existed by resolving anonymous website sessions into real people. They’re complementary - identification creates the contacts, enrichment completes them. For a deeper comparison, see our guide on how these categories relate.

Does Leadpipe replace Clay?

No. Leadpipe and Clay do fundamentally different things. Leadpipe identifies anonymous website visitors (Layer 0). Clay enriches known contacts through a multi-provider waterfall (Layer 2). They work together in the same stack. Leadpipe creates the leads. Clay makes them complete. Most teams using both report 5-10x more enriched leads than Clay alone because Clay finally has a source of high-intent contacts to work with.

What match rate should I expect from the identification layer?

Leadpipe’s deterministic matching typically achieves 30-40% match rates depending on traffic quality. B2B-heavy sites with US traffic tend to land at the higher end. International traffic and B2C-heavy sites will be lower. The key differentiator is that Leadpipe uses its own proprietary identity graph - not a resold third-party graph - and identifies visitors even without LinkedIn profiles, which is a limitation of tools like RB2B.

Can I build the waterfall myself without Clay?

Yes. If you have engineering resources, you can orchestrate the waterfall directly via API calls to enrichment providers. The Leadpipe developer guide covers webhook payloads and API integration in detail. You’ll need to handle the cascade logic, error handling, deduplication, and rate limiting yourself. Clay handles all of that out of the box, which is why it’s the recommended path for most teams.
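Rate limiting is the chore that most often bites a hand-rolled waterfall. A common pattern is retrying with exponential backoff plus jitter; this sketch assumes your provider client raises a hypothetical `RateLimited` exception on an HTTP 429, and accepts an injectable `sleep` so it can be exercised without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error your provider client raises on an HTTP 429."""


def with_backoff(call: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, doubling the wait each time plus random jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: e.g. fall through to the next provider
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


attempts = 0


def flaky_lookup() -> dict:
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimited()
    return {"phone": "+1-555-0100"}


print(with_backoff(flaky_lookup, sleep=lambda seconds: None), attempts)
```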



The gap in most B2B data stacks isn’t enrichment quality. It’s enrichment coverage. You can waterfall through every provider on the planet and still miss 97% of your website visitors because you never identified them in the first place.

Layer 0 fixes that. Add visitor identification before your enrichment waterfall, and you go from enriching 300 form fills to enriching 3,000+ identified, high-intent contacts from the same traffic.

The complete signal stack - identify, validate, enrich, score intent, research, qualify, and act - costs under $650/month, and its core (Leadpipe + Clay + Orbit) produces fully enriched leads at $0.32-0.97 each. That’s cheaper than a single ZoomInfo seat. And the leads are warmer because they were on your site, showing real intent, when you identified them.

Start with 500 free identified leads and see how many of your anonymous visitors are already identifiable. No credit card. No sales call. Just data.