How to Reconcile Leads Against Contacts in Salesforce at Scale

March 2026 · 8 min read · By Similarity API Team

Duplicate people records are almost inevitable in modern Salesforce environments. Identity data flows in from forms, enrichment tools, outbound prospecting, partner systems, event imports, product signups, and manual entry. Even with well‑run processes, slight variations in names, emails, titles, and company formats accumulate over time — especially as multiple systems feed the same CRM.

At scale, teams eventually need a way to answer very practical questions:

  • Which of the new leads we are importing already exist as contacts?
  • Which account owner should this inbound lead actually belong to?
  • How do we clean identity data across the CRM before a migration or reporting reset?

This is where lead‑to‑contact reconciliation workflows typically emerge.

Why teams run this workflow

The motivation is operational, and it usually comes down to four things:

  • reporting accuracy — duplicate identities fragment attribution and pipeline analytics
  • routing correctness — new leads often need to inherit ownership from existing accounts
  • import risk reduction — bulk uploads can create thousands of duplicates without pre‑checks
  • automation enablement — teams surface similar contacts, block conversions, or auto‑assign ownership

Over time this becomes a recurring RevOps capability rather than a one‑time cleanup task.

What this looks like in practice

Common patterns include:

Pre‑import identity checks

  • export contacts
  • reconcile new leads against the contact base
  • review high‑confidence matches
  • merge or update before import

Scheduled identity cleanup jobs

  • compare recently created leads to contacts
  • write suggested match IDs or similarity scores to custom fields
  • create review queues for RevOps

Automation‑driven identity resolution

  • Apex triggers call an HTTP reconciliation endpoint before lead insert
  • Salesforce Flows surface candidate matches for SDR review
  • nightly jobs reassign leads to existing account owners

At this stage, similarity matching becomes part of operational CRM infrastructure.

Exact vs similarity matching in CRM reconciliation

Traditional CRM deduplication relies on exact matching — typically email equality or strict rule logic. This works well when identifiers are clean and consistent.

In real GTM environments, identity signals drift:

  • people use multiple emails
  • company names are formatted differently
  • titles and suffixes vary
  • records are created by different systems and teams

This is where similarity‑based matching becomes necessary. Instead of asking "are these fields identical?" the workflow asks "are these records likely to represent the same real‑world person?"

Exact matching remains useful as a first filter. Similarity matching extends coverage to ambiguous cases that exact rules cannot resolve at scale.
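As a rough illustration of that two-stage idea, the sketch below tries exact email equality first and falls back to fuzzy comparison only when no exact hit exists. It is a minimal sketch under stated assumptions: the record keys (`name`, `email`, `company`) and the function `match_lead` are hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for whatever similarity measure a production system would use. Its scores are not comparable to Similarity API scores.

```python
from difflib import SequenceMatcher


def normalize_email(email):
    """Trim whitespace and lowercase so trivially different emails compare equal."""
    return email.strip().lower()


def match_lead(lead, contacts, threshold=0.82):
    """Return (contact, score, method) for the best match, or None.

    Record keys and the default threshold are illustrative assumptions.
    """
    # Stage 1: exact email equality acts as a cheap first filter.
    lead_email = normalize_email(lead["email"])
    for contact in contacts:
        if normalize_email(contact["email"]) == lead_email:
            return contact, 1.0, "exact"

    # Stage 2: fuzzy comparison on name plus company for the ambiguous cases.
    lead_key = f'{lead["name"]} {lead["company"]}'.lower()
    best, best_score = None, 0.0
    for contact in contacts:
        contact_key = f'{contact["name"]} {contact["company"]}'.lower()
        score = SequenceMatcher(None, lead_key, contact_key).ratio()
        if score > best_score:
            best, best_score = contact, score

    if best is not None and best_score >= threshold:
        return best, best_score, "similar"
    return None
```

A lead with a different email but the same name and a near-identical company string would be caught by the second stage, which is exactly the case exact rules miss.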

How reconciliation pipelines usually work

Conceptually, identity matching pipelines involve:

  1. preprocessing — normalize casing, punctuation, token order, company suffixes
  2. similarity calculation — compare identity strings
  3. filtering — keep matches above a confidence threshold
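
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the suffix list, the function names `preprocess` and `reconcile`, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Illustrative only; a real suffix list would be longer and locale-aware.
COMPANY_SUFFIXES = {"inc", "corp", "llc", "ltd", "co"}


def preprocess(text):
    """Step 1: lowercase, replace punctuation, drop company suffixes, sort tokens."""
    text = text.lower()
    text = re.sub(r"[^\w\s@]", " ", text)  # punctuation becomes whitespace, '@' is kept
    tokens = [t for t in text.split() if t not in COMPANY_SUFFIXES]
    return " ".join(sorted(tokens))


def reconcile(leads, contacts, threshold=0.82, top_n=3):
    """Steps 2 and 3: score every lead against every contact, keep the best hits.

    Returns (lead_index, contact_index, score) tuples.
    """
    contact_keys = [preprocess(c) for c in contacts]
    rows = []
    for i, lead in enumerate(leads):
        lead_key = preprocess(lead)
        scored = [
            (SequenceMatcher(None, lead_key, key).ratio(), j)
            for j, key in enumerate(contact_keys)
        ]
        scored.sort(reverse=True)  # highest score first
        for score, j in scored[:top_n]:
            if score >= threshold:
                rows.append((i, j, round(score, 2)))
    return rows
```

Note that `reconcile` compares every lead with every contact, which is the part that stops being practical as the datasets grow.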

This logic is straightforward on small datasets. It becomes harder when:

  • CRM datasets grow into hundreds of thousands of records
  • imports and enrichment create continuous identity drift
  • reconciliation must run frequently or automatically

This is typically where teams move from ad‑hoc scripts to more scalable approaches.
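
To see why, consider the arithmetic: comparing 10,000 new leads against 500,000 contacts pair by pair means 5 billion similarity calculations. A common way to cut this down is blocking, where only records that share a cheap key are compared. The sketch below blocks on email domain; the function names and the `email` key are hypothetical, and the choice of blocking key is an assumption for the example rather than a recommendation.

```python
from collections import defaultdict


def email_domain(email):
    """Return the lowercased part after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def candidate_pairs(leads, contacts):
    """Yield (lead_index, contact_index) only for records sharing an email domain."""
    by_domain = defaultdict(list)
    for j, contact in enumerate(contacts):
        by_domain[email_domain(contact["email"])].append(j)

    for i, lead in enumerate(leads):
        # Leads whose domain has no contacts produce no candidate pairs at all.
        for j in by_domain.get(email_domain(lead["email"]), []):
            yield i, j
```

The trade-off is recall: a person who appears once with a work email and once with a personal one will never be compared under this key, so blocking schemes usually combine several keys.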

Replacing the pipeline with a single reconciliation call

Build it yourself: a pipeline to build, test, and maintain

  • design and algorithm selection
  • preprocessing and normalization
  • blocking strategy (for scale)
  • scoring and threshold tuning
  • filtering and candidate ranking
  • output formatting

Call Similarity API: one API call

  • one integration
  • scales automatically
  • no maintenance
  • works from any HTTP environment

In practice, this entire comparison workflow can be replaced with one API request:

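import requests

# API_KEY, lead_match_strings, and contact_match_strings are assumed to be
# defined earlier: an API token and two lists of identity strings.
payload = {
    "data_a": lead_match_strings,     # leads to check
    "data_b": contact_match_strings,  # contacts to check against
    "config": {
        "similarity_threshold": 0.82,
        "top_n": 3,
        # Preprocessing options
        "to_lowercase": True,
        "remove_punctuation": True,
        "use_token_sort": True,
        # Row-level output, shown in the example below
        "output_format": "flat_table"
    }
}

response = requests.post(
    "https://api.similarity-api.com/reconcile",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
res = response.json()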

The key design choice is defining the identity string — typically a combination of first name, last name, email, company or account name, and title.
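
One way to build those strings is sketched below. It is an assumption-laden example: the helper `build_identity_string` is hypothetical, the default field names follow the standard Salesforce Lead fields (`FirstName`, `LastName`, `Email`, `Company`) but should be adjusted to whatever your export actually contains, and the ` | ` separator simply mirrors the example output in this article.

```python
def build_identity_string(record, fields=("FirstName", "LastName", "Email", "Company")):
    """Join the non-empty identity fields of one record with ' | '.

    Field names are assumptions; pass the column names from your own export.
    """
    parts = [str(record.get(field) or "").strip() for field in fields]
    return " | ".join(part for part in parts if part)


# Hypothetical exported lead rows
leads = [
    {"FirstName": "Jane", "LastName": "Doe", "Email": "jane@acme.com", "Company": "Acme Inc"},
    {"FirstName": "Mark", "LastName": "Lee", "Email": None, "Company": " North IO "},
]
lead_match_strings = [build_identity_string(lead) for lead in leads]
```

Whatever fields you choose, build lead and contact strings with the same fields in the same order, so that differences in the strings reflect differences in the data rather than in the layout.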

Example output

With flat_table, results are returned as row‑level matches keyed by dataset indexes.

index_a  text_a                                 index_b  text_b                                    score  matched
0        Jane | Doe | jane@acme.com | Acme Inc  1542     Jane | Doe | j.doe@acme.com | Acme        0.93   TRUE
0        Jane | Doe | jane@acme.com | Acme Inc  9811     Janet | Doe | janet@acme.com | Acme Corp  0.84   TRUE
1        Mark | Lee | mark@north.io | North IO  2207     Marc | Lee | mlee@north.io | North.io     0.81   FALSE

Other output formats are available. This one is commonly used because it makes it easy to:

  • join results back to Salesforce Lead and Contact IDs
  • inspect candidate matches in review queues or notebooks
  • feed downstream merge, routing, or automation workflows
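
The first of these, joining back to Salesforce IDs, can be sketched as follows. The sketch assumes the rows have already been parsed into dictionaries with the column names shown in the table above; the exact shape of the API response is not specified here, so treat that as an assumption. The function name and the ID placeholders are hypothetical. It also relies on `lead_ids` and `contact_ids` being in the same order as the strings sent in `data_a` and `data_b`.

```python
def attach_salesforce_ids(rows, lead_ids, contact_ids):
    """Translate dataset indexes into record IDs.

    rows: dicts with index_a, index_b, score, matched (assumed shape).
    lead_ids / contact_ids: IDs in the same order as data_a / data_b.
    """
    joined = []
    for row in rows:
        joined.append({
            "lead_id": lead_ids[row["index_a"]],
            "contact_id": contact_ids[row["index_b"]],
            "score": row["score"],
            "matched": row["matched"],
        })
    return joined


# Hypothetical parsed rows and placeholder IDs
rows = [
    {"index_a": 0, "index_b": 1, "score": 0.93, "matched": True},
    {"index_a": 1, "index_b": 0, "score": 0.81, "matched": False},
]
lead_ids = ["LEAD_ID_0", "LEAD_ID_1"]
contact_ids = ["CONTACT_ID_0", "CONTACT_ID_1"]

# Keep only the rows flagged as matches for the review queue
review_queue = [r for r in attach_salesforce_ids(rows, lead_ids, contact_ids) if r["matched"]]
```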

Ultimately, lead‑contact reconciliation is not just about deduplicating records. It is about establishing a scalable way to interpret identity similarity across revenue systems — whether the workflow runs from a notebook, an ETL job, an Apex callout, or any HTTP‑based automation layer.

Try it on your own CRM data

Upload a CSV of leads and contacts — up to 100k rows free, no setup needed.