How to Check If Contacts Are Already in Your CRM Before Importing

April 2026 · 14 min read · By Similarity API Team

You have a new list of leads — from a trade show, a vendor, a Sales Navigator export, or a data provider. You want to import it into your CRM. But you know some of these people are probably already in there. You've been collecting contacts for years. There's overlap. The question is how much.

Every CRM gives you the same advice: use the "update existing contacts" setting on import, and it'll match on email. That works for contacts who appear in both lists with the same email address. It doesn't work for:

  • Contacts with no email in the new list (trade show badge scans often don't include email)
  • Contacts who gave a personal email to your CRM and a work email to the vendor
  • Contacts whose name is spelled slightly differently across the two sources
  • Companies whose names are written differently in each source, such as "Acme Corp" vs "Acme Corporation"

So the standard advice creates a false sense of safety. Your CRM accepts the import, says "0 duplicates created," and you end up with dozens or hundreds of new records that are actually existing contacts.

Why Email-Only Matching Falls Short

Every major CRM — HubSpot, Salesforce, Pipedrive, Zoho — deduplicates contacts on email address during import. If the incoming email matches an existing contact's email exactly, it updates rather than creates. If it doesn't match — for any reason — a new record is created.

The scenarios where this fails are common:

  • Missing email. Trade show badge exports frequently have no email. Every one of those contacts creates a new record, even if that person is already in your CRM from filling out a form on your site.
  • Different email address. The same person may have submitted a form with their work email and given a personal email to a data vendor. Two different emails, zero match.
  • Company contacts without individual emails. If you're matching companies rather than individuals, email doesn't help at all — HubSpot deduplicates companies on domain name, Salesforce on account name (exact match).
  • Name variants. "Jennifer Walsh" and "Jen Walsh" are the same person. Your CRM treats them as different contacts.

The Standard Workaround — and Why It Doesn't Work

The most common advice in HubSpot Community threads and Salesforce forums is: export your existing contacts as a CSV, then use VLOOKUP in Excel to compare against your import file.

This works when both files have the same email for the same person. It completely fails when:

  • The emails are different or missing
  • Company names are formatted differently across the two files
  • You're trying to match on name + company together rather than just email

VLOOKUP in exact-match mode requires the whole cell value to be identical, ignoring only letter case. "Jen Walsh" and "Jennifer Walsh" return #N/A. "Acme Corp" and "Acme Corporation" return #N/A. You're left with a long list of apparent non-matches, some of which are existing contacts, and no way to tell those apart from genuine new records.
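
To make the gap concrete, here is a minimal Python sketch that contrasts a forgiving exact comparison with a similarity score. It uses the standard library's difflib.SequenceMatcher purely as a stand-in scorer; the helper names exact_match and similarity are illustrative, and a dedicated matching tool will likely score differently.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Forgiving exact comparison: ignores case and surrounding spaces."""
    return a.strip().casefold() == b.strip().casefold()


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, a.strip().casefold(), b.strip().casefold()).ratio()


pairs = [
    ("Jen Walsh", "Jennifer Walsh"),
    ("Acme Corp", "Acme Corporation"),
]

for left, right in pairs:
    print(f"{left!r} vs {right!r}: exact={exact_match(left, right)}, "
          f"similarity={similarity(left, right):.2f}")
```

Both pairs fail the exact comparison but still score above 0.7 with this scorer, which is the signal an exact lookup throws away.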

How to Actually Do This

The reliable method is fuzzy matching across two files — your import list and your CRM export — with name and company as matching fields alongside email.

Step 1: Export your existing CRM contacts

  • From HubSpot: Contacts → Actions → Export
  • From Salesforce: Reports → New Report → Contacts, export as CSV
  • Include: first name, last name, company name, email, and any CRM record ID
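
If you plan to compare the files with a script rather than a tool, the export can be loaded with Python's csv module. The sketch below is an assumption-heavy example: the header names in COLUMN_MAP ("Record ID", "First Name", and so on) are hypothetical, so check them against the first row of your own export.

```python
import csv
import io

# Hypothetical export headers mapped to the field names used for matching.
# Adjust the left-hand side to whatever your CRM export actually contains.
COLUMN_MAP = {
    "Record ID": "id",
    "First Name": "first_name",
    "Last Name": "last_name",
    "Company Name": "company",
    "Email": "email",
}


def load_contacts(csv_file, column_map=COLUMN_MAP):
    """Read a CSV file object into a list of dicts with normalized keys."""
    reader = csv.DictReader(csv_file)
    rows = []
    for raw in reader:
        # Missing columns or empty cells become empty strings.
        rows.append({new: (raw.get(old) or "").strip()
                     for old, new in column_map.items()})
    return rows


# In-memory sample standing in for a real file. For a file on disk, use:
# open("crm_export.csv", newline="", encoding="utf-8-sig")
sample = io.StringIO(
    "Record ID,First Name,Last Name,Company Name,Email\n"
    "001,Jennifer,Walsh,Acme Corporation,jwalsh@example.com\n"
)
contacts = load_contacts(sample)
print(contacts)
```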

Step 2: Prepare your import file

Make sure both files have comparable columns — first name, last name, company. If your import file has full name in a single column, split it before comparing.
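
Splitting a full-name column can be scripted. The following is a rough sketch with a hypothetical helper, split_full_name, that treats the first word as the first name and everything after it as the last name, and also handles the "Last, First" layout. Real name data has more edge cases (middle names, titles, multi-word first names), so spot-check the output.

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Split a full name into (first_name, last_name).

    Simplifying assumption: the first word is the first name and the rest
    is the last name, unless the value is written as "Last, First".
    """
    name = full_name.strip()
    if "," in name:
        last, first = name.split(",", 1)
        return first.strip(), last.strip()
    parts = name.split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])


for value in ["Jennifer Walsh", "Walsh, Jennifer", "Madonna"]:
    print(value, "->", split_full_name(value))
```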

Step 3: Compare the two files using fuzzy matching

This is where exact-match tools fall short. You need a tool that scores similarity between records — so "Jen Walsh at Acme Corp" in your import file matches "Jennifer Walsh at Acme Corporation" in your CRM export, even though neither field is identical.

Clean by Similarity API's reconciliation feature handles this: upload your import file as File A and your CRM export as File B, select the columns to match on, and it returns three categories:

  • Matched — records in your import file that likely already exist in your CRM
  • Net-new — records in your import file with no match in your CRM
  • In CRM only — records in your CRM that don't appear in your import file

You review the matched records with similarity scores before deciding what to do with them — skip on import, update existing records, or flag for manual review.
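
For readers who want to see the shape of this comparison, here is a small Python sketch that produces the same three groups. It is not how Clean by Similarity API works internally; it uses difflib.SequenceMatcher as a stand-in scorer, the field names and the 0.70 threshold are assumptions, and it compares every import row against every CRM row, which is only practical for small files.

```python
from difflib import SequenceMatcher


def record_key(row: dict) -> str:
    """Join first name, last name, and company into one lowercase string."""
    fields = ("first_name", "last_name", "company")
    return " ".join((row.get(f) or "").strip().lower() for f in fields)


def reconcile(import_rows, crm_rows, threshold=0.70):
    """Return (matched, net_new, crm_only) for two lists of contact dicts."""
    crm_keys = [record_key(r) for r in crm_rows]
    matched, net_new, crm_hit = [], [], set()
    for row in import_rows:
        key = record_key(row)
        best_score, best_idx = 0.0, None
        for idx, crm_key in enumerate(crm_keys):
            score = SequenceMatcher(None, key, crm_key).ratio()
            if score > best_score:
                best_score, best_idx = score, idx
        if best_idx is not None and best_score >= threshold:
            # Keep the score so a person can review the match later.
            matched.append((row, crm_rows[best_idx], round(best_score, 2)))
            crm_hit.add(best_idx)
        else:
            net_new.append(row)
    crm_only = [r for i, r in enumerate(crm_rows) if i not in crm_hit]
    return matched, net_new, crm_only


import_rows = [
    {"first_name": "Jen", "last_name": "Walsh", "company": "Acme Corp"},
    {"first_name": "Priya", "last_name": "Natarajan", "company": "Globex"},
]
crm_rows = [
    {"id": "001", "first_name": "Jennifer", "last_name": "Walsh",
     "company": "Acme Corporation"},
    {"id": "002", "first_name": "Tom", "last_name": "Okafor",
     "company": "Initech"},
]

matched, net_new, crm_only = reconcile(import_rows, crm_rows)
print("matched:", matched)
print("net-new:", net_new)
print("in CRM only:", crm_only)
```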

Step 4: Import only the net-new records

Take the net-new group and import them. For the matched group, decide per record: does the import file have fresher information (new job title, direct phone) that's worth updating? Or is the existing CRM record more complete?

What to Match On

Email first. If both files have email, use it. Exact email match is reliable and fast. Filter out confirmed matches before running fuzzy matching on the remainder, so the fuzzy pass only scores records that email could not resolve and you have fewer candidate pairs to review.
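
The email-first pass can be sketched in a few lines of Python. The function name split_by_email and the "email" field name are assumptions for illustration; the only normalization applied is trimming whitespace and lowercasing.

```python
def normalize_email(value) -> str:
    """Trim and lowercase an email so trivial differences do not block a match."""
    return (value or "").strip().lower()


def split_by_email(import_rows, crm_rows):
    """Return (confirmed, remainder).

    confirmed: (import_row, crm_row) pairs that share an email address.
    remainder: import rows that still need fuzzy matching.
    """
    crm_by_email = {}
    for crm_row in crm_rows:
        email = normalize_email(crm_row.get("email"))
        if email:
            # Keep the first CRM record seen for each address.
            crm_by_email.setdefault(email, crm_row)

    confirmed, remainder = [], []
    for row in import_rows:
        email = normalize_email(row.get("email"))
        if email and email in crm_by_email:
            confirmed.append((row, crm_by_email[email]))
        else:
            remainder.append(row)
    return confirmed, remainder


crm = [{"id": "001", "email": "jwalsh@example.com"}]
incoming = [
    {"name": "Jen Walsh", "email": " JWalsh@Example.com "},
    {"name": "Badge Scan", "email": ""},
]
confirmed, remainder = split_by_email(incoming, crm)
print(len(confirmed), "confirmed;", len(remainder), "left for fuzzy matching")
```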

Name + company together for the rest. For records without email matches, combine first name, last name, and company into a single match signal. "Jennifer Walsh at Acme Corporation" and "Jen Walsh at Acme Corp" score highly enough to flag as a likely match. Name alone would also match a different Jennifer Walsh at another company, and company alone would match every colleague at Acme. The combination is what makes it reliable.
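
A quick way to see the effect of the combined signal is to score one CRM record against three candidates. This sketch again uses difflib.SequenceMatcher as an assumed stand-in scorer, and all of the sample people and companies are made up.

```python
from difflib import SequenceMatcher


def combined_key(first: str, last: str, company: str) -> str:
    """Build a single lowercase string from name and company."""
    return " ".join(part.strip().lower() for part in (first, last, company))


def combined_score(a, b) -> float:
    """Score two (first, last, company) tuples on the combined string."""
    return SequenceMatcher(None, combined_key(*a), combined_key(*b)).ratio()


crm_record = ("Jennifer", "Walsh", "Acme Corporation")
same_person = ("Jen", "Walsh", "Acme Corp")   # name and company both vary
colleague = ("Tom", "Okafor", "Acme Corp")    # same company, different person
namesake = ("Jen", "Walsh", "Globex Inc")     # same name, different company

for label, candidate in [("same person", same_person),
                         ("colleague", colleague),
                         ("namesake", namesake)]:
    print(f"{label}: {combined_score(candidate, crm_record):.2f}")
```

With this scorer, only the first candidate clears a 0.70 cutoff; the colleague and the namesake each agree on one field and fall well short.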

Company name only for account-level matching. If you're reconciling company lists (accounts in Salesforce, companies in HubSpot) rather than individual contacts, match on company name with suffix normalization — strip Inc., LLC, Corp., Ltd. before comparing.
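
Suffix normalization is straightforward to script. The sketch below strips the suffixes listed above plus a few spelled-out forms ("Corporation", "Incorporated", "Limited") that are added here as an assumption; extend the list for your own data, for example with GmbH or S.A.

```python
import re

# Suffixes to strip. Longer forms come first so "corporation" is tried
# before "corp". The spelled-out forms are an assumption, not from a standard.
_SUFFIXES = r"(?:incorporated|inc|llc|corporation|corp|limited|ltd)"
_SUFFIX_RE = re.compile(rf"[\s,]+{_SUFFIXES}\.?$", re.IGNORECASE)


def normalize_company(name: str) -> str:
    """Lowercase a company name and strip trailing business suffixes."""
    cleaned = name.strip()
    while True:
        # Repeat so stacked suffixes like "Acme Corp, Inc." are fully removed.
        stripped = _SUFFIX_RE.sub("", cleaned).strip(" ,")
        if stripped == cleaned:
            break
        cleaned = stripped
    return cleaned.lower()


for raw in ["Acme Corp", "Acme Corporation", "Acme, Inc.", "Globex LLC"]:
    print(f"{raw!r} -> {normalize_company(raw)!r}")
```

The pattern only removes a suffix that follows a space or comma at the end of the name, so a company such as "Corp Solutions Ltd." keeps its leading "Corp".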

Handling the Grey Area

Fuzzy matching returns similarity scores, not yes/no decisions. You'll have:

  • High-confidence matches (score > 0.85): Almost certainly the same person. Safe to skip on import or flag as existing.
  • Medium-confidence matches (score 0.70–0.85): Review manually. Could be the same person with an unusual name variant, or two different people at the same company with similar names.
  • Low-confidence (score < 0.70): Treat as net-new. The similarity is probably coincidental.

The threshold you set depends on how much false-positive risk you're willing to accept. For a trade show list where you care most about not contacting existing customers with introductory messaging, use a lower threshold: it sends more possible matches to review, which costs less than missing an existing contact.
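
The three bands translate directly into a small classification function. This is a minimal sketch using the cutoffs above as defaults; the function name classify and the bucket labels are illustrative, and both cutoffs are parameters so the review band can be widened.

```python
def classify(score: float, high: float = 0.85, low: float = 0.70) -> str:
    """Bucket a similarity score into one of three actions."""
    if score > high:
        return "existing"   # high confidence: skip on import or flag as existing
    if score >= low:
        return "review"     # medium confidence: check manually
    return "net_new"        # low confidence: treat as a new record


for score in (0.92, 0.76, 0.65):
    print(score, "->", classify(score))

# Widening the review band for a trade show list:
print(0.65, "->", classify(0.65, low=0.60))
```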

Key Takeaways

  • CRM import deduplication on email address alone misses existing contacts whenever the email is missing or different, and it ignores name variants entirely
  • VLOOKUP comparison of two CSV files only catches exact string matches — the same limitations as the CRM itself
  • Fuzzy matching on name and company together catches what email-only matching misses — the same person written differently across two sources
  • Comparing both files before import gives you three actionable groups: confirmed existing, likely existing (for review), and genuinely net-new
  • For company-level matching, normalizing business suffixes before comparing catches matches that differ only in the suffix, such as "Acme Corp" and "Acme Corporation"

FAQ