How to Deduplicate Your Contact List Before Importing to HubSpot

(The Problem with Email-Only Matching)

March 2026 · 10 min read · By Similarity API Team

You spent weeks sourcing a contact list. You imported it into HubSpot. And now you have three records for the same person — because they gave you their work email once, their personal email once, and a colleague typed their name slightly differently the third time.

HubSpot didn't flag any of them as duplicates. From HubSpot's perspective, they aren't.

This is one of the most common import problems in HubSpot, and it's not user error. It's a gap in how HubSpot's native deduplication works, and it's worth understanding before your next import rather than cleaning up after it.

What HubSpot Actually Does (and Doesn't Do)

HubSpot's automatic deduplication for contacts is built on a single rule: email address.

When a new contact arrives — via import, form submission, or API — HubSpot checks whether a contact with that exact email address already exists. If one does, it updates the existing record. If not, it creates a new one.

That's it. That's the full logic for automatic deduplication.

For companies, the equivalent rule is domain name: HubSpot matches on the primary company domain field, not the company name.

What this means in practice:

Scenario | HubSpot's behavior
Same person, same email, imported twice | ✅ Correctly deduplicates
Same person, two different emails | ❌ Creates two records
Same person, email missing on one record | ❌ Creates two records
"Microsoft Corp" vs "Microsoft Corporation" | ❌ Creates two records
"microsoft.com" vs "microsoft.com" (domain) | ✅ Correctly deduplicates companies
"Microsoft" with no domain vs existing record | ❌ Creates a new company record

The pattern is clear: HubSpot deduplicates reliably on exact-match fields. Anything that requires judgment — name similarity, missing fields, formatting differences — it misses.
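
To make the rule concrete, here is a minimal Python sketch of an upsert keyed on exact email. It is a simplified model of the behavior described above, not HubSpot's actual code; the function name `import_contacts` and the sample records are hypothetical.

```python
def import_contacts(existing, incoming):
    """Simulate an import that only deduplicates on exact email address."""
    # Index existing records by email; records without an email are not indexed.
    by_email = {c["email"]: c for c in existing if c.get("email")}
    records = list(existing)
    for contact in incoming:
        email = contact.get("email")
        if email and email in by_email:
            by_email[email].update(contact)  # same email: update in place
        else:
            records.append(contact)          # anything else: new record
            if email:
                by_email[email] = contact
    return records


crm = [{"name": "Sarah Smith", "email": "sarah@acme.com"}]
batch = [
    {"name": "Sarah Smith", "email": "sarah@acme.com"},         # exact match
    {"name": "Sarah Smith", "email": "sarah.smith@gmail.com"},  # second email
    {"name": "S. Smith", "email": ""},                          # no email
]
result = import_contacts(crm, batch)
print(len(result))  # 3 records for one person
```

Only the first row in the batch is absorbed into the existing record. The other two have nothing to match on, so they become new records.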

Why This Happens More Than You Think

Duplicate contacts don't come from careless data entry alone. They come from the normal reality of collecting contact data from multiple sources:

  • Trade show exports. Badge scanners produce first name, last name, company — often no email, or a personal email rather than work email. These don't match your CRM records that were built from LinkedIn outreach or form fills.
  • List purchases or enrichment. Third-party data providers use their own formatting conventions. "Johnson & Johnson" vs "Johnson and Johnson". "J&J" vs "Johnson Johnson". None of these match each other in HubSpot.
  • Form submissions without email. Some forms don't require email. HubSpot creates a record anyway — and when that person later submits a form with their email, you now have two records.
  • Manual entry by different reps. One rep types "Sarah Smith, Acme Inc." Another types "S. Smith, Acme Incorporated." Different emails. Three months of activity on both records. Now you have two records and split history.
  • CRM migrations and merges. Any time data moves between systems — a Salesforce migration, an acquisition, a tool consolidation — name and contact formatting differences multiply.

None of these get caught by email-exact-match. They all survive into your HubSpot database.

Key Takeaways (So Far)

  • HubSpot automatically deduplicates contacts on email address only — nothing else is automatic
  • For companies, HubSpot deduplicates on domain name — company name variations are not matched
  • The most common duplicate scenarios — missing emails, multiple email addresses, name variations — all slip through
  • These duplicates come from normal data collection: trade shows, enrichment, form fills, manual entry, migrations

What HubSpot's Native Duplicate Manager Does (and Its Limits)

HubSpot does have a built-in Duplicate Manager tool — but it's worth being clear about what it is and isn't.

The Duplicate Manager surfaces suggested duplicate pairs using a combination of name similarity, email, and phone number. You review each pair and decide whether to merge or dismiss. It's a manual review queue, not an automated process.

The limitations are significant:

  • It requires Professional tier or above. If you're on Starter — or using certain non-Marketing Hub plans — you don't have access to it at all.
  • It runs on a delay. HubSpot recalculates the duplicate suggestions roughly every two weeks. If you just imported a list with 500 duplicates, they won't appear in your queue immediately.
  • It caps results. Even with access, HubSpot surfaces around 2,000 duplicate pairs at a time. If your import created more than that, you're working through a partial view.
  • It's pair-based, not cluster-based. If five records all represent the same contact, HubSpot shows you pairs — A+B, B+C — rather than grouping all five together. You can end up doing multiple merge passes to fully resolve a single contact.
  • It's designed for ongoing maintenance, not bulk cleanup. Reviewing pairs one at a time is workable for a few dozen duplicates per month. After a large import with thousands of potential duplicates, it's not a realistic cleanup path.
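
The pair-versus-cluster distinction can be illustrated with a small Python sketch. It assumes you already have a list of suggested duplicate pairs (the record IDs below are made up) and groups them into clusters with a union-find structure, so that A+B, B+C, and C+F resolve to one group in a single pass. This is one common way to cluster pairs, not a description of how any particular tool works.

```python
def cluster_pairs(pairs):
    """Group pairwise duplicate suggestions into clusters using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    clusters = {}
    for record in parent:
        clusters.setdefault(find(record), set()).add(record)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


suggested = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "F")]
clusters = cluster_pairs(suggested)
print(clusters)  # [['A', 'B', 'C', 'F'], ['D', 'E']]
```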

The Real Fix: Clean Before You Import

The most effective approach to HubSpot duplicate contacts isn't cleaning them up after import; it's preventing them from entering in the first place. A clean import is far easier to manage than a dirty database.

Here's what that looks like in practice:

Step 1: Standardize formatting before anything else

Before any fuzzy matching, simple normalization removes the easiest duplicates. In your spreadsheet:

  • Trim whitespace from all fields (leading and trailing spaces are invisible and cause exact-match failures)
  • Lowercase or title-case all name fields consistently
  • Strip punctuation variants from company names — remove trailing periods, normalize ampersands ("&" vs "and")
  • Remove business entity suffixes from company names for comparison purposes: Inc., LLC, Corp., Ltd., GmbH — or at minimum, standardize them to one form

A contact named " Sarah Smith " (with leading and trailing spaces) won't match "Sarah Smith" under an exact-match comparison. Normalization catches these before you even start matching.
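
If you would rather script this step than do it in a spreadsheet, the rules above could be sketched in Python roughly as follows. The function names and the suffix list are illustrative assumptions; extend the list for your own data. The output is a comparison key, not a value to write back into the record.

```python
import re

# Illustrative suffix list, not exhaustive.
ENTITY_SUFFIXES = {
    "inc", "incorporated", "llc", "corp", "corporation", "ltd", "limited", "gmbh",
}


def normalize_name(value):
    """Trim, collapse internal whitespace, and lowercase a person name."""
    return " ".join(value.split()).lower()


def normalize_company(value):
    """Build a lowercase comparison key for a company name."""
    text = value.lower().replace("&", " and ")  # normalize ampersands
    text = re.sub(r"[^\w\s]", " ", text)        # drop punctuation
    tokens = [t for t in text.split() if t not in ENTITY_SUFFIXES]
    return " ".join(tokens)


print(normalize_name("  Sarah   Smith "))      # sarah smith
print(normalize_company("Johnson & Johnson"))  # johnson and johnson
print(normalize_company("Acme Inc."))          # acme
```

Note that abbreviations such as "J&J" are not expanded by this kind of cleanup; those still need fuzzy matching or a manual alias list.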

Step 2: Deduplicate within your import file

Before touching HubSpot, deduplicate your import file itself. This is the step most people skip — and it's the most important one, because it's far easier to resolve duplicates in a spreadsheet than in a live CRM.

What you're looking for: records in your file that refer to the same person or company, regardless of whether they share the same email or are formatted identically.

For small files under a few hundred rows with simple formatting differences, a spreadsheet formula approach can work. For anything larger or with meaningful name variations — which is most real-world contact lists — you need fuzzy matching to catch records like these as duplicates:

Record A | Record B
Jennifer Walsh, Acme Corp | Jen Walsh, Acme Corporation
Robert Chen, Global Partners LLC | Bob Chen, Global Partners
acme.industries@email.com | acme_industries@email.com

These would all survive a standard Remove Duplicates pass. Fuzzy matching scores their similarity and flags them for review or automatic grouping.
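
As a rough illustration of what "scoring similarity" means, the sketch below compares every pair of rows in a small list using `difflib.SequenceMatcher` from the Python standard library. The 0.75 threshold and the function names are assumptions chosen for this example; dedicated tools use more specialized scoring, and an all-pairs comparison like this only scales to small files.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_likely_duplicates(records, threshold=0.75):
    """Return (index_a, index_b, score) for every pair at or above the threshold."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            flagged.append((i, j, round(score, 2)))
    return flagged


rows = [
    "Jennifer Walsh, Acme Corp",
    "Jen Walsh, Acme Corporation",
    "Robert Chen, Global Partners LLC",
    "Bob Chen, Global Partners",
]
flagged = find_likely_duplicates(rows)
for i, j, score in flagged:
    print(rows[i], "<->", rows[j], score)
```

With these four rows, the two Walsh records and the two Chen records are flagged as pairs, while the cross-person combinations fall below the threshold.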

Step 3: Match against your existing HubSpot data

The second deduplication challenge is different: your import file might be clean internally, but contain contacts that already exist in HubSpot under slightly different names or emails.

This is a reconciliation problem — matching two lists against each other — rather than a deduplication problem. You export your existing HubSpot contacts, then compare your import file against that export, flagging records that likely represent the same person. This is exactly the kind of problem covered in Part 3 of this series: matching two lists with fuzzy logic.
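
A minimal sketch of the reconciliation idea, assuming both lists have already been reduced to one comparable string per contact: each import row is scored against every row of the HubSpot export, and only its best match is kept if it clears a threshold. The names `best_match` and `reconcile`, the 0.8 threshold, and the sample rows are all hypothetical.

```python
from difflib import SequenceMatcher


def best_match(candidate, existing, threshold=0.8):
    """Return (closest existing record or None, its similarity score)."""
    best, best_score = None, 0.0
    for record in existing:
        score = SequenceMatcher(None, candidate.lower(), record.lower()).ratio()
        if score > best_score:
            best, best_score = record, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


def reconcile(import_rows, hubspot_rows, threshold=0.8):
    """Split import rows into genuinely new ones and likely existing ones."""
    new, already_present = [], []
    for row in import_rows:
        match, _ = best_match(row, hubspot_rows, threshold)
        if match is None:
            new.append(row)
        else:
            already_present.append((row, match))
    return new, already_present


hubspot_export = ["Robert Chen, Global Partners LLC", "Sarah Smith, Acme Inc"]
import_file = ["Bob Chen, Global Partners", "Maria Lopez, Northwind Traders"]
new_rows, existing_rows = reconcile(import_file, hubspot_export)
print(new_rows)       # rows safe to import as new contacts
print(existing_rows)  # rows to review and apply as updates instead
```

Rows in the second list are candidates for updating the existing HubSpot record rather than importing a new one.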

A Before/After: What This Looks Like on a Real Import

Say you have a 500-row contact list from a trade show. Here's what a pre-import clean typically finds:

  • Formatting cleanup: 30–50 rows trimmed, standardized casing, punctuation removed. These would have created near-exact duplicates on import.
  • Internal fuzzy deduplication: 15–25 record pairs flagged as likely the same person — different email addresses, slight name variations, company name differences. You review and consolidate.
  • Match against existing HubSpot export: 40–80 contacts already in your CRM, often with more complete records (activity history, deal associations, email engagement). Rather than importing these and creating duplicates, you update the existing records instead.

Net result: a 500-row file becomes a ~380-row import of genuinely new contacts, with 80 existing records updated and 40 rows discarded as duplicates. That's a meaningfully cleaner CRM — without spending hours in HubSpot's Duplicate Manager after the fact.

Which Tool for Which Step

Step | What you need | Tool options
Formatting / normalization | Text cleanup in a spreadsheet | Excel / Google Sheets formulas, or any data cleaning tool
Internal list deduplication | Fuzzy matching within one file | Dedicated dedupe tool, Google Sheets add-on
Match against HubSpot export | Fuzzy matching across two files (reconciliation) | Dedicated dedupe/reconcile tool
Post-import cleanup | Pair-based manual review | HubSpot native Duplicate Manager (Professional+), Dedupely, Insycle

For the first three steps — everything before the import — a standalone tool where you upload a file and get a clean result back is the most practical option. You're not working inside HubSpot yet, so there's no reason to use a HubSpot-connected tool. A simple upload-and-download workflow is faster and keeps your live database out of the loop until you're confident in the result.

What to Look for in a Pre-Import Deduplication Tool

Since you're cleaning a file rather than connecting to a live CRM, the features that matter are different from CRM-native tools:

  • Fuzzy name matching, not just exact. The tool needs to score similarity between strings, not just check for identical values. "Jennifer Walsh" and "Jen Walsh" should be flagged as a likely match.
  • Company name handling. Business entity suffix stripping (Inc., LLC, Corp.) and word-order independence ("Global Partners LLC" ≈ "LLC Global Partners") dramatically improve match quality for company fields.
  • Configurable threshold. You need to be able to tune sensitivity — higher for tighter matching, lower to catch more variants. No single threshold works for all datasets.
  • Two-file comparison. For matching your import against your existing HubSpot export, the tool needs to support comparing list A against list B — not just deduplicating within a single list.
  • Downloadable clean output. The result should be a clean CSV you can import directly into HubSpot, not a report you then have to act on manually.
  • Data privacy. Check that data isn't stored after processing. You're likely uploading contact information — it should be processed ephemerally and not retained.
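
Two of these features, word-order independence and a configurable threshold, are easy to see in a short Python sketch. It sorts the words in each string before scoring them with the standard library's `difflib.SequenceMatcher`; the function names and threshold values are assumptions for illustration, not how any specific tool is implemented.

```python
from difflib import SequenceMatcher


def token_sort_key(text):
    """Lowercase the text and sort its words so word order stops mattering."""
    return " ".join(sorted(text.lower().split()))


def token_sort_similarity(a, b):
    """Similarity ratio (0..1) computed on the word-sorted forms."""
    return SequenceMatcher(None, token_sort_key(a), token_sort_key(b)).ratio()


def is_match(a, b, threshold=0.85):
    """Apply a tunable threshold: higher is stricter, lower catches more variants."""
    return token_sort_similarity(a, b) >= threshold


print(token_sort_similarity("Global Partners LLC", "LLC Global Partners"))  # 1.0
print(is_match("Jennifer Walsh", "Jen Walsh", threshold=0.85))  # False
print(is_match("Jennifer Walsh", "Jen Walsh", threshold=0.75))  # True
```

The same pair of names is a match at one threshold and not at another, which is why the setting needs to be tuned per dataset rather than fixed.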

Key Takeaways

  • HubSpot deduplicates contacts on email address only — name variations, missing emails, and formatting differences all create duplicates
  • The Duplicate Manager has real limits — requires Professional tier, runs on a two-week delay, caps at ~2,000 pairs, and requires manual review
  • Clean before you import — it's far easier to resolve duplicates in a spreadsheet than in a live CRM
  • Three pre-import steps: normalize formatting, deduplicate within your file, match against your existing HubSpot export
  • For anything beyond a few hundred rows, you need fuzzy matching — exact-match tools miss most real-world duplicates

Clean Your Import File Before It Hits HubSpot

Upload your contact list, set your similarity threshold, and download a deduplicated file ready to import — no formulas, no code, no manual review of 2,000 pairs in HubSpot after the fact.

Next in this series:

Part 3: How to Match Two Lists with Fuzzy Logic: Merging a Trade Show Export with Your CRM — the reconciliation step that catches contacts who already exist in HubSpot before you create duplicates on import.

Missed Part 1?

How to Find and Merge Duplicate Company Names in a Spreadsheet or CSV covers the same problem for company records specifically.