Data Cleaning Checklist Before Importing Contacts to HubSpot or Salesforce
Most CRM import problems are preventable. HubSpot and Salesforce both deduplicate imports on exact matches of a few key fields: email address for contacts, and domain name or account name for companies. Any row that doesn't match an existing record exactly on those fields creates a new record. A 20-minute clean before import saves hours of manual merging afterward.
Here's the checklist, in the order it should be done.
1. Remove Blank and Malformed Rows
Before anything else, remove rows that will fail on import or create empty records.
- Delete rows with no name and no email
- Delete rows that are clearly test data ("test", "asdf", "123")
- Remove header rows that appear mid-file (common in merged exports)
- Check for rows where every field has shifted one column; these cause import mapping errors
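If you'd rather script this pass than do it by hand, here is a minimal sketch using only Python's standard `csv` module. It assumes lowercase column names `name`, `email`, and `company` (rename them to match your export), treats the placeholder values listed above as test data, and uses a simple heuristic for shifted rows: an email-shaped value sitting outside the email column. Shifted rows are set aside for review rather than deleted.

```python
import csv
import io

# Assumed column names ("name", "email", "company"); rename to match your export.
PLACEHOLDERS = {"test", "asdf", "123"}

def field(row, name):
    # DictReader fills missing trailing cells with None, so coerce to "".
    return (row.get(name) or "").strip()

def is_blank(row):
    return not field(row, "name") and not field(row, "email")

def is_test_data(row):
    return any(field(row, f).lower() in PLACEHOLDERS for f in ("name", "email", "company"))

def is_repeated_header(row, header):
    # A header row pasted mid-file repeats the column names as values.
    return all(field(row, h).lower() == h.strip().lower() for h in header)

def looks_shifted(row):
    # An email-shaped value outside the email column suggests a one-column shift.
    others = [v for k, v in row.items() if k not in (None, "email") and isinstance(v, str)]
    return "@" not in field(row, "email") and any("@" in v for v in others)

def clean_rows(text):
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    kept, flagged = [], []
    for row in reader:
        if is_blank(row) or is_test_data(row) or is_repeated_header(row, header):
            continue  # drop rows that would fail or create empty records
        (flagged if looks_shifted(row) else kept).append(row)
    return kept, flagged
```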
2. Standardize Formatting
Inconsistent formatting causes duplicates even when the underlying data is identical.
- Trim leading and trailing spaces from all fields (they're invisible but cause matching failures)
- Normalize capitalization — title case for names, lowercase for emails
- Standardize phone number format if you're importing phone numbers
- Remove special characters that your CRM doesn't accept from name fields
- Check that email addresses are in a valid format: exactly one @ and no spaces
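The formatting rules above can be applied per row with a short function like the one below. It is a sketch, not a complete validator: the email pattern is deliberately loose (one "@", no spaces, a dot in the domain), the set of "special characters" stripped from names is an assumption you should adjust to what your CRM rejects, and phone numbers are only reduced to digits with an optional leading "+". For real phone normalization, a library such as `phonenumbers` is a better fit. Note that `str.title()` turns "McDonald" into "Mcdonald", so spot-check names afterward.

```python
import re

# Loose format check: one "@", no whitespace, at least one dot after the "@".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_row(row):
    # Trim leading/trailing whitespace from every field.
    out = {k: (v or "").strip() for k, v in row.items() if k is not None}
    # Drop symbols other than letters, digits, spaces, apostrophes, periods, hyphens
    # (an assumed allow-list), collapse repeated spaces, then title-case.
    name = re.sub(r"[^\w\s'.-]", "", out.get("name", ""))
    out["name"] = " ".join(name.split()).title()
    out["email"] = out.get("email", "").lower()
    # Keep digits only, preserving a leading "+" for international numbers.
    phone = out.get("phone", "")
    out["phone"] = ("+" if phone.startswith("+") else "") + re.sub(r"\D", "", phone)
    out["email_valid"] = bool(EMAIL_RE.match(out["email"]))
    return out
```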
3. Normalize Company Names
Company name is the field most likely to create duplicates on import.
- Decide on one format for business entity suffixes and apply it consistently, or strip them entirely before comparing. "Inc.", "Incorporated", and "Inc" are all the same thing.
- Normalize abbreviations — "&" vs "and", "Intl" vs "International"
- Flag any company names that are clearly the same entity written differently ("IBM" vs "International Business Machines")
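One way to apply the suffix and abbreviation rules is to compute a comparison key for each company name and compare keys instead of raw names. The sketch below assumes short, hypothetical suffix and abbreviation lists that you would extend for your own data. It only strips suffixes from the end of the name, so a word like "Company" in the middle survives. It will not connect acronyms to full names ("IBM" vs "International Business Machines"); those still need manual review or an alias table.

```python
import re

# Hypothetical, deliberately short lists; extend them for your data.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company",
            "llc", "ltd", "limited", "plc", "gmbh"}
ABBREVIATIONS = {"&": "and", "intl": "international", "mfg": "manufacturing"}

def company_key(name):
    # Lowercase and pad "&" with spaces so it becomes its own token.
    text = (name or "").lower().replace("&", " & ")
    # Turn other punctuation into spaces so "Inc." and "Inc" match.
    text = re.sub(r"[^\w\s&]", " ", text)
    tokens = [ABBREVIATIONS.get(t, t) for t in text.split()]
    # Strip trailing entity suffixes, but never the whole name.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

Keep the original company name in the file for import; the key is only for spotting rows that refer to the same company.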
4. Deduplicate Within Your Import File
Before touching your CRM, find duplicate records within the file itself. This is the step most people skip — and the one that creates the most post-import cleanup work.
For exact duplicates (same email, same name): the Remove Duplicates feature in Excel or Google Sheets handles this fine.
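If you're scripting the cleanup anyway, the exact-duplicate pass takes only a few lines. This sketch keeps the first row for each (email, name) pair and compares trimmed, lowercased values, so leftover spacing or casing differences don't hide a duplicate. The column names are assumptions.

```python
def drop_exact_duplicates(rows, keys=("email", "name")):
    # Keep the first row seen for each key; later repeats are dropped.
    seen = set()
    unique = []
    for row in rows:
        key = tuple((row.get(k) or "").strip().lower() for k in keys)
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique
```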
For near-duplicates (name variants, missing emails, abbreviation differences): you need fuzzy matching. Standard spreadsheet tools will miss "Jennifer Walsh" and "Jen Walsh", or "Acme Corp" and "Acme Corporation".
Clean by Similarity API does this without any setup — upload your file, match on company name and contact name together, download a clean version. It catches the variants that exact-match tools miss and lets you review duplicate clusters before committing.
Key things to catch at this step:
- Same contact, two different email addresses
- Same company, name written differently across rows
- Same person at the same company, submitted twice from different forms
- Trade show leads that duplicated contacts already in your import file
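To see how fuzzy matching on name and company works, here is a small sketch using Python's built-in `difflib.SequenceMatcher`. It is not the algorithm any particular dedupe tool uses. It averages the name and company similarity for every pair of rows, keeps pairs above a threshold, and groups connected pairs into clusters with a union-find so you can review each cluster. The 0.7 threshold and the `name`/`company` column names are assumptions to tune on your own data. Because it doesn't use email, it can surface the same contact listed under two addresses. Comparing every pair is O(n²), which is fine for a few thousand rows but slow for very large files.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the lowercased strings are identical.
    return SequenceMatcher(None, (a or "").lower(), (b or "").lower()).ratio()

def fuzzy_pairs(rows, threshold=0.7):
    # Score every pair on name and company together.
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        score = (similarity(a.get("name"), b.get("name"))
                 + similarity(a.get("company"), b.get("company"))) / 2
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs

def clusters(n, pairs):
    # Union-find: rows linked by any candidate pair end up in one cluster.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j, _ in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

With rows for "Jennifer Walsh / Acme Corp", "Jen Walsh / Acme Corporation", and an unrelated contact, the first two land in one cluster for review, while the third stays on its own.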
5. Match Against Your Existing CRM Data
Your import file might be clean internally but still contain contacts that already exist in your CRM under a different email or slightly different name.
- Export your existing CRM contacts/accounts
- Compare your import file against that export — flag records that likely already exist
- For matches: decide whether to update the existing record or skip the import row
- For HubSpot: ensure every contact has an email address — without one, HubSpot creates a new record even if the person already exists
- For Salesforce: ensure every account has a domain name or Record ID, because not every Salesforce import method deduplicates on account name alone
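The comparison against a CRM export can be sketched as a two-level lookup. Rows whose email already exists go to an "existing" bucket, where you decide whether to update or skip. Rows with a new email but the same name and company as an existing record go to a "review" bucket, since they may be the same person under a different address. Everything else is new. This sketch assumes both files have `name`, `email`, and `company` columns, and it matches names and companies exactly after lowercasing and collapsing whitespace. For near-duplicate names, you would replace the (name, company) lookup with a fuzzy comparison.

```python
def norm(value):
    # Lowercase and collapse whitespace so formatting differences don't block a match.
    return " ".join((value or "").lower().split())

def match_against_crm(import_rows, crm_rows):
    by_email, by_name_company = {}, {}
    for rec in crm_rows:
        if norm(rec.get("email")):
            by_email[norm(rec.get("email"))] = rec
        key = (norm(rec.get("name")), norm(rec.get("company")))
        if all(key):  # skip records missing a name or company
            by_name_company[key] = rec
    new, existing, review = [], [], []
    for row in import_rows:
        email = norm(row.get("email"))
        key = (norm(row.get("name")), norm(row.get("company")))
        if email and email in by_email:
            existing.append((row, by_email[email]))  # decide: update or skip
        elif all(key) and key in by_name_company:
            review.append((row, by_name_company[key]))  # same person, other email?
        else:
            new.append(row)
    return new, existing, review
```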
6. Validate Key Fields for Your CRM's Deduplication Logic
Each CRM has specific fields it uses to detect duplicates on import. If these fields are missing or wrong, you'll get duplicates regardless of how clean the rest of your data is.
HubSpot:
- Every contact row has an email address (HubSpot deduplicates contacts on email only)
- Every company row has a domain name (HubSpot deduplicates companies on domain only)
- Domains are normalized: no "https://" or "www.", no trailing slashes, all lowercase
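The HubSpot checks above can be automated with a sketch like this one. `normalize_domain` uses the standard library's `urllib.parse.urlsplit` to pull out the host, which drops any protocol, path, trailing slash, or port and lowercases the result. It then removes a leading "www.". `hubspot_issues` reports contact rows without an email and company rows without a usable domain. The `email` and `domain` column names are assumptions.

```python
from urllib.parse import urlsplit

def normalize_domain(value):
    # Handles "https://www.Example.com/about", "www.example.com/", "example.com".
    value = (value or "").strip().lower()
    if not value:
        return ""
    if "://" not in value:
        value = "//" + value  # so urlsplit treats the text as a host, not a path
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host

def hubspot_issues(contacts, companies):
    issues = []
    for i, row in enumerate(contacts):
        if not (row.get("email") or "").strip():
            issues.append(f"contact row {i}: missing email")
    for i, row in enumerate(companies):
        if not normalize_domain(row.get("domain")):
            issues.append(f"company row {i}: missing domain")
    return issues
```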
Salesforce:
- Accounts have a website/domain field populated where possible
- If updating existing records, include Salesforce Record ID as the unique identifier
- Run a small test import (10–20 rows) first to verify deduplication behavior before the full file
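Before the Salesforce test import, you can separate rows that carry a Record ID from rows that don't, and build a small trial file that includes both kinds. The sketch below checks IDs only for shape (Salesforce record IDs are 15-character case-sensitive or 18-character case-insensitive alphanumeric strings), not whether the record exists. It assumes the ID column is named `Id`. Rows with a malformed ID go to a separate list to fix before import.

```python
import csv
import io
import re

# Shape check only: 15 or 18 alphanumeric characters.
SF_ID_RE = re.compile(r"^[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?$")

def split_for_salesforce(rows, id_field="Id"):
    updates, inserts, invalid = [], [], []
    for row in rows:
        record_id = (row.get(id_field) or "").strip()
        if not record_id:
            inserts.append(row)  # no ID: can't be matched by Record ID
        elif SF_ID_RE.match(record_id):
            updates.append(row)
        else:
            invalid.append(row)  # malformed ID: fix before importing
    return updates, inserts, invalid

def build_trial_csv(updates, inserts, n=20):
    # Up to n // 2 update rows plus up to n - n // 2 insert rows.
    sample = updates[: n // 2] + inserts[: n - n // 2]
    if not sample:
        return ""
    # Union of all column names, in first-seen order.
    fieldnames = list(dict.fromkeys(k for row in sample for k in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sample)
    return buf.getvalue()
```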
7. Final File Check Before Upload
- Column headers match exactly what your CRM expects (or you've mapped them)
- No merged cells (Excel exports sometimes carry these)
- File is saved as CSV (Salesforce's Data Import Wizard and Data Loader require it; HubSpot also accepts XLSX and XLS), not ODS or Numbers format
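- File size is within your CRM's import limit (Salesforce's Data Import Wizard, for example, takes up to 50,000 records per import; use Data Loader for larger files)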
- You have a backup of the original file before import
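Parts of this final check can be scripted. The sketch below assumes a CSV file and a hypothetical set of expected column headers, which you would replace with the columns your import maps. It flags a non-CSV extension, missing headers, and a row count above a limit. The 50,000 default mirrors the Salesforce Data Import Wizard limit; substitute your CRM's limit. Merged cells and backups still need a manual check.

```python
import csv
import io
from pathlib import Path

# Hypothetical expected headers; replace with the columns your CRM import maps.
EXPECTED_HEADERS = {"First Name", "Last Name", "Email", "Company"}
MAX_ROWS = 50_000  # Data Import Wizard limit; use your CRM's own limit

def final_check(filename, content, expected=EXPECTED_HEADERS, max_rows=MAX_ROWS):
    problems = []
    if Path(filename).suffix.lower() != ".csv":
        problems.append(f"{filename}: save as .csv before importing")
    reader = csv.reader(io.StringIO(content))
    header = next(reader, [])
    missing = expected - {h.strip() for h in header}
    if missing:
        problems.append("missing or unmapped columns: " + ", ".join(sorted(missing)))
    rows = sum(1 for _ in reader)  # data rows after the header
    if rows > max_rows:
        problems.append(f"{rows} data rows exceed the limit of {max_rows}")
    return problems
```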
Key Takeaways
- Do it in order — formatting before deduplication, deduplication before CRM matching. Each step depends on the previous one being done.
- Fuzzy matching is not optional for real-world contact data: exact-match tools miss many of the actual duplicates in lists built from multiple sources.
- HubSpot and Salesforce both have hard deduplication rules — email for HubSpot contacts, domain for HubSpot companies, Record ID or domain for Salesforce. If those fields are missing, you'll get duplicates regardless of everything else.
- A 20-minute clean before import beats hours of manual merging after — the Duplicate Manager in HubSpot and Salesforce shows pairs one at a time, requires manual review, and caps at 2,000–10,000 pairs depending on your plan.