How to Clean an Apollo, ZoomInfo, or Lusha Export Before Importing to Your CRM
Buying or exporting leads from Apollo, ZoomInfo, Lusha, or any data vendor is one of the fastest ways to build a prospect list. It's also one of the fastest ways to pollute your CRM with duplicates.
The problem isn't the tools — it's that they aggregate data from multiple sources, each with its own naming conventions. The same company appears as "Acme Corp", "Acme Corporation", "Acme Inc.", and "ACME" depending on which source each record came from. Import that directly and your CRM has four account records for one company before you've made a single call.
Why Data Vendor Exports Are Consistently Messy
Company names are aggregated, not standardized. Apollo, ZoomInfo, and Lusha pull company data from LinkedIn, company websites, SEC filings, and other sources. Each source formats company names differently, and the tools don't fully reconcile those formats before exporting.
You've probably run overlapping searches. If you've exported "VP Sales at SaaS companies, 50–200 employees, US" twice over six months, you have many of the same contacts in both exports — often with slightly different details if the vendor updated their data.
Contacts may already be in your CRM. Anyone who filled out a form on your site, attended one of your webinars, or was manually added by a rep likely exists in your CRM, and probably in Apollo or ZoomInfo as well. The email the vendor holds (often the one tied to a LinkedIn profile) may differ from the work email in your CRM, so exact-match deduplication on email misses the overlap entirely.
The same contact, multiple titles. Data vendors update job titles as people change roles. If you exported the same person six months apart, they may appear twice with different titles but the same name and company.
Step 1: Combine Multiple Exports First
If you've exported from the same tool multiple times, or from multiple tools, combine all exports into a single file before cleaning. Deduplicating across all your vendor data at once is more efficient than cleaning each file separately and then merging later.
Watch for column name differences between exports from different tools — Apollo, ZoomInfo, and Lusha all use slightly different header names for the same fields. Standardize headers before combining.
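As a rough illustration of the header standardization described above, the sketch below maps differing column names onto one canonical set and tags each row with its source file. The header variants in `HEADER_ALIASES` are hypothetical placeholders, not the actual Apollo, ZoomInfo, or Lusha headers; check your own exports and adjust the mapping.

```python
import csv
import io

# Hypothetical header variants -> canonical names. Replace with the headers
# that actually appear in your exports.
HEADER_ALIASES = {
    "company": "company",
    "company name": "company",
    "organization": "company",
    "first name": "first_name",
    "firstname": "first_name",
    "last name": "last_name",
    "lastname": "last_name",
    "email": "email",
    "email address": "email",
    "work email": "email",
}

def standardize_row(row, source):
    """Rename known headers to canonical names and record the source file."""
    out = {"source": source}
    for header, value in row.items():
        if header is None:  # extra cells beyond the header row
            continue
        key = HEADER_ALIASES.get(header.strip().lower())
        if key and not out.get(key):  # unknown headers are dropped in this sketch
            out[key] = (value or "").strip()
    return out

def combine_exports(exports):
    """exports: iterable of (source_name, file-like object) pairs."""
    combined = []
    for source, handle in exports:
        for row in csv.DictReader(handle):
            combined.append(standardize_row(row, source))
    return combined

# In-memory stand-ins for two vendor files with different header names.
export_a = io.StringIO(
    "First Name,Last Name,Company Name,Email\n"
    "Jen,Walsh,Acme Corp,jen@acme.com\n"
)
export_b = io.StringIO(
    "firstname,lastname,Organization,Work Email\n"
    "Jennifer,Walsh,Acme Corporation,jwalsh@acme.com\n"
)
rows = combine_exports([("export_a", export_a), ("export_b", export_b)])
```

With real files, pass handles opened with `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` stand-ins. Keeping the `source` value on every row makes it easier to trace a duplicate back to the export it came from.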
Step 2: Normalize Company Names
Company name is the field with the most variation in vendor exports. Normalizing before deduplicating converts many fuzzy matches into exact matches.
- Strip business suffixes: Inc., Corp., Corporation, LLC, Ltd., GmbH, PLC, Limited, Incorporated. "Acme Corp" and "Acme Corporation" both become "Acme"
- Lowercase everything for comparison
- Remove punctuation and treat "&" as "and": "Johnson & Johnson" and "Johnson and Johnson" should match
- Flag obvious abbreviations: "IBM" vs "International Business Machines" won't match even after normalization — these need manual or fuzzy review
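The normalization rules above could be sketched as a single function. This is a minimal version under a few assumptions: the suffix set is illustrative rather than exhaustive, suffixes are stripped only from the end of the name, and "&" is rewritten as "and" before punctuation is removed so the two spellings converge.

```python
import re

# Business suffixes to strip; extend this set for your own data.
SUFFIXES = {
    "inc", "corp", "corporation", "llc", "ltd", "gmbh", "plc",
    "limited", "incorporated",
}

def normalize_company(name):
    """Lowercase, unify '&' with 'and', drop punctuation and trailing suffixes."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes whitespace
    tokens = text.split()
    # Strip suffixes from the end only, and never remove the last remaining token.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

variants = ["Acme Corp", "Acme Corporation", "Acme Inc.", "ACME"]
normalized = {normalize_company(v) for v in variants}
```

All four Acme variants collapse to the same key, while "IBM" and "International Business Machines" still produce different keys and need separate review, as noted above. Store the normalized value in a new column used only for comparison, so the original company name is still available for the import.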
Step 3: Fuzzy-Deduplicate Within the Combined Export
After normalizing, deduplicate. The key is using fuzzy matching — not exact match — because real-world vendor data still has plenty of variation after normalization.
Match on company name and contact name together. Matching on one column alone misses cases where neither field is identical but the combination is clearly the same person. "Jen Walsh at Acme" and "Jennifer Walsh at Acme Corp" match strongly on the combination even though neither field is an exact match.
Clean by Similarity API handles this in one upload — select both columns, set your sensitivity, review the duplicate clusters, and download a clean file. No account needed to get started.
What to catch at this step:
- Same contact, name variant across two exports
- Same company, name written differently across records
- Same contact, different job title (check if these should be merged or kept as separate history)
- Contacts exported from two different tools who are actually the same person
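To make the combined-field matching in this step concrete, here is a small sketch that scores record pairs on contact name and company together, using `difflib.SequenceMatcher` from the Python standard library. It is not how any particular deduplication tool works internally. The equal-weight average, the `0.7` threshold, and the `name` and `company` field names are all assumptions for illustration, and the company values are assumed to be already normalized as in Step 2.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def combined_score(rec_a, rec_b):
    """Equal-weight average of name similarity and company similarity."""
    name = similarity(rec_a["name"], rec_b["name"])
    company = similarity(rec_a["company"], rec_b["company"])
    return (name + company) / 2

def likely_duplicates(records, threshold=0.7):
    """Return (index_a, index_b, score) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = combined_score(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs

records = [
    {"name": "Jen Walsh", "company": "acme"},
    {"name": "Jennifer Walsh", "company": "acme"},
    {"name": "Jen Walsh", "company": "initech"},  # same name, different company
]
pairs = likely_duplicates(records)
```

Here only the first two records are paired: the name variant is tolerated because the company agrees, while the identical name at a different company falls below the threshold. Comparing every pair grows quadratically with row count, so this approach suits small files; larger exports usually need some grouping step first (for example, comparing only rows that share a normalized company).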
Step 4: Compare Against Your Existing CRM
This is the step that prevents creating duplicates of records you already own. Your CRM has contacts and companies built up over years — many of them will appear in a fresh vendor export under a slightly different email or company name format.
Export your existing CRM contacts and accounts, then compare against your cleaned vendor export. Records that score as likely matches already exist in your CRM. For those, you have two options:
- Skip the import row — if the existing record is complete, no update needed
- Update the existing record — if the vendor data has fresher information (new job title, direct phone number), map the import row to the existing record's ID
This comparison requires matching two separate files — your CRM export and your vendor export. This is a reconciliation problem rather than a deduplication problem. See our guide on matching two lists for how to approach this step.
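The skip-or-update decision above can be sketched as a small reconciliation pass. This is an illustrative outline, not a prescribed method: the field names (`id`, `name`, `company`, `email`, `title`, `phone`), the sample record IDs, and the `0.8` threshold are hypothetical, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_crm_match(vendor_row, crm_rows, threshold=0.8):
    """Return the closest CRM record, or None if nothing scores high enough."""
    best, best_score = None, 0.0
    vendor_email = vendor_row.get("email", "").lower()
    for crm_row in crm_rows:
        # An exact email match is treated as conclusive.
        if vendor_email and vendor_email == crm_row.get("email", "").lower():
            return crm_row
        score = (similarity(vendor_row["name"], crm_row["name"])
                 + similarity(vendor_row["company"], crm_row["company"])) / 2
        if score > best_score:
            best, best_score = crm_row, score
    return best if best_score >= threshold else None

def plan_import(vendor_rows, crm_rows, update_fields=("title", "phone")):
    """Label each vendor row as 'create', 'update', or 'skip'."""
    plan = []
    for row in vendor_rows:
        match = best_crm_match(row, crm_rows)
        if match is None:
            plan.append(("create", None, row))
        elif any(row.get(f) and row.get(f) != match.get(f) for f in update_fields):
            plan.append(("update", match["id"], row))  # map to the existing record's ID
        else:
            plan.append(("skip", match["id"], row))
    return plan

crm = [
    {"id": "003A", "name": "Jennifer Walsh", "company": "acme",
     "email": "jwalsh@acme.com", "title": "VP Sales"},
    {"id": "003B", "name": "Omar Haddad", "company": "initech",
     "email": "omar@initech.com", "title": "Head of Ops"},
]
vendor = [
    {"name": "Jen Walsh", "company": "acme",
     "email": "jen.walsh@gmail.com", "title": "CRO"},
    {"name": "Omar Haddad", "company": "initech",
     "email": "omar@initech.com", "title": "Head of Ops"},
    {"name": "Raj Patel", "company": "globex",
     "email": "raj@globex.com", "title": "CTO"},
]
plan = plan_import(vendor, crm)
```

In this sample, the first vendor row maps to an existing record despite a different email and is marked for update because the title changed, the second is skipped, and the third is new. Rows flagged as "update" are worth a human look before import, since a fuzzy match is a likelihood rather than a certainty.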
Tool-Specific Notes
Apollo: Exports include a LinkedIn URL column, which is your most reliable deduplication key within Apollo data. Two rows with the same LinkedIn URL are almost certainly the same person, so collapse those exact matches before running fuzzy matching.
ZoomInfo: Company names tend to be more standardized than Apollo since ZoomInfo maintains its own company database. But contact names and emails still vary, especially for contacts exported at different times.
Lusha: Exports are typically smaller and more targeted. The main issue is contacts who appear in Lusha and in your existing CRM under different emails — personal vs work, or old employer email that's still in your CRM.
All tools: Watch for contacts with generic or role-based emails (info@, sales@, contact@) — these are rarely useful for deduplication and often cause false matches.
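Two of the notes above lend themselves to a quick pre-pass before fuzzy matching: collapsing rows that share a LinkedIn URL, and flagging role-based emails so they are not used as match keys. The sketch below assumes a column named `linkedin_url` (a hypothetical name; use whatever your export calls it) and assumes that profile URLs can be compared after ignoring scheme, subdomain, query string, trailing slash, and letter case.

```python
from urllib.parse import urlsplit

# Role-based local parts from the note above; extend as needed.
ROLE_LOCAL_PARTS = {"info", "sales", "contact"}

def linkedin_key(url):
    """Reduce a LinkedIn URL to its lowercased path, or None if unusable."""
    if not url or not url.strip():
        return None
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # let urlsplit see a host for scheme-less values
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/").lower()
    if not (host == "linkedin.com" or host.endswith(".linkedin.com")) or not path:
        return None
    return path

def is_role_email(email):
    """True for addresses such as info@ or sales@ that identify no individual."""
    local = email.split("@", 1)[0].strip().lower()
    return local in ROLE_LOCAL_PARTS

def dedupe_by_linkedin(rows, url_field="linkedin_url"):
    """Keep the first row per LinkedIn key; rows without a usable key pass through."""
    seen, kept = set(), []
    for row in rows:
        key = linkedin_key(row.get(url_field, ""))
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        kept.append(row)
    return kept

rows = [
    {"name": "Jen Walsh", "linkedin_url": "https://www.linkedin.com/in/jenwalsh/"},
    {"name": "Jennifer Walsh", "linkedin_url": "http://linkedin.com/in/JenWalsh?trk=abc"},
    {"name": "Raj Patel", "linkedin_url": ""},
]
kept = dedupe_by_linkedin(rows)
```

Keeping the first row per key is the simplest policy; if later exports carry fresher titles or phone numbers, sort by export date first or merge fields instead of discarding the later row.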
Key Takeaways
- Data vendor exports aggregate from multiple sources, each with different naming conventions — company name variation is structural, not a data quality failure
- Combining all vendor exports into one file before cleaning is more efficient than cleaning each separately
- Fuzzy matching on company name and contact name together catches significantly more duplicates than either field alone
- Comparing against existing CRM data before importing prevents duplicating records you already own — the most common and most avoidable source of CRM duplicates from vendor imports
- Apollo's LinkedIn URL column is a reliable exact-match deduplication key within Apollo exports specifically
Free for files up to 1,000 rows. No signup required.