How to Find & Merge Duplicate Company Names in a Spreadsheet or CSV
You exported your contact list. You ran Remove Duplicates. You imported it into your CRM.
Three months later, your sales team is calling the same company twice, your campaign reports are split across four rows, and your account executive just sent a cold outreach email to a client you closed last quarter.
Here's what happened: Excel found the exact duplicates. It missed the rest.
"Microsoft Corp", "Microsoft Corporation", "MSFT", and "microsoft corp." are four records for the same company. Remove Duplicates sees four distinct strings and keeps all of them.
From a spreadsheet's perspective, they aren't duplicates at all.
This is the company name duplicate problem — and it's one of the most common sources of dirty CRM data.
Why "Remove Duplicates" Doesn't Work for Company Names
Remove Duplicates, whether in Excel or Google Sheets, compares cell values literally. Apart from letter case, which Excel ignores, two cells must contain exactly the same characters for the tool to treat them as duplicates.
Real company names are almost never entered consistently:
| What got entered | What it actually is |
|---|---|
| Apple Inc. | Apple Inc |
| apple inc | Apple Inc |
| Apple Incorporated | Apple Inc |
| APPLE INC. | Apple Inc |
| Salesforce.com | Salesforce |
| salesforce | Salesforce |
| Salesforce Inc | Salesforce |
| 3M Company | 3M |
| Three M | 3M |
Almost every row in that table would survive Remove Duplicates (Excel would collapse only the two that differ purely in letter case). Every row is a duplicate.
This happens because data entry is human. Salespeople type quickly. Leads come from forms with no validation. Lists get imported from trade shows, scraped from websites, or exported from tools that format things differently. No two humans type a company name the same way — and no two systems store it the same way either.
The Two Types of Company Name Duplicates
Understanding what you're dealing with helps you choose the right approach.
Type 1: Formatting variations
Same company name, different punctuation, casing, or abbreviations. "Inc.", "Inc", "Incorporated". "Corp.", "Corp", "Corporation". Uppercase vs. lowercase. These are the easiest to catch — a bit of text normalization handles most of them.
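A minimal sketch of that kind of text normalization, assuming Python and a hypothetical helper named `normalize_name` (neither comes from any specific tool). It only lowercases, drops punctuation, and collapses whitespace, so it does not yet treat "Inc" and "Incorporated" as the same word:

```python
import re

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    lowered = name.casefold()
    no_punct = re.sub(r"[^\w\s]", "", lowered)      # remove ".", ",", "&", etc.
    return re.sub(r"\s+", " ", no_punct).strip()    # single spaces, no padding

variants = ["Apple Inc.", "apple inc", "APPLE INC."]
# All three variants collapse to the same normalized string.
print({normalize_name(v) for v in variants})
```

Comparing the normalized strings instead of the raw cells is enough for casing and punctuation differences; abbreviation handling needs an extra mapping step on top.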
Type 2: Genuine name variants
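Different words, same company. "Salesforce" vs. "Salesforce.com". "3M" vs. "3M Company" vs. "Three M". "Meta" vs. "Facebook" vs. "Meta Platforms Inc." These are harder. Close variants such as "Salesforce.com" need character-level similarity scoring, not just normalization. Renames and spelled-out forms such as "Facebook" or "Three M" share too few characters with the canonical name for any similarity score to catch, so they need an alias list or manual review.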
Most tools handle Type 1. Very few handle Type 2 reliably at any meaningful scale.
Key Takeaways (So Far)
- Excel's Remove Duplicates only catches exact character matches — it misses most real-world company name duplicates
- Company names vary because of casing, punctuation, abbreviations, and genuine name variants
- You need fuzzy matching (similarity scoring) to catch Type 2 duplicates, not just text normalization
What Is Fuzzy Matching and Why Does It Matter Here?
Fuzzy matching is a technique that measures how similar two strings are, rather than whether they're identical. It assigns a similarity score between 0 and 1, where 1.0 means identical and lower scores mean increasingly different strings. The exact score depends on the algorithm used, so treat the figures below as illustrative:
"Microsoft Corp" vs. "Microsoft Corporation" → similarity score: ~0.91
"Salesforce" vs. "Salesforce.com" → similarity score: ~0.83
"Apple Inc" vs. "Apple Incorporated" → similarity score: ~0.72
"IBM" vs. "International Business Machines" → similarity score: ~0.29
You set a threshold — typically between 0.75 and 0.85 for company names — and any pair scoring above it gets flagged as a likely duplicate.
This is the approach that actually catches the variants Remove Duplicates misses. The threshold matters: too high (e.g. 0.95) and you only catch obvious typos. Too low (e.g. 0.6) and you start merging companies that aren't the same. For company names, 0.80 is a reasonable starting point.
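As a rough sketch of threshold-based scoring, Python's standard-library `difflib.SequenceMatcher` can stand in for whatever algorithm a given tool uses. That choice is an assumption made for illustration, and its ratios will generally differ from the approximate scores quoted above. The helper names `similarity` and `is_likely_duplicate` are hypothetical:

```python
from difflib import SequenceMatcher

THRESHOLD = 0.80  # starting point suggested in the text; tune for your data

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring letter case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def is_likely_duplicate(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """Flag a pair when its score reaches the threshold."""
    return similarity(a, b) >= threshold

pairs = [
    ("Salesforce", "Salesforce.com"),
    ("Apple Inc", "Apple Incorporated"),
    ("IBM", "International Business Machines"),
]
for a, b in pairs:
    verdict = "flag" if is_likely_duplicate(a, b) else "keep separate"
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f} -> {verdict}")
```

Running the same pairs through a different metric (Jaro-Winkler, Levenshtein ratio, and so on) gives different numbers, which is one reason a threshold tuned for one tool rarely transfers unchanged to another.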
What Similarity Threshold Should I Use for Company Names?
This is one of the most common questions — and the honest answer is: it depends on your data.
- 0.80–0.85 works well for most general company name lists. It catches formatting variants and close abbreviations without generating too many false positives.
- Go higher (0.88–0.92) if your list contains many short names or abbreviations (e.g. "IBM", "3M", "HP") — short strings score high similarity against unrelated names more easily.
- Go lower (0.75–0.80) if you have many legitimate name variants like "Salesforce" vs. "Salesforce.com" and you'd rather review more candidates manually than miss genuine duplicates.
Whatever threshold you start with, always spot-check a sample of flagged pairs before merging. No tool is perfect — human review on a random 20-row sample before you commit to a merge is always worth the five minutes.
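One way to pull that review sample, sketched in Python. The function name `sample_for_review` and the `(name_a, name_b, score)` tuple layout are assumptions for illustration, not the output format of any particular tool:

```python
import random

def sample_for_review(flagged_pairs, k=20, seed=0):
    """Return up to k randomly chosen flagged pairs for manual checking.

    flagged_pairs is a list of (name_a, name_b, score) tuples.
    A fixed seed makes the sample reproducible between runs.
    """
    rng = random.Random(seed)
    return rng.sample(flagged_pairs, min(k, len(flagged_pairs)))

# Hypothetical flagged pairs, only to show the call.
flagged = [(f"Company {i}", f"Company {i} Inc", 0.9) for i in range(50)]
for name_a, name_b, score in sample_for_review(flagged):
    print(f"{score:.2f}  {name_a}  <->  {name_b}")
```

Sampling at random, rather than reading the top of the list, avoids only ever checking the highest-scoring and least doubtful pairs.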
A Worked Example: Before and After
Here's a real-world-style company list with 10 records — the kind you'd get after merging a trade show export with a CRM dump:
Before (raw list):
- Acme Corporation
- acme corp.
- ACME Corp
- Globex Industries
- Globex Ind.
- globex industries inc
- Initech
- Initech LLC
- Umbrella Corp
- Umbrella Corporation
After fuzzy deduplication (threshold: 0.80):
| Cluster | Records grouped | Canonical name chosen |
|---|---|---|
| 1 | Acme Corporation, acme corp., ACME Corp | Acme Corporation |
| 2 | Globex Industries, Globex Ind., globex industries inc | Globex Industries |
| 3 | Initech, Initech LLC | Initech |
| 4 | Umbrella Corp, Umbrella Corporation | Umbrella Corporation |
10 records → 4 canonical companies. A standard Remove Duplicates would have returned all 10.
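The grouping above can be approximated with a short Python sketch. Everything here is an assumption made for illustration: the tiny `ABBREVIATIONS` and `LEGAL_SUFFIXES` tables, the use of `difflib.SequenceMatcher` as the similarity measure, and union-find as the clustering method. It groups records only; choosing a canonical name per cluster is left out.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical lookup tables; real lists would be much longer.
ABBREVIATIONS = {"corp": "corporation", "ind": "industries"}
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corporation"}

def comparison_key(name: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, drop legal suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.casefold()).split()
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    kept = [t for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(kept or tokens)  # fall back so the key is never empty

def cluster(names, threshold=0.80):
    """Group names whose comparison keys score at or above the threshold."""
    keys = [comparison_key(n) for n in names]
    parent = list(range(len(names)))  # union-find forest, one node per record

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
            parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

records = [
    "Acme Corporation", "acme corp.", "ACME Corp",
    "Globex Industries", "Globex Ind.", "globex industries inc",
    "Initech", "Initech LLC",
    "Umbrella Corp", "Umbrella Corporation",
]
for group in cluster(records):
    print(group)
```

Under these assumptions the ten records fall into four groups. Without the abbreviation and suffix steps, this particular similarity measure scores "Initech" vs. "Initech LLC" below 0.80, which shows how much the preprocessing contributes to the result.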
How to Actually Do This: Your Options
Option 1: Excel (limited, manual)
Excel has no worksheet formula for fuzzy matching. The closest formula-based options are EXACT(), which is only a case-sensitive exact comparison, or complex nested SUBSTITUTE() and TRIM() formulas that normalize strings before you compare them. This handles Type 1 duplicates (casing, punctuation) but completely misses Type 2 (genuine name variants).
For small lists under 200 rows where all the variation is formatting-based, this can work. Beyond that, it breaks down quickly.
Verdict: Only useful for very small lists with simple formatting issues.
Option 2: Google Sheets add-on (e.g. Flookup, Fuzzy Lookup)
Several Sheets add-ons add fuzzy matching capabilities directly in the spreadsheet. You select a column, run the function, and get similarity scores or flagged duplicates back in adjacent cells.
The main limitation is scale: these tools run inside Google's Apps Script environment, which has strict execution time limits. For larger files — anything over roughly 30,000–50,000 rows — they frequently time out and require multiple partial runs.
Verdict: Good for moderate-sized lists within Google Sheets. Gets slow and cumbersome above ~30k rows.
Option 3: A dedicated deduplication tool
Standalone tools built specifically for this job — where you upload a file, tune your settings, and download a clean result — handle the whole workflow without requiring you to live inside a spreadsheet. They typically run faster than spreadsheet add-ons, handle larger files, and give you a clean downloadable output.
The key features to look for: configurable similarity threshold, preprocessing options (handling of "Inc.", "LLC", "Corp." suffixes; case normalization; punctuation removal), and clear output that shows you which records were grouped together before you commit to the merge.
Verdict: Best for larger files, one-off projects, or anyone who'd rather not work in Sheets.
What to Look for in a Deduplication Tool for Company Names
Not all fuzzy matching is equal. When evaluating a tool for company name deduplication specifically, these features matter:
- Business entity suffix handling. "Inc.", "LLC", "Corp.", "Ltd.", "GmbH", "S.A." — a good tool should let you strip these before comparing, so "Acme Inc." and "Acme LLC" score near-identical rather than slightly different. This single feature dramatically improves match quality for company lists.
- Token sort / word order independence. "Smith Johnson Partners" and "Johnson Smith Partners" are likely the same firm. A tool that scores similarity based on sorted tokens rather than raw character sequence handles this correctly. One that doesn't will miss it.
- Configurable threshold. You need to be able to tune the sensitivity. Any tool that runs one fixed threshold and returns results with no way to adjust is going to give you either too many false positives or too many misses, depending on your data.
- Cluster output, not just pairs. If records A, B, and C are all the same company, you want them grouped together as a cluster — not just told "A matches B" and "B matches C" as separate pairs. Cluster output is what you actually need to deduplicate a list.
- Preview before download. You always want to see what is going to be merged before it is merged. A good tool shows you the grouped clusters with their similarity scores so you can spot-check before committing.
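Two of the features in this list, suffix handling and token sort, can be sketched together in a few lines of Python. The suffix set, the function names, and the choice of `difflib.SequenceMatcher` are assumptions for illustration rather than a description of how any specific tool works:

```python
import re
from difflib import SequenceMatcher

# Illustrative suffix list; "S.A." becomes "sa" once punctuation is removed.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "ltd", "gmbh", "sa"}

def token_sort_key(name: str) -> str:
    """Strip punctuation and legal suffixes, then sort the remaining words."""
    tokens = re.sub(r"[^\w\s]", "", name.casefold()).split()
    kept = [t for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(sorted(kept or tokens))

def token_sort_similarity(a: str, b: str) -> float:
    """Similarity of the sorted-token keys, so word order no longer matters."""
    return SequenceMatcher(None, token_sort_key(a), token_sort_key(b)).ratio()

print(token_sort_similarity("Smith Johnson Partners", "Johnson Smith Partners"))
print(token_sort_similarity("Acme Inc.", "Acme LLC"))
```

Both pairs reduce to identical keys here, so they score 1.0. The trade-off is that stripping suffixes also merges names that differ only by legal form, which is a reason to keep the preview step before committing a merge.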
The Scale Question
For small lists — a few hundred to a few thousand rows — almost any approach works, including manual review. The problem gets harder fast as lists grow.
A naive approach that compares every record to every other record makes n(n-1)/2 comparisons for n rows. At 10,000 rows, that is about 50 million comparisons. At 100,000 rows, it is about 5 billion. This is why browser-based tools and spreadsheet add-ons start to struggle above a certain size: they're doing more computation than the environment was designed for.
Properly built deduplication tools use blocking and indexing strategies to avoid brute-force comparison: they pre-filter candidates so only plausible matches get scored, which is what makes large-scale deduplication fast. If you're working with a list above 50,000 rows, it's worth checking whether the tool you're using handles this — otherwise you'll be waiting a long time or hitting timeouts.
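The comparison counts and the effect of blocking can be illustrated with a small Python sketch. The blocking key used here (the first three characters, lowercased) is a deliberately crude assumption; it would miss pairs like "The Acme Co" and "Acme Co", and production tools use more robust keys and indexes:

```python
from collections import defaultdict
from itertools import combinations

def all_pairs_count(n: int) -> int:
    """Number of comparisons when every record is compared to every other."""
    return n * (n - 1) // 2

def blocked_pairs(names):
    """Yield only the pairs that share a blocking key (first 3 characters)."""
    blocks = defaultdict(list)
    for name in names:
        blocks[name.casefold()[:3]].append(name)
    for members in blocks.values():
        yield from combinations(members, 2)

print(all_pairs_count(10_000))    # 49995000
print(all_pairs_count(100_000))   # 4999950000

records = [
    "Acme Corporation", "acme corp.", "ACME Corp",
    "Globex Industries", "Globex Ind.", "globex industries inc",
    "Initech", "Initech LLC",
    "Umbrella Corp", "Umbrella Corporation",
]
candidates = list(blocked_pairs(records))
print(len(candidates), "candidate pairs instead of", all_pairs_count(len(records)))
```

Only the candidate pairs then go through the more expensive similarity scoring. The savings grow with list size, because most records share a block with only a handful of others.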
Key Takeaways
- Remove Duplicates only catches exact matches — it misses almost all real-world company name duplicates
- Fuzzy matching compares strings by similarity score rather than exact equality — it's the correct tool for this job
- A threshold of 0.80–0.85 works for most company name lists; adjust based on your data
- Business entity suffix stripping (Inc., LLC, Corp.) and token sort are the two preprocessing features that most improve match quality for company names
- Always spot-check a sample of flagged pairs before merging — no automated tool is 100% accurate
- Scale matters: spreadsheet add-ons work for moderate lists; dedicated tools handle larger files without timeouts
Clean Your Company List Now
If you have a spreadsheet with duplicate company names and want to clean it without formulas or add-ons, you can upload it directly and get a deduplicated file back in minutes.