Fuzzy Matching in Excel (2026): Why It's Still Broken — and the Fastest Fix

June 2026 · 12 min read · By Similarity API Team

TL;DR

In 2026, Excel still has no built-in way to find near-duplicate names like "Acme Corp" and "Acme Corporation". Remove Duplicates only catches identical rows, the old Fuzzy Lookup add-in is a 2011 Windows-only relic, and Power Query's fuzzy merge bogs down on tens of thousands of rows.

The fastest reliable fix in 2026 is to drop your file into an online tool that runs the matching in the cloud. Clean by Similarity API does exactly this — free for files under 500 rows, no signup, no install, and you can tweak the results before downloading.

First, what is fuzzy matching?

Fuzzy matching (also called fuzzy lookup, near-duplicate detection, or approximate string matching) is the technique of finding records that look like the same thing but aren't typed identically. It's what lets a tool see that all of these refer to one company:

  • Acme Corp
  • Acme Corporation
  • ACME CORP.
  • acme corp

A fuzzy matcher gives every pair of records a similarity score between 0 (totally different) and 1 (identical). You set a threshold — usually somewhere between 0.80 and 0.95 — and anything above it is treated as the same entity. Used well, fuzzy matching collapses a messy spreadsheet of customers, leads, or companies into one row per real entity.
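
To make the score-and-threshold idea concrete, here is a minimal sketch using Python's standard-library `difflib`. It is only one of many possible similarity measures and is not the algorithm any particular tool uses; the function names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def is_same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    # Light normalization so casing and stray whitespace don't count as differences.
    # The 0.85 default is an illustrative value inside the usual 0.80-0.95 range.
    return similarity(a.strip().lower(), b.strip().lower()) >= threshold


names = ["Acme Corp", "Acme Corporation", "ACME CORP.", "acme corp"]
for other in names[1:]:
    score = similarity(names[0].lower(), other.lower())
    print(f"{names[0]!r} vs {other!r}: {score:.2f}")
```

With this simple character-based measure, "Acme Corp" vs "Acme Corporation" scores 0.72, which falls below a 0.85 threshold. That is one reason cleaning steps such as suffix handling matter as much as the threshold itself.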

Here's what the problem actually looks like

Below is a tiny sample contact file. It has 12 rows, but if you look closely, they describe just 5 real people.

Input file: contacts.xlsx (12 rows, 5 real people)

The same 5 people, written 12 different ways. Remove Duplicates in Excel finds zero matches — every row is technically unique.

# | name | email | company
--- | --- | --- | ---
1 | Jennifer Walsh | jen.walsh@acme.com | Acme Corp
2 | Jen Walsh | j.walsh@acme.com | Acme Corporation
3 | JENNIFER WALSH | jennifer@acme.com | acme corp.
4 | Mike O'Brien | mike@globex.io | Globex Inc.
5 | Michael O'Brien | michael@globex.io | Globex, Inc
6 | M. O'Brien | mob@globex.io | Globex
7 | Sara Lee | sara@initech.com | Initech LLC
8 | Sara Lee | sara@initech.com | Initech, LLC
9 | David Kim | dkim@stark.com | Stark Industries
10 | Dave Kim | d.kim@stark.com | Stark Industries Ltd.
11 | Priya Patel | priya@wayne.co | Wayne Enterprises
12 | Priya P. | priya.p@wayne.co | Wayne Ent.

What you get back (clean file format)

One row per real person, ready to import: 12 messy rows collapsed to 5.

name | email | company | rows merged
--- | --- | --- | ---
Jennifer Walsh | jen.walsh@acme.com | Acme Corp | 3
Mike O'Brien | mike@globex.io | Globex Inc. | 3
Sara Lee | sara@initech.com | Initech LLC | 2
David Kim | dkim@stark.com | Stark Industries | 2
Priya Patel | priya@wayne.co | Wayne Enterprises | 2

Excel sees 12 unique rows. Every row differs from every other row by at least one character, whether in the name, the email, or the company. Remove Duplicates reports no duplicate values found. That's the gap we're solving.

Why can I not delete duplicates in Excel?

You can — but only the easy ones. Excel's built-in Remove Duplicates button does an exact, character-for-character match. If two rows differ by a period, a space, the word "Inc.", or a typo, Excel considers them different rows and keeps both. For real CRM exports, trade-show lists, or Apollo/ZoomInfo dumps, that means most duplicates survive.
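
The behaviour described above can be sketched in a few lines of Python. This is an illustration of exact, character-for-character deduplication in general, not Excel's internal implementation; `exact_dedupe` is a hypothetical helper and the rows are taken from the sample file.

```python
def exact_dedupe(rows):
    """Keep the first occurrence of each row; rows must match character for character."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows = [
    ("Jennifer Walsh", "jen.walsh@acme.com", "Acme Corp"),
    ("Jen Walsh", "j.walsh@acme.com", "Acme Corporation"),
    ("Sara Lee", "sara@initech.com", "Initech LLC"),
    ("Sara Lee", "sara@initech.com", "Initech, LLC"),  # differs only by a comma
]

print(len(rows), "rows in,", len(exact_dedupe(rows)), "rows out")
```

All four rows survive, including the two Sara Lee rows that differ by a single comma.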

The real question is why Excel doesn't have a proper fuzzy match feature in 2026. There are three reasons, and they compound on each other.

1. Fuzzy matching is computationally expensive

An exact dedupe just sorts the column and walks down it: trivial work, even on a million rows. Fuzzy matching is fundamentally different: to know whether row 47 is similar to row 892, you have to compare them. In the worst case, every row has to be compared to every other row. That's on the order of N × N comparisons (N × (N − 1) / 2 if you count each pair only once).

For 1,000 rows that's about a million comparisons, which is fine. For 20,000 rows it's 400 million. For 100,000 rows it's 10 billion. Excel runs on your laptop, within the limits of one machine's CPU and a memory ceiling. It was simply never designed to do this kind of work, and any honest implementation inside Excel would freeze the application for minutes or hours on the kind of files people actually want to clean.
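
The arithmetic is easy to check. The sketch below computes both the naive N × N figure and the number of distinct pairs; the function names are made up for illustration.

```python
def naive_comparisons(n: int) -> int:
    """Every row compared against every row, including itself and both orderings."""
    return n * n


def unique_pairs(n: int) -> int:
    """Each unordered pair of distinct rows counted once."""
    return n * (n - 1) // 2


for n in (1_000, 20_000, 100_000):
    print(f"{n:>7,} rows: {naive_comparisons(n):>14,} naive, {unique_pairs(n):>13,} unique pairs")
```

Counting each pair once halves the work, but the growth is still quadratic: ten times the rows means roughly a hundred times the comparisons.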

Modern online tools dodge this by running the matching on cloud servers with a proprietary algorithm that's tuned for this exact job — faster than most fuzzy-matching APIs and orders of magnitude faster than anything you can do inside a spreadsheet.

2. The add-ins Microsoft shipped never grew up

Microsoft did try. In 2011, they released a free Fuzzy Lookup add-in for Excel. It still exists. It still works on small files. And it has been functionally untouched for over a decade. It's Windows-only, doesn't run on Excel for Mac, doesn't run on Excel for the web, and isn't supported on M365 in any meaningful way. There's no roadmap, no updates, no official support channel. If you've tried to install it on a modern machine, you know.

The newer answer from Microsoft is Power Query's fuzzy merge, available in Excel and Power BI. It's the closest thing to a real built-in fuzzy matcher, and for small-to-medium files it works. But:

  • It's a merge tool, designed to join two tables, not to deduplicate one. Using it for dedupe means joining a table to itself, which doubles the work and is genuinely awkward to set up.
  • The main control is a single similarity threshold (from 0 to 1), with little transparency in the merge dialog: it doesn't show you why two records matched or didn't, and there's no audit trail to show stakeholders.
  • Performance falls off a cliff somewhere in the tens of thousands of rows. People report Power Query queries that take 20 minutes, hang, or never finish on files Clean can process in under a minute.
  • It can't do multi-column matching intelligently — you can match on multiple columns, but each column has its own independent threshold rather than a combined similarity score across the row.

The third option people reach for is Python in Excel. It's powerful, but you're now writing pandas and RapidFuzz code inside a spreadsheet cell, paying for an M365 add-on, and waiting on cloud Python to spin up for every recalculation. That's not a fix for the people who are asking how to dedupe in Excel — it's a different product.

3. What actually breaks for you, in practice

When someone tells us "I can't dedupe this in Excel", they almost always mean one of these five things:

  • Excel freezes or crashes. Power Query's fuzzy merge or a Python-in-Excel script hangs on a file of 30k–80k rows. The application becomes unresponsive and you eventually force-quit.
  • The results are wrong in both directions. Either nothing matches (threshold too strict), or "Acme Corp" gets merged with "Acme Healthcare" (threshold too loose). And there's no obvious way to iterate.
  • You can't match on multiple columns together. "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" should clearly be one person, but neither the name nor the company is identical, and Excel can't combine them into one decision.
  • You can't clean the data the way you need before matching. Stripping "Inc.", "LLC", "GmbH", "Ltd.", handling word-order differences ("Coca-Cola Company" vs "The Coca-Cola Company"), normalizing punctuation — none of this is a checkbox in Excel.
  • You can't review or undo. Once Power Query has merged two records, that decision is buried in the query. No "show me the pairs you matched, with scores, before committing".

Why an online tool is the fastest reliable answer in 2026

In 2026, the bottleneck stops being "is there an algorithm that can do this" and becomes "where does the work run". An online fuzzy-matching tool fixes all three of Excel's problems at once:

  • The matching runs on cloud servers, not your laptop — so a 50,000-row file finishes in seconds without freezing anything.
  • Nothing to install — no add-in, no Python, no Java app, no IT ticket. Drop your file into a browser tab and you're done.
  • You see the results before committing. Inspect the pairs, raise or lower the sensitivity, re-run, and only download once you're happy.

Most teams we talk to spend more time on a single failed Excel dedupe attempt than they would spend on the whole task in an online tool. If you want to try it on your own file right now, you can open Clean and drop in your CSV or Excel file — free for files under 500 rows, no signup.

How Clean does it (and how it's different)

Clean by Similarity API is built around the file-drop workflow we just described. Here's what makes it work on the kinds of files Excel chokes on.

It uses AI to figure out your specific use case

When you upload your file, Clean inspects a sample of the rows and the column you've picked, and tells you what it thinks the data is — a list of company names, a contact list with first/last names, a SKU catalog, an address book. Different data needs different matching behaviour, and Clean uses that read to pre-fill smart defaults: which sensitivity threshold is sane, whether to ignore casing, whether to strip company suffixes like Inc. and Ltd., whether word order matters. You can always override every choice — but you start from "this is probably already right" rather than from a blank threshold dial.

A proprietary algorithm faster than most APIs

The matching engine is purpose-built for this job — not a wrapper around an off-the-shelf library. It runs faster than most fuzzy-matching APIs on the market and, against local libraries like RapidFuzz or TheFuzz, the difference is even larger at scale. You don't need to know any of this to use it — you just notice that your file comes back quickly.

Flexible cleaning steps, included automatically

Before any matching happens, Clean runs an optional cleaning step on your data — small transformations that make sure the matches are exactly what you want. These are simple on/off toggles, not formulas you have to write:

  • Lowercase everything so "ACME" and "acme" line up
  • Strip punctuation so "O'Brien" and "OBrien" match
  • Remove business suffixes — Inc., LLC, Corp., Ltd., GmbH, S.A., Pty
  • Handle word-order differences so "John Smith" and "Smith, John" match
  • Collapse extra whitespace and normalize accented characters

These run in memory just for the comparison — your original data is never modified.
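
As a rough illustration of what toggles like these do, here is a minimal Python sketch of a comparison-only cleaning function. It is an assumption about how such steps could be implemented, not Clean's actual code; `clean_for_matching` and the suffix list (copied from the examples above) are hypothetical.

```python
import re
import unicodedata

# Hypothetical suffix list, based on the examples in the text
BUSINESS_SUFFIXES = {"inc", "llc", "corp", "ltd", "gmbh", "sa", "pty"}


def clean_for_matching(value: str) -> str:
    """Return a normalized copy used only for comparison; the input is not changed."""
    # Normalize accents: decompose characters, then drop the combining marks
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Drop apostrophes and periods outright: "o'brien" -> "obrien", "s.a." -> "sa"
    text = re.sub(r"['’.]", "", text)
    # Any other non-alphanumeric character becomes a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() also collapses runs of whitespace
    tokens = [t for t in text.split() if t not in BUSINESS_SUFFIXES]
    # Sort tokens so word order doesn't matter
    return " ".join(sorted(tokens))


for raw in ["ACME CORP.", "Globex, Inc", "O'Brien", "Smith, John", "Nestlé S.A."]:
    print(f"{raw!r} -> {clean_for_matching(raw)!r}")
```

Because the function returns a new string, the original values stay available for the output files. A real implementation would need a longer suffix list and more careful handling of non-Latin scripts.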

Multi-column matching as one decision

Pick more than one column to match on and Clean combines them into a single similarity score for the whole row. That's what catches "Jen Walsh / Acme Corp" as a duplicate of "Jennifer Walsh / Acme Corporation" when neither field on its own would clear the threshold.
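
One simple way to turn several columns into a single decision is a weighted average of per-column scores. The sketch below shows that idea using `difflib`; the text does not say how Clean combines columns, so the averaging, the equal weights, and the function names are all assumptions.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two field values, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def row_similarity(row_a: dict, row_b: dict, weights: dict) -> float:
    """Weighted average of per-column similarity, giving one score per row pair."""
    total_weight = sum(weights.values())
    weighted = sum(
        weight * field_similarity(row_a[column], row_b[column])
        for column, weight in weights.items()
    )
    return weighted / total_weight


# Rows 4 and 5 of the sample file
row_4 = {"name": "Mike O'Brien", "company": "Globex Inc."}
row_5 = {"name": "Michael O'Brien", "company": "Globex, Inc"}

weights = {"name": 1.0, "company": 1.0}  # illustrative equal weighting
print("name only:", round(field_similarity(row_4["name"], row_5["name"]), 2))
print("whole row:", round(row_similarity(row_4, row_5, weights), 2))
```

Here the names alone score about 0.81, under a 0.85 threshold, while the whole-row score is about 0.86. With independent per-column thresholds the pair would be rejected on the name; with one combined score, the strong company match carries it over the line.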

Tweak the results before you download

This is the part people miss in most tools. After Clean runs, you see the matched pairs and their similarity scores in the browser. If too many things matched, slide the threshold up and the table updates live. If borderline pairs got missed, slide it down. Toggle a cleaning step on or off and re-run. Only when the results look right do you click Download.
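
A threshold slider can feel instant when the pair scores are computed once and only the filter is re-applied. The sketch below shows that idea; it is an assumption about how such a review step could work, and the row numbers and scores are made up for illustration.

```python
# (row_a, row_b, score) tuples; the scores are invented for this example
scored_pairs = [
    (1, 2, 0.91),
    (4, 5, 0.86),
    (9, 10, 0.83),
    (1, 4, 0.42),
]


def pairs_above(pairs, threshold):
    """Return the pairs whose score meets the threshold, without rescoring anything."""
    return [(a, b, score) for a, b, score in pairs if score >= threshold]


for threshold in (0.80, 0.85, 0.90):
    kept = pairs_above(scored_pairs, threshold)
    print(f"threshold {threshold:.2f}: {len(kept)} matched pairs")
```

Raising the threshold from 0.80 to 0.90 shrinks the match list from three pairs to one, and no comparison has to be run again.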

Just drop the file — no prep

No copy/paste, no "save as CSV", no header reshuffling, no removing empty rows beforehand. Drag your .xlsx, .csv, or Google Sheets export onto the page and Clean handles parsing, sheet selection, header detection, and encoding for you.

Free for small files

Files under 500 rows are completely free — no signup, no credit card, no feature paywall. Fuzzy matching is included on the free tier. Larger files have a small flat fee.

How to do it: step-by-step

  1. Go to similarity-api.com/free-csv-dedupe. No signup screen. The upload box is the first thing you see.
  2. Drop your Excel or CSV file. Multi-sheet .xlsx works — Clean will ask which sheet to use. UTF-8, Latin-1, and Windows encodings are all handled.
  3. Pick the column (or columns) to match on. For company dedupe, that's the company-name column. For contact dedupe, pick name and company together — Clean will combine them into one similarity score per row.
  4. Glance at the suggested settings. Clean has already read a sample of your data and pre-filled the sensitivity threshold and cleaning toggles. For most company and contact files, the defaults are already right.
  5. Run, then iterate. Look at the matched pairs and their similarity scores. Too aggressive? Slide the threshold up. Missing obvious matches? Slide it down, or turn on "strip business suffixes". Re-run as many times as you want — it's free during this stage.
  6. Download your results. You don't pick a format: Clean generates all three from the same matching run, bundled together:
    • Clean file — one row per real entity. Drop straight into your CRM or back into Excel.
    • Flagged original — your full file, untouched, with a cluster_id and is_duplicate column added. Use when you need to merge manually or audit.
    • Review sheet — just the duplicate groups with similarity scores. Best for handing to a colleague to approve before committing.
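
To show how matched pairs can become a cluster_id and is_duplicate column, here is a minimal union-find sketch in Python. The column names come from the description above, but the logic is an assumption: in particular, treating the first row of each cluster as the one to keep is a choice made for this example, not documented behaviour.

```python
def assign_clusters(n_rows, matched_pairs):
    """Group rows connected by matched pairs and flag every row after the first in a group."""
    parent = list(range(n_rows))

    def find(i):
        # Follow parent links to the group's root, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lowest row index as the root of the merged group
            parent[max(root_a, root_b)] = min(root_a, root_b)

    cluster_numbers = {}  # root index -> cluster_id, numbered in order of appearance
    seen = set()
    flagged = []
    for i in range(n_rows):
        cluster_id = cluster_numbers.setdefault(find(i), len(cluster_numbers) + 1)
        flagged.append({"cluster_id": cluster_id, "is_duplicate": cluster_id in seen})
        seen.add(cluster_id)
    return flagged


# 0-based row indexes for the 12-row sample; the pairs are assumed matcher output
pairs = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (8, 9), (10, 11)]
flagged = assign_clusters(12, pairs)
print([row["cluster_id"] for row in flagged])
print(sum(not row["is_duplicate"] for row in flagged), "rows kept")
```

Rows 0 and 2 never matched directly, yet they end up in the same cluster because both matched row 1. That transitive grouping is what lets 12 rows collapse to 5.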

Why three output formats matter

Most deduplication tools — including every Excel add-in — give you one output: a file with "the duplicates removed". That's fine for throwaway data, but it's the wrong default for anything you actually care about. Different situations call for different outputs:

  • Importing a clean list into a fresh CRM → the clean file. You want one row per entity, no decisions to make.
  • Cleaning an existing CRM export, then re-importing → the flagged original. You need every row preserved so the mapping back is exact, with a cluster ID so you can merge in your CRM's own tooling.
  • Cleaning a list someone else owns → the review sheet. You send them only the matched pairs with scores, they confirm or reject, and nothing is committed without sign-off.

All three are generated from the same matching run, so there's no "did I match the same way both times" risk. Excel and most add-ins force you to re-run the entire operation if you want a different shape of output.

Excel vs Clean — at a glance

Feature | Excel (Remove Duplicates / Power Query) | Clean by Similarity API
--- | --- | ---
Catches "Acme Corp" vs "Acme Corporation" | |
Handles tens of thousands of rows quickly | Often freezes | Seconds
Multi-column similarity (one score per row) | |
Strip "Inc./LLC/Ltd." as a toggle | Manual formulas |
See pairs + scores before committing | |
Tweak threshold and re-run instantly | |
Three output formats from one run | |
Cost (small file) | Included with Excel | Free under 500 rows

Frequently asked questions