Clean — by Similarity API

Deduplicate Excel & CSV files in seconds

Find or replace exact and inexact duplicates across multiple columns in large files, fast!

Drop your file here or browse

CSV, XLSX, or XLS · up to 10 MB

Multi-column matching — name, email, address & more

How It Works

How to deduplicate Excel & CSV files in 4 steps

Clean removes fuzzy duplicates — the ones with typos, abbreviations, or reordered words — effortlessly.

Step 1

Upload

Drop your CSV or Excel file. No signup, no install, no data stored.

Step 2

Auto-configure

Clean analyses your columns and recommends which ones to match on, how strict to be, and how to handle name variations. You can adjust before running.

Step 3

Review

See duplicate clusters with similarity scores before committing. You decide what to keep.

Step 4

Download Results

Get your clean file instantly — unique records, flagged clusters, and all rows scored.

Why Clean

Why Excel, Sheets, and add-ons miss real duplicates

Excel / Sheets · Sheets Add-ons · Clean
Catches "Microsoft Corp" vs "Microsoft Corporation"
AI-recommended matching settings
Matches across multiple columns · Limited
Works on large files (50k+ rows) · Times out
No install or account needed
Shows duplicate clusters before deleting · Limited
Gives you 3 different result formats
Flexible data cleaning prior to fuzzy matching
Strips "Inc.", "LLC", "Corp." before comparing · Limited · ✓ toggle on/off

A modern replacement for Excel's Fuzzy Lookup add-in

Microsoft's Fuzzy Lookup add-in is a 2017-era Windows-only download that slows to a crawl past a few thousand rows. And Google Sheets' Remove Duplicates only catches exact-match duplicates. Clean is the browser-based alternative — same fuzzy matching, no install, no Power Query, works on Mac and Windows, and handles up to 100,000 rows per file in both .xlsx and .csv.

See how Clean compares to Fuzzy Lookup →

Who Uses Clean

Built for messy Excel and CSV exports

From messy CRM exports to subscriber lists with split identities — Clean handles duplicates exact-match tools quietly miss.

E-commerce customer lists

Catches the same buyer registered under two different email addresses — something Excel's Remove Duplicates will never find.

Simple Pricing

Free for small files. Pay only for large Excel & CSV jobs.

Process up to 500 rows for free. Larger files are priced per run.

$0

Up to 500 rows

  • Fuzzy deduplication
  • Multi-column matching
  • Instant download

Most Popular

Large File

$1.99+

501 – 100,000 rows

  • Up to 3,000 rows — $1.99
  • Up to 10,000 rows — $4.99
  • Up to 25,000 rows — $9.99
  • Up to 50,000 rows — $19.99
  • Up to 100,000 rows — $29.99

Monthly Unlimited

$99.99/mo

Unlimited uploads

  • Up to 10 MB per file
  • Unlimited file uploads and deduplication runs
  • Priority customer support
  • Cancel anytime

Learn more

Guides for cleaning your data

Step-by-step articles on deduplicating spreadsheets, CRM imports, and vendor exports.

NEED MORE?

Interested in deduping larger files?

Our API handles millions of rows with sub-second matching, bulk uploads, and programmatic access. Or reach out and we'll walk you through a custom solution — free of charge.

FAQ

Frequently asked questions

How do I remove duplicates from a CSV online without Excel?

Drop your .csv file into Clean. It is browser-based, so there is no spreadsheet app, Power Query, add-in, or install to deal with. Pick the column(s) you want to match on, set a similarity threshold, and download the deduplicated file. It works on Mac, Windows, Linux, and Chromebook, and catches the near-duplicates that Excel's built-in Remove Duplicates silently misses: "Jen Walsh" vs "Jennifer Walsh", "Acme Corp" vs "Acme Corporation", and so on. Free for files up to 500 rows, no account required.

Why can't Excel remove duplicates that are spelled differently?

Excel's built-in Remove Duplicates only catches character-for-character matches, so "Jen Walsh" and "Jennifer Walsh" survive as two separate rows, and "Acme Corp" and "Acme Corporation" are treated as completely different companies. To catch spelling variants you need fuzzy matching, which scores how similar two strings are between 0 and 1 instead of asking whether they're identical.

We broke this down in detail — including why Microsoft never shipped a real fix and what to do instead — in Fuzzy Matching in Excel (2026): Why It's Still Broken — and the Fastest Fix.
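To make the 0-to-1 score in this answer concrete, here is a minimal Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer; that choice and the `similarity` helper name are assumptions for illustration, not a description of how Clean scores rows internally.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-to-1 similarity score (1.0 means the strings are identical)."""
    return SequenceMatcher(None, a, b).ratio()


pairs = [
    ("Jen Walsh", "Jennifer Walsh"),
    ("Acme Corp", "Acme Corporation"),
    ("Acme Corp", "Acme Corp"),
]

for left, right in pairs:
    # An exact comparison only answers yes or no; the score says how close the pair is.
    print(f"{left!r} vs {right!r}: exact={left == right}, score={similarity(left, right):.2f}")
```

An exact comparison rejects the first two pairs outright, while the score leaves room to treat them as likely matches once a threshold is chosen.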

Does Microsoft Fuzzy Lookup still work in 2026?

Technically yes: Microsoft's Fuzzy Lookup add-in is still downloadable. But it's a 2017-era, Windows-only desktop add-in that hasn't been meaningfully updated, doesn't run on Mac or Excel for the web, and slows to a crawl past a few thousand rows. Clean is the modern browser-based replacement: same fuzzy matching idea, but it works on any OS, handles up to 100,000 rows per file, and supports both .xlsx and .csv. For the full breakdown of why the add-in keeps falling short, see Fuzzy Matching in Excel (2026).

Can I use Power Query fuzzy matching instead?

Power Query's fuzzy merge is the closest thing to a built-in fuzzy matcher in Excel, but it requires Excel for Windows desktop, has limited threshold control, slows dramatically past a few thousand rows, and is really a merge tool — not a dedupe tool. Clean is browser-based, works on any OS, supports up to 100,000 rows per file, lets you tune the similarity threshold, and ships three output formats out of the box (clean file, clusters for review, all rows scored) — none of which Power Query offers natively.

What's the difference between Clean and Excel's Remove Duplicates?

Excel only catches exact character matches, so "Microsoft Corp" and "Microsoft Corporation" are treated as completely different records. Clean uses fuzzy matching to score the similarity between strings, so it catches the variants that exact matching misses. Clean also lets you match across multiple columns simultaneously, so "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" are correctly identified as the same person.

How does Clean find fuzzy duplicates?

Instead of asking whether two records are character-for-character identical, Clean scores how similar they are — a number between 0 and 1 for every pair. Anything above your threshold gets flagged as a likely duplicate. Common real-world messiness (casing, punctuation, abbreviations, business suffixes like Inc., LLC, and Corp.) is normalised automatically before scoring.

When you select a second column — matching on contact name and company together — Clean combines both signals into one decision. That's how "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" get grouped as the same person, even though neither field is an exact match on its own.
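The normalise-then-score flow described above can be sketched in a few lines of Python. Everything here is illustrative: the `SUFFIXES` set (including "corporation"), the `normalise` and `is_likely_duplicate` helpers, the default threshold, and the use of `difflib` are assumptions, not Clean's actual rules.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list based on the examples in the text (Inc., LLC, Corp.).
SUFFIXES = {"inc", "llc", "corp", "corporation"}


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing business suffixes."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    # Keep at least one word so a name is never reduced to an empty string.
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def score(a: str, b: str) -> float:
    """Similarity between 0 and 1, computed on the normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_likely_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    # Assumption: a score equal to the threshold also counts as a match.
    return score(a, b) >= threshold


print(normalise("Acme Corp."), "|", normalise("ACME Corporation"))
print(is_likely_duplicate("Acme, Inc.", "acme llc"))
print(is_likely_duplicate("Acme Corp", "Globex LLC"))
```

Normalising first means casing, punctuation, and suffix differences never reach the scorer, so the threshold only has to absorb genuine spelling variation.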

Can Clean match on multiple columns?

Yes. Select company name and contact name together — or first name + last name + email domain — and Clean combines the similarity across every selected column into a single match decision. Two weak signals (a partial name match and a partial company match) can become one strong duplicate flag, which is how "Jen Walsh at Acme Corp" is correctly grouped with "Jennifer Walsh at Acme Corporation".
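One simple way to picture combining columns into a single decision is to average the per-column scores. The sketch below does exactly that; averaging, the `record_score` helper, and the sample records are assumptions for illustration, since the text does not say how Clean weights each column.

```python
from difflib import SequenceMatcher


def field_score(a: str, b: str) -> float:
    """Case-insensitive similarity between two field values, from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(rec_a: dict, rec_b: dict, columns: list) -> float:
    """Average the per-column similarity into one combined score."""
    scores = [field_score(rec_a[col], rec_b[col]) for col in columns]
    return sum(scores) / len(scores)


columns = ["name", "company"]
jen = {"name": "Jen Walsh", "company": "Acme Corp"}
jennifer = {"name": "Jennifer Walsh", "company": "Acme Corporation"}
other_jen = {"name": "Jen Walsh", "company": "Globex LLC"}

print(f"variant name, variant company: {record_score(jen, jennifer, columns):.2f}")
print(f"same name, different company:  {record_score(jen, other_jen, columns):.2f}")
```

With both columns in play, two partial matches on the same person outrank an identical name attached to an unrelated company, which a name-only comparison would get wrong.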

What similarity threshold should I use?

For deduplication, 0.75–0.82 works well for most contact and company lists. Go higher (0.88+) if you want to be conservative and flag only near-certain duplicates. Stay at the low end (0.75) if your data is clean and you want to catch more variants.
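To see how the threshold changes what gets flagged, the following sketch scores every pair in a tiny sample list at the three values mentioned above. The sample names, the `flagged_pairs` helper, and the `difflib` scorer are assumptions; real scores from Clean may differ.

```python
from difflib import SequenceMatcher
from itertools import combinations

names = [
    "Jen Walsh",
    "Jennifer Walsh",
    "Microsoft Corp",
    "Microsoft Corporation",
    "Globex LLC",
]


def score(a: str, b: str) -> float:
    """Case-insensitive similarity from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flagged_pairs(values, threshold):
    """Return every pair whose score reaches the threshold."""
    return [(a, b) for a, b in combinations(values, 2) if score(a, b) >= threshold]


for threshold in (0.75, 0.82, 0.88):
    pairs = flagged_pairs(names, threshold)
    print(f"threshold {threshold}: {len(pairs)} pair(s) flagged -> {pairs}")
```

Raising the threshold can only shrink the flagged set, so a practical approach is to start lower, review the clusters, and tighten the value if unrelated rows start appearing together.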

What do the three output formats mean for deduplication?

Unique sheet: one row per entity, duplicates removed. Clusters sheet: only the duplicate clusters with similarity scores, for manual review before merging. All rows scored sheet: your original file with a cluster ID, duplicate flag, and similarity score added to every row.
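The three outputs can be pictured as three views of the same clustering. The sketch below uses a simple greedy clustering on one column: each row joins the best-matching earlier cluster or starts a new one. The clustering strategy, the field names (`cluster_id`, `is_duplicate`, `similarity`), and the rule that the first row of a cluster is the one kept are all assumptions for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_outputs(values, threshold=0.75):
    """Return (unique, clusters, all_rows) for a single column of values."""
    representatives = []  # row index of the first member of each cluster
    all_rows = []
    for idx, value in enumerate(values):
        best_id, best_score = None, 0.0
        for cluster_id, rep_idx in enumerate(representatives):
            s = score(value, values[rep_idx])
            if s >= threshold and s > best_score:
                best_id, best_score = cluster_id, s
        is_duplicate = best_id is not None
        if not is_duplicate:
            # No cluster matched: this row starts a new cluster and is kept.
            representatives.append(idx)
            best_id, best_score = len(representatives) - 1, 1.0
        all_rows.append({
            "value": value,
            "cluster_id": best_id,
            "is_duplicate": is_duplicate,
            "similarity": round(best_score, 2),
        })
    unique = [values[i] for i in representatives]
    sizes = Counter(row["cluster_id"] for row in all_rows)
    clusters = [row for row in all_rows if sizes[row["cluster_id"]] > 1]
    return unique, clusters, all_rows


rows = ["Jen Walsh", "Jennifer Walsh", "Acme Corp", "Microsoft Corp", "Microsoft Corporation"]
unique, clusters, all_rows = build_outputs(rows)
print("unique:", unique)
print("clusters:", [(r["cluster_id"], r["value"], r["similarity"]) for r in clusters])
print("all rows scored:", len(all_rows))
```

The unique view is what you would import, the clusters view is what you would review by hand, and the all-rows view keeps every original row so nothing is lost if you disagree with a grouping.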

What file formats are supported?

CSV, XLSX, and XLS. Maximum 10 MB per file. If your file is larger, contact us — we can run it via the API.

How is pricing calculated for Clean?

Pricing is based on the number of rows in your file, excluding the header row. Clean is free for files up to 500 rows with no account required. For larger files: $1.99 up to 3,000 rows, $4.99 up to 10,000 rows, $9.99 up to 25,000 rows, $19.99 up to 50,000 rows, and $29.99 up to 100,000 rows. You can preview your results before paying — payment is only required to download.
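The tiers above amount to a simple lookup on the number of data rows. Here is a short sketch using the prices listed in this answer; the function names and the CSV row-counting helper are hypothetical and only illustrate the "excluding the header row" rule.

```python
import csv
import io

# (maximum data rows, price in USD), taken from the tiers listed above.
PRICE_TIERS = [
    (500, 0.00),
    (3_000, 1.99),
    (10_000, 4.99),
    (25_000, 9.99),
    (50_000, 19.99),
    (100_000, 29.99),
]


def count_data_rows(csv_text: str) -> int:
    """Count non-empty rows in CSV text, excluding the header row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - 1, 0)


def price_for_rows(data_rows: int) -> float:
    """Return the per-run price for a file with the given number of data rows."""
    if data_rows < 0:
        raise ValueError("row count cannot be negative")
    for max_rows, price in PRICE_TIERS:
        if data_rows <= max_rows:
            return price
    raise ValueError("more than 100,000 rows is outside the per-run tiers")


sample = "name,email\nJen Walsh,jen@example.com\nJennifer Walsh,jw@example.com\n"
print(count_data_rows(sample), price_for_rows(count_data_rows(sample)))
print(price_for_rows(501), price_for_rows(50_000))
```

Because the header is excluded, a file with exactly 500 data rows plus a header line still falls in the free tier.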

How do I clean a contact list of 50,000 rows for free?

Clean's free tier covers files up to 500 rows. For a 50,000-row file, you can preview the deduplicated results for free and see exactly which clusters Clean would flag; you pay $19.99 only to download the cleaned file. If you need to process this volume or more on a regular basis, the underlying Similarity API handles millions of rows for $1.99 per 10k rows on a pay-as-you-go plan.

What's the difference between deduplicating and reconciling two lists?

Deduplication finds duplicate records within a single file — two rows in the same spreadsheet that represent the same contact or company. Reconciliation compares two separate files — checking which rows in your new list already exist in your reference list, and which are genuinely new. Use Clean when you have one messy file to clean up before importing to your CRM. Use the reconcile tool when you have a new list — a trade show export, an Apollo download, a vendor list — and want to check it against an existing database before importing.
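The difference between the two operations is easiest to see side by side. In this sketch, `deduplicate` compares rows within one list, while `reconcile` compares each row of a new list against a reference list. Both function names, the sample data, the 0.75 threshold, and the `difflib` scorer are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(rows, threshold=0.75):
    """Within ONE list: return the pairs that look like the same entity."""
    return [(a, b) for a, b in combinations(rows, 2) if score(a, b) >= threshold]


def reconcile(new_rows, reference_rows, threshold=0.75):
    """Across TWO lists: split new_rows into (already_known, genuinely_new)."""
    already_known, genuinely_new = [], []
    for row in new_rows:
        matched = any(score(row, ref) >= threshold for ref in reference_rows)
        (already_known if matched else genuinely_new).append(row)
    return already_known, genuinely_new


crm = ["Microsoft Corporation", "Globex LLC"]
trade_show = ["Microsoft Corp", "Initech"]

print("duplicates inside the CRM list:", deduplicate(crm))
print("trade show list vs CRM:", reconcile(trade_show, crm))
```

Deduplication never looks outside the file it is given, whereas reconciliation never compares the new list against itself, which is why a messy new list is usually deduplicated first and reconciled second.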

Is my data safe to upload?

Your file is processed in memory and deleted immediately after your session. It is never written to permanent storage, never shared, and never used for any purpose other than generating your results. You can verify this in our privacy policy.