Clean — by Similarity API

Deduplicate Excel & CSV files
in seconds

Find or replace exact and inexact duplicates across multiple columns in large files, fast!


CSV, XLSX, or XLS · up to 10 MB

Multi-column matching — name, email, address & more

How It Works

How to deduplicate Excel & CSV files in 4 steps

Clean removes fuzzy duplicates — the ones with typos, abbreviations, or reordered words — effortlessly.

Step 1

Upload

Drop your CSV or Excel file. No signup, no install, no data stored.

Step 2

Auto-configure

Clean analyses your columns and recommends which ones to match on, how strict to be, and how to handle name variations. You can adjust before running.

Step 3

Review

See duplicate clusters with similarity scores before committing. You decide what to keep.

Step 4

Download Results

Get your clean file instantly — unique records, flagged clusters, and all rows scored.

Learn more about fuzzy matching and why Clean does better than other tools on the market.

Why Clean

Why Excel, Sheets, and add-ons miss real duplicates

	Sheets Add-ons	Clean
Catches "Microsoft Corp" vs "Microsoft Corporation"
AI-recommended matching settings
Matches across multiple columns	Limited
Works on large files (50k+ rows)	Times out
No install or account needed
Shows duplicate clusters before deleting	Limited
Gives you 3 different results formats
Flexible data cleaning prior to fuzzy-matching
Strips "Inc.", "LLC", "Corp." before comparing	Limited	✓ toggle on/off

A modern replacement for Excel's Fuzzy Lookup add-in

Microsoft's Fuzzy Lookup add-in is a 2017-era Windows-only download that slows to a crawl past a few thousand rows. And Google Sheets' Remove Duplicates only catches exact-match duplicates. Clean is the browser-based alternative — same fuzzy matching, no install, no Power Query, works on Mac and Windows, and handles up to 100,000 rows per file in both .xlsx and .csv.

See how Clean compares to Fuzzy Lookup →

Who Uses Clean

Built for messy Excel and CSV exports

From messy CRM exports to subscriber lists with split identities — Clean handles duplicates exact-match tools quietly miss.

E-commerce customer lists

Catches the same buyer registered under two different email addresses — something Excel's Remove Duplicates will never find.

Simple Pricing

Free for small files. Pay only for large Excel & CSV jobs.

Process up to 500 rows for free. Larger files are priced per run.

Up to 500 rows

Fuzzy deduplication
Multi-column matching
Instant download

Large File

$1.99+

501 – 100,000 rows

Up to 3,000 rows — $1.99
Up to 10,000 rows — $4.99
Up to 25,000 rows — $9.99
Up to 50,000 rows — $19.99
Up to 100,000 rows — $29.99

Monthly Unlimited

$99.99/mo

Unlimited uploads

Up to 10 MB per file
Unlimited file uploads and deduplication
Priority customer support
Cancel anytime

Learn more

Guides for cleaning your data

Step-by-step articles on deduplicating spreadsheets, CRM imports, and vendor exports.

Fuzzy Matching in Excel (2026): Why It's Still Broken — and the Fastest Fix

Excel still can't deduplicate names like "Acme Corp" and "Acme Corporation" in 2026. Here's exactly why, what breaks, and how to do it in under 2 minutes with a free online tool.

CSV Deduplication Online in 2026: We Benchmarked 16 Tools on 5,000 Rows

We ran the same 5,000-row dataset — 22 exact duplicates and 228 near-duplicates — through every online CSV deduplication tool we could find. Most do exact matching only. The fuzzy ones diverge wildly. Here's the full benchmark.

Exported Passwords from Chrome or Opera? Here's How to Remove Duplicates Before Importing

Browser password exports are full of duplicates — same site saved under dozens of different URLs. Here's how to clean up your CSV before importing to Bitwarden, 1Password, or any other password manager.

How to Find Duplicate Customers with Different Emails in a Store Export

Your store counts them as two customers. Same name, same address, different email. Here's how to find and fix duplicate customer records in any export file — no coding needed.

Duplicate Customers in Your Shopify or Squarespace Export? Fix It in Under 5 Minutes

Same person, two emails — your platform doesn't catch it, but your open rates and segments do. Here's how to find duplicate customers in a Shopify or Squarespace export in under 5 minutes.

Deduplify vs Clean by Similarity API: Best Free CSV Deduplication Tool?

Deduplify caps you at 2,000 rows with no paid option. Here's how it compares to Clean by Similarity API — and which one is right for your file size and use case.

Datablist vs Clean by Similarity API: Which CSV Dedupe Tool to Pick

Datablist is a powerful lead intelligence platform — but if you just need to deduplicate a CSV file, it's more tool than the job requires. Here's how the two compare.

How to Clean an Apollo, ZoomInfo, or Lusha Export Before Importing to Your CRM

Data vendor exports from Apollo, ZoomInfo, and Lusha are full of duplicate company names and contacts that already exist in your CRM. Here's how to clean them before importing.

How to Deduplicate a LinkedIn Sales Navigator Export Before Uploading to Your CRM

Sales Navigator exports are full of duplicate company names and near-identical contacts. Here's how to clean them before they pollute your CRM.

How to Dedupe Your Contact List Before a CRM Migration

CRM migrations create more duplicates than almost any other event. Here's how to clean your contact and company data before you move it — so you start fresh, not messy.

Dedupe Checklist: Cleaning Contacts Before HubSpot or Salesforce Import

A practical checklist for cleaning contact and company data before importing to HubSpot or Salesforce — so you don't spend hours fixing duplicates and bad records afterward.

How to Remove Duplicate Company Names in Google Sheets (And Why It Misses Most of Them)

Google Sheets' Remove Duplicates only catches exact matches — "Acme Corp" and "Acme Corporation" both survive. Here's why, and what to do instead.

Best Free CSV Deduplication Tools in 2026 (Compared)

Most CSV deduplication tools only catch exact matches. Here's an honest comparison of the best free options — what each actually does, who it's for, and which ones catch real-world name variants.

How to Deduplicate Account and Contact Records Before Importing to Salesforce

Salesforce deduplicates contacts on email and accounts on name — but only exact matches. Here's what slips through and how to clean your file before it creates a duplicate problem.

Best Free Alternatives to OpenRefine for Deduplicating Contact Lists in 2026

OpenRefine is powerful but built for data engineers. If you need to deduplicate a contact list or remove duplicate company names before a CRM import, here are the better options in 2026.

How to Deduplicate Your Contact List Before Importing to HubSpot

HubSpot only deduplicates on email address — which means it misses most real-world duplicates. Here's what to clean before you hit import, and how to do it without code.

How to Find & Merge Duplicate Company Names in a Spreadsheet or CSV

Excel's Remove Duplicates misses most company name duplicates. Here's why — and how to actually find and merge records when names are spelled differently.

See all guides

NEED MORE?

Interested in deduping larger files?

Our API handles millions of rows with sub-second matching, bulk uploads, and programmatic access. Or reach out and we'll walk you through a custom solution — free of charge.

FAQ

Frequently asked questions

How do I remove duplicates from a CSV online without Excel?

Drop your .csv file into Clean — it runs entirely in your browser, no spreadsheet app, no Power Query, no add-in, no install. Pick the column(s) you want to match on, set a similarity threshold, and download the deduplicated file. Works on Mac, Windows, Linux, and Chromebook, and catches the near-duplicates Excel's built-in Remove Duplicates silently misses — "Jen Walsh" vs "Jennifer Walsh", "Acme Corp" vs "Acme Corporation", and so on. Free for files up to 500 rows, no account required.

Why can't Excel remove duplicates that are spelled differently?

Excel's built-in Remove Duplicates only catches character-for-character matches, so "Jen Walsh" and "Jennifer Walsh" survive as two separate rows, and "Acme Corp" and "Acme Corporation" are treated as completely different companies. To catch spelling variants you need fuzzy matching, which scores how similar two strings are between 0 and 1 instead of asking whether they're identical.

We broke this down in detail — including why Microsoft never shipped a real fix and what to do instead — in Fuzzy Matching in Excel (2026): Why It's Still Broken — and the Fastest Fix.
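To make the 0-to-1 score concrete, here is a minimal sketch using Python's standard-library difflib. It illustrates the general idea of similarity scoring only; the `similarity` helper is a hypothetical name, and Clean's own scoring method is not described on this page and may differ.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score two strings from 0.0 (nothing shared) to 1.0 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


pairs = [
    ("Jen Walsh", "Jennifer Walsh"),
    ("Acme Corp", "Acme Corporation"),
    ("Acme Corp", "Globex Inc"),
]
for left, right in pairs:
    # An exact comparison says "different" for all three pairs;
    # the score shows that the first two are close and the third is not.
    print(f"{left!r} vs {right!r}: exact={left == right}, score={similarity(left, right):.2f}")
```

An exact-match check treats all three pairs the same way, while the score separates the two spelling variants from the unrelated pair.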

Does Microsoft Fuzzy Lookup still work in 2026?

Technically yes — Microsoft's Fuzzy Lookup add-in is still downloadable — but it's a 2017-era Windows-only desktop add-in that hasn't been meaningfully updated, doesn't run on Mac or Excel for the web, and slows to a crawl past a few thousand rows. Clean is the modern browser-based replacement: same fuzzy matching idea, but it works on any OS, handles up to 100,000 rows per file, and supports both .xlsx and .csv. Full breakdown of why the add-in keeps falling short in Fuzzy Matching in Excel (2026).

Can I use Power Query fuzzy matching instead?

Power Query's fuzzy merge is the closest thing to a built-in fuzzy matcher in Excel, but it requires Excel for Windows desktop, has limited threshold control, slows dramatically past a few thousand rows, and is really a merge tool — not a dedupe tool. Clean is browser-based, works on any OS, supports up to 100,000 rows per file, lets you tune the similarity threshold, and ships three output formats out of the box (clean file, clusters for review, all rows scored) — none of which Power Query offers natively.

What's the difference between Clean and Excel's Remove Duplicates?

Excel only catches exact character matches, so "Microsoft Corp" and "Microsoft Corporation" are treated as completely different records. Clean uses fuzzy matching to score similarity between strings, so it catches the variants that exact matching misses. Clean also lets you match across multiple columns simultaneously, so "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" are correctly identified as the same person.

How does Clean find fuzzy duplicates?

Instead of asking whether two records are character-for-character identical, Clean scores how similar they are — a number between 0 and 1 for every pair. Anything above your threshold gets flagged as a likely duplicate. Common real-world messiness (casing, punctuation, abbreviations, business suffixes like Inc., LLC, and Corp.) is normalised automatically before scoring.

When you select a second column — matching on contact name and company together — Clean combines both signals into one decision. That's how "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" get grouped as the same person, even though neither field is an exact match on its own.
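The normalise-then-score idea can be sketched roughly as follows. This is an illustration under assumptions, not Clean's implementation: the suffix list, the `normalise` and `is_likely_duplicate` names, and the default threshold of 0.8 are all hypothetical, and the scoring uses Python's standard difflib.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list for illustration; "Inc.", "LLC" and "Corp." come from
# the text above, "corporation" is an added assumption.
SUFFIXES = {"inc", "llc", "corp", "corporation"}


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop business suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(word for word in words if word not in SUFFIXES)


def is_likely_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a pair when its normalised similarity reaches the threshold."""
    score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return score >= threshold


print(normalise("ACME Corporation, Inc."))                         # acme
print(is_likely_duplicate("Acme Corp.", "ACME Corporation, Inc."))  # True
print(is_likely_duplicate("Acme Corp.", "Globex LLC"))              # False
```

Normalising first means casing, punctuation, and suffixes no longer drag the score down, so the threshold only has to deal with real differences in the name itself.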

Can Clean match on multiple columns?

Yes. Select company name and contact name together — or first name + last name + email domain — and Clean combines the similarity across every selected column into a single match decision. Two weak signals (a partial name match and a partial company match) can become one strong duplicate flag, which is how "Jen Walsh at Acme Corp" is correctly grouped with "Jennifer Walsh at Acme Corporation".
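One simple way to picture a multi-column decision is to average the per-column scores and compare the result to a threshold. The sketch below assumes that approach purely for illustration; how Clean actually combines columns is not documented here, and the function names, the equal weighting, and the 0.75 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(rec_a: dict, rec_b: dict, columns: list) -> float:
    """Average the per-column similarity (equal weights are an assumption)."""
    scores = [similarity(rec_a[col], rec_b[col]) for col in columns]
    return sum(scores) / len(scores)


def is_same_record(rec_a: dict, rec_b: dict, columns: list, threshold: float = 0.75) -> bool:
    """One match decision for the whole record rather than one per column."""
    return combined_score(rec_a, rec_b, columns) >= threshold


jen = {"name": "Jen Walsh", "company": "Acme Corp"}
jennifer = {"name": "Jennifer Walsh", "company": "Acme Corporation"}
other_jen = {"name": "Jen Walsh", "company": "Globex Inc"}

cols = ["name", "company"]
print(is_same_record(jen, jennifer, cols))   # True: both columns partially agree
print(is_same_record(jen, other_jen, cols))  # False: same name, unrelated company
```

In this sketch, two partial matches that point the same way are enough to flag a pair, while an identical name paired with an unrelated company is not.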

What similarity threshold should I use?

For deduplication, 0.75–0.82 works well for most contact and company lists. Go higher (0.88+) if you want to be conservative and flag only near-certain matches. Stay at the low end (0.75) if your data is clean and you want to catch more variants.

What do the three output formats mean for deduplication?

Unique sheet: one row per entity, duplicates removed. Clusters sheet: only the duplicate clusters with similarity scores, for manual review before merging. All rows scored sheet: your original file with a cluster ID, duplicate flag, and similarity score added to every row.
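As a rough picture of how the three outputs relate to each other, the sketch below groups rows into clusters and derives all three views from the same result. Everything here is an assumption made for illustration: the `build_outputs` name, the field names, the 0.7 threshold, keeping the first row of each cluster, and the all-pairs comparison, which would be far too slow for large files.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_outputs(names: list, threshold: float = 0.7):
    """Return (unique, clusters, all_rows) for a single column of names."""
    parent = list(range(len(names)))  # union-find: each row starts as its own cluster

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    best = [0.0] * len(names)  # highest score each row reached against another row
    for i, j in combinations(range(len(names)), 2):
        score = similarity(names[i], names[j])
        if score >= threshold:
            parent[find(j)] = find(i)  # merge the two clusters
            best[i] = max(best[i], score)
            best[j] = max(best[j], score)

    roots = [find(i) for i in range(len(names))]
    sizes = {root: roots.count(root) for root in set(roots)}

    # "All rows scored": every original row plus cluster ID, flag, and score.
    all_rows = [
        {"name": name, "cluster_id": root, "is_duplicate": sizes[root] > 1,
         "similarity": round(best[i], 2)}
        for i, (name, root) in enumerate(zip(names, roots))
    ]
    # "Clusters": only rows that belong to a cluster of two or more.
    clusters = [row for row in all_rows if row["is_duplicate"]]
    # "Unique": one row per cluster (keeping the first row is an assumption).
    seen, unique = set(), []
    for row in all_rows:
        if row["cluster_id"] not in seen:
            seen.add(row["cluster_id"])
            unique.append(row["name"])
    return unique, clusters, all_rows


names = ["Acme Corp", "Acme Corporation", "Globex Inc", "Jen Walsh", "Jennifer Walsh"]
unique, clusters, all_rows = build_outputs(names)
print(unique)         # ['Acme Corp', 'Globex Inc', 'Jen Walsh']
print(len(clusters))  # 4
```

The point of the sketch is that the three sheets are views of one clustering result, so reviewing the clusters sheet tells you exactly what the unique sheet dropped.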

What file formats are supported?

CSV, XLSX, and XLS. Maximum 10 MB per file. If your file is larger, contact us — we can run it via the API.

How is pricing calculated for Clean?

Pricing is based on the number of rows in your file, excluding the header row. Clean is free for files up to 500 rows with no account required. For larger files: $1.99 up to 3,000 rows, $4.99 up to 10,000 rows, $9.99 up to 25,000 rows, $19.99 up to 50,000 rows, and $29.99 up to 100,000 rows. You can preview your results before paying — payment is only required to download.
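The tier lookup above can be written out as a small function. This is only a sketch of the arithmetic using the prices listed on this page; `price_for_file` and its parameters are hypothetical names, not part of any Clean interface.

```python
from typing import Optional

# (maximum data rows, price in USD), copied from the tiers listed above.
TIERS = [
    (500, 0.00),
    (3_000, 1.99),
    (10_000, 4.99),
    (25_000, 9.99),
    (50_000, 19.99),
    (100_000, 29.99),
]


def price_for_file(total_lines: int, has_header: bool = True) -> Optional[float]:
    """Return the per-run price, or None if the file exceeds the largest tier."""
    rows = total_lines - 1 if has_header else total_lines  # the header row is not counted
    rows = max(rows, 0)
    for max_rows, price in TIERS:
        if rows <= max_rows:
            return price
    return None  # more than 100,000 rows: outside the per-run tiers


print(price_for_file(501))     # 0.0  (500 data rows plus a header)
print(price_for_file(502))     # 1.99 (501 data rows)
print(price_for_file(50_001))  # 19.99
```

Because the header is excluded, a file with 501 lines including its header still falls in the free tier.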

How do I clean a contact list of 50,000 rows for free?

Clean's free tier covers files up to 500 rows. For a 50,000-row file you can preview the deduplicated results for free, including exactly which clusters Clean would flag, and pay $19.99 only when you download the cleaned file. If you need to process this volume or more on a regular basis, the underlying Similarity API handles millions of rows for $1.99 per 10k rows on a pay-as-you-go plan.

What's the difference between deduplicating and reconciling two lists?

Deduplication finds duplicate records within a single file — two rows in the same spreadsheet that represent the same contact or company. Reconciliation compares two separate files — checking which rows in your new list already exist in your reference list, and which are genuinely new. Use Clean when you have one messy file to clean up before importing to your CRM. Use the reconcile tool when you have a new list — a trade show export, an Apollo download, a vendor list — and want to check it against an existing database before importing.
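The difference is easiest to see side by side. The sketch below is a simplified illustration using Python's difflib on a single column; the `deduplicate` and `reconcile` functions and the 0.7 threshold are hypothetical and do not describe how either tool is built.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(rows: list, threshold: float = 0.7) -> list:
    """One file: return pairs of rows inside it that look like the same entity."""
    return [(a, b) for a, b in combinations(rows, 2) if similarity(a, b) >= threshold]


def reconcile(new_rows: list, reference_rows: list, threshold: float = 0.7):
    """Two files: split the new list into rows already in the reference and new rows."""
    existing, new = [], []
    for row in new_rows:
        matched = any(similarity(row, ref) >= threshold for ref in reference_rows)
        (existing if matched else new).append(row)
    return existing, new


crm = ["Acme Corp", "Globex Inc"]
print(deduplicate(["Acme Corp", "Acme Corporation", "Globex Inc"]))
# [('Acme Corp', 'Acme Corporation')]
print(reconcile(["Acme Corporation", "Initech"], crm))
# (['Acme Corporation'], ['Initech'])
```

Deduplication compares a list with itself, while reconciliation compares each incoming row only against the reference list.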

Is my data safe to upload?

Your file is processed in memory and deleted immediately after your session. It is never written to permanent storage, never shared, and never used for any purpose other than generating your results. You can verify this in our privacy policy.
