Clean — by Similarity API

Dedupe Excel & CSV files with fuzzy matching

A modern fuzzy lookup for Excel and CSV files. Catches near-duplicates like "Microsoft Corp" vs "Microsoft Corporation" that exact matching misses.

Drop your file here or browse

CSV, XLSX, or XLS · up to 10 MB

Multi-column matching — name, email, address & more

How It Works

How to dedupe Excel & CSV files in 4 steps

Step 1

Upload

Drop your CSV or Excel file. No signup, no install, no data stored.

Step 2

Auto-configure

Clean analyses your columns and recommends which ones to match on, how strict to be, and how to handle name variations. You can adjust before running.

Step 3

Review

See duplicate clusters with similarity scores before committing. You decide what to keep.

Step 4

Download Results

Get your clean file instantly — unique records, flagged clusters, and all rows scored.

Who Uses Clean

Built for messy Excel and CSV exports

From messy CRM exports to subscriber lists with split identities, Clean handles the duplicates that exact-match tools quietly miss.

E-commerce customer lists

Catches the same buyer registered under two different email addresses — something Excel's Remove Duplicates will never find.

Why Clean

Why Excel, Sheets, and add-ons miss real duplicates

Excel / Sheets · Sheets Add-ons · Clean
Catches "Microsoft Corp" vs "Microsoft Corporation"
AI-recommended matching settings
Matches across multiple columns · Limited
Works on large files (50k+ rows) · Times out
No install or account needed
Shows duplicate clusters before deleting · Limited
Gives you 3 different results formats
Flexible data cleaning prior to fuzzy-matching
Strips "Inc.", "LLC", "Corp." before comparing · Limited · ✓ toggle on/off

A modern replacement for Excel's Fuzzy Lookup add-in

Microsoft's Fuzzy Lookup add-in is a 2017-era Windows-only download that slows to a crawl past a few thousand rows. And Google Sheets' Remove Duplicates only catches exact-match duplicates. Clean is the browser-based alternative — same fuzzy matching, no install, no Power Query, works on Mac and Windows, and handles up to 100,000 rows per file in both .xlsx and .csv.

See how Clean compares to Fuzzy Lookup →

Simple Pricing

Free for small files. Pay only for large Excel & CSV jobs.

Process up to 500 rows for free. Larger files are priced per run.

$0

Up to 500 rows

  • Fuzzy deduplication
  • Multi-column matching
  • Instant download
Most Popular

Large File

$1.99+

501 – 100,000 rows

  • Up to 3,000 rows — $1.99
  • Up to 10,000 rows — $4.99
  • Up to 25,000 rows — $9.99
  • Up to 50,000 rows — $19.99
  • Up to 100,000 rows — $29.99

Monthly Unlimited

$99.99/mo

Unlimited uploads

  • Up to 10 MB per file
  • Unlimited file uploads and deduplication runs
  • Priority customer support
  • Cancel anytime

NEED MORE?

Interested in deduping larger files?

Our API handles millions of rows with sub-second matching, bulk uploads, and programmatic access. Or reach out and we'll walk you through a custom solution — free of charge.

Learn more

Guides for cleaning your data

Step-by-step articles on deduplicating spreadsheets, CRM imports, and vendor exports.

FAQ

Frequently asked questions

How does Clean actually find duplicates that look different?

Most tools — including Excel — only catch duplicates when two records are character-for-character identical. "Microsoft Corp" and "Microsoft Corporation" are different strings, so they survive as two separate records.

Clean compares how similar two records are, not whether they're identical. It gives every pair a score between 0 and 1, and anything above your threshold gets flagged as a likely duplicate. Common real-world messiness — differences in casing, punctuation, abbreviations, and business suffixes like Inc., LLC, and Corp. — is handled automatically before the comparison runs.
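To make the idea concrete, here is a minimal sketch of normalize-then-score matching in Python. It is not Clean's actual algorithm, which this page does not document: the suffix list, the function names, and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list for illustration only; the real list is not documented here.
SUFFIXES = {"inc", "llc", "corp", "corporation", "ltd", "co"}

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing business suffixes."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def similarity(a: str, b: str) -> float:
    """Score two strings between 0 and 1 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_likely_duplicate(a: str, b: str, threshold: float = 0.78) -> bool:
    """Flag a pair when its score reaches the threshold (0.78 is an assumed default)."""
    return similarity(a, b) >= threshold
```

With this sketch, "Microsoft Corp" and "Microsoft Corporation" both normalize to "microsoft", so the pair scores 1.0 even though the raw strings differ.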

When you add a second column — matching on both company name and contact name together — Clean combines the similarity across both fields into a single decision. That's how "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" get grouped as the same person, even though neither field is an exact match on its own. Two weak signals become one strong one.
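One way to picture "two weak signals become one strong one" is to treat each field's score as rough evidence and add the evidence up. The sketch below does this by summing log-odds, a naive-Bayes-style combination. That choice is an assumption for illustration, as are the function names and the `difflib` similarity measure; this page does not say how Clean combines fields.

```python
import math
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    """Per-field score between 0 and 1 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def combine(scores, eps: float = 1e-6) -> float:
    """Combine per-field scores by summing their log-odds.

    Scores above 0.5 reinforce each other; scores below 0.5 pull the result down.
    """
    log_odds = 0.0
    for s in scores:
        s = min(max(s, eps), 1 - eps)  # clamp to avoid log(0) and division by zero
        log_odds += math.log(s / (1 - s))
    return 1 / (1 + math.exp(-log_odds))

row_a = {"contact": "Jen Walsh", "company": "Acme Corp"}
row_b = {"contact": "Jennifer Walsh", "company": "Acme Corporation"}

scores = [field_similarity(row_a[col], row_b[col]) for col in ("contact", "company")]
combined = combine(scores)
```

In this example each field on its own scores in the 0.7 to 0.8 range, while the combined value comes out higher than either one. Two fields that each score 0.75 combine to 0.9 under this scheme.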

What's the difference between deduplicating and reconciling two lists?

Deduplication finds duplicate records within a single file — two rows in the same spreadsheet that represent the same contact or company. Reconciliation compares two separate files — checking which rows in your new list already exist in your reference list, and which are genuinely new. Use Clean when you have one messy file to clean up before importing to your CRM. Use the reconcile tool when you have a new list — a trade show export, an Apollo download, a vendor list — and want to check it against an existing database before importing.
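The reconciliation side can be sketched as a best-match lookup: for each row in the new list, find the closest row in the reference list and decide whether it clears the threshold. This is an illustrative sketch only. The function names, the 0.78 threshold, and the `difflib` similarity measure are assumptions, not the reconcile tool's real behavior.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def reconcile(new_rows, reference_rows, threshold: float = 0.78):
    """Split new_rows into (already in reference, genuinely new)."""
    existing, genuinely_new = [], []
    for row in new_rows:
        # Closest reference row, or None when the reference list is empty.
        best = max(reference_rows, key=lambda ref: similarity(row, ref), default=None)
        if best is not None and similarity(row, best) >= threshold:
            existing.append((row, best))
        else:
            genuinely_new.append(row)
    return existing, genuinely_new

reference = ["Acme Corporation", "Globex Inc"]
incoming = ["Acme Corporatoin", "Initech"]  # first entry contains a typo
existing, genuinely_new = reconcile(incoming, reference)
```

Here the misspelled "Acme Corporatoin" is paired with "Acme Corporation" from the reference list, and "Initech" is reported as new.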

What's the difference between Clean and Excel's Remove Duplicates?

Excel only catches exact character matches, so "Microsoft Corp" and "Microsoft Corporation" are treated as completely different records. Clean uses fuzzy matching to score the similarity between strings, so it catches the variants that exact matching misses. Clean also lets you match across multiple columns at once, so "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" are correctly identified as the same person.

Is my data safe to upload?

Your file is processed in memory and deleted immediately after your session. It is never written to permanent storage, never shared, and never used for any purpose other than generating your results. You can verify this in our privacy policy.

What similarity threshold should I use?

For deduplication, 0.75–0.82 works well for most contact and company lists. Go higher (0.88+) if you want to be conservative and flag only very close matches. Stay at the low end (0.75) if your data is clean and you want to catch more variants.

What file formats are supported?

CSV, XLSX, and XLS. Maximum 10 MB per file. If your file is larger, contact us — we can run it via the API.

What do the three output formats mean for deduplication?

Unique sheet: one row per entity, duplicates removed. Clusters sheet: only the duplicate clusters with similarity scores, for manual review before merging. All rows scored sheet: your original file with a cluster ID, duplicate flag, and similarity score added to every row.
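The three outputs can be pictured as three views of one clustering result. The sketch below groups rows with a simple union-find over all pairs, then derives each view. Everything here is assumed for illustration: the field names, the pairwise `difflib` scoring, the 0.78 threshold, and the choice to score each row against the first row of its cluster. Comparing every pair is also only practical for small inputs.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster(rows, threshold: float = 0.78):
    """Return one cluster ID per row; the ID is the index of the cluster's first row."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if similarity(rows[i], rows[j]) >= threshold:
            parent[find(j)] = find(i)

    roots = [find(i) for i in range(len(rows))]
    first_index = {}
    for i, root in enumerate(roots):
        first_index.setdefault(root, i)
    return [first_index[root] for root in roots]

def build_outputs(rows, threshold: float = 0.78):
    ids = cluster(rows, threshold)
    sizes = {cid: ids.count(cid) for cid in set(ids)}
    # "All rows scored": every original row plus cluster ID, duplicate flag, and score.
    all_rows = [
        {"value": row, "cluster_id": cid, "is_duplicate": sizes[cid] > 1,
         "score": similarity(row, rows[cid])}
        for row, cid in zip(rows, ids)
    ]
    # "Clusters": only rows that belong to a cluster with more than one member.
    clusters = [r for r in all_rows if r["is_duplicate"]]
    # "Unique": the first row of each cluster.
    unique = [row for i, (row, cid) in enumerate(zip(rows, ids)) if cid == i]
    return unique, clusters, all_rows

unique, clusters, all_rows = build_outputs(["Jen Walsh", "Jenn Walsh", "Bob Smith"])
```

In this example the unique view keeps "Jen Walsh" and "Bob Smith", the clusters view holds the two Walsh rows, and the all-rows view still has three entries.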

Can Clean match on multiple columns?

Yes. Select company name and contact name together and Clean combines both signals into a single match decision. "Jen Walsh at Acme Corp" matching "Jennifer Walsh at Acme Corporation" works because the combined name and company similarity is strong even though neither field is an exact match on its own.

How is pricing calculated for Clean?

Pricing is based on the number of rows in your file, excluding the header row. Clean is free for files up to 500 rows with no account required. For larger files: $1.99 up to 3,000 rows, $4.99 up to 10,000 rows, $9.99 up to 25,000 rows, $19.99 up to 50,000 rows, and $29.99 up to 100,000 rows. You can preview your results before paying — payment is only required to download.
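As a quick way to estimate a run before uploading, the tiers above can be written as a lookup. The prices and row limits come from this page; the function names and the CSV row-counting helper are hypothetical and only illustrate the "excluding the header row" rule.

```python
import csv
import io

# (maximum data rows, price in USD), taken from the tiers listed above.
TIERS = [
    (500, 0.00),
    (3_000, 1.99),
    (10_000, 4.99),
    (25_000, 9.99),
    (50_000, 19.99),
    (100_000, 29.99),
]

def count_data_rows(csv_text: str) -> int:
    """Count rows in CSV text, excluding the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return max(len(rows) - 1, 0)

def price_for_rows(data_rows: int) -> float:
    """Return the per-run price for a file with the given number of data rows."""
    for max_rows, price in TIERS:
        if data_rows <= max_rows:
            return price
    raise ValueError("More than 100,000 rows is outside per-run pricing")
```

For example, a file with a header and 501 data rows falls in the $1.99 tier, and a 50,000-row file costs $19.99.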

Is this a replacement for Excel's Fuzzy Lookup add-in?

Yes. Microsoft's Fuzzy Lookup add-in for Excel is a 2017-era download that's slow on anything beyond a few thousand rows and only runs on Windows desktop Excel. Clean is the modern fuzzy lookup for Excel: it works in your browser, handles up to 100,000 rows per file, and supports both .xlsx and .csv files. No install, no plugin, no Power Query required.

How do I remove duplicates in Excel when the names are spelled differently?

Excel's built-in Remove Duplicates only catches character-for-character matches, so "Jen Walsh" and "Jennifer Walsh" survive as two separate rows. To catch spelling variants you need fuzzy matching. Upload your file to Clean, pick the name column (and a second column like company or email if you have one), and Clean scores every pair between 0 and 1. Anything above your threshold (typically 0.78) is grouped as the same person. Differences in casing, punctuation, abbreviations, and entity suffixes like Inc. or LLC are handled automatically before the comparison runs.

Can I dedupe a CSV file without using Excel or Google Sheets?

Yes. Clean runs entirely in your browser — drop your .csv file into the uploader, pick the column(s) to compare, and download the deduplicated file. No spreadsheet app, no Power Query, no add-in, no install. Works on Mac, Windows, Linux, and Chromebook.

How do I find duplicate customers across multiple columns at once?

Select two or more columns when you configure the run — for example, contact name plus company name, or first name plus last name plus email domain. Clean combines the similarity across every selected column into a single match decision, so two weak signals (a partial name match and a partial company match) can become one strong duplicate flag. This is how "Jen Walsh at Acme Corp" is correctly grouped with "Jennifer Walsh at Acme Corporation".

What's a better alternative to Power Query for fuzzy matching?

Power Query's fuzzy merge requires Excel for Windows desktop, has limited threshold control, and slows dramatically past a few thousand rows. Clean is browser-based, works on any OS, supports up to 100,000 rows per file, lets you tune the similarity threshold, and ships three output formats (clean file, clusters for review, all rows scored) — none of which Power Query offers natively.

How do I clean a contact list of 50,000 rows for free?

Clean's free tier covers files up to 500 rows. For a 50,000-row file you can preview the deduplicated results for free, including exactly which clusters Clean would flag, and pay $19.99 only when you download the cleaned file. If you process this volume regularly, or need to handle larger files, the underlying Similarity API handles millions of rows for $1.99 per 10k rows on a pay-as-you-go plan.