Clean — by Similarity API

Deduplicate your contact list in minutes

Drop your file here or browse

CSV, XLSX, or XLS · up to 10 MB

Multi-column matching — name, email, address & more

How It Works

Three simple steps

Upload

Drop your CSV or Excel file — no signup needed.

Review

Pick columns, adjust settings, preview matches.

Download

Get your clean, deduplicated file instantly.

Why Clean

What other tools miss

Excel / Sheets · Sheets Add-ons · Clean
Catches "Microsoft Corp" vs "Microsoft Corporation"
Matches across multiple columns · Limited
Works on large files (50k+ rows) · Times out
No install or account needed
Shows duplicate clusters before deleting · Limited
Download all results (clean, flagged & review)
Flexible data cleaning prior to fuzzy-matching
Strips "Inc.", "LLC", "Corp." before comparing · Limited · ✓ toggle on/off

Simple Pricing

Start free. Pay as you grow.

Process up to 1,000 rows for free. Larger files are priced per run.

$0

Up to 1,000 rows

  • Fuzzy deduplication
  • Multi-column matching
  • Instant download
Most Popular

Large File

$4.99+

1,001 – 100,000 rows

  • Up to 3,000 rows — $4.99
  • Up to 10,000 rows — $9.99
  • Up to 25,000 rows — $19.99
  • Up to 50,000 rows — $29.99
  • Up to 100,000 rows — $39.99

Monthly Unlimited

$99.99/mo

Unlimited uploads

  • Up to 10 MB per file
  • Unlimited file uploads and deduplication runs
  • Priority customer support
  • Cancel anytime

NEED MORE?

Interested in deduping larger files?

Our API handles millions of rows with sub-second matching, bulk uploads, and programmatic access. Or reach out and we'll walk you through a custom solution — free of charge.

Learn more

Guides for cleaning your data

Step-by-step articles on deduplicating spreadsheets, CRM imports, and vendor exports.

FAQ

Frequently asked questions

How does Clean actually find duplicates that look different?

Most tools — including Excel — only catch duplicates when two records are character-for-character identical. "Microsoft Corp" and "Microsoft Corporation" are different strings, so they survive as two separate records.

Clean compares how similar two records are, not whether they're identical. It gives every pair a score between 0 and 1, and anything above your threshold gets flagged as a likely duplicate. Common real-world messiness — differences in casing, punctuation, abbreviations, and business suffixes like Inc., LLC, and Corp. — is handled automatically before the comparison runs.
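To make the idea concrete, here is a minimal sketch of normalize-then-score matching. It is not Clean's actual algorithm: it assumes Python's standard-library difflib as a stand-in similarity measure, and the suffix list, the function names, and the 0.80 default threshold are hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list for illustration only.
SUFFIXES = {"inc", "llc", "corp", "corporation", "ltd", "co"}

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop business suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def similarity(a: str, b: str) -> float:
    """Score two strings between 0 and 1 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_likely_duplicate(a: str, b: str, threshold: float = 0.80) -> bool:
    """Flag the pair when its score is above the threshold."""
    return similarity(a, b) > threshold

# Both names normalize to "microsoft", so the score is 1.0.
print(similarity("Microsoft Corp", "Microsoft Corporation"))
print(is_likely_duplicate("Microsoft Corp", "Apple Inc."))  # False
```

An exact comparison would treat the two Microsoft strings as different records; once the suffix is stripped they are identical.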

When you add a second column — matching on both company name and contact name together — Clean combines the similarity across both fields into a single decision. That's how "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" get grouped as the same person, even though neither field is an exact match on its own. Two weak signals become one strong one.
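One simple way to picture a combined decision is an average of per-column scores. The sketch below assumes equal weights and uses difflib from the Python standard library; how Clean actually weighs columns is not stated here, and the record fields and the third example record are hypothetical.

```python
from difflib import SequenceMatcher

def field_score(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 1 for one field."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def combined_score(rec_a: dict, rec_b: dict, columns: list[str]) -> float:
    """Average the per-column scores (equal weights are an assumption)."""
    scores = [field_score(rec_a[c], rec_b[c]) for c in columns]
    return sum(scores) / len(scores)

a = {"contact": "Jen Walsh", "company": "Acme Corp"}
b = {"contact": "Jennifer Walsh", "company": "Acme Corporation"}
c = {"contact": "Jen Walsh", "company": "Globex Inc"}  # hypothetical other person

columns = ["contact", "company"]
print(combined_score(a, b, columns))  # both fields are close, neither is exact
print(combined_score(a, c, columns))  # name is exact, company is unrelated
```

In this sketch the a/b pair scores higher than the a/c pair even though a and c share an identical contact name, because the second column pulls the unrelated company down.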

What's the difference between deduplicating and reconciling two lists?

Deduplication finds duplicate records within a single file — two rows in the same spreadsheet that represent the same contact or company. Reconciliation compares two separate files — checking which rows in your new list already exist in your reference list, and which are genuinely new. Use Clean when you have one messy file to clean up before importing to your CRM. Use the reconcile tool when you have a new list — a trade show export, an Apollo download, a vendor list — and want to check it against an existing database before importing.

What's the difference between Clean and Excel's Remove Duplicates?

Excel only catches exact character matches, so "Microsoft Corp" and "Microsoft Corporation" are treated as completely different records. Clean uses fuzzy matching to score similarity between strings, so it catches the variants that exact matching misses. Clean also lets you match across multiple columns simultaneously, so "Jen Walsh at Acme Corp" and "Jennifer Walsh at Acme Corporation" are correctly identified as the same person.

What's the difference between the reconcile tool and VLOOKUP?

VLOOKUP only matches on exact values — "Jen Walsh" and "Jennifer Walsh" return no match. The reconcile tool scores similarity between strings, so name variants, abbreviations, and company formatting differences are all caught. The reconcile tool also matches on multiple columns simultaneously, so "Jen Walsh at Acme Corp" correctly matches "Jennifer Walsh at Acme Corporation" even though neither field is identical on its own.

Is my data safe to upload?

Your file is processed in memory and deleted immediately after your session. It is never written to permanent storage, never shared, and never used for any purpose other than generating your results. You can verify this in our privacy policy.

What similarity threshold should I use?

For deduplication, 0.75–0.82 works well for most contact and company lists. For reconciling two lists, the default is slightly higher at 0.80 — because a false positive here means suppressing a genuinely new contact, which is more damaging than missing a duplicate. Go higher (0.88+) if you want to be conservative. Go lower (0.75) if your data is clean and you want to catch more variants.
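To see how the threshold changes what gets flagged, the sketch below scores a few sample pairs and filters them at three thresholds. It assumes difflib from the Python standard library as the similarity measure and applies no suffix stripping, so the scores are illustrative and will not match Clean's own numbers. The sample pairs are made up.

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Made-up sample pairs, from near-identical to unrelated.
pairs = [
    ("Acme Corp", "Acme Corp."),
    ("Acme Corp", "Acme Corp Inc"),
    ("Jen Walsh", "Jennifer Walsh"),
    ("Acme Corp", "Globex Inc"),
]

def flagged_at(threshold: float) -> list[tuple[str, str]]:
    """Return the pairs whose score is above the threshold."""
    return [pair for pair in pairs if score(*pair) > threshold]

for threshold in (0.75, 0.80, 0.88):
    print(threshold, flagged_at(threshold))
```

With these samples, 0.75 flags three pairs, 0.80 flags two, and 0.88 flags only the near-identical one. A higher threshold trades missed variants for fewer false positives.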

What file formats are supported?

CSV, XLSX, and XLS. Maximum 10 MB per file. If your file is larger, contact us — we can run it via the API.

What do the three output formats mean for deduplication?

  • Clean file: one row per entity, duplicates removed. Ready to import to your CRM.
  • Flagged file: your original file with a cluster ID and duplicate flag column added.
  • Review sheet: only the duplicate clusters with similarity scores, for manual review before merging.
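The three outputs can be pictured as three views of the same clustering. The sketch below is an illustration under stated assumptions, not Clean's implementation: it groups rows with a small union-find, scores pairs with difflib from the Python standard library, keeps the first row of each cluster as the survivor, and leaves similarity scores out of the review view for brevity.

```python
from difflib import SequenceMatcher
from itertools import combinations

def score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster(rows: list[str], threshold: float = 0.80) -> list[int]:
    """Give each row a cluster ID; rows linked by a score above the threshold share one."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if score(rows[i], rows[j]) > threshold:
            ri, rj = find(i), find(j)
            parent[max(ri, rj)] = min(ri, rj)  # lowest row index becomes the cluster ID
    return [find(i) for i in range(len(rows))]

rows = ["Acme Corporation", "Acme Corporation.", "Globex", "ACME corporation"]
ids = cluster(rows)

# Flagged view: every original row plus its cluster ID and a duplicate flag.
flagged = [(row, cid, ids.count(cid) > 1) for row, cid in zip(rows, ids)]
# Clean view: the first row of each cluster (keeping the first is an assumption).
clean = [row for i, (row, cid) in enumerate(zip(rows, ids)) if ids.index(cid) == i]
# Review view: only rows that belong to a cluster with more than one member.
review = [(row, cid) for row, cid, is_dup in flagged if is_dup]

print(clean)  # ['Acme Corporation', 'Globex']
```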

What do the three output formats mean for reconciliation?

  • Net-new file: rows from your new list that had no match in your reference list. These contacts don't exist yet, so they are safe to import.
  • Matched file: rows from your new list that matched something in your reference list, with the best match and similarity score added, for review or suppression.
  • Annotated file: every row from your new list with three added columns (match status, similarity score, and the best match found). Useful if you want to make your own decisions on borderline cases.
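As a rough sketch of how the three reconciliation outputs relate, the code below annotates each new row with its best reference match and then splits the result. It assumes single-column matching with difflib from the Python standard library and a non-empty reference list. The function name, field names, and sample names are hypothetical.

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def reconcile(new_rows: list[str], reference_rows: list[str],
              threshold: float = 0.80) -> list[dict]:
    """Annotate each new row with match status, score, and best reference match."""
    annotated = []
    for row in new_rows:
        best = max(reference_rows, key=lambda ref: score(row, ref))
        best_score = score(row, best)
        status = "matched" if best_score > threshold else "net-new"
        annotated.append({"row": row, "status": status,
                          "score": round(best_score, 2), "best_match": best})
    return annotated

new_list = ["Jennifer Walsh", "Priya Natarajan"]
reference = ["Jenifer Walsh", "Tom Okafor"]

annotated = reconcile(new_list, reference)                            # annotated view
matched = [r for r in annotated if r["status"] == "matched"]          # matched view
net_new = [r["row"] for r in annotated if r["status"] == "net-new"]   # net-new view

print(net_new)  # ['Priya Natarajan']
```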

Can Clean and the reconcile tool match on multiple columns?

Yes — for both tools. Select company name and contact name together and the tool combines both signals into a single match decision. "Jen Walsh at Acme Corp" matching "Jennifer Walsh at Acme Corporation" works because the combined name and company similarity is strong even though neither field is an exact match on its own. For reconciliation, you select matching columns independently in each file, so the column names don't need to be the same across your two files.

How is pricing calculated for Clean?

Pricing is based on the number of rows in your file, excluding the header row. Clean is free for files up to 1,000 rows with no account required. For larger files: $4.99 up to 3,000 rows, $9.99 up to 10,000 rows, $19.99 up to 25,000 rows, $29.99 up to 50,000 rows, and $39.99 up to 100,000 rows. You can preview your results before paying — payment is only required to download.

How is pricing calculated for the reconcile tool?

Pricing is based on the combined row count across both files — File A rows plus File B rows, excluding headers. For example, a 400-row trade show export checked against a 2,000-row CRM export counts as 2,400 rows total. Free for combined totals up to 1,000 rows with no account required. For larger combinations: $4.99 up to 3,000 rows combined, $9.99 up to 10,000 rows, $19.99 up to 25,000 rows, $29.99 up to 50,000 rows, and $39.99 up to 100,000 rows. You can preview your results before paying — payment is only required to download.
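The per-run tiers in the two pricing answers above amount to a simple lookup. The sketch below encodes the listed tiers; the function names are hypothetical, and files over 100,000 rows are treated as outside per-run pricing.

```python
# (maximum rows, price in USD) for each per-run tier listed above.
TIERS = [
    (1_000, 0.00),
    (3_000, 4.99),
    (10_000, 9.99),
    (25_000, 19.99),
    (50_000, 29.99),
    (100_000, 39.99),
]

def price_for_rows(rows: int) -> float:
    """Per-run price for a row count that excludes header rows."""
    for max_rows, price in TIERS:
        if rows <= max_rows:
            return price
    raise ValueError("Over 100,000 rows: not covered by per-run pricing")

def reconcile_price(rows_a: int, rows_b: int) -> float:
    """Reconciliation is priced on the combined row count of both files."""
    return price_for_rows(rows_a + rows_b)

print(price_for_rows(1_000))          # 0.0
print(reconcile_price(400, 2_000))    # 4.99, since 2,400 rows falls in the 3,000-row tier
```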