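GoldenMatch is also published as an npm package that closely mirrors the Python toolkit; the comparison table at the end of this page lists where the two differ (for example, the MCP server exposes 19 tools instead of 30).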
```bash
npm install goldenmatch
```
```typescript
import { dedupe } from "goldenmatch";

// Rows 1 and 2 share an email and zip but spell the name differently.
const rows = [
  { id: 1, name: "John Smith", email: "john@example.com", zip: "12345" },
  { id: 2, name: "Jon Smith", email: "john@example.com", zip: "12345" },
  { id: 3, name: "Jane Doe", email: "jane@example.com", zip: "54321" },
];

const result = dedupe(rows, {
  fuzzy: { name: 0.85 }, // fuzzy-compare the name column
  blocking: ["zip"], // only compare rows that share a zip code
  threshold: 0.85,
});

console.log(result.stats);
```
The package ships with two separate entry points so the core stays edge-safe and dependency-free:

- `goldenmatch`: edge-safe core. Works in browsers, Cloudflare Workers, Vercel Edge Runtime, Deno, Bun, and Node.
- `goldenmatch/node`: adds Node-only file I/O (CSV, JSON), HTTP servers, and DB connectors.

```typescript
// Edge-safe core (pure TS, no Node APIs)
import { dedupe, match, scoreStrings, applyTransforms } from "goldenmatch";

// Node-only additions
import { readFile, writeCsv, dedupeFile, startApiServer } from "goldenmatch/node";
```
`dedupe(rows, options)` deduplicates an array of rows. Its options and result types are:

```typescript
interface DedupeOptions {
  config?: GoldenMatchConfig;
  exact?: readonly string[];
  fuzzy?: Record<string, number>;
  blocking?: readonly string[];
  threshold?: number;
  llmScorer?: boolean;
}

interface DedupeResult {
  goldenRecords: readonly Row[];
  clusters: ReadonlyMap<number, ClusterInfo>;
  dupes: readonly Row[];
  unique: readonly Row[];
  stats: DedupeStats;
  scoredPairs: readonly ScoredPair[];
  config: GoldenMatchConfig;
}
```
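`DedupeResult` exposes both `scoredPairs` and `clusters`. One common way to get from the first to the second is to link every pair that scores at or above the threshold and take connected components with union-find. The sketch below illustrates that idea on plain numeric IDs; the `Pair` shape, `DisjointSet`, and `clusterPairs` are hypothetical, and GoldenMatch's own clustering step is not documented here and may differ.

```typescript
// Hypothetical pair shape for this sketch (not GoldenMatch's ScoredPair type).
interface Pair {
  a: number;
  b: number;
  score: number;
}

// Union-find (disjoint set) with path compression.
class DisjointSet {
  private parent = new Map<number, number>();

  find(x: number): number {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root)! !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the path directly at the root.
    let cur = x;
    while (cur !== root) {
      const next = this.parent.get(cur)!;
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }

  union(x: number, y: number): void {
    const rx = this.find(x);
    const ry = this.find(y);
    if (rx !== ry) this.parent.set(rx, ry);
  }
}

// Link pairs at or above the threshold; every ID ends up in exactly one cluster,
// including IDs that matched nothing (singleton clusters).
function clusterPairs(ids: readonly number[], pairs: readonly Pair[], threshold: number): number[][] {
  const ds = new DisjointSet();
  for (const id of ids) ds.find(id);
  for (const p of pairs) {
    if (p.score >= threshold) ds.union(p.a, p.b);
  }
  const groups = new Map<number, number[]>();
  for (const id of ids) {
    const root = ds.find(id);
    const group = groups.get(root) ?? [];
    group.push(id);
    groups.set(root, group);
  }
  return [...groups.values()];
}

const clusters = clusterPairs(
  [1, 2, 3, 4, 5],
  [
    { a: 1, b: 2, score: 0.93 },
    { a: 2, b: 3, score: 0.61 },
    { a: 4, b: 5, score: 0.88 },
  ],
  0.85,
);
console.log(clusters); // [ [ 1, 2 ], [ 3 ], [ 4, 5 ] ]
```

Because clusters are connected components, links are transitive: if A matches B and B matches C, all three land in one cluster even when A and C score low against each other.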
`match()` matches target records against a reference dataset and returns the matched pairs with confidence scores.
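As a rough illustration of reference matching, the sketch below scores each target against every reference value and keeps the best match at or above a threshold. It uses normalized Levenshtein similarity (1 minus the edit distance divided by the longer string's length) as a stand-in scorer. `levenshtein`, `levenshteinSimilarity`, `bestMatches`, and the `MatchPair` shape are hypothetical; the real `match()` signature and options are not shown on this page.

```typescript
// Hypothetical helpers; normalizing by the longer string's length is an assumption.

// Classic dynamic-programming edit distance (insert, delete, substitute each cost 1).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Map distance into [0, 1]: 1 means identical strings.
function levenshteinSimilarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

interface MatchPair {
  target: string;
  reference: string;
  score: number;
}

// For each target, keep the single best-scoring reference at or above the threshold.
function bestMatches(targets: readonly string[], references: readonly string[], threshold: number): MatchPair[] {
  const out: MatchPair[] = [];
  for (const target of targets) {
    let best: MatchPair | undefined;
    for (const reference of references) {
      const score = levenshteinSimilarity(target.toLowerCase(), reference.toLowerCase());
      if (score >= threshold && (best === undefined || score > best.score)) {
        best = { target, reference, score };
      }
    }
    if (best) out.push(best);
  }
  return out;
}

console.log(levenshteinSimilarity("kitten", "sitting").toFixed(4)); // 0.5714
console.log(bestMatches(["Jon Smith", "Jane Doe"], ["John Smith", "Janet Dole"], 0.85));
```

Here "Jon Smith" matches "John Smith" (one insertion over ten characters), while "Jane Doe" scores below 0.85 against both references and is left unmatched.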
`scoreStrings()` scores the similarity between two strings using a named scorer. Available scorers: `exact`, `jaro_winkler`, `levenshtein`, `token_sort`, `soundex_match`, `dice`, `jaccard`, `ensemble`.

```typescript
import { scoreStrings } from "goldenmatch";

const score = scoreStrings("MARTHA", "MARHTA", "jaro_winkler");
// 0.9611
```
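To show where the 0.9611 for MARTHA/MARHTA comes from, here is a self-contained sketch of the textbook Jaro-Winkler formula: a match window of floor(max length / 2) - 1, a prefix scale of 0.1, and a common prefix capped at 4 characters. The `jaro` and `jaroWinkler` functions are hypothetical and are not GoldenMatch's implementation, which may differ in edge cases.

```typescript
// Textbook Jaro-Winkler sketch; not GoldenMatch's implementation.

// Jaro similarity: count characters that match within a sliding window,
// then penalize matched characters that appear in a different order.
function jaro(s1: string, s2: string): number {
  if (s1 === s2) return 1;
  const len1 = s1.length;
  const len2 = s2.length;
  if (len1 === 0 || len2 === 0) return 0;

  const range = Math.max(0, Math.floor(Math.max(len1, len2) / 2) - 1);
  const matched1 = new Array<boolean>(len1).fill(false);
  const matched2 = new Array<boolean>(len2).fill(false);

  let matches = 0;
  for (let i = 0; i < len1; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(len2 - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both match sequences in order; each swapped pair is counted twice.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < len1; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(s1: string, s2: string, prefixScale = 0.1): number {
  const j = jaro(s1, s2);
  let prefix = 0;
  while (prefix < Math.min(4, s1.length, s2.length) && s1[prefix] === s2[prefix]) prefix++;
  return j + prefix * prefixScale * (1 - j);
}

console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(4)); // 0.9611
```

For MARTHA/MARHTA all 6 characters match and T/H form one transposition, so Jaro = (1 + 1 + 5/6) / 3 ≈ 0.9444; the shared prefix "MAR" then adds 3 × 0.1 × (1 - 0.9444) ≈ 0.0167.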
`applyTransforms()` applies a chain of normalization transforms to a value:

```typescript
import { applyTransforms } from "goldenmatch";

applyTransforms(" John Q. Smith ", ["strip", "lowercase", "alpha_only"]);
// "johnqsmith"
```
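A minimal sketch of how a named transform chain can be applied left to right, covering several of the transform names this page lists. `TRANSFORMS`, `resolveTransform`, and `applyChain` are hypothetical. The semantics assumed for `alpha_only` (drop every non-letter, including spaces) and for `substring:start:end` (JavaScript `slice` semantics) are guesses that reproduce the documented example but may not match GoldenMatch exactly.

```typescript
type Transform = (value: string) => string;

// Hypothetical implementations of some of the documented transform names.
const TRANSFORMS: Record<string, Transform> = {
  strip: (v) => v.trim(),
  lowercase: (v) => v.toLowerCase(),
  uppercase: (v) => v.toUpperCase(),
  // Assumption: keep only ASCII letters (spaces and punctuation are dropped).
  alpha_only: (v) => v.replace(/[^A-Za-z]/g, ""),
  digits_only: (v) => v.replace(/\D/g, ""),
  normalize_whitespace: (v) => v.trim().replace(/\s+/g, " "),
  token_sort: (v) => v.split(/\s+/).filter((t) => t !== "").sort().join(" "),
  first_token: (v) => v.trim().split(/\s+/)[0],
  last_token: (v) => {
    const tokens = v.trim().split(/\s+/);
    return tokens[tokens.length - 1];
  },
};

// Resolve a name, including the parameterized form "substring:start:end".
// Assumption: start/end follow String.prototype.slice semantics.
function resolveTransform(name: string): Transform {
  const [base, ...args] = name.split(":");
  if (base === "substring") {
    const [start, end] = args.map(Number);
    return (v) => v.slice(start, end);
  }
  const fn = TRANSFORMS[base];
  if (fn === undefined) throw new Error(`Unknown transform: ${name}`);
  return fn;
}

// Apply the chain left to right, feeding each output into the next transform.
function applyChain(value: string, names: readonly string[]): string {
  return names.reduce((acc, name) => resolveTransform(name)(acc), value);
}

console.log(applyChain(" John Q. Smith ", ["strip", "lowercase", "alpha_only"])); // johnqsmith
console.log(applyChain("  Smith   John ", ["normalize_whitespace", "token_sort"])); // John Smith
console.log(applyChain("12345-6789", ["digits_only", "substring:0:5"])); // 12345
```

Order matters in a chain: running `alpha_only` before `lowercase` gives the same result here, but running `first_token` after `alpha_only` would not, because the spaces it splits on are already gone.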
All scorers implement the same interface as the Python module `goldenmatch.core.scorer`:
| Scorer | Use case |
|---|---|
| jaro_winkler | Short strings (names). MARTHA/MARHTA -> 0.9611 |
| levenshtein | Normalized edit distance |
| token_sort | Word reordering tolerant (rapidfuzz-compatible) |
| soundex_match | Phonetic matching (1.0 if same code) |
| ensemble | Weighted combination of jaro_winkler + levenshtein + token_sort + dice |
| dice, jaccard | Set-based similarity for hex-encoded bloom filters (PPRL) |
| embedding | Cosine similarity of embeddings |
| record_embedding | Cosine similarity across whole records |
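The `dice` and `jaccard` rows operate on Bloom filters serialized as hex strings. The sketch below applies the standard set-based formulas to the filters' set bits: Dice = 2|A ∩ B| / (|A| + |B|) and Jaccard = |A ∩ B| / |A ∪ B|. The helper names are hypothetical, both inputs are assumed to have the same length, and how GoldenMatch builds the filters in the first place (hashing, filter size) is not covered here.

```typescript
// Hypothetical helpers; assumes both filters are hex strings of equal length.

// Count set bits in one byte.
function popcount8(byte: number): number {
  let count = 0;
  for (let b = byte; b !== 0; b >>= 1) count += b & 1;
  return count;
}

// Parse a hex string into bytes (two hex digits per byte).
function hexToBytes(hex: string): number[] {
  if (hex.length % 2 !== 0) throw new Error("hex string must have even length");
  const bytes: number[] = [];
  for (let i = 0; i < hex.length; i += 2) bytes.push(parseInt(hex.slice(i, i + 2), 16));
  return bytes;
}

// |A|, |B|, |A ∩ B| and |A ∪ B| measured in set bits.
function bitCounts(hexA: string, hexB: string) {
  const a = hexToBytes(hexA);
  const b = hexToBytes(hexB);
  if (a.length !== b.length) throw new Error("filters must be the same length");
  let sizeA = 0, sizeB = 0, inter = 0, union = 0;
  for (let i = 0; i < a.length; i++) {
    sizeA += popcount8(a[i]);
    sizeB += popcount8(b[i]);
    inter += popcount8(a[i] & b[i]);
    union += popcount8(a[i] | b[i]);
  }
  return { sizeA, sizeB, inter, union };
}

// Two all-zero filters score 0 by convention in this sketch.
function diceHex(hexA: string, hexB: string): number {
  const { sizeA, sizeB, inter } = bitCounts(hexA, hexB);
  return sizeA + sizeB === 0 ? 0 : (2 * inter) / (sizeA + sizeB);
}

function jaccardHex(hexA: string, hexB: string): number {
  const { inter, union } = bitCounts(hexA, hexB);
  return union === 0 ? 0 : inter / union;
}

// 0xf0 = 1111 0000 and 0xf8 = 1111 1000: 4 shared bits, sizes 4 and 5.
console.log(diceHex("f0", "f8").toFixed(4)); // 0.8889
console.log(jaccardHex("f0", "f8")); // 0.8
```

Dice weights the overlap against the average filter size and is always at least as large as Jaccard for the same pair, so thresholds tuned for one do not transfer directly to the other.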
Blocking strategies:

- `static`: single blocking key with transforms
- `multi_pass`: multiple blocking keys, union of blocks
- `sorted_neighborhood`: sliding window over sorted data
- `adaptive`: static + auto-split oversized blocks
- `ann`: approximate nearest neighbor (requires the `hnswlib-node` peer dep)
- `canopy`: TF-IDF canopy clustering
- `learned`: data-driven predicate selection

Golden-record merge strategies:

- `most_complete`: pick the longest string
- `majority_vote`: pick the most frequent value
- `source_priority`: pick the first non-null value from a priority list
- `most_recent`: pick the value with the most recent date
- `first_non_null`: pick the first non-null value

Transforms are applied at matchkey time and use the same names as the Python toolkit:
lowercase, uppercase, strip, strip_all, soundex, metaphone,
digits_only, alpha_only, normalize_whitespace, token_sort,
first_token, last_token, substring:start:end, qgram:n.
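To make the `sorted_neighborhood` blocking strategy concrete: sort rows by a blocking key and compare each row only with the rows that follow it inside a fixed-size window, so the number of candidate pairs grows roughly linearly with the data instead of quadratically. This is a generic sketch of the textbook technique; `Person`, `sortedNeighborhoodPairs`, the key function, and the window size are illustrative assumptions, not GoldenMatch's API or defaults.

```typescript
// Hypothetical record type for this sketch.
interface Person {
  id: number;
  name: string;
}

// Sort by a blocking key, then pair each row with the rows that follow it
// inside a window of `windowSize` rows (that is, windowSize - 1 neighbors).
function sortedNeighborhoodPairs<T>(
  rows: readonly T[],
  key: (row: T) => string,
  windowSize: number,
): Array<[T, T]> {
  const sorted = [...rows].sort((a, b) => key(a).localeCompare(key(b)));
  const pairs: Array<[T, T]> = [];
  for (let i = 0; i < sorted.length; i++) {
    const end = Math.min(i + windowSize, sorted.length);
    for (let j = i + 1; j < end; j++) {
      pairs.push([sorted[i], sorted[j]]);
    }
  }
  return pairs;
}

const people: Person[] = [
  { id: 1, name: "Smith, John" },
  { id: 2, name: "Doe, Jane" },
  { id: 3, name: "Smyth, Jon" },
  { id: 4, name: "Adams, Amy" },
];

// Key: lowercased name. With a window of 2, each row meets only its sorted neighbor.
const candidates = sortedNeighborhoodPairs(people, (p) => p.name.toLowerCase(), 2);
console.log(candidates.map(([a, b]) => `${a.id}-${b.id}`)); // [ '4-2', '2-1', '1-3' ]
```

Sorting brings "Smith, John" and "Smyth, Jon" next to each other, so the likely duplicate is still compared while 3 of the 6 possible pairs are skipped. A wider window catches more matches at the cost of more comparisons.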
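Each of the value-picking strategies listed above (`most_complete`, `majority_vote`, `source_priority`, `most_recent`, `first_non_null`) reduces one column's values within a cluster to a single value. The sketch below implements all five under stated assumptions: ties in majority voting go to the value seen first, completeness is measured by string length, empty strings count as missing, and source priority is an ordered list of source names. The `Candidate` shape and the functions are hypothetical, not GoldenMatch's API.

```typescript
// Hypothetical shape: one cluster member's value for a single column.
interface Candidate {
  value: string | null;
  source: string;
  updatedAt: string; // ISO-8601 date, used only by mostRecent
}

// Assumption: null and empty strings both count as missing.
function present(cs: readonly Candidate[]): Candidate[] {
  return cs.filter((c) => c.value !== null && c.value !== "");
}

function firstNonNull(cs: readonly Candidate[]): string | null {
  const hits = present(cs);
  return hits.length > 0 ? hits[0].value : null;
}

// "Most complete" approximated as the longest string.
function mostComplete(cs: readonly Candidate[]): string | null {
  let best: string | null = null;
  for (const c of present(cs)) {
    const v = c.value as string;
    if (best === null || v.length > best.length) best = v;
  }
  return best;
}

// Ties go to the value seen first, because Map preserves insertion order.
function majorityVote(cs: readonly Candidate[]): string | null {
  const counts = new Map<string, number>();
  for (const c of present(cs)) {
    const v = c.value as string;
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const [value, count] of counts) {
    if (count > bestCount) {
      best = value;
      bestCount = count;
    }
  }
  return best;
}

// Walk the sources in priority order and return the first one that has a value.
function sourcePriority(cs: readonly Candidate[], priority: readonly string[]): string | null {
  const hits = present(cs);
  for (const source of priority) {
    const hit = hits.find((c) => c.source === source);
    if (hit) return hit.value;
  }
  return null;
}

// Pick the value with the latest updatedAt timestamp.
function mostRecent(cs: readonly Candidate[]): string | null {
  let best: Candidate | null = null;
  for (const c of present(cs)) {
    if (best === null || Date.parse(c.updatedAt) > Date.parse(best.updatedAt)) best = c;
  }
  return best === null ? null : best.value;
}

const names: Candidate[] = [
  { value: null, source: "web", updatedAt: "2024-03-01" },
  { value: "Jon Smith", source: "crm", updatedAt: "2023-01-15" },
  { value: "John Smith", source: "billing", updatedAt: "2024-02-10" },
  { value: "Jon Smith", source: "support", updatedAt: "2022-06-30" },
];

console.log(firstNonNull(names)); // Jon Smith
console.log(mostComplete(names)); // John Smith
console.log(majorityVote(names)); // Jon Smith
console.log(sourcePriority(names, ["billing", "crm"])); // John Smith
console.log(mostRecent(names)); // John Smith
```

The same cluster yields different golden values depending on the strategy, which is why the choice is usually made per column.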
The npm package ships a `goldenmatch-js` binary:

```bash
# Dedupe a CSV
npx goldenmatch-js dedupe data.csv --output golden.csv

# Score two strings
npx goldenmatch-js score "MARTHA" "MARHTA" --scorer jaro_winkler
# jaro_winkler: 0.9611

# Match two datasets
npx goldenmatch-js match target.csv reference.csv -o matched.csv

# Profile a dataset
npx goldenmatch-js profile data.csv

# Launch the interactive TUI (requires the Ink peer deps)
npx goldenmatch-js tui data.csv
```
```bash
npx goldenmatch-js mcp-serve
```

Starts an MCP server that exposes 19 MCP tools over JSON-RPC on stdio.
```bash
npx goldenmatch-js serve --port 8000
```

Starts the REST API server. Endpoints: `/health`, `/dedupe`, `/match`, `/score`, `/explain`, `/profile`, `/clusters`, `/reviews`.
```bash
npx goldenmatch-js agent-serve --port 8200
```

Starts the A2A server. Its agent card at `/.well-known/agent.json` advertises 10 skills.
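A minimal sketch of reading the advertised skills from TypeScript, assuming the card follows the A2A `AgentCard` shape (a `skills` array whose entries carry `id` and `name`). The interfaces are trimmed-down assumptions, `skillLabels` and `fetchAgentCard` are hypothetical helpers, and the base URL is whatever host and port you passed to `agent-serve`. The sample card used in the usage line is made up and does not reflect the server's real skill list.

```typescript
// Trimmed-down, assumed view of an A2A AgentCard: only the fields used here.
interface AgentSkill {
  id: string;
  name: string;
  description?: string;
}

interface AgentCard {
  name: string;
  skills?: AgentSkill[];
}

// Turn the advertised skills into "id: name" labels.
function skillLabels(card: AgentCard): string[] {
  return (card.skills ?? []).map((s) => `${s.id}: ${s.name}`);
}

// Fetch the card from a running server, e.g. fetchAgentCard("http://localhost:8200").
// Uses the global fetch available in Node 18+, Deno, Bun, and browsers.
async function fetchAgentCard(baseUrl: string): Promise<AgentCard> {
  const res = await fetch(new URL("/.well-known/agent.json", baseUrl));
  if (!res.ok) throw new Error(`Agent card request failed with HTTP ${res.status}`);
  return (await res.json()) as AgentCard;
}

// Offline usage with a made-up card.
const sampleCard: AgentCard = {
  name: "example-agent",
  skills: [{ id: "example-skill", name: "Example skill" }],
};
console.log(skillLabels(sampleCard)); // [ 'example-skill: Example skill' ]
```

Against a running `agent-serve` instance, `skillLabels(await fetchAgentCard("http://localhost:8200"))` should list one label per advertised skill.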
```bash
npx goldenmatch-js tui
```

Launches the interactive TUI. Requires the Ink peer deps (see below).
All peer deps are optional. Install only what you need:
| Peer dep | Unlocks |
|---|---|
| `yaml` | YAML config file loading |
| `hnswlib-node` | Sub-linear ANN blocking (vs. brute force) |
| `@huggingface/transformers` | ONNX cross-encoder reranking (MiniLM) |
| `piscina` | Worker-thread parallel block scoring |
| `ink`, `react`, `ink-table`, `ink-select-input`, `ink-text-input`, `ink-spinner`, `ink-gradient` | Interactive TUI |
| `pg` | Postgres connector + sync |
| `@duckdb/node-api` | DuckDB connector |
| `snowflake-sdk` | Snowflake connector |
| `@google-cloud/bigquery` | BigQuery connector |
| `@databricks/sql` | Databricks connector |
See `packages/goldenmatch-js/examples/` for 11 end-to-end TypeScript examples covering dedupe, match, PPRL, streaming, graph ER, Fellegi-Sunter, and more.
| Feature | Python | TypeScript |
|---|---|---|
| Core matching | Polars + rapidfuzz | Pure TS |
| Fellegi-Sunter | Yes | Yes |
| PPRL | SHA-256 | SHA-256 (interop verified byte-for-byte) |
| Graph ER | Yes | Yes |
| LLM scorer | Yes | Yes (via fetch, edge-safe) |
| Cross-encoder | sentence-transformers | @huggingface/transformers (ONNX) |
| ANN blocking | FAISS | hnswlib-node |
| Parallel scoring | Threads + Ray | piscina worker threads |
| Interactive UI | Textual TUI | Ink TUI |
| MCP server | 30 tools | 19 tools |
| REST API | Yes | Yes |
| A2A server | Yes | Yes |
| YAML configs | Yes | Yes (round-trippable) |
| Edge-safe core | No | Yes |