Map messy source columns to a known target schema — accurately, explainably, with zero config.
npm install infermap
infermap is a schema-mapping engine: give it any two field collections (records, CSVs, database tables) and it figures out which source field corresponds to which target field, with confidence scores and human-readable reasoning. Built as a faithful TypeScript port of the Python infermap package, with mapping decisions verified bit-for-bit by a shared golden-test parity suite.
map, apply, inspect, validatenoUncheckedIndexedAccess, exactOptionalPropertyTypes, full .d.tsYou have data with messy column names. You want it mapped to a clean canonical schema. Without infermap:
// 50 lines of brittle if/else, hardcoded synonyms, and regret
if (col === "fname" || col === "first_nm") canonical[i] = "first_name";
else if (col === "email_addr" || col === "e_mail" || col === "mail") canonical[i] = "email";
// ...
With infermap:
import { map } from "infermap";
const result = map(
{ records: [{ fname: "John", lname: "Doe", email_addr: "j@d.co", tel: "555-0100" }] },
{ records: [{ first_name: "", last_name: "", email: "", phone: "" }] }
);
for (const m of result.mappings) {
console.log(`${m.source} → ${m.target} (${m.confidence.toFixed(2)})`);
}
// fname → first_name (0.44)
// lname → last_name (0.48)
// email_addr → email (0.69)
// tel → phone (0.39)
Each mapping comes with a per-scorer confidence breakdown, so when something goes wrong you can see exactly which signal contributed.
npm install infermap
# or
pnpm add infermap
# or
yarn add infermap
Requires Node ≥ 20. The default entrypoint is edge-runtime compatible.
import { map } from "infermap";
const crm = [
{ fname: "John", lname: "Doe", email_addr: "j@d.co", signup_dt: "2024-01-15" },
{ fname: "Jane", lname: "Smith", email_addr: "j@s.co", signup_dt: "2024-02-20" },
];
const canonical = [
{ first_name: "", last_name: "", email: "", created_at: "" },
];
const result = map({ records: crm }, { records: canonical });
console.log(result.mappings);
// [
// { source: "fname", target: "first_name", confidence: 0.44, breakdown: {...}, reasoning: "..." },
// { source: "lname", target: "last_name", confidence: 0.48, breakdown: {...}, reasoning: "..." },
// { source: "email_addr", target: "email", confidence: 0.69, breakdown: {...}, reasoning: "..." },
// { source: "signup_dt", target: "created_at", confidence: 0.41, breakdown: {...}, reasoning: "..." },
// ]
map() accepts any of these shapes for both source and target:
type MapInput =
| SchemaInfo // pre-extracted
| { records: Array<Record<string, unknown>>; sourceName? } // plain records
| { csvText: string; sourceName? } // CSV as string
| { jsonText: string; sourceName? } // JSON array as string
| { schemaDefinition: string | object; sourceName? }; // JSON schema file
Node users can read files directly:
import { extractSchemaFromFile } from "infermap/node";
import { MapEngine } from "infermap";
const src = await extractSchemaFromFile("./crm.csv");
const tgt = await extractSchemaFromFile("./canonical.json");
const result = new MapEngine().mapSchemas(src, tgt);
Works in any Next.js context — Server Components, Route Handlers, Server Actions, Edge Functions. The default entrypoint has zero Node built-ins, so the Edge Runtime works without any special config.
// app/api/infer/route.ts
import { map, mapResultToReport } from "infermap";
export const runtime = "edge"; // remove if you need Node APIs
export async function POST(req: Request) {
const { sourceCsv, targetCsv } = await req.json();
const result = map(
{ csvText: sourceCsv },
{ csvText: targetCsv }
);
return Response.json(mapResultToReport(result));
}
For filesystem or database access, switch to Node runtime and import from infermap/node.
Optional Node-only providers. Install only the driver you need:
npm install better-sqlite3 # for sqlite://
npm install pg # for postgresql://
npm install @duckdb/node-api # for duckdb://
import { extractDbSchema } from "infermap/node";
const schema = await extractDbSchema(
"postgresql://user:pass@host/mydb",
{ table: "customers" }
);
Reweight scorers and extend the alias table via a JSON config object:
import { map } from "infermap";
const result = map(source, target, {
config: {
scorers: {
LLMScorer: { enabled: false },
FuzzyNameScorer: { weight: 0.3 },
},
aliases: {
order_id: ["order_num", "ord_no"],
customer_id: ["cust_id", "customer_number"],
},
},
});
You can also persist a computed mapping and reload it:
import { mapResultToConfigJson, fromConfig } from "infermap";
import { writeFile, readFile } from "node:fs/promises";
await writeFile("mapping.json", mapResultToConfigJson(result));
// later:
const restored = fromConfig(await readFile("mapping.json", "utf8"));
import { MapEngine, defaultScorers, defineScorer, makeScorerResult } from "infermap";
const domainScorer = defineScorer(
"DomainMatcher",
(source, target) => {
// return null to abstain, or a ScorerResult in [0, 1]
if (source.name.startsWith("cust_") && target.name.startsWith("customer_")) {
return makeScorerResult(0.9, "shared customer prefix");
}
return null;
},
0.6 // weight
);
const engine = new MapEngine({
scorers: [...defaultScorers(), domainScorer],
});
npx infermap map ./crm.csv ./canonical.csv
npx infermap inspect ./crm.csv
npx infermap map ./crm.csv ./canonical.csv --format json -o mapping.json
npx infermap apply ./crm.csv --config mapping.json --output renamed.csv
npx infermap validate ./crm.csv --config mapping.json --required email,id --strict
The CLI uses only node:util/parseArgs — no extra runtime deps.
This package is a faithful port of infermap on PyPI. Mapping decisions, confidence scores, and unmapped lists are verified to agree with the Python engine to 4 decimal places via shared golden tests that run on every CI build.
If a Python scorer changes, the golden generator must be re-run and the TS parity tests must pass before anything merges. You can’t accidentally ship drift. If you find a parity bug, please file an issue with both inputs and both outputs.
See the Python vs TypeScript wiki page for a feature parity matrix and migration guide.
| Path | Contents | Runtime |
|---|---|---|
infermap / infermap/core |
Types, engine, all 6 scorers, Hungarian assignment, in-memory / CSV / JSON / schema-file providers, JSON config loader, map() |
edge-safe |
infermap/node |
Filesystem file reader, DB providers (SQLite / Postgres / DuckDB) | Node only |