ts-duckling
A tiny, deterministic entity extractor for TypeScript.
Extract structured data, render rich highlights, and redact PII from free-form text, with no ML and no network calls.
import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";
// Extract structured entities from a chat message
const entities = Duckling().extract(
"Hey! Meet me at Times Square tomorrow at 3pm. My email is alex@company.io",
);
// → [{ kind: "location", text: "Times Square", ... },
// { kind: "time", text: "tomorrow at 3pm", ... },
// { kind: "email", text: "alex@company.io", ... }]
// Redact PII in one line
Duckling(PIIParsers).redact(
"Contact alex@company.io, SSN 078-05-1120, or call +14155552671",
);
// → "Contact ███████████████, SSN ███████████, or call ████████████"
Overview
ts-duckling uses parser combinator grammars (no ML models, no HTTP) to extract structured entities from text. It is inspired by Facebook's Duckling, but built in pure TypeScript and running anywhere: Deno, Node, or the browser.
- Deterministic: the same input always produces the same output
- Typed: parser selection narrows the return type automatically
- Composable: bring your own parsers alongside the built-in ones
- Self-contained: no native dependencies, no runtime downloads, no network calls
- Runs everywhere: Deno, Node.js, and browsers (see the live playground)
Documentation
Installation
Deno / JSR
import { Duckling } from "jsr:@claudiu-ceia/ts-duckling";
Or add to your import map:
deno add jsr:@claudiu-ceia/ts-duckling
npm
npx jsr add @claudiu-ceia/ts-duckling
Getting started
Extract entities
Call Duckling() with no arguments to use all 15 built-in parsers:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const msg =
"Hey! I'll be in Germany next Friday at 5pm. Shoot me a message at alex@company.io or visit https://example.com/invite";
for (const e of Duckling().extract(msg)) {
console.log(e.kind, e.text);
}
// location Germany
// time next Friday at 5pm
// email alex@company.io
// url https://example.com/invite
Each entity carries structured data:
// entities[0]
{
kind: "location",
value: { location: "Germany" },
start: 16,
end: 23,
text: "Germany"
}
Pick specific parsers
Pass an array of parsers to narrow both what gets extracted and the return type:
import { Duckling, Email, Time, URL } from "@claudiu-ceia/ts-duckling";
const entities = Duckling([Email.parser, URL.parser, Time.parser]).extract(
"Ping me at alex@company.io or https://meet.com - available tomorrow at 2pm",
);
// entities: (EmailEntity | URLEntity | TimeEntity)[]
Redact PII
Use .redact() to replace matched entity spans with a mask character:
import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";
// Redact all PII (email, phone, IP, SSN, credit card, UUID, API key)
Duckling(PIIParsers).redact(
"Patient email: john.doe@clinic.org, SSN 078-05-1120, phone +14155552671",
);
// → "Patient email: ███████████████████, SSN ███████████, phone ████████████"
// Custom mask character
Duckling(PIIParsers).redact("Call +14155552671", { mask: "X" });
// → "Call XXXXXXXXXXXX"
// Redact only specific kinds
Duckling(PIIParsers).redact(
"Contact john.doe@clinic.org, SSN 078-05-1120",
{ kinds: ["ssn"] },
);
// → "Contact john.doe@clinic.org, SSN ███████████"
Render entities
Use .render() to replace entity spans via a callback, which is handy for turning
plain-text messages into HTML with highlighted or linked entities:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const msg =
"Hey! Meet at Times Square tomorrow at 3pm, email me at alex@company.io or check https://example.com/rsvp";
const html = Duckling().render(msg, ({ entity, children }) => {
switch (entity.kind) {
case "url":
return `<a href="${children}">${children}</a>`;
case "email":
return `<a href="mailto:${children}">${children}</a>`;
default:
return `<mark data-kind="${entity.kind}">${children}</mark>`;
}
});
// → 'Hey! Meet at <mark data-kind="location">Times Square</mark>
// <mark data-kind="time">tomorrow at 3pm</mark>, email me at
// <a href="mailto:alex@company.io">alex@company.io</a> or check
// <a href="https://example.com/rsvp">https://example.com/rsvp</a>'
Nested entities (e.g. an SSN containing quantity sub-parts) are rendered inside-out: inner entities are transformed first, and the parent receives the result:
import { Duckling, Quantity, SSN } from "@claudiu-ceia/ts-duckling";
Duckling([Quantity.parser, SSN.parser]).render(
"SSN 123-45-6789",
({ entity, children }) => `<${entity.kind}>${children}</${entity.kind}>`,
);
// → "SSN <ssn><quantity>123</quantity>-<quantity>45</quantity>-<quantity>6789</quantity></ssn>"
Return undefined to leave a span unchanged, which is useful for selective rendering:
import { Duckling } from "@claudiu-ceia/ts-duckling";
// Only make URLs clickable, leave everything else as plain text
Duckling().render(
"Visit https://example.com - event on next Friday at 5pm in Germany",
({ entity, children }) => {
if (entity.kind === "url") return `<a href="${children}">${children}</a>`;
return undefined;
},
);
// → 'Visit <a href="https://example.com">https://example.com</a> - event on next Friday at 5pm in Germany'
Map entities to components
Use .renderMap() when you need an array of segments instead of a single
string. This is ideal for React, Preact, Solid, or any framework that renders element
trees:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const msg = "Hey! I'm at Times Square, email me at alex@company.io";
const segments = Duckling().renderMap<JSX.Element>(
msg,
({ entity, children }) => (
<mark key={entity.start} data-kind={entity.kind}>
{children}
</mark>
),
);
// → ["Hey! I'm at ", <mark data-kind="location">Times Square</mark>,
// ", email me at ", <mark data-kind="email">alex@company.io</mark>]
// Drop it straight into a component
function HighlightedMessage({ text }: { text: string }) {
const segments = Duckling().renderMap<JSX.Element>(
text,
({ entity, children }) => {
switch (entity.kind) {
case "url":
return <a href={entity.text}>{children}</a>;
case "email":
return <a href={`mailto:${entity.text}`}>{children}</a>;
case "time":
return <time>{children}</time>;
default:
return <mark data-kind={entity.kind}>{children}</mark>;
}
},
);
return <p>{segments}</p>;
}
Like .render(), nested entities are handled automatically: child spans are
mapped first, and the parent callback receives the already-mapped children as
(string | R)[].
Custom entities
Define a parser that returns an Entity, then pass it to Duckling:
import { createLanguage, map, type Parser, regex } from "@claudiu-ceia/combine";
import { Duckling, ent, type Entity } from "@claudiu-ceia/ts-duckling";
type HashtagEntity = Entity<"hashtag", { tag: string }>;
type HashtagLanguage = {
Full: Parser<HashtagEntity>;
parser: Parser<HashtagEntity>;
};
const Hashtag = createLanguage<HashtagLanguage>({
Full: () =>
map(
regex(/#[A-Za-z0-9_]{2,64}/, "hashtag"),
(m, b, a) => ent({ tag: m.slice(1) }, "hashtag", b, a),
),
parser: (s) => s.Full,
});
const entities = Duckling([Hashtag.parser]).extract("hello #duckling");
// → [{ kind: "hashtag", value: { tag: "duckling" }, start: 6, end: 15, text: "#duckling" }]
Custom parsers compose freely with the built-in ones:
import { Email } from "@claudiu-ceia/ts-duckling";
const entities = Duckling([Email.parser, Hashtag.parser]).extract(
"Email alex@company.io with #feedback",
);
// entities: (EmailEntity | HashtagEntity)[]
Supported entities
| Entity | Kind | Example match | Notes |
|---|---|---|---|
| Time | time | tomorrow at 3pm, 2024-01-15T10:30:00Z | Relative, day-of-week, ISO timestamps |
| Range | range | 2020-2024, 20°C to 30°C | Time, year, and temperature ranges |
| Temperature | temperature | 72°F, 20 celsius | Fahrenheit and Celsius |
| Quantity | quantity | 5 kg, 100 miles | Units of measurement |
| Location | location | United States, Germany | Countries (dataset-backed) |
| URL | url | https://example.com/path | Full URLs with TLD validation |
| Email | email | user@example.com | Standard email addresses |
| Institution | institution | University of Oxford | Known institutions |
| Language | language | English, Japanese | Language names (dataset-backed) |
| Phone | phone | +14155552671 | E.164-ish phone numbers |
| IP address | ip_address | 192.168.1.1, ::1 | IPv4 + IPv6 full form |
| SSN | ssn | 123-45-6789 | US Social Security Numbers |
| Credit card | credit_card | 4111111111111111 | Luhn-validated card numbers |
| UUID | uuid | 550e8400-e29b-41d4-a716-446655440000 | RFC 4122 UUIDs |
| API key | api_key | sk-abc123..., AKIA... | Common provider prefixes |
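The table notes that credit card matches are Luhn-validated. As a rough illustration of what that check involves, here is a minimal standalone sketch of the standard Luhn checksum. It is not taken from ts-duckling, and `passesLuhn` is a hypothetical name used only for this example.

```typescript
// Standard Luhn checksum over a string of digits (illustrative sketch only).
function passesLuhn(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false; // digits only, no separators
  let sum = 0;
  let doubleIt = false; // every second digit, counted from the right, is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" has char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

const cardLooksValid = passesLuhn("4111111111111111"); // the example number from the table
const typoLooksValid = passesLuhn("4111111111111112"); // last digit changed
```

A checksum like this only rejects digit strings that cannot be card numbers; it says nothing about whether a card actually exists.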
API reference
Duckling()
function Duckling(): { extract; render; renderMap; redact };
function Duckling<T>(parsers: ParserTuple<T>): {
extract;
render;
renderMap;
redact;
};
Creates an extractor/renderer/redactor. Without arguments, it uses all 15 built-in
parsers and .extract() returns AnyEntity[]. When given an explicit parser array, the
return type narrows to the union of those entity types.
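The narrowing described above can be illustrated without the library. The sketch below is a toy model under stated assumptions: `Ent`, `MiniParser`, `EntityOf`, `fromRegex`, and `extractWith` are hypothetical names, and the real `ParserTuple` type in ts-duckling may be defined quite differently.

```typescript
// Hypothetical stand-ins for illustration; these are not the ts-duckling types.
type Ent<K extends string, V> = {
  kind: K;
  value: V;
  start: number;
  end: number;
  text: string;
};
type MiniParser<E> = (text: string) => E[];
type EmailEnt = Ent<"email", { email: string }>;
type HashtagEnt = Ent<"hashtag", { tag: string }>;

// Build a toy parser from a regex source and a value mapper.
function fromRegex<K extends string, V>(
  kind: K,
  source: string,
  toValue: (match: string) => V,
): MiniParser<Ent<K, V>> {
  return (text) => {
    const re = new RegExp(source, "g"); // fresh regex per call, no shared lastIndex
    const found: Ent<K, V>[] = [];
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      const start = m.index;
      found.push({ kind, value: toValue(m[0]), start, end: start + m[0].length, text: m[0] });
    }
    return found;
  };
}

const emailParser: MiniParser<EmailEnt> = fromRegex(
  "email",
  "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+",
  (m) => ({ email: m }),
);
const hashtagParser: MiniParser<HashtagEnt> = fromRegex(
  "hashtag",
  "#[A-Za-z0-9_]{2,64}",
  (m) => ({ tag: m.slice(1) }),
);

// Distributes over a union of parsers and collects the entity types they produce.
type EntityOf<P> = P extends MiniParser<infer E> ? E : never;

function extractWith<P extends MiniParser<any>[]>(
  parsers: P,
  text: string,
): EntityOf<P[number]>[] {
  const out: any[] = []; // loosely typed inside; the signature carries the narrowing
  parsers.forEach((parse) => out.push(...parse(text)));
  return out.sort((a, b) => a.start - b.start);
}

const found = extractWith([emailParser, hashtagParser], "Email alex@company.io with #feedback");
// found has type (EmailEnt | HashtagEnt)[]
const tags: string[] = [];
found.forEach((e) => {
  if (e.kind === "hashtag") tags.push(e.value.tag); // narrowed to HashtagEnt here
});
```

Because `kind` is a literal type, checking `e.kind` is enough for the compiler to pick the matching `value` shape.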
.extract(text)
extract(text: string): Entity[]
Scans text and returns all matched entities, each with kind, value,
start, end, and text fields. Entities are returned in order of appearance.
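To make the field layout concrete, the following sketch works on a hand-written entity that mirrors the "Germany" example shown earlier in this README. It does not call the library. `EntityLike`, `spanText`, and `groupByKind` are hypothetical helpers, and the exclusive `end` offset is inferred from that example rather than from a stated guarantee.

```typescript
// Local stand-in for the entity shape described above (not imported from the library).
interface EntityLike {
  kind: string;
  value: unknown;
  start: number;
  end: number;
  text: string;
}

const input =
  "Hey! I'll be in Germany next Friday at 5pm. Shoot me a message at alex@company.io or visit https://example.com/invite";

// Hand-written sample copied from the README's "Germany" entity.
const germany: EntityLike = {
  kind: "location",
  value: { location: "Germany" },
  start: 16,
  end: 23,
  text: "Germany",
};

// In that example `end` is exclusive, so slicing [start, end) recovers `text`.
function spanText(source: string, e: EntityLike): string {
  return source.slice(e.start, e.end);
}

// Group entities by kind, keeping order of appearance within each group.
function groupByKind<E extends EntityLike>(entities: E[]): Record<string, E[]> {
  const groups: Record<string, E[]> = {};
  for (let i = 0; i < entities.length; i++) {
    const e = entities[i];
    if (!groups[e.kind]) groups[e.kind] = [];
    groups[e.kind].push(e);
  }
  return groups;
}

const recovered = spanText(input, germany);
const byKind = groupByKind([germany]);
```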
.render(text, fn)
render(text: string, fn: RenderFn<Entity>): string
Extracts entities, arranges them into a span tree (wider spans parent narrower
ones), and calls fn for each entity node. The callback receives the entity and
the already-rendered text of its children. Return a replacement string, or
undefined to leave the span as-is.
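The span-tree behavior described above can be sketched in a few lines. This is an illustrative model, not the library's implementation. `SpanLike`, `buildSpanTree`, and `renderSpans` are hypothetical names. Dropping partially overlapping spans and falling back to the rendered children when the callback returns undefined are assumptions of this sketch.

```typescript
interface SpanLike { kind: string; start: number; end: number }
interface SpanNode { span: SpanLike; children: SpanNode[] }
type SpanFn = (ctx: { entity: SpanLike; children: string }) => string | undefined;

// Sort by start, widest first, then nest each span under the nearest span that contains it.
function buildSpanTree(spans: SpanLike[]): SpanNode[] {
  const sorted = spans.slice().sort((a, b) => a.start - b.start || b.end - a.end);
  const roots: SpanNode[] = [];
  const stack: SpanNode[] = [];
  for (let i = 0; i < sorted.length; i++) {
    const span = sorted[i];
    // Pop every open span that ends before this one begins.
    while (stack.length > 0 && span.start >= stack[stack.length - 1].span.end) stack.pop();
    const top: SpanNode | undefined = stack[stack.length - 1];
    if (top && span.end > top.span.end) continue; // partial overlap: dropped (assumption)
    const node: SpanNode = { span, children: [] };
    (top ? top.children : roots).push(node);
    stack.push(node);
  }
  return roots;
}

// Render children first, then hand their rendered text to the callback.
function renderNodes(text: string, nodes: SpanNode[], from: number, to: number, fn: SpanFn): string {
  let out = "";
  let cursor = from;
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i];
    out += text.slice(cursor, node.span.start); // plain text before this span
    const inner = renderNodes(text, node.children, node.span.start, node.span.end, fn);
    const replaced = fn({ entity: node.span, children: inner });
    out += replaced === undefined ? inner : replaced;
    cursor = node.span.end;
  }
  return out + text.slice(cursor, to);
}

function renderSpans(text: string, spans: SpanLike[], fn: SpanFn): string {
  return renderNodes(text, buildSpanTree(spans), 0, text.length, fn);
}

// Hand-written spans for "SSN 123-45-6789": one SSN containing three quantities.
const ssnHtml = renderSpans(
  "SSN 123-45-6789",
  [
    { kind: "quantity", start: 4, end: 7 },
    { kind: "quantity", start: 8, end: 10 },
    { kind: "quantity", start: 11, end: 15 },
    { kind: "ssn", start: 4, end: 15 },
  ],
  ({ entity, children }) => `<${entity.kind}>${children}</${entity.kind}>`,
);
```

With these spans the sketch yields the same nested markup as the SSN example in the "Render entities" section.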
.renderMap(text, fn)
renderMap<R>(text: string, fn: RenderMapFn<Entity, R>): (string | R)[]
Like .render(), but instead of producing a single string, returns an array of
segments: plain-text strings interleaved with values of type R produced by
your callback. This is the API you want for React/JSX: map entities to
elements, and the result is ready to drop into a component's children.
The callback receives { entity, children } where children is
(string | R)[], so nested entities are already mapped.
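To show the segment shape without JSX, the sketch below maps spans to plain objects instead of elements. It is a standalone approximation, not the library's code. `Seg`, `Highlight`, `mapSegments`, and `spanOf` are hypothetical names, and the spans are located with `indexOf` rather than extracted by a parser.

```typescript
interface Seg { kind: string; start: number; end: number }
type MapFn<R> = (ctx: { entity: Seg; children: (string | R)[] }) => R;

// `sorted` must be ordered by start, widest first.
function mapRange<R>(text: string, sorted: Seg[], from: number, to: number, fn: MapFn<R>): (string | R)[] {
  const out: (string | R)[] = [];
  let cursor = from;
  let i = 0;
  while (i < sorted.length) {
    const span = sorted[i];
    let j = i + 1;
    while (j < sorted.length && sorted[j].end <= span.end) j++; // spans nested in `span`
    if (span.start >= cursor) {
      if (span.start > cursor) out.push(text.slice(cursor, span.start));
      const children = mapRange(text, sorted.slice(i + 1, j), span.start, span.end, fn);
      out.push(fn({ entity: span, children }));
      cursor = span.end;
    } // else: partial overlap with the previous span, skipped in this sketch
    i = j;
  }
  if (cursor < to) out.push(text.slice(cursor, to));
  return out;
}

function mapSegments<R>(text: string, spans: Seg[], fn: MapFn<R>): (string | R)[] {
  const sorted = spans.slice().sort((a, b) => a.start - b.start || b.end - a.end);
  return mapRange(text, sorted, 0, text.length, fn);
}

// A plain object stands in for a framework element here.
interface Highlight { kind: string; content: string }

const chat = "Hey! I'm at Times Square, email me at alex@company.io";
function spanOf(kind: string, needle: string): Seg {
  const start = chat.indexOf(needle);
  return { kind, start, end: start + needle.length };
}

const segments = mapSegments<Highlight>(
  chat,
  [spanOf("email", "alex@company.io"), spanOf("location", "Times Square")],
  ({ entity, children }) => ({
    kind: entity.kind,
    content: children.map((c) => (typeof c === "string" ? c : c.content)).join(""),
  }),
);
```

The result alternates plain strings and mapped values, which is the shape a framework can render directly as children.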
.redact(text, opts?)
redact(text: string, opts?: RedactOptions): string
Built on top of .render(). Extracts entities, then replaces each matched span
with opts.mask (default "█"). When opts.kinds is set, only those entity
kinds are masked. Overlapping/nested spans are resolved via the span tree.
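The masking rules above can be approximated with a small standalone function. Unlike the library, which is described as resolving overlaps through the span tree, this sketch simply overwrites characters in place, so overlapping spans are masked once. `MaskSpan`, `MaskOptions`, `maskSpans`, and `find` are hypothetical names, and the default mask "█" (U+2588) is an assumption.

```typescript
interface MaskSpan { kind: string; start: number; end: number }
interface MaskOptions { mask?: string; kinds?: string[] }

// Replace every character inside a selected span with the mask character.
function maskSpans(text: string, spans: MaskSpan[], opts: MaskOptions = {}): string {
  const mask = opts.mask === undefined ? "\u2588" : opts.mask; // assumed default
  const chars = text.split(""); // UTF-16 code units; fine for ASCII input
  for (let s = 0; s < spans.length; s++) {
    const span = spans[s];
    if (opts.kinds && opts.kinds.indexOf(span.kind) === -1) continue; // kind not selected
    for (let i = span.start; i < span.end && i < chars.length; i++) chars[i] = mask;
  }
  return chars.join("");
}

const note = "Contact john.doe@clinic.org, SSN 078-05-1120";
function find(kind: string, needle: string): MaskSpan {
  const start = note.indexOf(needle);
  return { kind, start, end: start + needle.length };
}
const piiSpans = [find("email", "john.doe@clinic.org"), find("ssn", "078-05-1120")];

const ssnOnly = maskSpans(note, piiSpans, { kinds: ["ssn"], mask: "X" });
const everything = maskSpans(note, piiSpans, { mask: "X" });
```

Each masked span keeps its original length, which matches the "Call XXXXXXXXXXXX" example earlier in this README.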
PIIParsers
const PIIParsers: [
EmailParser,
PhoneParser,
IPAddressParser,
SSNParser,
CreditCardParser,
UUIDParser,
ApiKeyParser,
];
Pre-built parser tuple for PII-sensitive entities. Use with
Duckling(PIIParsers) for a quick redaction pipeline.
RedactOptions
interface RedactOptions<K extends string = string> {
mask?: string; // default: "█"
kinds?: K[]; // when omitted, all entities are redacted
}
RenderFn
type RenderFn<E> = (ctx: {
entity: E;
children: string;
}) => string | undefined;
Callback for .render(). Receives the entity and the already-rendered text of
its nested children. Return a replacement string, or undefined to leave the
span unchanged.
RenderMapFn
type RenderMapFn<E, R> = (ctx: {
entity: E;
children: (string | R)[];
}) => R;
Callback for .renderMap(). Receives the entity and its children as an array of
plain-text strings and already-mapped R values. Return a value of type R to
replace the span.
AnyEntity
Union of all 15 built-in entity types. This is the return element type of
Duckling().extract(...).
PIIEntity
Union of the 7 PII entity types:
EmailEntity | PhoneEntity | IPAddressEntity | SSNEntity | CreditCardEntity | UUIDEntity | ApiKeyEntity.
Caveats
ts-duckling uses grammar-based parsers, not ML. This means:
- Deterministic: same input, same output, every time
- Fast: no model loading, no network calls
- But imperfect: expect false positives/negatives for ambiguous inputs
For example:
// ts-duckling interprets 6/2022 as a date
Duckling([Time.parser]).extract("6/2022 is 0.00296735905");
// → [{ kind: "time", text: "6/2022", ... }]
If you need high accuracy on messy, ambiguous real-world text, consider an ML-based solution. If you want predictable, fast extraction from structured or semi-structured text (messages, forms, logs), ts-duckling is a great fit.
Playground
Try ts-duckling in the browser: Live Playground
Paste or type any text and see entities extracted in real-time. You can also fetch content from a URL to test against real web pages.
License
MIT © Claudiu Ceia