ts-duckling
A tiny, deterministic entity extractor for TypeScript.
Extract structured data and redact PII from free-form text: no ML, no network calls.
import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";
// Extract structured entities
const entities = Duckling().extract(
"Email me at foo@bar.com, meeting at 3pm",
);
// → [{ kind: "email", value: { email: "foo@bar.com" }, ... },
// { kind: "time", value: { when: "...", grain: "hour" }, ... }]
// Redact PII in one line
Duckling(PIIParsers).redact("Email me at foo@bar.com, SSN 123-45-6789");
// → "Email me at ███████████████, SSN ███████████"
Overview
ts-duckling uses parser combinator grammars (no ML models, no HTTP) to extract structured entities from text. It is inspired by Facebook's duckling, but is written in pure TypeScript and runs anywhere: Deno, Node, or the browser.
- Deterministic: the same input always produces the same output
- Typed: parser selection narrows the return type automatically
- Composable: bring your own parsers alongside the built-in ones
- Self-contained: no native dependencies, no runtime downloads, no network calls
- Runs everywhere: Deno, Node.js, and browsers (see the live playground)
Documentation
Installation
Deno / JSR
import { Duckling } from "jsr:@claudiu-ceia/ts-duckling";
Or add it to your import map:
deno add jsr:@claudiu-ceia/ts-duckling
npm
npx jsr add @claudiu-ceia/ts-duckling
Getting started
Extract entities
Call Duckling() with no arguments to use all 15 built-in parsers:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const entities = Duckling().extract(
"Email me at foo@example.com and visit https://example.com tomorrow at 3pm.",
);
for (const e of entities) {
console.log(e.kind, e.text);
}
// email foo@example.com
// url https://example.com
// time tomorrow at 3pm
Each entity carries structured data:
// entities[0]
{
kind: "email",
value: { email: "foo@example.com" },
start: 12,
end: 27,
text: "foo@example.com"
}
Pick specific parsers
Pass an array of parsers to narrow both what gets extracted and the return type:
import { Duckling, Email, URL } from "@claudiu-ceia/ts-duckling";
const entities = Duckling([Email.parser, URL.parser]).extract(
"Reach me at a@b.com or https://example.com",
);
// entities: (EmailEntity | URLEntity)[]
Redact PII
Use .redact() to replace matched entity spans with a mask character:
import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";
// Redact all PII (email, phone, IP, SSN, credit card, UUID, API key)
Duckling(PIIParsers).redact("Contact foo@bar.com, SSN 078-05-1120");
// → "Contact ███████████████, SSN ███████████"
// Custom mask character
Duckling(PIIParsers).redact("Call +14155552671", { mask: "X" });
// → "Call XXXXXXXXXXXX"
// Redact only specific kinds
Duckling(PIIParsers).redact("foo@bar.com 123-45-6789", { kinds: ["ssn"] });
// → "foo@bar.com ███████████"
Custom entities
Define a parser that returns an Entity, then pass it to Duckling:
import { createLanguage, map, type Parser, regex } from "@claudiu-ceia/combine";
import { Duckling, ent, type Entity } from "@claudiu-ceia/ts-duckling";
type HashtagEntity = Entity<"hashtag", { tag: string }>;
type HashtagLanguage = {
Full: Parser<HashtagEntity>;
parser: Parser<HashtagEntity>;
};
const Hashtag = createLanguage<HashtagLanguage>({
Full: () =>
map(
regex(/#[A-Za-z0-9_]{2,64}/, "hashtag"),
(m, b, a) => ent({ tag: m.slice(1) }, "hashtag", b, a),
),
parser: (s) => s.Full,
});
const entities = Duckling([Hashtag.parser]).extract("hello #duckling");
// → [{ kind: "hashtag", value: { tag: "duckling" }, start: 6, end: 15, text: "#duckling" }]
Custom parsers compose freely with the built-in ones:
import { Email } from "@claudiu-ceia/ts-duckling";
const entities = Duckling([Email.parser, Hashtag.parser]).extract(
"Email a@b.com with #feedback",
);
// entities: (EmailEntity | HashtagEntity)[]
Supported entities
| Entity | Kind | Example match | Notes |
|---|---|---|---|
| Time | time | tomorrow at 3pm, 2024-01-15T10:30:00Z | Relative, day-of-week, ISO timestamps |
| Range | range | 2020-2024, 20°C to 30°C | Time, year, and temperature ranges |
| Temperature | temperature | 72°F, 20 celsius | Fahrenheit and Celsius |
| Quantity | quantity | 5 kg, 100 miles | Units of measurement |
| Location | location | United States, Germany | Countries (dataset-backed) |
| URL | url | https://example.com/path | Full URLs with TLD validation |
| Email | email | user@example.com | Standard email addresses |
| Institution | institution | University of Oxford | Known institutions |
| Language | language | English, Japanese | Language names (dataset-backed) |
| Phone | phone | +14155552671 | E.164-ish phone numbers |
| IP address | ip_address | 192.168.1.1, ::1 | IPv4 + IPv6 full form |
| SSN | ssn | 123-45-6789 | US Social Security Numbers |
| Credit card | credit_card | 4111111111111111 | Luhn-validated card numbers |
| UUID | uuid | 550e8400-e29b-41d4-a716-446655440000 | RFC 4122 UUIDs |
| API key | api_key | sk-abc123..., AKIA... | Common provider prefixes |
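The table notes that credit card matches are Luhn-validated. For readers unfamiliar with the check, here is a minimal standalone sketch of the Luhn checksum. The function name `passesLuhn` is hypothetical, and this is not taken from ts-duckling's source; it only illustrates the kind of validation the table refers to.

```typescript
// Illustrative Luhn checksum; not ts-duckling's actual implementation.
function passesLuhn(digits: string): boolean {
  // Only plain digit strings are considered here (no spaces or dashes).
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false; // every second digit, counted from the right, is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" has char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111111111111111")); // true
console.log(passesLuhn("4111111111111112")); // false
```

A check like this is why an arbitrary 16-digit number is less likely to be reported as a card number than one with a valid check digit.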
API reference
Duckling()
function Duckling(): { extract; redact };
function Duckling<T>(parsers: ParserTuple<T>): { extract; redact };
Creates an extractor/redactor pair. Without arguments, it uses all 15 built-in
parsers and extract returns AnyEntity[]. When given an explicit parser array, the
return type narrows to the union of those entity types.
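To make the narrowing concrete, the following self-contained sketch shows one way a parser tuple can drive the return type in TypeScript. Everything here (`MiniEntity`, `MiniParser`, `EntityOf`, `findAll`, `miniExtractor`, and the simplified email regex) is a hypothetical stand-in written for illustration; it does not use or reproduce ts-duckling's real types or parsers.

```typescript
// Hypothetical stand-ins for illustration; not ts-duckling's real types.
type MiniEntity<K extends string, V> = {
  kind: K;
  value: V;
  start: number;
  end: number;
  text: string;
};
type EmailEntity = MiniEntity<"email", { email: string }>;
type HashtagEntity = MiniEntity<"hashtag", { tag: string }>;

type MiniParser<E> = (text: string) => E[];
// Distributes over a union of parsers and collects their entity types.
type EntityOf<P> = P extends MiniParser<infer E> ? E : never;

type Match = { text: string; start: number; end: number };

// Collect every match of `re` with its [start, end) offsets.
function findAll(re: RegExp, text: string): Match[] {
  const out: Match[] = [];
  const scanner = new RegExp(re.source, "g");
  let m: RegExpExecArray | null;
  while ((m = scanner.exec(text)) !== null) {
    out.push({ text: m[0], start: m.index, end: m.index + m[0].length });
  }
  return out;
}

// Deliberately simplified email pattern, for the demo only.
const emailParser: MiniParser<EmailEntity> = (text) =>
  findAll(/[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}/, text).map((m) => ({
    kind: "email" as const,
    value: { email: m.text },
    ...m,
  }));

const hashtagParser: MiniParser<HashtagEntity> = (text) =>
  findAll(/#[A-Za-z0-9_]{2,64}/, text).map((m) => ({
    kind: "hashtag" as const,
    value: { tag: m.text.slice(1) },
    ...m,
  }));

// The tuple type P is inferred from the argument, so the result type is
// the union of exactly the entity types the chosen parsers can produce.
function miniExtractor<P extends MiniParser<MiniEntity<string, unknown>>[]>(
  parsers: [...P],
): { extract(text: string): EntityOf<P[number]>[] } {
  return {
    extract(text: string) {
      const found: MiniEntity<string, unknown>[] = [];
      parsers.forEach((p) => found.push(...p(text)));
      found.sort((a, b) => a.start - b.start);
      return found as unknown as EntityOf<P[number]>[];
    },
  };
}

const entities = miniExtractor([emailParser, hashtagParser]).extract(
  "Email a@b.com with #feedback",
);
// Inferred as (EmailEntity | HashtagEntity)[]; `kind` discriminates the union.
const summary = entities.map((e) =>
  e.kind === "email" ? "email:" + e.value.email : "hashtag:" + e.value.tag
);
console.log(summary); // ["email:a@b.com", "hashtag:feedback"]
```

Because `kind` is a literal type on each entity, checking it is enough for the compiler to know which `value` shape is available in each branch.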
.extract(text)
extract(text: string): Entity[]
Scans text and returns all matched entities, each with kind, value,
start, end, and text fields. Entities are returned in order of appearance.
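As a rough illustration of how the start and end fields can be used, the sketch below wraps each span in a `[kind:text]` marker. It assumes start is inclusive and end is exclusive, which matches the README's own example (start 12, end 27 for a 15-character email). The `SpanEntity` type and `annotate` helper are hypothetical, and the entity list is hand-written rather than produced by ts-duckling.

```typescript
// Minimal entity shape mirroring the fields listed above (hypothetical type name).
type SpanEntity = { kind: string; start: number; end: number; text: string };

// Wrap each entity span in a "[kind:text]" marker, assuming [start, end) offsets.
function annotate(input: string, found: SpanEntity[]): string {
  const ordered = found.slice().sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const e of ordered) {
    if (e.start < cursor) continue; // skip spans overlapping one already emitted
    out += input.slice(cursor, e.start);
    out += "[" + e.kind + ":" + input.slice(e.start, e.end) + "]";
    cursor = e.end;
  }
  return out + input.slice(cursor);
}

const sample =
  "Email me at foo@example.com and visit https://example.com tomorrow at 3pm.";
// Hand-written entity copied from the example earlier in this README.
const handWritten: SpanEntity[] = [
  { kind: "email", start: 12, end: 27, text: "foo@example.com" },
];
const annotated = annotate(sample, handWritten);
console.log(annotated);
// "Email me at [email:foo@example.com] and visit https://example.com tomorrow at 3pm."
```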
.redact(text, opts?)
redact(text: string, opts?: RedactOptions): string
Extracts entities, then replaces each character of every matched span with opts.mask (default
"█"). When opts.kinds is set, only those entity kinds are masked.
Overlapping spans are handled correctly.
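One simple way to mask overlapping spans safely is to mark characters first and build the output afterwards, so a character covered twice is still masked once. The sketch below shows that idea in isolation. `MaskSpan` and `maskSpans` are hypothetical names, the spans are hand-written, and this is not necessarily how ts-duckling implements redact.

```typescript
// Hypothetical span shape; offsets are assumed to be [start, end).
type MaskSpan = { kind: string; start: number; end: number };

function maskSpans(
  text: string,
  spans: MaskSpan[],
  mask: string,
  kinds?: string[],
): string {
  // One flag per UTF-16 code unit; overlapping spans set the same flags twice.
  const hit: boolean[] = [];
  for (let i = 0; i < text.length; i++) hit.push(false);
  for (const s of spans) {
    if (kinds && kinds.indexOf(s.kind) === -1) continue; // kind not selected
    const from = Math.max(0, s.start);
    const to = Math.min(text.length, s.end);
    for (let i = from; i < to; i++) hit[i] = true;
  }
  let out = "";
  for (let i = 0; i < text.length; i++) out += hit[i] ? mask : text.charAt(i);
  return out;
}

const masked = maskSpans(
  "Call +14155552671",
  [
    { kind: "phone", start: 5, end: 17 },
    { kind: "phone", start: 9, end: 17 }, // overlaps the span above
  ],
  "X",
);
console.log(masked); // "Call XXXXXXXXXXXX"
```

Passing a `kinds` list that does not include a span's kind leaves that span untouched, mirroring the behaviour described for opts.kinds.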
PIIParsers
const PIIParsers: [
EmailParser,
PhoneParser,
IPAddressParser,
SSNParser,
CreditCardParser,
UUIDParser,
ApiKeyParser,
];
Pre-built parser tuple for PII-sensitive entities. Use with
Duckling(PIIParsers) for a quick redaction pipeline.
RedactOptions
interface RedactOptions<K extends string = string> {
mask?: string; // default: "█"
kinds?: K[]; // when omitted, all entities are redacted
}
AnyEntity
Union of all 15 built-in entity types. This is the return element type of
Duckling().extract(...).
PIIEntity
Union of the 7 PII entity types:
EmailEntity | PhoneEntity | IPAddressEntity | SSNEntity | CreditCardEntity | UUIDEntity | ApiKeyEntity.
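If you extract with all parsers and later want only the PII subset, a kind-based type guard is one option. The sketch below uses the seven kind strings from the Supported entities table. `PII_KINDS`, `isPIIKind`, and `onlyPII` are hypothetical helpers, not part of the ts-duckling API, and the entity list is hand-written.

```typescript
// Kind strings taken from the Supported entities table; helper names are hypothetical.
const PII_KINDS = [
  "email",
  "phone",
  "ip_address",
  "ssn",
  "credit_card",
  "uuid",
  "api_key",
] as const;
type PIIKind = (typeof PII_KINDS)[number];

function isPIIKind(kind: string): kind is PIIKind {
  return (PII_KINDS as readonly string[]).indexOf(kind) !== -1;
}

// Keep only entities whose kind is one of the seven PII kinds.
function onlyPII<E extends { kind: string }>(found: E[]): (E & { kind: PIIKind })[] {
  return found.filter((e): e is E & { kind: PIIKind } => isPIIKind(e.kind));
}

// Hand-written entities standing in for real extraction output.
const mixed = [
  { kind: "email", text: "foo@bar.com" },
  { kind: "time", text: "tomorrow at 3pm" },
  { kind: "ssn", text: "123-45-6789" },
];
const piiOnly = onlyPII(mixed);
console.log(piiOnly.map((e) => e.kind)); // ["email", "ssn"]
```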
Caveats
ts-duckling uses grammar-based parsers, not ML. This means:
- Deterministic: same input → same output, every time
- Fast: no model loading, no network calls
- But imperfect: expect false positives/negatives for ambiguous inputs
For example:
import { Duckling, Time } from "@claudiu-ceia/ts-duckling";
// ts-duckling interprets 6/2022 as a date
Duckling([Time.parser]).extract("6/2022 is 0.00296735905");
// → [{ kind: "time", text: "6/2022", ... }]
If you need high accuracy on messy, ambiguous real-world text, consider an ML-based solution. If you want predictable, fast extraction from structured or semi-structured text (messages, forms, logs), ts-duckling is a great fit.
Playground
Try ts-duckling in the browser: Live Playground
Paste or type any text and see entities extracted in real time. You can also fetch content from a URL to test against real web pages.
License
MIT © Claudiu Ceia