ts-duckling

A tiny, deterministic entity extractor for TypeScript.
Extract structured data and redact PII from free-form text: no ML, no network calls.


import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";

// Extract structured entities
const entities = Duckling().extract(
  "Email me at foo@bar.com, meeting at 3pm",
);
// → [{ kind: "email", value: { email: "foo@bar.com" }, ... },
//    { kind: "time",  value: { when: "...", grain: "hour" }, ... }]

// Redact PII in one line
Duckling(PIIParsers).redact("Email me at foo@bar.com, SSN 123-45-6789");
// → "Email me at ███████████, SSN ███████████"

Overview

ts-duckling uses parser combinator grammars (no ML models, no HTTP) to extract structured entities from text. It is inspired by Facebook's Duckling but written in pure TypeScript, and it runs in Deno, Node, or the browser.

  • Deterministic: same input always produces the same output
  • Typed: parser selection narrows the return type automatically
  • Composable: bring your own parsers alongside the built-in ones
  • Self-contained: no native dependencies, no runtime downloads, no network calls
  • Runs everywhere: Deno, Node.js, and browsers (see the live playground)

Documentation

Installation

Deno / JSR

import { Duckling } from "jsr:@claudiu-ceia/ts-duckling";

Or add to your import map:

deno add jsr:@claudiu-ceia/ts-duckling

npm

npx jsr add @claudiu-ceia/ts-duckling

Getting started

Extract entities

Call Duckling() with no arguments to use all 15 built-in parsers:

import { Duckling } from "@claudiu-ceia/ts-duckling";

const entities = Duckling().extract(
  "Email me at foo@example.com and visit https://example.com tomorrow at 3pm.",
);

for (const e of entities) {
  console.log(e.kind, e.text);
}
// email  foo@example.com
// url    https://example.com
// time   tomorrow at 3pm

Each entity carries structured data:

// entities[0]
{
  kind: "email",
  value: { email: "foo@example.com" },
  start: 12,
  end: 27,
  text: "foo@example.com"
}
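In the example above, start and end look like character offsets into the input with end exclusive: 27 - 12 equals the 15 characters of foo@example.com. The following is a minimal sketch that checks this invariant without the library. EntityLike and spanMatchesInput are hypothetical names introduced here for illustration, and the exclusive-end convention is an assumption inferred from the example.

```typescript
// Hypothetical local stand-in for the entity shape shown above;
// the library's own Entity type may carry additional fields.
type EntityLike = {
  kind: string;
  start: number; // offset of the first matched character
  end: number; // offset one past the last matched character (assumed exclusive)
  text: string;
};

// Returns true when the entity's text equals the slice of the input it points at.
function spanMatchesInput(input: string, e: EntityLike): boolean {
  return input.slice(e.start, e.end) === e.text;
}

const sampleInput =
  "Email me at foo@example.com and visit https://example.com tomorrow at 3pm.";

// Hand-written copy of the entity shown above, not live library output.
const emailEntity: EntityLike = {
  kind: "email",
  start: 12,
  end: 27,
  text: "foo@example.com",
};

console.log(spanMatchesInput(sampleInput, emailEntity)); // true
```

A check like this can be handy in tests for custom parsers, where an off-by-one in the reported span would otherwise go unnoticed until redaction.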

Pick specific parsers

Pass an array of parsers to narrow both what gets extracted and the return type:

import { Duckling, Email, URL } from "@claudiu-ceia/ts-duckling";

const entities = Duckling([Email.parser, URL.parser]).extract(
  "Reach me at a@b.com or https://example.com",
);
// entities: (EmailEntity | URLEntity)[]

Redact PII

Use .redact() to replace matched entity spans with a mask character:

import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";

// Redact all PII (email, phone, IP, SSN, credit card, UUID, API key)
Duckling(PIIParsers).redact("Contact foo@bar.com, SSN 078-05-1120");
// → "Contact ███████████, SSN ███████████"

// Custom mask character
Duckling(PIIParsers).redact("Call +14155552671", { mask: "X" });
// → "Call XXXXXXXXXXXX"

// Redact only specific kinds
Duckling(PIIParsers).redact("foo@bar.com 123-45-6789", { kinds: ["ssn"] });
// → "foo@bar.com ███████████"

Custom entities

Define a parser that returns an Entity, then pass it to Duckling:

import { createLanguage, map, type Parser, regex } from "@claudiu-ceia/combine";
import { Duckling, ent, type Entity } from "@claudiu-ceia/ts-duckling";

type HashtagEntity = Entity<"hashtag", { tag: string }>;

type HashtagLanguage = {
  Full: Parser<HashtagEntity>;
  parser: Parser<HashtagEntity>;
};

const Hashtag = createLanguage<HashtagLanguage>({
  Full: () =>
    map(
      regex(/#[A-Za-z0-9_]{2,64}/, "hashtag"),
      (m, b, a) => ent({ tag: m.slice(1) }, "hashtag", b, a),
    ),
  parser: (s) => s.Full,
});

const entities = Duckling([Hashtag.parser]).extract("hello #duckling");
// → [{ kind: "hashtag", value: { tag: "duckling" }, start: 6, end: 15, text: "#duckling" }]
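Before wiring a pattern into a parser, it can help to exercise the regular expression on its own. The sketch below uses only the built-in RegExp API with the same pattern as the Hashtag parser. firstHashtag is a hypothetical helper, not part of ts-duckling or combine, and it only approximates what the parser reports.

```typescript
// Same pattern as in the Hashtag parser above, tested in isolation.
const hashtagPattern = /#[A-Za-z0-9_]{2,64}/;

// Hypothetical helper: finds the first hashtag and reports its span.
function firstHashtag(input: string) {
  const m = hashtagPattern.exec(input);
  if (m === null) return null;
  const start = m.index;
  const end = start + m[0].length; // exclusive end offset
  return { tag: m[0].slice(1), start, end, text: m[0] };
}

console.log(firstHashtag("hello #duckling"));
// tag "duckling", start 6, end 15, text "#duckling"

console.log(firstHashtag("too short: #a"));
// null, because the pattern needs at least 2 characters after '#'
```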

Custom parsers compose freely with the built-in ones:

import { Email } from "@claudiu-ceia/ts-duckling";

const entities = Duckling([Email.parser, Hashtag.parser]).extract(
  "Email a@b.com with #feedback",
);
// entities: (EmailEntity | HashtagEntity)[]
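Because every entity carries a literal kind, a union such as EmailEntity | HashtagEntity can be narrowed with an ordinary switch. The sketch below illustrates this with locally defined types so that it runs without the library. EntityOf, EmailEntityLike, HashtagEntityLike, and summarize are hypothetical names; the value shapes follow the examples in this document, and the sample array is hand-written rather than real extractor output.

```typescript
// Hypothetical local mirrors of the two entity types; the value shapes follow
// the examples in this document ({ email } and { tag }).
type EntityOf<K extends string, V> = {
  kind: K;
  value: V;
  start: number;
  end: number;
  text: string;
};
type EmailEntityLike = EntityOf<"email", { email: string }>;
type HashtagEntityLike = EntityOf<"hashtag", { tag: string }>;

// Switching on `kind` narrows `e.value` to the matching shape in each branch.
function summarize(e: EmailEntityLike | HashtagEntityLike): string {
  switch (e.kind) {
    case "email":
      return `mail to ${e.value.email}`;
    case "hashtag":
      return `tagged ${e.value.tag}`;
  }
}

// Hand-written sample for "Email a@b.com with #feedback".
const found: (EmailEntityLike | HashtagEntityLike)[] = [
  { kind: "email", value: { email: "a@b.com" }, start: 6, end: 13, text: "a@b.com" },
  { kind: "hashtag", value: { tag: "feedback" }, start: 19, end: 28, text: "#feedback" },
];

console.log(found.map(summarize));
// "mail to a@b.com", "tagged feedback"
```

If another entity type is later added to the union, the compiler flags the switch as non-exhaustive, which is the main practical benefit of the narrowed return type.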

Supported entities

| Entity | Kind | Example match | Notes |
| --- | --- | --- | --- |
| Time | time | tomorrow at 3pm, 2024-01-15T10:30:00Z | Relative, day-of-week, ISO timestamps |
| Range | range | 2020-2024, 20°C to 30°C | Time, year, and temperature ranges |
| Temperature | temperature | 72°F, 20 celsius | Fahrenheit and Celsius |
| Quantity | quantity | 5 kg, 100 miles | Units of measurement |
| Location | location | United States, Germany | Countries (dataset-backed) |
| URL | url | https://example.com/path | Full URLs with TLD validation |
| Email | email | user@example.com | Standard email addresses |
| Institution | institution | University of Oxford | Known institutions |
| Language | language | English, Japanese | Language names (dataset-backed) |
| Phone | phone | +14155552671 | E.164-ish phone numbers |
| IP address | ip_address | 192.168.1.1, ::1 | IPv4 + IPv6 full form |
| SSN | ssn | 123-45-6789 | US Social Security Numbers |
| Credit card | credit_card | 4111111111111111 | Luhn-validated card numbers |
| UUID | uuid | 550e8400-e29b-41d4-a716-446655440000 | RFC 4122 UUIDs |
| API key | api_key | sk-abc123..., AKIA... | Common provider prefixes |
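The credit card row mentions Luhn validation. For readers unfamiliar with it, here is a minimal sketch of the standard Luhn checksum. passesLuhn is a hypothetical name, and this is the textbook algorithm, not necessarily how ts-duckling implements its check.

```typescript
// Luhn checksum: starting from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require the total to be
// divisible by 10.
function passesLuhn(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false; // digits only, no separators
  let sum = 0;
  let doubleNext = false; // the rightmost digit itself is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111111111111111")); // true (digit sum is 30)
console.log(passesLuhn("4111111111111112")); // false (digit sum is 31)
```

A checksum like this is what lets a grammar-based extractor reject most arbitrary 16-digit numbers without any model or lookup.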

API reference

Duckling()

function Duckling(): { extract; redact };
function Duckling<T>(parsers: ParserTuple<T>): { extract; redact };

Creates an extractor/redactor pair. Without arguments, uses all 15 built-in parsers and returns AnyEntity[]. When given an explicit parser array, the return type narrows to the union of those entity types.

.extract(text)

extract(text: string): Entity[]

Scans text and returns all matched entities, each with kind, value, start, end, and text fields. Entities are returned in order of appearance.

.redact(text, opts?)

redact(text: string, opts?: RedactOptions): string

Extracts entities, then replaces each matched character with opts.mask (default "█"). When opts.kinds is set, only those entity kinds are masked. Overlapping spans are handled correctly.
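To make the masking behavior concrete, here is a minimal sketch of per-character masking over a list of spans, including a kinds filter and overlapping spans. It is one plausible approach under stated assumptions, not the library's actual implementation: MaskSpan and maskSpans are hypothetical names, spans are assumed to use exclusive end offsets, and characters are counted as UTF-16 code units.

```typescript
// Hypothetical span shape: only the fields the masking step needs.
type MaskSpan = { kind: string; start: number; end: number };

// Masks every character covered by at least one selected span.
// Marking covered positions first makes overlapping spans harmless:
// a character covered twice is still replaced exactly once.
function maskSpans(
  text: string,
  spans: MaskSpan[],
  opts: { mask?: string; kinds?: string[] } = {},
): string {
  const mask = opts.mask ?? "█";
  const covered = new Array<boolean>(text.length).fill(false);
  for (const s of spans) {
    // Skip spans whose kind was not requested.
    if (opts.kinds && !opts.kinds.includes(s.kind)) continue;
    const from = Math.max(0, s.start);
    const to = Math.min(text.length, s.end);
    for (let i = from; i < to; i++) covered[i] = true;
  }
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += covered[i] ? mask : text.charAt(i);
  }
  return out;
}

// Hand-written spans for illustration, not real extractor output.
const piiSpans: MaskSpan[] = [
  { kind: "email", start: 0, end: 11 },
  { kind: "ssn", start: 12, end: 23 },
];

console.log(maskSpans("foo@bar.com 123-45-6789", piiSpans, { kinds: ["ssn"] }));
// "foo@bar.com ███████████"

console.log(maskSpans("abcdefghij", [
  { kind: "a", start: 0, end: 5 },
  { kind: "b", start: 3, end: 8 },
], { mask: "*" }));
// "********ij"
```

Because each covered character maps to exactly one mask character, the output keeps the input's length, which matches the examples in the Redact PII section.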

PIIParsers

const PIIParsers: [
  EmailParser,
  PhoneParser,
  IPAddressParser,
  SSNParser,
  CreditCardParser,
  UUIDParser,
  ApiKeyParser,
];

Pre-built parser tuple for PII-sensitive entities. Use with Duckling(PIIParsers) for a quick redaction pipeline.

RedactOptions

interface RedactOptions<K extends string = string> {
  mask?: string; // default: "█"
  kinds?: K[]; // when omitted, all entities are redacted
}

AnyEntity

Union of all 15 built-in entity types. This is the return element type of Duckling().extract(...).

PIIEntity

Union of the 7 PII entity types: EmailEntity | PhoneEntity | IPAddressEntity | SSNEntity | CreditCardEntity | UUIDEntity | ApiKeyEntity.

Caveats

ts-duckling uses grammar-based parsers, not ML. This means:

  • Deterministic: same input → same output, every time
  • Fast: no model loading, no network calls
  • But imperfect: expect false positives/negatives for ambiguous inputs

For example:

import { Duckling, Time } from "@claudiu-ceia/ts-duckling";

// ts-duckling interprets 6/2022 as a date
Duckling([Time.parser]).extract("6/2022 is 0.00296735905");
// → [{ kind: "time", text: "6/2022", ... }]

If you need high accuracy on messy, ambiguous real-world text, consider an ML-based solution. If you want predictable, fast extraction from structured or semi-structured text (messages, forms, logs), ts-duckling is a great fit.

Playground

Try ts-duckling in the browser: Live Playground

Paste or type any text and see entities extracted in real time. You can also fetch content from a URL to test against real web pages.

License

MIT © Claudiu Ceia