ts-duckling

A tiny, deterministic entity extractor for TypeScript.
Extract structured data, render rich highlights, and redact PII from free-form text, with no ML and no network calls.

import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";

// Extract structured entities from a chat message
const entities = Duckling().extract(
  "Hey! Meet me at Times Square tomorrow at 3pm. My email is alex@company.io",
);
// → [{ kind: "location", text: "Times Square", ... },
//    { kind: "time",     text: "tomorrow at 3pm", ... },
//    { kind: "email",    text: "alex@company.io", ... }]

// Redact PII in one line
Duckling(PIIParsers).redact(
  "Contact alex@company.io, SSN 078-05-1120, or call +14155552671",
);
// → "Contact ███████████████, SSN ███████████, or call ████████████"

Overview

ts-duckling uses parser combinator grammars (no ML models, no HTTP) to extract structured entities from text. It is inspired by Facebook's duckling, but built in pure TypeScript, and it runs anywhere: Deno, Node, or the browser.

  • Deterministic: same input always produces the same output
  • Typed: parser selection narrows the return type automatically
  • Composable: bring your own parsers alongside the built-in ones
  • Self-contained: no native dependencies, no runtime downloads, no network calls
  • Runs everywhere: Deno, Node.js, and browsers (see the live playground)

Documentation

Installation

Deno / JSR

import { Duckling } from "jsr:@claudiu-ceia/ts-duckling";

Or add to your import map:

deno add jsr:@claudiu-ceia/ts-duckling

npm

npx jsr add @claudiu-ceia/ts-duckling

Getting started

Extract entities

Call Duckling() with no arguments to use all 15 built-in parsers:

import { Duckling } from "@claudiu-ceia/ts-duckling";

const msg =
  "Hey! I'll be in Germany next Friday at 5pm. Shoot me a message at alex@company.io or visit https://example.com/invite";

for (const e of Duckling().extract(msg)) {
  console.log(e.kind, e.text);
}
// location  Germany
// time      next Friday at 5pm
// email     alex@company.io
// url       https://example.com/invite

Each entity carries structured data:

// Duckling().extract(msg)[0]
{
  kind: "location",
  value: { location: "Germany" },
  start: 16,
  end: 23,
  text: "Germany"
}
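The start and end fields are character offsets into the original string. Judging by the example above (start 16, end 23, a 7-character match), end appears to be exclusive. The sketch below checks that reading with plain string slicing; SpanEntity is a hypothetical local type, not something imported from the library.

```typescript
// Hypothetical local shape mirroring the fields shown above; not the library's type.
interface SpanEntity {
  kind: string;
  start: number; // index of the first matched character
  end: number; // index one past the last matched character (assumed exclusive)
  text: string;
}

const msg =
  "Hey! I'll be in Germany next Friday at 5pm. Shoot me a message at alex@company.io or visit https://example.com/invite";

// Values copied from the entity printed above.
const germany: SpanEntity = {
  kind: "location",
  start: 16,
  end: 23,
  text: "Germany",
};

// Slicing the source string with the offsets recovers the matched text.
const recovered = msg.slice(germany.start, germany.end);
console.log(recovered === germany.text); // true
```

Because the offsets index into the input, you can highlight or replace a match without searching for its text again.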

Pick specific parsers

Pass an array of parsers to narrow both what gets extracted and the return type:

import { Duckling, Email, Time, URL } from "@claudiu-ceia/ts-duckling";

const entities = Duckling([Email.parser, URL.parser, Time.parser]).extract(
  "Ping me at alex@company.io or https://meet.com - available tomorrow at 2pm",
);
// entities: (EmailEntity | URLEntity | TimeEntity)[]
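To see how a parser array can narrow a return type, here is a self-contained sketch of the general TypeScript technique (a distributive conditional type with infer). It is not the library's implementation: Entity, Parser, EntityOfParser, extractWith, and both toy parsers are hypothetical stand-ins written for this example.

```typescript
// Hypothetical minimal types for illustration; the library's real types differ.
type Entity<K extends string, V> = { kind: K; value: V };
type Parser<E> = (input: string) => E[];

type EmailEntity = Entity<"email", { email: string }>;
type HashtagEntity = Entity<"hashtag", { tag: string }>;

// Distributes over a union of parsers and collects the entity types they produce.
type EntityOfParser<P> = P extends Parser<infer E> ? E : never;

// Run every parser and type the result as the union of their entity types.
function extractWith<T extends readonly Parser<unknown>[]>(
  parsers: T,
  input: string,
): EntityOfParser<T[number]>[] {
  const out: unknown[] = [];
  for (const parse of parsers) out.push(...parse(input));
  return out as EntityOfParser<T[number]>[];
}

// Toy regex-based parsers, far simpler than the built-in grammars.
const emailParser: Parser<EmailEntity> = (s) => {
  const found: string[] = s.match(/[\w.]+@[\w.]+\.\w+/g) ?? [];
  return found.map((email) => ({ kind: "email" as const, value: { email } }));
};
const hashtagParser: Parser<HashtagEntity> = (s) => {
  const found: string[] = s.match(/#\w+/g) ?? [];
  return found.map((m) => ({
    kind: "hashtag" as const,
    value: { tag: m.slice(1) },
  }));
};

const narrowed = extractWith(
  [emailParser, hashtagParser],
  "Email alex@company.io with #feedback",
);
// narrowed: (EmailEntity | HashtagEntity)[]

for (const e of narrowed) {
  // Checking the discriminant narrows e.value to the matching shape.
  if (e.kind === "email") console.log(e.value.email);
  else console.log(e.value.tag);
}
```

The practical benefit is that a switch on kind gives you the right value shape in each branch without casts.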

Redact PII

Use .redact() to replace matched entity spans with a mask character:

import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";

// Redact all PII (email, phone, IP, SSN, credit card, UUID, API key)
Duckling(PIIParsers).redact(
  "Patient email: john.doe@clinic.org, SSN 078-05-1120, phone +14155552671",
);
// → "Patient email: ███████████████████, SSN ███████████, phone ████████████"

// Custom mask character
Duckling(PIIParsers).redact("Call +14155552671", { mask: "X" });
// → "Call XXXXXXXXXXXX"

// Redact only specific kinds
Duckling(PIIParsers).redact(
  "Contact john.doe@clinic.org, SSN 078-05-1120",
  { kinds: ["ssn"] },
);
// → "Contact john.doe@clinic.org, SSN ███████████"

Render entities

Use .render() to replace entity spans via a callback. This is useful for turning plain-text messages into HTML with highlighted or linked entities:

import { Duckling } from "@claudiu-ceia/ts-duckling";

const msg =
  "Hey! Meet at Times Square tomorrow at 3pm, email me at alex@company.io or check https://example.com/rsvp";

const html = Duckling().render(msg, ({ entity, children }) => {
  switch (entity.kind) {
    case "url":
      return `<a href="${children}">${children}</a>`;
    case "email":
      return `<a href="mailto:${children}">${children}</a>`;
    default:
      return `<mark data-kind="${entity.kind}">${children}</mark>`;
  }
});
// → 'Hey! Meet at <mark data-kind="location">Times Square</mark>
//    <mark data-kind="time">tomorrow at 3pm</mark>, email me at
//    <a href="mailto:alex@company.io">alex@company.io</a> or check
//    <a href="https://example.com/rsvp">https://example.com/rsvp</a>'

Nested entities (e.g. an SSN containing quantity sub-parts) are rendered inside-out. Inner entities are transformed first, and the parent receives the result:

import { Duckling, Quantity, SSN } from "@claudiu-ceia/ts-duckling";

Duckling([Quantity.parser, SSN.parser]).render(
  "SSN 123-45-6789",
  ({ entity, children }) => `<${entity.kind}>${children}</${entity.kind}>`,
);
// → "SSN <ssn><quantity>123</quantity>-<quantity>45</quantity>-<quantity>6789</quantity></ssn>"

Return undefined to leave a span unchanged, which is useful for selective rendering:

import { Duckling } from "@claudiu-ceia/ts-duckling";

// Only make URLs clickable, leave everything else as plain text
Duckling().render(
  "Visit https://example.com - event on next Friday at 5pm in Germany",
  ({ entity, children }) => {
    if (entity.kind === "url") return `<a href="${children}">${children}</a>`;
    return undefined;
  },
);
// → 'Visit <a href="https://example.com">https://example.com</a> - event on next Friday at 5pm in Germany'

Map entities to components

Use .renderMap() when you need an array of segments instead of a single string. It is ideal for React, Preact, Solid, or any framework that renders element trees:

import { Duckling } from "@claudiu-ceia/ts-duckling";

const msg = "Hey! I'm at Times Square, email me at alex@company.io";

const segments = Duckling().renderMap<JSX.Element>(
  msg,
  ({ entity, children }) => (
    <mark key={entity.start} data-kind={entity.kind}>
      {children}
    </mark>
  ),
);
// → ["Hey! I'm at ", <mark data-kind="location">Times Square</mark>,
//    ", email me at ", <mark data-kind="email">alex@company.io</mark>]

// Drop it straight into a component
function HighlightedMessage({ text }: { text: string }) {
  const segments = Duckling().renderMap<JSX.Element>(
    text,
    ({ entity, children }) => {
      switch (entity.kind) {
        case "url":
          return <a href={entity.text}>{children}</a>;
        case "email":
          return <a href={`mailto:${entity.text}`}>{children}</a>;
        case "time":
          return <time>{children}</time>;
        default:
          return <mark data-kind={entity.kind}>{children}</mark>;
      }
    },
  );

  return <p>{segments}</p>;
}

As with .render(), nested entities are handled automatically: child spans are mapped first, and the parent callback receives the already-mapped children as (string | R)[].

Custom entities

Define a parser that returns an Entity, then pass it to Duckling:

import { createLanguage, map, type Parser, regex } from "@claudiu-ceia/combine";
import { Duckling, ent, type Entity } from "@claudiu-ceia/ts-duckling";

type HashtagEntity = Entity<"hashtag", { tag: string }>;

type HashtagLanguage = {
  Full: Parser<HashtagEntity>;
  parser: Parser<HashtagEntity>;
};

const Hashtag = createLanguage<HashtagLanguage>({
  Full: () =>
    map(
      regex(/#[A-Za-z0-9_]{2,64}/, "hashtag"),
      (m, b, a) => ent({ tag: m.slice(1) }, "hashtag", b, a),
    ),
  parser: (s) => s.Full,
});

const entities = Duckling([Hashtag.parser]).extract("hello #duckling");
// → [{ kind: "hashtag", value: { tag: "duckling" }, start: 6, end: 15, text: "#duckling" }]

Custom parsers compose freely with the built-in ones:

import { Email } from "@claudiu-ceia/ts-duckling";

const entities = Duckling([Email.parser, Hashtag.parser]).extract(
  "Email alex@company.io with #feedback",
);
// entities: (EmailEntity | HashtagEntity)[]

Supported entities

Entity | Kind | Example match | Notes
Time | time | tomorrow at 3pm, 2024-01-15T10:30:00Z | Relative, day-of-week, ISO timestamps
Range | range | 2020-2024, 20°C to 30°C | Time, year, and temperature ranges
Temperature | temperature | 72°F, 20 celsius | Fahrenheit and Celsius
Quantity | quantity | 5 kg, 100 miles | Units of measurement
Location | location | United States, Germany | Countries (dataset-backed)
URL | url | https://example.com/path | Full URLs with TLD validation
Email | email | user@example.com | Standard email addresses
Institution | institution | University of Oxford | Known institutions
Language | language | English, Japanese | Language names (dataset-backed)
Phone | phone | +14155552671 | E.164-ish phone numbers
IP address | ip_address | 192.168.1.1, ::1 | IPv4 + IPv6 full form
SSN | ssn | 123-45-6789 | US Social Security Numbers
Credit card | credit_card | 4111111111111111 | Luhn-validated card numbers
UUID | uuid | 550e8400-e29b-41d4-a716-446655440000 | RFC 4122 UUIDs
API key | api_key | sk-abc123..., AKIA... | Common provider prefixes
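The table says credit card matches are Luhn-validated. How the library implements that check is not shown here, so the following is only a sketch of the standard Luhn checksum; passesLuhn is a hypothetical helper name.

```typescript
// Standard Luhn checksum: walking from the rightmost digit, double every
// second digit, subtract 9 from any result above 9, and require the total
// to be divisible by 10.
function passesLuhn(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false; // digits only, no separators

  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111111111111111")); // true
console.log(passesLuhn("4111111111111112")); // false
```

A checksum like this rejects most random 16-digit strings, which helps keep order numbers and similar IDs from being reported as cards.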

API reference

Duckling()

function Duckling(): { extract; render; renderMap; redact };
function Duckling<T>(parsers: ParserTuple<T>): {
  extract;
  render;
  renderMap;
  redact;
};

Creates an extractor/renderer/redactor. Without arguments, uses all 15 built-in parsers and returns AnyEntity[]. When given an explicit parser array, the return type narrows to the union of those entity types.

.extract(text)

extract(text: string): Entity[]

Scans text and returns all matched entities, each with kind, value, start, end, and text fields. Entities are returned in order of appearance.

.render(text, fn)

render(text: string, fn: RenderFn<Entity>): string

Extracts entities, arranges them into a span tree (wider spans parent narrower ones), and calls fn for each entity node. The callback receives the entity and the already-rendered text of its children. Return a replacement string, or undefined to leave the span as-is.
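The inside-out order can be sketched without the library. The code below takes hand-written spans (the ones the nested SSN example would plausibly produce), sorts them so parents come before their children, and renders recursively. Span, Render, and renderSpans are hypothetical names, and the sketch assumes spans are either disjoint or properly nested; the library's actual span-tree logic may differ.

```typescript
// Hypothetical span shape for illustration only.
interface Span {
  kind: string;
  start: number;
  end: number; // exclusive
}

type Render = (ctx: { entity: Span; children: string }) => string | undefined;

// Render spans inside-out. Assumes spans are disjoint or properly nested
// (no partial overlaps).
function renderSpans(text: string, spans: Span[], fn: Render): string {
  // Sort by start; at equal start, wider spans first so parents precede children.
  const sorted = [...spans].sort((a, b) => a.start - b.start || b.end - a.end);
  let i = 0;

  // Render [from, to), consuming every span that starts inside the range.
  const renderRange = (from: number, to: number): string => {
    let out = "";
    let cursor = from;
    while (i < sorted.length && sorted[i].start < to) {
      const span = sorted[i++];
      out += text.slice(cursor, span.start); // plain text before the span
      const children = renderRange(span.start, span.end); // inner spans first
      out += fn({ entity: span, children }) ?? children; // undefined keeps the span content
      cursor = span.end;
    }
    return out + text.slice(cursor, to);
  };

  return renderRange(0, text.length);
}

const ssnSpans: Span[] = [
  { kind: "quantity", start: 4, end: 7 },
  { kind: "quantity", start: 8, end: 10 },
  { kind: "quantity", start: 11, end: 15 },
  { kind: "ssn", start: 4, end: 15 },
];

const rendered = renderSpans(
  "SSN 123-45-6789",
  ssnSpans,
  ({ entity, children }) => `<${entity.kind}>${children}</${entity.kind}>`,
);
console.log(rendered);
// → "SSN <ssn><quantity>123</quantity>-<quantity>45</quantity>-<quantity>6789</quantity></ssn>"
```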

.renderMap(text, fn)

renderMap<R>(text: string, fn: RenderMapFn<Entity, R>): (string | R)[]

Like .render(), but instead of producing a single string, returns an array of segments: plain-text strings interleaved with values of type R produced by your callback. This is the API you want for React/JSX: map entities to elements, and the result is ready to drop into a component's children.

The callback receives { entity, children }, where children is (string | R)[]; nested entities are already mapped.
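To show what the segment array looks like without pulling in JSX, here is a simplified, flat sketch: it assumes spans never nest, so each callback gets a one-element children array. Span, Mark, and mapSegments are hypothetical names used only for this illustration, and the library handles nesting that this sketch does not.

```typescript
// Hypothetical shapes for illustration only.
interface Span {
  kind: string;
  start: number;
  end: number; // exclusive
}
type Mark = { tag: "mark"; kind: string; text: string };

// Split text into plain strings and mapped values.
// Flat only: assumes spans are disjoint, so children is always [matchedText].
function mapSegments<R>(
  text: string,
  spans: Span[],
  fn: (ctx: { entity: Span; children: (string | R)[] }) => R,
): (string | R)[] {
  const out: (string | R)[] = [];
  let cursor = 0;
  for (const span of [...spans].sort((a, b) => a.start - b.start)) {
    if (span.start > cursor) out.push(text.slice(cursor, span.start));
    out.push(
      fn({ entity: span, children: [text.slice(span.start, span.end)] }),
    );
    cursor = span.end;
  }
  if (cursor < text.length) out.push(text.slice(cursor));
  return out;
}

const mapped = mapSegments<Mark>(
  "email me at alex@company.io",
  [{ kind: "email", start: 12, end: 27 }],
  ({ entity, children }) => ({
    tag: "mark",
    kind: entity.kind,
    text: children.join(""),
  }),
);
console.log(JSON.stringify(mapped));
// → ["email me at ",{"tag":"mark","kind":"email","text":"alex@company.io"}]
```

In a JSX setting, R would be an element type and the callback would return an element instead of a plain object.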

.redact(text, opts?)

redact(text: string, opts?: RedactOptions): string

Built on top of .render(). Extracts entities, then replaces each character of every matched span with opts.mask (default "█"). When opts.kinds is set, only those entity kinds are masked. Overlapping/nested spans are resolved via the span tree.
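The masking step can be approximated in a few lines. This sketch repeats the mask once per matched character, which is consistent with the "Call XXXXXXXXXXXX" example earlier, and applies an optional kinds filter. Span and maskSpans are hypothetical names; the library builds on .render() and its span tree, whereas this version simply assumes the spans do not overlap.

```typescript
// Hypothetical span shape; the library's entities carry more fields.
interface Span {
  kind: string;
  start: number;
  end: number; // exclusive
}

// Replace every character of each selected span with the mask string.
// Assumes spans do not overlap; overlapping spans would need merging first.
function maskSpans(
  text: string,
  spans: Span[],
  opts: { mask?: string; kinds?: string[] } = {},
): string {
  const mask = opts.mask ?? "█";
  const selected = spans
    .filter((s) => !opts.kinds || opts.kinds.includes(s.kind))
    .sort((a, b) => a.start - b.start);

  let out = "";
  let cursor = 0;
  for (const s of selected) {
    out += text.slice(cursor, s.start); // untouched text before the span
    out += mask.repeat(s.end - s.start); // one mask per original character
    cursor = s.end;
  }
  return out + text.slice(cursor);
}

const phone: Span = { kind: "phone", start: 5, end: 17 };
console.log(maskSpans("Call +14155552671", [phone], { mask: "X" }));
// → "Call XXXXXXXXXXXX"
```

Masking per character keeps the redacted string the same length as the input, so offsets computed before redaction remain valid afterwards.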

PIIParsers

const PIIParsers: [
  EmailParser,
  PhoneParser,
  IPAddressParser,
  SSNParser,
  CreditCardParser,
  UUIDParser,
  ApiKeyParser,
];

Pre-built parser tuple for PII-sensitive entities. Use with Duckling(PIIParsers) for a quick redaction pipeline.

RedactOptions

interface RedactOptions<K extends string = string> {
  mask?: string; // default: "█"
  kinds?: K[]; // when omitted, all entities are redacted
}

RenderFn

type RenderFn<E> = (ctx: {
  entity: E;
  children: string;
}) => string | undefined;

Callback for .render(). Receives the entity and the already-rendered text of its nested children. Return a replacement string, or undefined to leave the span unchanged.

RenderMapFn

type RenderMapFn<E, R> = (ctx: {
  entity: E;
  children: (string | R)[];
}) => R;

Callback for .renderMap(). Receives the entity and its children as an array of plain-text strings and already-mapped R values. Return a value of type R to replace the span.

AnyEntity

Union of all 15 built-in entity types. This is the return element type of Duckling().extract(...).

PIIEntity

Union of the 7 PII entity types: EmailEntity | PhoneEntity | IPAddressEntity | SSNEntity | CreditCardEntity | UUIDEntity | ApiKeyEntity.

Caveats

ts-duckling uses grammar-based parsers, not ML. This means:

  • Deterministic: same input → same output, every time
  • Fast: no model loading, no network calls
  • But imperfect: expect false positives/negatives for ambiguous inputs

For example:

// ts-duckling interprets 6/2022 as a date
Duckling([Time.parser]).extract("6/2022 is 0.00296735905");
// → [{ kind: "time", text: "6/2022", ... }]

If you need high accuracy on messy, ambiguous real-world text, consider an ML-based solution. If you want predictable, fast extraction from structured or semi-structured text (messages, forms, logs), ts-duckling is a great fit.

Playground

Try ts-duckling in the browser: Live Playground

Paste or type any text and see entities extracted in real time. You can also fetch content from a URL to test against real web pages.

License

MIT © Claudiu Ceia