Performance

This page presents disarm's performance numbers, how to read them, and where they are recorded. Internals (why it is fast) live in Architecture: Performance; how to run and extend the suite lives in Benchmarks. Every figure here is a recorded, fingerprinted measurement — absolutes are non-comparable across hardware, and only the ratios are durable claims.

Results

Two regimes, quoted separately because they stress different things. Long text (documents, batch pipelines) is dominated by per-character lookup cost; short strings (one field per call — a name, a title, a slug) are dominated by the fixed Python→Rust crossing, which disarm pays exactly once, returning already-ASCII input as the original str object.
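
The ASCII passthrough mentioned above can be pictured with a pure-Python stand-in. This is a sketch only: `to_ascii_sketch` is a hypothetical helper, not disarm's implementation or API, and its slow path (NFKD decomposition with non-ASCII characters dropped) is a crude substitute for a real transliteration table. The point it illustrates is that an already-ASCII input can be handed back as the same object, so the caller pays for no allocation or copy.

```python
import unicodedata


def to_ascii_sketch(text: str) -> str:
    """Hypothetical stand-in for an ASCII-passthrough fast path."""
    # Fast path: nothing to transliterate, so hand back the very same object.
    if text.isascii():
        return text
    # Slow path (illustrative only): decompose, then drop combining marks
    # and any other characters that are still non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii())


# Build the strings at run time so they are fresh objects, not literals.
plain = "".join(["already ", "ascii"])
accented = "".join(["caf", "\u00e9"])

assert to_ascii_sketch(plain) is plain      # same object, no copy made
assert to_ascii_sketch(accented) == "cafe"  # new string from the slow path
```

An identity check (`is`) rather than an equality check (`==`) is what distinguishes a true passthrough from a fast copy.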

Long text — document-scale throughput (vs Unidecode unless noted):

| Operation | Throughput | Speedup |
|---|---|---|
| Transliterate (Latin) | ~450M chars/sec | ~38× |
| Transliterate (Cyrillic) | ~106M chars/sec | ~15× |
| Slugify | ~712K slugs/sec | ~10–24× vs python-slugify |
| Batch transliterate (100 strings) | n/a | ~2.8× vs Python loop |

Short strings — per-call, ~70–85 character inputs (vs Unidecode):

| Input | Speedup |
|---|---|
| Latin | ~17× |
| Mixed scripts | ~14× |
| Cyrillic / Greek | ~13× |
| ASCII passthrough (~65 ns) | returns the original object |

Slugify and filename sanitisation (per call, vs the dedicated library):

| Operation | Comparator | Speedup | Note |
|---|---|---|---|
| `slugify` | python-slugify | ~10–24× | also transliterates accented words |
| `sanitize_filename` | pathvalidate | ~10–16× | also transliterates, collapses dot-runs, sanitises extensions |

Unidecode's own four-cell benchmark — disarm wins every cell of the cross-product of Unidecode's two entry points (unidecode_expect_ascii, unidecode_expect_nonascii) and its two sample inputs:

| Cell | Ratio (Unidecode time / disarm time) |
|---|---|
| `expect_ascii` / ASCII input | 1.34× (disarm 65.1 ns vs Unidecode 87.6 ns) |
| `expect_ascii` / non-ASCII input | 8.87× |
| `expect_nonascii` / ASCII input | 24.58× |
| `expect_nonascii` / non-ASCII input | 6.31× |

The narrowest cell (1.34×) is Unidecode's strongest case — pure ASCII through its ASCII-optimised entry point — and disarm still wins it via the return-original-object fast path. The clean-room replication is in benchmarks/bench_unidecode_own.py (only the methodology is reused; the GPL benchmark file is not copied).
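
As a worked check of the narrowest cell, the ratio can be recomputed from the two timings in the table. The helper below is a hypothetical illustration, not part of the benchmark suite. It truncates rather than rounds, on the assumption that the rounded-down convention described under "How to read these numbers" applies to this cell.

```python
import math


def ratio_rounded_down(comparator_ns: float, disarm_ns: float, places: int = 2) -> float:
    """Comparator time divided by disarm time, truncated to `places` decimals."""
    scale = 10 ** places
    return math.floor(comparator_ns / disarm_ns * scale) / scale


# Timings from the `expect_ascii` / ASCII input cell.
unidecode_ns = 87.6
disarm_ns = 65.1

print(ratio_rounded_down(unidecode_ns, disarm_ns))  # 1.34
```

The unrounded quotient is roughly 1.346, so truncation reports 1.34 where ordinary rounding would report 1.35.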

How to read these numbers

Ratios are the durable claim; absolutes are presentation. Absolute ns / chars-per-sec figures are fingerprinted and not comparable across hardware.
Fresh-string regime. Every timed call receives a newly constructed str, as production traffic does, rather than re-running one cached object (which would understate the pure-Python comparators). Recorded as regime: fresh-string/v2 (#303).
Interleaved, median-of-N, pinned comparators. Each measurement times disarm and the comparator back-to-back per round and takes the median, so transient scheduler noise cancels in the ratio. CI installs the exact versions in requirements/bench.txt with --require-hashes. Rounding is conservative: disarm's figures are rounded down and the comparators' are rounded up.
Not a like-for-like race. A transliterate() call also consults language override tables, applies the requested error-handling mode, and checks the replacement registry — work a context-free transliterator does not do. ftfy is a mojibake repairer, not a transliterator, and never appears in a transliterate ratio.
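
The fresh-string, interleaved, median-of-N method described above can be sketched with the standard library alone. Everything here is an assumption made for illustration: `fresh_str`, `interleaved_ratio`, the two stand-in callables, and the choice to take the median of per-round ratios are not taken from the project's benchmark scripts, which may differ in each of these details.

```python
import statistics
import time
from typing import Callable


def fresh_str(payload: bytes) -> str:
    # Decoding builds a new str object on every call, so nothing cached
    # on a previously used input object can be reused.
    return payload.decode("utf-8")


def timed_ns(func: Callable[[str], str], text: str) -> int:
    start = time.perf_counter_ns()
    func(text)
    return time.perf_counter_ns() - start


def interleaved_ratio(
    ours: Callable[[str], str],
    comparator: Callable[[str], str],
    payload: bytes,
    rounds: int = 7,
) -> float:
    """Median over rounds of (comparator time / our time)."""
    ratios = []
    for _ in range(rounds):
        # Time both back-to-back within the round, each on its own fresh input.
        ours_ns = timed_ns(ours, fresh_str(payload))
        comparator_ns = timed_ns(comparator, fresh_str(payload))
        # Guard against a zero reading on coarse timers.
        ratios.append(comparator_ns / max(ours_ns, 1))
    return statistics.median(ratios)


# Stand-in workloads; swap in the real functions under test.
payload = ("D\u00e9j\u00e0 vu, na\u00efve caf\u00e9 " * 4).encode("utf-8")
ratio = interleaved_ratio(str.lower, str.casefold, payload)
print(f"median ratio over 7 rounds: {ratio:.2f}")
```

A single call per round is far too noisy for real measurements; a production harness would time many calls per round and leave calibration to a tool such as pyperf. The sketch only shows the shape of the loop: fresh input, back-to-back timing, median at the end.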

Where disarm is slower

Visible admission of losses is the strongest defence against cherry-picking. Both losses listed below are against CPython C builtins that operate directly on the internal string buffer; disarm cannot beat them and does not try to:

| Operation | Faster tool | Why disarm trades it away |
|---|---|---|
| NFC / NFKC normalisation | `unicodedata.normalize` (C, single string) | `normalize()` uses one Unicode version (16.0) across every code path, so results never differ between CPython's bundled tables and the Rust crate's: consistency over speed |
| Case folding | `str.casefold()` (C builtin, zero-alloc) | `fold_case()` is within a small factor and dominated by the boundary crossing; use `str.casefold()` for a single string on CPython's Unicode version |
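
For the single-string cases in the table, the CPython builtins can be called directly. The snippet below uses only the standard library, and the sample strings are illustrative. `unicodedata.unidata_version` reports which Unicode version the running interpreter bundles, which is the value that varies between CPython releases.

```python
import unicodedata

# Unicode data version bundled with this interpreter (varies by CPython release).
print("CPython Unicode tables:", unicodedata.unidata_version)

# NFC composes 'e' + COMBINING ACUTE ACCENT into the single code point U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "\u00e9"

# NFKC additionally folds compatibility characters, e.g. the 'fi' ligature.
assert unicodedata.normalize("NFKC", "\ufb01") == "fi"

# Case folding is more aggressive than lower(): German sharp s becomes 'ss'.
assert "Stra\u00dfe".casefold() == "strasse"
assert "Stra\u00dfe".lower() == "stra\u00dfe"
```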

Absolute numbers (fingerprinted, non-comparable)

Absolute figures are not comparable across hardware. The short-string figures below were recorded in the fresh-string regime (#303) on an AMD EPYC 7763 CI bucket (CPython 3.12, pinned comparators from requirements/bench.txt, median-of-7 interleaved); your numbers will differ.

| Input (per call) | vs Unidecode |
|---|---|
| Latin diacritics (~70–85 chars) | ~17× |
| Mixed scripts | ~14× |
| Greek | ~13.6× |
| Cyrillic | ~13.4× |
| ASCII passthrough (~65 ns) | returns original object |

Document-scale throughput (same bucket): ~450M chars/sec Latin (~38×), ~106M chars/sec Cyrillic (~15×), slugify ~712K slugs/sec (~10–24×). These match the figures in the project README. To emit the full environment fingerprint that any absolute figure belongs to (CPU microarchitecture, CPython version and build, comparator versions, rustc version, git commit, date), run:

python scripts/perf_fingerprint.py --json
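
As a rough picture of what such a fingerprint contains, the sketch below gathers comparable fields with the standard library. It is not the project's `scripts/perf_fingerprint.py`: the function names, the field names, and the comparator list are assumptions, and the CPU string from `platform.processor()` is coarser than a true microarchitecture probe.

```python
import datetime
import json
import platform
import shutil
import subprocess
from importlib import metadata


def tool_output(*cmd):
    """Return a command's stripped stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def package_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_fingerprint(comparators=("Unidecode", "python-slugify", "pathvalidate")):
    # Field names are hypothetical; they mirror the items listed above.
    return {
        "cpu": platform.processor() or platform.machine(),
        "python": platform.python_version(),
        "python_build": platform.python_build(),
        "comparators": {name: package_version(name) for name in comparators},
        "rustc": tool_output("rustc", "--version"),
        "git_commit": tool_output("git", "rev-parse", "HEAD"),
        "date": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }


print(json.dumps(environment_fingerprint(), indent=2))
```

Missing tools and uninstalled comparators are recorded as `null` rather than raising, so a partial fingerprint is still emitted on a machine without rustc or git.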

More

Why it is fast (flat BMP array, single boundary crossing, borrowed Cow, range dispatch, GIL-released batch loops): Architecture: Performance.
Running and extending the suite (Criterion, pyperf, corpora, methodology): Benchmarks.
Reproduce the headline ratios:

pip install "disarm[bench]"                   # pinned, hash-locked comparators (quoted so shells such as zsh do not glob the brackets)
python benchmarks/bench_ratio.py              # short-string ratios, per script
python benchmarks/bench_unidecode_own.py      # Unidecode's four-cell benchmark
python benchmarks/bench_vs_unidecode.py       # document-scale throughput
python scripts/perf_fingerprint.py --json     # record the environment