Command-Line Interface
disarm provides a command-line tool for transliteration, slugification, normalization, and text processing. It reads from arguments or stdin and writes to stdout, making it composable with other Unix tools.
Installation
pip install disarm
After installation, the disarm command is available:
disarm t "café"
# cafe
You can also run it as a Python module:
python -m disarm t "café"
Commands
Every command has a short alias for faster typing in pipelines.
| Command | Alias | Description |
|---|---|---|
| transliterate | t | Convert Unicode text to ASCII |
| slugify | s | Generate URL-safe slugs |
| normalize | n | Apply Unicode normalization |
| pipeline | p | Run multi-step text processing |
| demojize | d | Expand emoji to text descriptions |
transliterate (t)
Convert Unicode text to ASCII using language-aware transliteration tables.
disarm t "café résumé"
# cafe resume
disarm t "Москва"
# Moskva
disarm t "北京市"
# bei jing shi
Options:
--lang CODE: Apply language-specific transliteration rules. Use auto for script-based detection.
disarm t --lang de "Ärger über Ölförderung"
# Aerger ueber Oelfoerderung
disarm t --lang auto "Москва"
# Moskva
--target CODE: Reverse transliteration. Converts romanized Latin text back to a native script. Mutually exclusive with --lang.
disarm t --target ru "Moskva"
# Москва
disarm t --target el "Athina"
# Αθηνα
--tones: Include tone marks in Chinese pinyin output.
disarm t --tones "北京"
# běi jīng
--strict-iso9: Use the scholarly ASCII (ISO 9-style) transliteration for Cyrillic. Note that this mode produces ASCII digraphs (zh/ch/sh), not the diacritics of the ISO 9:1995 standard.
disarm t --strict-iso9 "Юрий"
# Ûrij
--gost7034: Use GOST R 7.0.34 transliteration for Cyrillic.
slugify (s)
Generate URL-safe slugs from Unicode text.
disarm s "Hello, World!"
# hello-world
disarm s "Ärger im Büro"
# arger-im-buro
disarm s --lang de "Ärger im Büro"
# aerger-im-buero
Options:
--lang CODE: Apply language-specific transliteration before slugification.
--separator CHAR: Separator character (default: -).
disarm s --separator "_" "Hello World"
# hello_world
--max-length N: Maximum slug length, in characters.
disarm s --max-length 10 "A very long blog post title"
# a-very-lon
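To make the slug options concrete, here is a minimal Python sketch of the general technique: strip accents, lowercase, join alphanumeric runs with the separator, then truncate. It is not disarm's implementation. The name `sketch_slugify` is hypothetical, and trimming a trailing separator after truncation is an assumption rather than documented behavior. It covers only the default (no `--lang`) examples above.

```python
import re
import unicodedata
from typing import Optional


def sketch_slugify(text: str, separator: str = "-",
                   max_length: Optional[int] = None) -> str:
    """Illustrative stand-in for slugification; NOT disarm's own code."""
    # Decompose accented letters, then drop the combining marks (Ä -> A).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything still outside ASCII and lowercase the rest.
    ascii_text = stripped.encode("ascii", "ignore").decode("ascii").lower()
    # Join the alphanumeric runs with the separator.
    slug = separator.join(re.findall(r"[a-z0-9]+", ascii_text))
    if max_length is not None:
        # Assumption: cut at max_length and trim a dangling separator.
        slug = slug[:max_length].rstrip(separator)
    return slug


print(sketch_slugify("Hello, World!"))
print(sketch_slugify("A very long blog post title", max_length=10))
```

Language-specific rules such as German Ä to ae need mapping tables that this sketch leaves out, which is why it corresponds to the output without `--lang de`.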
normalize (n)
Apply Unicode normalization.
disarm n "café"
# café (NFC — composed form, the default)
disarm n --form NFKC "ﬁ"
# fi (the single ligature codepoint U+FB01 becomes the two letters f + i)
disarm n --form NFD "é"
# é (two codepoints: e + combining acute accent)
Options:
--form {NFC,NFD,NFKC,NFKD}: Normalization form (default: NFC).
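The four forms are the standard Unicode normalization forms, so their effect can be checked with Python's built-in `unicodedata` module. The sketch below assumes disarm follows the standard definitions. It uses only the standard library and does not call disarm.

```python
import unicodedata

composed = "\u00e9"  # é as a single precomposed codepoint
decomposed = unicodedata.normalize("NFD", composed)

# NFD splits é into e + U+0301 (combining acute accent).
print(len(composed), len(decomposed))  # 1 2

# NFC recombines the pair into the single codepoint.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

ligature = "\ufb01"  # the "fi" ligature
# Canonical forms (NFC/NFD) leave compatibility characters alone.
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
# Compatibility forms (NFKC/NFKD) replace the ligature with plain letters.
print(unicodedata.normalize("NFKC", ligature))  # fi
```

In practice, NFC is a reasonable default for storage and comparison. The K forms are lossy and suit search or identifier matching better.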
pipeline (p)
Run multiple processing steps in a single pass.
disarm p --steps "normalize,fold_case,transliterate" "Héllo WÖRLD"
# hello world
disarm p --steps "normalize,strip_accents,fold_case" "Café Résumé"
# cafe resume
Options:
--steps STEPS: Comma-separated list of processing steps (required).
Available steps: normalize, transliterate, fold_case, collapse_whitespace, strip_accents, confusables, strip_control, strip_zero_width, demojize.
--form FORM: Normalization form used by the normalize step.
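Conceptually, a step list like this is a chain of text-to-text functions applied in the order given. The Python sketch below illustrates that idea for four of the step names, using the standard library only. The function bodies are assumptions about what each step does (for example, `fold_case` is modeled with `str.casefold`), not disarm's actual code. `run_steps` and `STEPS` are hypothetical names.

```python
import unicodedata


def normalize(text: str, form: str = "NFC") -> str:
    return unicodedata.normalize(form, text)


def strip_accents(text: str) -> str:
    # Decompose, then drop combining marks (é -> e).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_case(text: str) -> str:
    # Assumption: case folding behaves like Python's str.casefold().
    return text.casefold()


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


# Hypothetical registry covering only a subset of the documented step names.
STEPS = {
    "normalize": normalize,
    "strip_accents": strip_accents,
    "fold_case": fold_case,
    "collapse_whitespace": collapse_whitespace,
}


def run_steps(steps: str, text: str) -> str:
    """Apply the comma-separated steps from left to right."""
    for name in steps.split(","):
        text = STEPS[name.strip()](text)
    return text


print(run_steps("normalize,strip_accents,fold_case", "Café Résumé"))
```

Because the steps run left to right, their order can change the result, so it helps to put normalize first so later steps see a consistent form.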
demojize (d)
Expand emoji to their text descriptions.
disarm d "Hello 😀 World 🌍"
# Hello grinning face World globe showing Europe-Africa
Piping and stdin
All commands accept input from stdin when no positional argument is given. This makes disarm composable with other tools:
# Process a file
cat names.txt | disarm t
# Chain with other commands
echo "Ünïcödé Tëxt" | disarm t
# Unicode Text
# Slugify each line of a file
while IFS= read -r line; do
echo "$line" | disarm s
done < titles.txt
# Use with xargs
cat words.txt | xargs -I{} disarm t "{}"
# Combine with sort/uniq for deduplication
cat entries.txt | disarm t | sort -u
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | No input provided (no argument and no stdin) |
| 2 | Invalid arguments (unknown command, bad option) |
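When calling the CLI from a script, the stdin behavior and the exit codes above can be handled together. The following is a minimal Python sketch using `subprocess.run`. The helper names `run_cli` and `describe` are hypothetical, and the meanings in `EXIT_MEANINGS` are copied from the table. Nothing here is part of disarm itself.

```python
import subprocess

# Meanings taken from the exit-code table; other codes are treated as unexpected.
EXIT_MEANINGS = {
    0: "success",
    1: "no input provided",
    2: "invalid arguments",
}


def run_cli(command, text):
    """Send text to the command on stdin and return (exit_code, stdout)."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        encoding="utf-8",  # decode and encode the pipes as UTF-8 text
    )
    return result.returncode, result.stdout


def describe(code):
    """Translate an exit code into the meaning listed in the table."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```

Assuming `disarm` is on the PATH, a call such as `run_cli(["disarm", "t"], "café\n")` should return the exit code together with the transliterated text, and `describe(code)` turns the code into a readable message for logging.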