Language Profiles
Functions for querying and extending transliteration language profiles.
list_langs
list_langs() -> list[str]
Return available language codes for transliteration.
| Returns: | |
|---|---|
| `list[str]` | Sorted list of available language codes, both built-in and user-registered. |
Examples:
>>> "de" in list_langs()
True
>>> "ja" in list_langs()
True
Example
from disarm import list_langs
# In a fresh process (no user registrations yet) only the built-in codes are listed.
langs = list_langs()
assert langs == ['am', 'ar', 'as', 'ban', 'bax', 'bg', 'bn', 'bo', 'bug', 'ca', 'chr', 'cjm', 'cop', 'cs', 'cy', 'da', 'de', 'dv', 'el', 'es', 'et', 'fa', 'fi', 'fr', 'ga', 'gu', 'he', 'hi', 'hr', 'hu', 'hy', 'is', 'it', 'ja', 'ja-kunrei', 'jv', 'ka', 'khb', 'km', 'kn', 'ko', 'lis', 'lo', 'lt', 'lv', 'ml', 'mn', 'mni', 'mr', 'mt', 'my', 'ne', 'nl', 'no', 'nod', 'nqo', 'or', 'pa', 'pl', 'pt', 'ro', 'ru', 'sa', 'sat', 'si', 'sk', 'sl', 'sq', 'sr', 'su', 'sv', 'syr', 'ta', 'tdd', 'te', 'th', 'tl', 'tr', 'tzm', 'uk', 'vai', 'vi', 'zh']
Returns both built-in and user-registered language codes, sorted alphabetically.
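As a rough model of that merge, the sketch below keeps two hypothetical dictionaries, `BUILTIN_LANGS` and `USER_LANGS`, and returns the sorted union of their keys. The names and the two-dictionary layout are assumptions for illustration, not disarm's internals.

```python
# Hypothetical stand-ins for disarm's internal registries; not its real API.
BUILTIN_LANGS: dict[str, dict[str, str]] = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
    "ru": {"ж": "zh"},
}
USER_LANGS: dict[str, dict[str, str]] = {}


def list_langs_sketch() -> list[str]:
    """Return built-in and user-registered codes, de-duplicated and sorted."""
    # A user registration may override a built-in code, so take a set union.
    return sorted(BUILTIN_LANGS.keys() | USER_LANGS.keys())


USER_LANGS["eo"] = {"ĉ": "cx"}  # a new code
USER_LANGS["de"] = {"ä": "ae"}  # an override of a built-in code
print(list_langs_sketch())  # ['de', 'eo', 'ru']
```

Using a set union means an overridden code such as `"de"` is listed once, which matches the "register or override" behaviour of `register_lang`.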
Tip
Use lang="auto" to auto-detect the language from the dominant non-Latin script in the input, instead of specifying a code manually. See Language Support for details.
register_lang
register_lang(code: str, mappings: dict[str, str]) -> None
Register or override a transliteration mapping for a language code.
Warning
This mutates process-global state consulted by every transliterate/slugify/catalog_key/… call in the interpreter. Treat it as startup-only, single-writer configuration: do not call it from request-handling or library code in a multi-tenant process, where it would silently alter every other caller's output. Call seal_registrations() after startup to make further changes raise.
Note
Mappings keyed on ASCII characters do not apply to pure-ASCII input. The core takes a fast path that returns all-ASCII text unchanged before consulting language tables (ASCII is the transliteration target, so ASCII input normally maps to itself). Language profiles are meant for non-ASCII source characters (e.g. ä→ae). To remap an ASCII character, use register_replacements() instead; its keys run as a pre-pass ahead of the ASCII fast path and therefore do apply.
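A toy model of that ordering (replacements pre-pass, then the ASCII fast path, then the per-character language table) shows why an ASCII key in a language profile is skipped on ASCII input while an ASCII replacement key is not. All names and table contents here are hypothetical, and the pre-pass is simplified to sequential `str.replace`.

```python
# Toy model of the documented ordering (hypothetical data, not disarm's code):
#   1. global replacements pre-pass
#   2. ASCII fast path
#   3. per-character lookup in the language table
REPLACEMENTS: dict[str, str] = {"&": " and "}  # ASCII key: takes effect
LANG_TABLES: dict[str, dict[str, str]] = {
    "xx": {"ä": "ae", "k": "c"},  # "k" is never reached on all-ASCII input
}


def transliterate_sketch(text: str, lang: str) -> str:
    # Step 1: replacements run before the fast path, so ASCII keys apply.
    # Simplified to sequential str.replace; the documented pre-pass uses
    # longest-match-at-each-position semantics instead.
    for key, value in REPLACEMENTS.items():
        text = text.replace(key, value)
    # Step 2: ASCII is the target alphabet, so all-ASCII text is returned as-is.
    if text.isascii():
        return text
    # Step 3: map each character through the language table.
    table = LANG_TABLES.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate_sketch("salt&pepper", "xx"))  # salt and pepper
print(transliterate_sketch("kak", "xx"))          # kak (the "k" entry is skipped)
print(transliterate_sketch("ärger", "xx"))        # aerger
```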
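| Parameters: | Type | Description |
|---|---|---|
| `code` | `str` | Language code to register or override (e.g. `"eo"`). |
| `mappings` | `dict[str, str]` | Mapping from source characters to their ASCII replacements. |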
Examples:
>>> register_lang("xx", {"Ä": "Ae", "ä": "ae", "Ö": "Oe", "ö": "oe"})
>>> transliterate("Ärger", lang="xx")
'Aerger'
Example
from disarm import list_langs, register_lang, transliterate
register_lang("eo", {
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx",
    "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
})
assert transliterate("ĉapelo", lang="eo") == 'cxapelo'
# Verify registration
assert "eo" in list_langs()
Warning
This is a global, process-wide operation. Registered profiles persist for the lifetime of the Python process and are visible to all threads.
register_replacements
register_replacements(replacements: dict[str, str]) -> None
Register global pre-transliteration replacements.
New entries are merged into the existing table; existing keys are silently overwritten. Use clear_replacements() to wipe the table, or remove_replacement() to remove a single key.
Replacements are applied to the input as a left-to-right pre-pass before the main transliteration tables, using longest-match-at-each-position semantics (the longest registered key matching at a position wins, and its output is not re-scanned, so replacements never cascade). Keys may be multi-character and may be ASCII.
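The longest-match, no-cascade rule can be sketched as a left-to-right scan that tries longer keys first and jumps past each match. This is an illustration of the described semantics, not disarm's implementation; `apply_replacements` is a hypothetical name, and a production version might use a trie instead of slicing.

```python
def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Left-to-right, longest-match-at-each-position replacement, no re-scanning."""
    # Distinct key lengths, longest first, so the longest match at a position wins.
    lengths = sorted({len(k) for k in replacements if k}, reverse=True)
    out: list[str] = []
    i = 0
    while i < len(text):
        for n in lengths:
            candidate = text[i:i + n]
            if len(candidate) == n and candidate in replacements:
                out.append(replacements[candidate])
                i += n  # skip past the match; its output is never re-scanned
                break
        else:
            out.append(text[i])  # no key matches here: copy one character
            i += 1
    return "".join(out)


table = {"a": "b", "b": "c", "ab": "X"}
print(apply_replacements("aab", table))  # bX
```

At position 0 only `"a"` matches, giving `"b"`; at position 1 the two-character key `"ab"` beats `"a"`. The `"b"` produced by the first match is not rewritten to `"c"`, which is the "never cascade" guarantee.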
Warning
Like register_lang(), this mutates process-global state shared by every caller. Treat it as startup-only, single-writer configuration and call seal_registrations() afterwards in multi-tenant processes.
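| Parameters: | Type | Description |
|---|---|---|
| `replacements` | `dict[str, str]` | Mapping from source strings (multi-character and ASCII keys allowed) to replacement strings; merged into the existing table. |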
Examples:
>>> register_replacements({"™": "(tm)"})
>>> transliterate("hello™")
'hello(tm)'
>>> clear_replacements()
Example
from disarm import register_replacements, transliterate
register_replacements({
    "©": "(c)",
    "®": "(R)",
    "™": "(TM)",
})
assert transliterate("Hello™ World©") == 'Hello(TM) World(c)'
Replacements are applied as a pre-processing step before the character-by-character transliteration lookup. They are global and persist for the process lifetime.
remove_replacement
remove_replacement(key: str) -> bool
Remove a single global pre-transliteration replacement by key.
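| Parameters: | Type | Description |
|---|---|---|
| `key` | `str` | Replacement key to remove. |

| Returns: | |
|---|---|
| `bool` | `True` if the key was registered and has been removed, `False` if it was not present. |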
Examples:
>>> register_replacements({"©": "(c)"})
>>> remove_replacement("©")
True
>>> remove_replacement("©")
False
Example
from disarm import register_replacements, remove_replacement, transliterate
register_replacements({"©": "(c)", "®": "(R)"})
assert transliterate("©®") == '(c)(R)'
assert remove_replacement("©") is True   # the key was registered
assert remove_replacement("©") is False  # already removed
# "©" now falls through to the built-in tables, which also yield '(c)' here
assert transliterate("©®") == '(c)(R)'
clear_replacements
clear_replacements() -> None
Clear all global pre-transliteration replacements.
Examples:
>>> register_replacements({"©": "(c)", "®": "(r)"})
>>> clear_replacements()
Example
from disarm import register_replacements, clear_replacements, transliterate
register_replacements({"©": "(c)", "®": "(R)"})
assert transliterate("©®") == '(c)(R)'
clear_replacements()
# Output is unchanged: the built-in tables also map © and ® to '(c)' and '(R)'
assert transliterate("©®") == '(c)(R)'
Note
clear_replacements() removes all user-registered replacements. Built-in transliteration tables are not affected.
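Why clearing only affects the user layer can be pictured as two passes: user replacements first, then an untouched built-in table. In this sketch `BUILTIN_TABLE`, `user_replacements` and `translit` are hypothetical, and the built-in values are deliberately chosen to differ from the user ones so the fallback is visible; they are not disarm's actual table contents.

```python
# Hypothetical built-in per-character table; user code never modifies it.
BUILTIN_TABLE: dict[str, str] = {"©": "(C)", "é": "e"}
user_replacements: dict[str, str] = {}


def translit(text: str) -> str:
    # User replacements run first (single-character keys only, for brevity)...
    text = "".join(user_replacements.get(ch, ch) for ch in text)
    # ...then whatever is left goes through the built-in table.
    return "".join(BUILTIN_TABLE.get(ch, ch) for ch in text)


user_replacements["©"] = "(c)"
print(translit("© café"))  # (c) cafe   (user entry shadows the built-in one)
user_replacements.clear()  # analogue of clear_replacements()
print(translit("© café"))  # (C) cafe   (built-in entry is still in place)
```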