Language Profiles

Functions for querying and extending transliteration language profiles.

list_langs

list_langs() -> list[str]

Return available language codes for transliteration.

Returns:
  • list[str]

    Sorted list of language code strings (e.g. ["ar", "bg", "de", ...]).

Raises:
  • DisarmError

    If the language table lock is poisoned.

Examples:

>>> "de" in list_langs()
True
>>> "ja" in list_langs()
True

Example

from disarm import list_langs

langs = list_langs()
assert langs == ['am', 'ar', 'as', 'ban', 'bax', 'bg', 'bn', 'bo', 'bug', 'ca', 'chr', 'cjm', 'cop', 'cs', 'cy', 'da', 'de', 'dv', 'el', 'es', 'et', 'fa', 'fi', 'fr', 'ga', 'gu', 'he', 'hi', 'hr', 'hu', 'hy', 'is', 'it', 'ja', 'ja-kunrei', 'jv', 'ka', 'khb', 'km', 'kn', 'ko', 'lis', 'lo', 'lt', 'lv', 'ml', 'mn', 'mni', 'mr', 'mt', 'my', 'ne', 'nl', 'no', 'nod', 'nqo', 'or', 'pa', 'pl', 'pt', 'ro', 'ru', 'sa', 'sat', 'si', 'sk', 'sl', 'sq', 'sr', 'su', 'sv', 'syr', 'ta', 'tdd', 'te', 'th', 'tl', 'tr', 'tzm', 'uk', 'vai', 'vi', 'zh']

Returns both built-in and user-registered language codes, sorted alphabetically.
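Because the returned list covers every code that is currently usable, it can serve as a validation set for user-supplied codes. The helper below is a hypothetical sketch, not part of disarm: check_lang is an invented name, and the short available list stands in for a real list_langs() result so the snippet runs on its own.

```python
from difflib import get_close_matches


def check_lang(code: str, available: list[str]) -> str:
    """Return code if it is available, else raise with near-miss suggestions."""
    if code in available:
        return code
    hints = get_close_matches(code, available, n=3)
    suffix = f"; did you mean {', '.join(hints)}?" if hints else ""
    raise ValueError(f"unknown language code {code!r}{suffix}")


# Stand-in data; in practice this would be the result of list_langs().
available = ["de", "ja", "ja-kunrei", "ru"]

print(check_lang("ja", available))  # ja
```

Passing the list in as an argument keeps the helper independent of global registration state, so it can be tested without touching the process-wide tables.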

Tip

Use lang="auto" to auto-detect the language from the dominant non-Latin script in the input, instead of specifying a code manually. See Language Support for details.


register_lang

register_lang(code: str, mappings: dict[str, str]) -> None

Register or override a transliteration mapping for a language code.

Warning

This mutates process-global state consulted by every transliterate/slugify/catalog_key/… call in the interpreter. Treat it as startup-only, single-writer configuration: do not call it from request-handling or library code in a multi-tenant process, where it would silently alter every other caller's output. Call seal_registrations after startup to make further changes raise.

Note

Mappings keyed on ASCII characters do not apply to pure-ASCII input. The core takes a fast path that returns all-ASCII text unchanged before consulting language tables (ASCII is the transliteration target, so it is normally the identity). Language profiles are meant for non-ASCII source characters (e.g. ä → ae). To remap an ASCII character, use register_replacements instead: its keys run as a pre-pass that executes ahead of the ASCII fast path and therefore do apply.
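The ordering described in the note above can be modeled in a few lines of plain Python. This is a toy sketch, not disarm's implementation: toy_transliterate, LANG_TABLE, and PRE_REPLACEMENTS are hypothetical names, and the pre-pass is simplified to a plain str.replace. Only the order of the three stages is taken from the note.

```python
# Toy model of the stage order: pre-pass, ASCII fast path, language table.
# All names are hypothetical; this is not disarm's real code.

LANG_TABLE = {"ä": "ae", "&": "und"}   # a profile that includes one ASCII key
PRE_REPLACEMENTS = {"&": " and "}      # a pre-pass table with an ASCII key


def toy_transliterate(text: str, use_replacements: bool = False) -> str:
    # 1. Pre-pass: runs first, so ASCII keys take effect here.
    #    (Simplified: plain str.replace instead of longest-match scanning.)
    if use_replacements:
        for src, dst in PRE_REPLACEMENTS.items():
            text = text.replace(src, dst)
    # 2. Fast path: all-ASCII text is returned before any table lookup.
    if text.isascii():
        return text
    # 3. Language table: only reached when non-ASCII characters remain.
    return "".join(LANG_TABLE.get(ch, ch) for ch in text)


print(toy_transliterate("R&D"))                         # R&D
print(toy_transliterate("R&D", use_replacements=True))  # R and D
print(toy_transliterate("Bär"))                         # Baer
```

In the first call the "&" entry in the profile is never consulted, because the input is already pure ASCII and the fast path returns it as-is.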

Parameters:
  • code (str) –

    Language code string (e.g. "xx", "custom").

  • mappings (dict[str, str]) –

    Dict of source→replacement character mappings.

Raises:
  • DisarmError

    If registrations are sealed, the language table lock is poisoned, or the mapping cannot be stored.

Examples:

>>> register_lang("xx", {"Ä": "Ae", "ä": "ae", "Ö": "Oe", "ö": "oe"})
>>> transliterate("Ärger", lang="xx")
'Aerger'

Example

from disarm import register_lang, transliterate

register_lang("eo", {
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx",
    "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
})

assert transliterate("ĉapelo", lang="eo") == 'cxapelo'

# Verify registration
from disarm import list_langs
assert "eo" in list_langs()

Warning

This is a global, process-wide operation. Registered profiles persist for the lifetime of the Python process and are visible to all threads.
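One way to follow the startup-only advice is to do all registration in a single place and then seal. The sketch below models that pattern with a toy registry. ToyRegistry and SealedError are hypothetical stand-ins for the library's global table and DisarmError, and the locking shown is an assumption about how such a table could be guarded, not a description of disarm's internals.

```python
import threading


class SealedError(RuntimeError):
    """Raised by the toy registry once it is sealed (stand-in for DisarmError)."""


class ToyRegistry:
    """Hypothetical single-writer registry; not disarm's implementation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._profiles: dict[str, dict[str, str]] = {}
        self._sealed = False

    def register(self, code: str, mappings: dict[str, str]) -> None:
        with self._lock:
            if self._sealed:
                raise SealedError(f"registrations are sealed; cannot register {code!r}")
            # Copy the dict so later edits by the caller do not leak in.
            self._profiles[code] = dict(mappings)

    def seal(self) -> None:
        with self._lock:
            self._sealed = True

    def codes(self) -> list[str]:
        with self._lock:
            return sorted(self._profiles)


registry = ToyRegistry()
registry.register("eo", {"ĉ": "cx"})   # startup: configure once
registry.seal()                        # freeze before serving requests

try:
    registry.register("xx", {"ä": "ae"})
except SealedError as exc:
    print(exc)  # registrations are sealed; cannot register 'xx'
```

Sealing turns an accidental late registration into an immediate error instead of a silent change to every other caller's output.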


register_replacements

register_replacements(replacements: dict[str, str]) -> None

Register global pre-transliteration replacements.

New entries are merged into the existing table. Existing keys are silently overwritten. Use clear_replacements to wipe the table, or remove_replacement to remove a single key.

Replacements are applied to the input as a left-to-right pre-pass before the main transliteration tables, using longest-match-at-each-position semantics (the longest registered key matching at a position wins, and its output is not re-scanned, so replacements never cascade). Keys may be multi-character and may be ASCII.
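The matching rule can be illustrated with a small standalone function. This is a minimal sketch of left-to-right, longest-match-at-each-position replacement written for this page; apply_replacements is a hypothetical name and the library's actual implementation may differ.

```python
def apply_replacements(text: str, table: dict[str, str]) -> str:
    """Left-to-right, longest-match-at-each-position replacement (toy model)."""
    if not table:
        return text
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer keys beat their prefixes.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])  # emitted output is never re-scanned
                i += size
                break
        else:
            out.append(text[i])  # no key matches here; copy one character
            i += 1
    return "".join(out)


table = {"a": "b", "b": "c", "ab": "X"}
print(apply_replacements("abba", table))  # Xcb
```

At position 0 the two-character key "ab" wins over "a". The trailing "a" becomes "b" and stays "b", even though "b" is itself a key, because output is not fed back into the scan.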

Warning

Like register_lang, this mutates process-global state shared by every caller. Treat it as startup-only, single-writer configuration and call seal_registrations afterwards in multi-tenant processes.

Parameters:
  • replacements (dict[str, str]) –

    Dict of source→replacement string mappings, applied before the main transliteration tables.

Examples:

>>> register_replacements({"™": "(tm)"})
>>> transliterate("hello™")
'hello(tm)'
>>> clear_replacements()

Example

from disarm import register_replacements, transliterate

register_replacements({
    "©": "(c)",
    "®": "(R)",
    "™": "(TM)",
})

assert transliterate("Hello™ World©") == 'Hello(TM) World(c)'

Replacements are applied as a pre-processing step before the character-by-character transliteration lookup. They are global and persist for the process lifetime.


remove_replacement

remove_replacement(key: str) -> bool

Remove a single global pre-transliteration replacement by key.

Parameters:
  • key (str) –

    The source string to remove from the replacement table.

Returns:
  • bool

    True if the key was present and removed, False otherwise.

Examples:

>>> register_replacements({"©": "(c)"})
>>> remove_replacement("©")
True
>>> remove_replacement("©")
False

Example

from disarm import register_replacements, remove_replacement, transliterate

register_replacements({"©": "(c)", "®": "(R)"})
assert transliterate("©®") == '(c)(R)'

assert remove_replacement("©") is True   # key was present and removed
assert remove_replacement("©") is False  # key is already gone
assert transliterate("©®") == '(c)(R)'

clear_replacements

clear_replacements() -> None

Clear all global pre-transliteration replacements.

Examples:

>>> register_replacements({"©": "(c)", "®": "(r)"})
>>> clear_replacements()

Example

from disarm import register_replacements, clear_replacements, transliterate

register_replacements({"©": "(c)", "®": "(R)"})
assert transliterate("©®") == '(c)(R)'

clear_replacements()
assert transliterate("©®") == '(c)(R)'

Note

clear_replacements() removes all user-registered replacements. Built-in transliteration tables are not affected.
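The lifecycle of the user replacement table across register_replacements, remove_replacement, and clear_replacements can be pictured with a plain dict. The toy_* functions below are hypothetical stand-ins that mirror only the documented merge, overwrite, and return-value behavior. They do not touch disarm's real table or its built-in transliteration tables.

```python
# Toy stand-in for the global replacement table; all names are hypothetical.
table: dict[str, str] = {}


def toy_register(replacements: dict[str, str]) -> None:
    table.update(replacements)  # merge; existing keys are overwritten


def toy_remove(key: str) -> bool:
    if key in table:
        del table[key]
        return True   # key was present and removed
    return False      # nothing to remove


def toy_clear() -> None:
    table.clear()


toy_register({"©": "(c)", "®": "(r)"})
toy_register({"®": "(R)"})                # silently overwrites the "®" entry
print(table)                              # {'©': '(c)', '®': '(R)'}
print(toy_remove("©"), toy_remove("©"))   # True False
toy_clear()
print(table)                              # {}
```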