Slugification
disarm generates URL-safe slugs from Unicode text. The slugify() function is parameter-compatible with python-slugify, so migration requires only changing the import.
Basic usage
from disarm import slugify
assert slugify("Hello, World!") == 'hello-world'
assert slugify("My Blog Post — Draft #3") == 'my-blog-post-draft-3'
assert slugify("Ünïcödé Téxt") == 'unicode-text'
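The ASCII folding behind these examples can be approximated with the standard library alone. The sketch below shows the general technique (Unicode NFKD decomposition, then dropping anything that is not ASCII). It is an assumption about how such folding works, not disarm's implementation, and `basic_slug` is a hypothetical name.

```python
import re
import unicodedata


def basic_slug(text: str, separator: str = "-") -> str:
    """Hypothetical minimal slugifier: fold to ASCII, lowercase, join words."""
    # NFKD splits "Ü" into "U" plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" drops the combining marks, and
    # also drops characters with no ASCII decomposition (dashes, emoji).
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Every run of non-alphanumeric characters becomes one separator.
    slug = re.sub(r"[^a-z0-9]+", separator, ascii_text.lower())
    # Remove separators left at either end (e.g. from a trailing "!").
    return slug.strip(separator)


print(basic_slug("Hello, World!"))  # hello-world
```

Generic folding like this maps "ä" to "a"; language-specific spellings such as German "ae" are what the lang parameter is for.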
Parameters
separator
The character used between words (default: "-"):
assert slugify("hello world", separator="_") == 'hello_world'
assert slugify("hello world", separator=".") == 'hello.world'
lowercase
Whether to lowercase the output (default: True):
assert slugify("Hello World", lowercase=False) == 'Hello-World'
max_length
Truncate the slug to a maximum length (default: 0 = unlimited):
assert slugify("a very long title here", max_length=10) == 'a-very-lon'
word_boundary
When combined with max_length, truncate only at word boundaries instead of cutting a word in half:
assert slugify("a very long title here", max_length=10, word_boundary=True) == 'a-very'
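Truncation operates on the assembled, separator-delimited slug. The sketch below models both modes, plus the save_order option named in the pipeline description, assuming python-slugify's semantics (which disarm claims parameter compatibility with): with save_order=False a word that does not fit is skipped and later, shorter words may still be kept; with save_order=True truncation stops at the first word that does not fit. `truncate_slug` is a hypothetical helper, not disarm's code.

```python
def truncate_slug(slug: str, max_length: int = 0, word_boundary: bool = False,
                  save_order: bool = False, separator: str = "-") -> str:
    """Hypothetical truncation helper; max_length=0 means no limit."""
    slug = slug.strip(separator)
    if not max_length or len(slug) <= max_length:
        return slug
    if not word_boundary:
        # Hard cut, then drop a separator the cut may have left dangling.
        return slug[:max_length].strip(separator)
    kept = []
    length = 0
    for word in slug.split(separator):
        if not word:
            continue
        # Adding a word costs its length plus one separator (except the first).
        cost = len(word) + (len(separator) if kept else 0)
        if length + cost <= max_length:
            kept.append(word)
            length += cost
        elif save_order:
            break  # preserve word order: stop at the first word that does not fit
        # otherwise skip this word and try the following ones
    if not kept:
        # Even the first word is too long: fall back to a hard cut.
        return slug[:max_length].strip(separator)
    return separator.join(kept)


print(truncate_slug("a-very-long-title-here", 10, word_boundary=True))  # a-very
```

Without save_order, word-boundary truncation can silently drop a middle word ("one-three-to" with a limit of 7 becomes "one-to"), which may change what the slug appears to say.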
stopwords
Words to remove from the slug:
assert slugify("the quick brown fox", stopwords=["the", "brown"]) == 'quick-fox'
regex_pattern
A custom regex describing the characters that are not allowed; anything the pattern matches is removed:
assert slugify("hello 123 world", regex_pattern=r"[^a-z]+") == 'helloworld'
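The example output has no separator between "hello" and "world", which indicates that matches are dropped outright rather than replaced by the separator. The standard re module makes the difference between those two behaviours easy to see; this is an illustration, not disarm's code.

```python
import re

pattern = r"[^a-z]+"  # matches runs of everything that is NOT allowed
text = "hello 123 world"

# Dropping matches joins the surviving words directly.
dropped = re.sub(pattern, "", text)
# Replacing matches with a separator keeps the word boundary.
replaced = re.sub(pattern, "-", text)

print(dropped, replaced)  # helloworld hello-world
```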
replacements
A list of (old, new) string pairs applied to the raw input before transliteration:
assert slugify("C++ Programming", replacements=[("C++", "cpp")]) == 'cpp-programming'
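Because replacements run before punctuation is stripped, "C++" can still be matched. A minimal sketch, using a hypothetical `slug_with_replacements` helper with plain NFKD/ASCII folding standing in for disarm's transliteration, shows the difference the step makes:

```python
import re
import unicodedata


def slug_with_replacements(text: str, replacements=()) -> str:
    # Apply (old, new) pairs in order, on the untouched input.
    for old, new in replacements:
        text = text.replace(old, new)
    # Stand-in transliteration: decompose, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


print(slug_with_replacements("C++ Programming"))                      # c-programming
print(slug_with_replacements("C++ Programming", [("C++", "cpp")]))    # cpp-programming
```

In this sketch the pairs are applied in sequence, so a later pair sees the output of earlier ones.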
allow_unicode
Keep non-ASCII characters in the slug instead of transliterating them:
assert slugify("日本語テスト", allow_unicode=True) == '日本語テスト'
lang
Language profile for transliteration:
assert slugify("Ärger im Büro", lang="de") == 'aerger-im-buero'
Use lang="auto" to auto-detect the language from the script:
assert slugify("Москва", lang="auto") == 'moskva'
assert slugify("ภาษาไทย", lang="auto") == 'phasaaithy'
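Language profiles matter because generic decomposition maps "ä" to "a", while German convention spells it "ae". The sketch below illustrates both ideas under stated assumptions: a tiny hypothetical German table applied before generic ASCII folding, and a naive script detector that reads the Unicode character name of the first letter. disarm's actual profiles and detection logic are not described in this document; `GERMAN`, `german_slug`, and `detect_script` are hypothetical.

```python
import re
import unicodedata

# Hypothetical German profile, applied before generic ASCII folding.
GERMAN = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def german_slug(text: str) -> str:
    text = "".join(GERMAN.get(ch, ch) for ch in text)
    # Anything the profile does not cover falls back to NFKD + ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def detect_script(text: str) -> str:
    """Naive guess: the script prefix of the first letter's Unicode name."""
    for ch in text:
        if ch.isalpha():
            # e.g. "CYRILLIC CAPITAL LETTER EM" -> "CYRILLIC"
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return "UNKNOWN"


print(german_slug("Ärger im Büro"), detect_script("Москва"))  # aerger-im-buero CYRILLIC
```

A detector this simple misreads mixed-script input; it only shows the idea of choosing a profile from the script.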
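entities, decimal, hexadecimal
Decode HTML character references before slugifying: named entities such as &amp; (entities), decimal references such as &#38; (decimal), and hexadecimal references such as &#x26; (hexadecimal):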
assert slugify("& test &") == 'test'
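The standard library's html.unescape decodes all three reference kinds at once. The sketch below decodes them separately so each can be switched off, mirroring the three flag names. It is an illustration built on html.entities.name2codepoint, not disarm's implementation, and `decode_refs` is a hypothetical name.

```python
import re
from html.entities import name2codepoint


def _codepoint_to_char(cp: int, original: str) -> str:
    # Leave out-of-range references (beyond U+10FFFF) untouched.
    return chr(cp) if cp <= 0x10FFFF else original


def decode_refs(text: str, entities: bool = True, decimal: bool = True,
                hexadecimal: bool = True) -> str:
    if entities:
        # Named entities such as &amp; or &eacute;; unknown names are kept.
        text = re.sub(
            r"&([A-Za-z][A-Za-z0-9]*);",
            lambda m: chr(name2codepoint[m.group(1)]) if m.group(1) in name2codepoint else m.group(0),
            text,
        )
    if decimal:
        # Decimal references such as &#38;
        text = re.sub(r"&#(\d+);", lambda m: _codepoint_to_char(int(m.group(1)), m.group(0)), text)
    if hexadecimal:
        # Hexadecimal references such as &#x26;
        text = re.sub(r"&#[xX]([0-9A-Fa-f]+);",
                      lambda m: _codepoint_to_char(int(m.group(1), 16), m.group(0)), text)
    return text


print(decode_refs("&amp; test &#38; &#x26;"))  # & test & &
```

Once decoded, "&" is ordinary punctuation that the later separator step removes, which is why input made only of ampersands and words slugs to just the words.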
default
Fallback returned when the input has no sluggable characters (only emoji, punctuation, or zero-width characters) and would otherwise produce an empty slug. This avoids a routing hazard in which many distinct inputs collapse onto one empty-slug URL:
assert slugify("\U0001f525\U0001f525\U0001f525") == ''
assert slugify("\U0001f525\U0001f525\U0001f525", default="n-a") == 'n-a'
The fallback is sanitized through the same slug pipeline before being
returned, so a caller-derived default (a username, a filename) cannot inject
path-traversal or URL metacharacters into output that is assumed URL-safe. It is
also subject to the same max_length:
assert slugify("\U0001f525", default="../../etc/passwd") == 'etc-passwd'
assert slugify("\U0001f525", default="a/b?c#d") == 'a-b-c-d'
assert slugify("\U0001f525", default="this-is-long", max_length=5) == 'this'
A default that is itself unsluggable sanitizes to "".
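The fallback rule described above can be sketched as "slugify the input; if that is empty, slugify the default with the same settings". The sketch assumes that reading; `slug_or_default` and its ASCII-only `_slug` pipeline are hypothetical stand-ins for disarm's.

```python
import re
import unicodedata


def _slug(text: str, max_length: int = 0) -> str:
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    if max_length:
        slug = slug[:max_length].strip("-")
    return slug


def slug_or_default(text: str, default: str = "", max_length: int = 0) -> str:
    slug = _slug(text, max_length)
    if slug:
        return slug
    # The default goes through the same pipeline, so "/", "?", "#" and ".."
    # cannot survive into the output, and max_length still applies.
    return _slug(default, max_length)


print(slug_or_default("\U0001f525", default="../../etc/passwd"))  # etc-passwd
```

Sanitizing the default, rather than returning it verbatim, is what makes a caller-derived default safe to interpolate into a URL path.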
default is available on every entry point — slugify(), Slugifier,
UniqueSlugifier, and Text.slugify. On UniqueSlugifier the fallback is made
unique like any other slug:
from disarm import UniqueSlugifier
u = UniqueSlugifier(default="n-a")
assert u("\U0001f525") == 'n-a'
assert u("\U0001f525") == 'n-a-1'
Reusable slugifiers
Slugifier
Pre-configure a slugifier for repeated use:
from disarm import Slugifier
slug = Slugifier(separator="_", lang="de", max_length=50)
assert slug("Ärger im Büro") == 'aerger_im_buero'
assert slug("Über den Wolken") == 'ueber_den_wolken'
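Conceptually, a pre-configured slugifier binds keyword arguments once and reuses them. functools.partial gives the same effect for any function. The sketch uses a hypothetical ASCII-only `slugify_sketch` in place of disarm's slugify, so it does not implement lang.

```python
import functools
import re
import unicodedata


def slugify_sketch(text: str, separator: str = "-", max_length: int = 0) -> str:
    """Hypothetical stand-in for slugify (no language profiles)."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", separator, text.lower()).strip(separator)
    if max_length:
        slug = slug[:max_length].strip(separator)
    return slug


# Bind the configuration once, then call with just the text.
slug = functools.partial(slugify_sketch, separator="_", max_length=50)

print(slug("Hello World"))  # hello_world
```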
UniqueSlugifier
Track previously generated slugs and append numeric suffixes for uniqueness:
from disarm import UniqueSlugifier
unique = UniqueSlugifier()
assert unique("My Post") == 'my-post'
assert unique("My Post") == 'my-post-1'
assert unique("My Post") == 'my-post-2'
unique.reset() # clear history
assert unique("My Post") == 'my-post'
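The suffixing behaviour above can be modelled with a set of issued slugs and a counter. This sketch assumes suffixes start at 1 and are joined with "-", and wraps any slug function; `UniqueSketch` is hypothetical and is not disarm's class.

```python
from typing import Callable, Set


class UniqueSketch:
    """Hypothetical model of a uniqueness-tracking slugifier."""

    def __init__(self, slug_fn: Callable[[str], str]) -> None:
        self._slug_fn = slug_fn
        self._seen: Set[str] = set()

    def __call__(self, text: str) -> str:
        base = self._slug_fn(text)
        candidate, n = base, 0
        # Try base, base-1, base-2, ... until an unused slug is found.
        while candidate in self._seen:
            n += 1
            candidate = f"{base}-{n}"
        self._seen.add(candidate)
        return candidate

    def reset(self) -> None:
        """Forget every slug issued so far."""
        self._seen.clear()


unique = UniqueSketch(lambda s: s.lower().replace(" ", "-"))
print(unique("My Post"), unique("My Post"))  # my-post my-post-1
```

An in-memory set only knows about slugs issued by this one instance; slugs stored by other processes or earlier runs need an external check.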
External uniqueness check
Pass a check callback that returns True when a candidate slug is already taken, for example in a database:
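from disarm import UniqueSlugifier
taken = {"my-post"}  # stand-in for a database table of existing slugs
def check_db(slug: str) -> bool:
    """Return True if slug already exists."""
    return slug in taken  # e.g. db.slugs.exists(slug) in a real application
unique = UniqueSlugifier(check=check_db)
unique("My Post")  # queries check_db before returning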
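For a concrete, runnable version of such a callback, the sketch below uses an in-memory SQLite table. The `slugs` table, `slug_taken`, and `first_free_slug` are illustrative assumptions; the loop in `first_free_slug` is a hypothetical stand-in for whatever UniqueSlugifier does internally with the callback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slugs (slug TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO slugs (slug) VALUES (?)", [("my-post",), ("my-post-1",)])


def slug_taken(slug: str) -> bool:
    """Return True if slug already exists in the table."""
    row = conn.execute("SELECT 1 FROM slugs WHERE slug = ?", (slug,)).fetchone()
    return row is not None


def first_free_slug(base: str, check=slug_taken) -> str:
    # Hypothetical suffixing loop: base, base-1, base-2, ...
    candidate, n = base, 0
    while check(candidate):
        n += 1
        candidate = f"{base}-{n}"
    return candidate


print(first_free_slug("my-post"))  # my-post-2
```

A check-then-insert sequence like this can race when several writers run concurrently; a uniqueness constraint on the column (here the PRIMARY KEY) lets the database reject a duplicate that slips through.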
Full pipeline
The slugification pipeline executes in this order:
- Apply replacements
- Decode HTML entities (if entities=True)
- Decode decimal references (if decimal=True)
- Decode hexadecimal references (if hexadecimal=True)
- Transliterate (using lang if set), or keep Unicode (if allow_unicode=True)
- Lowercase (if lowercase=True)
- Apply regex_pattern
- Replace non-alphanumeric characters with separator
- Collapse consecutive separators
- Remove stopwords
- Truncate to max_length (respecting word_boundary and save_order)
- Strip leading/trailing separators
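The step order above can be re-created with the standard library to see how the stages interact. This is a simplified sketch, not disarm's implementation: `pipeline_sketch` is hypothetical, html.unescape decodes all three reference kinds together, transliteration is plain NFKD/ASCII folding with no language profiles, and save_order is not modelled (word-boundary truncation simply stops at the first word that does not fit).

```python
import html
import re
import unicodedata


def pipeline_sketch(text, *, separator="-", lowercase=True, max_length=0,
                    word_boundary=False, stopwords=(), regex_pattern=None,
                    replacements=(), allow_unicode=False):
    """Hypothetical re-creation of the documented step order."""
    # 1. Replacements run on the raw input.
    for old, new in replacements:
        text = text.replace(old, new)
    # 2-4. Decode named, decimal and hexadecimal references in one call.
    text = html.unescape(text)
    # 5. Fold to ASCII, or keep Unicode letters when allow_unicode is set.
    if allow_unicode:
        text = unicodedata.normalize("NFKC", text)
    else:
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # 6. Lowercase.
    if lowercase:
        text = text.lower()
    # 7. Drop whatever the custom pattern matches.
    if regex_pattern:
        text = re.sub(regex_pattern, "", text)
    # 8-9. Each run of non-word characters (or underscores) becomes a single
    # separator, which replaces and collapses in one pass.
    text = re.sub(r"[\W_]+", separator, text)
    # 10. Remove stopwords; empty pieces from edge separators go too.
    words = [w for w in text.split(separator) if w and w not in stopwords]
    text = separator.join(words)
    # 11. Truncate, optionally on a word boundary.
    if max_length and len(text) > max_length:
        if word_boundary:
            kept, length = [], 0
            for w in words:
                cost = len(w) + (len(separator) if kept else 0)
                if length + cost > max_length:
                    break
                kept.append(w)
                length += cost
            text = separator.join(kept) or text[:max_length]
        else:
            text = text[:max_length]
    # 12. Strip leading/trailing separators.
    return text.strip(separator)


print(pipeline_sketch("C++ Programming", replacements=[("C++", "cpp")]))  # cpp-programming
```

Running the stages in this order explains several examples earlier on the page: replacements see "C++" before punctuation is removed, and stopwords are matched against already-lowercased words.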