Text normalization and tokenization tools.
Install with pip:

```bash
pip install dataknobs-xization
```
```python
from dataknobs_xization import normalize, MaskingTokenizer
from dataknobs_xization import annotations

# Text normalization
normalized = normalize.normalize_text("Hello, World!")

# Tokenization with masking
tokenizer = MaskingTokenizer()
tokens = tokenizer.tokenize("This is a sample text.")

# Working with annotations
doc = annotations.create_document("Sample text", {"metadata": "value"})
```
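The exact behavior of `MaskingTokenizer` is not documented above. As a rough, standalone illustration of the masking idea only (not the library's implementation), the sketch below splits text into tokens and replaces tokens matching a pattern, such as raw numbers, with a placeholder mask; the function name, pattern, and mask token are all assumptions for the example.

```python
import re

def mask_tokenize(text, mask_pattern=r"\d+", mask_token="<NUM>"):
    """Illustrative sketch (not the dataknobs_xization API): split text
    into word/punctuation tokens and replace any token that fully
    matches mask_pattern with mask_token."""
    # \w+ grabs word/number runs; [^\w\s] grabs single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [mask_token if re.fullmatch(mask_pattern, t) else t
            for t in tokens]

print(mask_tokenize("Order 66 shipped on day 3."))
# ['Order', '<NUM>', 'shipped', 'on', 'day', '<NUM>', '.']
```

A real masking tokenizer would typically let you configure which token classes (numbers, emails, names) get masked; consult the package's API for the supported options.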
This package depends on:

- dataknobs-common
- dataknobs-structures
- dataknobs-utils
See the LICENSE file in the repository root.