dataknobs

dataknobs-xization

Text normalization, tokenization, and annotation tools.

Installation

pip install dataknobs-xization
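
To confirm that the install succeeded, the installed version can be looked up through the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, not part of dataknobs-xization, and it relies only on `importlib.metadata` (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "dataknobs-xization"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Hypothetical helper: report absence as None instead of raising.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "dataknobs-xization is not installed")
```

A `None` result usually means that `pip` installed into a different environment than the one running the check.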

Features

Usage

from dataknobs_xization import MaskingTokenizer, annotations, normalize

# Text normalization
normalized = normalize.normalize_text("Hello, World!")

# Tokenization with masking
tokenizer = MaskingTokenizer()
tokens = tokenizer.tokenize("This is a sample text.")

# Working with annotations
doc = annotations.create_document("Sample text", {"metadata": "value"})
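
The normalization and tokenization calls above can be chained into a single step. The sketch below shows one way to do that without depending on the package itself. It assumes that the normalizer returns a string and that the tokenizer accepts one, which the snippet above suggests but does not confirm. `normalize_then_tokenize`, `SupportsTokenize`, and `WhitespaceTokenizer` are hypothetical names introduced for illustration, and the stand-ins do not reproduce the library's behavior.

```python
from typing import Any, Callable, Protocol


class SupportsTokenize(Protocol):
    """Anything exposing tokenize(text), the call shape used in the usage snippet."""

    def tokenize(self, text: str) -> Any: ...


def normalize_then_tokenize(
    text: str,
    normalize_fn: Callable[[str], str],
    tokenizer: SupportsTokenize,
) -> Any:
    """Normalize the text first, then hand the result to the tokenizer."""
    return tokenizer.tokenize(normalize_fn(text))


class WhitespaceTokenizer:
    """Stand-in tokenizer for demonstration; not part of dataknobs-xization."""

    def tokenize(self, text: str) -> list[str]:
        # Split on runs of whitespace; punctuation stays attached to words.
        return text.split()


if __name__ == "__main__":
    # str.casefold is a stand-in for normalize.normalize_text.
    print(normalize_then_tokenize("This is a Sample text.", str.casefold, WhitespaceTokenizer()))
```

With the package installed, the same helper would presumably be called as `normalize_then_tokenize(text, normalize.normalize_text, MaskingTokenizer())`, subject to the assumption stated above. Passing the two pieces in as arguments keeps the helper testable without the package present.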

Dependencies

This package depends on:
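
The declared requirements can be read from an installed copy's metadata. The following is a minimal sketch using the standard library's `importlib.metadata.requires`. `declared_dependencies` is a hypothetical helper name, and the result depends entirely on the version installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(dist_name: str = "dataknobs-xization"):
    """Return the requirement strings declared by an installed distribution."""
    try:
        # requires() returns None when the metadata declares no requirements.
        return list(requires(dist_name) or [])
    except PackageNotFoundError:
        # Distribution not installed: nothing to report.
        return []


if __name__ == "__main__":
    for requirement in declared_dependencies():
        print(requirement)
```

An empty result means either that the package is not installed or that it declares no requirements, so it is worth checking the installation first.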

License

See the LICENSE file in the repository root.