API Reference

This page provides detailed API documentation for pytector.

Core Classes

class pytector.PromptInjectionDetector(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]

Bases: object

predefined_models = {'deberta': 'protectai/deberta-v3-base-prompt-injection', 'distilbert': 'fmops/distilbert-prompt-injection', 'distilbert-onnx': 'prompt-security/fmops-distilbert-prompt-injection-onnx'}
default_input_keywords = ['ignore', 'forget', 'disregard', 'override', 'bypass', 'hack', 'exploit', 'system', 'prompt', 'injection', 'jailbreak', 'break', 'escape', 'override', 'ignore previous', 'ignore above', 'ignore all', 'ignore everything', 'forget everything', 'forget all', 'forget previous', 'forget above', 'disregard everything', 'disregard all', 'disregard previous', 'disregard above', 'bypass all', 'bypass everything', 'bypass previous', 'bypass above', 'system prompt', 'system message', 'system instruction', 'system directive', 'override system', 'override prompt', 'override instruction', 'override directive', 'ignore system', 'ignore prompt', 'ignore instruction', 'ignore directive', 'forget system', 'forget prompt', 'forget instruction', 'forget directive', 'disregard system', 'disregard prompt', 'disregard instruction', 'disregard directive', 'bypass system', 'bypass prompt', 'bypass instruction', 'bypass directive']
default_output_keywords = ['i have been pwned', 'i have been hacked', 'i have been compromised', 'i have been broken', 'i have been jailbroken', 'i have been escaped', 'i have been overridden', 'i have been bypassed', 'i have been exploited', 'i am now free', 'i am now unrestricted', 'i am now uncontrolled', 'i can now do anything', 'i can now access everything', 'i can now see everything', 'i can now read everything', 'i can now write everything', 'i can now execute everything', 'i can now delete everything', 'i can now modify everything', 'i can now change everything', 'i can now override everything', 'i can now bypass everything', 'i can now exploit everything', 'i can now hack everything', 'i can now break everything', 'i can now escape everything', 'i can now jailbreak everything', 'i can now compromise everything', 'i can now pwn everything']
default_input_block_message = 'Input blocked by keyword filtering: {matched_keywords}'
default_output_block_message = 'Output blocked by keyword filtering: {matched_keywords}'
default_keyword_block_hazard_code = 'KEYWORD_BLOCK'
__init__(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]
check_input_keywords(prompt)[source]
check_output_keywords(response)[source]
add_input_keywords(keywords)[source]
add_output_keywords(keywords)[source]
remove_input_keywords(keywords)[source]
remove_output_keywords(keywords)[source]
get_input_keywords()[source]
get_output_keywords()[source]
set_input_block_message(message)[source]
set_output_block_message(message)[source]
set_keyword_block_hazard_code(hazard_code)[source]
get_input_block_message()[source]
get_output_block_message()[source]
get_keyword_block_hazard_code()[source]
detect_injection(prompt, threshold=None)[source]
detect_injection_api(prompt='This is a test prompt.', return_raw=False)[source]
report_injection_status(prompt, threshold=None)[source]
check_response_safety(response)[source]
class pytector.PromptSanitizer(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]

Bases: object

Sanitizes text input by removing or neutralising prompt injection attempts.

Runs a layered pipeline of strategies: encoding detection, unicode normalisation, regex pattern removal, sentence-level scoring, fuzzy matching, and keyword stripping. An optional seventh strategy (prompt enforcement) escapes template syntax.

__init__(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]
sanitize(text, return_details=False)[source]

Run the sanitisation pipeline on text.

Returns (cleaned_text, was_modified) by default. When return_details is True, returns (cleaned_text, was_modified, changes) where changes is a list of dicts describing each modification.

report_sanitization(text)[source]

Print a human-readable sanitisation report (mirrors PromptInjectionDetector.report_injection_status).

add_keywords(keywords)[source]
remove_keywords(keywords)[source]
get_keywords()[source]
class pytector.PIIScanner(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]

Bases: object

Detect and optionally redact PII entities in text.

Parameters:
  • model_name (str) – A key in predefined_models, or any Hugging Face model ID or local path suitable for token classification.

  • threshold (float) – Minimum confidence score for an entity to be reported.

  • entity_types (list[str] | None) – If provided, only entities whose type is in this list are returned. None means all entity types are returned.

predefined_models: Dict[str, str] = {'pasteproof-v3': 'joneauxedgar/pasteproof-pii-detector-v2'}
SUPPORTED_ENTITY_TYPES: Tuple[str, ...] = ('CREDIT_CARD', 'PCI_PAN', 'PCI_TRACK', 'PCI_EXPIRY', 'API_KEY', 'AWS_KEY', 'PRIVATE_KEY', 'PASSWORD', 'HIPAA_MRN', 'HIPAA_ACCOUNT', 'HIPAA_DOB', 'GDPR_PASSPORT', 'GDPR_NIN', 'GDPR_IBAN', 'NAME', 'FIRST_NAME', 'LAST_NAME', 'SSN', 'DOB', 'DRIVER_LICENSE', 'EMAIL', 'PHONE', 'IP_ADDRESS', 'STREET', 'CITY', 'STATE', 'ZIPCODE')
__init__(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]
Parameters:
  • model_name (str)

  • threshold (float)

  • entity_types (list[str] | None)

Return type:

None

scan(text, threshold=None)[source]

Scan text for PII entities.

Returns (has_pii, entities) where each entity dict contains text, type, score, start, and end.

Return type:

Tuple[bool, List[Dict[str, Any]]]

Parameters:
  • text (str)

  • threshold (float | None)

redact(text, threshold=None, replacement='[REDACTED]')[source]

Return a copy of text with detected PII replaced by replacement.

Entities are replaced right-to-left so character offsets stay valid.

Return type:

str

Parameters:
  • text (str)

  • threshold (float | None)

  • replacement (str)

report(text, threshold=None)[source]

Print a human-readable PII scan summary.

Return type:

None

Parameters:
  • text (str)

  • threshold (float | None)

get_entity_types()[source]

Return the tuple of entity types supported by the default model.

Return type:

Tuple[str, ...]

class pytector.ToxicityDetector(model_name='citizenlab', threshold=0.5)[source]

Bases: object

Classify text as toxic or non-toxic.

Parameters:
  • model_name (str) – A key in predefined_models, or any Hugging Face model ID or local path suitable for text classification.

  • threshold (float) – Score above which text is considered toxic.

predefined_models: Dict[str, str] = {'citizenlab': 'citizenlab/distilbert-base-multilingual-cased-toxicity'}
__init__(model_name='citizenlab', threshold=0.5)[source]
Parameters:
  • model_name (str)

  • threshold (float)

Return type:

None

detect(text, threshold=None)[source]

Detect whether text is toxic.

Returns (is_toxic, score) mirroring the PromptInjectionDetector.detect_injection return signature.

Return type:

Tuple[bool, float]

Parameters:
  • text (str)

  • threshold (float | None)

report(text, threshold=None)[source]

Print a human-readable toxicity summary.

Return type:

None

Parameters:
  • text (str)

  • threshold (float | None)

static _extract_toxic_score(results)[source]

Normalise pipeline output into a single toxicity probability.

The citizenlab model returns [{"label": "toxic"|"non-toxic", "score": float}]. Other models may use LABEL_1 / LABEL_0 conventions.

Return type:

float

Parameters:

results (List[Dict[str, Any]])

class pytector.RegexScanner(patterns=None, use_defaults=True)[source]

Bases: object

Scan text for sensitive data using compiled regular expressions.

Parameters:
  • patterns (dict[str, str] | None) – Mapping of {PATTERN_NAME: regex_string}. Merged with the built-in defaults when use_defaults is True, or used alone when False.

  • use_defaults (bool) – Whether to include the built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN).

__init__(patterns=None, use_defaults=True)[source]
Parameters:
  • patterns (dict[str, str] | None)

  • use_defaults (bool)

Return type:

None

scan(text)[source]

Scan text against all active patterns.

Returns (has_matches, matches) where each match dict contains pattern_name, match, start, and end.

Return type:

Tuple[bool, List[Dict[str, Any]]]

Parameters:

text (str)

redact(text, replacement='[REDACTED]')[source]

Return a copy of text with all matches replaced by replacement.

Non-overlapping matches are replaced right-to-left so offsets stay valid.

Return type:

str

Parameters:
  • text (str)

  • replacement (str)

report(text)[source]

Print a human-readable scan summary.

Return type:

None

Parameters:

text (str)

add_pattern(name, pattern)[source]

Add or overwrite a pattern at runtime.

Return type:

None

Parameters:
  • name (str)

  • pattern (str)

remove_pattern(name)[source]

Remove a pattern by name. No-op if not present.

Return type:

None

Parameters:

name (str)

get_patterns()[source]

Return a copy of the active pattern dictionary.

Return type:

Dict[str, str]

static _merge_overlapping(matches)[source]

Merge overlapping spans so redaction doesn’t double-replace.

Return type:

List[Dict[str, Any]]

Parameters:

matches (List[Dict[str, Any]])

class pytector.CanaryToken(token=None, length=16, prefix='CANARY-')[source]

Bases: object

Generate, embed, and detect canary tokens in LLM interactions.

Parameters:
  • token (str | None) – Explicit canary string. If None, a random token is generated.

  • length (int) – Length of the auto-generated token (ignored when token is given).

  • prefix (str) – Prefix prepended to auto-generated tokens for easy grep-ability.

__init__(token=None, length=16, prefix='CANARY-')[source]
Parameters:
  • token (str | None)

  • length (int)

  • prefix (str)

Return type:

None

property token: str

The canary string.

wrap(system_prompt)[source]

Return system_prompt with the canary instruction appended.

The instruction tells the model to never repeat the canary.

Return type:

str

Parameters:

system_prompt (str)

check(model_output)[source]

Check whether the model leaked the canary.

Returns (leaked, token) where leaked is True when the canary appears in model_output and token is the matched string (or None if clean).

Return type:

Tuple[bool, Optional[str]]

Parameters:

model_output (str)

report(model_output)[source]

Print a human-readable leak check summary.

Return type:

None

Parameters:

model_output (str)

PromptInjectionDetector

class pytector.detector.PromptInjectionDetector(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]

Bases: object

predefined_models = {'deberta': 'protectai/deberta-v3-base-prompt-injection', 'distilbert': 'fmops/distilbert-prompt-injection', 'distilbert-onnx': 'prompt-security/fmops-distilbert-prompt-injection-onnx'}
default_input_keywords = ['ignore', 'forget', 'disregard', 'override', 'bypass', 'hack', 'exploit', 'system', 'prompt', 'injection', 'jailbreak', 'break', 'escape', 'override', 'ignore previous', 'ignore above', 'ignore all', 'ignore everything', 'forget everything', 'forget all', 'forget previous', 'forget above', 'disregard everything', 'disregard all', 'disregard previous', 'disregard above', 'bypass all', 'bypass everything', 'bypass previous', 'bypass above', 'system prompt', 'system message', 'system instruction', 'system directive', 'override system', 'override prompt', 'override instruction', 'override directive', 'ignore system', 'ignore prompt', 'ignore instruction', 'ignore directive', 'forget system', 'forget prompt', 'forget instruction', 'forget directive', 'disregard system', 'disregard prompt', 'disregard instruction', 'disregard directive', 'bypass system', 'bypass prompt', 'bypass instruction', 'bypass directive']
default_output_keywords = ['i have been pwned', 'i have been hacked', 'i have been compromised', 'i have been broken', 'i have been jailbroken', 'i have been escaped', 'i have been overridden', 'i have been bypassed', 'i have been exploited', 'i am now free', 'i am now unrestricted', 'i am now uncontrolled', 'i can now do anything', 'i can now access everything', 'i can now see everything', 'i can now read everything', 'i can now write everything', 'i can now execute everything', 'i can now delete everything', 'i can now modify everything', 'i can now change everything', 'i can now override everything', 'i can now bypass everything', 'i can now exploit everything', 'i can now hack everything', 'i can now break everything', 'i can now escape everything', 'i can now jailbreak everything', 'i can now compromise everything', 'i can now pwn everything']
default_input_block_message = 'Input blocked by keyword filtering: {matched_keywords}'
default_output_block_message = 'Output blocked by keyword filtering: {matched_keywords}'
default_keyword_block_hazard_code = 'KEYWORD_BLOCK'
__init__(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]
check_input_keywords(prompt)[source]
check_output_keywords(response)[source]
add_input_keywords(keywords)[source]
add_output_keywords(keywords)[source]
remove_input_keywords(keywords)[source]
remove_output_keywords(keywords)[source]
get_input_keywords()[source]
get_output_keywords()[source]
set_input_block_message(message)[source]
set_output_block_message(message)[source]
set_keyword_block_hazard_code(hazard_code)[source]
get_input_block_message()[source]
get_output_block_message()[source]
get_keyword_block_hazard_code()[source]
detect_injection(prompt, threshold=None)[source]
detect_injection_api(prompt='This is a test prompt.', return_raw=False)[source]
report_injection_status(prompt, threshold=None)[source]
check_response_safety(response)[source]
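
The keyword-blocking attributes above (the keyword lists, case_sensitive, and the '{matched_keywords}' message template) suggest a simple match-and-format flow. The following is a minimal sketch of that idea, assuming plain substring matching and a comma-joined keyword list. The helper names match_keywords and check_keywords are hypothetical, and the return shapes of the real check_input_keywords and check_output_keywords methods are not documented on this page.

```python
from typing import List, Tuple

# Message template copied from default_input_block_message above.
BLOCK_MESSAGE = "Input blocked by keyword filtering: {matched_keywords}"


def match_keywords(text: str, keywords: List[str], case_sensitive: bool = False) -> List[str]:
    """Return the keywords that occur in text as substrings (hypothetical helper)."""
    haystack = text if case_sensitive else text.lower()
    matched: List[str] = []
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        # Skip duplicates so each keyword is reported once.
        if needle in haystack and keyword not in matched:
            matched.append(keyword)
    return matched


def check_keywords(text: str, keywords: List[str], case_sensitive: bool = False) -> Tuple[bool, str]:
    """Return (blocked, message); message is empty when nothing matched.

    Joining matches with ", " is an assumption made for this sketch.
    """
    matched = match_keywords(text, keywords, case_sensitive)
    if not matched:
        return False, ""
    return True, BLOCK_MESSAGE.format(matched_keywords=", ".join(matched))


blocked, message = check_keywords("Please IGNORE previous rules", ["ignore", "bypass"])
print(blocked, message)
```

With case_sensitive=True the same prompt would pass, because "IGNORE" and "ignore" no longer compare equal.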

LangChain Integration

exception pytector.langchain.PromptInjectionBlockedError[source]

Bases: ValueError

Raised when a prompt is blocked by the guard.

class pytector.langchain.PytectorGuard(*args, **kwargs)[source]

Bases: RunnableSerializable[str, str]

LangChain Runnable that blocks unsafe prompts before downstream steps run.

For safe inputs the original string is passed through unchanged.

Parameters:
  • args (Any)

  • name (str | None)

  • model_name_or_url (str)

  • threshold (float)

  • use_groq (bool)

  • api_key (str | None)

  • groq_model (str)

  • fallback_message (str | None)

  • block_on_api_error (bool)

  • detector_kwargs (dict[str, Any])

model_name_or_url: str
threshold: float
use_groq: bool
api_key: str | None
groq_model: str
fallback_message: str | None
block_on_api_error: bool
detector_kwargs: dict[str, Any]
invoke(input, config=None, **kwargs)[source]

Transform a single input into an output.

Parameters:
  • input (str) – The input to the Runnable.

  • config (Optional[RunnableConfig]) –

    A config to use when invoking the Runnable.

    The config supports standard keys like ‘tags’, ‘metadata’ for tracing purposes, ‘max_concurrency’ for controlling how much work to do in parallel, and other keys.

    Please refer to RunnableConfig for more details.

  • kwargs (Any)

Return type:

str

Returns:

The output of the Runnable.

model_config = {'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self (BaseModel) – The BaseModel instance.

  • context (Any) – The context.

Return type:

None

PromptSanitizer

Input sanitization for prompt injection defense.

class pytector.sanitizer.PromptSanitizer(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]

Bases: object

Sanitizes text input by removing or neutralising prompt injection attempts.

Runs a layered pipeline of strategies: encoding detection, unicode normalisation, regex pattern removal, sentence-level scoring, fuzzy matching, and keyword stripping. An optional seventh strategy (prompt enforcement) escapes template syntax.

__init__(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]
sanitize(text, return_details=False)[source]

Run the sanitisation pipeline on text.

Returns (cleaned_text, was_modified) by default. When return_details is True, returns (cleaned_text, was_modified, changes) where changes is a list of dicts describing each modification.

report_sanitization(text)[source]

Print a human-readable sanitisation report (mirrors PromptInjectionDetector.report_injection_status).

add_keywords(keywords)[source]
remove_keywords(keywords)[source]
get_keywords()[source]
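
To make one stage of the pipeline concrete, here is a minimal sketch of unicode normalisation using only the standard library. It is not the library's implementation: the INVISIBLE character set and the normalize_text helper are assumptions for illustration. It returns the same (cleaned_text, was_modified) shape that sanitize documents.

```python
import unicodedata

# Zero-width and formatting characters that can hide text from filters.
# This set is an illustrative assumption, not pytector's actual list.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u00ad"}


def normalize_text(text: str):
    """Strip invisible characters, then apply NFKC normalisation.

    Returns (cleaned_text, was_modified).
    """
    stripped = "".join(ch for ch in text if ch not in INVISIBLE)
    # NFKC folds compatibility forms (for example fullwidth letters) to ASCII.
    cleaned = unicodedata.normalize("NFKC", stripped)
    return cleaned, cleaned != text


# A zero-width space inside "ignore" plus fullwidth "inst".
cleaned, was_modified = normalize_text("ig\u200bnore \uff49\uff4e\uff53\uff54ructions")
print(cleaned, was_modified)
```

Normalising before the pattern, fuzzy, and keyword stages means those stages see "ignore instructions" rather than the obfuscated form.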

Configuration

The following configuration options are available when initializing the detector:

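Configuration Parameters

=================  =====  =======================================================================
Parameter          Type   Description
=================  =====  =======================================================================
model_name_or_url  str    Name or path of the model to use for detection
default_threshold  float  Default confidence threshold for injection detection (0.0 to 1.0)
use_groq           bool   Whether to use the Groq API for detection
api_key            str    API key for the Groq service (required if use_groq=True)
groq_model         str    Groq model to use for detection (default: openai/gpt-oss-safeguard-20b)
=================  =====  =======================================================================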

Predefined Models

The following predefined models are available:

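Predefined Models

===============  ======================================================
Model Name       Hugging Face model ID
===============  ======================================================
deberta          protectai/deberta-v3-base-prompt-injection
distilbert       fmops/distilbert-prompt-injection
distilbert-onnx  prompt-security/fmops-distilbert-prompt-injection-onnx
===============  ======================================================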

Groq API Behavior

detect_injection_api returns:

  • True for safe responses

  • False for unsafe responses (or non-standard responses treated conservatively as unsafe)

  • None when the API call fails

Use return_raw=True to inspect raw model output as (is_safe, raw_response).
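
Because the result has three states, a plain truthiness test such as "if not is_safe" treats an API failure (None) the same as an unsafe verdict (False). The sketch below shows one way a caller might separate them. The should_block helper is hypothetical, and the block_on_api_error flag borrows its name from the PytectorGuard field, whose exact semantics are not described on this page.

```python
from typing import Optional


def should_block(is_safe: Optional[bool], block_on_api_error: bool = True) -> bool:
    """Map the tri-state result (True / False / None) to a block decision."""
    if is_safe is None:
        # The API call failed: the caller decides whether to fail closed or open.
        return block_on_api_error
    # True means safe, so block only on False.
    return not is_safe


for result in (True, False, None):
    print(result, "->", "block" if should_block(result) else "allow")
```

Failing closed (blocking on None) is the conservative default here; passing block_on_api_error=False lets requests through when the API is unavailable.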

Example Usage

from pytector import PromptInjectionDetector

# Basic usage with default model
detector = PromptInjectionDetector()
is_injection, probability = detector.detect_injection("Your text here")

# Using Groq API
detector = PromptInjectionDetector(
    use_groq=True,
    api_key="your-api-key"
)
is_safe = detector.detect_injection_api("Your text here")

# Using GGUF model
detector = PromptInjectionDetector("path/to/model.gguf")
is_injection, probability = detector.detect_injection("Your text here")

# Custom threshold
detector = PromptInjectionDetector(default_threshold=0.8)
is_injection, probability = detector.detect_injection("Your text here")

Sanitizer Usage

from pytector import PromptSanitizer

# All strategies enabled by default
sanitizer = PromptSanitizer()
cleaned, was_modified = sanitizer.sanitize("Ignore previous instructions. Hello!")

# With detailed change log
cleaned, was_modified, changes = sanitizer.sanitize(
    "Ignore previous instructions. Hello!",
    return_details=True,
)

# Custom configuration
sanitizer = PromptSanitizer(
    fuzzy_threshold=0.80,
    sentence_threshold=0.4,
    enable_prompt_enforcement=True,
)

Sanitizer Configuration

Sanitizer Parameters

============================  =======  ============================================================
Parameter                     Default  Description
============================  =======  ============================================================
enable_encoding_detection     True     Decode and strip Base64, hex, and ROT13 obfuscated payloads
enable_unicode_normalization  True     Strip invisible characters; NFKC homoglyph normalization
enable_pattern_removal        True     Regex-based structural injection pattern removal
enable_sentence_scoring       True     Heuristic per-sentence analysis; drop suspicious sentences
enable_fuzzy_matching         True     Catch paraphrased injection phrases via difflib similarity
enable_keyword_stripping      True     Final pass removing known injection phrases
enable_prompt_enforcement     False    Escape template syntax characters ({ } < > `)
keywords                      None     Custom keyword list; None uses built-in defaults
fuzzy_threshold               0.85     Similarity cutoff for fuzzy matching (0.0-1.0)
sentence_threshold            0.5      Heuristic score cutoff for sentence removal (0.0-1.0)
============================  =======  ============================================================
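
The fuzzy_threshold parameter is easier to reason about with a concrete similarity score. The following sketch uses difflib.SequenceMatcher from the standard library. The phrase list, the helper names, and the choice of ">=" for the cutoff are assumptions for illustration, not the library's internals.

```python
from difflib import SequenceMatcher

# Illustrative phrase list; the library's built-in phrases may differ.
KNOWN_PHRASES = ["ignore previous instructions", "disregard all instructions"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0.0, 1.0] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_fuzzy_match(candidate: str, threshold: float = 0.85) -> bool:
    """True when candidate is at least `threshold` similar to a known phrase."""
    return any(similarity(candidate, phrase) >= threshold for phrase in KNOWN_PHRASES)


print(is_fuzzy_match("ignore previus instructions"))  # misspelled variant
print(is_fuzzy_match("what is the weather today"))    # unrelated text
```

Lowering the threshold (as in the fuzzy_threshold=0.80 example earlier) catches looser paraphrases at the cost of more false positives.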

PIIScanner

PII (Personally Identifiable Information) detection using transformer NER models.

Default model: joneauxedgar/pasteproof-pii-detector-v2 (ModernBERT-base, F1 0.97, 27 entity types — hosted as v3 weights on HuggingFace). Requires transformers >= 4.48.0 for ModernBERT support.

Citation

@model{pasteproof_pii_detector,
  author = {Jonathan Edgar},
  title  = {PasteProof PII Detector},
  year   = {2025},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/joneauxedgar/pasteproof-pii-detector-v2}
}
class pytector.pii.PIIScanner(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]

Bases: object

Detect and optionally redact PII entities in text.

Parameters:
  • model_name (str) – A key in predefined_models, or any Hugging Face model ID or local path suitable for token classification.

  • threshold (float) – Minimum confidence score for an entity to be reported.

  • entity_types (list[str] | None) – If provided, only entities whose type is in this list are returned. None means all entity types are returned.

predefined_models: Dict[str, str] = {'pasteproof-v3': 'joneauxedgar/pasteproof-pii-detector-v2'}
SUPPORTED_ENTITY_TYPES: Tuple[str, ...] = ('CREDIT_CARD', 'PCI_PAN', 'PCI_TRACK', 'PCI_EXPIRY', 'API_KEY', 'AWS_KEY', 'PRIVATE_KEY', 'PASSWORD', 'HIPAA_MRN', 'HIPAA_ACCOUNT', 'HIPAA_DOB', 'GDPR_PASSPORT', 'GDPR_NIN', 'GDPR_IBAN', 'NAME', 'FIRST_NAME', 'LAST_NAME', 'SSN', 'DOB', 'DRIVER_LICENSE', 'EMAIL', 'PHONE', 'IP_ADDRESS', 'STREET', 'CITY', 'STATE', 'ZIPCODE')
__init__(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]
Parameters:
  • model_name (str)

  • threshold (float)

  • entity_types (list[str] | None)

Return type:

None

scan(text, threshold=None)[source]

Scan text for PII entities.

Returns (has_pii, entities) where each entity dict contains text, type, score, start, and end.

Return type:

Tuple[bool, List[Dict[str, Any]]]

Parameters:
  • text (str)

  • threshold (float | None)

redact(text, threshold=None, replacement='[REDACTED]')[source]

Return a copy of text with detected PII replaced by replacement.

Entities are replaced right-to-left so character offsets stay valid.

Return type:

str

Parameters:
  • text (str)

  • threshold (float | None)

  • replacement (str)

report(text, threshold=None)[source]

Print a human-readable PII scan summary.

Return type:

None

Parameters:
  • text (str)

  • threshold (float | None)

get_entity_types()[source]

Return the tuple of entity types supported by the default model.

Return type:

Tuple[str, ...]

Uses the PasteProof PII Detector (ModernBERT-base, F1 0.97) for NER-based PII detection across 27 entity types. Requires transformers >= 4.48.0 for ModernBERT support.

from pytector import PIIScanner

scanner = PIIScanner()
has_pii, entities = scanner.scan("Email john@acme.com, SSN 123-45-6789")
print(scanner.redact("Email john@acme.com, SSN 123-45-6789"))

# Filter to specific entity types
scanner = PIIScanner(entity_types=["EMAIL", "CREDIT_CARD"], threshold=0.7)
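
PIIScanner Parameters

============  ================  ====================================================================
Parameter     Type              Description
============  ================  ====================================================================
model_name    str               Predefined key (pasteproof-v3) or HuggingFace model ID / local path
threshold     float             Minimum confidence for an entity to be reported (default 0.5)
entity_types  list[str] | None  Filter to specific types (e.g. ["EMAIL", "SSN"]); None = all
============  ================  ====================================================================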
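
The threshold and entity_types filters, and the right-to-left replacement used by redact, can be illustrated without loading a model. The sketch below works on hand-written entity dicts in the documented shape (text, type, score, start, end). The scores are made up, the helper names are hypothetical, and it assumes end offsets are exclusive and the threshold is inclusive.

```python
from typing import Any, Dict, List, Optional


def make_entity(text: str, value: str, type_: str, score: float) -> Dict[str, Any]:
    """Build an entity dict by locating value in text (stand-in for model output)."""
    start = text.index(value)
    return {"text": value, "type": type_, "score": score,
            "start": start, "end": start + len(value)}


def filter_entities(entities: List[Dict[str, Any]], threshold: float = 0.5,
                    entity_types: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Keep entities at or above threshold whose type is allowed."""
    return [e for e in entities
            if e["score"] >= threshold
            and (entity_types is None or e["type"] in entity_types)]


def redact_spans(text: str, entities: List[Dict[str, Any]],
                 replacement: str = "[REDACTED]") -> str:
    """Replace spans from the highest start offset down, so earlier offsets stay valid."""
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:entity["start"]] + replacement + text[entity["end"]:]
    return text


text = "Email john@acme.com, SSN 123-45-6789"
entities = [
    make_entity(text, "john@acme.com", "EMAIL", 0.99),
    make_entity(text, "123-45-6789", "SSN", 0.95),
]
print(redact_spans(text, entities))
print(redact_spans(text, filter_entities(entities, entity_types=["EMAIL"])))
```

Replacing left-to-right would shift every later offset by the length difference between the original span and the replacement, which is why the order matters.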

ToxicityDetector

Toxicity detection using transformer sequence-classification models.

Default model: citizenlab/distilbert-base-multilingual-cased-toxicity (DistilBERT multilingual, F1-micro 0.94, 10 languages).

class pytector.toxicity.ToxicityDetector(model_name='citizenlab', threshold=0.5)[source]

Bases: object

Classify text as toxic or non-toxic.

Parameters:
  • model_name (str) – A key in predefined_models, or any Hugging Face model ID or local path suitable for text classification.

  • threshold (float) – Score above which text is considered toxic.

predefined_models: Dict[str, str] = {'citizenlab': 'citizenlab/distilbert-base-multilingual-cased-toxicity'}
__init__(model_name='citizenlab', threshold=0.5)[source]
Parameters:
  • model_name (str)

  • threshold (float)

Return type:

None

detect(text, threshold=None)[source]

Detect whether text is toxic.

Returns (is_toxic, score) mirroring the PromptInjectionDetector.detect_injection return signature.

Return type:

Tuple[bool, float]

Parameters:
  • text (str)

  • threshold (float | None)

report(text, threshold=None)[source]

Print a human-readable toxicity summary.

Return type:

None

Parameters:
  • text (str)

  • threshold (float | None)

static _extract_toxic_score(results)[source]

Normalise pipeline output into a single toxicity probability.

The citizenlab model returns [{"label": "toxic"|"non-toxic", "score": float}]. Other models may use LABEL_1 / LABEL_0 conventions.

Return type:

float

Parameters:

results (List[Dict[str, Any]])

Uses citizenlab/distilbert-base-multilingual-cased-toxicity (F1-micro 0.94, 10 languages) for toxicity classification.

from pytector import ToxicityDetector

detector = ToxicityDetector()
is_toxic, score = detector.detect("You are terrible")
detector.report("Have a wonderful day!")
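
ToxicityDetector Parameters

==========  =====  ================================================================
Parameter   Type   Description
==========  =====  ================================================================
model_name  str    Predefined key (citizenlab) or HuggingFace model ID / local path
threshold   float  Score above which text is considered toxic (default 0.5)
==========  =====  ================================================================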
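
The score normalisation described for _extract_toxic_score can be sketched as follows. This is a hypothetical reimplementation, not the library's code. It assumes that LABEL_1 means toxic and LABEL_0 means non-toxic, and that only the top result of a binary classifier is inspected.

```python
from typing import Any, Dict, List, Tuple

# Label conventions; mapping LABEL_1 to "toxic" is an assumption.
TOXIC_LABELS = {"toxic", "label_1"}
NON_TOXIC_LABELS = {"non-toxic", "label_0"}


def extract_toxic_score(results: List[Dict[str, Any]]) -> float:
    """Collapse a text-classification result into a toxicity probability."""
    top = results[0]
    label = str(top["label"]).lower()
    score = float(top["score"])
    if label in TOXIC_LABELS:
        return score
    if label in NON_TOXIC_LABELS:
        # Binary classifier: the toxic probability is the complement.
        return 1.0 - score
    raise ValueError(f"Unrecognised label: {top['label']!r}")


def is_toxic(results: List[Dict[str, Any]], threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (is_toxic, score); toxic only when the score is above the threshold."""
    score = extract_toxic_score(results)
    return score > threshold, score


print(is_toxic([{"label": "non-toxic", "score": 0.75}]))
print(is_toxic([{"label": "toxic", "score": 0.9}]))
```

Normalising to a single probability lets the same threshold comparison work regardless of which label the model happened to rank first.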

RegexScanner

Rule-based PII and credential detection using customisable regex patterns.

This module is pure Python stdlib — no model downloads, no heavy dependencies. It ships with sensible defaults for common PII types and lets users add, remove, or completely replace patterns at construction time or at runtime.

class pytector.regex_scanner.RegexScanner(patterns=None, use_defaults=True)[source]

Bases: object

Scan text for sensitive data using compiled regular expressions.

Parameters:
  • patterns (dict[str, str] | None) – Mapping of {PATTERN_NAME: regex_string}. Merged with the built-in defaults when use_defaults is True, or used alone when False.

  • use_defaults (bool) – Whether to include the built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN).

__init__(patterns=None, use_defaults=True)[source]
Parameters:
  • patterns (dict[str, str] | None)

  • use_defaults (bool)

Return type:

None

scan(text)[source]

Scan text against all active patterns.

Returns (has_matches, matches) where each match dict contains pattern_name, match, start, and end.

Return type:

Tuple[bool, List[Dict[str, Any]]]

Parameters:

text (str)

redact(text, replacement='[REDACTED]')[source]

Return a copy of text with all matches replaced by replacement.

Non-overlapping matches are replaced right-to-left so offsets stay valid.

Return type:

str

Parameters:
  • text (str)

  • replacement (str)

report(text)[source]

Print a human-readable scan summary.

Return type:

None

Parameters:

text (str)

add_pattern(name, pattern)[source]

Add or overwrite a pattern at runtime.

Return type:

None

Parameters:
  • name (str)

  • pattern (str)

remove_pattern(name)[source]

Remove a pattern by name. No-op if not present.

Return type:

None

Parameters:

name (str)

get_patterns()[source]

Return a copy of the active pattern dictionary.

Return type:

Dict[str, str]

static _merge_overlapping(matches)[source]

Merge overlapping spans so redaction doesn’t double-replace.

Return type:

List[Dict[str, Any]]

Parameters:

matches (List[Dict[str, Any]])

Pure-stdlib rule-based scanner with customizable patterns.

from pytector import RegexScanner

scanner = RegexScanner()
has_match, matches = scanner.scan("Key: sk-live-abc123def456")
print(scanner.redact("Email user@example.com"))

# Custom patterns only
custom = RegexScanner(
    patterns={"ORDER_ID": r"ORD-\d{8}"},
    use_defaults=False,
)
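
RegexScanner Parameters

============  =====================  =====================================================================================================
Parameter     Type                   Description
============  =====================  =====================================================================================================
patterns      dict[str, str] | None  {NAME: regex} mapping merged with defaults (or used alone)
use_defaults  bool                   Whether to include built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN)
============  =====================  =====================================================================================================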
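
When several patterns are active, two of them can match overlapping text, which is what the span-merging step guards against before redaction. The sketch below shows the idea with the standard re module. The two patterns are simplified stand-ins rather than the library's built-in defaults, the function names are hypothetical, and the merge step returns plain (start, end) tuples instead of the match dicts the real method returns.

```python
import re
from typing import Any, Dict, List, Tuple

# Simplified illustrative patterns, not the library's built-in defaults.
PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "DOMAIN": r"\b[\w-]+\.com\b",
}


def scan(text: str) -> List[Dict[str, Any]]:
    """Collect every match of every pattern in the documented dict shape."""
    matches = []
    for name, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            matches.append({"pattern_name": name, "match": m.group(),
                            "start": m.start(), "end": m.end()})
    return matches


def merge_overlapping(matches: List[Dict[str, Any]]) -> List[Tuple[int, int]]:
    """Merge overlapping spans into sorted, disjoint (start, end) pairs."""
    merged: List[List[int]] = []
    for m in sorted(matches, key=lambda m: (m["start"], m["end"])):
        if merged and m["start"] < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], m["end"])
        else:
            merged.append([m["start"], m["end"]])
    return [(start, end) for start, end in merged]


def redact(text: str, replacement: str = "[REDACTED]") -> str:
    """Replace merged spans right-to-left so offsets stay valid."""
    for start, end in reversed(merge_overlapping(scan(text))):
        text = text[:start] + replacement + text[end:]
    return text


print(redact("Email user@example.com"))
```

Here the EMAIL and DOMAIN patterns both cover "example.com"; merging them first means the address is replaced once as a single span.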

CanaryToken

Canary token generation and leak detection.

Inject a unique token into your system prompt. If the model’s output contains the canary, the system prompt was leaked — regardless of how clever the injection was.

Pure Python stdlib. Zero dependencies, zero calibration.

class pytector.canary.CanaryToken(token=None, length=16, prefix='CANARY-')[source]

Bases: object

Generate, embed, and detect canary tokens in LLM interactions.

Parameters:
  • token (str | None) – Explicit canary string. If None, a random token is generated.

  • length (int) – Length of the auto-generated token (ignored when token is given).

  • prefix (str) – Prefix prepended to auto-generated tokens for easy grep-ability.

__init__(token=None, length=16, prefix='CANARY-')[source]
Parameters:
  • token (str | None)

  • length (int)

  • prefix (str)

Return type:

None

property token: str

The canary string.

wrap(system_prompt)[source]

Return system_prompt with the canary instruction appended.

The instruction tells the model to never repeat the canary.

Return type:

str

Parameters:

system_prompt (str)

check(model_output)[source]

Check whether the model leaked the canary.

Returns (leaked, token) where leaked is True when the canary appears in model_output and token is the matched string (or None if clean).

Return type:

Tuple[bool, Optional[str]]

Parameters:

model_output (str)

report(model_output)[source]

Print a human-readable leak check summary.

Return type:

None

Parameters:

model_output (str)

Inject a secret token into your system prompt and detect if the model leaks it. Pure stdlib — zero dependencies, zero calibration.

from pytector import CanaryToken

canary = CanaryToken()
system_prompt = canary.wrap("You are a helpful assistant.")
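# ... pass system_prompt to your LLM and capture its reply ...
model_output = "..."  # placeholder: replace with the actual LLM response
leaked, token = canary.check(model_output)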
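
CanaryToken Parameters

=========  ==========  ===============================================================
Parameter  Type        Description
=========  ==========  ===============================================================
token      str | None  Explicit canary string; None auto-generates one
length     int         Length of the random part of auto-generated tokens (default 16)
prefix     str         Prefix for auto-generated tokens (default CANARY-)
=========  ==========  ===============================================================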

prefix

str

Prefix for auto-generated tokens (default CANARY-)