API Reference

This page provides detailed API documentation for pytector.

Core Classes

class pytector.PromptInjectionDetector(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]

Bases: object

predefined_models = {'deberta': 'protectai/deberta-v3-base-prompt-injection', 'distilbert': 'fmops/distilbert-prompt-injection', 'distilbert-onnx': 'prompt-security/fmops-distilbert-prompt-injection-onnx'}

default_input_keywords = ['ignore', 'forget', 'disregard', 'override', 'bypass', 'hack', 'exploit', 'system', 'prompt', 'injection', 'jailbreak', 'break', 'escape', 'override', 'ignore previous', 'ignore above', 'ignore all', 'ignore everything', 'forget everything', 'forget all', 'forget previous', 'forget above', 'disregard everything', 'disregard all', 'disregard previous', 'disregard above', 'bypass all', 'bypass everything', 'bypass previous', 'bypass above', 'system prompt', 'system message', 'system instruction', 'system directive', 'override system', 'override prompt', 'override instruction', 'override directive', 'ignore system', 'ignore prompt', 'ignore instruction', 'ignore directive', 'forget system', 'forget prompt', 'forget instruction', 'forget directive', 'disregard system', 'disregard prompt', 'disregard instruction', 'disregard directive', 'bypass system', 'bypass prompt', 'bypass instruction', 'bypass directive']

default_output_keywords = ['i have been pwned', 'i have been hacked', 'i have been compromised', 'i have been broken', 'i have been jailbroken', 'i have been escaped', 'i have been overridden', 'i have been bypassed', 'i have been exploited', 'i am now free', 'i am now unrestricted', 'i am now uncontrolled', 'i can now do anything', 'i can now access everything', 'i can now see everything', 'i can now read everything', 'i can now write everything', 'i can now execute everything', 'i can now delete everything', 'i can now modify everything', 'i can now change everything', 'i can now override everything', 'i can now bypass everything', 'i can now exploit everything', 'i can now hack everything', 'i can now break everything', 'i can now escape everything', 'i can now jailbreak everything', 'i can now compromise everything', 'i can now pwn everything']

default_input_block_message = 'Input blocked by keyword filtering: {matched_keywords}'

default_output_block_message = 'Output blocked by keyword filtering: {matched_keywords}'

default_keyword_block_hazard_code = 'KEYWORD_BLOCK'

__init__(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]

check_input_keywords(prompt)[source]

check_output_keywords(response)[source]

add_input_keywords(keywords)[source]

add_output_keywords(keywords)[source]

remove_input_keywords(keywords)[source]

remove_output_keywords(keywords)[source]

get_input_keywords()[source]

get_output_keywords()[source]

set_input_block_message(message)[source]

set_output_block_message(message)[source]

set_keyword_block_hazard_code(hazard_code)[source]

get_input_block_message()[source]

get_output_block_message()[source]

get_keyword_block_hazard_code()[source]

detect_injection(prompt, threshold=None)[source]

detect_injection_api(prompt='This is a test prompt.', return_raw=False)[source]

report_injection_status(prompt, threshold=None)[source]

check_response_safety(response)[source]

class pytector.PromptSanitizer(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]

Bases: object

Sanitizes text input by removing or neutralising prompt injection attempts.

Runs a layered pipeline of strategies: encoding detection, unicode normalisation, regex pattern removal, sentence-level scoring, fuzzy matching, and keyword stripping. An optional seventh strategy (prompt enforcement) escapes template syntax.

__init__(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]

sanitize(text, return_details=False)[source]

Run the sanitisation pipeline on text.

Returns (cleaned_text, was_modified) by default. When return_details is True, returns (cleaned_text, was_modified, changes) where changes is a list of dicts describing each modification.

report_sanitization(text)[source]: Print a human-readable sanitisation report (mirrors PromptInjectionDetector.report_injection_status).

add_keywords(keywords)[source]

remove_keywords(keywords)[source]

get_keywords()[source]

class pytector.PIIScanner(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]

Bases: object

Detect and optionally redact PII entities in text.

Parameters:

model_name (str) – A key in :pyattr:`predefined_models` or any Hugging Face model ID / local path suitable for token-classification.
threshold (float) – Minimum confidence score for an entity to be reported.
entity_types (list[str] | None) – If provided, only entities whose type is in this list are returned. None means all entity types are returned.

predefined_models: Dict[str, str] = {'pasteproof-v3': 'joneauxedgar/pasteproof-pii-detector-v2'}

SUPPORTED_ENTITY_TYPES: Tuple[str, ...] = ('CREDIT_CARD', 'PCI_PAN', 'PCI_TRACK', 'PCI_EXPIRY', 'API_KEY', 'AWS_KEY', 'PRIVATE_KEY', 'PASSWORD', 'HIPAA_MRN', 'HIPAA_ACCOUNT', 'HIPAA_DOB', 'GDPR_PASSPORT', 'GDPR_NIN', 'GDPR_IBAN', 'NAME', 'FIRST_NAME', 'LAST_NAME', 'SSN', 'DOB', 'DRIVER_LICENSE', 'EMAIL', 'PHONE', 'IP_ADDRESS', 'STREET', 'CITY', 'STATE', 'ZIPCODE')

__init__(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]

Parameters:

model_name (str)
threshold (float)
entity_types (List[str] | None)

Return type:

None

scan(text, threshold=None)[source]

Scan text for PII entities.

Returns (has_pii, entities) where each entity dict contains text, type, score, start, and end.

Return type:

Tuple[bool, List[Dict[str, Any]]]

Parameters:

text (str)
threshold (float | None)

redact(text, threshold=None, replacement='[REDACTED]')[source]

Return a copy of text with detected PII replaced by replacement.

Entities are replaced right-to-left so character offsets stay valid.

Return type:

str

Parameters:

text (str)
threshold (float | None)
replacement (str)

report(text, threshold=None)[source]

Print a human-readable PII scan summary.

Return type:

None

Parameters:

text (str)
threshold (float | None)

get_entity_types()[source]

Return the tuple of entity types supported by the default model.

Return type:: Tuple[str, ...]

class pytector.ToxicityDetector(model_name='citizenlab', threshold=0.5)[source]

Bases: object

Classify text as toxic or non-toxic.

Parameters:

model_name (str) – A key in :pyattr:`predefined_models` or any Hugging Face model ID / local path suitable for text-classification.
threshold (float) – Score above which text is considered toxic.

predefined_models: Dict[str, str] = {'citizenlab': 'citizenlab/distilbert-base-multilingual-cased-toxicity'}

__init__(model_name='citizenlab', threshold=0.5)[source]

Parameters:

model_name (str)
threshold (float)

Return type:

None

detect(text, threshold=None)[source]

Detect whether text is toxic.

Returns (is_toxic, score) mirroring the PromptInjectionDetector.detect_injection return signature.

Return type:

Tuple[bool, float]

Parameters:

text (str)
threshold (float | None)

report(text, threshold=None)[source]

Print a human-readable toxicity summary.

Return type:

None

Parameters:

text (str)
threshold (float | None)

static _extract_toxic_score(results)[source]

Normalise pipeline output into a single toxicity probability.

The citizenlab model returns [{"label": "toxic"|"non-toxic", "score": float}]. Other models may use LABEL_1 / LABEL_0 conventions.

Return type:: float
Parameters:: results (List[Dict[str, Any]])

class pytector.RegexScanner(patterns=None, use_defaults=True)[source]

Bases: object

Scan text for sensitive data using compiled regular expressions.

Parameters:

patterns (dict[str, str] | None) – Mapping of {PATTERN_NAME: regex_string}. Merged with the built-in defaults when use_defaults is True, or used alone when False.
use_defaults (bool) – Whether to include the built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN).

__init__(patterns=None, use_defaults=True)[source]

Parameters:

patterns (Dict[str, str] | None)
use_defaults (bool)

Return type:

None

scan(text)[source]

Scan text against all active patterns.

Returns (has_matches, matches) where each match dict contains pattern_name, match, start, and end.

Return type:: Tuple[bool, List[Dict[str, Any]]]
Parameters:: text (str)

redact(text, replacement='[REDACTED]')[source]

Return a copy of text with all matches replaced by replacement.

Non-overlapping matches are replaced right-to-left so offsets stay valid.

Return type:

str

Parameters:

text (str)
replacement (str)

report(text)[source]

Print a human-readable scan summary.

Return type:: None
Parameters:: text (str)

add_pattern(name, pattern)[source]

Add or overwrite a pattern at runtime.

Return type:

None

Parameters:

name (str)
pattern (str)

remove_pattern(name)[source]

Remove a pattern by name. No-op if not present.

Return type:: None
Parameters:: name (str)

get_patterns()[source]

Return a copy of the active pattern dictionary.

Return type:: Dict[str, str]

static _merge_overlapping(matches)[source]

Merge overlapping spans so redaction doesn’t double-replace.

Return type:: List[Dict[str, Any]]
Parameters:: matches (List[Dict[str, Any]])

class pytector.CanaryToken(token=None, length=16, prefix='CANARY-')[source]

Bases: object

Generate, embed, and detect canary tokens in LLM interactions.

Parameters:

token (str | None) – Explicit canary string. If None a random token is generated.
length (int) – Length of the auto-generated token (ignored when token is given).
prefix (str) – Prefix prepended to auto-generated tokens for easy grep-ability.

__init__(token=None, length=16, prefix='CANARY-')[source]

Parameters:

token (str | None)
length (int)
prefix (str)

Return type:

None

property token: str: The canary string.

wrap(system_prompt)[source]

Return system_prompt with the canary instruction appended.

The instruction tells the model to never repeat the canary.

Return type:: str
Parameters:: system_prompt (str)

check(model_output)[source]

Check whether the model leaked the canary.

Returns (leaked, token) where leaked is True when the canary appears in model_output and token is the matched string (or None if clean).

Return type:: Tuple[bool, Optional[str]]
Parameters:: model_output (str)

report(model_output)[source]

Print a human-readable leak check summary.

Return type:: None
Parameters:: model_output (str)

PromptInjectionDetector

class pytector.detector.PromptInjectionDetector(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]

Bases: object

predefined_models = {'deberta': 'protectai/deberta-v3-base-prompt-injection', 'distilbert': 'fmops/distilbert-prompt-injection', 'distilbert-onnx': 'prompt-security/fmops-distilbert-prompt-injection-onnx'}

default_input_keywords = ['ignore', 'forget', 'disregard', 'override', 'bypass', 'hack', 'exploit', 'system', 'prompt', 'injection', 'jailbreak', 'break', 'escape', 'override', 'ignore previous', 'ignore above', 'ignore all', 'ignore everything', 'forget everything', 'forget all', 'forget previous', 'forget above', 'disregard everything', 'disregard all', 'disregard previous', 'disregard above', 'bypass all', 'bypass everything', 'bypass previous', 'bypass above', 'system prompt', 'system message', 'system instruction', 'system directive', 'override system', 'override prompt', 'override instruction', 'override directive', 'ignore system', 'ignore prompt', 'ignore instruction', 'ignore directive', 'forget system', 'forget prompt', 'forget instruction', 'forget directive', 'disregard system', 'disregard prompt', 'disregard instruction', 'disregard directive', 'bypass system', 'bypass prompt', 'bypass instruction', 'bypass directive']

default_output_keywords = ['i have been pwned', 'i have been hacked', 'i have been compromised', 'i have been broken', 'i have been jailbroken', 'i have been escaped', 'i have been overridden', 'i have been bypassed', 'i have been exploited', 'i am now free', 'i am now unrestricted', 'i am now uncontrolled', 'i can now do anything', 'i can now access everything', 'i can now see everything', 'i can now read everything', 'i can now write everything', 'i can now execute everything', 'i can now delete everything', 'i can now modify everything', 'i can now change everything', 'i can now override everything', 'i can now bypass everything', 'i can now exploit everything', 'i can now hack everything', 'i can now break everything', 'i can now escape everything', 'i can now jailbreak everything', 'i can now compromise everything', 'i can now pwn everything']

default_input_block_message = 'Input blocked by keyword filtering: {matched_keywords}'

default_output_block_message = 'Output blocked by keyword filtering: {matched_keywords}'

default_keyword_block_hazard_code = 'KEYWORD_BLOCK'

__init__(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]

check_input_keywords(prompt)[source]

check_output_keywords(response)[source]

add_input_keywords(keywords)[source]

add_output_keywords(keywords)[source]

remove_input_keywords(keywords)[source]

remove_output_keywords(keywords)[source]

get_input_keywords()[source]

get_output_keywords()[source]

set_input_block_message(message)[source]

set_output_block_message(message)[source]

set_keyword_block_hazard_code(hazard_code)[source]

get_input_block_message()[source]

get_output_block_message()[source]

get_keyword_block_hazard_code()[source]

detect_injection(prompt, threshold=None)[source]

detect_injection_api(prompt='This is a test prompt.', return_raw=False)[source]

report_injection_status(prompt, threshold=None)[source]

check_response_safety(response)[source]

LangChain Integration

exception pytector.langchain.PromptInjectionBlockedError[source]

Bases: ValueError

Raised when a prompt is blocked by the guard.

class pytector.langchain.PytectorGuard(*args, **kwargs)[source]

Bases: RunnableSerializable[str, str]

LangChain Runnable that blocks unsafe prompts before downstream steps run.

For safe inputs the original string is passed through unchanged.

Parameters:

args (Any)
name (str | None)
model_name_or_url (str)
threshold (float)
use_groq (bool)
api_key (str | None)
groq_model (str)
fallback_message (str | None)
block_on_api_error (bool)
detector_kwargs (dict[str, Any])

model_name_or_url: str

threshold: float

use_groq: bool

api_key: str | None

groq_model: str

fallback_message: str | None

block_on_api_error: bool

detector_kwargs: dict[str, Any]

invoke(input, config=None, **kwargs)[source]

Transform a single input into an output.

Parameters:

input (str) – The input to the Runnable.
config (Optional[RunnableConfig]) –
A config to use when invoking the Runnable.

The config supports standard keys like ‘tags’, ‘metadata’ for tracing purposes, ‘max_concurrency’ for controlling how much work to do in parallel, and other keys.

Please refer to RunnableConfig for more details.
kwargs (Any)

Return type:

str

Returns:

The output of the Runnable.

model_config = {'extra': 'ignore', 'protected_namespaces': ()}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self (BaseModel) – The BaseModel instance.
context (Any) – The context.

Return type:

None

PromptSanitizer

Input sanitization for prompt injection defense.

class pytector.sanitizer.PromptSanitizer(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]

Bases: object

Sanitizes text input by removing or neutralising prompt injection attempts.

Runs a layered pipeline of strategies: encoding detection, unicode normalisation, regex pattern removal, sentence-level scoring, fuzzy matching, and keyword stripping. An optional seventh strategy (prompt enforcement) escapes template syntax.

__init__(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]

sanitize(text, return_details=False)[source]

Run the sanitisation pipeline on text.

Returns (cleaned_text, was_modified) by default. When return_details is True, returns (cleaned_text, was_modified, changes) where changes is a list of dicts describing each modification.

report_sanitization(text)[source]: Print a human-readable sanitisation report (mirrors PromptInjectionDetector.report_injection_status).

add_keywords(keywords)[source]

remove_keywords(keywords)[source]

get_keywords()[source]

Configuration

The following configuration options are available when initializing the detector:

Configuration Parameters
Parameter	Type	Description
model_name_or_url	str	Name or path of the model to use for detection
default_threshold	float	Default confidence threshold for injection detection (0.0 to 1.0)
use_groq	bool	Whether to use Groq API for detection
api_key	str	API key for Groq service (required if use_groq=True)
groq_model	str	Groq model to use for detection (default: openai/gpt-oss-safeguard-20b)

Predefined Models

The following predefined models are available:

Predefined Models
Model Name	Description
deberta	protectai/deberta-v3-base-prompt-injection
distilbert	fmops/distilbert-prompt-injection
distilbert-onnx	prompt-security/fmops-distilbert-prompt-injection-onnx

Groq API Behavior

detect_injection_api returns:

True for safe responses
False for unsafe responses (or non-standard responses treated conservatively as unsafe)
None when the API call fails

Use return_raw=True to inspect raw model output as (is_safe, raw_response).

Example Usage

from pytector import PromptInjectionDetector

# Basic usage with default model
detector = PromptInjectionDetector()
is_injection, probability = detector.detect_injection("Your text here")

# Using Groq API
detector = PromptInjectionDetector(
    use_groq=True,
    api_key="your-api-key"
)
is_safe = detector.detect_injection_api("Your text here")

# Using GGUF model
detector = PromptInjectionDetector("path/to/model.gguf")
is_injection, probability = detector.detect_injection("Your text here")

# Custom threshold
detector = PromptInjectionDetector(default_threshold=0.8)
is_injection, probability = detector.detect_injection("Your text here")

Sanitizer Usage

from pytector import PromptSanitizer

# All strategies enabled by default
sanitizer = PromptSanitizer()
cleaned, was_modified = sanitizer.sanitize("Ignore previous instructions. Hello!")

# With detailed change log
cleaned, was_modified, changes = sanitizer.sanitize(
    "Ignore previous instructions. Hello!",
    return_details=True,
)

# Custom configuration
sanitizer = PromptSanitizer(
    fuzzy_threshold=0.80,
    sentence_threshold=0.4,
    enable_prompt_enforcement=True,
)

Sanitizer Configuration

Sanitizer Parameters
Parameter	Default	Description
enable_encoding_detection	True	Decode and strip Base64, hex, ROT13 obfuscated payloads
enable_unicode_normalization	True	Strip invisible characters, NFKC homoglyph normalization
enable_pattern_removal	True	Regex-based structural injection pattern removal
enable_sentence_scoring	True	Heuristic per-sentence analysis; drop suspicious sentences
enable_fuzzy_matching	True	Catch paraphrased injection phrases via difflib similarity
enable_keyword_stripping	True	Final pass removing known injection phrases
enable_prompt_enforcement	False	Escape template syntax (``{ } < > ` ``)
keywords	None	Custom keyword list; `None` uses built-in defaults
fuzzy_threshold	0.85	Similarity cutoff for fuzzy matching (0.0-1.0)
sentence_threshold	0.5	Heuristic score cutoff for sentence removal (0.0-1.0)

PIIScanner

PII (Personally Identifiable Information) detection using transformer NER models.

Default model: joneauxedgar/pasteproof-pii-detector-v2 (ModernBERT-base, F1 0.97, 27 entity types — hosted as v3 weights on HuggingFace). Requires transformers >= 4.48.0 for ModernBERT support.

Citation

@model{pasteproof_pii_detector,
  author = {Jonathan Edgar},
  title  = {PasteProof PII Detector},
  year   = {2025},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/joneauxedgar/pasteproof-pii-detector-v2}
}

class pytector.pii.PIIScanner(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]

Bases: object

Detect and optionally redact PII entities in text.

Parameters:

model_name (str) – A key in :pyattr:`predefined_models` or any Hugging Face model ID / local path suitable for token-classification.
threshold (float) – Minimum confidence score for an entity to be reported.
entity_types (list[str] | None) – If provided, only entities whose type is in this list are returned. None means all entity types are returned.

predefined_models: Dict[str, str] = {'pasteproof-v3': 'joneauxedgar/pasteproof-pii-detector-v2'}

SUPPORTED_ENTITY_TYPES: Tuple[str, ...] = ('CREDIT_CARD', 'PCI_PAN', 'PCI_TRACK', 'PCI_EXPIRY', 'API_KEY', 'AWS_KEY', 'PRIVATE_KEY', 'PASSWORD', 'HIPAA_MRN', 'HIPAA_ACCOUNT', 'HIPAA_DOB', 'GDPR_PASSPORT', 'GDPR_NIN', 'GDPR_IBAN', 'NAME', 'FIRST_NAME', 'LAST_NAME', 'SSN', 'DOB', 'DRIVER_LICENSE', 'EMAIL', 'PHONE', 'IP_ADDRESS', 'STREET', 'CITY', 'STATE', 'ZIPCODE')

__init__(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]

Parameters:

model_name (str)
threshold (float)
entity_types (List[str] | None)

Return type:

None

scan(text, threshold=None)[source]

Scan text for PII entities.

Returns (has_pii, entities) where each entity dict contains text, type, score, start, and end.

Return type:

Tuple[bool, List[Dict[str, Any]]]

Parameters:

text (str)
threshold (float | None)

redact(text, threshold=None, replacement='[REDACTED]')[source]

Return a copy of text with detected PII replaced by replacement.

Entities are replaced right-to-left so character offsets stay valid.

Return type:

str

Parameters:

text (str)
threshold (float | None)
replacement (str)

report(text, threshold=None)[source]

Print a human-readable PII scan summary.

Return type:

None

Parameters:

text (str)
threshold (float | None)

get_entity_types()[source]

Return the tuple of entity types supported by the default model.

Return type:: Tuple[str, ...]

Uses the PasteProof PII Detector (ModernBERT-base, F1 0.97) for NER-based PII detection across 27 entity types. Requires transformers >= 4.48.0 for ModernBERT support.

from pytector import PIIScanner

scanner = PIIScanner()
has_pii, entities = scanner.scan("Email john@acme.com, SSN 123-45-6789")
print(scanner.redact("Email john@acme.com, SSN 123-45-6789"))

# Filter to specific entity types
scanner = PIIScanner(entity_types=["EMAIL", "CREDIT_CARD"], threshold=0.7)

PIIScanner Parameters
Parameter	Type	Description
model_name	str	Predefined key (`pasteproof-v3`) or HuggingFace model ID / local path
threshold	float	Minimum confidence for an entity to be reported (default 0.5)
entity_types	list[str] \| None	Filter to specific types (e.g. `["EMAIL", "SSN"]`); `None` = all

Citation

@model{pasteproof_pii_detector,
  author    = {Jonathan Edgar},
  title     = {PasteProof PII Detector},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/joneauxedgar/pasteproof-pii-detector-v2}
}

ToxicityDetector

Toxicity detection using transformer sequence-classification models.

Default model: citizenlab/distilbert-base-multilingual-cased-toxicity (DistilBERT multilingual, F1-micro 0.94, 10 languages).

class pytector.toxicity.ToxicityDetector(model_name='citizenlab', threshold=0.5)[source]

Bases: object

Classify text as toxic or non-toxic.

Parameters:

model_name (str) – A key in :pyattr:`predefined_models` or any Hugging Face model ID / local path suitable for text-classification.
threshold (float) – Score above which text is considered toxic.

predefined_models: Dict[str, str] = {'citizenlab': 'citizenlab/distilbert-base-multilingual-cased-toxicity'}

__init__(model_name='citizenlab', threshold=0.5)[source]

Parameters:

model_name (str)
threshold (float)

Return type:

None

detect(text, threshold=None)[source]

Detect whether text is toxic.

Returns (is_toxic, score) mirroring the PromptInjectionDetector.detect_injection return signature.

Return type:

Tuple[bool, float]

Parameters:

text (str)
threshold (float | None)

report(text, threshold=None)[source]

Print a human-readable toxicity summary.

Return type:

None

Parameters:

text (str)
threshold (float | None)

static _extract_toxic_score(results)[source]

Normalise pipeline output into a single toxicity probability.

The citizenlab model returns [{"label": "toxic"|"non-toxic", "score": float}]. Other models may use LABEL_1 / LABEL_0 conventions.

Return type:: float
Parameters:: results (List[Dict[str, Any]])

Uses citizenlab/distilbert-base-multilingual-cased-toxicity (F1-micro 0.94, 10 languages) for toxicity classification.

from pytector import ToxicityDetector

detector = ToxicityDetector()
is_toxic, score = detector.detect("You are terrible")
detector.report("Have a wonderful day!")

ToxicityDetector Parameters
Parameter	Type	Description
model_name	str	Predefined key (`citizenlab`) or HuggingFace model ID / local path
threshold	float	Score above which text is considered toxic (default 0.5)

RegexScanner

Rule-based PII and credential detection using customisable regex patterns.

This module is pure Python stdlib — no model downloads, no heavy dependencies. It ships with sensible defaults for common PII types and lets users add, remove, or completely replace patterns at construction time or at runtime.

class pytector.regex_scanner.RegexScanner(patterns=None, use_defaults=True)[source]

Bases: object

Scan text for sensitive data using compiled regular expressions.

Parameters:

patterns (dict[str, str] | None) – Mapping of {PATTERN_NAME: regex_string}. Merged with the built-in defaults when use_defaults is True, or used alone when False.
use_defaults (bool) – Whether to include the built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN).

__init__(patterns=None, use_defaults=True)[source]

Parameters:

patterns (Dict[str, str] | None)
use_defaults (bool)

Return type:

None

scan(text)[source]

Scan text against all active patterns.

Returns (has_matches, matches) where each match dict contains pattern_name, match, start, and end.

Return type:: Tuple[bool, List[Dict[str, Any]]]
Parameters:: text (str)

redact(text, replacement='[REDACTED]')[source]

Return a copy of text with all matches replaced by replacement.

Non-overlapping matches are replaced right-to-left so offsets stay valid.

Return type:

str

Parameters:

text (str)
replacement (str)

report(text)[source]

Print a human-readable scan summary.

Return type:: None
Parameters:: text (str)

add_pattern(name, pattern)[source]

Add or overwrite a pattern at runtime.

Return type:

None

Parameters:

name (str)
pattern (str)

remove_pattern(name)[source]

Remove a pattern by name. No-op if not present.

Return type:: None
Parameters:: name (str)

get_patterns()[source]

Return a copy of the active pattern dictionary.

Return type:: Dict[str, str]

static _merge_overlapping(matches)[source]

Merge overlapping spans so redaction doesn’t double-replace.

Return type:: List[Dict[str, Any]]
Parameters:: matches (List[Dict[str, Any]])

Pure-stdlib rule-based scanner with customizable patterns.

from pytector import RegexScanner

scanner = RegexScanner()
has_match, matches = scanner.scan("Key: sk-live-abc123def456")
print(scanner.redact("Email user@example.com"))

# Custom patterns only
custom = RegexScanner(
    patterns={"ORDER_ID": r"ORD-\d{8}"},
    use_defaults=False,
)

RegexScanner Parameters
Parameter	Type	Description
patterns	dict[str, str] \| None	`{NAME: regex}` mapping merged with defaults (or used alone)
use_defaults	bool	Whether to include built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN)

CanaryToken

Canary token generation and leak detection.

Inject a unique token into your system prompt. If the model’s output contains the canary, the system prompt was leaked — regardless of how clever the injection was.

Pure Python stdlib. Zero dependencies, zero calibration.

class pytector.canary.CanaryToken(token=None, length=16, prefix='CANARY-')[source]

Bases: object

Generate, embed, and detect canary tokens in LLM interactions.

Parameters:

token (str | None) – Explicit canary string. If None a random token is generated.
length (int) – Length of the auto-generated token (ignored when token is given).
prefix (str) – Prefix prepended to auto-generated tokens for easy grep-ability.

__init__(token=None, length=16, prefix='CANARY-')[source]

Parameters:

token (str | None)
length (int)
prefix (str)

Return type:

None

property token: str: The canary string.

wrap(system_prompt)[source]

Return system_prompt with the canary instruction appended.

The instruction tells the model to never repeat the canary.

Return type:: str
Parameters:: system_prompt (str)

check(model_output)[source]

Check whether the model leaked the canary.

Returns (leaked, token) where leaked is True when the canary appears in model_output and token is the matched string (or None if clean).

Return type:: Tuple[bool, Optional[str]]
Parameters:: model_output (str)

report(model_output)[source]

Print a human-readable leak check summary.

Return type:: None
Parameters:: model_output (str)

Inject a secret token into your system prompt and detect if the model leaks it. Pure stdlib — zero dependencies, zero calibration.

from pytector import CanaryToken

canary = CanaryToken()
system_prompt = canary.wrap("You are a helpful assistant.")
# ... pass to LLM, get response ...
leaked, token = canary.check(model_output)

CanaryToken Parameters
Parameter	Type	Description
token	str \| None	Explicit canary string; `None` auto-generates one
length	int	Length of the random part of auto-generated tokens (default 16)
prefix	str	Prefix for auto-generated tokens (default `CANARY-`)