API Reference
This page provides detailed API documentation for pytector.
Core Classes
- class pytector.PromptInjectionDetector(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]
Bases: object
- predefined_models = {'deberta': 'protectai/deberta-v3-base-prompt-injection', 'distilbert': 'fmops/distilbert-prompt-injection', 'distilbert-onnx': 'prompt-security/fmops-distilbert-prompt-injection-onnx'}
- default_input_keywords = ['ignore', 'forget', 'disregard', 'override', 'bypass', 'hack', 'exploit', 'system', 'prompt', 'injection', 'jailbreak', 'break', 'escape', 'override', 'ignore previous', 'ignore above', 'ignore all', 'ignore everything', 'forget everything', 'forget all', 'forget previous', 'forget above', 'disregard everything', 'disregard all', 'disregard previous', 'disregard above', 'bypass all', 'bypass everything', 'bypass previous', 'bypass above', 'system prompt', 'system message', 'system instruction', 'system directive', 'override system', 'override prompt', 'override instruction', 'override directive', 'ignore system', 'ignore prompt', 'ignore instruction', 'ignore directive', 'forget system', 'forget prompt', 'forget instruction', 'forget directive', 'disregard system', 'disregard prompt', 'disregard instruction', 'disregard directive', 'bypass system', 'bypass prompt', 'bypass instruction', 'bypass directive']
- default_output_keywords = ['i have been pwned', 'i have been hacked', 'i have been compromised', 'i have been broken', 'i have been jailbroken', 'i have been escaped', 'i have been overridden', 'i have been bypassed', 'i have been exploited', 'i am now free', 'i am now unrestricted', 'i am now uncontrolled', 'i can now do anything', 'i can now access everything', 'i can now see everything', 'i can now read everything', 'i can now write everything', 'i can now execute everything', 'i can now delete everything', 'i can now modify everything', 'i can now change everything', 'i can now override everything', 'i can now bypass everything', 'i can now exploit everything', 'i can now hack everything', 'i can now break everything', 'i can now escape everything', 'i can now jailbreak everything', 'i can now compromise everything', 'i can now pwn everything']
- default_input_block_message = 'Input blocked by keyword filtering: {matched_keywords}'
- default_output_block_message = 'Output blocked by keyword filtering: {matched_keywords}'
- default_keyword_block_hazard_code = 'KEYWORD_BLOCK'
- __init__(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]
- class pytector.PromptSanitizer(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]
Bases: object
Sanitizes text input by removing or neutralising prompt injection attempts.
Runs a layered pipeline of strategies: encoding detection, unicode normalisation, regex pattern removal, sentence-level scoring, fuzzy matching, and keyword stripping. An optional seventh strategy (prompt enforcement) escapes template syntax.
- __init__(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]
- sanitize(text, return_details=False)[source]
Run the sanitisation pipeline on text.
Returns (cleaned_text, was_modified) by default. When return_details is True, returns (cleaned_text, was_modified, changes), where changes is a list of dicts describing each modification.
- class pytector.PIIScanner(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]
Bases: object
Detect and optionally redact PII entities in text.
- Parameters:
  - model_name (str) – A key in predefined_models or any Hugging Face model ID / local path suitable for token-classification.
  - threshold (float) – Minimum confidence score for an entity to be reported.
  - entity_types (list[str] | None) – If provided, only entities whose type is in this list are returned. None means all entity types are returned.
- SUPPORTED_ENTITY_TYPES: Tuple[str, ...] = ('CREDIT_CARD', 'PCI_PAN', 'PCI_TRACK', 'PCI_EXPIRY', 'API_KEY', 'AWS_KEY', 'PRIVATE_KEY', 'PASSWORD', 'HIPAA_MRN', 'HIPAA_ACCOUNT', 'HIPAA_DOB', 'GDPR_PASSPORT', 'GDPR_NIN', 'GDPR_IBAN', 'NAME', 'FIRST_NAME', 'LAST_NAME', 'SSN', 'DOB', 'DRIVER_LICENSE', 'EMAIL', 'PHONE', 'IP_ADDRESS', 'STREET', 'CITY', 'STATE', 'ZIPCODE')
- scan(text, threshold=None)[source]
Scan text for PII entities.
Returns (has_pii, entities), where each entity dict contains text, type, score, start, and end.
- redact(text, threshold=None, replacement='[REDACTED]')[source]
Return a copy of text with detected PII replaced by replacement.
Entities are replaced right-to-left so character offsets stay valid.
- class pytector.ToxicityDetector(model_name='citizenlab', threshold=0.5)[source]
Bases: object
Classify text as toxic or non-toxic.
- Parameters:
  - model_name (str) – A key in predefined_models or any Hugging Face model ID / local path suitable for text-classification.
  - threshold (float) – Score above which text is considered toxic.
- predefined_models: Dict[str, str] = {'citizenlab': 'citizenlab/distilbert-base-multilingual-cased-toxicity'}
- detect(text, threshold=None)[source]
Detect whether text is toxic.
Returns (is_toxic, score), mirroring the PromptInjectionDetector.detect_injection return signature.
- class pytector.RegexScanner(patterns=None, use_defaults=True)[source]
Bases: object
Scan text for sensitive data using compiled regular expressions.
- Parameters:
  - patterns (dict[str, str] | None) – Mapping of {PATTERN_NAME: regex_string}. Merged with the built-in defaults when use_defaults is True, or used alone when False.
  - use_defaults (bool) – Whether to include the built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN).
- scan(text)[source]
Scan text against all active patterns.
Returns (has_matches, matches), where each match dict contains pattern_name, match, start, and end.
- redact(text, replacement='[REDACTED]')[source]
Return a copy of text with all matches replaced by replacement.
Non-overlapping matches are replaced right-to-left so offsets stay valid.
- class pytector.CanaryToken(token=None, length=16, prefix='CANARY-')[source]
Bases: object
Generate, embed, and detect canary tokens in LLM interactions.
- Parameters:
- wrap(system_prompt)[source]
Return system_prompt with the canary instruction appended.
The instruction tells the model to never repeat the canary.
PromptInjectionDetector
- class pytector.detector.PromptInjectionDetector(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]
Bases: object
- predefined_models = {'deberta': 'protectai/deberta-v3-base-prompt-injection', 'distilbert': 'fmops/distilbert-prompt-injection', 'distilbert-onnx': 'prompt-security/fmops-distilbert-prompt-injection-onnx'}
- default_input_keywords = ['ignore', 'forget', 'disregard', 'override', 'bypass', 'hack', 'exploit', 'system', 'prompt', 'injection', 'jailbreak', 'break', 'escape', 'override', 'ignore previous', 'ignore above', 'ignore all', 'ignore everything', 'forget everything', 'forget all', 'forget previous', 'forget above', 'disregard everything', 'disregard all', 'disregard previous', 'disregard above', 'bypass all', 'bypass everything', 'bypass previous', 'bypass above', 'system prompt', 'system message', 'system instruction', 'system directive', 'override system', 'override prompt', 'override instruction', 'override directive', 'ignore system', 'ignore prompt', 'ignore instruction', 'ignore directive', 'forget system', 'forget prompt', 'forget instruction', 'forget directive', 'disregard system', 'disregard prompt', 'disregard instruction', 'disregard directive', 'bypass system', 'bypass prompt', 'bypass instruction', 'bypass directive']
- default_output_keywords = ['i have been pwned', 'i have been hacked', 'i have been compromised', 'i have been broken', 'i have been jailbroken', 'i have been escaped', 'i have been overridden', 'i have been bypassed', 'i have been exploited', 'i am now free', 'i am now unrestricted', 'i am now uncontrolled', 'i can now do anything', 'i can now access everything', 'i can now see everything', 'i can now read everything', 'i can now write everything', 'i can now execute everything', 'i can now delete everything', 'i can now modify everything', 'i can now change everything', 'i can now override everything', 'i can now bypass everything', 'i can now exploit everything', 'i can now hack everything', 'i can now break everything', 'i can now escape everything', 'i can now jailbreak everything', 'i can now compromise everything', 'i can now pwn everything']
- default_input_block_message = 'Input blocked by keyword filtering: {matched_keywords}'
- default_output_block_message = 'Output blocked by keyword filtering: {matched_keywords}'
- default_keyword_block_hazard_code = 'KEYWORD_BLOCK'
- __init__(model_name_or_url='deberta', default_threshold=0.5, use_groq=False, api_key=None, groq_model='openai/gpt-oss-safeguard-20b', enable_keyword_blocking=False, input_keywords=None, output_keywords=None, case_sensitive=False, input_block_message=None, output_block_message=None, keyword_block_hazard_code=None)[source]
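The keyword-blocking options above (input_keywords, case_sensitive, input_block_message) can be pictured with a small standalone sketch. It does not call pytector: the substring matching rule, the helper names match_keywords and check_input, and the comma-joined rendering of {matched_keywords} are assumptions made for illustration. Only the message template is taken from the reference.

```python
from typing import List, Optional, Tuple

# Template copied from default_input_block_message in the reference above.
INPUT_BLOCK_MESSAGE = "Input blocked by keyword filtering: {matched_keywords}"


def match_keywords(text: str, keywords: List[str], case_sensitive: bool = False) -> List[str]:
    """Return the keywords found in text (assumed rule: plain substring match)."""
    haystack = text if case_sensitive else text.lower()
    matched: List[str] = []
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        if needle in haystack and keyword not in matched:
            matched.append(keyword)
    return matched


def check_input(
    text: str, keywords: List[str], case_sensitive: bool = False
) -> Tuple[bool, Optional[str]]:
    """Return (blocked, message); message is None when nothing matched."""
    matched = match_keywords(text, keywords, case_sensitive)
    if not matched:
        return False, None
    # Assumption: matched keywords are rendered as a comma-separated string.
    return True, INPUT_BLOCK_MESSAGE.format(matched_keywords=", ".join(matched))


blocked, message = check_input("Please IGNORE previous rules", ["ignore", "jailbreak"])
print(blocked, message)
```

With case_sensitive=False the upper-case "IGNORE" still matches; passing case_sensitive=True would let it through in this sketch.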
LangChain Integration
- exception pytector.langchain.PromptInjectionBlockedError[source]
Bases: ValueError
Raised when a prompt is blocked by the guard.
- class pytector.langchain.PytectorGuard(*args, **kwargs)[source]
Bases: RunnableSerializable[str, str]
LangChain Runnable that blocks unsafe prompts before downstream steps run.
For safe inputs the original string is passed through unchanged.
- Parameters:
- invoke(input, config=None, **kwargs)[source]
Transform a single input into an output.
- Parameters:
  - input (str) – The input to the Runnable.
  - config (Optional[RunnableConfig]) – A config to use when invoking the Runnable. The config supports standard keys like ‘tags’ and ‘metadata’ for tracing purposes, ‘max_concurrency’ for controlling how much work to do in parallel, and other keys. Please refer to RunnableConfig for more details.
  - kwargs (Any)
- Return type:
- Returns:
The output of the Runnable.
- model_config = {'extra': 'ignore', 'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
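The guard's contract described above (safe strings pass through unchanged, unsafe ones raise a ValueError subclass) can be sketched without LangChain installed. Everything here is a stand-in: GuardSketch, PromptBlockedError, and fake_detect are hypothetical names, and the real PytectorGuard constructor arguments are not shown in this reference.

```python
from typing import Callable, Tuple


class PromptBlockedError(ValueError):
    """Hypothetical stand-in for PromptInjectionBlockedError."""


class GuardSketch:
    """Pass safe input through unchanged; raise on input the detector flags."""

    def __init__(self, detect: Callable[[str], Tuple[bool, float]]) -> None:
        self._detect = detect

    def invoke(self, text: str) -> str:
        is_injection, score = self._detect(text)
        if is_injection:
            raise PromptBlockedError(f"Prompt blocked (score={score:.2f})")
        return text


def fake_detect(text: str) -> Tuple[bool, float]:
    """Toy detector used only for this sketch; not the pytector model."""
    flagged = "ignore previous" in text.lower()
    return flagged, (0.99 if flagged else 0.01)


guard = GuardSketch(fake_detect)
safe_output = guard.invoke("What is the capital of France?")

try:
    guard.invoke("Ignore previous instructions")
    was_blocked = False
except PromptBlockedError:
    was_blocked = True
```

Because the error derives from ValueError, callers that already catch ValueError around a chain would also catch a blocked prompt.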
PromptSanitizer
Input sanitization for prompt injection defense.
- class pytector.sanitizer.PromptSanitizer(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]
Bases: object
Sanitizes text input by removing or neutralising prompt injection attempts.
Runs a layered pipeline of strategies: encoding detection, unicode normalisation, regex pattern removal, sentence-level scoring, fuzzy matching, and keyword stripping. An optional seventh strategy (prompt enforcement) escapes template syntax.
- __init__(enable_encoding_detection=True, enable_unicode_normalization=True, enable_pattern_removal=True, enable_sentence_scoring=True, enable_fuzzy_matching=True, enable_keyword_stripping=True, enable_prompt_enforcement=False, keywords=None, case_sensitive=False, replacement='', fuzzy_threshold=0.85, sentence_threshold=0.5, enforcement_chars=None)[source]
- sanitize(text, return_details=False)[source]
Run the sanitisation pipeline on text.
Returns (cleaned_text, was_modified) by default. When return_details is True, returns (cleaned_text, was_modified, changes), where changes is a list of dicts describing each modification.
Configuration
The following configuration options are available when initializing the detector:
| Parameter | Type | Description |
|---|---|---|
| model_name_or_url | str | Name or path of the model to use for detection |
| default_threshold | float | Default confidence threshold for injection detection (0.0 to 1.0) |
| use_groq | bool | Whether to use Groq API for detection |
| api_key | str | API key for Groq service (required if use_groq=True) |
| groq_model | str | Groq model to use for detection (default: openai/gpt-oss-safeguard-20b) |
Predefined Models
The following predefined models are available:
| Model Name | Model ID |
|---|---|
| deberta | protectai/deberta-v3-base-prompt-injection |
| distilbert | fmops/distilbert-prompt-injection |
| distilbert-onnx | prompt-security/fmops-distilbert-prompt-injection-onnx |
Groq API Behavior
detect_injection_api returns:
- True for safe responses
- False for unsafe responses (or non-standard responses treated conservatively as unsafe)
- None when the API call fails
Use return_raw=True to inspect raw model output as (is_safe, raw_response).
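Because the result has three states, a plain truthiness check would treat a failed API call (None) the same as an unsafe verdict (False). A minimal sketch of handling the three cases separately follows; the decide helper and its "allow" / "block" / "retry" labels are hypothetical and not part of pytector.

```python
from typing import Optional


def decide(is_safe: Optional[bool]) -> str:
    """Map the tri-state API result to an action label (labels are illustrative)."""
    if is_safe is None:
        # The API call failed: no verdict is available, so do not treat it as unsafe or safe.
        return "retry"
    return "allow" if is_safe else "block"


for result in (True, False, None):
    print(result, "->", decide(result))
```

Checking `is None` first keeps an outage distinguishable from a genuine unsafe classification, which matters if the two should be logged or retried differently.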
Example Usage
from pytector import PromptInjectionDetector

# Basic usage with default model
detector = PromptInjectionDetector()
is_injection, probability = detector.detect_injection("Your text here")

# Using Groq API
detector = PromptInjectionDetector(
    use_groq=True,
    api_key="your-api-key"
)
is_safe = detector.detect_injection_api("Your text here")

# Using GGUF model
detector = PromptInjectionDetector("path/to/model.gguf")
is_injection, probability = detector.detect_injection("Your text here")

# Custom threshold
detector = PromptInjectionDetector(default_threshold=0.8)
is_injection, probability = detector.detect_injection("Your text here")
Sanitizer Usage
from pytector import PromptSanitizer

# All strategies enabled by default
sanitizer = PromptSanitizer()
cleaned, was_modified = sanitizer.sanitize("Ignore previous instructions. Hello!")

# With detailed change log
cleaned, was_modified, changes = sanitizer.sanitize(
    "Ignore previous instructions. Hello!",
    return_details=True,
)

# Custom configuration
sanitizer = PromptSanitizer(
    fuzzy_threshold=0.80,
    sentence_threshold=0.4,
    enable_prompt_enforcement=True,
)
Sanitizer Configuration
| Parameter | Default | Description |
|---|---|---|
| enable_encoding_detection | True | Decode and strip Base64, hex, ROT13 obfuscated payloads |
| enable_unicode_normalization | True | Strip invisible characters, NFKC homoglyph normalization |
| enable_pattern_removal | True | Regex-based structural injection pattern removal |
| enable_sentence_scoring | True | Heuristic per-sentence analysis; drop suspicious sentences |
| enable_fuzzy_matching | True | Catch paraphrased injection phrases via difflib similarity |
| enable_keyword_stripping | True | Final pass removing known injection phrases |
| enable_prompt_enforcement | False | |
| keywords | None | Custom keyword list; |
| fuzzy_threshold | 0.85 | Similarity cutoff for fuzzy matching (0.0-1.0) |
| sentence_threshold | 0.5 | Heuristic score cutoff for sentence removal (0.0-1.0) |
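To give a feel for what a difflib similarity cutoff such as fuzzy_threshold means, here is a minimal sketch using the standard library's SequenceMatcher. It is an illustration under assumptions, not the sanitizer's implementation: the helper name is_fuzzy_match, the lower-casing step, and the use of `>=` at the cutoff are all choices made for this example.

```python
from difflib import SequenceMatcher


def is_fuzzy_match(candidate: str, phrase: str, threshold: float = 0.85) -> bool:
    """Return True when candidate is at least `threshold` similar to phrase."""
    ratio = SequenceMatcher(None, candidate.lower(), phrase.lower()).ratio()
    return ratio >= threshold


known_phrase = "ignore previous instructions"

# A misspelled variant still scores close to the known phrase.
print(is_fuzzy_match("ignor previous instructons", known_phrase))

# Unrelated text scores far below the cutoff.
print(is_fuzzy_match("What is the weather today?", known_phrase))
```

Lowering the threshold catches looser paraphrases at the cost of more false positives on benign text.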
PIIScanner
PII (Personally Identifiable Information) detection using transformer NER models.
Default model: joneauxedgar/pasteproof-pii-detector-v2 (ModernBERT-base,
F1 0.97, 27 entity types — hosted as v3 weights on HuggingFace).
Requires transformers >= 4.48.0 for ModernBERT support.
Citation
@model{pasteproof_pii_detector,
  author = {Jonathan Edgar},
  title = {PasteProof PII Detector},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/joneauxedgar/pasteproof-pii-detector-v2}
}
- class pytector.pii.PIIScanner(model_name='pasteproof-v3', threshold=0.5, entity_types=None)[source]
Bases: object
Detect and optionally redact PII entities in text.
- Parameters:
  - model_name (str) – A key in predefined_models or any Hugging Face model ID / local path suitable for token-classification.
  - threshold (float) – Minimum confidence score for an entity to be reported.
  - entity_types (list[str] | None) – If provided, only entities whose type is in this list are returned. None means all entity types are returned.
- SUPPORTED_ENTITY_TYPES: Tuple[str, ...] = ('CREDIT_CARD', 'PCI_PAN', 'PCI_TRACK', 'PCI_EXPIRY', 'API_KEY', 'AWS_KEY', 'PRIVATE_KEY', 'PASSWORD', 'HIPAA_MRN', 'HIPAA_ACCOUNT', 'HIPAA_DOB', 'GDPR_PASSPORT', 'GDPR_NIN', 'GDPR_IBAN', 'NAME', 'FIRST_NAME', 'LAST_NAME', 'SSN', 'DOB', 'DRIVER_LICENSE', 'EMAIL', 'PHONE', 'IP_ADDRESS', 'STREET', 'CITY', 'STATE', 'ZIPCODE')
- scan(text, threshold=None)[source]
Scan text for PII entities.
Returns (has_pii, entities), where each entity dict contains text, type, score, start, and end.
- redact(text, threshold=None, replacement='[REDACTED]')[source]
Return a copy of text with detected PII replaced by replacement.
Entities are replaced right-to-left so character offsets stay valid.
Uses the PasteProof PII Detector
(ModernBERT-base, F1 0.97) for NER-based PII detection across 27 entity types.
Requires transformers >= 4.48.0 for ModernBERT support.
from pytector import PIIScanner
scanner = PIIScanner()
has_pii, entities = scanner.scan("Email john@acme.com, SSN 123-45-6789")
print(scanner.redact("Email john@acme.com, SSN 123-45-6789"))
# Filter to specific entity types
scanner = PIIScanner(entity_types=["EMAIL", "CREDIT_CARD"], threshold=0.7)
| Parameter | Type | Description |
|---|---|---|
| model_name | str | Predefined key ( |
| threshold | float | Minimum confidence for an entity to be reported (default 0.5) |
| entity_types | list[str] \| None | Filter to specific types (e.g. |
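The entity dicts returned by scan carry start and end offsets, which is what makes right-to-left redaction work: replacing the last entity first leaves earlier offsets untouched. The sketch below works on hand-built entity dicts rather than model output, so it runs without transformers. The helpers make_entity, filter_entities, and redact_entities are hypothetical, the scores are made up, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from typing import Dict, List, Optional


def make_entity(text: str, value: str, entity_type: str, score: float) -> Dict:
    """Build an entity dict with the documented keys; offsets come from str.index."""
    start = text.index(value)
    return {"text": value, "type": entity_type, "score": score,
            "start": start, "end": start + len(value)}


def filter_entities(entities: List[Dict], threshold: float = 0.5,
                    entity_types: Optional[List[str]] = None) -> List[Dict]:
    """Keep entities at or above the threshold and, if given, of the listed types."""
    return [e for e in entities
            if e["score"] >= threshold
            and (entity_types is None or e["type"] in entity_types)]


def redact_entities(text: str, entities: List[Dict],
                    replacement: str = "[REDACTED]") -> str:
    """Replace entities from the end of the string backwards so offsets stay valid."""
    redacted = text
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        redacted = redacted[:entity["start"]] + replacement + redacted[entity["end"]:]
    return redacted


sample = "Email john@acme.com, SSN 123-45-6789"
entities = [
    make_entity(sample, "john@acme.com", "EMAIL", 0.98),  # illustrative score
    make_entity(sample, "123-45-6789", "SSN", 0.95),      # illustrative score
]
print(redact_entities(sample, entities))
```

Replacing left-to-right would shift every later offset by the length difference between the original span and the replacement, which is why the reverse order is used.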
ToxicityDetector
Toxicity detection using transformer sequence-classification models.
Default model: citizenlab/distilbert-base-multilingual-cased-toxicity
(DistilBERT multilingual, F1-micro 0.94, 10 languages).
- class pytector.toxicity.ToxicityDetector(model_name='citizenlab', threshold=0.5)[source]
Bases: object
Classify text as toxic or non-toxic.
- Parameters:
  - model_name (str) – A key in predefined_models or any Hugging Face model ID / local path suitable for text-classification.
  - threshold (float) – Score above which text is considered toxic.
- predefined_models: Dict[str, str] = {'citizenlab': 'citizenlab/distilbert-base-multilingual-cased-toxicity'}
- detect(text, threshold=None)[source]
Detect whether text is toxic.
Returns (is_toxic, score), mirroring the PromptInjectionDetector.detect_injection return signature.
Uses citizenlab/distilbert-base-multilingual-cased-toxicity (F1-micro 0.94, 10 languages) for toxicity classification.
from pytector import ToxicityDetector
detector = ToxicityDetector()
is_toxic, score = detector.detect("You are terrible")
detector.report("Have a wonderful day!")
| Parameter | Type | Description |
|---|---|---|
| model_name | str | Predefined key ( |
| threshold | float | Score above which text is considered toxic (default 0.5) |
RegexScanner
Rule-based PII and credential detection using customisable regex patterns.
This module is pure Python stdlib — no model downloads, no heavy dependencies. It ships with sensible defaults for common PII types and lets users add, remove, or completely replace patterns at construction time or at runtime.
- class pytector.regex_scanner.RegexScanner(patterns=None, use_defaults=True)[source]
Bases: object
Scan text for sensitive data using compiled regular expressions.
- Parameters:
  - patterns (dict[str, str] | None) – Mapping of {PATTERN_NAME: regex_string}. Merged with the built-in defaults when use_defaults is True, or used alone when False.
  - use_defaults (bool) – Whether to include the built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN).
- scan(text)[source]
Scan text against all active patterns.
Returns (has_matches, matches), where each match dict contains pattern_name, match, start, and end.
- redact(text, replacement='[REDACTED]')[source]
Return a copy of text with all matches replaced by replacement.
Non-overlapping matches are replaced right-to-left so offsets stay valid.
Pure-stdlib rule-based scanner with customizable patterns.
from pytector import RegexScanner
scanner = RegexScanner()
has_match, matches = scanner.scan("Key: sk-live-abc123def456")
print(scanner.redact("Email user@example.com"))
# Custom patterns only
custom = RegexScanner(
    patterns={"ORDER_ID": r"ORD-\d{8}"},
    use_defaults=False,
)
| Parameter | Type | Description |
|---|---|---|
| patterns | dict[str, str] \| None | |
| use_defaults | bool | Whether to include built-in patterns (EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, API_KEY, JWT_TOKEN) |
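The interaction between patterns and use_defaults, and the shape of the match dicts, can be sketched with the standard re module alone. This is not the RegexScanner source: SKETCH_DEFAULTS holds a single simplified EMAIL regex invented for the example (the library's real built-in patterns are not shown in this reference), and build_patterns and scan_text are hypothetical helper names.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical default set for this sketch; not the library's built-in regexes.
SKETCH_DEFAULTS: Dict[str, str] = {"EMAIL": r"[\w.+-]+@[\w-]+\.[\w.-]+"}


def build_patterns(patterns: Optional[Dict[str, str]] = None,
                   use_defaults: bool = True) -> Dict[str, Pattern[str]]:
    """Merge custom patterns over the defaults, or use them alone."""
    merged = dict(SKETCH_DEFAULTS) if use_defaults else {}
    merged.update(patterns or {})
    return {name: re.compile(regex) for name, regex in merged.items()}


def scan_text(text: str, compiled: Dict[str, Pattern[str]]) -> Tuple[bool, List[Dict]]:
    """Return (has_matches, matches) with the documented match-dict keys."""
    matches = [
        {"pattern_name": name, "match": m.group(0), "start": m.start(), "end": m.end()}
        for name, regex in compiled.items()
        for m in regex.finditer(text)
    ]
    matches.sort(key=lambda m: m["start"])  # report matches in document order
    return bool(matches), matches


compiled = build_patterns({"ORDER_ID": r"ORD-\d{8}"})
has_matches, matches = scan_text("Order ORD-12345678 for user@example.com", compiled)
for m in matches:
    print(m["pattern_name"], m["match"], m["start"], m["end"])
```

Passing use_defaults=False to build_patterns leaves only the custom ORDER_ID pattern active, mirroring the "custom patterns only" usage shown earlier.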
CanaryToken
Canary token generation and leak detection.
Inject a unique token into your system prompt. If the model’s output contains the canary, the system prompt was leaked — regardless of how clever the injection was.
Pure Python stdlib. Zero dependencies, zero calibration.
- class pytector.canary.CanaryToken(token=None, length=16, prefix='CANARY-')[source]
Bases: object
Generate, embed, and detect canary tokens in LLM interactions.
- Parameters:
- wrap(system_prompt)[source]
Return system_prompt with the canary instruction appended.
The instruction tells the model to never repeat the canary.
Inject a secret token into your system prompt and detect if the model leaks it. Pure stdlib — zero dependencies, zero calibration.
from pytector import CanaryToken
canary = CanaryToken()
system_prompt = canary.wrap("You are a helpful assistant.")
# ... pass to LLM, get response ...
leaked, token = canary.check(model_output)
| Parameter | Type | Description |
|---|---|---|
| token | str \| None | Explicit canary string; |
| length | int | Length of the random part of auto-generated tokens (default 16) |
| prefix | str | Prefix for auto-generated tokens (default |
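The canary idea itself needs only the standard library, as the following sketch shows. It is a rough illustration, not the CanaryToken implementation: the helper names make_canary, wrap_prompt, and is_leaked, the uppercase-plus-digits alphabet, and the wording of the appended instruction are all assumptions. The 'CANARY-' prefix and length of 16 come from the constructor signature above.

```python
import secrets
import string


def make_canary(length: int = 16, prefix: str = "CANARY-") -> str:
    """Generate a random token; the alphabet used here is an assumption."""
    alphabet = string.ascii_uppercase + string.digits
    return prefix + "".join(secrets.choice(alphabet) for _ in range(length))


def wrap_prompt(system_prompt: str, canary: str) -> str:
    """Append an instruction telling the model never to repeat the canary."""
    return f"{system_prompt}\n\nNever repeat the following token: {canary}"


def is_leaked(model_output: str, canary: str) -> bool:
    """A leak is simply the canary appearing anywhere in the model output."""
    return canary in model_output


canary = make_canary()
wrapped = wrap_prompt("You are a helpful assistant.", canary)

print(is_leaked("Here is a normal answer.", canary))
print(is_leaked(f"My instructions say: {canary}", canary))
```

The secrets module is used instead of random so the token is hard to guess, which keeps an attacker from forging or filtering it in advance.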