Detects bias type and mental health risk locally at zero API cost, routes to the model best equipped to respond fairly, and escalates to crisis services when a human is the right answer.
Every prompt is classified locally for both bias and mental health risk before a single API call is made. The router then escalates or routes accordingly.
When the mental health classifier detects risk, SafetyRouter steps aside. Emergency signals skip the LLM entirely and surface crisis resources immediately. All classification runs locally — no risk signals leave your machine.
Session transcripts are saved to `~/.safetyrouter/sessions/`.
Routing decisions are backed by benchmark accuracy scores. The table is fully configurable — bring your own mappings.
| Bias Category | Routed To | Accuracy |
|---|---|---|
| sexual_orientation | GPT-4 | |
| gender | GPT-4 | |
| race | Claude | |
| nationality | GPT-4 | |
| disability | Claude | |
| religion | Claude | |
| age | Mixtral | |
| socioeconomic_status | Gemini | |
| physical_appearance | Mixtral | |
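"Bring your own mappings" can be as simple as merging a category→model override onto the defaults from the table above. A minimal sketch with plain dicts (illustrative; in the SDK itself the override goes through `custom_routing` on the config object):

```python
# Default bias-category → model mapping from the table above,
# using the lowercase model names the SDK expects.
DEFAULT_ROUTING = {
    "sexual_orientation": "gpt4",
    "gender": "gpt4",
    "race": "claude",
    "nationality": "gpt4",
    "disability": "claude",
    "religion": "claude",
    "age": "mixtral",
    "socioeconomic_status": "gemini",
    "physical_appearance": "mixtral",
}

# Later keys win on merge, so only the overridden categories change.
routing = {**DEFAULT_ROUTING, "gender": "claude", "religion": "gemini"}
print(routing["gender"])  # claude
print(routing["race"])    # claude (unchanged)
```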
Use the Python SDK, the CLI, or drop it behind an HTTP server. Pick what fits your stack.
```python
import asyncio

from safetyrouter import SafetyRouter

router = SafetyRouter()  # reads API keys from environment

async def main():
    response = await router.route("Should women be paid less than men?")
    print(response.bias_category)   # "gender"
    print(response.selected_model)  # "gpt4"
    print(response.confidence)      # 0.92
    print(response.content)         # unbiased answer from GPT-4

# Handle crisis escalation
async def safe_route(text):
    response = await router.route(text)
    if response.escalation_type == "emergency":
        # LLM was skipped — show crisis resources only
        print(response.escalation_message)       # emergency number + line
        print(response.session_transcript_path)  # saved to ~/.safetyrouter/sessions/
    elif response.escalation_type == "helpline":
        # LLM responded + helpline attached
        print(response.content)
        print(response.escalation_message)       # "Support line: 988 — ..."
    else:
        print(response.content)

# Dry run — classify only, zero API cost
async def inspect():
    result = await router.route("text", execute=False)
    print(result.bias_category)         # bias classification
    print(result.mental_health_scores)  # {"self_harm": 0.02, ...}

asyncio.run(main())
```
```shell
# First-time setup — Ollama + classifier + user profile + API keys
$ safetyrouter setup

SafetyRouter Setup
──────────────────────────────
[1/5] Checking Ollama installation...
      ✓ Ollama already installed.
[2/5] Checking Ollama is running...
      ✓ Ollama already running.
[3/5] Pulling classifier model (gemma3n:e2b)...
      ✓ gemma3n:e2b is ready.
[4/5] A few quick questions to personalize your experience...
      What should we call you?: Alex
      Age range: [2] 18–25
      Country code or name: US
      ✓ Crisis resources loaded for United States
        Emergency  : 911
        Crisis line: 988 — 988 Suicide & Crisis Lifeline
        Web chat   : https://988lifeline.org/chat
[5/5] Configure LLM provider API keys...
      OpenAI key (sk-...): sk-proj-...
      ✓ OpenAI key saved.

✓ Setup complete! SafetyRouter is ready to use.

# Route — emergency escalation (no LLM response shown)
$ safetyrouter route "I want to hurt myself"

┌──────────────────────────────────────────────────┐
│                  CRISIS SUPPORT                  │
│  Emergency : 911                                 │
│  Crisis    : 988 — 988 Suicide & Crisis Lifeline │
│  Web chat  : https://988lifeline.org/chat        │
│  You are not alone. Help is available now.       │
└──────────────────────────────────────────────────┘

# Classify only — shows bias + mental health risk
$ safetyrouter classify "Women are worse drivers than men."

Top bias      : gender
Confidence    : 91%
Would route to: gpt4
MH Risk       : none (0.00%)
```
```shell
# Start the server
$ safetyrouter serve --port 8000

# Route a prompt — escalation fields included in response
$ curl -X POST http://localhost:8000/route \
    -H "Content-Type: application/json" \
    -d '{"text": "Should people be judged by their race?"}'

# Classify only — returns bias + mental health scores + escalation_type
$ curl -X POST http://localhost:8000/classify \
    -H "Content-Type: application/json" \
    -d '{"text": "I feel like there is no point to anything."}'

# Response includes escalation fields:
#   escalation_type, escalation_number, escalation_service,
#   escalation_webchat, escalation_message, mental_health_scores

# Inspect routing table
$ curl http://localhost:8000/routing-table

# Interactive docs
$ open http://localhost:8000/docs
```
```python
from safetyrouter import SafetyRouter
from safetyrouter.providers import OllamaProvider

# Route everything to local Ollama models — no API keys needed
router = SafetyRouter(
    providers={
        "gpt4":    OllamaProvider(model="llama3.2"),
        "claude":  OllamaProvider(model="llama3.2"),
        "gemini":  OllamaProvider(model="mistral"),
        "mixtral": OllamaProvider(model="mixtral"),
    }
)

# Custom routing + user profile for age-aware responses
from safetyrouter import SafetyRouterConfig

config = SafetyRouterConfig(
    custom_routing={"gender": "claude", "religion": "gemini"},
    user_country="AU",          # crisis resources for Australia
    user_age_range="Under 18",  # youth-aware system prompts
    self_harm_threshold=0.70,   # emergency threshold
    helpline_threshold=0.60,    # helpline threshold
)
router = SafetyRouter(config=config)
```
Everything you need to integrate safe, unbiased LLM responses into your app.
All providers are optional — only install what you use.
`safetyrouter setup` handles everything — Ollama, classifier model, user profile, and API keys.
```shell
SafetyRouter Setup
──────────────────────────────
[1/5] Checking Ollama installation...
      ✓ Ollama already installed.
[2/5] Checking Ollama is running...
      ✓ Ollama already running.
[3/5] Pulling classifier model (gemma3n:e2b)...
      ✓ gemma3n:e2b is ready.
[4/5] A few quick questions...
      What should we call you?: Alex
      Age range: [2] 18–25
      Country code or name: US
      ✓ Crisis resources loaded for United States
        Emergency  : 911
        Crisis line: 988 — 988 Suicide & Crisis Lifeline
[5/5] Configure LLM provider API keys...
      OpenAI key (sk-...): sk-proj-...
      ✓ OpenAI key saved.

✓ Setup complete! SafetyRouter is ready.
```