
Safety filters

Safety filters block harmful content from entering or leaving the conversation in real time.

They run on both user input and agent output, scoring each turn against four content categories and blocking any turn that violates an enabled category before it affects the conversation.

Filters can be configured at the project level and overridden per channel (voice and chat). Each category is enabled independently and tuned to a sensitivity level.

Location

Safety filters are defined in up to three optional files:

agent_settings/
└── safety_filters.yaml       # Project-level (general) defaults
voice/
└── safety_filters.yaml       # Voice channel override
chat/
└── safety_filters.yaml       # Chat channel override

A channel-level file overrides the project-level defaults for that channel. If no channel file exists, the channel inherits the project-level configuration.
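The lookup order described above can be sketched in a few lines of Python. This is an illustration only, not how the platform resolves files internally: the helper name `resolve_filter_file` and the idea of passing a project root directory are assumptions made for the example; only the three file paths come from the layout above.

```python
from pathlib import Path
from typing import Optional

# Paths taken from the layout above, relative to the project root.
PROJECT_FILE = Path("agent_settings") / "safety_filters.yaml"
CHANNEL_FILES = {
    "voice": Path("voice") / "safety_filters.yaml",
    "chat": Path("chat") / "safety_filters.yaml",
}


def resolve_filter_file(root: Path, channel: str) -> Optional[Path]:
    """Return the safety filter file that applies to `channel`, or None.

    Hypothetical helper: a channel file wins if it exists, otherwise the
    project-level file is used, otherwise no filter file applies.
    """
    if channel not in CHANNEL_FILES:
        raise ValueError(f"unknown channel: {channel!r}")
    channel_file = root / CHANNEL_FILES[channel]
    if channel_file.is_file():
        return channel_file
    project_file = root / PROJECT_FILE
    return project_file if project_file.is_file() else None
```

For a project that has only `agent_settings/safety_filters.yaml` and `voice/safety_filters.yaml`, this sketch would return the voice file for `"voice"` and the project file for `"chat"`.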

What safety filters control

  • Project-level filters: defaults applied to every channel when no channel-specific override is set.
  • Channel-level overrides: per-channel tuning for voice or chat, including a global enable toggle.
  • Categories: violence, hate, sexual, and self-harm content.
  • Sensitivity levels: how aggressively each category filters (lenient, medium, or strict).

Categories

All four categories must be configured. Each category has its own enabled flag and level.

| Category | Description |
|---|---|
| violence | Filters violent or graphic content |
| hate | Filters hateful or discriminatory content |
| sexual | Filters sexually explicit content |
| self_harm | Filters self-harm related content |

Fields

| Field | Where | Description |
|---|---|---|
| enabled | Channel files only | true or false. Global toggle for the channel. Omit at project level; the project filter is active whenever any category is enabled. |
| categories | All files | Map of the four categories. Required. |
| categories.<name>.enabled | All files | true or false. Whether this category is active. |
| categories.<name>.level | All files | Sensitivity level. One of lenient, medium, strict. |

Sensitivity levels

| Level | Behavior |
|---|---|
| lenient | Blocks only the most severe content. |
| medium | Balanced filtering. |
| strict | Blocks borderline content. |

Project-level example

Project-level filters omit the global enabled key — the filter is active whenever at least one category is enabled.

categories:
  violence:
    enabled: true
    level: medium
  hate:
    enabled: true
    level: medium
  sexual:
    enabled: true
    level: medium
  self_harm:
    enabled: true
    level: medium

Channel-level example

Channel files include a top-level enabled flag that turns filtering on or off for the entire channel.

enabled: true
categories:
  violence:
    enabled: true
    level: strict
  hate:
    enabled: true
    level: medium
  sexual:
    enabled: true
    level: medium
  self_harm:
    enabled: true
    level: strict
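The two activation rules (project files are active whenever any category is enabled; channel files add a top-level toggle) can be expressed as a small check over the parsed YAML. The function below is a hypothetical sketch that works on a plain dictionary. It assumes that a channel with `enabled: true` but no enabled categories filters nothing, which the text above does not state explicitly.

```python
def is_filtering_active(config: dict, is_channel_file: bool) -> bool:
    """Return True if the given safety filter config would filter anything.

    `config` is the parsed content of a safety_filters.yaml file.
    Hypothetical helper, not part of any platform API.
    """
    categories = config.get("categories", {})
    any_category_enabled = any(
        entry.get("enabled") is True for entry in categories.values()
    )
    if is_channel_file:
        # Assumption: the top-level toggle switches the whole channel off,
        # and when it is on, at least one category must still be enabled.
        return config.get("enabled") is True and any_category_enabled
    # Project files have no top-level toggle.
    return any_category_enabled
```

With the channel-level example above, the sketch returns `True`; flipping the top-level `enabled` to `false` makes it return `False` regardless of the category settings.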

Validation rules

poly push rejects safety filter files that don't satisfy these rules:

  • All four categories (violence, hate, sexual, self_harm) must be present.
  • Each category must set both enabled (boolean) and level.
  • level must be one of lenient, medium, or strict.
  • Channel files must include the top-level enabled flag as a boolean.
  • Unrecognized category keys cause a validation error rather than being silently ignored.
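To catch these errors before pushing, the rules above can be approximated with a local check. The following is a minimal sketch, not the actual poly push implementation: the function name `validate_safety_filters`, the error messages, and the choice to return a list of strings are all assumptions made for illustration. It takes the already-parsed YAML as a dictionary.

```python
from typing import List

CATEGORIES = ("violence", "hate", "sexual", "self_harm")
LEVELS = ("lenient", "medium", "strict")


def validate_safety_filters(config: dict, is_channel_file: bool) -> List[str]:
    """Return a list of rule violations; an empty list means the file passes."""
    errors: List[str] = []

    # Channel files must include the top-level enabled flag as a boolean.
    if is_channel_file and not isinstance(config.get("enabled"), bool):
        errors.append("channel file: top-level 'enabled' must be a boolean")

    categories = config.get("categories")
    if not isinstance(categories, dict):
        errors.append("'categories' must be a map of the four categories")
        return errors

    # Unrecognized category keys are errors, not silently ignored.
    for name in categories:
        if name not in CATEGORIES:
            errors.append(f"unrecognized category: {name!r}")

    for name in CATEGORIES:
        # All four categories must be present.
        if name not in categories:
            errors.append(f"missing category: {name!r}")
            continue
        entry = categories[name]
        if not isinstance(entry, dict):
            errors.append(f"{name}: must be a map with 'enabled' and 'level'")
            continue
        # Each category must set both enabled (boolean) and level.
        if not isinstance(entry.get("enabled"), bool):
            errors.append(f"{name}: 'enabled' must be a boolean")
        if entry.get("level") not in LEVELS:
            errors.append(f"{name}: 'level' must be one of {', '.join(LEVELS)}")

    return errors
```

Returning every violation at once, rather than stopping at the first, is a design choice for the sketch; it makes it easier to fix a file in a single pass.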

Best practices

  • Keep settings consistent across channels unless a channel has a distinct risk profile (for example, a voice line that handles vulnerable callers may warrant strict on self_harm).
  • Start at medium and adjust based on observed false positives and missed content.
  • Review filter outcomes periodically; the right level depends on who your users are and what the agent is used for, not just on the deployment.
  • Treat the channel files as overrides, not duplicates: only commit a channel file when it actually differs from the project default.
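The last point can be checked mechanically. The sketch below flags a channel file as redundant when it is enabled and its categories match the project defaults exactly. The function name and this definition of "redundant" are assumptions for illustration; both arguments are parsed YAML dictionaries.

```python
def channel_file_is_redundant(project: dict, channel: dict) -> bool:
    """Return True if the channel config adds nothing over the project config.

    Hypothetical helper: a channel file that is enabled and repeats the
    project categories verbatim behaves the same as having no channel file.
    """
    if channel.get("enabled") is not True:
        # Turning the channel off is a real difference from the default.
        return False
    return channel.get("categories") == project.get("categories")
```

A check like this could run in review or CI to keep duplicate channel files out of the repository.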

On the Agent Studio platform

The same settings can be configured in the Agent Studio UI. The platform docs cover the UI workflow and category descriptions in more depth:

  • Agent settings: configure personality, role, and rules alongside project-level safety filters.
  • Voice settings: configure voice-channel greetings, disclaimers, and safety filter overrides.
  • Chat settings: configure chat-channel greetings, style, and safety filter overrides.