Automatic text moderation
Moderation Dashboard supports text moderation that administrators can configure for words or messages in their chat apps. Once configured, it automatically masks or blocks text messages sent by chat app users before they are published on channels, if the messages contain words from the forbidden word list or profanity as rated by the selected third-party provider.
This functionality is enabled by Functions, which can transform published messages in real time before they are distributed to subscribers. Once text moderation is configured, the Moderation Dashboard GUI calls the PubNub provisioning API and deploys the configured logic as the Before Publish or Fire function. You can view this function under your keyset for the moderated app on the Admin Portal.
To configure text moderation in the Moderation Dashboard GUI, expand the Settings menu on the left-hand navigation and select Text Moderation.
There are two types of pre-configured text moderation:
- Word List that allows you to manually set a list of forbidden words or choose a pre-defined word list in one of five available languages.
- Automatic Detection that allows you to configure PubNub to send text messages to third-party services that can apply advanced logic to detect inappropriate content before it's published on channels. By default, Moderation Dashboard provides moderation offered by Tisane, but also supports Sift Ninja (only for its existing users).
Both options have an on/off toggle at the top. Make sure to enable this toggle at the start and save the configuration after you're done.
Functions limitations
PubNub supports only one function for a given channel pattern and event. If you want to run additional logic besides moderation on Before Publish events for the same channel patterns you are moderating, you need to consolidate that logic into a single function. To do that, either create the moderation function and then add your additional logic in the function's sandbox on the Admin Portal, or copy the code from the generated moderation function into your existing function.
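The consolidation idea can be sketched in a language-neutral way. The Python snippet below is only an illustration, not PubNub Functions code: the handler shape (a message dict in, a message dict or None out) and the names consolidate, moderate, and add_length are all assumptions made for this example.

```python
from typing import Callable, Optional

# A handler takes a message dict and returns the (possibly changed) message,
# or None to block it. This shape is an assumption made for illustration.
Handler = Callable[[dict], Optional[dict]]


def consolidate(*handlers: Handler) -> Handler:
    """Chain several handlers into one; stop as soon as one blocks the message."""
    def combined(message: dict) -> Optional[dict]:
        current: Optional[dict] = message
        for handler in handlers:
            current = handler(current)
            if current is None:
                return None  # a handler blocked the message
        return current
    return combined


def moderate(message: dict) -> Optional[dict]:
    # Hypothetical stand-in for the generated moderation logic.
    if "badword" in message["text"].lower():
        return None
    return message


def add_length(message: dict) -> Optional[dict]:
    # Hypothetical stand-in for your additional functionality.
    return {**message, "length": len(message["text"])}


# One combined handler for the same channel pattern and event.
before_publish = consolidate(moderate, add_length)
```

Running moderation first means the additional logic never sees a blocked message; reverse the order if your own logic must run on every message.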
Word List
To define a set of words that should be automatically masked or blocked before sending them to channels, use Word List moderation.
With word list moderation, admins can:
- Select either a channel name or a channel pattern to which they want to apply text moderation in their app. By selecting Apply to All Channel IDs, you select all channels at once. If you decide to use a channel pattern, consider using distinctive naming for all the channels that moderation should apply to, like moderated.* or public.*.
- Choose one of five available languages for the profanity filter.
- Manually add a comma-separated list of profane words which they want to filter out when moderation is applied, or use a default word list (one for each language).
- Decide to have any moderated word replaced by a masking character, or choose to have the entire message blocked. When the Mask Word option is selected, the admin can specify the character that they want to use for masking.
- Optionally, have all moderated messages sent to a banned.* channel by selecting the Route messages to banned.* checkbox. For example, if a message sent to the public.watercooler channel is moderated, the original message will be sent to banned.public.watercooler. As an admin, you will be able to view all such messages in the Channels section of the Moderation Dashboard GUI, in the given channel's details, by switching to the Banned toggle at the top.
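The word list options above can be sketched as a few small functions. This is a minimal Python illustration, not the function that Moderation Dashboard generates: all function names are hypothetical, the wildcard matching is a simplified assumption about how patterns like public.* behave, and masking each letter of a matched word with the chosen character is likewise an assumption.

```python
import re
from typing import List, Optional, Tuple


def channel_matches(channel: str, pattern: str) -> bool:
    """Simplified matching: '*' matches everything, 'public.*' matches 'public.<anything>'."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return channel.startswith(pattern[:-1])  # keep the trailing dot
    return channel == pattern


def parse_word_list(raw: str) -> List[str]:
    """Turn a comma-separated string into a clean list of words."""
    return [word.strip() for word in raw.split(",") if word.strip()]


def moderate_text(text: str, words: List[str], mask_char: str = "*",
                  block: bool = False) -> Tuple[Optional[str], bool]:
    """Return (text to publish or None if blocked, whether the text was moderated)."""
    if not words:
        return text, False
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in words) + r")\b",
        re.IGNORECASE,
    )
    if not pattern.search(text):
        return text, False
    if block:
        return None, True
    # Assumption: every letter of a matched word is replaced by the mask character.
    return pattern.sub(lambda match: mask_char * len(match.group(0)), text), True


def banned_channel(channel: str) -> str:
    """Name of the channel that receives the original of a moderated message."""
    return "banned." + channel
```

For example, with the word list "darn, heck" and "#" as the masking character, moderate_text("Well, DARN it", parse_word_list("darn, heck"), mask_char="#") yields the masked text "Well, #### it", and banned_channel("public.watercooler") gives banned.public.watercooler.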
Automatic detection
As an alternative to word list-based text moderation, you may want a more advanced and automated way of determining whether messages in your chat app involve abuse, bullying, or sexism. To do that, you can apply the third-party services used in Automatic Detection.
With automatic detection, admins can:
- Apply advanced moderation on a channel ID, channel pattern, or all IDs, same as with word list moderation. Note that the channel choice you make for one of these moderation options applies to both of them.
- Apply automatic moderation offered by Tisane or Sift Ninja by Two Hat (Sift Ninja is available only to its existing users). To use Tisane, you will need their API key, and to use Sift Ninja, you will need their account name, channel name, and API key.
- Select the language of the input content to be moderated (Autodetect is the default value for the field).
- Set different moderation thresholds and risk levels for messages according to categories supported by Tisane (bigotry and hate speech, personal attacks and cyberbullying, criminal activity, sexual advances, and profanity) and Sift Ninja (vulgarity, sexting, and racism).
- Decide to have the entire moderated message replaced by a masking character, rather than only a single profane word as in word list moderation. Admins can also choose to have the entire message blocked. When the Mask Message option is selected, the admin can specify the character that they want to use for masking.
- Just like with word list moderation, select to have all moderated messages sent to a banned.* channel.
- Hover over the moderated message afterward to check the reason and severity that the third-party service assigned to the word or message.
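The threshold logic described above can be sketched as follows. This Python snippet is an illustration under stated assumptions: it does not call Tisane or Sift Ninja, the numeric per-category scores and category keys are a hypothetical response shape, and masking every character of the message is an assumption about how whole-message masking is rendered.

```python
from typing import Dict, List, Optional, Tuple


def flagged_categories(scores: Dict[str, float],
                       thresholds: Dict[str, float]) -> List[str]:
    """Return the categories whose score meets or exceeds the configured threshold."""
    return sorted(
        category
        for category, score in scores.items()
        if category in thresholds and score >= thresholds[category]
    )


def apply_decision(text: str, scores: Dict[str, float],
                   thresholds: Dict[str, float], mask_char: str = "*",
                   block: bool = False) -> Tuple[Optional[str], List[str]]:
    """Return (text to publish or None if blocked, list of flagged categories)."""
    reasons = flagged_categories(scores, thresholds)
    if not reasons:
        return text, reasons
    if block:
        return None, reasons
    # Assumption: the whole message is masked, one mask character per original character.
    return mask_char * len(text), reasons
```

The returned list of flagged categories plays the role of the reason an admin would later inspect for a moderated message.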