An Overview of Livefyre’s Spam and Abuse Filtering Engine (SAFE)
The Livefyre Spam and Abuse Filtering Engine (SAFE) is a background process that analyzes all incoming content and is enabled for all Livefyre customers. SAFE uses pattern rules and statistical models to detect spam, abuse, profanity, and bulk (repetitive) posts. It is referenced in other Livefyre products, notably the Content moderation tools and ModQ.
Note: SAFE supports English only, except for Bulk classification, which is language-independent. If you require support for other languages, please contact your Strategic Account Manager.
SAFE filters for several types of unwanted content and assigns a system flag to matching content indicating whether it is Spam, Bulk, Profanity or one of the Abuse categories.
The Spam Filter looks for unsolicited commercial content. It uses a statistical model that relies on a variety of features (including comment content and URLs) to estimate the probability that a given piece of content is spam. Content whose probability crosses the spam threshold is flagged as Spam. Spam thresholds may be adjusted by request to customize spam tagging rates for your network or site.
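As an illustration of how a threshold turns a probability into a Spam tag, here is a minimal sketch. It is not SAFE's implementation: the function name `tag_spam`, the default threshold of 0.8, and the sample scores are all hypothetical, and the real model and its threshold values are not described in this document.

```python
def tag_spam(score: float, threshold: float = 0.8) -> bool:
    """Return True when a spam probability meets or exceeds the threshold.

    Hypothetical helper: SAFE's actual scoring and thresholds are internal.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    return score >= threshold


# Hypothetical model outputs for three comments.
scores = [0.35, 0.82, 0.95]

# A higher threshold tags fewer comments; a lower one tags more.
high_threshold_tags = [tag_spam(s, threshold=0.9) for s in scores]
low_threshold_tags = [tag_spam(s, threshold=0.3) for s in scores]
```

Raising the threshold lowers the tagging rate at the cost of letting more borderline content through, which is the trade-off a threshold adjustment request would change.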
The Bulk Filter looks for repetitive content posted across all Livefyre networks within a short timeframe. If detected, this content is flagged as Bulk and trashed by default. While bulk content may be user-generated (such as “Touchdown!” posted repeatedly in a Chat during a popular football game), most originates with spam campaigns. This filter is language-independent.
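One common way to detect repeated content within a short timeframe is a sliding time window keyed on normalized text. The sketch below shows that general technique only; the class name `BulkDetector`, the repeat limit, and the window length are assumptions, and SAFE's actual bulk detection may work differently.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BulkDetector:
    """Hypothetical sliding-window detector for repeated posts."""

    def __init__(self, max_repeats: int = 3, window_seconds: float = 60.0):
        self.max_repeats = max_repeats          # assumed limit, not SAFE's
        self.window_seconds = window_seconds    # assumed window, not SAFE's
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial variations match.
        return " ".join(text.lower().split())

    def is_bulk(self, text: str, timestamp: float) -> bool:
        """Record a post and report whether it exceeds the repeat limit."""
        times = self._seen[self._normalize(text)]
        # Drop sightings that have fallen out of the time window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        times.append(timestamp)
        return len(times) > self.max_repeats


detector = BulkDetector(max_repeats=2, window_seconds=10)
results = [detector.is_bulk("Touchdown!", t) for t in (0, 1, 2, 30)]
```

Because the comparison works on text equality rather than vocabulary, an approach like this does not depend on the language of the content.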
The Profanity Filter looks for profane language, based on a tested word list. If detected, the content is flagged Profane.
Note: Livefyre also provides a second Profanity List filter, which may be customized at both the Site and Network levels. Rules created with the Profanity List take precedence over automated rules stemming from the SAFE Profanity filter. For more information, please see the Profanity List section in the Settings documentation.
The Abuse Filter looks for content that is commonly found to be abusive, such as hate speech and insults. To tag offensive content, this filter uses patterns defined by word lists, phrase lists, and regular expressions that identify abusive or malicious content not typically caught by the spam and bulk filters. The filter uses both a blacklist and a whitelist to judge content. The flags it applies are Hate Speech, Personally Identifiable Information (PII), and Insult.
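The interplay between a blacklist and a whitelist can be sketched as follows. This is an illustrative example under stated assumptions: the patterns, the flag names used as dictionary keys, and the rule that whitelisted phrases are removed before blacklist matching are all hypothetical, and SAFE's real lists are not published here.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical blacklist: flag name -> patterns that trigger it.
BLACKLIST: Dict[str, List[Pattern[str]]] = {
    "insult": [re.compile(r"\bidiot\b", re.IGNORECASE)],
    "pii": [re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")],  # email-like text
}

# Hypothetical whitelist: phrases that are acceptable despite a blacklist hit.
WHITELIST: List[Pattern[str]] = [
    re.compile(r"\bthe idiot by dostoevsky\b", re.IGNORECASE),
]


def abuse_flags(text: str) -> List[str]:
    """Return the sorted flags whose patterns match outside whitelisted text."""
    cleaned = text
    for allowed in WHITELIST:
        # Assumption: whitelisted phrases are removed before blacklist checks.
        cleaned = allowed.sub(" ", cleaned)
    return sorted(
        flag
        for flag, patterns in BLACKLIST.items()
        if any(p.search(cleaned) for p in patterns)
    )
```

With these sample lists, `abuse_flags("I loved The Idiot by Dostoevsky")` returns an empty list because the whitelist entry suppresses the blacklist match on the book title.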
Studio Components using SAFE
Flags applied by SAFE may be used with the following Studio components:
Use the Network Settings > Moderation tab to define rules by which flagged content will be automatically pre-moderated before appearing in the stream. Use the SAFE flags to define rules for how pre-filtered content should be handled.
For example: Some sites may set a very low tolerance for Profanity, and define SAFE Rules which set all content flagged as Profane to be Bozo’d. Other sites may have a less-stringent approach, and define Rules which set Profane content to be pre-moderated before entering the stream.
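The two site policies in this example can be modeled as a mapping from SAFE flags to moderation actions. The sketch below is a hypothetical model, not the Studio rule format: the action names (`approve`, `premoderate`, `bozo`), the function `apply_rules`, and the choice that the strictest matching action wins are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Assumed ordering of actions from least to most restrictive.
SEVERITY = ["approve", "premoderate", "bozo"]


def apply_rules(flags: Iterable[str], rules: Dict[str, str]) -> str:
    """Pick the strictest action among the rules matching a comment's flags."""
    actions = [rules[flag] for flag in flags if flag in rules]
    if not actions:
        return "approve"  # no rule matched, so the content passes through
    return max(actions, key=SEVERITY.index)


# Low tolerance: anything flagged Profane is Bozo'd.
strict_site = {"profanity": "bozo"}
# Less stringent: Profane content is held for pre-moderation instead.
lenient_site = {"profanity": "premoderate"}

strict_result = apply_rules(["profanity"], strict_site)
lenient_result = apply_rules(["profanity"], lenient_site)
```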
ModQ allows System Owners to create rules which define moderation tasks for their network and sites. These tasks are often based on flags added to content by SAFE. Use ModQ to create rules which will generate task lists for your moderators, based on the Profanity and Spam flags.
For more information, please see Studio > ModQ.
Flags applied by SAFE will also be listed in the More Info tab of the Content page of Studio. These flags may be used as an aid to content moderation, by listing which filters were triggered by the selected content.
For more information, see the Content section of Studio.
SAFE’s Applied Flags
The following flags are applied by SAFE to filtered content, and may be used to create rules and moderate content from within Livefyre Studio.
- bulk: repetitive content posted across all networks.
- hate speech: an insult based on ethnicity or religion, especially when the targeted group is a minority or a protected class.
- pii: “personally identifiable information” is information that may identify the user. This may include an email address, physical address, social security number (for US customers), credit card number, password, or anything else that can be used to commit fraud or steal someone’s identity.
- profanity: profane content, as defined by a list of English keywords, based on common use.
- insult: insulting content, as defined by a list of keywords and phrase patterns.
- spam: unsolicited, usually commercial content.
Handling Content not caught by SAFE
There are several ways to handle content not caught by SAFE. The options below are listed in the recommended order.
- As a moderator, remove the content from the stream.
- Create a Flag Rule that sets a piece of content to Bozo once it has been flagged as Spam or Offensive by 3 users.
- Ban the user who is posting unwanted content, so all their content will go directly into the Bozo state.
- Add specific words that should always be filtered to your profanity list.
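The Flag Rule option in the list above amounts to counting user flags against a limit. Here is a minimal sketch of that logic, assuming that each user counts once and that Spam and Offensive flags are counted together; both are assumptions, and the function name `should_bozo` and the data shape are hypothetical rather than part of Studio.

```python
from typing import Iterable, Tuple


def should_bozo(
    user_flags: Iterable[Tuple[str, str]],
    threshold: int = 3,
    flag_types: Tuple[str, ...] = ("spam", "offensive"),
) -> bool:
    """Return True once enough distinct users have flagged the content.

    user_flags holds (user_id, flag_type) pairs for one piece of content.
    """
    # A set ensures repeated flags from the same user count only once.
    flagging_users = {user for user, flag in user_flags if flag in flag_types}
    return len(flagging_users) >= threshold


reports = [("u1", "spam"), ("u2", "offensive"), ("u1", "spam")]
before = should_bozo(reports)                      # two distinct users so far
after = should_bozo(reports + [("u3", "spam")])    # third distinct user flags
```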
Note: If a moderator posts content that is caught by our Spam Filter, it is still flagged as Spam, but is automatically Approved, and will not be set to Bozo.
If you notice trends or patterns of content not caught by SAFE, email email@example.com with the Comment IDs and text, and your support team will work with our engineers to optimize the filter for your network and troubleshoot any issues.