goaway

package module
v1.1.9
Published: Nov 14, 2022 License: MIT Imports: 5 Imported by: 0

README

go-away

go-away is a stand-alone, lightweight library for detecting profanities in Go.

This library must remain extremely easy to use. Its original intent of not adding overhead will always remain.

Installation

go get -u github.com/TwinProduction/go-away

Usage

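import (
	// The package name (goaway) differs from the last path element (go-away),
	// so the import is aliased explicitly.
	goaway "github.com/TwinProduction/go-away"
)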

goaway.IsProfane("fuck this shit")         // returns true
goaway.IsProfane("F   u   C  k th1$ $h!t") // returns true
goaway.IsProfane("@$$h073")                // returns true
goaway.IsProfane("hello, world!")          // returns false

The package-level IsProfane uses a default profanity detector. To disable leetspeak (numerical character), special character, or accent sanitization, create your own ProfanityDetector instead:

profanityDetector := goaway.NewProfanityDetector().WithSanitizeLeetSpeak(false).WithSanitizeSpecialCharacters(false).WithSanitizeAccents(false)
profanityDetector.IsProfane("b!tch") // returns false because we're not sanitizing special characters

In the background

A single giant regex could handle everything, but as more words are added to the list of profanities, it would slow down the filtering considerably.

Instead, the following steps are taken before checking for profanities in a string:

  • Numbers are replaced with their letter counterparts (e.g. 1 -> L, 4 -> A, etc.)
  • Special characters are replaced with their letter equivalents (e.g. @ -> A, ! -> i)
  • All spaces are removed from the resulting string, so that w ords lik e tha t are still caught
  • All characters in the resulting string are converted to lowercase
  • All words deemed false positives (e.g. assassin) are removed from the resulting string
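The steps above can be sketched with the standard library alone. This is a minimal illustration, not the library's actual implementation: the names sanitize and isProfane, the replacement tables, and the word lists are all hypothetical and heavily abbreviated.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, abbreviated tables for illustration only.
// Digits map straight to lowercase letters here; the lowercase step
// below would normalize them anyway.
var (
	leetReplacer    = strings.NewReplacer("1", "l", "4", "a", "3", "e", "0", "o")
	specialReplacer = strings.NewReplacer("@", "a", "!", "i", "$", "s")
	falsePositives  = []string{"assassin"}
	profanities     = []string{"ass", "shit"}
)

// sanitize applies the five steps in the order listed above.
func sanitize(s string) string {
	s = leetReplacer.Replace(s)        // 1. numbers -> letters
	s = specialReplacer.Replace(s)     // 2. special characters -> letters
	s = strings.ReplaceAll(s, " ", "") // 3. remove spaces
	s = strings.ToLower(s)             // 4. lowercase
	for _, fp := range falsePositives { // 5. drop known false positives
		s = strings.ReplaceAll(s, fp, "")
	}
	return s
}

// isProfane reports whether the sanitized input contains any listed word.
func isProfane(s string) bool {
	s = sanitize(s)
	for _, p := range profanities {
		if strings.Contains(s, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isProfane("$h!t"))          // true
	fmt.Println(isProfane("4 $ $"))         // true
	fmt.Println(isProfane("the assassin"))  // false
	fmt.Println(isProfane("hello, world!")) // false
}
```

Each step is a plain string substitution followed by substring checks, so adding a word only grows a list rather than a regular expression.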

In the future, the following additional steps could also be considered:

  • All non-transformed special characters are removed, so that s~tring li~ke tha~~t are still caught
  • Any character repeated more than twice in a row is collapsed to two occurrences (e.g. poooop -> poop)
    • NOTE: This is obviously not a perfect approach, as words like fuuck still wouldn't be detected, but it's better than nothing.
    • The upside of this method is that we only need to add base bad words, and not every tense of said bad word (e.g. the fuck entry would support fucker, fucking, etc.)
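The two possible future steps could look roughly like the sketch below. Both helper names (stripNonAlphanumeric and collapseRepeats) are hypothetical and not part of the library; the code only illustrates the idea described above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripNonAlphanumeric drops every rune that is neither a letter nor a digit.
// Note that this also removes spaces.
func stripNonAlphanumeric(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

// collapseRepeats caps any run of the same rune at two occurrences.
func collapseRepeats(s string) string {
	var b strings.Builder
	var prev rune
	run := 0
	for _, r := range s {
		if run > 0 && r == prev {
			run++
		} else {
			prev, run = r, 1
		}
		if run <= 2 {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(stripNonAlphanumeric("s~tring li~ke tha~~t")) // stringlikethat
	fmt.Println(collapseRepeats("poooop"))                    // poop
	fmt.Println(collapseRepeats("fuuck"))                     // fuuck (unchanged, as the note above warns)
}
```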

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsProfane

func IsProfane(s string) bool

IsProfane checks whether there are any profanities in a given string (word or sentence). It uses the default ProfanityDetector.

Types

type ProfanityDetector

type ProfanityDetector struct {
	// contains filtered or unexported fields
}

ProfanityDetector

func NewProfanityDetector

func NewProfanityDetector() *ProfanityDetector

NewProfanityDetector creates a new ProfanityDetector

func (*ProfanityDetector) IsProfane

func (g *ProfanityDetector) IsProfane(s string) bool

IsProfane takes in a string (word or sentence) and looks for profanities. It returns a boolean.

func (*ProfanityDetector) WithSanitizeAccents

func (g *ProfanityDetector) WithSanitizeAccents(sanitize bool) *ProfanityDetector

WithSanitizeAccents configures whether the sanitization process should also take accents into account. This is enabled by default, but since it adds a bit of overhead, you may disable it if your use case is time-sensitive or if the input can never contain accented characters.
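To make the extra work concrete, here is one simple way accent folding can be done with a lookup table. This is only a sketch under stated assumptions: removeAccents and accentFold are hypothetical names, the table is tiny, and the library may well use a different technique (for example Unicode normalization).

```go
package main

import (
	"fmt"
	"strings"
)

// accentFold is a hypothetical, deliberately small table mapping accented
// runes to their plain ASCII equivalents.
var accentFold = map[rune]rune{
	'à': 'a', 'â': 'a', 'ä': 'a',
	'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
	'î': 'i', 'ï': 'i',
	'ô': 'o', 'ö': 'o',
	'ù': 'u', 'û': 'u', 'ü': 'u',
	'ç': 'c',
}

// removeAccents replaces every rune found in accentFold and leaves all
// other runes untouched. Every rune costs one map lookup, which is the
// kind of per-character overhead the option above lets you skip.
func removeAccents(s string) string {
	return strings.Map(func(r rune) rune {
		if plain, ok := accentFold[r]; ok {
			return plain
		}
		return r
	}, s)
}

func main() {
	fmt.Println(removeAccents("crème brûlée")) // creme brulee
}
```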

func (*ProfanityDetector) WithSanitizeLeetSpeak

func (g *ProfanityDetector) WithSanitizeLeetSpeak(sanitize bool) *ProfanityDetector

WithSanitizeLeetSpeak configures whether the sanitization process should also take leetspeak into account.

func (*ProfanityDetector) WithSanitizeSpecialCharacters

func (g *ProfanityDetector) WithSanitizeSpecialCharacters(sanitize bool) *ProfanityDetector

WithSanitizeSpecialCharacters configures whether the sanitization process should also take special characters into account.
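Because each With... method returns a *ProfanityDetector, calls can be chained, as in the README usage. The sketch below shows how such chainable setters are commonly written in Go. The detector type, its field names, and the default values for leetspeak and special characters are assumptions made for illustration; the real struct's fields are unexported and not documented here.

```go
package main

import "fmt"

// detector is a hypothetical stand-in for ProfanityDetector.
type detector struct {
	sanitizeLeetSpeak         bool
	sanitizeSpecialCharacters bool
	sanitizeAccents           bool
}

// newDetector enables every sanitization step (assumed defaults).
func newDetector() *detector {
	return &detector{
		sanitizeLeetSpeak:         true,
		sanitizeSpecialCharacters: true,
		sanitizeAccents:           true,
	}
}

// Each setter mutates the receiver and returns it, which allows chaining.
func (d *detector) withSanitizeLeetSpeak(v bool) *detector {
	d.sanitizeLeetSpeak = v
	return d
}

func (d *detector) withSanitizeSpecialCharacters(v bool) *detector {
	d.sanitizeSpecialCharacters = v
	return d
}

func (d *detector) withSanitizeAccents(v bool) *detector {
	d.sanitizeAccents = v
	return d
}

func main() {
	d := newDetector().withSanitizeLeetSpeak(false).withSanitizeAccents(false)
	fmt.Printf("%+v\n", *d)
}
```

Setters written this way modify the detector in place rather than returning a copy, so a detector shared between goroutines should be fully configured before it is used.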
