goclean

Version: v0.0.0-...-75aac12 (latest). Published: Feb 23, 2023. License: MIT. Imports: 10. Imported by: 0.

Repository

github.com/martinhrvn/go-clean

README

go-clean

go-clean is a flexible, stand-alone, lightweight library for detecting and censoring profanities in Go.

Installation

go get -u github.com/martinhrvn/go-clean

Usage

Default sanitizer

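package main

import (
    "fmt"

    goclean "github.com/martinhrvn/go-clean"
)

func main() {
    // true: the input contains at least one profanity
    fmt.Println(goclean.IsProfane("fuck this shit"))

    // []goclean.DetectedConcern with one entry per match, each holding
    // Word, MatchedText, StartIndex, EndIndex and Level
    fmt.Printf("%+v\n", goclean.List("fuck this shit"))

    // each detected profanity masked with the replacement character,
    // e.g. "**** this ****" when both words are in the default list
    fmt.Println(goclean.Redact("fuck this shit"))
}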

Calling goclean.IsProfane(s), goclean.List(s) or goclean.Redact(s) uses the default profanity sanitizer, which is configured by the config.json file.

To change this behaviour, for example to turn off leet-speak or obfuscation detection, use a different replacement character or supply your own word lists, create a ProfanitySanitizer with NewProfanitySanitizer instead:

package main

import (
    "fmt"

    goclean "github.com/martinhrvn/go-clean"
)

func main() {
    profanitySanitizer := goclean.NewProfanitySanitizer(&goclean.Config{
        // will not sanitize leet speak (a$$, b1tch, etc.)
        DetectLeetSpeak: false,
        // will not detect obfuscated words (f_u_c_k, etc.)
        DetectObfuscated: false,
        // replacement character for redacted words (a string, not a rune)
        ReplacementCharacter: "*",
        // maximum number of characters between letters of an obfuscated word
        // (e.g. if set to 1, f_u_c_k will be detected but f___u___c___k won't)
        ObfuscationLength: 1,

        Profanities: []goclean.WordMatcher{
            {Word: "fuck", Regex: "f[u]+ck"},
        },
    })

    fmt.Println(profanitySanitizer.Redact("fuuuck this")) // uses only the matchers configured above
}
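With DetectLeetSpeak enabled, look-alike characters are read as letters, so a$$ and b1tch are checked as ass and bitch. The sketch below shows one way such a normalization pass could run before matching. The character table and the normalizeLeet name are assumptions for illustration, not goclean's actual mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// leetReplacer maps common look-alike characters back to letters.
// Hypothetical table: goclean's own mapping is not documented here.
var leetReplacer = strings.NewReplacer(
	"$", "s",
	"5", "s",
	"@", "a",
	"4", "a",
	"1", "i",
	"!", "i",
	"3", "e",
	"0", "o",
	"7", "t",
)

// normalizeLeet lowercases the input and replaces look-alike characters,
// so that plain word patterns can be matched against the result.
func normalizeLeet(s string) string {
	return leetReplacer.Replace(strings.ToLower(s))
}

func main() {
	fmt.Println(normalizeLeet("a$$"))   // ass
	fmt.Println(normalizeLeet("B1tch")) // bitch
}
```

Every replacement in this table swaps one ASCII byte for another, so offsets in ASCII input stay aligned with the original text. The trade-off is recall versus false positives: with this table, 1337 normalizes to ieet.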

Configuration

Base configuration

DetectLeetSpeak: sanitize leet speak (a$$, b1tch, etc.)
- default: true
DetectObfuscated: detect obfuscated words (f_u_c_k, etc.)
- default: true
ObfuscationLength: maximum number of characters allowed between the letters of an obfuscated word (e.g. if set to 1, f_u_c_k will be detected but f___u___c___k won't)
- default: 3
ReplacementCharacter: replacement character for redacted words
- default: *
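The defaults above have to be applied somewhere when a configuration is read from JSON such as config.json. Go's encoding/json ignores struct tag options it does not recognize, including the default=3 option on ObfuscationLength, so one approach is to seed the defaults before unmarshalling. This is a minimal sketch under that assumption. localConfig and loadConfig are hypothetical names that mirror the scalar fields and JSON keys of Config. They are not part of goclean.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// localConfig is a hypothetical mirror of goclean.Config's scalar fields,
// using the same JSON keys.
type localConfig struct {
	DetectLeetSpeak      bool   `json:"detectLeetSpeak"`
	DetectObfuscated     bool   `json:"detectObfuscated"`
	ReplacementCharacter string `json:"replacementCharacter"`
	ObfuscationLength    int32  `json:"obfuscationLength"`
}

// loadConfig starts from the documented defaults; json.Unmarshal then
// overwrites only the keys actually present in data.
func loadConfig(data []byte) (localConfig, error) {
	cfg := localConfig{
		DetectLeetSpeak:      true,
		DetectObfuscated:     true,
		ReplacementCharacter: "*",
		ObfuscationLength:    3,
	}
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := loadConfig([]byte(`{"detectLeetSpeak": false, "obfuscationLength": 1}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
	// {DetectLeetSpeak:false DetectObfuscated:true ReplacementCharacter:* ObfuscationLength:1}
}
```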

WordMatchers

WordMatcher entries are used to configure both Profanities and FalseNegatives.

Regex:
- if set, it is used for matching instead of Word
Word:
- the word to detect
- if DetectObfuscated is true, it also matches the word with up to ObfuscationLength characters between its letters
Level:
- optional profanity level, returned in DetectedConcern.Level by the List method

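The rules above (Regex takes precedence over Word, and obfuscation allows up to ObfuscationLength characters between letters) can be sketched as a small regex builder. This is a hypothetical helper, not goclean's code. It assumes that any non-letter counts as an obfuscation character; the real library may use a different character set.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// wordMatcher is a hypothetical stand-in for goclean.WordMatcher.
type wordMatcher struct {
	Word  string
	Regex string
}

// compileMatcher prefers an explicit Regex. Otherwise it builds a pattern from
// Word; when detectObfuscated is set, up to obfuscationLength non-letter
// characters may sit between consecutive letters (f_u_c_k).
func compileMatcher(m wordMatcher, detectObfuscated bool, obfuscationLength int) (*regexp.Regexp, error) {
	if m.Regex != "" {
		return regexp.Compile(m.Regex)
	}
	if m.Word == "" {
		return nil, fmt.Errorf("word matcher needs a Word or a Regex")
	}
	var parts []string
	for _, r := range m.Word {
		parts = append(parts, regexp.QuoteMeta(string(r)))
	}
	sep := ""
	if detectObfuscated && obfuscationLength > 0 {
		sep = "[^a-zA-Z]{0," + strconv.Itoa(obfuscationLength) + "}"
	}
	return regexp.Compile(strings.Join(parts, sep))
}

func main() {
	re, err := compileMatcher(wordMatcher{Word: "fuck"}, true, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("f_u_c_k"))       // true
	fmt.Println(re.MatchString("f___u___c___k")) // false
}
```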
False positives

False positives are words that contain a profanity but are not profane themselves. For example, bass contains ass but is not profane. They are listed as plain strings in Config.FalsePositives.

False negatives

False negatives are words that should always be treated as profane regardless of the false-positive list, including profane words that would otherwise be discarded as false positives. They are matched before false positives are removed.

For example, dumbass contains the false positive bass, so it has to be added to the false negatives to be detected.

Methods

List

Returns a list of DetectedConcern values, one for each profanity found in the given string. Each contains:

Word: the base word of the matcher that fired (e.g. fuck when the input is fuuuck); empty if the matcher defines only a Regex
MatchedText: the text actually found in the string (e.g. fuuuck)
StartIndex: start index of the match in the string
EndIndex: end index of the match in the string (exclusive in the example below, where fuuuck spans 0 to 6)
Level: profanity level (if provided, else 0)

If the configuration is:

goclean.WordMatcher{
    Word:  "fuck",
    Regex: "f[u]+ck",
    Level: 1,
}

and the input string is fuuuck, List returns a single entry:

goclean.DetectedConcern{
    Word:        "fuck",
    MatchedText: "fuuuck",
    StartIndex:  0,
    EndIndex:    6,
    Level:       1,
}

Redact

Returns the string with every character of each detected profanity replaced by ReplacementCharacter.

The input string "shit hit the fan" will be returned as "**** hit the fan".

IsProfane

Returns true if the given string contains profanities.

The input string "shit hit the fan" returns true.

Documentation

Index

func IsProfane(str string) bool
func Redact(str string) string
type Config
- func DefaultConfig() *Config
type DetectedConcern
- func List(str string) []DetectedConcern
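type ProfanitySanitizer
- func NewProfanitySanitizer(c *Config) ProfanitySanitizer
- func (gc *ProfanitySanitizer) IsProfane(str string) bool
- func (gc *ProfanitySanitizer) List(message string) []DetectedConcern
- func (gc *ProfanitySanitizer) Redact(str string) string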
type WordMatcher

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsProfane

func IsProfane(str string) bool

IsProfane checks whether there are any profanities in a given string (word or sentence).

Uses the default ProfanitySanitizer.

func Redact

func Redact(str string) string

Redact takes in a string (word or sentence) and tries to censor all profanities found.

Uses the default ProfanitySanitizer

Types

type Config

type Config struct {
	DetectLeetSpeak      bool   `json:"detectLeetSpeak"`
	DetectObfuscated     bool   `json:"detectObfuscated"`
	ReplacementCharacter string `json:"replacementCharacter"`
	ObfuscationLength    int32  `json:"obfuscationLength,default=3"`

	Profanities    []WordMatcher `json:"profanities"`
	FalsePositives []string      `json:"falsePositives"`
	FalseNegatives []WordMatcher `json:"falseNegatives"`
}

Config is a struct that contains the configuration for the profanity sanitizer.

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns the default configuration for the profanity sanitizer.

type DetectedConcern

type DetectedConcern struct {
	Word        string
	MatchedText string
	StartIndex  int32
	EndIndex    int32
	Level       int32
}

DetectedConcern contains details about detected profanity (matched text, base word, start, end index and optional level).

func List

func List(str string) []DetectedConcern

List takes in a string (word or sentence) and returns a list of DetectedConcern values.

Uses the default ProfanitySanitizer.

type ProfanitySanitizer

type ProfanitySanitizer struct {
	// contains filtered or unexported fields
}

ProfanitySanitizer contains the dictionaries as well as the configuration that determines how profanity detection is handled.

func NewProfanitySanitizer

func NewProfanitySanitizer(c *Config) ProfanitySanitizer

NewProfanitySanitizer creates a new ProfanitySanitizer with the provided Config.

func (*ProfanitySanitizer) IsProfane

func (gc *ProfanitySanitizer) IsProfane(str string) bool

IsProfane checks whether there are any profanities in a given string (word or sentence).

func (*ProfanitySanitizer) List

func (gc *ProfanitySanitizer) List(message string) []DetectedConcern

List takes in a string (word or sentence) and returns a list of DetectedConcern values.

func (*ProfanitySanitizer) Redact

func (gc *ProfanitySanitizer) Redact(str string) string

Redact takes in a string (word or sentence) and tries to censor all profanities found.

type WordMatcher

type WordMatcher struct {
	Word    string `json:"word,omitempty"`
	Regex   string `json:"regex,omitempty"`
	Level   int32  `json:"level,omitempty,default=1"`
	Matcher *regexp.Regexp
}

WordMatcher is a struct that contains the word or regex to be matched and the level of the word.
