przetak

command module
v0.0.0-...-c4c1657
Published: Jul 28, 2023 | License: Apache-2.0 | Imports: 4 | Imported by: 0

README

Przetak: fewer weeds on the Web

Przetak is a library for checking whether a text contains abusive or vulgar speech in Polish. While it is written in Go, it can be used by programs written in many other languages thanks to FFI (Foreign Function Interface).

Przetak is resilient to:

  • replicating letters,
  • spacing out the words,
  • inserting non-letters between letters,
  • homograph spoofing, i.e. replacing letters with similar characters.
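
To make the obfuscations listed above concrete, here is a minimal sketch of a text normalizer that undoes them before matching. It is an illustration only and is not taken from Przetak's source: the normalize function and the small homoglyphs table are hypothetical, and a production lookalike table would be far larger.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// homoglyphs is a tiny, hypothetical table of lookalike characters.
// It is an assumption made for this sketch, not Przetak's own data.
var homoglyphs = map[rune]rune{
	'0': 'o',
	'1': 'i',
	'3': 'e',
	'@': 'a',
	'$': 's',
	'а': 'a', // Cyrillic small a
	'о': 'o', // Cyrillic small o
}

// normalize is an illustrative helper, not part of Przetak's API.
// It lowercases the input, maps lookalike characters to letters,
// drops everything that is not a letter, and collapses runs of the
// same letter into one. Collapsing also shortens legitimate double
// letters, so matching must be done against equally normalized words.
func normalize(s string) string {
	var b strings.Builder
	var prev rune
	for _, r := range strings.ToLower(s) {
		if m, ok := homoglyphs[r]; ok {
			r = m
		}
		if !unicode.IsLetter(r) {
			continue // spaces, digits, and punctuation are skipped
		}
		if r == prev {
			continue // replicated letter
		}
		b.WriteRune(r)
		prev = r
	}
	return b.String()
}

func main() {
	for _, s := range []string{"heeellooo", "h e l l o", "h.e-l_l.o", "H3LL0"} {
		fmt.Printf("%q -> %q\n", s, normalize(s))
	}
}
```

All four inputs reduce to the same string, "helo", so a single dictionary entry normalized the same way would match each of them.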

Also, thanks to its use of character 5-grams, it handles some frequent misspellings and out-of-vocabulary words composed of morphemes with an abusive or vulgar meaning.
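
As an illustration of what character 5-grams are, the sketch below extracts them from a word. The charNGrams helper is hypothetical and is not part of Przetak's API; how Przetak scores the 5-grams is not described here. The point is that a misspelled or compound word still shares some of its 5-grams with known words.

```go
package main

import "fmt"

// charNGrams is an illustrative helper, not part of Przetak's API.
// It returns all substrings of s that are n characters long.
// It works on runes rather than bytes, so Polish letters such as
// "ż" or "ł" count as single characters.
func charNGrams(s string, n int) []string {
	runes := []rune(s)
	if n <= 0 || len(runes) < n {
		return nil
	}
	grams := make([]string, 0, len(runes)-n+1)
	for i := 0; i+n <= len(runes); i++ {
		grams = append(grams, string(runes[i:i+n]))
	}
	return grams
}

func main() {
	fmt.Println(charNGrams("przetak", 5)) // [przet rzeta zetak]
}
```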

Przetak took second place in the cyberbullying detection contest at PolEval 2019. A paper about Przetak and the slides from my presentation at AI & NLP Workshop Day 2019 describe it in more detail.

Installation

First, get the package:

$ go get github.com/MarcinCiura/przetak

Change to the ${GOPATH}/src/github.com/MarcinCiura/przetak directory and run make to build the shared library. Depending on your operating system, the shared library is named:

  • libprzetak.so on Linux,
  • libprzetak.dylib on macOS,
  • przetak.dll on Windows.
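
A program that loads the shared library at run time needs to pick the right file name. The sketch below maps an operating system name, as reported by Go's runtime.GOOS, to the file names listed above. The libraryName helper is hypothetical and is not part of Przetak.

```go
package main

import (
	"fmt"
	"runtime"
)

// libraryName is an illustrative helper, not part of Przetak.
// It returns the shared library file name for the given GOOS value.
func libraryName(goos string) (string, error) {
	switch goos {
	case "linux":
		return "libprzetak.so", nil
	case "darwin": // GOOS value for macOS
		return "libprzetak.dylib", nil
	case "windows":
		return "przetak.dll", nil
	}
	return "", fmt.Errorf("no known library name for %q", goos)
}

func main() {
	name, err := libraryName(runtime.GOOS)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(name)
}
```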

Usage

Przetak's evaluate() function returns an integer bit mask. The bits with values 1, 2, and 4 are set, respectively, if the input UTF-8 string contains:

  • abusive words,
  • vulgar words with negative connotations,
  • vulgar words with positive connotations.
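
Because the result is a bit mask, one value can report several categories at once. The sketch below decodes such a value. It does not call Przetak itself, and the constant names and the describe helper are hypothetical; only the bit values 1, 2, and 4 come from the description above.

```go
package main

import "fmt"

// Bit values as described above. The constant names are assumptions
// made for this sketch, not identifiers from Przetak.
const (
	Abusive        = 1
	VulgarNegative = 2
	VulgarPositive = 4
)

// describe is an illustrative helper that turns a result of
// evaluate() into human-readable labels.
func describe(result int) []string {
	var labels []string
	if result&Abusive != 0 {
		labels = append(labels, "abusive")
	}
	if result&VulgarNegative != 0 {
		labels = append(labels, "vulgar (negative)")
	}
	if result&VulgarPositive != 0 {
		labels = append(labels, "vulgar (positive)")
	}
	return labels
}

func main() {
	for _, r := range []int{0, 1, 6, 7} {
		fmt.Printf("%d: %v\n", r, describe(r))
	}
}
```

A result of 0 means that none of the three categories was detected, while 7 means that all three were.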

The examples directory shows how to use Przetak directly from Go and, via FFI, from several other programming languages.

Author

Marcin Ciura

License

Przetak is licensed under Apache License, Version 2.0.

Documentation

Overview

Package main implements a dynamically linked library for checking whether UTF-8 strings contain abusive or vulgar speech in Polish.

Directories

Path Synopsis
examples/go    Example Go program using package przetak.
Package przetak implements a library for checking whether UTF-8 strings contain abusive or vulgar speech in Polish.
