package feature

v0.2.0 | Published: Mar 12, 2016 | License: BSD-3-Clause | Imports: 11 | Imported by: 0


Repository

github.com/shulhan/wvcgen


Documentation

Overview

Package feature contains the implementations of the features used to detect vandalism in a Wikipedia dataset.

Index

Constants
Variables
func ComputeImpact(oldrevid, newrevid string, wordlist []string) float64
func GetAllWordList() (allWords []string)
func KullbackLeiblerDivergence(a, b string) (divergence float64)
func Register(ftr Interface, tipe int, name string)
func Round(v float64) float64
type Anonim
- func (anon *Anonim) Compute(dataset tabula.DatasetInterface)
type CharDistributionInsert
- func (ftr *CharDistributionInsert) Compute(dataset tabula.DatasetInterface)
type CharDiversity
- func (ftr *CharDiversity) Compute(dataset tabula.DatasetInterface)
type Class
- func (ftr *Class) Compute(dataset tabula.DatasetInterface)
type CommentLength
- func (ftr *CommentLength) Compute(dataset tabula.DatasetInterface)
type CompressRate
- func (ftr *CompressRate) Compute(dataset tabula.DatasetInterface)
type DigitRatio
- func (ftr *DigitRatio) Compute(dataset tabula.DatasetInterface)
type Feature
type GoodToken
- func (ftr *GoodToken) Compute(dataset tabula.DatasetInterface)
type Interface
- func GetByName(name string) Interface
type LongestCharSeq
- func (ftr *LongestCharSeq) Compute(dataset tabula.DatasetInterface)
type LongestWord
- func (ftr *LongestWord) Compute(dataset tabula.DatasetInterface)
type NonAlnumRatio
- func (ftr *NonAlnumRatio) Compute(dataset tabula.DatasetInterface)
type SizeIncrement
- func (ftr *SizeIncrement) Compute(dataset tabula.DatasetInterface)
type SizeRatio
- func (ftr *SizeRatio) Compute(dataset tabula.DatasetInterface)
type Template
- func (ftr *Template) Compute(dataset tabula.DatasetInterface)
type TermFrequency
- func (ftr *TermFrequency) Compute(dataset tabula.DatasetInterface)
type UpperLowerRatio
- func (ftr *UpperLowerRatio) Compute(dataset tabula.DatasetInterface)
type UpperToAllRatio
- func (ftr *UpperToAllRatio) Compute(dataset tabula.DatasetInterface)
type WordsAllFrequency
- func (ftr *WordsAllFrequency) Compute(dataset tabula.DatasetInterface)
type WordsAllImpact
- func (ftr *WordsAllImpact) Compute(dataset tabula.DatasetInterface)
type WordsBadFrequency
- func (ftr *WordsBadFrequency) Compute(dataset tabula.DatasetInterface)
type WordsBadImpact
- func (ftr *WordsBadImpact) Compute(dataset tabula.DatasetInterface)
type WordsBiasFrequency
- func (ftr *WordsBiasFrequency) Compute(dataset tabula.DatasetInterface)
type WordsBiasImpact
- func (ftr *WordsBiasImpact) Compute(dataset tabula.DatasetInterface)
type WordsPronounFrequency
- func (ftr *WordsPronounFrequency) Compute(dataset tabula.DatasetInterface)
type WordsPronounImpact
- func (ftr *WordsPronounImpact) Compute(dataset tabula.DatasetInterface)
type WordsSexFrequency
- func (ftr *WordsSexFrequency) Compute(dataset tabula.DatasetInterface)
type WordsSexImpact
- func (ftr *WordsSexImpact) Compute(dataset tabula.DatasetInterface)
type WordsVulgarFrequency
- func (ftr *WordsVulgarFrequency) Compute(dataset tabula.DatasetInterface)
type WordsVulgarImpact
- func (ftr *WordsVulgarImpact) Compute(dataset tabula.DatasetInterface)

Constants

const (
	// RoundDigit defines the precision used when rounding float values.
	RoundDigit = float64(100000)
)

Variables

var (
	// DEBUG is the debug level, set through the environment variable FEATURE_DEBUG.
	DEBUG = 0
)

var ListFeature []Interface

ListFeature is a global variable that contains all implemented features.

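Functions

func ComputeImpact

func ComputeImpact(oldrevid, newrevid string, wordlist []string) float64

ComputeImpact returns a ratio that measures how the occurrence of the words in wordlist increased in the new revision compared to the old revision, computed as

	count_of_words_in_old / (count_of_words_in_old + count_of_words_in_new)

If none of the words are found in either revision, it returns 0.5. Because of this formula, a value below 0.5 means the words occur more often in the new revision than in the old one.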
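The formula above can be sketched in Go as follows. The real ComputeImpact takes revision IDs and loads the revision text itself; this sketch instead takes the old and new text directly. The helpers countWords and impact are hypothetical, and splitting on whitespace with case-insensitive matching is an assumption, not documented package behavior.

```go
package main

import "strings"

// countWords counts how many whitespace-separated tokens in text appear in
// wordlist. Case-insensitive matching is an assumption of this sketch.
func countWords(text string, wordlist []string) int {
	set := make(map[string]struct{}, len(wordlist))
	for _, w := range wordlist {
		set[strings.ToLower(w)] = struct{}{}
	}
	n := 0
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		if _, ok := set[tok]; ok {
			n++
		}
	}
	return n
}

// impact applies old / (old + new) to the word counts of the old and new
// revision text, returning 0.5 when neither revision contains a listed word.
func impact(oldText, newText string, wordlist []string) float64 {
	oldCount := countWords(oldText, wordlist)
	newCount := countWords(newText, wordlist)
	if oldCount+newCount == 0 {
		return 0.5
	}
	return float64(oldCount) / float64(oldCount+newCount)
}
```

For example, if listed words appear only in the new revision, the result is 0; if they appear twice in the old revision and once in the new one, the result is 2/3.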

func GetAllWordList

func GetAllWordList() (allWords []string)

GetAllWordList returns the words from every word category used by the language-based features.

func KullbackLeiblerDivergence

func KullbackLeiblerDivergence(a, b string) (divergence float64)

KullbackLeiblerDivergence computes and returns the divergence between two strings based on their character probability distributions.
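A minimal sketch of a character-level Kullback-Leibler divergence, D(P||Q) = sum over c of P(c) * ln(P(c)/Q(c)), where P and Q are the character distributions of a and b. How the package handles characters that occur in only one of the two strings, and which logarithm base it uses, is not documented. This sketch assumes the natural logarithm and additive smoothing with a hypothetical constant epsilon over the union of both alphabets, so the result is always finite.

```go
package main

import "math"

// epsilon is a hypothetical smoothing constant; it keeps every probability
// above zero so that ln(P/Q) is always defined.
const epsilon = 1e-6

// charCounts returns the number of occurrences of each rune in s and the
// total number of runes.
func charCounts(s string) (map[rune]float64, float64) {
	counts := make(map[rune]float64)
	total := 0.0
	for _, r := range s {
		counts[r]++
		total++
	}
	return counts, total
}

// klDivergence computes D(P||Q) for the smoothed character distributions of
// a and b, both defined over the union of their alphabets.
func klDivergence(a, b string) float64 {
	ca, ta := charCounts(a)
	cb, tb := charCounts(b)

	// Build the union alphabet so P and Q share the same support.
	alphabet := make(map[rune]struct{})
	for r := range ca {
		alphabet[r] = struct{}{}
	}
	for r := range cb {
		alphabet[r] = struct{}{}
	}
	v := float64(len(alphabet))

	divergence := 0.0
	for r := range alphabet {
		p := (ca[r] + epsilon) / (ta + epsilon*v)
		q := (cb[r] + epsilon) / (tb + epsilon*v)
		divergence += p * math.Log(p/q)
	}
	return divergence
}
```

Identical strings give exactly 0, and any difference in character frequencies gives a positive value, since both smoothed distributions sum to 1 over the same alphabet.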

func Register

func Register(ftr Interface, tipe int, name string)

Register adds a feature, with the given type and name, to the list of global features.

func Round

func Round(v float64) float64

Round returns v rounded to the number of decimal places defined by RoundDigit.
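A minimal sketch of rounding with RoundDigit, assuming Round scales v by RoundDigit, rounds to the nearest integer, and scales back. The package's actual implementation is not shown here (it predates math.Round, which arrived in Go 1.10), so treat this as an assumption.

```go
package main

import "math"

// RoundDigit mirrors the package constant: 100000 = 10^5 keeps five digits
// after the decimal point.
const RoundDigit = float64(100000)

// Round scales v by RoundDigit, rounds half away from zero, and scales back.
// This is an assumed implementation, not the package's verified code.
func Round(v float64) float64 {
	return math.Round(v*RoundDigit) / RoundDigit
}
```

With this sketch, Round(1.234567) yields 1.23457 and Round(0.123456789) yields 0.12346.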

Types

type Anonim

type Anonim Feature

Anonim determines whether the editor was logged in or anonymous (identified only by an IP address).

func (*Anonim) Compute

func (anon *Anonim) Compute(dataset tabula.DatasetInterface)

Compute sets the value to 1 if the editor recorded in the column is an IP address, which marks an anonymous edit, and to 0 otherwise.
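The per-record check can be sketched with the standard library's net.ParseIP, which accepts both IPv4 and IPv6 addresses and returns nil for anything else. The helper anonValue is hypothetical, and the package may detect IP addresses differently.

```go
package main

import "net"

// anonValue returns 1 when the editor field holds an IPv4 or IPv6 address
// (an anonymous edit) and 0 when it holds a user name.
func anonValue(editor string) float64 {
	if net.ParseIP(editor) != nil {
		return 1
	}
	return 0
}
```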

type CharDistributionInsert

type CharDistributionInsert Feature

CharDistributionInsert measures the divergence of the character distribution of the inserted text from the expected character distribution.

func (*CharDistributionInsert) Compute

func (ftr *CharDistributionInsert) Compute(dataset tabula.DatasetInterface)

Compute calculates the character distribution of the inserted text.

type CharDiversity

type CharDiversity Feature

CharDiversity measures the number of different characters relative to the length of the inserted text, given by the expression

	length^(1/differentchars)

func (*CharDiversity) Compute

func (ftr *CharDiversity) Compute(dataset tabula.DatasetInterface)

Compute calculates the character diversity of the inserted text.
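A minimal sketch of the length^(1/differentchars) expression. Counting characters as runes and returning 0 for empty text are assumptions of this sketch; the helper charDiversity is hypothetical.

```go
package main

import (
	"math"
	"unicode/utf8"
)

// charDiversity returns length^(1/differentchars) for the inserted text,
// counting characters as runes. Empty text yields 0 (an assumption).
func charDiversity(inserted string) float64 {
	length := utf8.RuneCountInString(inserted)
	if length == 0 {
		return 0
	}
	distinct := make(map[rune]struct{})
	for _, r := range inserted {
		distinct[r] = struct{}{}
	}
	return math.Pow(float64(length), 1/float64(len(distinct)))
}
```

For a fixed length, more distinct characters lower the value: "aaaa" gives 4, while "abab" gives 2. Repetitive insertions, common in vandalism, therefore score high.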

type Class

type Class Feature

Class converts the class label from text to a number: a "regular" edit becomes 0 and a "vandalism" edit becomes 1.

func (*Class) Compute

func (ftr *Class) Compute(dataset tabula.DatasetInterface)

Compute converts the class label from text to a number: a "regular" edit becomes 0 and a "vandalism" edit becomes 1.

type CommentLength

type CommentLength Feature

CommentLength computes the length of the edit comment.

func (*CommentLength) Compute

func (ftr *CommentLength) Compute(dataset tabula.DatasetInterface)

Compute counts the number of bytes in the comment, NOT including the header content "/* ... */".
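A minimal sketch of counting comment bytes without the "/* ... */" header, which MediaWiki adds to the summary of section edits. The helper commentLength is hypothetical, and trimming the whitespace left behind by the removed header is an assumption.

```go
package main

import (
	"regexp"
	"strings"
)

// sectionHeader matches a "/* ... */" header. The non-greedy .*? stops at
// the first "*/", so two headers are removed separately.
var sectionHeader = regexp.MustCompile(`/\*.*?\*/`)

// commentLength returns the number of bytes in comment after removing every
// "/* ... */" header and trimming surrounding whitespace (an assumption).
func commentLength(comment string) int {
	stripped := sectionHeader.ReplaceAllString(comment, "")
	return len(strings.TrimSpace(stripped))
}
```

Because len counts bytes, a non-ASCII comment such as "café" has length 5, not 4.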

type CompressRate

type CompressRate Feature

CompressRate computes the compression rate of the inserted text.

func (*CompressRate) Compute

func (ftr *CompressRate) Compute(dataset tabula.DatasetInterface)

Compute calculates the compression rate of the inserted text.

type DigitRatio

type DigitRatio Feature

DigitRatio compares the number of digits with the number of all characters.

func (*DigitRatio) Compute

func (ftr *DigitRatio) Compute(dataset tabula.DatasetInterface)

Compute calculates the digit ratio in the new revision.
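A minimal sketch of the digit ratio as a plain count ratio. Whether the package applies smoothing to this ratio is not stated; the helper digitRatio and the 0 result for empty text are assumptions.

```go
package main

import "unicode"

// digitRatio returns the number of digit characters divided by the number
// of all characters in text; empty text yields 0.
func digitRatio(text string) float64 {
	digits, all := 0, 0
	for _, r := range text {
		all++
		if unicode.IsDigit(r) {
			digits++
		}
	}
	if all == 0 {
		return 0
	}
	return float64(digits) / float64(all)
}
```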

type Feature

type Feature struct {
	tabula.Column
}

Feature defines the type that holds the feature name and values.

type GoodToken

type GoodToken Feature

GoodToken counts the number of good tokens in the inserted text.

func (*GoodToken) Compute

func (ftr *GoodToken) Compute(dataset tabula.DatasetInterface)

Compute counts the number of good tokens in the inserted text.

type Interface

type Interface interface {
	tabula.ColumnInterface
	Compute(dataset tabula.DatasetInterface)
}

Interface defines the methods that every feature must implement.

func GetByName

func GetByName(name string) Interface

GetByName returns the feature object with the given name.
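Register and GetByName together form a simple registry. The sketch below is self-contained, so it replaces the real Interface (which embeds tabula.ColumnInterface) with a hypothetical three-method interface; the method names SetType, SetName, and GetName and the column type are assumptions, not the tabula API. It also assumes GetByName returns nil when no feature matches.

```go
package main

// Interface is a hypothetical, simplified stand-in for feature.Interface.
type Interface interface {
	SetType(tipe int)
	SetName(name string)
	GetName() string
}

// column is a hypothetical minimal implementation holding type and name.
type column struct {
	tipe int
	name string
}

func (c *column) SetType(tipe int) { c.tipe = tipe }

func (c *column) SetName(name string) { c.name = name }

func (c *column) GetName() string { return c.name }

// ListFeature holds every registered feature, in registration order.
var ListFeature []Interface

// Register records the type and name on the feature and appends it to
// ListFeature.
func Register(ftr Interface, tipe int, name string) {
	ftr.SetType(tipe)
	ftr.SetName(name)
	ListFeature = append(ListFeature, ftr)
}

// GetByName returns the first registered feature with the given name, or
// nil when none matches.
func GetByName(name string) Interface {
	for _, ftr := range ListFeature {
		if ftr.GetName() == name {
			return ftr
		}
	}
	return nil
}
```

A linear scan is adequate here because the feature list is small and fixed at startup.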

type LongestCharSeq

type LongestCharSeq Feature

LongestCharSeq computes the longest sequence of the same character in the inserted text.

func (*LongestCharSeq) Compute

func (ftr *LongestCharSeq) Compute(dataset tabula.DatasetInterface)

Compute calculates the longest sequence of the same character in the inserted text.
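A minimal sketch of finding the longest run of one repeated character, assuming the feature value is the run length counted in runes. The helper longestCharSeq is hypothetical.

```go
package main

// longestCharSeq returns the length of the longest run of one repeated
// character in text, counting runes.
func longestCharSeq(text string) int {
	longest, current := 0, 0
	var prev rune
	for i, r := range text {
		// i is a byte offset; i > 0 means this is not the first rune.
		if i > 0 && r == prev {
			current++
		} else {
			current = 1
		}
		if current > longest {
			longest = current
		}
		prev = r
	}
	return longest
}
```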

type LongestWord

type LongestWord Feature

LongestWord finds and returns the longest word in the inserted text.

func (*LongestWord) Compute

func (ftr *LongestWord) Compute(dataset tabula.DatasetInterface)

Compute finds the longest word in the inserted text.
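A minimal sketch of finding the longest word, assuming words are separated by whitespace and measured in runes. The helper longestWord is hypothetical; it returns both the word and its length because the documentation does not say which one the feature stores.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// longestWord returns the longest whitespace-separated word in text and its
// length in runes. Ties keep the first word found.
func longestWord(text string) (string, int) {
	word, length := "", 0
	for _, w := range strings.Fields(text) {
		if n := utf8.RuneCountInString(w); n > length {
			word, length = w, n
		}
	}
	return word, length
}
```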

type NonAlnumRatio

type NonAlnumRatio Feature

NonAlnumRatio compares the number of non-alphanumeric characters with the number of all characters in the inserted text.

func (*NonAlnumRatio) Compute

func (ftr *NonAlnumRatio) Compute(dataset tabula.DatasetInterface)

Compute calculates the ratio of non-alphanumeric characters to all characters in the inserted text.
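A minimal sketch of the non-alphanumeric ratio. It treats any rune that is neither a Unicode letter nor a digit, including whitespace, as non-alphanumeric; whether the package counts whitespace this way is an assumption. The helper nonAlnumRatio is hypothetical.

```go
package main

import "unicode"

// nonAlnumRatio returns the share of characters in text that are neither
// letters nor digits; empty text yields 0.
func nonAlnumRatio(text string) float64 {
	nonAlnum, all := 0, 0
	for _, r := range text {
		all++
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			nonAlnum++
		}
	}
	if all == 0 {
		return 0
	}
	return float64(nonAlnum) / float64(all)
}
```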

type SizeIncrement

type SizeIncrement Feature

SizeIncrement compares the size of the new revision with the old revision by subtracting their lengths.

func (*SizeIncrement) Compute

func (ftr *SizeIncrement) Compute(dataset tabula.DatasetInterface)

Compute calculates the absolute size increment.

type SizeRatio

type SizeRatio Feature

SizeRatio compares the size of the new revision with the old revision as a ratio.

func (*SizeRatio) Compute

func (ftr *SizeRatio) Compute(dataset tabula.DatasetInterface)

Compute calculates the ratio of sizes between the new and the old revision.
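SizeIncrement and SizeRatio can be sketched side by side, measuring size in bytes. Whether the package keeps the sign of the increment or takes its magnitude is not stated; this sketch keeps the sign. Treating an empty old revision as size 1 in the ratio is also an assumption, and both helpers are hypothetical.

```go
package main

// sizeIncrement returns len(newText) - len(oldText) in bytes; a negative
// value means the edit removed content.
func sizeIncrement(oldText, newText string) int {
	return len(newText) - len(oldText)
}

// sizeRatio returns len(newText) / len(oldText). An empty old revision is
// treated as size 1 to avoid division by zero (an assumption).
func sizeRatio(oldText, newText string) float64 {
	oldSize := len(oldText)
	if oldSize == 0 {
		oldSize = 1
	}
	return float64(len(newText)) / float64(oldSize)
}
```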

type Template

type Template Feature

Template is a template for adding a new feature to this generator.

func (*Template) Compute

func (ftr *Template) Compute(dataset tabula.DatasetInterface)

Compute describes what this feature does.

type TermFrequency

type TermFrequency Feature

TermFrequency computes the frequency of the words in the inserted text relative to the new revision.

func (*TermFrequency) Compute

func (ftr *TermFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of the inserted words.

type UpperLowerRatio

type UpperLowerRatio Feature

UpperLowerRatio compares the number of uppercase characters with the number of lowercase characters.

func (*UpperLowerRatio) Compute

func (ftr *UpperLowerRatio) Compute(dataset tabula.DatasetInterface)

Compute calculates the ratio of uppercase to lowercase characters in the new revision.

type UpperToAllRatio

type UpperToAllRatio Feature

UpperToAllRatio compares the number of uppercase characters with the number of all characters.

func (*UpperToAllRatio) Compute

func (ftr *UpperToAllRatio) Compute(dataset tabula.DatasetInterface)

Compute calculates the ratio of uppercase characters to all characters in the new revision.
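UpperLowerRatio and UpperToAllRatio share the same counts, so one sketch covers both. The helpers are hypothetical; using 1 as the denominator when the text has no lowercase letters, and returning 0 for empty text, are assumptions.

```go
package main

import "unicode"

// caseCounts counts uppercase letters, lowercase letters, and all characters.
func caseCounts(text string) (upper, lower, all int) {
	for _, r := range text {
		all++
		switch {
		case unicode.IsUpper(r):
			upper++
		case unicode.IsLower(r):
			lower++
		}
	}
	return upper, lower, all
}

// upperLowerRatio returns uppercase / lowercase letters, using 1 as the
// denominator when there are no lowercase letters (an assumption).
func upperLowerRatio(text string) float64 {
	upper, lower, _ := caseCounts(text)
	if lower == 0 {
		lower = 1
	}
	return float64(upper) / float64(lower)
}

// upperToAllRatio returns uppercase letters / all characters; empty text
// yields 0.
func upperToAllRatio(text string) float64 {
	upper, _, all := caseCounts(text)
	if all == 0 {
		return 0
	}
	return float64(upper) / float64(all)
}
```

Text written entirely in capitals, such as "HELLO", gives a high upper-to-lower ratio (5) and an upper-to-all ratio of 1.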

type WordsAllFrequency

type WordsAllFrequency Feature

WordsAllFrequency computes the frequency of vulgar, pronoun, bias, sex, and bad words in the inserted text.

func (*WordsAllFrequency) Compute

func (ftr *WordsAllFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of all categorized words.
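A minimal sketch of a word-list frequency, defined here as the number of tokens in the inserted text that appear in a word list divided by the total number of tokens. For WordsAllFrequency the list would presumably come from GetAllWordList. The exact formula, the tokenization on non-alphanumeric runes, and case-insensitive matching are assumptions; tokenize and wordFrequency are hypothetical helpers.

```go
package main

import (
	"strings"
	"unicode"
)

// tokenize lower-cases text and splits it on any rune that is not a letter
// or digit, so "STUPID," becomes "stupid".
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// wordFrequency returns the number of tokens in inserted that appear in
// wordlist divided by the total number of tokens; no tokens yields 0.
func wordFrequency(inserted string, wordlist []string) float64 {
	set := make(map[string]bool, len(wordlist))
	for _, w := range wordlist {
		set[strings.ToLower(w)] = true
	}
	tokens := tokenize(inserted)
	if len(tokens) == 0 {
		return 0
	}
	hits := 0
	for _, t := range tokens {
		if set[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(tokens))
}
```

For example, "You are STUPID, really stupid!" has five tokens, two of which match "stupid", giving 0.4.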

type WordsAllImpact

type WordsAllImpact Feature

WordsAllImpact computes the impact of vulgar, pronoun, bias, sex, and bad words between the old and the new revision.

func (*WordsAllImpact) Compute

func (ftr *WordsAllImpact) Compute(dataset tabula.DatasetInterface)

Compute calculates the impact of vulgar, pronoun, bias, sex, and bad words in the inserted text.

type WordsBadFrequency

type WordsBadFrequency Feature

WordsBadFrequency computes the frequency of bad words: colloquial words or words that indicate poor writing skill.

func (*WordsBadFrequency) Compute

func (ftr *WordsBadFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of bad words.

type WordsBadImpact

type WordsBadImpact Feature

WordsBadImpact counts the frequency of bad words in the inserted text.

func (*WordsBadImpact) Compute

func (ftr *WordsBadImpact) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of bad words in the inserted text.

type WordsBiasFrequency

type WordsBiasFrequency Feature

WordsBiasFrequency counts the frequency of colloquial words with a high bias in the inserted text.

func (*WordsBiasFrequency) Compute

func (ftr *WordsBiasFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of biased words.

type WordsBiasImpact

type WordsBiasImpact Feature

WordsBiasImpact counts the frequency of biased words in the inserted text.

func (*WordsBiasImpact) Compute

func (ftr *WordsBiasImpact) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of biased words in the inserted text.

type WordsPronounFrequency

type WordsPronounFrequency Feature

WordsPronounFrequency counts the frequency of first- and second-person pronouns in the inserted text.

func (*WordsPronounFrequency) Compute

func (ftr *WordsPronounFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of pronouns in the inserted text.

type WordsPronounImpact

type WordsPronounImpact Feature

WordsPronounImpact counts the frequency of pronouns in the inserted text.

func (*WordsPronounImpact) Compute

func (ftr *WordsPronounImpact) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of pronouns in the inserted text.

type WordsSexFrequency

type WordsSexFrequency Feature

WordsSexFrequency counts the frequency of non-vulgar, sex-related words.

func (*WordsSexFrequency) Compute

func (ftr *WordsSexFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of sex-related words.

type WordsSexImpact

type WordsSexImpact Feature

WordsSexImpact counts the frequency of sex-related words in the inserted text.

func (*WordsSexImpact) Compute

func (ftr *WordsSexImpact) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of sex-related words in the inserted text.

type WordsVulgarFrequency

type WordsVulgarFrequency Feature

WordsVulgarFrequency counts the frequency of vulgar words in the inserted text.

func (*WordsVulgarFrequency) Compute

func (ftr *WordsVulgarFrequency) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of vulgar words in the inserted text.

type WordsVulgarImpact

type WordsVulgarImpact Feature

WordsVulgarImpact counts the frequency of vulgar words in the inserted text.

func (*WordsVulgarImpact) Compute

func (ftr *WordsVulgarImpact) Compute(dataset tabula.DatasetInterface)

Compute calculates the frequency of vulgar words in the inserted text.
