feature

package
v0.2.0
Published: Mar 12, 2016 License: BSD-3-Clause Imports: 11 Imported by: 0

Documentation

Overview

Package feature contains the feature implementations used to detect vandalism in a Wikipedia dataset.

Constants

const (
	// RoundDigit defines the precision used when rounding float values.
	RoundDigit = float64(100000)
)

Variables

var (
	// DEBUG level, set using the environment variable FEATURE_DEBUG.
	DEBUG = 0
)

var ListFeature []Interface

ListFeature is a global variable that contains all implemented features.

Functions

func ComputeImpact

func ComputeImpact(oldrevid, newrevid string, wordlist []string) float64

ComputeImpact returns the increase ratio of words in the new revision compared to the old revision, computed as

count_of_words_in_old / (count_of_words_in_old + count_of_words_in_new)

If no words are found in either the old or the new revision, it returns 0.5.
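The formula above can be sketched with plain counts. This is only an illustration: `impact` is a hypothetical helper that takes precomputed word counts, whereas ComputeImpact itself takes revision IDs and a word list, and how it counts the words is not shown in this documentation.

```go
package main

import "fmt"

// impact is a hypothetical helper that applies the documented formula to
// precomputed word counts from the old and new revisions.
func impact(oldCount, newCount int) float64 {
	total := oldCount + newCount
	if total == 0 {
		// No matching words in either revision: the documented fallback.
		return 0.5
	}
	return float64(oldCount) / float64(total)
}

func main() {
	fmt.Println(impact(0, 0)) // no words in either revision
	fmt.Println(impact(1, 3)) // one match in old, three in new
}
```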

func GetAllWordList

func GetAllWordList() (allWords []string)

GetAllWordList returns all categorical words used in the language-based features.

func KullbackLeiblerDivergence

func KullbackLeiblerDivergence(a, b string) (divergence float64)

KullbackLeiblerDivergence computes and returns the divergence of two strings based on their character probabilities.
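As a rough sketch of a character-level Kullback-Leibler divergence, the code below estimates each string's character probabilities from relative frequencies and sums p(x) * ln(p(x) / q(x)). The names `charProbs` and `klDivergence` are hypothetical, and two choices are assumptions rather than documented behavior: the natural logarithm, and skipping characters that do not occur in the second string.

```go
package main

import (
	"fmt"
	"math"
)

// charProbs returns the relative frequency of each rune in s.
func charProbs(s string) map[rune]float64 {
	probs := make(map[rune]float64)
	total := 0
	for _, r := range s {
		probs[r]++
		total++
	}
	for r := range probs {
		probs[r] /= float64(total)
	}
	return probs
}

// klDivergence computes sum over x of p(x) * ln(p(x)/q(x)), where p and q
// are the character distributions of a and b.
func klDivergence(a, b string) float64 {
	p, q := charProbs(a), charProbs(b)
	sum := 0.0
	for r, pr := range p {
		qr, ok := q[r]
		if !ok {
			// Assumption: characters missing from b are skipped, since
			// the term would otherwise be infinite.
			continue
		}
		sum += pr * math.Log(pr/qr)
	}
	return sum
}

func main() {
	fmt.Println(klDivergence("abab", "baba")) // identical distributions
	fmt.Println(klDivergence("aa", "ab"))     // a is more concentrated than b
}
```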

func Register

func Register(ftr Interface, tipe int, name string)

Register adds a feature to the global list of features.

func Round

func Round(v float64) float64

Round returns the float value rounded to the precision given by `RoundDigit` after the decimal point.
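One common way to round with a scale factor like `RoundDigit` is to multiply, round to the nearest integer, and divide again. The sketch below assumes that approach; `round` and `roundDigit` are hypothetical stand-ins, and the package's own implementation may differ.

```go
package main

import (
	"fmt"
	"math"
)

// roundDigit mirrors the value of the package constant RoundDigit.
const roundDigit = float64(100000)

// round scales v up, rounds to the nearest integer, and scales back down,
// which keeps five digits after the decimal point.
func round(v float64) float64 {
	return math.Round(v*roundDigit) / roundDigit
}

func main() {
	fmt.Println(round(1.0 / 3.0))
	fmt.Println(round(2.5))
}
```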

Types

type Anonim

type Anonim Feature

Anonim computes whether the editor is logged in or anonymous (recorded by IP address).

func (*Anonim) Compute

func (anon *Anonim) Compute(dataset tabula.DatasetInterface)

Compute sets the value to 1 if the record in the column is an IP address, which marks the editor as anonymous; otherwise it sets the value to 0.
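The rule above can be illustrated with the standard library's `net.ParseIP`. This is a sketch under the assumption that any string that parses as an IPv4 or IPv6 address counts as anonymous; `anonymValue` is a hypothetical name and the package may detect IP addresses differently.

```go
package main

import (
	"fmt"
	"net"
)

// anonymValue returns 1 when the editor field parses as an IP address
// (anonymous edit) and 0 otherwise (logged-in user name).
func anonymValue(editor string) float64 {
	if net.ParseIP(editor) != nil {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(anonymValue("192.168.0.1"))
	fmt.Println(anonymValue("Alice"))
}
```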

type CharDistributionInsert

type CharDistributionInsert Feature

CharDistributionInsert measures the divergence of the character distribution of the inserted text with respect to the expectation.

func (*CharDistributionInsert) Compute

func (ftr *CharDistributionInsert) Compute(dataset tabula.DatasetInterface)

Compute the character distribution of the inserted text.

type CharDiversity

type CharDiversity Feature

CharDiversity is a feature that measures the number of different characters compared to the length of the inserted text, given by the expression

length^(1/differentchars)

func (*CharDiversity) Compute

func (ftr *CharDiversity) Compute(dataset tabula.DatasetInterface)

Compute character diversity.
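The expression length^(1/differentchars) can be sketched as follows. The helper `charDiversity` is hypothetical; counting runes rather than bytes and returning 0 for empty text are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"math"
)

// charDiversity computes length^(1/distinct), where length is the number
// of characters in text and distinct is the number of different characters.
func charDiversity(text string) float64 {
	distinct := make(map[rune]struct{})
	length := 0
	for _, r := range text {
		distinct[r] = struct{}{}
		length++
	}
	if length == 0 {
		// Assumption: empty text has no diversity.
		return 0
	}
	return math.Pow(float64(length), 1/float64(len(distinct)))
}

func main() {
	fmt.Println(charDiversity("aaaa")) // one distinct character
	fmt.Println(charDiversity("abcd")) // four distinct characters
}
```

With this formula, text that repeats a single character scores as high as its length, while text with many different characters scores close to 1.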

type Class

type Class Feature

Class changes the classification from text to numeric: a "regular" edit becomes 0 and a "vandalism" edit becomes 1.

func (*Class) Compute

func (ftr *Class) Compute(dataset tabula.DatasetInterface)

Compute changes the classification from text to numeric: a "regular" edit becomes 0 and a "vandalism" edit becomes 1.

type CommentLength

type CommentLength Feature

CommentLength is a feature that computes the length of the edit comment.

func (*CommentLength) Compute

func (ftr *CommentLength) Compute(dataset tabula.DatasetInterface)

Compute counts the number of bytes used in the comment, NOT including the header content "/* ... */".
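A minimal sketch of that count, assuming the header is the first "/* ... */" span in the comment and that surrounding whitespace is not counted. Both points are assumptions, and `commentLength` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// commentLength returns the number of bytes in comment after removing the
// first "/* ... */" header, if one is present.
func commentLength(comment string) int {
	start := strings.Index(comment, "/*")
	if start >= 0 {
		// Look for the closing marker after the opening one.
		end := strings.Index(comment[start+2:], "*/")
		if end >= 0 {
			comment = comment[:start] + comment[start+2+end+2:]
		}
	}
	// Assumption: leading and trailing whitespace is not counted.
	return len(strings.TrimSpace(comment))
}

func main() {
	fmt.Println(commentLength("/* History */ fixed typo"))
	fmt.Println(commentLength("no header"))
}
```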

type CompressRate

type CompressRate Feature

CompressRate is a feature that computes the compression rate of the inserted text.

func (*CompressRate) Compute

func (ftr *CompressRate) Compute(dataset tabula.DatasetInterface)

Compute the compression rate of the inserted text.
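The documentation does not say which compressor or which ratio is used. As an illustration only, the sketch below assumes zlib from the standard library and defines the rate as compressed size divided by original size; `compressRate` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compressRate returns compressed size / original size for text, using
// zlib with its default settings. Empty text yields 0.
func compressRate(text string) (float64, error) {
	if len(text) == 0 {
		return 0, nil
	}
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write([]byte(text)); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed data into buf.
	if err := w.Close(); err != nil {
		return 0, err
	}
	return float64(buf.Len()) / float64(len(text)), nil
}

func main() {
	rate, err := compressRate("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
	if err != nil {
		fmt.Println("compress failed:", err)
		return
	}
	fmt.Println(rate)
}
```

Under this definition, highly repetitive insertions produce a low rate, while short or varied text produces a higher one.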

type DigitRatio

type DigitRatio Feature

DigitRatio is a feature that compares the number of digits to the number of all characters.

func (*DigitRatio) Compute

func (ftr *DigitRatio) Compute(dataset tabula.DatasetInterface)

Compute calculates the digit ratio in the new revision.
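A digit ratio of this kind might be computed as below. The helper `digitRatio` is hypothetical, and both the use of `unicode.IsDigit` and the return value of 0 for empty text are assumptions.

```go
package main

import (
	"fmt"
	"unicode"
)

// digitRatio returns the number of digit characters divided by the total
// number of characters in text.
func digitRatio(text string) float64 {
	digits, total := 0, 0
	for _, r := range text {
		total++
		if unicode.IsDigit(r) {
			digits++
		}
	}
	if total == 0 {
		// Assumption: avoid division by zero for empty text.
		return 0
	}
	return float64(digits) / float64(total)
}

func main() {
	fmt.Println(digitRatio("ab12"))
	fmt.Println(digitRatio("abc"))
}
```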

type Feature

type Feature struct {
	tabula.Column
}

Feature defines the type that holds the feature name and values.

type GoodToken

type GoodToken Feature

GoodToken counts how many good tokens are in the inserted text.

func (*GoodToken) Compute

func (ftr *GoodToken) Compute(dataset tabula.DatasetInterface)

Compute the number of good tokens in the inserted text.

type Interface

type Interface interface {
	tabula.ColumnInterface
	Compute(dataset tabula.DatasetInterface)
}

Interface defines the methods that must be implemented by a feature.

func GetByName

func GetByName(name string) Interface

GetByName returns the feature object with the given name.
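The register-then-look-up pattern behind Register, ListFeature, and GetByName can be sketched without the tabula dependency. Everything below is a simplified, hypothetical stand-in: the `feature` interface, `lengthFeature`, and the nil result for an unknown name are assumptions, and the real Interface embeds tabula.ColumnInterface and computes over a tabula.DatasetInterface.

```go
package main

import "fmt"

// feature is a simplified stand-in for the package's Interface.
type feature interface {
	Name() string
	Compute(rows []string) []float64
}

// registry plays the role of the global ListFeature.
var registry []feature

// register appends a feature to the global list.
func register(f feature) {
	registry = append(registry, f)
}

// getByName returns the first registered feature with the given name,
// or nil if none matches (an assumption for this sketch).
func getByName(name string) feature {
	for _, f := range registry {
		if f.Name() == name {
			return f
		}
	}
	return nil
}

// lengthFeature is a toy feature that returns the byte length of each row.
type lengthFeature struct{}

func (lengthFeature) Name() string { return "length" }

func (lengthFeature) Compute(rows []string) []float64 {
	values := make([]float64, len(rows))
	for i, row := range rows {
		values[i] = float64(len(row))
	}
	return values
}

func main() {
	register(lengthFeature{})
	if f := getByName("length"); f != nil {
		fmt.Println(f.Compute([]string{"abc", "hello"}))
	}
}
```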

type LongestCharSeq

type LongestCharSeq Feature

LongestCharSeq computes the longest character sequence in the inserted text.

func (*LongestCharSeq) Compute

func (ftr *LongestCharSeq) Compute(dataset tabula.DatasetInterface)

Compute the longest character sequence in the inserted text.
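One plausible reading of this feature is the longest run of the same character repeated consecutively, as in "aaaaaaa". The sketch below implements that reading; the interpretation and the name `longestCharSeq` are assumptions.

```go
package main

import "fmt"

// longestCharSeq returns the length of the longest run of one character
// repeated consecutively in text.
func longestCharSeq(text string) int {
	longest, current := 0, 0
	var prev rune
	for i, r := range text {
		if i > 0 && r == prev {
			current++
		} else {
			// A new character starts a new run.
			current = 1
		}
		if current > longest {
			longest = current
		}
		prev = r
	}
	return longest
}

func main() {
	fmt.Println(longestCharSeq("aaabbbbcc"))
}
```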

type LongestWord

type LongestWord Feature

LongestWord finds and returns the longest word in the inserted text.

func (*LongestWord) Compute

func (ftr *LongestWord) Compute(dataset tabula.DatasetInterface)

Compute the longest word in the inserted text.
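Since feature values are numeric, a natural sketch returns the length of the longest word. The code below assumes words are separated by whitespace and that length is counted in characters; `longestWordLen` is a hypothetical helper, and the package's tokenization may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// longestWordLen returns the character count of the longest
// whitespace-separated word in text.
func longestWordLen(text string) int {
	longest := 0
	for _, word := range strings.Fields(text) {
		if n := utf8.RuneCountInString(word); n > longest {
			longest = n
		}
	}
	return longest
}

func main() {
	fmt.Println(longestWordLen("a quick brownish fox"))
}
```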

type NonAlnumRatio

type NonAlnumRatio Feature

NonAlnumRatio is a feature that compares the number of non-alphanumeric characters to the number of all characters in the inserted text.

func (*NonAlnumRatio) Compute

func (ftr *NonAlnumRatio) Compute(dataset tabula.DatasetInterface)

Compute the ratio of non-alphanumeric characters to all characters in the inserted text.
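The ratio can be sketched with the `unicode` package. In this hypothetical `nonAlnumRatio` helper, anything that is neither a letter nor a digit counts as non-alphanumeric, including whitespace, and empty text returns 0; both choices are assumptions.

```go
package main

import (
	"fmt"
	"unicode"
)

// nonAlnumRatio returns the share of characters in text that are neither
// letters nor digits.
func nonAlnumRatio(text string) float64 {
	nonAlnum, total := 0, 0
	for _, r := range text {
		total++
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			nonAlnum++
		}
	}
	if total == 0 {
		// Assumption: avoid division by zero for empty text.
		return 0
	}
	return float64(nonAlnum) / float64(total)
}

func main() {
	fmt.Println(nonAlnumRatio("a!b?"))
}
```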

type SizeIncrement

type SizeIncrement Feature

SizeIncrement is a feature that compares the size of the new revision with the old revision by subtracting their lengths.

func (*SizeIncrement) Compute

func (ftr *SizeIncrement) Compute(dataset tabula.DatasetInterface)

Compute the absolute size increment.

type SizeRatio

type SizeRatio Feature

SizeRatio is a feature that compares the size of the new revision with the old revision as a ratio.

func (*SizeRatio) Compute

func (ftr *SizeRatio) Compute(dataset tabula.DatasetInterface)

Compute the ratio of size between the new and old revisions.
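The two size features can be sketched side by side. Both helpers are hypothetical: `sizeIncrement` assumes the increment is new length minus old length, and `sizeRatio` assumes add-one smoothing, (1 + new) / (1 + old), to avoid dividing by zero for an empty old revision.

```go
package main

import "fmt"

// sizeIncrement returns the difference in length between the new and the
// old revision. A negative value means content was removed.
func sizeIncrement(oldLen, newLen int) float64 {
	return float64(newLen - oldLen)
}

// sizeRatio returns (1 + newLen) / (1 + oldLen). The added ones are an
// assumption that keeps the ratio defined when oldLen is zero.
func sizeRatio(oldLen, newLen int) float64 {
	return float64(1+newLen) / float64(1+oldLen)
}

func main() {
	fmt.Println(sizeIncrement(100, 150))
	fmt.Println(sizeRatio(9, 19))
}
```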

type Template

type Template Feature

Template is a template for adding a new feature to this generator.

func (*Template) Compute

func (ftr *Template) Compute(dataset tabula.DatasetInterface)

Compute describes what this feature does.

type TermFrequency

type TermFrequency Feature

TermFrequency computes the frequency of words in the inserted text against the new revision.

func (*TermFrequency) Compute

func (ftr *TermFrequency) Compute(dataset tabula.DatasetInterface)

Compute the frequency of inserted words.

type UpperLowerRatio

type UpperLowerRatio Feature

UpperLowerRatio is a feature that compares the number of uppercase characters to the number of lowercase characters.

func (*UpperLowerRatio) Compute

func (ftr *UpperLowerRatio) Compute(dataset tabula.DatasetInterface)

Compute the ratio of uppercase to lowercase characters in the new revision.

type UpperToAllRatio

type UpperToAllRatio Feature

UpperToAllRatio is a feature that compares the number of uppercase characters to the number of all characters.

func (*UpperToAllRatio) Compute

func (ftr *UpperToAllRatio) Compute(dataset tabula.DatasetInterface)

Compute the ratio of uppercase characters to all characters in the new revision.
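The two uppercase ratios differ only in their denominator, which the sketch below makes explicit. The helpers `upperLowerRatio` and `upperToAllRatio` are hypothetical, and the add-one smoothing in both is an assumption used to avoid division by zero, not documented behavior.

```go
package main

import (
	"fmt"
	"unicode"
)

// countCase returns the number of uppercase, lowercase, and total
// characters in text.
func countCase(text string) (upper, lower, total int) {
	for _, r := range text {
		total++
		switch {
		case unicode.IsUpper(r):
			upper++
		case unicode.IsLower(r):
			lower++
		}
	}
	return upper, lower, total
}

// upperLowerRatio returns (1 + upper) / (1 + lower).
func upperLowerRatio(text string) float64 {
	upper, lower, _ := countCase(text)
	return float64(1+upper) / float64(1+lower)
}

// upperToAllRatio returns (1 + upper) / (1 + total).
func upperToAllRatio(text string) float64 {
	upper, _, total := countCase(text)
	return float64(1+upper) / float64(1+total)
}

func main() {
	fmt.Println(upperLowerRatio("HELLO"))
	fmt.Println(upperToAllRatio("ABcd"))
}
```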

type WordsAllFrequency

type WordsAllFrequency Feature

WordsAllFrequency computes the frequency of vulgar, pronoun, bias, sex, and bad words in the inserted text.

func (*WordsAllFrequency) Compute

func (ftr *WordsAllFrequency) Compute(dataset tabula.DatasetInterface)

Compute the frequency of all words.
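A word-list frequency of this kind might be computed as the share of inserted words that appear in a given list. The sketch below assumes whitespace tokenization, case-insensitive matching, and that definition of frequency; `wordListFrequency` and the sample list are hypothetical and are not the lists returned by GetAllWordList.

```go
package main

import (
	"fmt"
	"strings"
)

// wordListFrequency returns the number of words in inserted that appear
// in wordlist, divided by the total number of words in inserted.
func wordListFrequency(inserted string, wordlist []string) float64 {
	// Build a set for constant-time lookups.
	listed := make(map[string]struct{}, len(wordlist))
	for _, w := range wordlist {
		listed[strings.ToLower(w)] = struct{}{}
	}
	words := strings.Fields(strings.ToLower(inserted))
	if len(words) == 0 {
		// Assumption: no inserted words means zero frequency.
		return 0
	}
	matches := 0
	for _, w := range words {
		if _, ok := listed[w]; ok {
			matches++
		}
	}
	return float64(matches) / float64(len(words))
}

func main() {
	pronouns := []string{"you", "i", "we"} // hypothetical sample list
	fmt.Println(wordListFrequency("You and I were here", pronouns))
}
```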

type WordsAllImpact

type WordsAllImpact Feature

WordsAllImpact computes the impact of vulgar, pronoun, bias, sex, and bad words between the old and new revisions.

func (*WordsAllImpact) Compute

func (ftr *WordsAllImpact) Compute(dataset tabula.DatasetInterface)

Compute the impact of vulgar, pronoun, bias, sex, and bad words in the inserted text.

type WordsBadFrequency

type WordsBadFrequency Feature

WordsBadFrequency computes the frequency of bad words, that is, colloquial words or words that reflect poor writing skill.

func (*WordsBadFrequency) Compute

func (ftr *WordsBadFrequency) Compute(dataset tabula.DatasetInterface)

Compute frequency of bad words.

type WordsBadImpact

type WordsBadImpact Feature

WordsBadImpact counts the frequency of bad words in the inserted text.

func (*WordsBadImpact) Compute

func (ftr *WordsBadImpact) Compute(dataset tabula.DatasetInterface)

Compute the frequency of bad words in the inserted text.

type WordsBiasFrequency

type WordsBiasFrequency Feature

WordsBiasFrequency counts the frequency of colloquial words with high bias in the inserted text.

func (*WordsBiasFrequency) Compute

func (ftr *WordsBiasFrequency) Compute(dataset tabula.DatasetInterface)

Compute the frequency of biased words.

type WordsBiasImpact

type WordsBiasImpact Feature

WordsBiasImpact counts the frequency of biased words in the inserted text.

func (*WordsBiasImpact) Compute

func (ftr *WordsBiasImpact) Compute(dataset tabula.DatasetInterface)

Compute the frequency of biased words in the inserted text.

type WordsPronounFrequency

type WordsPronounFrequency Feature

WordsPronounFrequency counts the frequency of first- and second-person pronouns in the inserted text.

func (*WordsPronounFrequency) Compute

func (ftr *WordsPronounFrequency) Compute(dataset tabula.DatasetInterface)

Compute the frequency of pronouns in the inserted text.

type WordsPronounImpact

type WordsPronounImpact Feature

WordsPronounImpact counts the frequency of pronouns in the inserted text.

func (*WordsPronounImpact) Compute

func (ftr *WordsPronounImpact) Compute(dataset tabula.DatasetInterface)

Compute the frequency of pronouns in the inserted text.

type WordsSexFrequency

type WordsSexFrequency Feature

WordsSexFrequency counts the frequency of non-vulgar, sex-related words.

func (*WordsSexFrequency) Compute

func (ftr *WordsSexFrequency) Compute(dataset tabula.DatasetInterface)

Compute the frequency of sex-related words.

type WordsSexImpact

type WordsSexImpact Feature

WordsSexImpact counts the frequency of sex-related words in the inserted text.

func (*WordsSexImpact) Compute

func (ftr *WordsSexImpact) Compute(dataset tabula.DatasetInterface)

Compute the frequency of sex-related words in the inserted text.

type WordsVulgarFrequency

type WordsVulgarFrequency Feature

WordsVulgarFrequency counts the frequency of vulgar words in the inserted text.

func (*WordsVulgarFrequency) Compute

func (ftr *WordsVulgarFrequency) Compute(dataset tabula.DatasetInterface)

Compute the frequency of vulgar words in the inserted text.

type WordsVulgarImpact

type WordsVulgarImpact Feature

WordsVulgarImpact counts the frequency of vulgar words in the inserted text.

func (*WordsVulgarImpact) Compute

func (ftr *WordsVulgarImpact) Compute(dataset tabula.DatasetInterface)

Compute the frequency of vulgar words in the inserted text.
