sentiment

Version: v0.0.0-...-c697f64
Warning

This package is not in the latest version of its module.

Published: Jun 17, 2020 License: MIT Imports: 14 Imported by: 16

README

Sentiment

Simple, Drop-In Sentiment Analysis in Golang


This package relies on the work done in my other package, goml, for multiclass text classification.

Sentiment lets you pass strings into a function and get an estimate of the sentiment of the string (in English) using a very simple probabilistic model. The model is trained on a dataset of IMDB movie reviews classified by sentiment. For single-word classification, the returned values are the score in {0, 1} (negative/positive) and the probability, on [0, 1], that the word belongs to the predicted class. For document sentiment, only the class is given (the floating-point probabilities would underflow otherwise).

Implemented Languages

If you want to implement another language, open an issue or email me. It really is not hard (if you have a dataset).

  • English
    • dataset: IMDB Reviews

Model

Sentiment uses a Naive Bayes classification model for prediction. Naive Bayes has its pluses and minuses, but it tends to do well for text classification.
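As a rough illustration of how a Naive Bayes text classifier makes its decision, here is a minimal two-class multinomial Naive Bayes sketch with add-one (Laplace) smoothing, written from scratch. It is not goml's implementation: the nbModel type, its train and predict methods, and the four-sentence training set are all hypothetical. It compares classes by summed log probabilities, which sidesteps the underflow problem mentioned above.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// nbModel is a hypothetical two-class (0 = negative, 1 = positive)
// multinomial Naive Bayes model over whitespace-separated words.
type nbModel struct {
	docCount  [2]int            // documents seen per class
	wordCount [2]map[string]int // word frequencies per class
	total     [2]int            // total words seen per class
	vocab     map[string]bool   // every distinct word seen in training
}

func newModel() *nbModel {
	return &nbModel{
		wordCount: [2]map[string]int{{}, {}},
		vocab:     map[string]bool{},
	}
}

// train updates the counts with one labeled document.
func (m *nbModel) train(doc string, class int) {
	m.docCount[class]++
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		m.wordCount[class][w]++
		m.total[class]++
		m.vocab[w] = true
	}
}

// predict returns the class with the highest log posterior:
// log P(c) + sum over words of log P(w|c), using add-one smoothing.
func (m *nbModel) predict(doc string) int {
	words := strings.Fields(strings.ToLower(doc))
	docs := m.docCount[0] + m.docCount[1]
	best, bestScore := 0, math.Inf(-1)
	for c := 0; c < 2; c++ {
		score := math.Log(float64(m.docCount[c]) / float64(docs))
		for _, w := range words {
			num := float64(m.wordCount[c][w] + 1)
			den := float64(m.total[c] + len(m.vocab))
			score += math.Log(num / den)
		}
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	m := newModel()
	m.train("great movie loved it", 1)
	m.train("wonderful acting great plot", 1)
	m.train("awful movie hated it", 0)
	m.train("terrible plot awful acting", 0)
	fmt.Println(m.predict("a great plot"))       // 1
	fmt.Println(m.predict("awful and terrible")) // 0
}
```

Unseen words such as "a" and "and" get the same smoothed probability in both classes, so they do not sway the result; that is the role of the +1 in the numerator.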

Example

You can save the trained model to a JSON file with the PersistToFile(m Models, path string) error function, so you don't have to run the training again (though training only takes about 4 seconds at most).

Training, or Restoring a Pre-Trained Model:

// Train is used within the library, but you should
// usually prefer Restore because it's faster and
// you don't have to be in the project's directory
//
// model, err := sentiment.Train()

model, err := sentiment.Restore()
if err != nil {
    panic(fmt.Sprintf("Could not restore model!\n\t%v\n", err))
}

Analysis:

// get sentiment analysis summary
// in any implemented language
analysis := model.SentimentAnalysis("Your mother is an awful lady", sentiment.English) // analysis.Score == 0

LICENSE - MIT

Documentation


Constants

const (
	English            Language = "en"
	Spanish                     = "es"
	French                      = "fr"
	German                      = "de"
	Italian                     = "it"
	Arabic                      = "ar"
	Japanese                    = "ja"
	Indonesian                  = "id"
	Portugese                   = "pt"
	Korean                      = "ko"
	Turkish                     = "tr"
	Russian                     = "ru"
	Dutch                       = "nl"
	Filipino                    = "fil"
	Malay                       = "msa"
	ChineseTraditional          = "zh-tw"
	ChineseSimplified           = "zh-cn"
	Hindi                       = "hi"
	Norwegian                   = "no"
	Swedish                     = "sv"
	Finnish                     = "fi"
	Danish                      = "da"
	Polish                      = "pl"
	Hungarian                   = "hu"
	Farsi                       = "fa"
	Hebrew                      = "he"
	Urdu                        = "ur"
	Thai                        = "th"
	NoLanguage                  = ""
)

These constants hold the Twitter language codes that correspond to models. Not all of them are used yet; they are here for ease of extension. US English is lumped together with UK English.

const (
	// TempDirectory is the default temporary
	// directory for persisting models to disk
	TempDirectory string = "/tmp/.sentiment"
)

Variables

This section is empty.

Functions

func Asset

func Asset(name string) ([]byte, error)

Asset loads and returns the asset for the given name. It returns an error if the asset could not be found or could not be loaded.

func AssetDir

func AssetDir(name string) ([]string, error)

AssetDir returns the file names below a certain directory embedded in the file by go-bindata. For example, if you run go-bindata on data/... and data contains the following hierarchy:

data/
  foo.txt
  img/
    a.png
    b.png

Then:

  • AssetDir("data") would return []string{"foo.txt", "img"}
  • AssetDir("data/img") would return []string{"a.png", "b.png"}
  • AssetDir("foo.txt") and AssetDir("notexist") would return an error
  • AssetDir("") will return []string{"data"}

func AssetInfo

func AssetInfo(name string) (os.FileInfo, error)

AssetInfo loads and returns the asset info for the given name. It returns an error if the asset could not be found or could not be loaded.

func AssetNames

func AssetNames() []string

AssetNames returns the names of the assets.

func MustAsset

func MustAsset(name string) []byte

MustAsset is like Asset but panics when Asset would return an error. It simplifies safe initialization of global variables.
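The Must pattern can be sketched in isolation. assetData, asset, and mustAsset below are hypothetical in-memory stand-ins for the go-bindata functions, not their real code; the point is that a package-level variable can be initialized from a single call, and a missing asset fails loudly at program start instead of silently yielding nil.

```go
package main

import "fmt"

// assetData is a hypothetical in-memory asset table.
var assetData = map[string][]byte{
	"data/foo.txt": []byte("hello"),
}

// asset looks up name and returns an error if it is missing.
func asset(name string) ([]byte, error) {
	b, ok := assetData[name]
	if !ok {
		return nil, fmt.Errorf("asset %s not found", name)
	}
	return b, nil
}

// mustAsset is like asset but panics on error, so it can be used
// directly in a package-level variable declaration.
func mustAsset(name string) []byte {
	b, err := asset(name)
	if err != nil {
		panic(err)
	}
	return b
}

// greeting is initialized once, during package initialization. Go's
// dependency analysis ensures assetData is initialized first.
var greeting = mustAsset("data/foo.txt")

func main() {
	fmt.Println(string(greeting)) // hello
}
```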

func PersistToFile

func PersistToFile(m Models, path string) error

PersistToFile persists a Models map to the file at path, returning any error.

func RestoreAsset

func RestoreAsset(dir, name string) error

RestoreAsset restores an asset under the given directory.

func RestoreAssets

func RestoreAssets(dir, name string) error

RestoreAssets restores an asset under the given directory, recursively.

func SplitSentences

func SplitSentences(r rune) bool

SplitSentences takes in a rune r and returns whether the rune is a sentence delimiter ('.', '?', or '!').

Its signature matches the func(rune) bool argument expected by strings.FieldsFunc.

func TrainEnglishModel

func TrainEnglishModel(modelMap Models) error

TrainEnglishModel trains a model on the expected IMDB datasets and adds it to the given map of models (modelMap). It returns any error encountered.

Types

type Analysis

type Analysis struct {
	Language  Language        `json:"lang"`
	Words     []Score         `json:"words"`
	Sentences []SentenceScore `json:"sentences,omitempty"`
	Score     uint8           `json:"score"`
}

Analysis holds the analysis of a document: its total sentiment, the sentiment of each sentence, and the sentiment of each word, along with the language code.

type Language

type Language string

Language is a language code used for differentiating sentiment models

type Models

type Models map[Language]*text.NaiveBayes

Models holds a map from language keys to sentiment classifiers.

func Restore

func Restore() (Models, error)

Restore restores the pre-trained models from a binary asset. This is the preferred way to obtain a model (use it unless you want to train the model again).

This basically wraps RestoreModels.

func RestoreModels

func RestoreModels(bytes []byte) (Models, error)

RestoreModels takes a byte slice holding a (presumably) serialized map[Language]LanguageModel and unmarshals it into a usable model that you can use to run regular, language-specific sentiment analysis.

func Train

func Train() (Models, error)

Train trains the models and returns them; to save the result to a file, pass it to PersistToFile. After this is run you can use the SentimentXXX functions effectively.

Note that this must be run from within the project directory! To just get the model without re-training, call Restore instead.

func (Models) SentimentAnalysis

func (m Models) SentimentAnalysis(sentence string, lang Language) *Analysis

SentimentAnalysis takes in a (possibly 'dirty') sentence (or any block of text), cleans the text, finds the sentiment of each word in the text, finds the sentiment of the sentence as a whole, and returns an Analysis struct.
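The steps listed above (clean the text, score each word, score each sentence, score the whole) can be sketched as follows. This is an assumption-laden illustration, not the package's code: classify is a hypothetical keyword lookup standing in for the trained Naive Bayes model, cleaning is reduced to lowercasing and keeping letters, apostrophes, and whitespace, and the local Analysis type omits the Language field and JSON tags for brevity.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type Score struct {
	Word  string
	Score uint8
}

type SentenceScore struct {
	Sentence string
	Score    uint8
}

type Analysis struct {
	Words     []Score
	Sentences []SentenceScore
	Score     uint8
}

// classify is a hypothetical stand-in for the trained model:
// 0 if the text contains a negative keyword, otherwise 1.
func classify(text string) uint8 {
	for _, w := range strings.Fields(text) {
		if w == "awful" || w == "terrible" {
			return 0
		}
	}
	return 1
}

// clean lowercases s and drops everything except letters,
// apostrophes, and whitespace.
func clean(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsSpace(r) || r == '\'' {
			return unicode.ToLower(r)
		}
		return -1 // drop the rune
	}, s)
}

// analyze is a hypothetical pipeline mirroring the documented steps.
func analyze(text string) *Analysis {
	a := &Analysis{}
	isEnd := func(r rune) bool { return r == '.' || r == '?' || r == '!' }
	for _, s := range strings.FieldsFunc(text, isEnd) {
		c := strings.TrimSpace(clean(s))
		if c == "" {
			continue
		}
		a.Sentences = append(a.Sentences, SentenceScore{Sentence: c, Score: classify(c)})
	}
	cleaned := clean(text)
	for _, w := range strings.Fields(cleaned) {
		a.Words = append(a.Words, Score{Word: w, Score: classify(w)})
	}
	a.Score = classify(cleaned)
	return a
}

func main() {
	a := analyze("Great acting. Awful plot!")
	fmt.Println(a.Score, len(a.Words), a.Sentences) // 0 4 [{great acting 1} {awful plot 0}]
}
```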

type Score

type Score struct {
	Word  string `json:"word"`
	Score uint8  `json:"score"`
}

Score holds the score of a single word (it differs from SentenceScore only in field names and JSON marshaling, not in actual types).

type SentenceScore

type SentenceScore struct {
	Sentence string `json:"sentence"`
	Score    uint8  `json:"score"`
}

SentenceScore holds the score of a document, which could be (and probably is) a sentence
