vsm

package
v0.0.0-...-4527964 Latest
Published: Feb 8, 2020 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package go-vsm implements the algebraic vector space model for comparing documents and queries.

The vector space model represents documents and queries as objects (points or vectors) in an N-dimensional space where each term is a dimension. So a document d with N terms can be represented as a point or vector with coordinates d(w₁, w₂, ... wN), where each w is a term weight.

The term weighting scheme used in this package is TF-IDF:

w = term.Count * log(|Docs| / |{ d ∈ Docs | term ∈ d}|)

Where Docs is the set of known documents in the corpus.

The vector space model measures the similarity between each document vector and the query vector by the cosine of the angle between them: the smaller the angle, the closer the cosine is to 1 and the more similar the two are. So for each document vector dᵢ and a query vector q:

cosθ = dᵢ•q / (||dᵢ|| * ||q||)

where ||dᵢ|| = the magnitude of the document vector and ||q|| = the magnitude of the query vector.

See: http://www.minerazzi.com/tutorials/term-vector-1.pdf

Example
docs := []Document{
	{
		Sentence: "Shipment of gold damaged in a fire.",
		Class:    "d1",
	},
	{
		Sentence: "Delivery of silver arrived in a silver truck.",
		Class:    "d2",
	},
	{
		Sentence: "Shipment of gold arrived in a truck.",
		Class:    "d3",
	},
}

vsm := New(nil)

for _, doc := range docs {
	fmt.Println(vsm.StaticTraining(doc))
}

doc, err := vsm.Search("gold silver truck.")

fmt.Println(doc.Class, err)
Output:

<nil>
<nil>
<nil>
d2 <nil>


Types

type Document

type Document struct {
	Sentence string
	Class    string
}

Document holds a sentence, which is tokenized and filtered in order to be represented as a vector, and a class for keywording/tagging the sentence.

type TrainingResult

type TrainingResult struct {
	Doc Document
	Err error
}

TrainingResult holds the Document used for the training process and an error which will be nil if no error occurred.

type VSM

type VSM struct {
	// contains filtered or unexported fields
}

VSM holds the corpus for the algebraic vector space model calculation.

func New

func New(t transform.Transformer) *VSM

New returns a VSM structure used for searching documents in the corpus. The Transformer is used for filtering the documents' sentences and the queries.

func (*VSM) DynamicTraining

func (v *VSM) DynamicTraining(ctx context.Context, docCh <-chan Document) <-chan TrainingResult

DynamicTraining receives a producer channel of Document used for dynamically augmenting the corpus. It returns a TrainingResult channel which can be used to check if an error occurred during the training process.

Example
docs := []Document{
	{
		Sentence: "Shipment of gold damaged in a fire.",
		Class:    "d1",
	},
	{
		Sentence: "Shipment of gold arrived in a truck.",
		Class:    "d3",
	},
}

vsm := New(nil)

for _, doc := range docs {
	err := vsm.StaticTraining(doc)
	fmt.Println(err)
}

docCh := make(chan Document)

go func() {
	defer close(docCh)

	// Loads document from some source dynamically
	// and sends it to the training channel.
	docCh <- Document{
		Sentence: "Delivery of silver arrived in a silver truck.",
		Class:    "d2",
	}
}()

trainCh := vsm.DynamicTraining(context.Background(), docCh)

// Waits until all documents are consumed by the training process.
for range trainCh {
}

doc, err := vsm.Search("gold silver truck.")

fmt.Println(doc.Class, err)
Output:

<nil>
<nil>
d2 <nil>

func (*VSM) Search

func (v *VSM) Search(query string) (*Document, error)

Search returns the document in the corpus that is most similar to the query according to the vector space model, or an error. A nil Document means there's no similarity between any document in the corpus and the query.

Example
docs := []Document{
	{
		Sentence: "Shipment of gold damaged in a fire.",
		Class:    "d1",
	},
	{
		Sentence: "Delivery of silver arrived in a silver truck.",
		Class:    "d2",
	},
	{
		Sentence: "Shipment-of-gold-arrived in a truck.",
		Class:    "d3",
	},
}

// transformer will be applied to every sentence
// and query.
transformer := runes.Map(func(r rune) rune {
	// Replaces hyphens by space.
	if unicode.Is(unicode.Hyphen, r) {
		return ' '
	}
	return r
})

vsm := New(transformer)

for _, doc := range docs {
	fmt.Println(vsm.StaticTraining(doc))
}

doc, err := vsm.Search("shipment gold in a flying truck.")

fmt.Println(doc.Class, err)
Output:

<nil>
<nil>
<nil>
d3 <nil>

func (*VSM) StaticTraining

func (v *VSM) StaticTraining(dc Document) error

StaticTraining receives a Document used for augmenting the corpus. It returns an error if the training was unsuccessful.
