rake

Version: v0.0.0-...-068a9e4
Published: Nov 9, 2019 License: MIT Imports: 5 Imported by: 0

README

A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.

Original Python implementation available at: https://github.com/aneesha/RAKE

The source code is released under the MIT License.


Example Usage

package main

import (
	"fmt"

	// The package is named rake, so the import is aliased explicitly.
	rake "github.com/afjoseph/goRAKE"
)

func main() {
	text := `The growing doubt of human autonomy and reason has created a state of moral confusion where man is left without the guidance of either revelation or reason. The result is the acceptance of a relativistic position which proposes that value judgements and ethical norms are exclusively matters of arbitrary preference and that no objectively valid statement can be made in this realm... But since man cannot live without values and norms, this relativism makes him an easy prey for irrational value systems.`

	candidates := rake.RunRake(text)

	for _, candidate := range candidates {
		fmt.Printf("%s --> %f\n", candidate.Key, candidate.Value)
	}

	fmt.Printf("\nsize: %d\n", len(candidates))
}

Output:

objectively valid statement --> 9.000000
exclusively matters --> 4.000000
arbitrary preference --> 4.000000
easy prey --> 4.000000
relativistic position --> 4.000000
human autonomy --> 4.000000
relativism makes --> 4.000000
growing doubt --> 4.000000
moral confusion --> 4.000000
ethical norms --> 3.500000
norms --> 1.500000
made --> 1.000000
guidance --> 1.000000
man --> 1.000000
result --> 1.000000
systems --> 1.000000
values --> 1.000000
realm --> 1.000000
live --> 1.000000
judgements --> 1.000000
reason --> 1.000000
left --> 1.000000
proposes --> 1.000000
irrational --> 1.000000
created --> 1.000000
acceptance --> 1.000000
revelation --> 1.000000
state --> 1.000000

size: 28

Documentation


Constants

This section is empty.

Variables

var StopWordsSlice = []string{}/* 571 elements not displayed */

StopWordsSlice is the stop-word list from SMART (Salton, 1971), available at ftp://ftp.cs.cornell.edu/pub/smart/english.stop

Functions

func CalculateWordScores

func CalculateWordScores(phraseList []string) map[string]float64

CalculateWordScores returns a map[string]float64 that maps each candidate word to its score in the text
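As a rough illustration of the scoring described in the RAKE paper, the sketch below computes deg(w)/freq(w) for each word. The helper name `wordScores` is hypothetical and this is not the package's own code; it assumes phrases are already lowercased and space-separated.

```go
package main

import (
	"fmt"
	"strings"
)

// wordScores is a hypothetical helper, not the package's implementation.
// freq(w) counts occurrences of w across all phrases; deg(w) adds the word
// count of every phrase occurrence that contains w. The score is deg/freq.
func wordScores(phrases []string) map[string]float64 {
	freq := make(map[string]float64)
	deg := make(map[string]float64)
	for _, phrase := range phrases {
		words := strings.Fields(phrase)
		for _, w := range words {
			freq[w]++
			deg[w] += float64(len(words))
		}
	}
	scores := make(map[string]float64, len(freq))
	for w, f := range freq {
		scores[w] = deg[w] / f
	}
	return scores
}

func main() {
	scores := wordScores([]string{"ethical norms", "norms", "objectively valid statement"})
	fmt.Println(scores["ethical"], scores["norms"], scores["objectively"])
}
```

Under these assumptions "norms" scores 1.5 (degree 3 over frequency 2), which agrees with the "norms --> 1.500000" line in the README output.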

func GenerateCandidateKeywordScores

func GenerateCandidateKeywordScores(phraseList []string, wordScore map[string]float64) map[string]float64

GenerateCandidateKeywordScores returns a map[string]float64 that maps each candidate keyword to its score in the text
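In RAKE, a candidate keyword's score is the sum of the scores of its member words. The following is a minimal sketch of that step; `phraseScores` is a hypothetical name, and the package's real function may differ in details such as tokenization.

```go
package main

import (
	"fmt"
	"strings"
)

// phraseScores is a hypothetical helper: each phrase scores the sum of its
// words' scores. Words missing from wordScore contribute zero.
func phraseScores(phrases []string, wordScore map[string]float64) map[string]float64 {
	scores := make(map[string]float64, len(phrases))
	for _, phrase := range phrases {
		total := 0.0
		for _, w := range strings.Fields(phrase) {
			total += wordScore[w]
		}
		scores[phrase] = total
	}
	return scores
}

func main() {
	wordScore := map[string]float64{"ethical": 2, "norms": 1.5}
	for phrase, score := range phraseScores([]string{"ethical norms", "norms"}, wordScore) {
		fmt.Printf("%s --> %f\n", phrase, score)
	}
}
```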

func GenerateCandidateKeywords

func GenerateCandidateKeywords(sentenceList []string, stopWordPattern *regexp.Regexp) []string

GenerateCandidateKeywords returns a slice of candidate keywords from a slice of sentences and a stop-words regex

func IsNumber

func IsNumber(str string) bool

IsNumber returns true if the supplied string is a number
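The documentation does not say which number formats count. A simple sketch, assuming anything `strconv.ParseFloat` accepts is a number (the helper name `isNumber` is hypothetical):

```go
package main

import (
	"fmt"
	"strconv"
)

// isNumber is a hypothetical helper that treats any string accepted by
// strconv.ParseFloat as a number. The package may use a different rule.
func isNumber(s string) bool {
	_, err := strconv.ParseFloat(s, 64)
	return err == nil
}

func main() {
	fmt.Println(isNumber("42"), isNumber("3.14"), isNumber("norms"))
}
```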

func RegexSplitSentences

func RegexSplitSentences() *regexp.Regexp

RegexSplitSentences returns a compiled regular expression that matches the punctuation marks used to split text into sentences

func RegexSplitWords

func RegexSplitWords() *regexp.Regexp

RegexSplitWords returns a compiled regular expression that splits text into words

func RegexStopWords

func RegexStopWords(stopWordsSlice []string) *regexp.Regexp

RegexStopWords builds a stop-word regex from a slice of stop words
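A stop-word regex of this kind can be sketched as a case-insensitive alternation wrapped in word boundaries. The helper `stopWordPattern` is hypothetical, and the sketch assumes a non-empty list; the exact pattern the package builds is not shown in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stopWordPattern is a hypothetical helper. It escapes each stop word and
// joins them into one alternation, so only whole words match.
func stopWordPattern(stopWords []string) *regexp.Regexp {
	quoted := make([]string, len(stopWords))
	for i, w := range stopWords {
		quoted[i] = regexp.QuoteMeta(w)
	}
	return regexp.MustCompile(`(?i)\b(?:` + strings.Join(quoted, "|") + `)\b`)
}

func main() {
	re := stopWordPattern([]string{"the", "of", "or"})
	// "or" inside "moral" is not a whole word, so the second call reports false.
	fmt.Println(re.MatchString("The result"), re.MatchString("moral confusion"))
}
```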

func SeperateWords

func SeperateWords(text string) []string

SeperateWords returns a slice of all words that have a length greater than a specified number of characters.

func SetDefaultStringFloat64

func SetDefaultStringFloat64(h map[string]float64, k string, v float64) (set bool, r float64)

SetDefaultStringFloat64 is a utility function that serves as a Go replacement for Python's `dict.setdefault` (https://docs.python.org/2/library/stdtypes.html#dict.setdefault): if k is already in h, it returns the existing value; otherwise it inserts k with the value v and returns v.
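The set-default behavior can be sketched in a few lines of Go. The helper `setDefault` is hypothetical, and the meaning of the boolean result (true when the key was newly inserted) is an assumption, since the documentation does not describe it.

```go
package main

import "fmt"

// setDefault is a hypothetical helper. If key is present, it returns the
// stored value; otherwise it stores def and returns it. The bool reports
// whether an insert happened (an assumed meaning, not confirmed by the docs).
func setDefault(m map[string]float64, key string, def float64) (inserted bool, value float64) {
	if v, ok := m[key]; ok {
		return false, v
	}
	m[key] = def
	return true, def
}

func main() {
	freq := map[string]float64{"norms": 2}
	fmt.Println(setDefault(freq, "norms", 0))
	fmt.Println(setDefault(freq, "values", 0))
}
```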

func SetDefaultStringInt

func SetDefaultStringInt(h map[string]int, k string, v int) (set bool, r int)

SetDefaultStringInt is a utility function that serves as a Go replacement for Python's `dict.setdefault` (https://docs.python.org/2/library/stdtypes.html#dict.setdefault): if k is already in h, it returns the existing value; otherwise it inserts k with the value v and returns v.

func SplitSentences

func SplitSentences(text string) []string

SplitSentences returns a slice of sentences.

Types

type Pair

type Pair struct {
	Key   string
	Value float64
}

Pair is a simple struct for a key-value pair of string and float64

type PairList

type PairList []Pair

PairList is just a slice of Pairs

func RunRake

func RunRake(text string) PairList

RunRake wraps RunRakeI18N to preserve the original single-argument API

func RunRakeI18N

func RunRakeI18N(text string, stopWords []string) PairList

RunRakeI18N returns a slice of key-value pairs (PairList) of a keyword and its score after running the RAKE algorithm on a given text

func (PairList) Len

func (p PairList) Len() int

func (PairList) Less

func (p PairList) Less(i, j int) bool

func (PairList) Swap

func (p PairList) Swap(i, j int)
