wstat

v1.0.0
Warning

This package is not in the latest version of its module.

Published: Jun 19, 2021 License: MIT Imports: 9 Imported by: 1

README

Simple text statistics library

A library for quickly computing simple statistics on text:

  • the total number of characters,
  • the number of spaces and separators,
  • the number of punctuation symbols,
  • the number of digits,
  • the number of words,
  • streaming calculation, or incremental addition of individual lines,
  • extraction of text from HTML,
  • the number of typewritten pages and author's sheets,
  • reading time for different languages and/or reading speeds.

stat := wstat.String(`text data`)
fmt.Println("reading time:", stat)

Documentation

Constants

This section is empty.

Variables

var (
	// One typewritten page holds 1,860 printed characters (including spaces).
	PageChars = 1_860
	// In the USSR and the Russian Federation, one author's sheet equals
	// 40,000 printed characters (including punctuation marks, digits,
	// and spaces).
	AuthorChars = 40_000
)
var IgnoreHTMLTags = map[string]struct{}{
	"script": {},
	"style":  {},
	"head":   {},
	"title":  {},
}

IgnoreHTMLTags lists the names of the HTML tags whose contents are ignored.
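The `map[string]struct{}` type is the usual Go idiom for a set of names. The following is a minimal sketch of how such a set can be consulted and extended. It uses a local `ignoreTags` map that only mirrors the variable above, and `shouldSkip` is a hypothetical helper; the library's own parsing code is not shown in this documentation.

```go
package main

import "fmt"

// ignoreTags mirrors the shape of wstat.IgnoreHTMLTags. It is a local copy
// used for illustration only.
var ignoreTags = map[string]struct{}{
	"script": {},
	"style":  {},
	"head":   {},
	"title":  {},
}

// shouldSkip is a hypothetical helper that reports whether the contents of
// the named tag would be ignored.
func shouldSkip(tag string) bool {
	_, ok := ignoreTags[tag]
	return ok
}

func main() {
	fmt.Println(shouldSkip("style"), shouldSkip("p")) // true false

	// Adding a key extends the set; the empty struct value uses no memory.
	ignoreTags["noscript"] = struct{}{}
	fmt.Println(shouldSkip("noscript")) // true
}
```

Because IgnoreHTMLTags is an exported package variable, the same assignment form should presumably work on it before calling FromHTML, although the documentation does not state this explicitly.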

Functions

This section is empty.

Types

type Counter

type Counter struct {
	Chars   int `json:"chars"`   // Total number of characters
	Spaces  int `json:"spaces"`  // Number of spaces and separators
	Puncts  int `json:"puncts"`  // Number of punctuation marks
	Numbers int `json:"numbers"` // Number of digits
	Words   int `json:"words"`   // Number of words
}

Counter contains data with statistical counting results.
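The struct tags determine the field names when a Counter is serialized with encoding/json. A small sketch of the resulting JSON shape, using a local `counter` type that copies the fields and tags above (it is not the library type, and `encode` is a hypothetical helper):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// counter is a local mirror of the Counter struct, with the same JSON tags.
type counter struct {
	Chars   int `json:"chars"`
	Spaces  int `json:"spaces"`
	Puncts  int `json:"puncts"`
	Numbers int `json:"numbers"`
	Words   int `json:"words"`
}

// encode marshals a counter to its JSON representation.
func encode(c counter) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	c := counter{Chars: 3504, Spaces: 566, Puncts: 111, Words: 518}
	fmt.Println(encode(c))
	// {"chars":3504,"spaces":566,"puncts":111,"numbers":0,"words":518}
}
```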

func Bytes

func Bytes(b []byte) (c Counter)

Bytes returns statistical information about the text in a byte slice.
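The documentation does not describe how characters are classified. As a rough sketch only, a counter of this kind could classify each rune with the standard unicode package and count a word at every transition into a run of letters. The `counter` type and `count` function below are hypothetical, and the library's real rules (for example, what counts as a word) may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// counter is a local stand-in for the library's Counter.
type counter struct {
	Chars, Spaces, Puncts, Numbers, Words int
}

// count classifies every rune of s. A word is counted each time a letter
// follows a non-letter; this rule is an assumption made for the sketch.
func count(s string) counter {
	var c counter
	inWord := false
	for _, r := range s {
		c.Chars++
		switch {
		case unicode.IsSpace(r):
			c.Spaces++
			inWord = false
		case unicode.IsPunct(r):
			c.Puncts++
			inWord = false
		case unicode.IsDigit(r):
			c.Numbers++
			inWord = false
		case unicode.IsLetter(r):
			if !inWord {
				c.Words++
				inWord = true
			}
		default:
			inWord = false
		}
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", count("Hello, world 42!"))
	// {Chars:16 Spaces:2 Puncts:2 Numbers:2 Words:2}
}
```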

func FromHTML

func FromHTML(r io.Reader) (c Counter, err error)

FromHTML extracts the text from HTML and returns statistical information about it. The contents of the tags listed in IgnoreHTMLTags are ignored.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/mdigger/wstat"
)

func main() {
	sample := `<html>
<head>
<title>   123    </title>
<style type="text/css">
<!--
h1	{text-align:center;
	font-family:Arial, Helvetica, Sans-Serif;
	}

p	{text-indent:20px;
	}
-->
</style>
</head>
<body bgcolor = "#ffffcc" text = "#000000">
<h1>Lorem Ipsum</h1>

<p><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong><br/>
Nunc sit amet ipsum vel nunc interdum ultricies eu non augue. Donec sit amet 
nisl aliquet, ultricies enim id, malesuada libero. Ut maximus felis neque, sed 
porta est accumsan in. Curabitur tincidunt fringilla ultrices. Suspendisse 
porttitor non mauris quis tincidunt. Vivamus sit amet ante vel dui pellentesque 
mollis sit amet sit amet nibh. Vivamus id ante ultricies mi tincidunt sodales et 
pretium ex. In placerat purus vitae ligula tincidunt consectetur. Vivamus vel 
leo ut est molestie molestie non et odio. Nam a iaculis magna, sit amet accumsan 
elit. Nullam quam sapien, accumsan nec porta non, sollicitudin ut magna. 
Suspendisse sed gravida nisl. Nullam porta ultricies pellentesque. Nunc viverra 
convallis mauris, ac aliquam velit commodo in. Nulla facilisis commodo massa in 
egestas. Quisque at enim risus.</p>

<p>Nulla facilisi. Morbi odio ligula, hendrerit vitae mi ullamcorper, fermentum 
laoreet ligula. Aliquam ornare enim nec tortor sagittis faucibus. Morbi pretium 
dui at nibh placerat semper. Maecenas et libero vitae orci fringilla pretium sit 
amet a est. In at ipsum est. Sed laoreet efficitur consequat. Ut pharetra mauris 
sed mi consequat, ac suscipit dolor convallis. Vestibulum in est sollicitudin, 
mattis urna a, malesuada felis. Duis nibh lectus, viverra in aliquet sed, 
ullamcorper et justo. In et elementum sem.</p>
	
<p>Vivamus purus tellus, feugiat ac convallis sed, sollicitudin id justo. Donec 
aliquam ullamcorper ipsum, congue pretium dui interdum a. Maecenas vel neque ac 
magna ornare tempus. Pellentesque tincidunt tincidunt sollicitudin. Morbi neque 
nulla, porttitor vel sagittis quis, dapibus ut leo. In a arcu nec magna cursus 
porta. Donec fermentum dolor a augue viverra feugiat vel eu odio. Sed eu dapibus 
libero. Quisque lacus risus, accumsan ac suscipit non, molestie vel neque. 
Aliquam consequat non neque at molestie. Nunc sed erat ultrices, viverra elit 
quis, tincidunt purus. Fusce vitae diam auctor, ultricies massa at, dictum metus. 
Ut at nibh id velit sollicitudin facilisis ut sit amet dui. Sed ac sapien 
dignissim, accumsan metus et, tempor est.</p>
	
<p>Praesent mollis sagittis neque vel pellentesque. Phasellus laoreet sollicitudin 
ante quis consectetur. Pellentesque hendrerit porta commodo. Proin eget congue 
mauris. Ut nec ornare tellus, id rhoncus nibh. Donec eget elit non nunc egestas 
tempor ac quis massa. Ut nisi augue, gravida in quam aliquet, mollis varius 
augue. Nam vehicula commodo egestas. Phasellus vel odio sollicitudin, sodales 
lacus non, lobortis lorem. Quisque nisl metus, porta vitae mollis sit amet, 
semper eu nulla. Maecenas rhoncus urna ac lacus facilisis, fringilla suscipit 
libero pharetra. Aliquam ornare metus eget magna accumsan tincidunt.</p>
	
<p>In pellentesque neque vel ex sodales feugiat vel nec nibh. Nullam eleifend velit 
at enim congue tempor. Suspendisse gravida gravida enim id convallis. Class 
aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos 
himenaeos. Pellentesque rutrum orci in mi consectetur, sit amet bibendum elit 
vehicula. Curabitur tincidunt metus id ex pulvinar, in cursus dui interdum. 
Curabitur nec semper lectus, a tempor dui. Integer porta ligula nec sollicitudin 
feugiat. Vivamus ornare ligula vel elit pellentesque, id sagittis neque 
dignissim. Nulla purus nunc, fermentum ut efficitur eu, ultrices vel odio. 
Proin id accumsan nisi. Praesent sem felis, lacinia vel quam a, interdum 
fringilla velit.</p>

</body>
</html>`

	stat, err := wstat.FromHTML(strings.NewReader(sample))
	if err != nil {
		panic(err)
	}

	fmt.Println("reading time:", stat)

}
Output:

reading time: 2m36s (520 words)

func ReadFrom

func ReadFrom(r io.Reader) (c Counter, err error)

ReadFrom returns statistical information about the text read from a stream.

func String

func String(s string) (c Counter)

String returns statistical information about the text in a string.

Example
package main

import (
	"fmt"

	"github.com/mdigger/wstat"
)

func main() {
	sample := `Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Nunc sit amet ipsum vel nunc interdum ultricies eu non augue. Donec sit amet 
nisl aliquet, ultricies enim id, malesuada libero. Ut maximus felis neque, sed 
porta est accumsan in. Curabitur tincidunt fringilla ultrices. Suspendisse 
porttitor non mauris quis tincidunt. Vivamus sit amet ante vel dui pellentesque 
mollis sit amet sit amet nibh. Vivamus id ante ultricies mi tincidunt sodales et 
pretium ex. In placerat purus vitae ligula tincidunt consectetur. Vivamus vel 
leo ut est molestie molestie non et odio. Nam a iaculis magna, sit amet accumsan 
elit. Nullam quam sapien, accumsan nec porta non, sollicitudin ut magna. 
Suspendisse sed gravida nisl. Nullam porta ultricies pellentesque. Nunc viverra 
convallis mauris, ac aliquam velit commodo in. Nulla facilisis commodo massa in 
egestas. Quisque at enim risus.

Nulla facilisi. Morbi odio ligula, hendrerit vitae mi ullamcorper, fermentum 
laoreet ligula. Aliquam ornare enim nec tortor sagittis faucibus. Morbi pretium 
dui at nibh placerat semper. Maecenas et libero vitae orci fringilla pretium sit 
amet a est. In at ipsum est. Sed laoreet efficitur consequat. Ut pharetra mauris 
sed mi consequat, ac suscipit dolor convallis. Vestibulum in est sollicitudin, 
mattis urna a, malesuada felis. Duis nibh lectus, viverra in aliquet sed, 
ullamcorper et justo. In et elementum sem.
	
Vivamus purus tellus, feugiat ac convallis sed, sollicitudin id justo. Donec 
aliquam ullamcorper ipsum, congue pretium dui interdum a. Maecenas vel neque ac 
magna ornare tempus. Pellentesque tincidunt tincidunt sollicitudin. Morbi neque 
nulla, porttitor vel sagittis quis, dapibus ut leo. In a arcu nec magna cursus 
porta. Donec fermentum dolor a augue viverra feugiat vel eu odio. Sed eu dapibus 
libero. Quisque lacus risus, accumsan ac suscipit non, molestie vel neque. 
Aliquam consequat non neque at molestie. Nunc sed erat ultrices, viverra elit 
quis, tincidunt purus. Fusce vitae diam auctor, ultricies massa at, dictum metus. 
Ut at nibh id velit sollicitudin facilisis ut sit amet dui. Sed ac sapien 
dignissim, accumsan metus et, tempor est.
	
Praesent mollis sagittis neque vel pellentesque. Phasellus laoreet sollicitudin 
ante quis consectetur. Pellentesque hendrerit porta commodo. Proin eget congue 
mauris. Ut nec ornare tellus, id rhoncus nibh. Donec eget elit non nunc egestas 
tempor ac quis massa. Ut nisi augue, gravida in quam aliquet, mollis varius 
augue. Nam vehicula commodo egestas. Phasellus vel odio sollicitudin, sodales 
lacus non, lobortis lorem. Quisque nisl metus, porta vitae mollis sit amet, 
semper eu nulla. Maecenas rhoncus urna ac lacus facilisis, fringilla suscipit 
libero pharetra. Aliquam ornare metus eget magna accumsan tincidunt.
	
In pellentesque neque vel ex sodales feugiat vel nec nibh. Nullam eleifend velit 
at enim congue tempor. Suspendisse gravida gravida enim id convallis. Class 
aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos 
himenaeos. Pellentesque rutrum orci in mi consectetur, sit amet bibendum elit 
vehicula. Curabitur tincidunt metus id ex pulvinar, in cursus dui interdum. 
Curabitur nec semper lectus, a tempor dui. Integer porta ligula nec sollicitudin 
feugiat. Vivamus ornare ligula vel elit pellentesque, id sagittis neque 
dignissim. Nulla purus nunc, fermentum ut efficitur eu, ultrices vel odio. 
Proin id accumsan nisi. Praesent sem felis, lacinia vel quam a, interdum 
fringilla velit.`

	stat := wstat.String(sample)
	fmt.Printf(`
--- stats -----------
chars:        %v
spaces:       %v
puncts:       %v
numbers:      %v
words:        %v
--- pages -----------
typewritten:  %v
author's:     %v
--- reading time ----
duration:     %v
---------------------
`,
		stat.Chars, stat.Spaces, stat.Puncts, stat.Numbers, stat.Words,
		stat.Pages(), stat.AuthorPages(), stat.Duration(228))

}
Output:

--- stats -----------
chars:        3504
spaces:       566
puncts:       111
numbers:      0
words:        518
--- pages -----------
typewritten:  2
author's:     0.0876
--- reading time ----
duration:     2m16s
---------------------

func Sum

func Sum(counters ...Counter) (c Counter)

Sum returns a new statistics counter holding the summed data of the given counters.
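A plausible reading of Sum is a field-by-field addition. The sketch below shows that reading with a local `counter` type and a hypothetical `sum` function; it is an assumption about the behaviour, not the library's code.

```go
package main

import "fmt"

// counter is a local stand-in for the library's Counter.
type counter struct {
	Chars, Spaces, Puncts, Numbers, Words int
}

// sum adds the counters field by field and returns the total.
func sum(counters ...counter) (total counter) {
	for _, c := range counters {
		total.Chars += c.Chars
		total.Spaces += c.Spaces
		total.Puncts += c.Puncts
		total.Numbers += c.Numbers
		total.Words += c.Words
	}
	return total
}

func main() {
	a := counter{Chars: 10, Spaces: 1, Words: 2}
	b := counter{Chars: 5, Puncts: 1, Words: 1}
	fmt.Printf("%+v\n", sum(a, b))
	// {Chars:15 Spaces:1 Puncts:1 Numbers:0 Words:3}
}
```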

func (Counter) AuthorPages

func (c Counter) AuthorPages() float32

AuthorPages returns the number of author's sheets (AuthorChars characters each) that the processed text occupies.

func (Counter) Duration

func (c Counter) Duration(wps int) time.Duration

Duration returns the approximate reading time of the text at the given reading speed. Despite its name, the wps parameter is expressed in words per minute.

Average reading speed by language (words per minute):

English: 228
Spanish: 218
French: 195
Russian: 184
Turkish: 166
Finnish: 161
Chinese: 158
Arabic: 138
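The reading time is the word count divided by the reading speed. A minimal sketch of that arithmetic follows; `readingTime` is a hypothetical function, and rounding to whole seconds is an assumption chosen to match the example output above (518 words at 228 words per minute gives 2m16s).

```go
package main

import (
	"fmt"
	"time"
)

// readingTime converts a word count and a speed in words per minute into
// a duration rounded to whole seconds. Non-positive speeds yield zero.
func readingTime(words, wpm int) time.Duration {
	if wpm <= 0 {
		return 0
	}
	minutes := float64(words) / float64(wpm)
	d := time.Duration(minutes * float64(time.Minute))
	return d.Round(time.Second)
}

func main() {
	fmt.Println(readingTime(518, 228)) // 2m16s
	fmt.Println(readingTime(520, 200)) // 2m36s
}
```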

func (Counter) Pages

func (c Counter) Pages() int

Pages returns the approximate number of standard typewritten pages (PageChars characters each) that the processed text occupies.

func (*Counter) ReadFrom

func (c *Counter) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom calculates statistics for the text read from the stream and adds them to the counter. It implements the io.ReaderFrom interface.

func (*Counter) Reset

func (c *Counter) Reset()

Reset resets all counters.

func (Counter) String

func (c Counter) String() string

String returns a string with the reading time and the number of words. The reading time is computed at 200 words per minute, a default commonly used for this kind of estimate.

func (*Counter) Write

func (c *Counter) Write(s []byte) (n int, err error)

Write adds the given bytes to the text statistics. It implements the io.Writer interface.
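Implementing io.Writer means a counter can be the destination of io.Copy or any other writer-based pipeline, with text arriving in arbitrary chunks. The sketch below illustrates the idea with a hypothetical, ASCII-only `wordCounter` that keeps its word state between calls so that a word split across two chunks is counted once. It is not the library's implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// wordCounter is a hypothetical streaming counter that handles ASCII
// whitespace only.
type wordCounter struct {
	Words  int
	inWord bool // carries the word state across Write calls
}

// Write counts the words in p and never fails.
func (w *wordCounter) Write(p []byte) (int, error) {
	for _, b := range p {
		if b == ' ' || b == '\n' || b == '\t' || b == '\r' {
			w.inWord = false
		} else if !w.inWord {
			w.inWord = true
			w.Words++
		}
	}
	return len(p), nil
}

// countWords streams each chunk into a single counter via io.Copy.
func countWords(chunks ...string) int {
	var w wordCounter
	for _, c := range chunks {
		if _, err := io.Copy(&w, strings.NewReader(c)); err != nil {
			panic(err)
		}
	}
	return w.Words
}

func main() {
	// "world" is split across two chunks but counted once.
	fmt.Println(countWords("hello wor", "ld again")) // 3
}
```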

func (*Counter) WriteString

func (c *Counter) WriteString(s string) (n int, err error)

WriteString adds the given string to the text statistics. It implements the io.StringWriter interface.
