rndrec

Version: v0.0.0-...-12cf073
Published: Nov 25, 2018 License: MIT Imports: 9 Imported by: 0

README

rndrec

Package rndrec randomly selects records from a pool based on their relative weights. For example, a record with a relative weight of 50 will be selected, on average, five times as often as a record with a relative weight of 10. This is useful for generating plausible data sets for testing purposes, for example names based on frequency or regions based on population.

Example

Given a file named "continent_population.csv" with the following contents,

Africa|1,030,400,000
Antarctica|0
Asia|4,157,300,000
Australia|36,700,000
Europe|738,600,000
North America|461,114,000
South America|390,700,000

the following call will create a weighted record sample source:

var r *SrcType
var err error

r, err = NewRandomRecordSourceFromFile("continent_population.csv", 1, '|', 0)

The integer argument following the filename is the zero-based column that contains the relative weights in numeric form. Note that the commas in these values are disregarded. The rune argument following the weight column specifies the field separator. All input records are assumed to be delimited with newlines. The final argument is the seed value for the instance's random number source. This can be used to generate repeatable sequences. time.Now().Unix() can be used if repeatable sequences are not desired.
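The effect of the seed can be illustrated independently of rndrec. The following is a minimal sketch that assumes a math/rand style generator; the helper name draws is hypothetical and not part of the package. Two generators created with the same seed yield the same sequence, while a time-based seed generally does not.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// draws returns n pseudo-random integers in [0, 100) from a generator
// seeded with seed. It is a hypothetical helper for illustration only.
func draws(seed int64, n int) []int {
	rnd := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = rnd.Intn(100)
	}
	return out
}

func main() {
	// Same seed, same sequence: useful for repeatable test data.
	fmt.Println(draws(0, 5))
	fmt.Println(draws(0, 5))

	// A time-based seed gives a sequence that varies from run to run.
	fmt.Println(draws(time.Now().Unix(), 5))
}
```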

Call r.Record() to randomly retrieve weighted records:

for row := 0; row < 8; row++ {
	for col := 0; col < 8; col++ {
		if col > 0 {
			fmt.Printf(" | ")
		}
		rec := r.Record()
		fmt.Printf("%s", rec[0])
	}
	fmt.Println("")
}

This will generate the following output:

South America | Asia | Asia | Africa | Asia | Asia | Asia | Asia
North America | Asia | Asia | North America | Europe | Asia | Asia | Asia
Europe | Africa | Europe | Europe | Asia | Asia | Asia | Asia
Asia | Asia | Asia | Asia | Asia | Asia | Africa | Asia
Asia | Asia | Asia | Asia | Asia | Asia | Asia | Africa
Asia | Africa | Asia | Asia | Europe | Africa | North America | North America
Asia | Europe | Africa | Europe | Asia | South America | Africa | Europe
Asia | Europe | Africa | Asia | Asia | Asia | Asia | Africa
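As a rough check on output like this, the expected share of each record is its weight divided by the sum of all weights. The sketch below computes those shares for the continent data; the function name shares is an assumption made for illustration and is not part of rndrec.

```go
package main

import "fmt"

// shares converts relative weights into expected selection probabilities.
// If the weights sum to zero, every share is reported as zero.
func shares(weights []int64) []float64 {
	var total int64
	for _, w := range weights {
		total += w
	}
	out := make([]float64, len(weights))
	if total == 0 {
		return out
	}
	for i, w := range weights {
		out[i] = float64(w) / float64(total)
	}
	return out
}

func main() {
	names := []string{"Africa", "Antarctica", "Asia", "Australia",
		"Europe", "North America", "South America"}
	weights := []int64{1030400000, 0, 4157300000, 36700000,
		738600000, 461114000, 390700000}
	for i, p := range shares(weights) {
		fmt.Printf("%s: %.2f\n", names[i], p)
	}
}
```

Asia accounts for roughly 61% of the total weight, which is why it dominates the sample, and Antarctica, with a weight of 0, is never expected to appear.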

Installation

To install the package on your system, run

go get github.com/jung-kurt/rndrec

License

rndrec is released under the MIT License.

Documentation

Types

type SrcType

type SrcType struct {
	// contains filtered or unexported fields
}

SrcType is used to generate plausible random records based on a list of weighted records.

Example (EqualWeight)

Demonstrate implicit equal weight of records

var list = [][]string{
	{"red"},
	{"green"},
	{"blue"},
}
report(list, -1)
Output:

blue: 0.33
green: 0.33
red: 0.33

Example (File)

Demonstrate selection of records from a file

var r *SrcType
var err error

r, err = NewRandomRecordSourceFromFile("data/continent_population.csv", 1, '|', 0)
if err == nil {
	srcReport(r, 1)
} else {
	fmt.Printf("%s\n", err)
}
Output:

Africa: 0.15
Asia: 0.61
Australia: 0.01
Europe: 0.11
North America: 0.07
South America: 0.06

Example (Names)

Generate dummy names based on 1990 US census data

const (
	cnLast = iota
	cnFemale
	cnMale
	cnCount
)
var filenameList = [cnCount]string{
	"data/us/name_last.csv",
	"data/us/name_first_female.csv",
	"data/us/name_first_male.csv",
}
var srcList [cnCount]*SrcType
var err error
var rnd *rand.Rand
var first, mid, last []string
var j, k int

for j = 0; j < cnCount && err == nil; j++ {
	srcList[j], err = NewRandomRecordSourceFromFile(filenameList[j], 1, '|', 0)
}
if err == nil {
	rnd = rand.New(rand.NewSource(0))
	for j = 0; j < 16; j++ {
		if rnd.Intn(5) < 4 {
			k = cnFemale
		} else {
			k = cnMale
		}
		first = srcList[k].Record()
		mid = srcList[k].Record()
		last = srcList[cnLast].Record()
		fmt.Printf("%s %s %s\n", first[0], mid[0][0:1], last[0])
	}
}
if err != nil {
	fmt.Printf("%s\n", err)
}
Output:

Kendall T Creel
Earl J Cox
Jasmin A Stein
Yolanda L Brown
Evelyn M Perkins
Sharon Y Foster
Lea S Carter
Martha A Potts
Jeannie V Ayres
Veronica B Wright
Harriet M Simmons
Janie L Colburn
Anthony P Pulliam
Teresa D Coleman
Florence C Sweeney
Sarah B Ramirez

Example (Population)

Demonstrate selection of records from a structured data source

var list = [][]string{
	{"Africa", "1,030,400,000"},
	{"Antarctica", "0"},
	{"Asia", "4,157,300,000"},
	{"Australia", "36,700,000"},
	{"Europe", "738,600,000"},
	{"North America", "461,114,000"},
	{"South America", "390,700,000"},
}
report(list, 1)
Output:

Africa: 0.15
Asia: 0.61
Australia: 0.01
Europe: 0.11
North America: 0.07
South America: 0.06

Example (Readme)

Simple demonstration for readme file

var r *SrcType
var err error
var rec []string

r, err = NewRandomRecordSourceFromFile("data/continent_population.csv", 1, '|', 0)
if err == nil {
	for row := 0; row < 8; row++ {
		for col := 0; col < 8; col++ {
			if col > 0 {
				fmt.Printf(" | ")
			}
			rec = r.Record()
			fmt.Printf("%s", rec[0])
		}
		fmt.Println("")
	}
} else {
	fmt.Printf("%s\n", err)
}
Output:

South America | Asia | Asia | Africa | Asia | Asia | Asia | Asia
North America | Asia | Asia | North America | Europe | Asia | Asia | Asia
Europe | Africa | Europe | Europe | Asia | Asia | Asia | Asia
Asia | Asia | Asia | Asia | Asia | Asia | Africa | Asia
Asia | Asia | Asia | Asia | Asia | Asia | Asia | Africa
Asia | Africa | Asia | Asia | Europe | Africa | North America | North America
Asia | Europe | Africa | Europe | Asia | South America | Africa | Europe
Asia | Europe | Africa | Asia | Asia | Asia | Asia | Africa

Example (Simple)

Simple example of selection from weighted records.

var list = [][]string{
	{"20%", "20"},
	{"30%", "30"},
	{"10%", "10"},
	{"40%", "40"},
}
report(list, 1)
Output:

10%: 0.10
20%: 0.20
30%: 0.30
40%: 0.40
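
Several examples above call helpers named report and srcReport, whose source is not shown on this page. The following is only a guess at the general idea, not the package's actual helper: a hypothetical tally function draws many records and reports how often each first field appeared. The draw argument stands in for a call such as r.Record, and a deterministic stub is used here so that the sketch runs without the package.

```go
package main

import (
	"fmt"
	"sort"
)

// tally calls draw n times and returns, for each distinct first field,
// the fraction of draws in which it appeared. It is a hypothetical helper.
func tally(draw func() []string, n int) map[string]float64 {
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		counts[draw()[0]]++
	}
	freq := make(map[string]float64, len(counts))
	for key, c := range counts {
		freq[key] = float64(c) / float64(n)
	}
	return freq
}

func main() {
	// Deterministic stand-in for a record source: it cycles through a
	// fixed list in which "40%" fills four of every ten slots.
	seq := []string{"40%", "30%", "40%", "20%", "30%",
		"40%", "10%", "20%", "30%", "40%"}
	i := 0
	draw := func() []string {
		rec := []string{seq[i%len(seq)]}
		i++
		return rec
	}

	freq := tally(draw, 1000)

	// Sort the keys so the report is printed in a stable order.
	keys := make([]string, 0, len(freq))
	for k := range freq {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %.2f\n", k, freq[k])
	}
}
```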

func NewRandomRecordSource

func NewRandomRecordSource(recs [][]string, weightColPos int, seed int64) (src *SrcType, err error)

NewRandomRecordSource processes a list of multi-field records in which each field is a string. With one exception, one column must hold an integer weight. In this column, specified by weightColPos, every underscore and comma is removed and the remaining string is parsed as an integer. The values in this column are relative weights; that is, a record whose weight is twice that of another record will be selected by Record() twice as often on average. The weights do not have to sum to any particular value.

The exception to the requirement that one field be a weight is when all records are weighted equally. In this case, weightColPos can be set to -1 and records do not need a weight column.

Records returned by the Record() method depend on a local pseudo-random number generator; seed is used to seed this generator. If any value in the column specified by weightColPos cannot be parsed as an integer, or the sum of the weights is zero, or the number of records is zero, an error is returned. Otherwise, err is nil and src may be used to retrieve records that are distributed according to their relative weights.
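
The weight parsing described above might look roughly like the following. This is a sketch under stated assumptions rather than the package's own code: the name parseWeight is hypothetical, and details such as whitespace handling or the integer width used internally are not specified by the documentation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseWeight removes commas and underscores from s and parses what is
// left as a base-10 integer. It is a hypothetical helper for illustration.
func parseWeight(s string) (int64, error) {
	cleaned := strings.NewReplacer(",", "", "_", "").Replace(s)
	return strconv.ParseInt(cleaned, 10, 64)
}

func main() {
	for _, s := range []string{"1,030,400,000", "36_700_000", "0", "n/a"} {
		w, err := parseWeight(s)
		if err != nil {
			// A value that is not an integer is reported as an error.
			fmt.Printf("%q: %v\n", s, err)
			continue
		}
		fmt.Printf("%q: %d\n", s, w)
	}
}
```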

func NewRandomRecordSourceFromFile

func NewRandomRecordSourceFromFile(fileStr string, weightColPos int, fieldSep rune, seed int64) (src *SrcType, err error)

NewRandomRecordSourceFromFile processes a list of multi-field records stored in a delimiter-separated text file with the filename specified by fileStr. Records must be separated by newlines. Fields are separated by the value specified by fieldSep. For more information on the return value and the other arguments, see NewRandomRecordSource().

func NewRandomRecordSourceFromReader

func NewRandomRecordSourceFromReader(r io.Reader, weightColPos int, fieldSep rune, seed int64) (src *SrcType, err error)

NewRandomRecordSourceFromReader processes a list of multi-field records in delimiter-separated form that can be read with the io.Reader r. Records must be separated by newlines. Fields are separated by the value specified by fieldSep. For more information on the return value and the other arguments, see NewRandomRecordSource().
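
One way to read such input with the standard library is encoding/csv with a custom field separator. The sketch below shows that approach; whether rndrec itself uses encoding/csv is an assumption, and the names readRecords and readRecordsFromString are hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readRecords reads newline-delimited records from r and splits each one
// into fields on sep.
func readRecords(r io.Reader, sep rune) ([][]string, error) {
	cr := csv.NewReader(r)
	cr.Comma = sep
	cr.FieldsPerRecord = -1 // allow records with differing field counts
	return cr.ReadAll()
}

// readRecordsFromString wraps readRecords so the example needs no file
// on disk.
func readRecordsFromString(s string, sep rune) ([][]string, error) {
	return readRecords(strings.NewReader(s), sep)
}

func main() {
	data := "Africa|1,030,400,000\nAntarctica|0\nAsia|4,157,300,000\n"
	recs, err := readRecordsFromString(data, '|')
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, rec := range recs {
		// Commas inside a field are preserved because '|' is the separator.
		fmt.Println(rec[0], rec[1])
	}
}
```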

func (*SrcType) Record

func (r *SrcType) Record() []string

Record returns a random record based on its relative weight. For example, a record with a relative weight of 40 will be returned, on average, four times as often as a record with a relative weight of 10. The returned record is a slice of strings taken directly from the original list used to initialize the SrcType instance.
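
A common way to implement this kind of weighted selection is to store cumulative weights and binary-search a uniformly drawn number. The sketch below shows that technique; it is not necessarily how SrcType works internally, and the picker type and its methods are hypothetical. It assumes non-negative weights and one weight per record.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// picker selects records with probability proportional to their weights.
type picker struct {
	recs [][]string
	cum  []int64 // cum[i] is the sum of the weights of recs[0..i]
	rnd  *rand.Rand
}

// newPicker builds a picker in which weights[i] is the weight of recs[i].
// To keep the sketch short it panics if the total weight is not positive.
func newPicker(recs [][]string, weights []int64, seed int64) *picker {
	cum := make([]int64, len(weights))
	var total int64
	for i, w := range weights {
		total += w
		cum[i] = total
	}
	if total <= 0 {
		panic("total weight must be positive")
	}
	return &picker{recs: recs, cum: cum, rnd: rand.New(rand.NewSource(seed))}
}

// pick draws x uniformly from [0, total) and returns the first record
// whose cumulative weight exceeds x, so zero-weight records are skipped.
func (p *picker) pick() []string {
	x := p.rnd.Int63n(p.cum[len(p.cum)-1])
	i := sort.Search(len(p.cum), func(i int) bool { return p.cum[i] > x })
	return p.recs[i]
}

func main() {
	recs := [][]string{{"heavy"}, {"light"}}
	p := newPicker(recs, []int64{40, 10}, 0)
	counts := make(map[string]int)
	for i := 0; i < 10000; i++ {
		counts[p.pick()[0]]++
	}
	// "heavy" should appear roughly four times as often as "light".
	fmt.Println(counts)
}
```

Each selection costs one random draw and a binary search over the cumulative weights, so lookups stay fast even for large record lists.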

func (SrcType) String

func (r SrcType) String() string

String implements the fmt.Stringer interface.

Directories

Path Synopsis
This command reads various United States census files and generates files that are compatible with the rndrec package.
