mangle

v0.0.0-...-c997607
Published: Jan 7, 2015 License: BSD-2-Clause Imports: 11 Imported by: 0

README

mangle

-- import "github.com/grugnog/mangle"

Package mangle is a sanitization / data masking library for Go (golang).

Purpose

This library provides functionality to sanitize text and HTML data. This can be integrated into tools that export or manipulate databases/files so that confidential data is not exposed to staging, development or testing systems.

Getting Started

Install mangle:

go get github.com/grugnog/mangle

Try the command line tool:

go get github.com/grugnog/mangle/manglefile

# Generate a list of words using your system dictionary and download "War and Peace" for testing.
aspell -d en dump master | aspell -l en expand > corpus.txt
wget http://www.gutenberg.org/cache/epub/2600/pg2600.txt

# Run the sample text through manglefile via stdin and stdout.
cat pg2600.txt | manglefile -corpus=corpus.txt -secret=replace-with-a-secure-passphrase | less

For basic usage of mangle as a Go library on strings, see the Example below. To process text or HTML through io.Reader/io.Writer, see the manglefile source code.

Description

Secure - every non-punctuation, non-tag word is replaced, with no reuse of source words.

Fast - around 350,000 words/sec ("War and Peace" mangled in < 2s).

Deterministic - words are selected from a corpus deterministically, so subsequent runs produce the same word replacements as long as the user secret is not changed. This lets automated tests run against the sanitized data, and makes rsync of mangled versions of large, slowly changing data sets efficient.

Accurate - maintains a high level of resemblance to the source text, to allow realistic testing of search indexes, page layout, etc. Replacement words are all natural words from a user-defined corpus. Source word length, punctuation, title case, all caps, and HTML tags and attributes are preserved.
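The case handling described above can be illustrated with a minimal sketch. It is not taken from the mangle source: matchCase is a hypothetical helper, and it assumes the replacement word comes from a lowercase corpus and has already been chosen to match the source word's length.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// matchCase is a hypothetical helper (not part of the mangle API). It copies
// the case "shape" of source onto replacement: all caps stays all caps, a
// leading capital stays a leading capital, anything else is left unchanged.
func matchCase(source, replacement string) string {
	hasLetter, allUpper := false, true
	for _, r := range source {
		if unicode.IsLetter(r) {
			hasLetter = true
			if !unicode.IsUpper(r) {
				allUpper = false
			}
		}
	}
	if hasLetter && allUpper {
		return strings.ToUpper(replacement)
	}
	first, _ := utf8.DecodeRuneInString(source)
	if unicode.IsUpper(first) {
		rs := []rune(replacement)
		if len(rs) > 0 {
			rs[0] = unicode.ToUpper(rs[0])
		}
		return string(rs)
	}
	return replacement
}

func main() {
	fmt.Println(matchCase("Hello", "world")) // prints "World"
	fmt.Println(matchCase("NASA", "moon"))   // prints "MOON"
	fmt.Println(matchCase("peace", "table")) // prints "table"
}
```

Mixed-case words such as "iPhone" fall through to the last branch in this sketch; how the library treats them is not stated in the documentation.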

Usage

func BuildCorpus
func BuildCorpus(scanner *bufio.Scanner) ([255][]string, error)

BuildCorpus is a helper function that reads words from a bufio.Scanner and returns an array indexed by word length, in which each element is a slice of the words of that length.

func ReadCorpus
func ReadCorpus(filepath string) ([255][]string, error)

ReadCorpus is a helper function that opens and reads a corpus file of words and returns an array indexed by word length, in which each element is a slice of the words of that length.

type Mangle
type Mangle struct {
	// Corpus of words to use as replacements. An array of word lengths, each
	// containing an array of words of that length.
	Corpus [255][]string
	// A sufficiently long secret, used as a salt so rainbow tables cannot be
	// used to reverse the hashes.
	Secret string
}

Mangle is used to configure an instance prior to mangling.
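The role of Secret as a salt can be illustrated with a keyed hash. The sketch below is one plausible way to get deterministic, secret-dependent word selection; the use of HMAC-SHA256 and the pickReplacement helper are assumptions for illustration, and the library's actual hashing scheme may differ.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// pickReplacement is a hypothetical helper (not part of the mangle API).
// It hashes word with secret and uses the result to index into bucket,
// which is assumed to hold distinct corpus words of the same length as word.
// It reports false when the bucket is empty, so the caller never falls back
// to emitting the source word by accident.
func pickReplacement(secret, word string, bucket []string) (string, bool) {
	if len(bucket) == 0 {
		return "", false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(word))
	sum := mac.Sum(nil)
	n := uint64(len(bucket))
	idx := binary.BigEndian.Uint64(sum[:8]) % n
	// Avoid returning the source word itself when another choice exists.
	if bucket[idx] == word && n > 1 {
		idx = (idx + 1) % n
	}
	return bucket[idx], true
}

func main() {
	bucket := []string{"apple", "bread", "chair", "dream"}
	first, _ := pickReplacement("replace-with-a-secure-passphrase", "hello", bucket)
	second, _ := pickReplacement("replace-with-a-secure-passphrase", "hello", bucket)
	fmt.Println(first == second) // prints "true": same secret and word, same result
}
```

Without the secret, an attacker who knows the corpus cannot precompute the word-to-replacement table, which is the rainbow-table concern mentioned in the struct comment.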

func (Mangle) MangleHTML
func (m Mangle) MangleHTML(r io.Reader, w io.Writer) error

MangleHTML operates on HTML using an io interface, preserving all HTML tags (including tag attributes), but mangling all content around and between tags.

func (Mangle) MangleIO
func (m Mangle) MangleIO(r io.Reader, w io.Writer) error

MangleIO operates on an io interface, parsing as plain text, and is preferable for long strings.

func (Mangle) MangleString
func (m Mangle) MangleString(s string) string

MangleString operates on strings, and is preferable if you have many short strings to operate on.

Example

Reads a corpus from a file, initializes a Mangle, runs a string through it and prints the output.

package main

import (
	"fmt"
	"log"

	"github.com/grugnog/mangle"
)

func main() {
	// Load the replacement words, grouped by length, from a corpus file.
	corpus, err := mangle.ReadCorpus("corpus.txt")
	if err != nil {
		log.Fatalf("Corpus read error: %s", err)
	}
	// Replacements stay the same across runs as long as the secret is unchanged.
	mangler := mangle.Mangle{Corpus: corpus, Secret: "replace-with-a-secure-passphrase"}
	out := mangler.MangleString("Hello world!")
	fmt.Println(out)
}

Directories

Path Synopsis
manglefile Simple command line text sanitization / data masking tool.
