textplain

package module
v0.2.9 Latest
Published: Mar 8, 2024 License: MIT Imports: 8 Imported by: 1

README

Textplain

This project began as a port of the html_to_plaintext logic from github.com/premailer/premailer. It applies the same basic set of rules for generating a text/plain copy of an email, given the text/html version.

Usage

myHTML := `<html><body>Hello World</body></html>`
// Convert returns (string, error); check err before using the result.
myPlaintext, err := textplain.Convert(myHTML, textplain.DefaultLineLength)

By default, Convert applies a word-wrapping algorithm that is also supplied standalone as WordWrap.

wrapped := textplain.WordWrap("hello world, here is some text", 15)

Options

Two plaintexters are supplied:

converter := textplain.NewTreeConverter()

Uses the x/net/html package to parse the supplied HTML into a tree and performs a single-pass conversion to plaintext. This is the best-performing option and is recommended for general usage.

The library still includes the older converter option:

converter := textplain.NewRegexpConverter()

This is the most "true to premailer" implementation. It uses regular expressions, which is problematic for two reasons: the regexps must be compiled, and regular expressions in Go use mutexes internally, which limits concurrency.

Documentation

Constants

const (
	DefaultLineLength = 65
)

Defaults

Variables

var (
	ErrBodyNotFound = errors.New("could not find a `body` element in your html document")
)

Well-defined errors
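Because the error is exported as a package-level value, callers can compare against it with `errors.Is`. The sketch below is self-contained and uses local stand-ins: `errBodyNotFound` mirrors the message above, and `convertStub` is a hypothetical placeholder whose body check is an assumption, since the documentation does not say exactly when the library returns this error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errBodyNotFound is a local stand-in for textplain.ErrBodyNotFound.
var errBodyNotFound = errors.New("could not find a `body` element in your html document")

// convertStub is a hypothetical placeholder for a converter's Convert
// method. The "<body" check is an assumption made for illustration only.
func convertStub(document string, lineLength int) (string, error) {
	if !strings.Contains(document, "<body") {
		return "", errBodyNotFound
	}
	return "converted", nil
}

// describe shows how a caller might branch on the sentinel error.
func describe(document string) string {
	_, err := convertStub(document, 65)
	switch {
	case errors.Is(err, errBodyNotFound):
		return "no body"
	case err != nil:
		return "other error"
	default:
		return "ok"
	}
}

func main() {
	fmt.Println(describe("<p>fragment</p>"))
	fmt.Println(describe("<html><body>hi</body></html>"))
}
```

With the real package, the same `errors.Is` check would be written against `textplain.ErrBodyNotFound`.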

Functions

func Convert

func Convert(document string, lineLength int) (string, error)

Convert is a convenience function so the library can be used without initializing a converter. Because this library relies heavily on regexp objects, it may act as a bottleneck to concurrency due to thread-safety mutexes in *regexp.Regexp internals.

func MustConvert added in v0.2.0

func MustConvert(document string, lineLength int) string

func WordWrap

func WordWrap(txt string, lineLength int) string

WordWrap searches for logical breakpoints (whitespace) in each line and tries to trim each line to the specified length. Note: this diverges from the regex approach in premailer, which I found to be significantly slower in cases with long unbroken lines: https://github.com/premailer/premailer/blob/7c94e7a/lib/premailer/html_to_plain_text.rb#L116
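To make the idea of breaking at whitespace concrete, here is a minimal greedy sketch using only the standard library. `wrapSketch` is a hypothetical name and this is not the library's implementation: it collapses runs of whitespace, measures length in bytes, and leaves a word longer than the limit unbroken on its own line, all of which are simplifying assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapSketch greedily wraps txt so that lines stay within lineLength bytes,
// breaking only at whitespace. A single word longer than lineLength is left
// unbroken on its own line. Illustration only, not the library's code.
func wrapSketch(txt string, lineLength int) string {
	var out []string
	for _, line := range strings.Split(txt, "\n") {
		words := strings.Fields(line)
		if len(words) == 0 {
			out = append(out, "") // preserve blank lines
			continue
		}
		cur := words[0]
		for _, w := range words[1:] {
			// Start a new line if adding " "+w would exceed the limit.
			if len(cur)+1+len(w) > lineLength {
				out = append(out, cur)
				cur = w
				continue
			}
			cur += " " + w
		}
		out = append(out, cur)
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Println(wrapSketch("hello world, here is some text", 15))
}
```

Splitting on whitespace once and accumulating words keeps the work linear in the input length, which is the kind of behavior one would want for long unbroken lines.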

Types

type Converter

type Converter interface {
	Convert(string, int) (string, error)
}
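Because Converter is a one-method interface, calling code can depend on it rather than on a concrete converter, which makes it easy to swap implementations or substitute a test double. The sketch below redeclares the interface locally so it is self-contained; `fakeConverter` and `renderText` are hypothetical names that are not part of the library.

```go
package main

import (
	"fmt"
	"strings"
)

// Converter mirrors the interface shown above.
type Converter interface {
	Convert(string, int) (string, error)
}

// fakeConverter is a hypothetical test double that upper-cases its input
// instead of converting HTML.
type fakeConverter struct{}

func (fakeConverter) Convert(document string, lineLength int) (string, error) {
	return strings.ToUpper(document), nil
}

// renderText depends only on the interface, so any Converter can be
// passed in, including the tree or regexp converters from the library.
func renderText(c Converter, htmlDoc string) (string, error) {
	return c.Convert(htmlDoc, 65) // 65 matches DefaultLineLength above
}

func main() {
	out, err := renderText(fakeConverter{}, "hello")
	fmt.Println(out, err)
}
```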

func NewRegexpConverter added in v0.2.0

func NewRegexpConverter() Converter

NewRegexpConverter returns a new regexp-based textplain converter object.

func NewTreeConverter added in v0.2.0

func NewTreeConverter() Converter

type RegexpConverter added in v0.2.0

type RegexpConverter struct {
	// contains filtered or unexported fields
}

func (*RegexpConverter) Convert added in v0.2.0

func (t *RegexpConverter) Convert(document string, lineLength int) (string, error)

Convert returns a text-only version of the supplied document in UTF-8 format, with all HTML tags removed.

type TreeConverter added in v0.2.0

type TreeConverter struct{}

func (*TreeConverter) Convert added in v0.2.0

func (t *TreeConverter) Convert(document string, lineLength int) (string, error)
