plaintext

package
v0.0.0-...-90dfc71
Published: Mar 6, 2016 License: MIT Imports: 6 Imported by: 10

README

plaintext

Extract human languages in plain UTF-8 text from computer code and markup

The output is (or should be) line-preserving: no newlines are added or removed, so every piece of extracted text stays on its original line.

<p>
foo
</p>

becomes


foo
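
The line-preserving idea above can be sketched with a tiny tag stripper. This is only an illustration using the standard library: stripTags is a hypothetical helper, not the package's HTML extractor, and it assumes that dropping everything between '<' and '>' while keeping every newline is enough for the example.

```go
package main

import "fmt"

// stripTags is a hypothetical helper (not part of the package).
// It drops everything between '<' and '>' but always keeps '\n',
// so the output has exactly as many lines as the input.
func stripTags(raw []byte) []byte {
	out := make([]byte, 0, len(raw))
	inTag := false
	for _, b := range raw {
		switch {
		case b == '\n':
			out = append(out, b) // newlines survive even inside a tag
		case b == '<':
			inTag = true
		case b == '>' && inTag:
			inTag = false
		case !inTag:
			out = append(out, b)
		}
	}
	return out
}

func main() {
	in := []byte("<p>\nfoo\n</p>\n")
	fmt.Printf("%q\n", stripTags(in)) // prints "\nfoo\n\n"
}
```

Because newlines are copied through unconditionally, a word found on line N of the output can be reported against line N of the original file.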

Documentation

Functions

func InspectImageAlt

func InspectImageAlt(opt *HTMLText) error

InspectImageAlt is a sample option for NewHTMLText (work in progress).

func StripTemplate

func StripTemplate(raw []byte) []byte

StripTemplate is a work-in-progress routine that removes Go template markup from a file.
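
As a rough illustration of what removing template markup could look like, the sketch below deletes `{{ ... }}` actions with a regular expression. stripActions is a hypothetical stand-in, not the package's StripTemplate, and keeping the newlines inside a removed action is an assumption borrowed from the README's line-preserving goal.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// action matches a Go template action such as {{ .Name }}.
// The (?s) flag lets an action span several lines.
var action = regexp.MustCompile(`(?s)\{\{.*?\}\}`)

// stripActions is a hypothetical stand-in (not the package's StripTemplate).
func stripActions(raw []byte) []byte {
	return action.ReplaceAllFunc(raw, func(m []byte) []byte {
		// Keep only the newlines from the removed action (assumption:
		// the result should stay line-preserving).
		return bytes.Repeat([]byte("\n"), bytes.Count(m, []byte("\n")))
	})
}

func main() {
	in := []byte("Hello {{ .Name }}!\n{{ if .X }}\nyes\n{{ end }}\n")
	fmt.Printf("%q\n", stripActions(in)) // prints "Hello !\n\nyes\n\n"
}
```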

Types

type Extractor

type Extractor interface {
	Text([]byte) []byte
}

Extractor is an interface for extracting plaintext
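
Since the interface has a single method, any type with a matching Text method satisfies it. The sketch below redeclares Extractor locally so it runs on its own; passThrough and sameLineCount are hypothetical names used only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// Extractor is copied from the package documentation.
type Extractor interface {
	Text([]byte) []byte
}

// passThrough is a hypothetical extractor that returns its input
// unchanged, similar in spirit to the package's Identity type.
type passThrough struct{}

func (passThrough) Text(raw []byte) []byte { return raw }

// sameLineCount is a hypothetical check that an extractor neither
// adds nor removes newlines for the given input.
func sameLineCount(e Extractor, raw []byte) bool {
	nl := []byte("\n")
	return bytes.Count(e.Text(raw), nl) == bytes.Count(raw, nl)
}

func main() {
	var e Extractor = passThrough{}
	fmt.Println(sameLineCount(e, []byte("a\nb\n"))) // prints true
}
```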

func ExtractorByFilename

func ExtractorByFilename(filename string) (Extractor, error)

ExtractorByFilename returns a plaintext extractor chosen by a filename heuristic.
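
One plausible heuristic is to switch on the file extension. The sketch below only returns the name of the extractor type that might be picked; kindByFilename and its extension table are assumptions for illustration, not the package's actual mapping.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// kindByFilename is hypothetical: it names the extractor type that
// might be chosen for a file. The extension table is an assumption.
func kindByFilename(filename string) string {
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".go", ".c", ".h", ".java":
		return "GolangText"
	case ".html", ".htm":
		return "HTMLText"
	case ".md":
		return "MarkdownText"
	case ".sh", ".py", ".rb":
		return "ScriptText"
	default:
		return "Identity" // unknown files pass through unchanged
	}
}

func main() {
	for _, name := range []string{"main.go", "index.html", "README.md", "notes.txt"} {
		fmt.Println(name, "->", kindByFilename(name))
	}
}
```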

type GolangText

type GolangText struct {
}

GolangText extracts plaintext from Go and other similar C-like or Java-like source files.

TODO: study https://godoc.org/github.com/fluhus/godoc-tricks

It does not process embedded code blocks.

func NewGolangText

func NewGolangText() (*GolangText, error)

NewGolangText creates a new extractor

func (*GolangText) Text

func (p *GolangText) Text(raw []byte) []byte

Text satisfies the Extractor interface

ReplaceGo is a specialized routine for correcting Go source files. It currently checks only comments, not identifiers, for spelling.

Other items:

  • check strings, but need to ignore
      • import "statements"
      • import ( "blocks" )
  • skip the first comment (line 0) if it is a build comment

type HTMLText

type HTMLText struct {
	InspectImageAlt bool
}

HTMLText extracts plain text from HTML markup

func NewHTMLText

func NewHTMLText(options ...func(*HTMLText) error) (*HTMLText, error)

NewHTMLText creates a new HTMLText extractor, using options.
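
The variadic `func(*HTMLText) error` parameter follows the functional-options pattern. The sketch below redeclares the documented types locally to show how such a constructor typically applies its options; the bodies of InspectImageAlt and NewHTMLText here are assumptions, not the package's source.

```go
package main

import "fmt"

// HTMLText mirrors the struct shown in the documentation.
type HTMLText struct {
	InspectImageAlt bool
}

// InspectImageAlt is a guess at what the sample option does:
// it enables the field of the same name.
func InspectImageAlt(opt *HTMLText) error {
	opt.InspectImageAlt = true
	return nil
}

// NewHTMLText applies each option in order and stops at the first error
// (assumed behaviour, typical for this pattern).
func NewHTMLText(options ...func(*HTMLText) error) (*HTMLText, error) {
	p := &HTMLText{}
	for _, option := range options {
		if err := option(p); err != nil {
			return nil, err
		}
	}
	return p, nil
}

func main() {
	p, err := NewHTMLText(InspectImageAlt)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.InspectImageAlt) // prints true
}
```

With this pattern, new settings can be added later as further option functions without changing the constructor's signature.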

func (*HTMLText) Text

func (p *HTMLText) Text(raw []byte) []byte

Text satisfies the plaintext.Extractor interface

type Identity

type Identity struct {
}

Identity provides a pass-through plain text extractor

func NewIdentity

func NewIdentity() (*Identity, error)

NewIdentity creates an identity extractor

func (*Identity) Text

func (p *Identity) Text(raw []byte) []byte

Text satisfies the plaintext.Extractor interface

type MarkdownText

type MarkdownText struct {
	Extractor Extractor
}

MarkdownText extracts plain text from markdown sources

func NewMarkdownText

func NewMarkdownText(options ...func(*MarkdownText) error) (*MarkdownText, error)

NewMarkdownText creates a new extractor

func (*MarkdownText) Text

func (p *MarkdownText) Text(text []byte) []byte

Text extracts text from a markdown source

type ScriptText

type ScriptText struct {
}

ScriptText extracts plaintext from "generic script" languages that use the '#' character to denote a comment line. It's not so smart. TODO: add support for Ruby, multi-line comments

http://www.tutorialspoint.com/ruby/ruby_comments.htm
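
A minimal sketch of '#'-comment extraction might look like the following. hashComments is a hypothetical helper, not the package's implementation; treating only whole-line comments as text and skipping a "#!" shebang line are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// hashComments is a hypothetical helper (not the package's code).
// Lines whose first non-blank character is '#' keep their comment text;
// every other line becomes empty, so the line count is unchanged.
func hashComments(text []byte) []byte {
	lines := bytes.Split(text, []byte("\n"))
	for i, line := range lines {
		trimmed := bytes.TrimLeft(line, " \t")
		isComment := bytes.HasPrefix(trimmed, []byte("#")) &&
			!bytes.HasPrefix(trimmed, []byte("#!")) // assumption: skip shebang
		if isComment {
			lines[i] = bytes.TrimSpace(trimmed[1:])
		} else {
			lines[i] = nil
		}
	}
	return bytes.Join(lines, []byte("\n"))
}

func main() {
	in := []byte("#!/bin/sh\n# say hello\necho hi # trailing\n")
	fmt.Printf("%q\n", hashComments(in)) // prints "\nsay hello\n\n"
}
```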

func NewScriptText

func NewScriptText() (*ScriptText, error)

NewScriptText creates a new file extractor

func (*ScriptText) Text

func (p *ScriptText) Text(text []byte) []byte

Text extracts plaintext

Directories

Path Synopsis
cmd
