unhtml

package unhtml

Version: v1.0.14 | Published: Feb 25, 2024 | License: MIT | Imports: 5 | Imported by: 0

Repository

github.com/ancientlore/unhtml

README

UNHTML

Package unhtml removes HTML tags from text and applies minor formatting. It tries to change the text as little as possible, so that plain text containing no HTML tags keeps its formatting. This means you can run the converter on any data you receive without first checking whether it contains HTML tags.
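To make the pass-through goal concrete, here is a minimal sketch (not the package's code) built on one assumed rule: a `<` starts a tag only when it is followed by a letter, `/`, or `!`. Under that rule, text without tags comes back byte-for-byte identical, and a comparison such as `x < y` survives. The names `isTagStart` and `stripTags` are hypothetical, and the sketch does not decode entities or cope with `>` inside comments or attribute values.

```go
package main

import (
	"fmt"
	"strings"
)

// isTagStart reports whether the byte after '<' plausibly begins an HTML
// tag, end tag, comment, or DOCTYPE. A lone '<' in prose ("x < y") fails
// this test and is kept as ordinary text.
func isTagStart(c byte) bool {
	return c == '/' || c == '!' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// stripTags is a hypothetical, simplified converter: it deletes tag-like
// sequences and copies every other byte unchanged, so input without tags
// is returned as-is.
func stripTags(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '<' && i+1 < len(s) && isTagStart(s[i+1]) {
			if end := strings.IndexByte(s[i:], '>'); end >= 0 {
				i += end // jump to the closing '>'; the loop then moves past it
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	plain := "Name:  Alice\n  x < y\n"
	fmt.Printf("%q\n", stripTags(plain))                // unchanged
	fmt.Printf("%q\n", stripTags("a <b>bold</b> word")) // "a bold word"
}
```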

The package is primarily intended for small HTML snippets; it is not a full-fledged formatter. The tags it ignores, skips, and handles are listed below.

Ignored tags:

(comments)
DOCTYPE
abbr
acronym
address
area
article
aside
audio
b
base
basefont
bdi
bdo
big
body
button
canvas
caption
center
cite
code
col
colgroup
datalist
dd
del
details
dfn
dialog
dir
dl
dt
em
fieldset
figcaption
figure
font
footer
form
header
html
i
input
ins
kbd
keygen
label
legend
main
map
mark
menu
menuitem
meter
nav
noframes
noscript
optgroup
option
output
param
progress
q
rp
rt
ruby
s
samp
section
select
small
source
span
strike
strong
sub
summary
sup
tbody
textarea
tfoot
thead
time
track
tt
u
var
video
wbr

Skipped tags:

applet
embed
frame
frameset
head
iframe
link
meta
object
script
style
title
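Assuming "skipped" means that an element's contents are dropped along with its tags (sensible for `script` or `style`, though the list does not define it), a stdlib-only sketch of that step might look like the following. `dropElements`, `findOpen`, and `indexFold` are hypothetical names. The boundary check keeps `head` from matching `<header>`; any tags left behind, such as `<header>`, would then be removed by the ordinary tag-stripping pass.

```go
package main

import (
	"fmt"
	"strings"
)

// indexFold returns the index of the first ASCII case-insensitive match
// of sub in s at or after from, or -1 if there is none.
func indexFold(s, sub string, from int) int {
	for i := from; i+len(sub) <= len(s); i++ {
		if strings.EqualFold(s[i:i+len(sub)], sub) {
			return i
		}
	}
	return -1
}

// findOpen locates a start tag for name, requiring the name to end at a
// boundary so that "head" does not match "<header>".
func findOpen(s, name string, from int) int {
	for {
		i := indexFold(s, "<"+name, from)
		if i < 0 {
			return -1
		}
		j := i + 1 + len(name)
		if j == len(s) || strings.IndexByte(" \t\r\n/>", s[j]) >= 0 {
			return i
		}
		from = i + 1
	}
}

// dropElements removes each <name ...>...</name> span, contents included.
// An element with no closing tag is cut to the end of the input. Nested
// elements of the same name and void tags such as <meta> are not handled.
func dropElements(s string, names ...string) string {
	for _, name := range names {
		closeTag := "</" + name + ">"
		for {
			start := findOpen(s, name, 0)
			if start < 0 {
				break
			}
			end := indexFold(s, closeTag, start)
			if end < 0 {
				s = s[:start]
				break
			}
			s = s[:start] + s[end+len(closeTag):]
		}
	}
	return s
}

func main() {
	in := "<header>Hi</header><head><title>T</title></head><script>x()</script>body"
	fmt.Println(dropElements(in, "head", "script")) // <header>Hi</header>body
}
```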

Handled tags:

a
blockquote
br
div
h1 to h6
hr
img
li
ol
p
pre
table
td
th
tr
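"Handled" tags presumably turn into plain-text layout instead of vanishing. The exact output is not documented here, so the strings below are illustrative guesses, not unhtml's behavior. The sketch replaces start tags found in a lookup table and deletes every other tag. `a` and `img`, whose attributes would need parsing, are left out, as are `h1` to `h6` and `pre`. `replacements`, `tagName`, and `convert` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements maps a lowercase start-tag name to the plain text emitted in
// its place. These strings are illustrative guesses, not unhtml's output.
var replacements = map[string]string{
	"br":         "\n",
	"p":          "\n\n",
	"div":        "\n",
	"blockquote": "\n\n",
	"li":         "\n* ",
	"hr":         "\n-----\n",
	"tr":         "\n",
	"td":         "\t",
	"th":         "\t",
}

// tagName extracts the lowercase element name from a tag body such as
// "br/", "/p" or `a href="x"`, and reports whether it is an end tag.
func tagName(body string) (name string, end bool) {
	if strings.HasPrefix(body, "/") {
		end = true
		body = body[1:]
	}
	if n := strings.IndexAny(body, " \t\r\n/"); n >= 0 {
		body = body[:n]
	}
	return strings.ToLower(body), end
}

// convert deletes every tag; start tags listed in replacements are replaced
// by their text. For brevity, any '<' with a later '>' is treated as a tag.
func convert(s string) string {
	var b strings.Builder
	for {
		i := strings.IndexByte(s, '<')
		if i < 0 {
			break
		}
		j := strings.IndexByte(s[i:], '>')
		if j < 0 {
			break
		}
		b.WriteString(s[:i])
		if name, end := tagName(s[i+1 : i+j]); !end {
			b.WriteString(replacements[name]) // "" for tags not in the map
		}
		s = s[i+j+1:]
	}
	b.WriteString(s) // text after the last tag
	return b.String()
}

func main() {
	fmt.Printf("%q\n", convert("one<br/>two"))                   // "one\ntwo"
	fmt.Printf("%q\n", convert("<ul><li>a</li><li>b</li></ul>")) // "\n* a\n* b"
}
```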

Documentation

Index

func HtmlToText(in io.Reader, out io.Writer) error
func HtmlToTextString(in string) (string, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

func HtmlToText

func HtmlToText(in io.Reader, out io.Writer) error

HtmlToText reads HTML from in and writes the converted plain text to out.

func HtmlToTextString

func HtmlToTextString(in string) (string, error)

HtmlToTextString converts a string of HTML into a string of plain text.
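A string API like this can be a thin wrapper over a streaming one: a strings.Reader supplies the input and a strings.Builder collects the output. Whether HtmlToTextString is actually written this way is an assumption. In the sketch, `convert` is a hypothetical placeholder with HtmlToText's signature that only copies its input.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// convert is a placeholder with HtmlToText's signature. It only copies its
// input; real code would call unhtml.HtmlToText instead.
func convert(in io.Reader, out io.Writer) error {
	_, err := io.Copy(out, in)
	return err
}

// toTextString adapts the streaming converter to strings.
func toTextString(in string) (string, error) {
	var sb strings.Builder
	if err := convert(strings.NewReader(in), &sb); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	s, err := toTextString("already plain text")
	fmt.Println(s, err)
}
```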

Types

This section is empty.

Source Files

html2text.go

Directories

Path	Synopsis
cmd/unhtml