unhtml: github.com/ancientlore/unhtml

package unhtml

import "github.com/ancientlore/unhtml"

Package unhtml removes HTML tags from text and makes minor formatting adjustments. It is designed to change the text as little as possible, so that plain text that is already formatted is left intact when no HTML tags are present. This lets you run the converter on incoming data without first checking whether it contains HTML tags.

That said, the package is primarily intended to handle small HTML snippets; it is not a full-fledged formatter. The lists below show which tags are ignored, which are intentionally skipped, and which are handled.

Ignored tags:

(comments)
DOCTYPE
abbr
acronym
address
area
article
aside
audio
b
base
basefont
bdi
bdo
big
body
button
canvas
caption
center
cite
code
col
colgroup
datalist
dd
del
details
dfn
dialog
dir
dl
dt
em
fieldset
figcaption
figure
font
footer
form
header
html
i
input
ins
kbd
keygen
label
legend
main
map
mark
menu
menuitem
meter
nav
noframes
noscript
optgroup
option
output
param
progress
q
rp
rt
ruby
s
samp
section
select
small
source
span
strike
strong
sub
summary
sup
tbody
textarea
tfoot
thead
time
track
tt
u
var
video
wbr

Skipped tags:

applet
embed
frame
frameset
head
iframe
link
meta
object
script
style
title

Handled tags:

a
blockquote
br
div
h1 to h6
hr
img
li
ol
p
pre
table
td
th
tr

Package Files

html2text.go

func HtmlToText

func HtmlToText(in io.Reader, out io.Writer) error

HtmlToText converts the HTML in the reader to text in the writer.

func HtmlToTextString

func HtmlToTextString(in string) (string, error)

HtmlToTextString converts a string of HTML into a string of plain text.

Package unhtml imports 5 packages. Updated 2018-10-23.