html2gemini

Version: v0.0.0-...-18379cc (this package is not in the latest version of its module)
Published: Jul 23, 2022 License: MIT Imports: 11 Imported by: 5

README

html2gemini

A Go library that converts HTML into Gemini text/gemini (gemtext).

This is forked from https://jaytaylor.com/html2text with the following changes:

  • text/gemini output format
  • footnote-style link references

Introduction

Turns HTML into text/gemini, either to be served over Gemini or to be incorporated into a client.

html2gemini is a simple Go package that renders HTML as plain text in text/gemini format.

Download the package

go get github.com/LukeEmmet/html2gemini

Example usage

See https://github.com/LukeEmmet/html2gmi, a practical command-line application that uses this library. Also see https://github.com/LukeEmmet/duckling-proxy, an HTTP-via-Gemini proxy server that lets you browse the web from any Gemini client that supports scheme-specific proxies.

To reduce the clutter in the HTML passed to this library, you could simplify or sanitise it first, for example using https://github.com/philipjkim/goreadability

Unit-tests

Running the unit-tests is straightforward and standard:

go test

License

Permissive MIT license.

Contact

Email: luke [at] marmaladefoo [dot] com

If you appreciate this library, please feel free to drop me a line, and please also send a note of appreciation to Jay Taylor (URL below), who wrote the original html2text on which this library is based and who deserves most of the credit.

https://jaytaylor.com/html2text

Documentation

Overview

Example
package main

import (
	"fmt"

	"github.com/LukeEmmet/html2gemini"
)

func main() {
	inputHTML := `
<html>
	<head>
		<title>My Mega Service</title>
		<link rel="stylesheet" href="main.css">
		<style type="text/css">body { color: #fff; }</style>
	</head>

	<body>
		<div class="logo">
			<a href="http://jaytaylor.com/"><img src="/logo-image.jpg" alt="Mega Service"/></a>
		</div>

		<h1>Welcome to your new account on my service!</h1>

		<p>
			Here is some more information:

			<ul>
				<li>Link 1: <a href="https://example.com">Example.com</a></li>
				<li>Link 2: <a href="https://example2.com">Example2.com</a></li>
				<li>Something else</li>
			</ul>
		</p>

		<table>
			<thead>
				<tr><th>Header 1</th><th>Header 2</th></tr>
			</thead>
			<tfoot>
				<tr><td>Footer 1</td><td>Footer 2</td></tr>
			</tfoot>
			<tbody>
				<tr><td>Row 1 Col 1</td><td>Row 1 Col 2</td></tr>
				<tr><td>Row 2 Col 1</td><td>Row 2 Col 2</td></tr>
			</tbody>
		</table>

<pre>
Preformatted content    with    spaces
    and indentation
</pre>
	</body>
</html>`

	// Render tables as ASCII grids and emit gathered links after roughly every 100 paragraphs.
	ctx := html2gemini.NewTraverseContext(html2gemini.Options{PrettyTables: true, LinkEmitFrequency: 100})
	text, err := html2gemini.FromString(inputHTML, *ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
Output:

Mega Service [1]

# Welcome to your new account on my service!

Here is some more information:

* Link 1: Example.com [2]
* Link 2: Example2.com [3]
* Something else

```
+-------------+-------------+
|  HEADER 1   |  HEADER 2   |
+-------------+-------------+
| Row 1 Col 1 | Row 1 Col 2 |
| Row 2 Col 1 | Row 2 Col 2 |
+-------------+-------------+
|  FOOTER 1   |  FOOTER 2   |
+-------------+-------------+
```

```
Preformatted content    with    spaces
   and indentation

```

=> http://jaytaylor.com/ [1] http://jaytaylor.com/
=> https://example.com [2] https://example.com
=> https://example2.com [3] https://example2.com

Constants

This section is empty.

Variables

This section is empty.

Functions

func FromHTMLNode

func FromHTMLNode(doc *html.Node, ctx TextifyTraverseContext) (string, error)

FromHTMLNode renders text output from a pre-parsed HTML document (an *html.Node from golang.org/x/net/html).

func FromReader

func FromReader(reader io.Reader, ctx TextifyTraverseContext) (string, error)

FromReader renders text output after parsing HTML for the specified io.Reader.

func FromString

func FromString(input string, ctx TextifyTraverseContext) (string, error)

FromString parses HTML from the input string, then renders the text form.

Types

type Options

type Options struct {
	PrettyTables                bool                 // Turns on pretty ASCII rendering for table elements.
	PrettyTablesOptions         *PrettyTablesOptions // Configures pretty ASCII rendering for table elements.
	OmitLinks                   bool                 // Turns on omitting links.
	CitationStart               int                  // Start citations from this number (default 1).
	CitationMarkers             bool                 // Use footnote-style citation markers.
	LinkEmitFrequency           int                  // Emit gathered links after approximately every n paragraphs (they are also emitted at a new heading or blockquote).
	NumberedLinks               bool                 // Number the links [1], [2], etc. to match the citation markers.
	EmitImagesAsLinks           bool                 // Emit referenced images as links, e.g. <img src=href>.
	ImageMarkerPrefix           string               // Prefix used when emitting images.
	EmptyLinkPrefix             string               // Prefix used when emitting empty links, e.g. <a href=foo><img src=bar></a>.
	ListItemToLinkWordThreshold int                  // Max number of words in a list item containing a single link for it to be converted to a plain Gemini link.
}

Options provide toggles and overrides to control specific rendering behaviors.
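The `ListItemToLinkWordThreshold` field is only described in one line. One plausible reading, sketched here as a hypothetical standalone function (not the library's code), is that a list item containing a single link and no more than the threshold number of words becomes a bare gemtext `=>` link line, while longer items stay as `*` bullets carrying a footnote marker. The function name and parameters are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// listItemToGemtext is a hypothetical sketch of the ListItemToLinkWordThreshold
// idea: a list item holding a single link is promoted to a "=> URL text" link
// line when its word count is at or below threshold; otherwise it stays a
// "* " bullet with a footnote-style "[n]" marker.
func listItemToGemtext(text, url string, threshold, citation int) string {
	if len(strings.Fields(text)) <= threshold {
		return fmt.Sprintf("=> %s %s", url, text)
	}
	return fmt.Sprintf("* %s [%d]", text, citation)
}

func main() {
	fmt.Println(listItemToGemtext("Example.com", "https://example.com", 3, 1))
	fmt.Println(listItemToGemtext("Read the full release notes on Example.com", "https://example.com", 3, 2))
}
```

Short, link-only list items (navigation menus, for instance) then become directly followable lines in a Gemini client instead of bullets pointing at a footnote.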

func NewOptions

func NewOptions() *Options

NewOptions creates Options with default settings

type PrettyTablesOptions

type PrettyTablesOptions struct {
	AutoFormatHeader     bool
	AutoWrapText         bool
	ReflowDuringAutoWrap bool
	ColWidth             int
	ColumnSeparator      string
	RowSeparator         string
	CenterSeparator      string
	HeaderAlignment      int
	FooterAlignment      int
	Alignment            int
	ColumnAlignment      []int
	NewLine              string
	HeaderLine           bool
	RowLine              bool
	AutoMergeCells       bool
	Borders              tablewriter.Border
}

PrettyTablesOptions overrides the default behaviors of the tablewriter package used for pretty table rendering.

func NewPrettyTablesOptions

func NewPrettyTablesOptions() *PrettyTablesOptions

NewPrettyTablesOptions creates PrettyTablesOptions with default settings

type TextifyTraverseContext

type TextifyTraverseContext struct {
	// contains filtered or unexported fields
}

TextifyTraverseContext holds text-related context.

func NewTraverseContext

func NewTraverseContext(options Options) *TextifyTraverseContext

func (*TextifyTraverseContext) CheckFlushCitations

func (ctx *TextifyTraverseContext) CheckFlushCitations()

CheckFlushCitations emits the list of Gemini links gathered up to this point if the paragraph count exceeds the emit frequency.

func (*TextifyTraverseContext) FlushCitations

func (ctx *TextifyTraverseContext) FlushCitations()

func (*TextifyTraverseContext) ResetCitationCounters

func (ctx *TextifyTraverseContext) ResetCitationCounters()
