md

Version: v0.0.10

Warning: This package is not in the latest version of its module.
Published: May 15, 2020 License: MIT Imports: 13 Imported by: 2

README

html-to-markdown

gopher standing on top of a machine that converts a box of html to blocks of markdown

Convert HTML into Markdown with Go. It uses an HTML parser and avoids regular expressions as much as possible. That should prevent some weird edge cases and allows it to be used where the input is completely unknown.

Installation

go get github.com/JohannesKaufmann/html-to-markdown

Usage

import (
  "fmt"
  "log"

  md "github.com/JohannesKaufmann/html-to-markdown"
)

converter := md.NewConverter("", true, nil)

html := `<strong>Important</strong>`

markdown, err := converter.ConvertString(html)
if err != nil {
  log.Fatal(err)
}
fmt.Println("md ->", markdown)

If you are already using goquery, you can pass a selection to Convert. Unlike ConvertString, Convert returns only the Markdown string and no error.

markdown := converter.Convert(selec)

Options

The third parameter to md.NewConverter is *md.Options.

For example, you can change the delimiter around bold text from the default "**" to "__" by setting StrongDelimiter.

opt := &md.Options{
  StrongDelimiter: "__", // default: **
  // ...
}
converter := md.NewConverter("", true, opt)

For all available options, see the Godoc; for sample usage, see the example.

Adding Rules

converter.AddRules(
  md.Rule{
    Filter: []string{"del", "s", "strike"},
    Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
      // You need to return a pointer to a string (md.String is just a helper function).
      // If you return nil the next function for that html element
      // will be picked. For example you could only convert an element
      // if it has a certain class name and fallback if not.
      content = strings.TrimSpace(content)
      return md.String("~" + content + "~")
    },
  },
  // more rules
)

For more information have a look at the example add_rules.
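
The fallback behaviour described in the comments above (a nil return hands the element to the next replacement function) can be modeled in plain Go without the library. The sketch below is not the converter's actual implementation: `replacement`, `strPtr`, and `firstMatch` are hypothetical names, the function signature is simplified, and the order in which the real converter tries rules is not stated in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// replacement is a simplified stand-in for Rule.Replacement
// (the real signature also receives a selection and options).
type replacement func(content string) *string

// strPtr plays the role of md.String: it returns a pointer to its argument.
func strPtr(s string) *string { return &s }

// firstMatch tries each function in order and returns the first non-nil result.
// The second return value is false when every function declined.
func firstMatch(fns []replacement, content string) (string, bool) {
	for _, fn := range fns {
		if out := fn(content); out != nil {
			return *out, true
		}
	}
	return "", false
}

func main() {
	fns := []replacement{
		// Declines (returns nil) when the content is blank.
		func(c string) *string {
			c = strings.TrimSpace(c)
			if c == "" {
				return nil
			}
			return strPtr("~" + c + "~")
		},
		// Fallback: pass the content through unchanged.
		func(c string) *string { return strPtr(c) },
	}
	out, _ := firstMatch(fns, " old ")
	fmt.Println(out)
}
```

With the library itself, the same decision would usually be based on the selection (for example a class name) rather than on the content alone.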

Using Plugins

If you want plugins (GitHub Flavored Markdown features like strikethrough, tables, ...), you can pass them to Use.

import "github.com/JohannesKaufmann/html-to-markdown/plugin"

// Use the `GitHubFlavored` plugin from the `plugin` package.
converter.Use(plugin.GitHubFlavored())

If you only want the Strikethrough plugin, pass it on its own. You can change the delimiter that marks crossed-out text by setting the first argument to a different value (for example "~~" instead of "~").

converter.Use(plugin.Strikethrough(""))

For more information have a look at the example github_flavored.

Writing Plugins

Have a look at the plugin folder for a reference implementation. The most basic one is Strikethrough.

Other Methods

Godoc

func (c *Converter) Keep(tags ...string) *Converter

Determines which elements are to be kept and rendered as HTML.

func (c *Converter) Remove(tags ...string) *Converter

Determines which elements are to be removed altogether i.e. converted to an empty string.

Issues

If you find HTML snippets (or even full websites) that don't produce the expected results, please open an issue!

Documentation

Overview

Package md converts html to markdown.

converter := md.NewConverter("", true, nil)

html := `<strong>Important</strong>`

markdown, err := converter.ConvertString(html)
if err != nil {
  log.Fatal(err)
}
fmt.Println("md ->", markdown)

Or if you are already using goquery:

markdown := converter.Convert(selec)

Constants

This section is empty.

Variables

var Timeout = time.Second * 10

Timeout for the http client
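
A package-level duration like this is typically handed to an `http.Client`. The following is a minimal sketch of that pattern, assuming this is roughly how the value is consumed; `timeout` and `newClient` are hypothetical local names, and how ConvertURL uses Timeout internally is not shown on this page.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// timeout mirrors the documented default of md.Timeout.
var timeout = time.Second * 10

// newClient returns a client that gives up on a request after timeout.
func newClient() *http.Client {
	return &http.Client{Timeout: timeout}
}

func main() {
	fmt.Println(newClient().Timeout)
}
```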

Functions

func DomainFromURL

func DomainFromURL(rawURL string) string

DomainFromURL removes the path from the url.
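
The description leaves open exactly what is returned. One plausible reading, sketched below with the standard library only, is that just the host survives. `hostOf` is a hypothetical stand-in and not the library's DomainFromURL; whether the real function keeps the scheme or port is an assumption here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOf is a hypothetical stand-in, not the library's DomainFromURL.
// It returns only the host of rawURL, or "" if none can be parsed.
func hostOf(rawURL string) string {
	u, err := url.Parse(strings.TrimSpace(rawURL))
	if err != nil {
		return ""
	}
	return u.Host
}

func main() {
	fmt.Println(hostOf("https://example.com/blog/post?id=1"))
}
```

Note that `net/url` only fills in the host when the input carries a scheme (or starts with `//`), so `hostOf("example.com/path")` yields an empty string in this sketch.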

func IsBlockElement

func IsBlockElement(e string) bool

func IsInlineElement

func IsInlineElement(e string) bool

func String

func String(text string) *string

String is a helper function to return a pointer.
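
Go does not allow taking the address of a string literal or expression directly, which is why a helper of this shape is convenient. The sketch below uses hypothetical local names (`ptr`, `describe`) instead of the library, and shows why the pointer matters: a caller can tell "no result" (nil) apart from "an empty result".

```go
package main

import "fmt"

// ptr mirrors what a helper like md.String is described as doing:
// it copies the argument and returns the address of the copy.
func ptr(text string) *string { return &text }

// describe reports how a caller could tell the possible results apart.
func describe(p *string) string {
	if p == nil {
		return "nil (no result)"
	}
	return fmt.Sprintf("%q", *p)
}

func main() {
	fmt.Println(describe(ptr("~gone~")))
	fmt.Println(describe(ptr("")))
	fmt.Println(describe(nil))
}
```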

Types

type AdvancedResult

type AdvancedResult struct {
	Header   string
	Markdown string
	Footer   string
}

type Converter

type Converter struct {
	Before func(selec *goquery.Selection)
	// contains filtered or unexported fields
}

Converter is initialized by NewConverter.

func NewConverter

func NewConverter(domain string, enableCommonmark bool, options *Options) *Converter

NewConverter initializes a new converter and holds all the rules.

  • `domain` is used for links and images to convert relative urls ("/image.png") to absolute urls.
  • CommonMark is the default set of rules. Set enableCommonmark to false if you want to customize everything using AddRules and do NOT want to fall back to the default rules.
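
The relative-to-absolute conversion mentioned for `domain` can be illustrated with `net/url` alone. This is a sketch under assumptions, not the converter's code: `absoluteURL` is a hypothetical helper, and it expects a base that includes a scheme, which may differ from the format NewConverter accepts.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// absoluteURL is a hypothetical helper, not part of package md.
// It resolves ref against base; an already absolute ref is returned as is.
func absoluteURL(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	abs, err := absoluteURL("https://example.com", "/image.png")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(abs)
}
```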

func (*Converter) AddRules

func (c *Converter) AddRules(rules ...Rule) *Converter

AddRules adds the rules that are passed in to the converter.

func (*Converter) Convert

func (c *Converter) Convert(selec *goquery.Selection) string

Convert returns the content from a goquery selection. If you have a goquery document just pass in doc.Selection.

func (*Converter) ConvertBytes

func (c *Converter) ConvertBytes(bytes []byte) ([]byte, error)

ConvertBytes returns the content from a html byte array.

func (*Converter) ConvertReader

func (c *Converter) ConvertReader(reader io.Reader) (bytes.Buffer, error)

ConvertReader reads HTML from a reader and returns the converted content as a buffer.

func (*Converter) ConvertResponse

func (c *Converter) ConvertResponse(res *http.Response) (string, error)

ConvertResponse returns the content from a html response.

func (*Converter) ConvertString

func (c *Converter) ConvertString(html string) (string, error)

ConvertString returns the content from a html string. If you already have a goquery selection use `Convert`.

func (*Converter) ConvertURL

func (c *Converter) ConvertURL(url string) (string, error)

ConvertURL returns the content from the page with that url.

func (*Converter) Keep

func (c *Converter) Keep(tags ...string) *Converter

Keep certain html tags in the generated output.

func (*Converter) Remove

func (c *Converter) Remove(tags ...string) *Converter

Remove certain html tags from the source.

func (*Converter) Sanitize added in v0.0.8

func (c *Converter) Sanitize(html string) string

func (*Converter) Use

func (c *Converter) Use(plugins ...Plugin) *Converter

Use adds additional functionality to the converter. It is meant for cases where rules alone are not sufficient, for example in plugins.

type Options

type Options struct {
	PreSanitize bool // sanitize the input before the conversion

	// "setext" or "atx"
	// default: "atx"
	HeadingStyle string

	// Any Thematic break
	// default: "* * *"
	HorizontalRule string

	// "-", "+", or "*"
	// default: "-"
	BulletListMarker string

	// "indented" or "fenced"
	// default: "indented"
	CodeBlockStyle string

	// ``` or ~~~
	// default: ```
	Fence string

	// _ or *
	// default: _
	EmDelimiter string

	// ** or __
	// default: **
	StrongDelimiter string

	// inlined or referenced
	// default: inlined
	LinkStyle string

	// full, collapsed, or shortcut
	// default: full
	LinkReferenceStyle string
}

Options to customize the output. For example, you can change the delimiter that is used for strong text.
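
The option values are Markdown terms, so what they select can be shown without the library. The sketch below renders the two heading styles and the three reference-link forms as CommonMark defines them; `heading` and `referenceLink` are hypothetical helpers, and the exact output the converter produces for each option (for example how it numbers references) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// heading renders a heading in "atx" or "setext" style.
// Setext only exists for levels 1 and 2, so this sketch falls back to ATX otherwise.
func heading(text string, level int, style string) string {
	if style == "setext" && level <= 2 {
		underline := "="
		if level == 2 {
			underline = "-"
		}
		return text + "\n" + strings.Repeat(underline, utf8.RuneCountInString(text))
	}
	return strings.Repeat("#", level) + " " + text
}

// referenceLink returns the in-text part and the definition of a reference
// link in the "full", "collapsed", or "shortcut" form.
func referenceLink(text, href, id, style string) (inline, definition string) {
	switch style {
	case "collapsed":
		return "[" + text + "][]", "[" + text + "]: " + href
	case "shortcut":
		return "[" + text + "]", "[" + text + "]: " + href
	default: // "full"
		return "[" + text + "][" + id + "]", "[" + id + "]: " + href
	}
}

func main() {
	fmt.Println(heading("Title", 1, "atx"))
	fmt.Println(heading("Title", 1, "setext"))
	for _, s := range []string{"full", "collapsed", "shortcut"} {
		in, def := referenceLink("Go", "https://go.dev", "1", s)
		fmt.Printf("%-9s %s    %s\n", s, in, def)
	}
}
```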

type Plugin

type Plugin func(conv *Converter) []Rule

Plugin can be used to extend functionality beyond what is offered by CommonMark.

type Rule

type Rule struct {
	Filter              []string
	Replacement         func(content string, selec *goquery.Selection, options *Options) *string
	AdvancedReplacement func(content string, selec *goquery.Selection, options *Options) (res AdvancedResult, skip bool)
}

Rule to convert certain html tags to markdown.

md.Rule{
  Filter: []string{"del", "s", "strike"},
  Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
    // You need to return a pointer to a string (md.String is just a helper function).
    // If you return nil the next function for that html element
    // will be picked. For example you could only convert an element
    // if it has a certain class name and fallback if not.
    return md.String("~" + content + "~")
  },
}

Directories

Path      Synopsis
escape    Package escape escapes characters that are commonly used in markdown like the * for strong/italic.
examples
plugin    Package plugin contains all the rules that are not part of Commonmark like GitHub Flavored Markdown.
