Documentation ¶
Overview ¶
Converts HTML content into markdown content
Index ¶
- Constants
- func CleanText(content string) string
- func PrintUnicodeRunes(content string)
- type ConfluenceSelectionConverter
- func (c *ConfluenceSelectionConverter) FindContentElements(s *goquery.Selection) *goquery.Selection
- func (c *ConfluenceSelectionConverter) FindRootElement(doc *goquery.Document) *goquery.Selection
- func (c *ConfluenceSelectionConverter) FindTitle(doc *goquery.Document) string
- func (c *ConfluenceSelectionConverter) HandleMatchedSelection(i int, elm *goquery.Selection, mdDoc *markdown.Doc, toMD SelectionToMD)
- type DocumentConverter
- type FindDocumentSelection
- type FindSelection
- type FindText
- type GoogleSelectionConverter
- func (c *GoogleSelectionConverter) FindContentElements(s *goquery.Selection) *goquery.Selection
- func (c *GoogleSelectionConverter) FindRootElement(doc *goquery.Document) *goquery.Selection
- func (c *GoogleSelectionConverter) FindTitle(doc *goquery.Document) string
- func (c *GoogleSelectionConverter) HandleMatchedSelection(i int, elm *goquery.Selection, mdDoc *markdown.Doc, toMD SelectionToMD)
- type HTMLSelectionConverter
- func (c *HTMLSelectionConverter) FindContentElements(s *goquery.Selection) *goquery.Selection
- func (c *HTMLSelectionConverter) FindRootElement(doc *goquery.Document) *goquery.Selection
- func (c *HTMLSelectionConverter) FindTitle(doc *goquery.Document) string
- func (c *HTMLSelectionConverter) HandleMatchedSelection(i int, elm *goquery.Selection, mdDoc *markdown.Doc, toMD SelectionToMD)
- type HandleSelection
- type SelectionCallback
- type SelectionConverter
- type SelectionConverterConfig
- type SelectionToMD
- type Transformer
- func (t *Transformer) RemoveScripts(elm *goquery.Selection)
- func (t *Transformer) ReplaceAll(elm *goquery.Selection)
- func (t *Transformer) ReplaceAnchor(i int, s *goquery.Selection)
- func (t *Transformer) ReplaceAnchors(elm *goquery.Selection)
- func (t *Transformer) ReplaceBold(i int, s *goquery.Selection)
- func (t *Transformer) ReplaceBolds(elm *goquery.Selection)
- func (t *Transformer) ReplaceImage(i int, s *goquery.Selection)
- func (t *Transformer) ReplaceImages(elm *goquery.Selection)
- func (t *Transformer) ReplaceInlineCode(i int, s *goquery.Selection)
- func (t *Transformer) ReplaceInlineCodes(elm *goquery.Selection)
- func (t *Transformer) ReplaceItalic(i int, s *goquery.Selection)
- func (t *Transformer) ReplaceItalics(elm *goquery.Selection)
- func (t *Transformer) ToList(list *goquery.Selection) markdown.List
- func (t *Transformer) ToTable(table *goquery.Selection) markdown.Table
- func (t *Transformer) Transform(pattern string, elm *goquery.Selection, callbacks ...SelectionCallback)
- func (t *Transformer) Transforms(i int, s *goquery.Selection, callbacks ...SelectionCallback)
Constants ¶
const DefaultSearchPattern = "p,span,hr,h1,h2,h3,h4,h5,h6,ul,ol,div,table"
DefaultSearchPattern defines a default pattern to search for elements that will contain content for the markdown document
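The constant reads as a comma-separated CSS selector group, so each entry names one tag that may hold content. The sketch below uses only the standard library to split the pattern into its individual tags; `selectorTags` is a hypothetical helper for illustration and is not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSearchPattern mirrors the documented constant value.
const defaultSearchPattern = "p,span,hr,h1,h2,h3,h4,h5,h6,ul,ol,div,table"

// selectorTags is a hypothetical helper that splits a comma-separated
// selector group into its individual, trimmed entries.
func selectorTags(pattern string) []string {
	parts := strings.Split(pattern, ",")
	tags := make([]string, 0, len(parts))
	for _, p := range parts {
		if t := strings.TrimSpace(p); t != "" {
			tags = append(tags, t)
		}
	}
	return tags
}

func main() {
	tags := selectorTags(defaultSearchPattern)
	fmt.Println(len(tags), tags)
}
```

A custom pattern in the same comma-separated form could presumably narrow or widen the search, for example by dropping `div` when a page nests content deeply.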
Functions ¶
func CleanText ¶
func CleanText(content string) string
CleanText removes newlines, replaces common Unicode characters with ASCII equivalents, removes any other non-common ASCII values, and trims leading and trailing whitespace.
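The cleaning steps described above can be sketched with the standard library alone. This is not the package's implementation: the replacement table is a hypothetical sample, `cleanTextSketch` is an illustrative name, and dropping everything outside printable ASCII is an assumed reading of "non-common" values.

```go
package main

import (
	"fmt"
	"strings"
)

// replacer holds a hypothetical replacement table; the table the package
// actually uses is not documented here.
var replacer = strings.NewReplacer(
	"\n", "", // remove newlines
	"\r", "",
	"\u2018", "'", // left single quotation mark
	"\u2019", "'", // right single quotation mark
	"\u201c", `"`, // left double quotation mark
	"\u201d", `"`, // right double quotation mark
	"\u2013", "-", // en dash
	"\u00a0", " ", // non-breaking space
)

// cleanTextSketch applies the replacements, drops every rune outside
// printable ASCII (an assumption), and trims surrounding whitespace.
func cleanTextSketch(content string) string {
	replaced := replacer.Replace(content)
	var b strings.Builder
	for _, r := range replaced {
		if r >= 0x20 && r < 0x7f {
			b.WriteRune(r)
		}
	}
	return strings.TrimSpace(b.String())
}

func main() {
	fmt.Println(cleanTextSketch("  \u201cHello\u201d\n world\u00a0\u2013 caf\u00e9 "))
}
```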
func PrintUnicodeRunes ¶
func PrintUnicodeRunes(content string)
PrintUnicodeRunes finds all non-ASCII characters in the string and prints the Unicode code point of each. This is useful when debugging to find Unicode characters that need to be handled by the CleanText function.
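A minimal sketch of the same idea, using only the standard library: iterate over the runes of a string and report those above the ASCII range. `nonASCIIRunes` is a hypothetical helper that returns the code points instead of printing them, which makes the result easier to inspect; the package's own output format may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// nonASCIIRunes returns the code point (in U+XXXX form) of every rune in
// content that falls outside the ASCII range, in order of appearance.
func nonASCIIRunes(content string) []string {
	var out []string
	for _, r := range content {
		if r > unicode.MaxASCII {
			out = append(out, fmt.Sprintf("%U", r))
		}
	}
	return out
}

func main() {
	for _, cp := range nonASCIIRunes("caf\u00e9 \u2013 ok") {
		fmt.Println(cp)
	}
}
```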
Types ¶
type ConfluenceSelectionConverter ¶
type ConfluenceSelectionConverter struct {
	Transformer            *Transformer
	RootElementFinder      FindDocumentSelection
	TitleFinder            FindText
	ContentSelector        FindSelection
	ContentSelectorHandler HandleSelection
}
ConfluenceSelectionConverter converts the Confluence HTML page to markdown. Tags controls which HTML tags will be searched when looking for content. If not set, then the defaultTags will be used.
func NewConfluenceSelectionConverter ¶
func NewConfluenceSelectionConverter(conf SelectionConverterConfig) *ConfluenceSelectionConverter
NewConfluenceSelectionConverter initializes a ConfluenceSelectionConverter with default function calls.
func (*ConfluenceSelectionConverter) FindContentElements ¶
func (c *ConfluenceSelectionConverter) FindContentElements(s *goquery.Selection) *goquery.Selection
FindContentElements finds the selections that should be iterated over for content.
func (*ConfluenceSelectionConverter) FindRootElement ¶
func (c *ConfluenceSelectionConverter) FindRootElement(doc *goquery.Document) *goquery.Selection
FindRootElement finds the root element.
func (*ConfluenceSelectionConverter) FindTitle ¶
func (c *ConfluenceSelectionConverter) FindTitle(doc *goquery.Document) string
FindTitle finds the title of the document.
func (*ConfluenceSelectionConverter) HandleMatchedSelection ¶
func (c *ConfluenceSelectionConverter) HandleMatchedSelection(i int, elm *goquery.Selection, mdDoc *markdown.Doc, toMD SelectionToMD)
HandleMatchedSelection handles matched selections from FindContentElements.
type DocumentConverter ¶
type DocumentConverter struct {
	SelectionConv SelectionConverter
}
DocumentConverter is a struct that can convert an HTML document into a markdown document
func (*DocumentConverter) DocumentToMarkdown ¶
func (c *DocumentConverter) DocumentToMarkdown(doc *goquery.Document) *markdown.Doc
DocumentToMarkdown converts the HTML doc to markdown
func (*DocumentConverter) SelectionToMarkdown ¶
func (c *DocumentConverter) SelectionToMarkdown(elm *goquery.Selection, docConf markdown.DocConfig) *markdown.Doc
SelectionToMarkdown creates a new markdown document, and searches for content to add to the markdown doc. It hands off handling of matched selections to the SelectionConverter since it depends heavily on the HTML structure of the original document.
type FindDocumentSelection ¶
FindDocumentSelection is a callable that finds DOM elements in the given Document
type FindSelection ¶
FindSelection is a callable that finds DOM elements in the given selection
type GoogleSelectionConverter ¶
type GoogleSelectionConverter struct {
	Transformer            *Transformer
	RootElementFinder      FindDocumentSelection
	TitleFinder            FindText
	ContentSelector        FindSelection
	ContentSelectorHandler HandleSelection
}
GoogleSelectionConverter converts the Google Doc HTML page to markdown
func NewGoogleSelectionConverter ¶
func NewGoogleSelectionConverter(conf SelectionConverterConfig) *GoogleSelectionConverter
NewGoogleSelectionConverter initializes a GoogleSelectionConverter with default function calls.
func (*GoogleSelectionConverter) FindContentElements ¶
func (c *GoogleSelectionConverter) FindContentElements(s *goquery.Selection) *goquery.Selection
FindContentElements finds the selections that should be iterated over for content.
func (*GoogleSelectionConverter) FindRootElement ¶
func (c *GoogleSelectionConverter) FindRootElement(doc *goquery.Document) *goquery.Selection
FindRootElement finds the root element.
func (*GoogleSelectionConverter) FindTitle ¶
func (c *GoogleSelectionConverter) FindTitle(doc *goquery.Document) string
FindTitle finds the title of the document.
func (*GoogleSelectionConverter) HandleMatchedSelection ¶
func (c *GoogleSelectionConverter) HandleMatchedSelection(i int, elm *goquery.Selection, mdDoc *markdown.Doc, toMD SelectionToMD)
HandleMatchedSelection handles matched selections from FindContentElements.
type HTMLSelectionConverter ¶
type HTMLSelectionConverter struct {
	Transformer            *Transformer
	RootElementFinder      FindDocumentSelection
	TitleFinder            FindText
	ContentSelector        FindSelection
	ContentSelectorHandler HandleSelection
}
HTMLSelectionConverter converts generic HTML pages to markdown
func NewHTMLSelectionConverter ¶
func NewHTMLSelectionConverter(conf SelectionConverterConfig) *HTMLSelectionConverter
NewHTMLSelectionConverter initializes an HTMLSelectionConverter with default function calls.
func (*HTMLSelectionConverter) FindContentElements ¶
func (c *HTMLSelectionConverter) FindContentElements(s *goquery.Selection) *goquery.Selection
FindContentElements finds the selections that should be iterated over for content.
func (*HTMLSelectionConverter) FindRootElement ¶
func (c *HTMLSelectionConverter) FindRootElement(doc *goquery.Document) *goquery.Selection
FindRootElement finds the root element.
func (*HTMLSelectionConverter) FindTitle ¶
func (c *HTMLSelectionConverter) FindTitle(doc *goquery.Document) string
FindTitle finds the title of the document.
func (*HTMLSelectionConverter) HandleMatchedSelection ¶
func (c *HTMLSelectionConverter) HandleMatchedSelection(i int, elm *goquery.Selection, mdDoc *markdown.Doc, toMD SelectionToMD)
HandleMatchedSelection handles matched selections from FindContentElements.
type HandleSelection ¶
HandleSelection is a callable that is given a selection, a markdown document to add to, and a callable to convert child elements to markdown documents
type SelectionCallback ¶
SelectionCallback is a function that handles a goquery.Selection
type SelectionConverter ¶
type SelectionConverter interface {
	FindRootElement(*goquery.Document) *goquery.Selection
	FindTitle(*goquery.Document) string
	FindContentElements(*goquery.Selection) *goquery.Selection
	HandleMatchedSelection(int, *goquery.Selection, *markdown.Doc, SelectionToMD)
}
SelectionConverter is an interface that converts a style of HTML document to markdown. The interface allows for customization to handle a specific and known HTML structure.
type SelectionConverterConfig ¶
type SelectionConverterConfig struct {
	Transformer            *Transformer
	RootElementFinder      FindDocumentSelection
	TitleFinder            FindText
	ContentSelector        FindSelection
	ContentSelectorHandler HandleSelection
}
SelectionConverterConfig contains parameters that a SelectionConverter can use to customize its behavior.
type SelectionToMD ¶
SelectionToMD is a callable that converts a selection to a markdown document
type Transformer ¶
type Transformer struct {
	Format string
}
Transformer converts HTML DOM elements into markdown elements
func (*Transformer) RemoveScripts ¶
func (t *Transformer) RemoveScripts(elm *goquery.Selection)
RemoveScripts removes any script, style, or link tags from the DOM element.
func (*Transformer) ReplaceAll ¶
func (t *Transformer) ReplaceAll(elm *goquery.Selection)
ReplaceAll runs all the default replacement functions
func (*Transformer) ReplaceAnchor ¶
func (t *Transformer) ReplaceAnchor(i int, s *goquery.Selection)
ReplaceAnchor replaces the DOM element in place with a markdown link.
func (*Transformer) ReplaceAnchors ¶
func (t *Transformer) ReplaceAnchors(elm *goquery.Selection)
ReplaceAnchors finds all child "a" tags and replaces them in place with markdown links.
func (*Transformer) ReplaceBold ¶
func (t *Transformer) ReplaceBold(i int, s *goquery.Selection)
ReplaceBold replaces the DOM element in place with the text content wrapped in "**".
func (*Transformer) ReplaceBolds ¶
func (t *Transformer) ReplaceBolds(elm *goquery.Selection)
ReplaceBolds finds all child "strong" tags and replaces them in place with markdown bold.
func (*Transformer) ReplaceImage ¶
func (t *Transformer) ReplaceImage(i int, s *goquery.Selection)
ReplaceImage replaces the DOM element in place with a markdown image link. If the Transformer is rendering for Hugo, it replaces the element with a Hugo figure shortcode instead.
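To illustrate the two output forms, here is a small sketch that builds either a markdown image link or a Hugo figure shortcode from a source URL and alt text. The helper name `imageMarkup` and the format value `"hugo"` are assumptions made for the example; the values the Transformer's Format field actually accepts are not documented here.

```go
package main

import "fmt"

// imageMarkup returns a Hugo figure shortcode when format is "hugo"
// (a hypothetical value), and a plain markdown image link otherwise.
func imageMarkup(format, src, alt string) string {
	if format == "hugo" {
		return fmt.Sprintf(`{{< figure src=%q alt=%q >}}`, src, alt)
	}
	return fmt.Sprintf("![%s](%s)", alt, src)
}

func main() {
	fmt.Println(imageMarkup("", "img/a.png", "A diagram"))
	fmt.Println(imageMarkup("hugo", "img/a.png", "A diagram"))
}
```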
func (*Transformer) ReplaceImages ¶
func (t *Transformer) ReplaceImages(elm *goquery.Selection)
ReplaceImages finds all child "img" tags and replaces them in place with markdown image links.
func (*Transformer) ReplaceInlineCode ¶
func (t *Transformer) ReplaceInlineCode(i int, s *goquery.Selection)
ReplaceInlineCode replaces the DOM element in place with text content wrapped in "`".
func (*Transformer) ReplaceInlineCodes ¶
func (t *Transformer) ReplaceInlineCodes(elm *goquery.Selection)
ReplaceInlineCodes finds all child "code" tags and replaces them in place with text content wrapped in "`".
func (*Transformer) ReplaceItalic ¶
func (t *Transformer) ReplaceItalic(i int, s *goquery.Selection)
ReplaceItalic replaces the DOM element in place with the text content wrapped in "_".
func (*Transformer) ReplaceItalics ¶
func (t *Transformer) ReplaceItalics(elm *goquery.Selection)
ReplaceItalics finds all child "em" tags and replaces them in place with markdown italics.
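The inline replacements described above (links, bold, inline code, and italics) each reduce to wrapping or rearranging an element's text. The sketch below shows the resulting markdown strings only; the helper names are hypothetical, and the real methods operate on goquery selections and modify the DOM in place.

```go
package main

import "fmt"

// bold wraps text in "**", the delimiter ReplaceBold is documented to use.
func bold(text string) string { return "**" + text + "**" }

// italic wraps text in "_", the delimiter ReplaceItalic is documented to use.
func italic(text string) string { return "_" + text + "_" }

// inlineCode wraps text in a single backtick on each side.
func inlineCode(text string) string { return "`" + text + "`" }

// link builds a standard markdown link from anchor text and an href.
func link(text, href string) string { return fmt.Sprintf("[%s](%s)", text, href) }

func main() {
	fmt.Println(bold("warning"), italic("note"), inlineCode("go test"))
	fmt.Println(link("Go", "https://go.dev"))
}
```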
func (*Transformer) ToList ¶
func (t *Transformer) ToList(list *goquery.Selection) markdown.List
ToList transforms the "ul" or "ol" DOM element to a markdown List.
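As a rough picture of the target output, the sketch below renders a flat slice of items as an unordered or ordered markdown list. `renderList` is a hypothetical helper working on plain strings; the structure of the package's `markdown.List` type is not shown in this document, and nested lists are not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// renderList writes one markdown list line per item, numbering the lines
// when ordered is true and using "-" bullets otherwise.
func renderList(items []string, ordered bool) string {
	var b strings.Builder
	for i, item := range items {
		if ordered {
			fmt.Fprintf(&b, "%d. %s\n", i+1, item)
		} else {
			fmt.Fprintf(&b, "- %s\n", item)
		}
	}
	return b.String()
}

func main() {
	fmt.Print(renderList([]string{"install", "configure"}, true))
	fmt.Print(renderList([]string{"apples", "pears"}, false))
}
```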
func (*Transformer) ToTable ¶
func (t *Transformer) ToTable(table *goquery.Selection) markdown.Table
ToTable transforms the "table" DOM element to a markdown Table.
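The markdown form of a table is a header row, a separator row, and one line per data row. The following sketch renders that layout from plain string slices; `renderTable` is a hypothetical helper, and the fields of the package's `markdown.Table` type are not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTable renders a pipe-delimited markdown table: the header row,
// a "---" separator per column, then each data row.
func renderTable(header []string, rows [][]string) string {
	var b strings.Builder
	writeRow := func(cells []string) {
		b.WriteString("| " + strings.Join(cells, " | ") + " |\n")
	}
	writeRow(header)
	sep := make([]string, len(header))
	for i := range sep {
		sep[i] = "---"
	}
	writeRow(sep)
	for _, row := range rows {
		writeRow(row)
	}
	return b.String()
}

func main() {
	fmt.Print(renderTable(
		[]string{"Name", "Qty"},
		[][]string{{"apple", "3"}, {"pear", "5"}},
	))
}
```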
func (*Transformer) Transform ¶
func (t *Transformer) Transform(pattern string, elm *goquery.Selection, callbacks ...SelectionCallback)
Transform finds all elements matching the pattern and calls each given callback on each child element.
func (*Transformer) Transforms ¶
func (t *Transformer) Transforms(i int, s *goquery.Selection, callbacks ...SelectionCallback)
Transforms calls each callback on the given DOM element.
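The relationship between Transform and Transforms can be sketched with a stand-in node type in place of goquery selections. Everything here is hypothetical: the `node` type, the `callback` signature, and matching by tag name rather than by CSS selector are simplifications, since the actual SelectionCallback signature is not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a stand-in for a DOM element.
type node struct {
	Tag  string
	Text string
}

// callback is a hypothetical analogue of SelectionCallback.
type callback func(i int, n *node)

// transforms calls each callback, in order, on a single node.
func transforms(i int, n *node, callbacks ...callback) {
	for _, cb := range callbacks {
		cb(i, n)
	}
}

// transform finds nodes whose tag matches and runs all callbacks on each,
// passing the index of the node among the matches.
func transform(tag string, nodes []*node, callbacks ...callback) {
	i := 0
	for _, n := range nodes {
		if n.Tag == tag {
			transforms(i, n, callbacks...)
			i++
		}
	}
}

// trimText strips surrounding whitespace from the node's text.
func trimText(_ int, n *node) { n.Text = strings.TrimSpace(n.Text) }

// wrapBold wraps the node's text in "**".
func wrapBold(_ int, n *node) { n.Text = "**" + n.Text + "**" }

func main() {
	nodes := []*node{{Tag: "strong", Text: " hi "}, {Tag: "em", Text: "x"}}
	transform("strong", nodes, trimText, wrapBold)
	for _, n := range nodes {
		fmt.Println(n.Tag, n.Text)
	}
}
```

Callback order matters in this sketch: trimming before wrapping keeps stray whitespace out of the bold delimiters.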