htmltable

package module
v0.0.0-...-999ccee Latest Latest
Published: Mar 4, 2024 License: MIT Imports: 7 Imported by: 0

README

HTML table data extractor for Go

htmltable enables structured data extraction from HTML tables, with no external dependencies other than golang.org/x/net/html.

Installation

go get github.com/cel-edward/go-htmltable

Usage

Pass an io.Reader to New() or an HTML string to NewFromString(). Both return []*Table, where Table.Data has the form [][]string.

Cells with a rowspan or colspan are 'demerged': the contained value is copied into each spanned cell.

An example HTML input and its expected result can be found in parse_test.go.

Notes

String values within returned tables are stripped of surrounding whitespace.

Whitespace is inserted between multiple divs contained in a <td>. For example, if a <td> cell contains the two elements <div>text 1</div> <div>text 2</div>, the resulting text is text 1 text 2.
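
The two notes above can be pictured with a small standalone sketch. It assumes the text of each child element has already been collected into a slice; cellText is a hypothetical helper for illustration and is not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// cellText trims each fragment, drops empty ones, and joins the rest with a
// single space. fragments is assumed to hold the raw text of each child
// element of a cell, in document order.
func cellText(fragments []string) string {
	parts := make([]string, 0, len(fragments))
	for _, f := range fragments {
		if t := strings.TrimSpace(f); t != "" {
			parts = append(parts, t)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Printf("%q\n", cellText([]string{" text 1", "text 2 "})) // "text 1 text 2"
}
```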

Credits

This is a heavily modified fork of github.com/nfx/go-htmltable, designed for use with CEL algorithms.

The main parsing algorithm has been completely rewritten because the original did not function reliably for our use cases, particularly with complex rowspans and colspans. Returned types have also been adjusted.

Documentation

Overview

htmltable enables structured data extraction from HTML tables and URLs

Index

Constants

This section is empty.

Variables

var Logger func(_ context.Context, msg string, fields ...any)

Logger is a very simplistic structured logger that should be overridden by integrations.

Functions

This section is empty.

Types

type Parser

type Parser struct {
	Tables []*Table
	// contains filtered or unexported fields
}

type Table

type Table struct {
	Data [][]string
}

Table contains the 2D slice of string data parsed from html.

Each string value is stripped of surrounding whitespace.
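
Because Data is a plain [][]string, it is straightforward to post-process. The sketch below shows one possible approach, assuming the first row of a table is a header row. The local Table type only mirrors the documented one so that the example compiles on its own, and records is a hypothetical helper; in real use the value would come from New or NewFromString.

```go
package main

import "fmt"

// Table mirrors the documented type so this sketch is self-contained.
type Table struct {
	Data [][]string
}

// records treats the first row as a header and maps each later row onto it.
// Rows shorter than the header simply omit the missing keys.
func records(t *Table) []map[string]string {
	if len(t.Data) < 2 {
		return nil
	}
	header := t.Data[0]
	out := make([]map[string]string, 0, len(t.Data)-1)
	for _, row := range t.Data[1:] {
		rec := make(map[string]string, len(header))
		for i, name := range header {
			if i < len(row) {
				rec[name] = row[i]
			}
		}
		out = append(out, rec)
	}
	return out
}

func main() {
	t := &Table{Data: [][]string{
		{"name", "qty"},
		{"apple", "3"},
		{"pear", "5"},
	}}
	for _, rec := range records(t) {
		fmt.Println(rec["name"], rec["qty"])
	}
}
```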

func New

func New(r io.Reader) ([]*Table, error)

New parses the HTML read from r and returns every table found; a page may contain more than one table.

func NewFromString

func NewFromString(r string) ([]*Table, error)

NewFromString is the same as New(io.Reader), but takes the HTML as a string.
