article

v0.0.0-...-92c3d03 Latest
Published: Jan 17, 2016 License: LGPL-3.0 Imports: 6 Imported by: 0

README

golibri/article

Get text content from HTML. An Article is constructed by processing an HTML page. The relevant content is stripped of markup and other clutter and stored as Fulltext. It works best with blog posts or news articles, but even a tweet should suffice.

Given the HTML string of any content site, this module fills three fields:

  • Fulltext: extracts the relevant paragraphs as plain text
  • Language: determines the language of the text (fallback: en)
  • Description: summarizes the text into a short snippet of up to 3 sentences
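
To make the "up to 3 sentences" idea concrete, here is a minimal sketch of one way to truncate text to its first few sentences. It is not the library's actual summarization code, which this page does not show. The helper name firstSentences and the rule that a sentence ends at '.', '!' or '?' followed by whitespace are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// firstSentences returns at most n sentences from the start of text.
// Hypothetical helper, not part of golibri/article. Assumption: a sentence
// ends at '.', '!' or '?' followed by whitespace or the end of the text.
func firstSentences(text string, n int) string {
	text = strings.TrimSpace(text)
	if n <= 0 {
		return ""
	}
	runes := []rune(text)
	count := 0
	for i, r := range runes {
		if r != '.' && r != '!' && r != '?' {
			continue
		}
		// Only count the terminator when it closes a sentence, so that
		// dots inside tokens such as "v1.2" are skipped.
		if i == len(runes)-1 || unicode.IsSpace(runes[i+1]) {
			count++
			if count == n {
				return string(runes[:i+1])
			}
		}
	}
	// Fewer than n sentences were found: return the whole text.
	return text
}

func main() {
	text := "Go is fun. It compiles fast! Is it simple? Yes. Very."
	fmt.Println(firstSentences(text, 3))
}
```

A rule this simple will mis-split on abbreviations such as "e.g. this", so treat it only as an illustration of the truncation step.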

installation

go get -u github.com/golibri/article

usage

package main

import (
    "fmt"

    "github.com/golibri/article"
)

func main() {
    // Get an HTML string from somewhere, e.g. with golibri/fetch.
    a := article.Parse("website-html-string")
    // a is an Article value; see the data fields below.
    fmt.Println(a.Language, a.Description, a.Fulltext)
}

data fields

type Article struct {
    Language    string
    Description string
    Fulltext    string
}
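
Because Article is a plain struct with three exported string fields, it can be passed straight to the standard encoding/json package. The sketch below shows that under one assumption: the Article type here is a local mirror of the struct above, declared only so the example compiles on its own, and toJSON is a hypothetical helper rather than part of the library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Article is a local mirror of the documented struct, declared here only
// so the sketch compiles without fetching github.com/golibri/article.
type Article struct {
	Language    string
	Description string
	Fulltext    string
}

// toJSON encodes an Article as a JSON object.
// Hypothetical helper, not part of golibri/article.
func toJSON(a Article) (string, error) {
	b, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	a := Article{
		Language:    "en",
		Description: "A short summary.",
		Fulltext:    "A short summary. More text.",
	}
	s, err := toJSON(a)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

With no struct tags, the JSON keys are the Go field names (Language, Description, Fulltext), in declaration order.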

license

LGPLv3. (You can use it in commercial projects as you like, but improvements/bugfixes must flow back to this lib.)

Documentation

Overview

Extract text content from an HTML string.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Article

type Article struct {
	Language    string
	Description string
	Fulltext    string
}

func Parse

func Parse(s string) Article
