htmlarticle

v0.0.0-...-5e99676 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 10, 2017 License: Apache-2.0 Imports: 10 Imported by: 0

README

Html2Article

A PHP implementation of the Html2Article main-text extraction algorithm.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FormatTag

func FormatTag(bodyText string) (string, error)

2. Formats tags by removing the carriage returns inside matched tags.
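The tag clean-up described for FormatTag can be illustrated with a minimal sketch. This is not the package's implementation: `stripTagNewlines` and `tagPattern` are hypothetical names, and the sketch assumes that "matched tags" means any `<...>` span and that both `\r` and `\n` should be dropped inside it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagPattern matches one HTML tag. A negated character class also matches
// line breaks in Go's regexp syntax, so tags spanning several lines are covered.
var tagPattern = regexp.MustCompile(`<[^<>]+>`)

// stripTagNewlines is a hypothetical helper (not part of htmlarticle).
// It removes \r and \n inside each tag and leaves the text between tags as-is.
func stripTagNewlines(body string) string {
	breaks := strings.NewReplacer("\r", "", "\n", "")
	return tagPattern.ReplaceAllStringFunc(body, func(tag string) string {
		return breaks.Replace(tag)
	})
}

func main() {
	in := "<div\n class=\"post\">line one\nline two</div>"
	fmt.Printf("%q\n", stripTagNewlines(in))
}
```

Keeping every tag on a single line makes later line-based processing simpler, because a line break then always separates text rather than splitting markup.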

func GetArticle

func GetArticle(html string) (string, error)

Extracts the article body from the given raw HTML text.

func GetContent

func GetContent(bodyText string) (string, error)

Analyzes the text of the body tag to find the main content.
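One common way to find the main content in body text is a text-density heuristic: strip the markup, then keep the longest run of consecutive text-heavy lines. The sketch below illustrates that idea only. It is an assumption about the general approach, not the code behind GetContent; `densestBlock`, `scriptStyle`, `anyTag`, and the `minLen` threshold are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// scriptStyle removes script and style elements together with their content.
	scriptStyle = regexp.MustCompile(`(?is)<(script|style)[^>]*>.*?</(script|style)>`)
	// anyTag removes every remaining tag and keeps the text between tags.
	anyTag = regexp.MustCompile(`<[^<>]+>`)
)

// densestBlock is a hypothetical helper (not part of htmlarticle).
// It returns the run of consecutive lines, each at least minLen bytes long
// after trimming, whose total length is largest. Shorter lines end a run.
func densestBlock(body string, minLen int) string {
	text := scriptStyle.ReplaceAllString(body, "")
	text = anyTag.ReplaceAllString(text, "")

	var best, cur []string
	bestLen, curLen := 0, 0
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if len(line) < minLen {
			cur, curLen = nil, 0 // a short line ends the current run
			continue
		}
		cur = append(cur, line)
		curLen += len(line)
		if curLen > bestLen {
			best, bestLen = append([]string(nil), cur...), curLen
		}
	}
	return strings.Join(best, "\n")
}

func main() {
	body := "<div>Home</div>\n<p>This is the first long paragraph of the article.</p>\n" +
		"<p>And this is the second long paragraph.</p>\n<div>Footer</div>"
	fmt.Println(densestBlock(body, 20))
}
```

The threshold is measured in bytes, so a real implementation handling Chinese text would likely count runes instead and tune the threshold accordingly.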

func GetLink

func GetLink(html, re string) (map[string]string, error)

func GetRegion

func GetRegion(html, re string) (string, error)

The div region.

func GetTitle

func GetTitle(html string) (string, error)

1
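As an illustration of pulling a title out of raw HTML, here is a minimal sketch that mirrors the `(string, error)` shape of GetTitle. It is not the package's implementation: `extractTitle` and `titlePattern` are hypothetical, and the sketch assumes the title is the content of the first `<title>` element.

```go
package main

import (
	"errors"
	"fmt"
	"html"
	"regexp"
	"strings"
)

// titlePattern captures the content of the first <title> element,
// ignoring case and allowing the content to span several lines.
var titlePattern = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// extractTitle is a hypothetical helper (not part of htmlarticle).
// It returns the trimmed, entity-decoded title text, or an error
// when the page has no <title> element.
func extractTitle(page string) (string, error) {
	m := titlePattern.FindStringSubmatch(page)
	if m == nil {
		return "", errors.New("no <title> element found")
	}
	return strings.TrimSpace(html.UnescapeString(m[1])), nil
}

func main() {
	title, err := extractTitle("<html><head><title> Hello &amp; bye </title></head></html>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(title)
}
```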

func SaveJar

func SaveJar(cookie []*http.Cookie) error

func SetHeader

func SetHeader(req *http.Request) *http.Request

Types

type HTTPClient

type HTTPClient struct {
	Conn   http.Client
	Domain string
}

func HttpNew

func HttpNew(Domain string) (s *HTTPClient, err error)

func (*HTTPClient) Do

func (s *HTTPClient) Do(url string, data string) ([]byte, error)
