parser

package
v0.0.0-...-82d1e71 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 1, 2021 | License: MIT | Imports: 9 | Imported by: 0

Documentation

Overview

Package parser parses websites and generates tree structures (i.e. maps) of their connections.

Constants

This section is empty.

Variables

var (
	Client *http.Client
)

Functions

func ConstructLinksTreeForNode

func ConstructLinksTreeForNode(node *Link, limitWidth int, limitDepth int, curDepth int, wg *sync.WaitGroup) error

ConstructLinksTreeForNode parses pages and constructs a tree of URLs rooted at the given node.

Types

type HttpPage

type HttpPage struct {
	Path string
}

HttpPage represents a web page with HTML content.

func (HttpPage) GetBasePath

func (Adapter HttpPage) GetBasePath() string

GetBasePath returns the base path of the page.

func (HttpPage) GetContent

func (Adapter HttpPage) GetContent() (string, error)

GetContent returns the page content.

func (HttpPage) GetPath

func (Adapter HttpPage) GetPath() string

GetPath returns the absolute path of the page.

type Link

type Link struct {
	Value    string   `json:"value"`    // Full url
	Info     LinkInfo `json:"info"`     // Additional information object
	Children []Link   `json:"children"` // Slice of Links found on this link's url
}

Link represents a parsed site URL. Each Link is a node in the tree; its children are the links found in the HTML response of the parent node's page.

func ConstructTreeForUrl

func ConstructTreeForUrl(url string, maxWidth int, maxDepth int) (Link, error)

ConstructTreeForUrl generates a tree of links for url, bounded by maxWidth and maxDepth.

type LinkInfo

type LinkInfo struct {
	Id         int    `json:"id"`          // Unique id for the node in a tree
	ShortValue string `json:"value_short"` // Shorthand value/name
	Depth      int    `json:"depth"`       // Depth of the node in a tree
	Width      int    `json:"width"`       // Position of node in children slice (0 for the first child of each node)
}

LinkInfo holds additional information about a link (its position, index, and so on).

type Page

type Page interface {
	GetContent() (string, error)
	GetPath() string
	GetBasePath() string
}

Page represents any parsable object (HTTP page, file, etc.).

Any page should provide the ability to determine its content by path.
