Documentation ¶
Index ¶
- func Crawl(endpoint string, approximateMaxNodes int32, parallelism int, msDelay int, ...)
- func Run(endpoint string, isValidCrawlLink IsValidCrawlLinkFunction, ...)
- func ServeMetrics()
- func UpdateMetrics(numberOfNodesAdded int, currDepth int)
- type AddEdgeFunction
- type ConnectToDBFunction
- type FilterPageFunction
- type GetNewNodeFunction
- type IsValidCrawlLinkFunction
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Crawl ¶
func Crawl(
	endpoint string,
	approximateMaxNodes int32,
	parallelism int,
	msDelay int,
	isValidCrawlLink IsValidCrawlLinkFunction,
	addEdgesIfDoNotExist AddEdgeFunction,
	filterPage FilterPageFunction,
)
Crawl crawls a domain and saves relative links to a DB.
func Run ¶
func Run(
	endpoint string,
	isValidCrawlLink IsValidCrawlLinkFunction,
	connectToDB ConnectToDBFunction,
	addEdgesIfDoNotExist AddEdgeFunction,
	getNewNode GetNewNodeFunction,
	filterPage FilterPageFunction,
)
Run crawls until approximateMaxNodes nodes have been reached.
func UpdateMetrics ¶
UpdateMetrics updates Prometheus and internal metrics.
Types ¶
type AddEdgeFunction ¶
AddEdgeFunction adds an edge to the graph in the DB and returns true if the edge already exists.
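The signature of AddEdgeFunction is not shown on this page, so the following is only a sketch under an assumed shape, `func(from, to string) (bool, error)`. The `memoryGraph` type and its method are hypothetical stand-ins for a real DB client; they illustrate the documented contract (report true when the edge was already present) with a mutex, since Crawl accepts a parallelism setting and the callback may be invoked concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// AddEdgeFunction uses an ASSUMED signature; the real one is not documented here.
type AddEdgeFunction func(from, to string) (bool, error)

// memoryGraph is a hypothetical in-memory stand-in for the graph DB.
type memoryGraph struct {
	mu    sync.Mutex
	edges map[[2]string]struct{}
}

func newMemoryGraph() *memoryGraph {
	return &memoryGraph{edges: make(map[[2]string]struct{})}
}

// addEdgeIfDoesNotExist stores the edge from -> to and reports whether it
// already existed, matching the documented return value.
func (g *memoryGraph) addEdgeIfDoesNotExist(from, to string) (bool, error) {
	g.mu.Lock()
	defer g.mu.Unlock()

	key := [2]string{from, to}
	if _, ok := g.edges[key]; ok {
		return true, nil // edge already exists
	}
	g.edges[key] = struct{}{}
	return false, nil // edge was newly added
}

func main() {
	g := newMemoryGraph()
	var add AddEdgeFunction = g.addEdgeIfDoesNotExist

	first, _ := add("/a", "/b")
	second, _ := add("/a", "/b")
	fmt.Println(first, second)
}
```

A real implementation would replace the map with a query against the graph DB and return any driver error instead of nil.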
type ConnectToDBFunction ¶
type ConnectToDBFunction func() error
ConnectToDBFunction establishes the initial connection to the DB.
type FilterPageFunction ¶
type FilterPageFunction func(e *colly.HTMLElement) (*colly.HTMLElement, error)
FilterPageFunction filters the page down to a more specific element.
type GetNewNodeFunction ¶
GetNewNodeFunction retrieves a new node if the current one expires.
type IsValidCrawlLinkFunction ¶
IsValidCrawlLinkFunction checks whether a URL string is valid for crawling.
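The signature of IsValidCrawlLinkFunction is not shown on this page. The sketch below assumes it is `func(link string) bool` and shows one plausible policy: accept only site-relative paths (consistent with Crawl saving relative links) and reject absolute, protocol-relative, and fragment-only links. `isRelativePageLink` is a hypothetical name, not part of this package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// IsValidCrawlLinkFunction uses an ASSUMED signature; the real one is not documented here.
type IsValidCrawlLinkFunction func(link string) bool

// isRelativePageLink accepts links such as "/wiki/Go" and rejects absolute
// URLs, protocol-relative URLs ("//host/path"), and anything that fails to parse.
func isRelativePageLink(link string) bool {
	if !strings.HasPrefix(link, "/") || strings.HasPrefix(link, "//") {
		return false
	}
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	return !u.IsAbs() && u.Host == "" && u.Path != ""
}

func main() {
	var isValid IsValidCrawlLinkFunction = isRelativePageLink
	for _, link := range []string{"/wiki/Go", "https://example.com/", "//example.com/x", "#top"} {
		fmt.Printf("%q valid=%v\n", link, isValid(link))
	}
}
```

Stricter policies (for example, restricting to a path prefix or excluding file extensions) can be layered on the same function shape.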