crawler

package module
v1.0.1 Latest
Published: Apr 24, 2023 License: MIT Imports: 4 Imported by: 0

README

article-crawler

Description

This package is a simple web crawler written in Go that extracts text content from a given URL by recursively traversing the HTML document tree and selecting certain HTML tags. The tags selected for extraction are p, h1, h2, h3, h4, h5, h6, ul, ol, pre, and blockquote.
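For illustration, here is a minimal sketch of what such a traversal could look like, assuming the golang.org/x/net/html parser; the names wanted, extractText, and collectText are hypothetical and are not part of this package's API.

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

// wanted lists the tags the README names for extraction (assumption:
// the actual package may match tags differently).
var wanted = map[string]bool{
	"p": true, "h1": true, "h2": true, "h3": true, "h4": true,
	"h5": true, "h6": true, "ul": true, "ol": true, "pre": true,
	"blockquote": true,
}

// extractText recursively walks the parsed HTML tree; when it reaches
// one of the selected tags it collects that subtree's text, otherwise
// it descends into the children.
func extractText(n *html.Node, sb *strings.Builder) {
	if n.Type == html.ElementNode && wanted[n.Data] {
		collectText(n, sb)
		sb.WriteString("\n")
		return // the subtree has been collected; do not descend again
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		extractText(c, sb)
	}
}

// collectText appends every text node under n to the builder.
func collectText(n *html.Node, sb *strings.Builder) {
	if n.Type == html.TextNode {
		sb.WriteString(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, sb)
	}
}

func main() {
	doc, err := html.Parse(strings.NewReader(
		"<html><body><h1>Title</h1><p>Body text.</p></body></html>"))
	if err != nil {
		panic(err)
	}
	var sb strings.Builder
	extractText(doc, &sb)
	fmt.Print(sb.String()) // prints "Title" and "Body text." on separate lines
}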

Installation

To use this package, you will need Go installed on your system. Once it is installed, add the package to your project with the following command:

go get github.com/STRockefeller/article-crawler

Usage

To use the crawler, call the Crawl function with the URL you want to crawl as its argument. The function returns a string containing the extracted text content, along with an error that should be checked before using the result.

package main

import (
	"fmt"
	"log"

	"github.com/STRockefeller/article-crawler"
)

func main() {
	url := "https://example.com"
	// Crawl returns the extracted text and an error.
	text, err := crawler.Crawl(url)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}

Contributing

Contributions to this package are welcome. If you find a bug or have a feature request, please open an issue or submit a pull request.

License

This package is licensed under the MIT license. See the LICENSE file for more information.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Crawl

func Crawl(url string) (string, error)
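
Per the README description, Crawl fetches the document at url, extracts the text content of the selected tags (p, h1 through h6, ul, ol, pre, and blockquote), and returns it as a single string; the error presumably reports a failed fetch or parse.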

Types

This section is empty.
