crawler

package module
v1.0.1 Latest
Published: Apr 24, 2023 License: MIT Imports: 4 Imported by: 0

README

article-crawler

Description

This package is a simple web crawler written in Go that extracts text content from a given URL by recursively traversing the HTML document tree and selecting certain HTML tags. The tags selected for extraction are p, h1, h2, h3, h4, h5, h6, ul, ol, pre, and blockquote.
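For illustration, here is a minimal sketch of what such a traversal could look like, assuming the golang.org/x/net/html parser; the names wanted, extractText, and collectText are hypothetical and are not part of this package's API.

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

// wanted lists the tags the README names for extraction (assumption:
// the actual package may match tags differently).
var wanted = map[string]bool{
	"p": true, "h1": true, "h2": true, "h3": true, "h4": true,
	"h5": true, "h6": true, "ul": true, "ol": true, "pre": true,
	"blockquote": true,
}

// extractText recursively walks the parsed HTML tree; when it reaches
// one of the selected tags it collects that subtree's text, otherwise
// it descends into the children.
func extractText(n *html.Node, sb *strings.Builder) {
	if n.Type == html.ElementNode && wanted[n.Data] {
		collectText(n, sb)
		sb.WriteString("\n")
		return // the subtree has been collected; do not descend again
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		extractText(c, sb)
	}
}

// collectText appends every text node under n to the builder.
func collectText(n *html.Node, sb *strings.Builder) {
	if n.Type == html.TextNode {
		sb.WriteString(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, sb)
	}
}

func main() {
	doc, err := html.Parse(strings.NewReader(
		"<html><body><h1>Title</h1><p>Body text.</p></body></html>"))
	if err != nil {
		panic(err)
	}
	var sb strings.Builder
	extractText(doc, &sb)
	fmt.Print(sb.String()) // prints "Title" and "Body text." on separate lines
}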

Installation

To use this package, you will need Go installed on your system. Once it is installed, add the package to your project with the following command:

go get github.com/STRockefeller/article-crawler

Usage

To use the crawler, call the Crawl function with the URL you want to crawl as its argument. The function returns a string containing the extracted text content, along with an error that should be checked before using the result.

package main

import (
	"fmt"
	"log"

	"github.com/STRockefeller/article-crawler"
)

func main() {
	url := "https://example.com"
	// Crawl returns the extracted text and an error.
	text, err := crawler.Crawl(url)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}

Contributing

Contributions to this package are welcome. If you find a bug or have a feature request, please open an issue or submit a pull request.

License

This package is licensed under the MIT license. See the LICENSE file for more information.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Crawl

func Crawl(url string) (string, error)
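
Per the README description, Crawl fetches the document at url, extracts the text content of the selected tags (p, h1 through h6, ul, ol, pre, and blockquote), and returns it as a single string; the error presumably reports a failed fetch or parse.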

Types

This section is empty.
