hnscraper

v0.0.0-...-930ec52
Warning

This package is not in the latest version of its module.

Published: Nov 6, 2021 License: MIT Imports: 7 Imported by: 0

README

hnscraper

Web scraper for HackerNews. While HackerNews has a fantastic API, maybe you'd prefer to scrape the pages directly instead?

Using hnscraper is simple. If you want to request a single page, use ScrapePage():

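package main

import (
  "fmt"
  "log"

  "github.com/thetallpaul/hnscraper"
)

func main() {
  // ScrapePage returns (Page, error), so check the error before using the page
  pageTwo, err := hnscraper.ScrapePage(2)
  if err != nil {
    log.Fatal(err)
  }

  // Prints the first title on the second page
  fmt.Println(pageTwo.Posts[0].Title)
}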

If you want to select an inclusive range of pages, use ScrapeMultPages():

pages, err := hnscraper.ScrapeMultPages(2, 4)
if err != nil {
  log.Fatal(err)
}

// Prints the score (points) of every post for pages 2-4
for _, page := range pages {
  for _, post := range page.Posts {
    fmt.Printf("Page: %d, Rank: %d, Score: %d\n", page.Num, post.Rank, post.Score)
  }
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Page

type Page struct {
	Posts     []Post    // All the posts on the page
	Num       int       // The page number. Page 1 is the homepage/mainpage
	Retrieved time.Time // The time the request for the page was completed
}

A Page is an entire page on HackerNews. It holds every Post on the page and tracks when the retrieval was done.
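Because a Page records when it was retrieved, a caller can decide whether a previously scraped page is still fresh enough to reuse. The following is a minimal sketch of that idea, not part of hnscraper: the isStale helper and the trimmed local Page type are assumptions made so the example compiles on its own. In real code the value would be an hnscraper.Page.

```go
package main

import (
	"fmt"
	"time"
)

// Page is a trimmed local mirror of the documented hnscraper.Page struct,
// declared here only so the sketch compiles without the module.
type Page struct {
	Num       int       // The page number
	Retrieved time.Time // The time the request for the page was completed
}

// isStale is a hypothetical helper (not part of hnscraper). It reports
// whether the page was retrieved more than maxAge before now.
func isStale(p Page, now time.Time, maxAge time.Duration) bool {
	return now.Sub(p.Retrieved) > maxAge
}

func main() {
	// Fixed timestamps keep the example deterministic.
	now := time.Date(2021, time.November, 6, 12, 0, 0, 0, time.UTC)
	page := Page{Num: 1, Retrieved: now.Add(-10 * time.Minute)}

	fmt.Println(isStale(page, now, 5*time.Minute))  // true
	fmt.Println(isStale(page, now, 30*time.Minute)) // false
}
```

Passing now as a parameter, rather than calling time.Now() inside the helper, keeps the check easy to test.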

func ScrapeMultPages

func ScrapeMultPages(startPage, endPage int) ([]Page, error)

ScrapeMultPages scrapes all pages from the starting page number to the ending page number, inclusive.
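Inclusive means that a call with 2 and 4 covers pages 2, 3, and 4, which is endPage - startPage + 1 pages in total. How the function treats an invalid range (for example, endPage smaller than startPage) is not documented here, so a caller may want to validate the range first. The sketch below shows one way to do that; pageNumbers is a hypothetical helper, not part of hnscraper.

```go
package main

import (
	"errors"
	"fmt"
)

// pageNumbers is a hypothetical helper (not part of hnscraper). It validates
// an inclusive page range and returns every page number the range covers.
func pageNumbers(startPage, endPage int) ([]int, error) {
	if startPage < 1 {
		return nil, errors.New("startPage must be 1 or greater")
	}
	if endPage < startPage {
		return nil, errors.New("endPage must not be less than startPage")
	}

	// An inclusive range holds endPage-startPage+1 pages.
	nums := make([]int, 0, endPage-startPage+1)
	for n := startPage; n <= endPage; n++ {
		nums = append(nums, n)
	}
	return nums, nil
}

func main() {
	nums, err := pageNumbers(2, 4)
	if err != nil {
		fmt.Println("invalid range:", err)
		return
	}
	fmt.Println(nums) // [2 3 4]
}
```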

func ScrapePage

func ScrapePage(pageNum int) (Page, error)

ScrapePage scrapes a single page from HackerNews. Use '1' for the homepage/mainpage.

type Post

type Post struct {
	Rank        int       // The rank of the post, i.e. rank 2 means it's the second-highest post on the site
	Title       string    // The title of the post
	Score       int       // How many 'points' the post has received from voting
	By          string    // The username of the user that submitted the post
	URL         string    // The URL that the post links to
	NumComments int       // How many comments were made on the post at the time of access
	TimePosted  time.Time // Timestamp when the post was submitted
}

A Post is a single HackerNews post and the attributes associated with it.
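Since each Post carries both its Rank and its Score, scraped posts can be reordered or filtered after retrieval. The sketch below picks the highest-scoring posts from a slice. It is an illustration only: topByScore is a hypothetical helper, the local Post type mirrors just three of the documented fields so the example compiles without the module, and the sample posts are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Post is a trimmed local mirror of the documented hnscraper.Post struct.
type Post struct {
	Rank  int    // The rank of the post on the site
	Title string // The title of the post
	Score int    // How many points the post has received
}

// topByScore is a hypothetical helper (not part of hnscraper). It returns up
// to n posts ordered by Score, highest first, without modifying the input.
func topByScore(posts []Post, n int) []Post {
	sorted := make([]Post, len(posts))
	copy(sorted, posts)

	// A stable sort keeps the original (rank) order among equal scores.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})

	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	// Made-up sample data standing in for page.Posts.
	posts := []Post{
		{Rank: 1, Title: "first", Score: 120},
		{Rank: 2, Title: "second", Score: 340},
		{Rank: 3, Title: "third", Score: 75},
	}

	for _, p := range topByScore(posts, 2) {
		fmt.Printf("%d points: %s (rank %d)\n", p.Score, p.Title, p.Rank)
	}
	// Prints:
	// 340 points: second (rank 2)
	// 120 points: first (rank 1)
}
```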
