archiveis

package module
v0.0.0-...-7a04128 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 4, 2018 License: MIT Imports: 15 Imported by: 20

README

archiveis

Documentation Build Status Report Card

About

archive.is is a golang package for archiving web pages via archive.is.

Please be mindful and responsible and go easy on them, we want archive.is to last forever!

Created by Jay Taylor.

Also see: archive.org golang package

TODO
  • Add timeout to .Capture.
  • Consider unifying to single binary
Requirements
  • Go version 1.9 or newer
Installation
go get jaytaylor.com/archive.is/...
Usage
Command-line programs
archive.is <url>

Archive a fresh new copy of an HTML page

archive.is-snapshots <url>

Search for existing page snapshots

Search query examples:

  • microsoft.com for snapshots from the host microsoft.com
  • *.microsoft.com for snapshots from microsoft.com and all its subdomains (e.g. www.microsoft.com)
  • http://twitter.com/burgerking for snapshots from exact url (search is case-sensitive)
  • http://twitter.com/burg* for snapshots from urls starting with http://twitter.com/burg
Go package interfaces
Capture URL HTML Page Content

capture.go:

package main

import (
	"fmt"

	"github.com/jaytaylor/archive.is"
)

var captureURL = "https://jaytaylor.com/"

func main() {
	archiveURL, err := archiveis.Capture(captureURL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Successfully archived %v via archive.is: %v\n", captureURL, archiveURL)
}

// Output:
//
// Successfully archived https://jaytaylor.com/ via archive.is: https://archive.is/i2PiW
Search for Existing Snapshots

search.go:

package main

import (
    "fmt"
    "time"

    "github.com/jaytaylor/archive.is"
)

var searchURL = "https://jaytaylor.com/"

func main() {
    snapshots, err := archiveis.Search(searchURL, 10*time.Second)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%# v\n", snapshots)
}

// Output:
//
//
Running the test suite
go test ./...
License

Permissive MIT license, see the LICENSE file for more information.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	BaseURL               = "https://archive.is"                                                                                                       // Overrideable default package value.
	HTTPHost              = "archive.is"                                                                                                               // Overrideable default package value.
	UserAgent             = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36" // Overrideable default package value.
	DefaultRequestTimeout = 10 * time.Second                                                                                                           // Overrideable default package value.
	DefaultPollInterval   = 5 * time.Second                                                                                                            // Overrideable default package value.

)

Functions

func Capture

func Capture(u string, cfg ...Config) (string, error)

Capture archives the provided URL using the archive.is service.

Types

type Config

type Config struct {
	Anyway         bool          // Force archival even if there is already a recent snapshot of the page.
	Wait           bool          // Wait until the crawl has been completed.
	WaitTimeout    time.Duration // Max time to wait for crawl completion.  Default is unlimited.
	PollInterval   time.Duration // Interval between crawl completion checks.  Defaults to 5s.
	RequestTimeout time.Duration // Overrides default request timeout.
	SubmitID       string        // Accepts a user-provided submitid.
}

Config settings for page capture client behavior.

type Snapshot

type Snapshot struct {
	URL          string
	ThumbnailURL string
	Timestamp    time.Time
}

Snapshot represents an instance of a URL page snapshot on archive.is.

func Search(url string, timeout time.Duration) ([]Snapshot, error)

Search for URL snapshots.

Directories

Path Synopsis
_examples
cmd module

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL