robots

package
v0.0.0-...-4337d54
Published: Jul 3, 2013 License: MIT Imports: 6 Imported by: 1

Documentation

Overview

A package for parsing robots.txt
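
A minimal usage sketch follows; the import path, crawler name, and URLs are illustrative assumptions, not part of this package's documentation.

package main

import (
	"fmt"

	"example.com/robots" // hypothetical import path; substitute the real module path
)

func main() {
	// Fetch and parse the robots.txt for a host.
	r := robots.NewRobotsTxtFromUrl("https://example.com/")

	// Ask whether a given user agent may crawl a given URL.
	fmt.Println(r.Allowed("MyCrawler", "https://example.com/secret/page"))
}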

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CleanInput

func CleanInput(s string) string

func GetRobotsTxtUrl

func GetRobotsTxtUrl(rawurl string) string

GetRobotsTxtUrl returns the location of robots.txt given a URL that points to somewhere on the server.
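
For example (a fragment assuming the imports from the sketch in Overview; the output shown is an assumption based on the description above):

// Given any URL on the server, derive the robots.txt location.
loc := robots.GetRobotsTxtUrl("https://example.com/articles/42?page=3")
fmt.Println(loc) // presumably https://example.com/robots.txt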

Types

type RobotsTxt

type RobotsTxt struct {
	DisallowAll, AllowAll bool
	// User-agents to disallowed URLs
	Rules Rules
	Url   *url.URL
	// contains filtered or unexported fields
}

func NewRobotsTxtFromUrl

func NewRobotsTxtFromUrl(rawurl string) *RobotsTxt

func (*RobotsTxt) Allowed

func (r *RobotsTxt) Allowed(ua, rawurl string) bool

Allowed reports whether the given user agent is allowed to crawl the given URL. BUG(ChuckHa): Will fail when a UserAgent: * with Disallow: / is followed by UserAgent: Squidbot with an empty Disallow:
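
A sketch of checking several URLs for one user agent (a fragment; the URLs and user agent are illustrative):

r := robots.NewRobotsTxtFromUrl("https://example.com/")
for _, u := range []string{
	"https://example.com/",
	"https://example.com/admin/",
} {
	if r.Allowed("Squidbot", u) {
		fmt.Println("allowed:", u)
	} else {
		fmt.Println("disallowed:", u)
	}
}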

func (*RobotsTxt) GetRobotsTxtFromUrl

func (r *RobotsTxt) GetRobotsTxtFromUrl(robotsUrl string)

GetRobotsTxtFromUrl fetches the contents of the robots.txt at the given URL.
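
A sketch of refreshing an existing RobotsTxt from an explicit robots.txt URL (an assumption about how the method is meant to be combined with NewRobotsTxtFromUrl):

r := robots.NewRobotsTxtFromUrl("https://example.com/")
// Re-fetch the rules from the server's robots.txt.
r.GetRobotsTxtFromUrl("https://example.com/robots.txt")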

func (*RobotsTxt) NotAllowed

func (r *RobotsTxt) NotAllowed(ua, rawurl string) bool

type Rules

type Rules map[string][]string

func GetRules

func GetRules(contents string) Rules
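
A sketch of parsing a robots.txt body directly; how the parser groups directives under each user agent is an assumption:

contents := "User-agent: *\nDisallow: /private\nDisallow: /tmp\n"
rules := robots.GetRules(contents)
for ua, disallowed := range rules {
	fmt.Println(ua, disallowed)
}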

func (Rules) Add

func (r Rules) Add(key, value string)
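
Building a Rules value by hand (a sketch; Add is assumed to append value to the slice stored under key):

r := robots.Rules{}
r.Add("*", "/private")
r.Add("Squidbot", "/tmp")
// r now maps each user agent to its disallowed paths.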

Notes

Bugs

  • Will fail when a UserAgent: * with Disallow: / is followed by UserAgent: Squidbot with an empty Disallow:
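
The input shape the bug note describes looks roughly like this (a sketch; the exact failure behavior is not asserted here):

contents := "User-agent: *\nDisallow: /\n\nUser-agent: Squidbot\nDisallow:\n"
rules := robots.GetRules(contents)
// Squidbot should be allowed everywhere, but this * / Squidbot
// combination is the case the bug note says will fail.
fmt.Println(rules)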
