goodbots

v0.0.1
Published: May 16, 2021 License: Apache-2.0 Imports: 13 Imported by: 0

README

goodbots - trust but verify

goodbots verifies the IP addresses of respectful crawlers like Googlebot by performing reverse DNS and forward DNS lookups.

  1. Given an IP address (ex. 203.208.60.1)
  2. It performs a reverse DNS lookup to get a hostname (ex. crawl-203-208-60-1.googlebot.com)
  3. Then does a forward DNS lookup on the hostname to get an IP (ex. 203.208.60.1)
  4. It compares the 1st IP to the 2nd IP
  5. If they match, goodbots outputs the IP and hostname
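
The steps above can be sketched in Go with the standard library's net.LookupAddr and net.LookupHost. This is a minimal illustration, not the package's actual implementation: the verify function and the lookupFunc type are hypothetical names, and the lookups are passed in as parameters so the comparison logic can be exercised without network access.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// lookupFunc matches the signatures of net.LookupAddr and net.LookupHost.
// (Hypothetical helper type, not part of goodbots.)
type lookupFunc func(string) ([]string, error)

// verify does a reverse lookup on ip, then a forward lookup on each
// returned hostname, and reports the first hostname that resolves
// back to the same IP.
func verify(ip string, reverse, forward lookupFunc) (string, bool) {
	want := net.ParseIP(ip)
	if want == nil {
		return "", false // not a valid IP address
	}
	names, err := reverse(ip)
	if err != nil {
		return "", false
	}
	for _, name := range names {
		addrs, err := forward(name)
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if got := net.ParseIP(a); got != nil && got.Equal(want) {
				// Reverse lookups usually return a trailing dot; drop it.
				return strings.TrimSuffix(name, "."), true
			}
		}
	}
	return "", false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: verify <ip>")
		return
	}
	ip := os.Args[1]
	if host, ok := verify(ip, net.LookupAddr, net.LookupHost); ok {
		fmt.Printf("%s\t%s\n", ip, host)
	}
}
```

Comparing parsed net.IP values instead of raw strings avoids false mismatches when the same address is formatted differently.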

Getting Started


How to install/build goodbots

Clone the repo:

git clone git@github.com:eywu/goodbots.git

Change to the /cmd/goodbots directory:

cd goodbots/cmd/goodbots

Build the executable from main.go:

go build

How to use goodbots

If you built the goodbots executable as described above, you can feed it IPs via standard input.

Test a single IP

echo "203.208.60.1" | ./goodbots

Test a range of IPs

prips 203.208.40.1 203.208.80.1 | ./goodbots

Test a list of IPs from a text or csv file

./goodbots < ip-list.txt

Note: the text or CSV file must contain exactly one IP address per line.

Example:

66.249.87.224
203.208.23.146
203.208.23.126
203.208.60.227

Saving the results

goodbots prints to standard out with tab delimiters, so you can capture the output with an output redirect (https://www.codecademy.com/learn/learn-the-command-line/modules/learn-the-command-line-redirection/cheatsheet).

Save verified bot IPs and hosts to a file named saved-results.tsv:

./goodbots < ip-list.txt > saved-results.tsv

DNS Resolvers

goodbots randomly selects a different public DNS resolver for each DNS lookup to reduce the chances of being blocked or throttled by your DNS provider if you have lots of IPs to verify.
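
One way to approximate this behaviour with the standard library is a net.Resolver whose Dial function connects to a randomly chosen server. This is a sketch under assumptions: the resolver addresses listed here are well-known public DNS servers chosen for illustration and are not necessarily the ones goodbots uses, and randomResolver is a hypothetical name.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"net"
	"time"
)

// resolvers is an illustrative list of public DNS servers.
// Assumption: goodbots' own list may differ.
var resolvers = []string{"8.8.8.8:53", "8.8.4.4:53", "1.1.1.1:53", "9.9.9.9:53"}

// randomResolver returns a resolver that sends its queries to one
// randomly selected server, along with that server's address.
func randomResolver() (*net.Resolver, string) {
	addr := resolvers[rand.Intn(len(resolvers))]
	r := &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so that Dial is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Ignore the system-configured server and dial ours instead.
			return d.DialContext(ctx, network, addr)
		},
	}
	return r, addr
}

func main() {
	_, addr := randomResolver()
	fmt.Println("next lookup would use", addr)
}
```

Calling randomResolver once per lookup, then r.LookupAddr(ctx, ip) or r.LookupHost(ctx, host) on the returned resolver, spreads queries across the listed servers.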

It uses these DNS providers:

Supported Crawlers

Domain verification is currently a little imprecise: goodbots only checks that the hostname contains the crawler's domain name (for example, .googlebot.) and does not check the TLD.

Future improvements will test for more precise domains based on the crawlers' specifications.

  • googlebot
    • .googlebot.
    • .google.
  • msnbot
    • .msn.
  • bingbot
    • .msn.
  • pinterest
    • .pinterest.
  • yandex
    • .yandex.
  • baidu
    • .baidu.
  • coccoc
    • .coccoc.
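
The matching described above can be sketched as a substring test against the listed patterns. The matchCrawler function and crawler type are hypothetical names, and lower-casing the hostname before matching is an assumption. The last hostname in main is a made-up example that shows why the match is imprecise: any hostname containing .googlebot. passes, whatever its TLD.

```go
package main

import (
	"fmt"
	"strings"
)

// crawler pairs a crawler name with the substrings that identify it.
type crawler struct {
	name     string
	patterns []string
}

// crawlers mirrors the list in this README, in the same order.
var crawlers = []crawler{
	{"googlebot", []string{".googlebot.", ".google."}},
	{"msnbot", []string{".msn."}},
	{"bingbot", []string{".msn."}},
	{"pinterest", []string{".pinterest."}},
	{"yandex", []string{".yandex."}},
	{"baidu", []string{".baidu."}},
	{"coccoc", []string{".coccoc."}},
}

// matchCrawler returns the first crawler whose pattern appears in host.
// msnbot and bingbot share a pattern, so the one listed first wins.
func matchCrawler(host string) (string, bool) {
	h := strings.ToLower(host)
	for _, c := range crawlers {
		for _, p := range c.patterns {
			if strings.Contains(h, p) {
				return c.name, true
			}
		}
	}
	return "", false
}

func main() {
	hosts := []string{
		"crawl-203-208-60-1.googlebot.com",
		"mail.esai.com",
		"crawl.googlebot.example.net", // made-up host; matches despite the wrong TLD
	}
	for _, h := range hosts {
		name, ok := matchCrawler(h)
		fmt.Printf("%s\t%v\t%s\n", h, ok, name)
	}
}
```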

Make it go faster!

By default, the concurrency of requests is set to 10. To speed up the work, increase that number in main.go before building the executable.

Other usage of goodbots

In building goodbots, we created a general purpose function for simply resolving the hostnames of any IP address.

In main.go you can uncomment the line that calls ResolveNames() and comment out the GoodBots() function call.

This will not perform a forward DNS lookup to verify that the hostname resolves to the same IP address. Additionally, when the reverse lookup for an IP fails, the error is written to the TSV output.

➜  goodbots git:(main) ✗ prips -i 50 66.100.0.0 66.200.0.0 | ./goodbots
66.100.0.50	(error)	lookup 50.0.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
...
66.100.1.144	(error)	lookup 144.1.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
66.100.0.150	WebGods
66.100.0.250	(error)	lookup 250.0.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
...
66.100.4.76	(error)	lookup 76.4.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
66.100.4.126	mail.esai.com

Written in Golang. Gopher courtesy of Gopherize.me

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ForwardDNS

func ForwardDNS(host string) (string, error)

func GoodBots

func GoodBots(cc int64, ctx context.Context, r io.Reader, w io.Writer) error

func ResolveNames

func ResolveNames(cc int64, ctx context.Context, r io.Reader, w io.Writer) error

func ReverseDNS

func ReverseDNS(ip string) ([]string, error)

Types

This section is empty.

Directories

Path Synopsis
cmd
