adscraper

package module
v0.0.0-...-5597ac0
Warning

This package is not in the latest version of its module.

Published: Oct 25, 2017 License: MIT Imports: 16 Imported by: 0

README

adscraper

All you need to scrape raw ads.

Provide a file of keywords, fetch the ads shown for them in Google search results, and store those ads in a database.

Setup

1. Install Go

Go to the golang.org downloads page, download the archive for your operating system and architecture, and follow the installation instructions. It's probably as simple as running the following (with superuser privileges):

$ tar -C /usr/local -xzf go$VERSION.$OS-$ARCH.tar.gz

It's recommended to use the default installation location and not set the GOROOT environment variable.

The package was built with Go version 1.8.x.

2. Add Go's install location to your PATH.

Add this line to your shell's init script (for example ~/.profile or ~/.bashrc).

$ export PATH=$PATH:/usr/local/go/bin

3. Install PostgreSQL 9.6

Go to the official PostgreSQL downloads page and follow the instructions for your operating system. On Linux, PostgreSQL is most likely already available from your distribution's package manager. On macOS you can use Homebrew.

Don't forget to install both the server and the client libraries.

4. Clone the project

$ git clone git@github.com:gkats/adscraper.git

5. Set the GOPATH environment variable

You can set the GOPATH environment variable to any directory you like; however, the project assumes that it's cloned under $GOPATH/src/github.com/gkats/. A helper file, .gopath, is provided to set the variable automatically.

From the project root directory, source the .gopath file.

$ . .gopath

This sets the GOPATH environment variable to the project root.

Happy hacking!

Contribute

The repository includes a Makefile for GNU Make with several targets. The default target just builds (compiles) the package.

Before you run any Makefile rules, you need to set GOPATH. Its value depends on how you've laid out your project directories. We recommend the $GOPATH/src/github.com/gkats/adscraper directory structure; then all you have to do is source .gopath from the project root. Sourcing the file once per shell session is enough, but the commands below include it in every make invocation.

  1. To install the package, run
$ . .gopath && make install
# or
$ GOPATH=/path/to/gopath make install

The above command also runs gofmt and go vet before installing the package.

  2. To build the package, run
$ . .gopath && make build
# or
$ GOPATH=/path/to/gopath make build

  3. You can also run gofmt and go vet on their own:
$ . .gopath && make fmt
# or
$ GOPATH=/path/to/gopath make fmt
$ . .gopath && make vet
# or
$ GOPATH=/path/to/gopath make vet

Test

There aren't many tests yet. Run them with

$ . .gopath && go test

Run

There are three separate programs bundled in the repo.

keywords: The keywords program reads keywords from a file and stores them in the database. Once installed, you can invoke it with

$ $GOPATH/bin/keywords -f /absolute/path/to/keywords/file -d user:password@host:port/database

You need to create a keywords file first. For an example, see ./keywords.dat.sample.

When in doubt, run $GOPATH/bin/keywords --help.

server: An HTTP server used to read keywords, store ads, and update the keywords' scraping data. Run it with

$ $GOPATH/bin/server -d user:password@host:port/database

Run $GOPATH/bin/server --help for more information.
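Both keywords and server take a -d value of the form user:password@host:port/database. As an illustration only, assuming the programs turn this into a standard postgres:// connection URL (we haven't confirmed how they do it internally), the hypothetical helper below builds such a URL with net/url and rejects values missing a user, port, or database name. Passwords containing characters such as `@` or `/` would need to be percent-encoded.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dsnToURL converts user:password@host:port/database into a postgres:// URL.
// HYPOTHETICAL helper: it illustrates the expected -d format and is not
// the package's own parsing code.
func dsnToURL(d string) (string, error) {
	raw := d
	if !strings.HasPrefix(raw, "postgres://") {
		raw = "postgres://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.User == nil || u.User.Username() == "" {
		return "", fmt.Errorf("missing user in %q", d)
	}
	if u.Hostname() == "" || u.Port() == "" {
		return "", fmt.Errorf("missing host or port in %q", d)
	}
	if strings.Trim(u.Path, "/") == "" {
		return "", fmt.Errorf("missing database name in %q", d)
	}
	return u.String(), nil
}

func main() {
	u, err := dsnToURL("alice:secret@localhost:5432/ads")
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // postgres://alice:secret@localhost:5432/ads
}
```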

adscraper: The application that scrapes raw ads from Google search results. It requests a random selection of the least-scraped keywords from the server, queries Google for each one, and posts the resulting ads back to the server. Run it with

$ $GOPATH/bin/adscraper -h https://server.hostname

Run $GOPATH/bin/adscraper --help for more information.

License

The license is MIT. Feel free to fork this and use it.

Because the Google search results page changes frequently, some or all of the code in this package may no longer work.

Finally, please consult Google Search's terms of service before using this library.

Documentation


Constants

const UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
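UA is a desktop Chrome user-agent string. A common reason for such a constant is to send it as the User-Agent header on outgoing search requests; the sketch below shows that with net/http. Whether Scrape sets the header exactly this way is an assumption, and newSearchRequest is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
)

const UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"

// newSearchRequest builds a GET request that identifies itself with UA.
// HYPOTHETICAL helper for illustration; the package's request code is unexported.
func newSearchRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", UA)
	return req, nil
}

func main() {
	req, err := newSearchRequest("https://www.google.com/search?q=shoes")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("User-Agent") == UA) // true
}
```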

Variables

This section is empty.

Functions

func NewServer

func NewServer(store Store) *server

func NewURL

func NewURL(s string) string
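NewURL takes a string and returns a string; judging by its name and the scraping flow, it most likely builds a search URL for a keyword. The sketch below is a hypothetical stand-in (newSearchURL), assuming a plain Google query with the keyword in the q parameter; the real function may use a different host or extra parameters. url.Values.Encode handles escaping, so spaces become `+` and `&` becomes `%26`.

```go
package main

import (
	"fmt"
	"net/url"
)

// newSearchURL returns a Google search URL for keyword s.
// HYPOTHETICAL: an assumed equivalent of NewURL, not its actual source.
func newSearchURL(s string) string {
	v := url.Values{}
	v.Set("q", s)
	return "https://www.google.com/search?" + v.Encode()
}

func main() {
	fmt.Println(newSearchURL("running shoes")) // https://www.google.com/search?q=running+shoes
}
```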

Types

type Ad

type Ad struct {
	ID        int64
	H1        string
	H2        string
	Path      string
	Desc      string
	Rest      sql.NullString
	Raw       sql.NullString
	Position  int
	CreatedAt string
	UpdatedAt string
}

func Scrape

func Scrape(url string) ([]*Ad, error)

func (*Ad) GetRaw

func (ad *Ad) GetRaw() string

func (*Ad) GetRest

func (ad *Ad) GetRest() string

func (*Ad) SetRaw

func (ad *Ad) SetRaw(s string)

func (*Ad) SetRest

func (ad *Ad) SetRest(s string)
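Rest and Raw are sql.NullString columns, so the getters and setters above presumably hide the Valid flag from callers. A minimal sketch of that pattern follows, with Ad trimmed to the two nullable fields; storing an empty string as NULL is an assumption of this sketch, not confirmed behavior.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Ad is trimmed down to its two nullable columns for this sketch.
type Ad struct {
	Rest sql.NullString
	Raw  sql.NullString
}

// getNullString returns the value, or "" when the column is NULL.
func getNullString(ns sql.NullString) string {
	if !ns.Valid {
		return ""
	}
	return ns.String
}

// setNullString wraps s, treating "" as NULL (ASSUMPTION).
func setNullString(s string) sql.NullString {
	return sql.NullString{String: s, Valid: s != ""}
}

func (ad *Ad) GetRaw() string   { return getNullString(ad.Raw) }
func (ad *Ad) GetRest() string  { return getNullString(ad.Rest) }
func (ad *Ad) SetRaw(s string)  { ad.Raw = setNullString(s) }
func (ad *Ad) SetRest(s string) { ad.Rest = setNullString(s) }

func main() {
	ad := &Ad{}
	fmt.Printf("%q\n", ad.GetRaw()) // ""
	ad.SetRaw("<div>raw ad</div>")
	fmt.Println(ad.Raw.Valid, ad.GetRaw()) // true <div>raw ad</div>
}
```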

type AdKeyword

type AdKeyword struct {
	ID            int64
	AdId          int64
	KeywordId     int64
	Position      int
	PositionCount int
	CreatedAt     string
	UpdatedAt     string
}

type AdWriter

type AdWriter interface {
	Upsert(*Ad, *keywords.Keyword) error
}

func NewWriter

func NewWriter(s Store) AdWriter
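Because AdWriter is a one-method interface, code that depends on it can be tested without PostgreSQL. The hypothetical in-memory implementation below is one way to do that. It assumes ads with identical H1, H2, Path and Desc are the same ad and counts how often each (ad, keyword) pair is upserted; the real NewWriter's deduplication rules and its use of PositionCount may differ. Keyword is a local stand-in for keywords.Keyword.

```go
package main

import (
	"fmt"
	"sync"
)

type Keyword struct{ ID int64 } // stand-in for keywords.Keyword

type Ad struct {
	ID                 int64
	H1, H2, Path, Desc string
	Position           int
}

type AdWriter interface {
	Upsert(*Ad, *Keyword) error
}

type adKey struct{ h1, h2, path, desc string }
type linkKey struct{ adID, keywordID int64 }

// memWriter is a HYPOTHETICAL in-memory AdWriter for tests.
type memWriter struct {
	mu     sync.Mutex
	nextID int64
	ads    map[adKey]int64 // ad content -> assigned ID
	links  map[linkKey]int // (ad, keyword) -> times seen
}

func newMemWriter() *memWriter {
	return &memWriter{ads: map[adKey]int64{}, links: map[linkKey]int{}}
}

// Upsert assigns an ID to new ads, reuses it for identical ones,
// and records one more sighting of the ad for keyword k.
func (w *memWriter) Upsert(ad *Ad, k *Keyword) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	key := adKey{ad.H1, ad.H2, ad.Path, ad.Desc}
	id, ok := w.ads[key]
	if !ok {
		w.nextID++
		id = w.nextID
		w.ads[key] = id
	}
	ad.ID = id
	w.links[linkKey{id, k.ID}]++
	return nil
}

func main() {
	var w AdWriter = newMemWriter()
	k := &Keyword{ID: 7}
	a1 := &Ad{H1: "Shoes", Path: "shop.example/shoes"}
	a2 := &Ad{H1: "Shoes", Path: "shop.example/shoes"}
	for _, ad := range []*Ad{a1, a2} {
		if err := w.Upsert(ad, k); err != nil {
			panic(err)
		}
	}
	fmt.Println(a1.ID, a2.ID) // 1 1
}
```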

type Client

type Client struct {
	*http.Client
	// contains filtered or unexported fields
}

func NewClient

func NewClient(host string) *Client

func (*Client) GetKeywords

func (c *Client) GetKeywords() ([]*keywords.Keyword, error)

func (*Client) PatchKeyword

func (c *Client) PatchKeyword(id int64) error

func (*Client) PostAdKeywords

func (c *Client) PostAdKeywords(ad *Ad, k *keywords.Keyword) error

type Store

type Store interface {
	Close() error
	QueryRow(string, ...interface{}) *sql.Row
	Query(string, ...interface{}) (*sql.Rows, error)
	Begin() (*sql.Tx, error)
}

func NewStore

func NewStore(url string) (Store, error)

Directories

Path Synopsis
cmd
