kijiji-scrape

command module

Version: v0.0.0-...-ae2c004. Published: Aug 27, 2015. License: Apache-2.0. Imports: 10. Imported by: 0.

Repository

github.com/bentranter/kijiji-scrape

README

Kijiji Scrape

Get an email within five minutes of when the Kijiji item you want gets posted. (Not really though)

Usage

Make sure you have Go installed on your system, and that your shell can expand the wildcard * operator into matching file names. After that, install dependencies:

$ go get github.com/yhat/scrape
$ go get golang.org/x/net/html
$ go get gopkg.in/redis.v3

Once you've got those, you can just $ go run. Open your browser to port 3000 once it starts.

You'll also need Redis installed on your system. This repo uses the default Redis config, so if you've run $ brew install redis or something similar, you can just $ redis-server and be on your way.

License

Apache v2.0. See the license file for more info.

Architecture

The idea is pretty simple:

1. Receive a request from the client.
2. Scrape a site (so far Kijiji is the only site implemented).
3. Look for one or more keywords.
4. Respond to the client immediately with the results (so that they feel like this web app is actually working).
5. Save the query in Redis and get it to puke the ID back to you.
6. Start a goroutine that checks for new results matching the keywords every five to ten minutes, and give it that ID.
7. Once a new result is found, email the user and stop the goroutine. Ask them if they want to continue looking or stop. The "continue looking" link is just whatever.com/redisID?token=someCryptographicallyStrongToken&resume={boolean}. From there:
   1. If the user wants to keep looking, just get the Redis ID from the URL, verify it's not forged using the token, get the query info from Redis, and start a new goroutine.
   2. If they don't want to keep looking, delete the query from Redis.
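The README doesn't say how the token in the "continue looking" link is produced or checked. One common way to make an ID unforgeable is to sign it with an HMAC under a server-side secret, so here is a minimal sketch along those lines. The names signID, verifyID, and resumeLink, the secret, and the choice of HMAC-SHA256 are all assumptions for illustration, not taken from the repository.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strconv"
)

// signID returns a hex-encoded HMAC-SHA256 of redisID under secret.
// (Hypothetical helper; the repo may generate its token differently.)
func signID(secret []byte, redisID string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(redisID))
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyID reports whether token is the signature signID would produce
// for redisID. hmac.Equal compares in constant time.
func verifyID(secret []byte, redisID, token string) bool {
	got, err := hex.DecodeString(token)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(redisID))
	return hmac.Equal(got, mac.Sum(nil))
}

// resumeLink builds base/redisID?resume=...&token=... for the email.
func resumeLink(base string, secret []byte, redisID string, resume bool) string {
	q := url.Values{}
	q.Set("token", signID(secret, redisID))
	q.Set("resume", strconv.FormatBool(resume))
	return base + "/" + url.PathEscape(redisID) + "?" + q.Encode()
}

func main() {
	// Assumption: a random key kept only on the server.
	secret := []byte("replace-with-a-random-server-side-key")

	fmt.Println(resumeLink("https://whatever.com", secret, "42", true))

	token := signID(secret, "42")
	fmt.Println(verifyID(secret, "42", token)) // token matches the ID it was issued for
	fmt.Println(verifyID(secret, "43", token)) // same token reused with another ID
}
```

In this sketch only the Redis ID is signed, so the resume flag can be flipped by whoever holds the link. That seems fine here, since the user is meant to pick either option anyway.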

Why use Redis? Because I didn't think this through and I'm sure the server will crash (since I'm just going to be reckless and put this RAM-intensive piece of trash on a $5 DO droplet with no RAM lol), so at least if it fails I can restart without losing all the queries.

Documentation

Overview

Package main implements a web scraper. It's basically a service that monitors sites like Kijiji or Etsy, and notifies you when a new ad matching the stuff you want gets posted. Here's my implementation idea:

2 pages: GET homepage and POST homepage.
Add URL, email, and keyword in a field.
Start a goroutine that looks until it finds a match.
Email a link once a match is found, then stop the goroutine.
In the email, ask 'look again?' If yes, restart that goroutine; if no, end.