scraper

command module
v0.0.0-...-07315ce
Published: Apr 21, 2016 License: BSD-2-Clause Imports: 7 Imported by: 0

README

intro

Before you collect data from anywhere, make sure you are not breaking any legal or ethical rules. It's important to be a good virtual citizen and protect our internet planet.

If you intend to profit from your data collection, check with the data owners and comply with the relevant ownership rights.

The code in this module is part of a community project to make local data available. The project is led by me, and the idea is to free local data through simple HTTP GET services.

The project will also test out new build and deployment ideas.

the data

Freely available local data is hard to access in a modern format (JSON and REST), so we decided to free the largest set of local data that we can find: game stats of the best football club in the world, the Eagles.

Logic

With Go, things just work. It's the best language I have ever worked with. The logic of the code is straightforward (the way it should be).

  • Use http.Get to fetch the HTML pages of interest (if you are scraping HTML).

  • Use an HTML tokenizer to get to the content.

  • Use the regexp and strings libraries to format the data into our own format.

  • Marshal to JSON.

License

Free to use.

known issues

None. Go ahead and try to break it, but please let me know if you do.

to do - time permitting

I wrote the code over a weekend and ran out of time to make it 'perfect'.

I will come back to it and do the following (feel free to fork it and improve it, please):

  • write the JSON output into a file
  • simplify the code and remove deeply nested logic
  • improve error handling
  • improve documentation

Documentation

Overview

Copyright 2016 abdulrashid2@gmail.com. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.
