Documentation ¶
Overview ¶
webhog is a package that downloads and stores a given URL (including js, css, and images) for offline use and uploads it to a given AWS-S3 account.
Index ¶
- Constants
- Variables
- func ArchiveFinalFiles(entDir string) (string, error)
- func Create(m Model) error
- func Cursor(m Model) *mgo.Collection
- func DeleteEntity(entity Entity, r render.Render)
- func Destroy(m Model, query interface{}) error
- func Entities(params martini.Params, r render.Render)
- func ExtractData(entity *Entity, url string)
- func Find(m Model, query interface{}) *mgo.Query
- func GetEntity(params martini.Params, r render.Render)
- func KeyRequired() martini.Handler
- func LoadConfig() error
- func LoadDB()
- func LoadRoutes()
- func NewEntityDir() (err error)
- func ParseHTML(n *html.Node, entity *Entity, done chan bool)
- func Register(m Model)
- func Scrape(url Url, r render.Render)
- func StoreHTML(html bytes.Buffer, entDir string) (err error)
- func StoreResource(resource, attr, entDir string) (name string, err error)
- func Update(m Model, query, updates interface{}) error
- func UploadEntity(dir string, entity *Entity) (string, error)
- type Entity
- type Model
- type Url
Constants ¶
const (
	CompleteStatus  = "complete"
	ParsingStatus   = "parsing"
	UploadingStatus = "uploading"
	ErrorStatus     = "error"
)
Entity progression statuses.
Variables ¶
var Config = new(configuration)
var Conn = new(connection)
Global variable that holds the DB connection.
var EntityDir string
Temporary directory in which the entity files are stored.
var ExpirationTime = time.Hour * 168
Sets a URL's expiration time to one week, after which it needs to be reprocessed.
var Models = []Model{}
Hold a reference to all models.
Functions ¶
func ArchiveFinalFiles ¶
func ArchiveFinalFiles(entDir string) (string, error)
Creates a gzip-compressed tar archive (tar.gz) of the directory and adds the found files to it for upload.
func Cursor ¶
func Cursor(m Model) *mgo.Collection
func DeleteEntity ¶
func DeleteEntity(entity Entity, r render.Render)
func ExtractData ¶
func ExtractData(entity *Entity, url string)
Makes a GET request to the given URL and starts parsing its HTML.
func KeyRequired ¶
func KeyRequired() martini.Handler
func LoadConfig ¶
func LoadConfig() error
func LoadRoutes ¶
func LoadRoutes()
func ParseHTML ¶
func ParseHTML(n *html.Node, entity *Entity, done chan bool)
Parses the HTML and pulls the href/src attributes of js, css, and image elements for download.
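The traversal described above might look roughly like the sketch below. To stay dependency-free it declares simplified local stand-ins (node, attr) for html.Node and html.Attribute from golang.org/x/net/html. The tag-to-attribute table and the collect function are assumptions for illustration, and the done channel from the real signature is not modelled.

```go
package main

import "fmt"

// attr and node are simplified stand-ins for html.Attribute and html.Node from
// golang.org/x/net/html, declared locally so the sketch has no dependencies.
type attr struct{ Key, Val string }

type node struct {
	Data                    string // element name, e.g. "img"
	Attr                    []attr
	FirstChild, NextSibling *node
}

// resourceAttr maps an element name to the attribute holding its resource URL.
// The tag set is an assumption based on "js, css, and images"; note that
// <link href> also matches non-stylesheet links such as icons.
var resourceAttr = map[string]string{
	"script": "src",
	"img":    "src",
	"link":   "href",
}

// collect walks the tree depth-first and returns every resource URL it finds.
func collect(n *node) []string {
	if n == nil {
		return nil
	}
	var urls []string
	if key, ok := resourceAttr[n.Data]; ok {
		for _, a := range n.Attr {
			if a.Key == key && a.Val != "" {
				urls = append(urls, a.Val)
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		urls = append(urls, collect(c)...)
	}
	return urls
}

func main() {
	img := &node{Data: "img", Attr: []attr{{"src", "logo.png"}}}
	script := &node{Data: "script", Attr: []attr{{"src", "app.js"}}, NextSibling: img}
	link := &node{Data: "link", Attr: []attr{{"rel", "stylesheet"}, {"href", "site.css"}}, NextSibling: script}
	root := &node{Data: "html", FirstChild: link}

	fmt.Println(collect(root)) // [site.css app.js logo.png]
}
```

With the real golang.org/x/net/html package the same loop over FirstChild and NextSibling applies, but element nodes should additionally be filtered by node type.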
func StoreResource ¶
func StoreResource(resource, attr, entDir string) (name string, err error)
Stores the given js, css, or image file in the given temporary directory under a temporary name.
Types ¶
type Entity ¶
type Entity struct {
	Id        bson.ObjectId `bson:"_id,omitempty" json:"id"`
	UUID      string        `bson:"uuid" json:"uuid"`
	Url       string        `bson:"url" json:"url"`
	AwsLink   string        `bson:"aws_link,omitempty" json:"aws_link"`
	Status    string        `bson:"status" json:"status"`
	CreatedAt time.Time     `bson:"created_at" json:"created_at"`
}
Entity is a representation of a webpage and its corresponding UUID that is stored on AWS-S3.