crawler

package
v0.0.0-...-3cfcab7
Published: Oct 25, 2018 License: MIT Imports: 16 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Direction

type Direction struct {
	// Name is the name of the route, typically the names of the first
	// and last stops separated by a dash, e.g. "Ж.к.Западен парк - Метростанция Витоша".
	// Some names are irregular and also include intermediate stops.
	Name string `json:"name"`
	// ID is a unique ID for the direction of the line, regardless of operation mode.
	// The ID is taken from schedules.sofiatraffic.bg.
	ID string `json:"id"`
}

Direction represents a line direction. A line typically has 2 directions, but there can be more, e.g. 3 to 8.

type Line

type Line struct {
	// Name is the name of the line, e.g. "85", "44-Б", "7-А", etc.
	Name string `json:"name"`

	// Transportation denotes the transportation type of the line, e.g. Tram.
	Transportation `json:"transportation_type"`

	// OperationIDMap maps each Operation to its OperationID.
	// It is needed because lines differ in how many Operation modes they have.
	OperationIDMap `json:"operation_id_map"`

	// OperationIDRoutesMap is the entry point to the rest of the data for a given line,
	// namely a list of all of its routes.
	OperationIDRoutesMap `json:"operation_routes_map"`
}

Line contains all the useful (and not so useful) information about a public transportation line.

func (*Line) String

func (l *Line) String() string

type Operation

type Operation int

Operation is used to denote different Line Operation modes [Normal, Pre Holiday, Holiday].

const (
	// Normal operation mode denotes the regular everyday operation of the line, normally on weekdays,
	// but it can also cover the whole week.
	Normal Operation = iota

	// PreHoliday operation mode denotes a period (usually a day) before a big holiday, or usually Saturday,
	// if its times of operation differ from those of Holiday.
	PreHoliday

	// Holiday operation mode denotes the holiday mode of operation and times.
	// It is usually Sunday, but can be any official holiday. It is also used when its
	// times of operation do not differ from PreHoliday.
	Holiday
)

func (Operation) String

func (o Operation) String() string

type OperationID

type OperationID string

OperationID is a unique ID taken from schedules.sofiatraffic.bg that identifies the combination of a line and one of its Operation modes. One line can have multiple OperationIDs, which is why OperationIDMap exists.

type OperationIDMap

type OperationIDMap map[Operation]OperationID

OperationIDMap maps an Operation to its OperationID. It is needed because lines differ in how many Operation modes they have, ranging from 0 to 3, and each mode has its own unique OperationID.
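
Because a line may lack some modes, a lookup in the map should check whether the key is present. The following is a minimal sketch of that check. The types are local copies of the declarations documented here so that it compiles on its own; lookupOperationID and the ID values are hypothetical and not part of the package.

```go
package main

import "fmt"

// Local copies of the documented declarations.
type Operation int

const (
	Normal Operation = iota
	PreHoliday
	Holiday
)

type OperationID string

type OperationIDMap map[Operation]OperationID

// lookupOperationID is a hypothetical helper. It reports whether the line
// runs in the given mode and, if so, which OperationID identifies it.
func lookupOperationID(m OperationIDMap, op Operation) (OperationID, bool) {
	id, ok := m[op]
	return id, ok
}

func main() {
	// Made-up IDs for a line that has no separate PreHoliday timetable.
	m := OperationIDMap{Normal: "1001", Holiday: "1002"}

	for _, op := range []Operation{Normal, PreHoliday, Holiday} {
		if id, ok := lookupOperationID(m, op); ok {
			fmt.Printf("mode %d -> %s\n", int(op), id)
		} else {
			fmt.Printf("mode %d -> not available\n", int(op))
		}
	}
}
```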

func (*OperationIDMap) String

func (o *OperationIDMap) String() string

type OperationIDRoutesMap

type OperationIDRoutesMap map[OperationID]Routes

OperationIDRoutesMap maps each OperationID of a Line to the list of Routes of that Line.

func (*OperationIDRoutesMap) String

func (o *OperationIDRoutesMap) String() string

type Route

type Route struct {
	Direction `json:"direction"`
	Stops     `json:"stops"`
}

Route is composed of a direction and a list of stops. The name of the route is the Name of its Direction.
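
Both fields of Route are embedded and carry JSON tags. With encoding/json, a tagged embedded field is encoded under its tag name instead of being flattened into the parent object, while the embedded Direction's Name is still promoted in Go code. The sketch below illustrates this with local copies of the types. Stop is trimmed to two fields for brevity, and encodeRoute and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented declarations.
type Direction struct {
	Name string `json:"name"`
	ID   string `json:"id"`
}

// Stop is trimmed to two fields here; the documented type has more.
type Stop struct {
	Name string `json:"name"`
	Sign string `json:"sign"`
}

type Stops []Stop

type Route struct {
	Direction `json:"direction"`
	Stops     `json:"stops"`
}

// encodeRoute is a hypothetical helper that returns the JSON form of a route.
func encodeRoute(r Route) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	r := Route{
		Direction: Direction{Name: "A - B", ID: "1"},
		Stops:     Stops{{Name: "A", Sign: "0910"}},
	}

	// Promoted field: the route's name is the embedded direction's name.
	fmt.Println(r.Name)

	s, err := encodeRoute(r)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// The tagged embedded fields appear as nested "direction" and "stops" keys.
	fmt.Println(s)
}
```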

func (Route) String

func (r Route) String() string

type Routes

type Routes []Route

Routes is a simple list of routes. It is declared as a named type rather than a plain slice to simplify printing and usage.

func (Routes) String

func (r Routes) String() string

type ScheduleID

type ScheduleID string

ScheduleID is a string in the format {OperationID}/{DirectionID}/{Stop.Sign}, where Stop.Sign is written without leading zeros. It can be used to query an internal server for the schedule of the stop.
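
As an illustration of that format, here is a minimal sketch that assembles a ScheduleID from its three parts. buildScheduleID is a hypothetical helper, not a function of the package, and the IDs are made up. It assumes that dropping leading zeros means trimming every leading "0" character from the sign.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copy of the documented declaration.
type ScheduleID string

// buildScheduleID is a hypothetical helper that joins the three parts with
// slashes after trimming the leading zeros from the stop sign.
func buildScheduleID(operationID, directionID, sign string) ScheduleID {
	trimmed := strings.TrimLeft(sign, "0")
	return ScheduleID(operationID + "/" + directionID + "/" + trimmed)
}

func main() {
	// Made-up operation and direction IDs; "0910" is the sign used as an
	// example elsewhere in this documentation.
	fmt.Println(buildScheduleID("1001", "2002", "0910"))
}
```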

type ScheduleTimes

type ScheduleTimes []string

ScheduleTimes is a list of time strings in h:mm format, e.g. [5:13 6:49 10:23 23:01].
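
Since the times are plain strings, callers that want to compare or sort them need to parse them first. The sketch below converts one entry to minutes since midnight. minutesSinceMidnight is a hypothetical helper, and it assumes every entry has an hour and a minute part separated by a single colon.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Local copy of the documented declaration.
type ScheduleTimes []string

// minutesSinceMidnight is a hypothetical helper that parses a time such as
// "5:13" or "23:01" into the number of minutes since midnight.
func minutesSinceMidnight(t string) (int, error) {
	parts := strings.SplitN(t, ":", 2)
	if len(parts) != 2 {
		return 0, fmt.Errorf("time %q has no colon", t)
	}
	h, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, fmt.Errorf("bad hour in %q: %w", t, err)
	}
	m, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, fmt.Errorf("bad minute in %q: %w", t, err)
	}
	if h < 0 || h > 23 || m < 0 || m > 59 {
		return 0, fmt.Errorf("time %q is out of range", t)
	}
	return h*60 + m, nil
}

func main() {
	times := ScheduleTimes{"5:13", "6:49", "10:23", "23:01"}
	for _, t := range times {
		mins, err := minutesSinceMidnight(t)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%s -> %d\n", t, mins)
	}
}
```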

type SofiaTrafficCrawler

type SofiaTrafficCrawler struct {

	// Lines is the list of active lines found during crawling.
	Lines []Line

	// Schedules maps each unique {OperationID}/{DirectionID}/{StopSign} string to its schedule times.
	Schedules map[ScheduleID]ScheduleTimes

	// VirtualTableStops is the list of active stops found on the virtual tables site during crawling.
	VirtualTableStops []VirtualTableStop

	// VirtualTableStopsTimes maps each of those stops to a string of comma-separated arrival times of the next vehicle.
	VirtualTableStopsTimes map[VirtualTableStop]string
	// contains filtered or unexported fields
}

SofiaTrafficCrawler keeps all the useful data extracted during the different crawls.

func NewSofiaTrafficCrawler

func NewSofiaTrafficCrawler(redisPool *redis.Pool) *SofiaTrafficCrawler

NewSofiaTrafficCrawler creates an initialized SofiaTrafficCrawler struct that all crawler functions use. It takes an already created pool of redis connections, which it uses for persistence. The crawled data is accessible through the fields of the crawler struct.

func (*SofiaTrafficCrawler) CrawlLines

func (s *SofiaTrafficCrawler) CrawlLines()

CrawlLines starts a new crawl with schedules.sofiatraffic.bg as the seed link and searches for all links that match any of the transportation link groups. For each link found, it parses the useful information and stores it in the Lines field of the SofiaTrafficCrawler struct. Finally, it saves that information in redis.

func (*SofiaTrafficCrawler) CrawlSchedules

func (s *SofiaTrafficCrawler) CrawlSchedules(forNumberOfLines int)

CrawlSchedules starts a new crawl by first building all the needed links from Lines. If Lines is empty, it tries to load it from redis. Each crawled page is reached through a direct link and contains the schedule for a single stop ID. While crawling, it stores the data for each ScheduleID, a list of times of day in 24-hour format, in a map, which it saves to redis at the end. The forNumberOfLines parameter sets how many of the found lines to crawl; if it is 0, schedule information is crawled for all lines.
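
The "0 means all" convention of forNumberOfLines can be sketched as a small selection helper. selectFirst is hypothetical and says nothing about how the package implements the limit. In particular, treating a negative count or a count larger than the list as "all" is an assumption of this sketch.

```go
package main

import "fmt"

// selectFirst is a hypothetical helper that returns the first n items,
// or every item when n is 0. Treating negative or too-large n as "all"
// is an assumption made for this sketch.
func selectFirst(items []string, n int) []string {
	if n <= 0 || n > len(items) {
		return items
	}
	return items[:n]
}

func main() {
	lines := []string{"85", "44-Б", "7-А"}

	fmt.Println(len(selectFirst(lines, 0))) // all lines are selected
	fmt.Println(len(selectFirst(lines, 2))) // only the first two
}
```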

func (*SofiaTrafficCrawler) CrawlVirtualTablesStopsForTimes

func (s *SofiaTrafficCrawler) CrawlVirtualTablesStopsForTimes(forNumberOfStops int)

CrawlVirtualTablesStopsForTimes starts a new crawl using VirtualTableStops. If that list is empty, it tries to load it from redis. It uses a simpler and faster crawler that visits only one type of page, processes its HTML by looking for a specific ordering, and extracts a string of comma-separated times. The forNumberOfStops parameter sets how many of the already loaded virtual table stops to crawl; if it is 0, all available stops are crawled.

type Stop

type Stop struct {
	// Name is the regular name of the stop, e.g. `Метростанция "Витоша"`.
	// Extracted from m.sofiatraffic.bg/schedules/.
	Name string `json:"name"`

	// CapitalName is the capitalized name of the same stop, e.g. `МЕТРОСТАНЦИЯ ВИТОША`.
	// Extracted from schedules.sofiatraffic.bg.
	CapitalName string `json:"capital_name"`

	// Sign is the ID marked on the physical stop sign.
	// It is the second most common way to refer to a stop after its name, e.g. 0910.
	// It is the key that links schedules.sofiatraffic.bg and m.sofiatraffic.bg/schedules/.
	Sign string `json:"sign"`

	// ID is a unique ID for the stop.
	// Extracted from schedules.sofiatraffic.bg.
	ID string `json:"id"`

	// VirtualTableStop is the mapped virtual tables entry, if such a mapping exists.
	// It can be used to query the stop for real-time data such as comma-separated times.
	VirtualTableStop `json:"vt_stop"`
}

Stop represents the main data for a traffic stop: two names (one capitalized and one regular) extracted from two sources, a sign, an ID, and the entry for the same stop as a virtual tables stop, if one exists.

func (*Stop) String

func (s *Stop) String() string

type Stops

type Stops []Stop

Stops is a simple list of stops. It is declared as a named type rather than a plain slice to simplify printing and usage.

func (Stops) String

func (s Stops) String() string

type Transportation

type Transportation int

Transportation represents all possible types of transportation that are supported.

const (
	// Tram represents all tramway lines.
	Tram Transportation = iota

	// Bus represents all urban and suburban bus lines.
	Bus

	// Trolley represents all trolleybus lines.
	Trolley
)

Note that the order is not arbitrary: Tram must have Transportation = 0, Bus = 1 and Trolley = 2, because the Virtual Tables site uses those integers as IDs in queries. Subway is not supported in this version.
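
The integer values follow from the position of each constant in the iota block, so reordering the constants would silently change the IDs. A minimal sketch of the conversion, using a local copy of the declaration; queryID is a hypothetical helper, and how the package actually builds its queries is not documented here.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local copy of the documented declaration. The order fixes the values:
// Tram = 0, Bus = 1, Trolley = 2.
type Transportation int

const (
	Tram Transportation = iota
	Bus
	Trolley
)

// queryID is a hypothetical helper that renders the transportation type
// as the decimal integer used as an ID.
func queryID(t Transportation) string {
	return strconv.Itoa(int(t))
}

func main() {
	for _, t := range []Transportation{Tram, Bus, Trolley} {
		fmt.Println(queryID(t))
	}
}
```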

func (Transportation) String

func (t Transportation) String() string

type VirtualTableStop

type VirtualTableStop struct {
	// StopID is the unique ID of the stop as used by m.sofiatraffic.bg.
	StopID string `json:"stop"`

	// LineID is the ID of the line for the given stop as used by m.sofiatraffic.bg.
	LineID string `json:"lid"`

	// RouteID is the ID of the route for the given line as used by m.sofiatraffic.bg.
	RouteID string `json:"rid"`

	// TransportationType matches Line.Transportation but is kept here for easy string access.
	TransportationType string `json:"vt"`
}

VirtualTableStop is used to request real-time-ish arrival times for a given stop. It is stored in Stop.VirtualTableStop, and a separate list of these is kept purely for querying.
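
Because all of its fields are strings, VirtualTableStop is comparable and can serve as a map key, which is how SofiaTrafficCrawler.VirtualTableStopsTimes uses it. The sketch below looks up one stop in such a map and splits its comma-separated times. arrivalsFor is a hypothetical helper, the field values are made up, and the exact layout of the times string (no spaces, h:mm entries) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copy of the documented declaration.
type VirtualTableStop struct {
	StopID             string `json:"stop"`
	LineID             string `json:"lid"`
	RouteID            string `json:"rid"`
	TransportationType string `json:"vt"`
}

// arrivalsFor is a hypothetical helper that returns the individual times
// stored for a stop, or nil when the stop is unknown or has no times.
func arrivalsFor(times map[VirtualTableStop]string, stop VirtualTableStop) []string {
	raw, ok := times[stop]
	if !ok || strings.TrimSpace(raw) == "" {
		return nil
	}
	parts := strings.Split(raw, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts
}

func main() {
	// Made-up identifiers and times, for illustration only.
	stop := VirtualTableStop{StopID: "1", LineID: "2", RouteID: "3", TransportationType: "0"}
	times := map[VirtualTableStop]string{stop: "10:05,10:17,10:31"}

	for _, t := range arrivalsFor(times, stop) {
		fmt.Println(t)
	}
}
```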

func (VirtualTableStop) String

func (v VirtualTableStop) String() string
