Documentation ¶
Index ¶
- func SyncMapDecode(m *sync.Map, file *os.File) error
- func SyncMapEncode(m *sync.Map, file *os.File) error
- type Frontier
- func (f *Frontier) DecrHost(host string)
- func (f *Frontier) DecrHostActive(host string)
- func (f *Frontier) GetActiveHostCount(host string) (value int)
- func (f *Frontier) GetHostCount(host string) (value int)
- func (f *Frontier) GetHostsCount() (value int64)
- func (f *Frontier) IncrHost(host string)
- func (f *Frontier) IncrHostActive(host string)
- func (f *Frontier) Init(jobPath string, loggingChan chan *FrontierLogMessage, workers int, ...) (err error)
- func (f *Frontier) IsHostInPool(host string) bool
- func (f *Frontier) Load()
- func (f *Frontier) Save()
- func (f *Frontier) Start()
- type FrontierLogMessage
- type Item
- type Pair
- type PoolItem
- type Seencheck
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Frontier ¶
type Frontier struct {
	Paused               *utils.TAtomBool
	FinishingQueueWriter *utils.TAtomBool
	FinishingQueueReader *utils.TAtomBool
	IsQueueWriterActive  *utils.TAtomBool
	IsQueueReaderActive  *utils.TAtomBool
	JobPath              string

	// PullChan is the channel workers use to get new URLs to archive,
	// and PushChan is the channel used to push discovered URLs
	// to the frontier
	PullChan chan *Item
	PushChan chan *Item

	// Queue is a local queue storing all the URLs to crawl.
	// It is a prefixed queue, basically one sub-queue per host
	Queue *goque.PrefixQueue

	// QueueCount stores the number of URLs currently queued
	QueueCount *ratecounter.Counter

	// HostPool is a struct that contains a map and a mutex.
	// The map contains all the different hosts that Zeno crawled,
	// with a counter for each. Going through that map gives us
	// the prefix to query from the queue
	HostPool *sync.Map

	UseSeencheck bool
	Seencheck    *Seencheck
	LoggingChan  chan *FrontierLogMessage
}
Frontier holds all the data for a frontier.
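The struct comments describe PullChan as the channel workers read from and PushChan as the channel they write discovered URLs to. The following is a minimal sketch of that flow, not Zeno's actual worker code: the `item` type, `worker`, `fakeDiscover`, and `crawlOnce` are hypothetical names introduced only for illustration, and the real frontier sits between the two channels with its queue.

```go
package main

import "fmt"

// item is a hypothetical, trimmed-down stand-in for frontier.Item.
type item struct {
	URL string
	Hop uint8
}

// worker pulls items until the pull channel is closed and pushes
// whatever discover returns for each of them.
func worker(pull <-chan *item, push chan<- *item, discover func(*item) []*item) {
	for it := range pull {
		for _, found := range discover(it) {
			push <- found
		}
	}
}

// fakeDiscover pretends every page links to exactly one child page.
func fakeDiscover(parent *item) []*item {
	return []*item{{URL: parent.URL + "/child", Hop: parent.Hop + 1}}
}

// crawlOnce feeds the seeds to a single worker and collects what it pushes.
func crawlOnce(seeds []string) []string {
	pull := make(chan *item)
	push := make(chan *item)

	// Producer side: plays the role of the frontier filling PullChan.
	go func() {
		for _, s := range seeds {
			pull <- &item{URL: s}
		}
		close(pull)
	}()
	// Worker side: reads from pull, writes discoveries to push.
	go func() {
		worker(pull, push, fakeDiscover)
		close(push)
	}()

	// Consumer side: plays the role of the frontier draining PushChan.
	var discovered []string
	for it := range push {
		discovered = append(discovered, it.URL)
	}
	return discovered
}

func main() {
	fmt.Println(crawlOnce([]string{"https://example.com", "https://example.org"}))
}
```

With a single worker and unbuffered channels, discovered URLs come back in the same order as the seeds. With several workers that ordering would no longer hold.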
func (*Frontier) DecrHostActive ¶
func (*Frontier) GetActiveHostCount ¶
func (*Frontier) GetHostCount ¶
GetHostCount returns the counter for the given host.
func (*Frontier) GetHostsCount ¶
func (*Frontier) IncrHostActive ¶
func (*Frontier) Init ¶
func (f *Frontier) Init(jobPath string, loggingChan chan *FrontierLogMessage, workers int, useSeencheck bool) (err error)
Init initializes the components of a frontier.
func (*Frontier) IsHostInPool ¶
IsHostInPool returns true if the host is in the pool.
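As an illustration of the per-host counters kept in HostPool, here is a minimal sketch built on `sync.Map`. It is not Zeno's implementation: the `hostPool` type and its methods are hypothetical, the counters are assumed to be plain `int` values, and removing a host once its counter reaches zero is an assumption made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// hostPool is a hypothetical stand-in for Frontier.HostPool: a sync.Map
// keyed by host name whose values are per-host URL counters.
type hostPool struct {
	mu sync.Mutex // serializes read-modify-write updates of a counter
	m  sync.Map   // host (string) -> count (int)
}

// incr adds one to the counter for host, creating the entry if needed.
func (p *hostPool) incr(host string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	v, _ := p.m.LoadOrStore(host, 0)
	p.m.Store(host, v.(int)+1)
}

// decr subtracts one from the counter and drops the host once it
// reaches zero (an assumption of this sketch).
func (p *hostPool) decr(host string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	v, ok := p.m.Load(host)
	if !ok {
		return
	}
	if n := v.(int) - 1; n > 0 {
		p.m.Store(host, n)
	} else {
		p.m.Delete(host)
	}
}

// count returns the counter for host, or 0 if the host is not in the pool.
func (p *hostPool) count(host string) int {
	if v, ok := p.m.Load(host); ok {
		return v.(int)
	}
	return 0
}

// has reports whether host currently has an entry in the pool.
func (p *hostPool) has(host string) bool {
	_, ok := p.m.Load(host)
	return ok
}

func main() {
	var p hostPool
	p.incr("example.com")
	p.incr("example.com")
	p.decr("example.com")
	fmt.Println(p.count("example.com"), p.has("example.com"), p.has("example.org"))
	// prints "1 true false"
}
```

The mutex is there because `sync.Map` makes individual loads and stores safe, but an increment is a load followed by a store and needs to be serialized separately.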
func (*Frontier) Load ¶
func (f *Frontier) Load()
Load takes the path to the frontier's hosts pool and status dump, decodes that file, and loads it into the job's frontier.
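The page does not say which serialization format Save, Load, SyncMapEncode, and SyncMapDecode use. The sketch below shows one way a `sync.Map` can be dumped to a file and restored, assuming `encoding/gob`, string keys, and `int` values. The helper names `encodeHostPool`, `decodeHostPool`, and `roundTrip` are hypothetical and only mirror the shape of the signatures listed in the index.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"io"
	"os"
	"sync"
)

// encodeHostPool copies the sync.Map into a plain map and gob-encodes it.
// Hypothetical helper; assumes string keys and int values.
func encodeHostPool(m *sync.Map, file *os.File) error {
	plain := make(map[string]int)
	m.Range(func(k, v any) bool {
		plain[k.(string)] = v.(int)
		return true
	})
	return gob.NewEncoder(file).Encode(plain)
}

// decodeHostPool reads a gob-encoded map from file and stores each entry in m.
func decodeHostPool(m *sync.Map, file *os.File) error {
	plain := make(map[string]int)
	if err := gob.NewDecoder(file).Decode(&plain); err != nil {
		return err
	}
	for k, v := range plain {
		m.Store(k, v)
	}
	return nil
}

// roundTrip saves entries to a temporary file and loads them into a new map.
func roundTrip(entries map[string]int) (*sync.Map, error) {
	src := &sync.Map{}
	for k, v := range entries {
		src.Store(k, v)
	}

	file, err := os.CreateTemp("", "hostpool-*.gob")
	if err != nil {
		return nil, err
	}
	defer os.Remove(file.Name())
	defer file.Close()

	if err := encodeHostPool(src, file); err != nil {
		return nil, err
	}
	// Rewind so the decoder starts at the beginning of the dump.
	if _, err := file.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	dst := &sync.Map{}
	if err := decodeHostPool(dst, file); err != nil {
		return nil, err
	}
	return dst, nil
}

func main() {
	dst, err := roundTrip(map[string]int{"example.com": 3})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	v, _ := dst.Load("example.com")
	fmt.Println(v) // prints "3"
}
```

Copying into a plain map first is needed because `sync.Map` has no exported fields, so an encoder cannot serialize it directly.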
type FrontierLogMessage ¶
type Item ¶
type Item struct {
	ID             string
	Hash           uint64
	Hop            uint8
	Host           string
	Type           string
	Redirect       int
	URL            *url.URL
	ParentItem     *Item
	LocallyCrawled uint64
}
Item is a crawlable object.
func IsSeedList ¶
IsSeedList checks whether the path is a seed list and, if it is, returns an array of frontier.Item built from the seeds.
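The signature and validation rules of IsSeedList are not shown on this page. As a rough sketch of the idea, the example below treats a seed list as a text file with one absolute URL per line. That file format, the `seedItem` type, and the functions `parseSeedList` and `loadSeedList` are all assumptions made for illustration.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net/url"
	"os"
	"strings"
)

// seedItem is a hypothetical, trimmed-down stand-in for frontier.Item.
type seedItem struct {
	URL  *url.URL
	Host string
	Hop  uint8
}

// parseSeedList turns newline-separated URLs into seed items.
// Blank lines are skipped; any line that is not an absolute URL
// makes the whole list invalid.
func parseSeedList(content string) ([]seedItem, error) {
	var seeds []seedItem
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		u, err := url.Parse(line)
		if err != nil || u.Scheme == "" || u.Host == "" {
			return nil, fmt.Errorf("not a seed URL: %q", line)
		}
		// Seeds start the crawl, so their hop count is zero.
		seeds = append(seeds, seedItem{URL: u, Host: u.Host, Hop: 0})
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	if len(seeds) == 0 {
		return nil, errors.New("seed list is empty")
	}
	return seeds, nil
}

// loadSeedList reads the file at path and parses it as a seed list.
func loadSeedList(path string) ([]seedItem, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parseSeedList(string(data))
}

func main() {
	seeds, err := parseSeedList("https://example.com/\n\nhttps://example.org/a\n")
	if err != nil {
		fmt.Println("invalid seed list:", err)
		return
	}
	for _, s := range seeds {
		fmt.Println(s.Host, s.URL)
	}
}
```

Keeping the parsing separate from the file read makes the validation easy to exercise without touching the filesystem.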