Directories
Path | Synopsis |
---|---|
commoncrawl | Package commoncrawl parses Common Crawl WAT files and saves links and pages to files, sorted and split for further processing. |
config | Package config holds the crawler configuration, including ignored file extensions, domains, TLDs, and query strings. |
fileutils | Package fileutils provides utility functions for working with files and directories. |
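The config package's synopsis names four kinds of ignore rules: file extensions, domains, TLDs, and query strings. The sketch below shows one way such rules could be applied to a URL, assuming each rule is a simple set. The type `IgnoreRules` and the method `ShouldIgnore` are hypothetical illustrations, not the package's actual API.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// IgnoreRules is a hypothetical stand-in for the crawler configuration;
// the real config package's types and field names may differ.
type IgnoreRules struct {
	Extensions  map[string]bool // lowercase file extensions, e.g. ".jpg"
	Domains     map[string]bool // exact lowercase host names, e.g. "ads.example.com"
	TLDs        map[string]bool // last host label only, e.g. "xyz"
	QueryParams map[string]bool // lowercase query keys, e.g. "sessionid"
}

// ShouldIgnore reports whether rawURL matches any ignore rule.
// Unparseable URLs are treated as ignorable (an assumption of this sketch).
func (r IgnoreRules) ShouldIgnore(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return true
	}
	host := strings.ToLower(u.Hostname()) // Hostname strips any port
	if r.Domains[host] {
		return true
	}
	// Compare only the final label; "co.uk"-style suffixes are not handled.
	if i := strings.LastIndex(host, "."); i >= 0 && r.TLDs[host[i+1:]] {
		return true
	}
	if ext := strings.ToLower(path.Ext(u.Path)); ext != "" && r.Extensions[ext] {
		return true
	}
	for key := range u.Query() {
		if r.QueryParams[strings.ToLower(key)] {
			return true
		}
	}
	return false
}

func main() {
	rules := IgnoreRules{
		Extensions:  map[string]bool{".jpg": true, ".pdf": true},
		Domains:     map[string]bool{"ads.example.com": true},
		TLDs:        map[string]bool{"xyz": true},
		QueryParams: map[string]bool{"sessionid": true},
	}
	for _, u := range []string{
		"https://example.com/page.html",
		"https://example.com/photo.JPG",
		"https://ads.example.com/",
		"https://site.xyz/",
		"https://example.com/?sessionid=42",
	} {
		fmt.Println(u, rules.ShouldIgnore(u))
	}
}
```

Storing each rule as a map gives constant-time lookups per URL. Matching only the last host label keeps the sketch short; a real TLD rule that must handle multi-label suffixes would more likely consult a public suffix list such as `golang.org/x/net/publicsuffix`.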