config

package
v1.0.0
Published: Jun 9, 2015 License: BSD-3-Clause Imports: 5 Imported by: 2

Documentation

Overview

Package config takes care of the configuration file parsing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	// CloneDir is the path to the folder where all repositories are cloned.
	CloneDir string `json:"clone_dir"`

	// TarRepos tells whether repositories shall be stored as tar archives.
	TarRepos bool `json:"tar_repositories"`

	// TmpDir can be used to specify a temporary working directory. If
	// left unspecified, the default system temporary directory will be used.
	// If you have a ramdisk, you are advised to use it here.
	TmpDir string `json:"tmp_dir"`

	// TmpDirFileSizeLimit can be used to specify the maximum size in GB of an
	// object to be temporarily placed in TmpDir for processing. Files of size
	// larger than this value will not be processed in TmpDir.
	TmpDirFileSizeLimit float64 `json:"tmp_dir_file_size_limit"`

	// MaxFetcherWorkers defines the maximum number of workers for the
	// repositories fetching task.
	// It defaults to 1. If your machine has good I/O throughput and a good
	// CPU, you probably want to increase this conservative value for
	// performance reasons. Note, however, that fetching is I/O- and
	// network-bound rather than CPU-bound, so you probably do not want to
	// increase this value too much.
	MaxFetcherWorkers uint `json:"max_fetcher_workers"`

	// FetchTimeInterval corresponds to the time to wait between two full
	// repository fetching periods.
	FetchTimeInterval string `json:"fetch_time_interval"`

	// FetchLanguages is the list of programming languages to fetch.
	// If the list is empty or nil, the fetcher will fetch all repositories,
	// independently of the language.
	FetchLanguages []string `json:"fetch_languages"`

	// ThrottlerWaitTime can be used to specify how much time to wait, in
	// seconds, before resuming normal operations if the error rate is too high
	// (defaults to 1800).
	ThrottlerWaitTime uint `json:"throttler_wait_time"`

	// SlidingWindowSize can be used to specify the sliding window size to
	// consider for error throttling (defaults to 60).
	SlidingWindowSize uint `json:"throttler_sliding_window_size"`

	// LeakInterval corresponds to the time, in milliseconds, the throttler
	// waits before discarding an error (defaults to 1000, ie 1 second).
	LeakInterval uint `json:"throttler_leak_interval"`

	// Crawlers is the list of crawler configurations.
	Crawlers []CrawlerConfig `json:"crawlers"`

	// CrawlingTimeInterval corresponds to the time to wait between two full
	// crawling periods.
	CrawlingTimeInterval string `json:"crawling_time_interval"`

	// Database is the database configuration.
	Database DatabaseConfig `json:"database"`
}

Config is the main configuration structure.
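Based on the JSON tags above, a complete configuration file might look like the following. All values are illustrative only, and the format of the interval strings is an assumption (Go-style duration syntax such as "24h" is shown, but the package documentation does not specify it):

```json
{
  "clone_dir": "/data/repos",
  "tar_repositories": false,
  "tmp_dir": "/tmp",
  "tmp_dir_file_size_limit": 0.5,
  "max_fetcher_workers": 2,
  "fetch_time_interval": "24h",
  "fetch_languages": ["Go", "C"],
  "throttler_wait_time": 1800,
  "throttler_sliding_window_size": 60,
  "throttler_leak_interval": 1000,
  "crawlers": [
    {
      "type": "github",
      "languages": ["Go"],
      "limit": 0,
      "since_id": 0,
      "fork": false,
      "oauth_access_token": "",
      "use_search_api": false
    }
  ],
  "crawling_time_interval": "24h",
  "database": {
    "hostname": "localhost",
    "port": 5432,
    "username": "crawld",
    "password": "",
    "dbname": "crawld",
    "ssl_mode": "disable"
  }
}
```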

func ReadConfig

func ReadConfig(path string) (*Config, error)

ReadConfig reads a JSON formatted configuration file, verifies the values of the configuration parameters and fills the Config structure.

type CrawlerConfig

type CrawlerConfig struct {
	// Type defines the crawler type (eg: "github").
	Type string `json:"type"`

	// Languages is the list of programming languages of interest.
	Languages []string `json:"languages"`

	// Limit limits the number of repositories to crawl. Set this value to 0
	// to disable the limit; otherwise, crawling stops once "limit"
	// repositories have been fetched.
	// Note that the behavior differs slightly depending on whether
	// UseSearchAPI is set. When using the search API, this limit corresponds
	// to the number of repositories to crawl per language listed in
	// "languages". Otherwise, it is a global limit, regardless of language.
	Limit int64 `json:"limit"`

	// SinceID corresponds to the repository ID (eg: GitHub repository ID in
	// the case of the github crawler) from which to start querying repositories.
	// Note that this value is ignored when using the search API.
	SinceID int `json:"since_id"`

	// Fork indicates whether "fork" repositories need to be crawled or not.
	Fork bool `json:"fork"`

	// OAuthAccessToken is the API token. If not provided, crawld will still
	// work, but the number of API calls is usually limited to a low value.
	// For instance, in the case of the GitHub crawler, unauthenticated
	// requests are limited to 60 per hour, whereas authenticated requests go
	// up to 5000 per hour.
	OAuthAccessToken string `json:"oauth_access_token"`

	// UseSearchAPI specifies whether to use the search API or not. The number
	// of results returned by a search API is usually limited. For instance,
	// the GitHub search API limits the results to 1000 repositories.
	// In the case of the github crawler, this means that at most 1000
	// repositories per language can be crawled (the github crawler orders
	// the results by repository popularity, measured in stars). To crawl
	// large amounts of data, this option should therefore be set to false.
	UseSearchAPI bool `json:"use_search_api"`
}

CrawlerConfig is a configuration for a crawler.

type DatabaseConfig

type DatabaseConfig struct {
	// HostName is the hostname, or IP address, of the database server.
	HostName string `json:"hostname"`

	// Port is the PostgreSQL port.
	Port uint `json:"port"`

	// UserName is the PostgreSQL user that has access to the database.
	UserName string `json:"username"`

	// Password is the password of the database user.
	Password string `json:"password"`

	// DBName is the database name.
	DBName string `json:"dbname"`

	// SSLMode defines the SSL mode for the connection to the database.
	// Refer to sslModes for the possible values and their meaning.
	SSLMode string `json:"ssl_mode"`
}

DatabaseConfig holds the connection information for the PostgreSQL database.
