prox

package module
v0.0.0-...-927f575 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 16, 2020 License: MIT Imports: 15 Imported by: 0

README

Prox

DEPRECATED

This library has significant memory issues and you should probably not use it in your project. A newer, faster and more reliable version is currently being developed, but it is closed source.

Prox is a simple Go package for locating open proxy servers. It works by aggregating proxies from several different providers on the internet and exposes them through a simple API. It is the successor to my previous package, proxyfinder.


Setup

Assuming you have a working Go installation, you can just run

$ go get -u github.com/ollybritton/prox

This installs the package as well as the prox command line tool. The MaxMind GeoIP database is embedded into the binary, so no additional setup is required. Although this increases the size (20M for the binary!), I would argue that this is a necessary evil: it reduces setup, and the database would need to be stored somewhere anyway.

Command-Line Tool

The command line tool has a few useful commands:

$ prox status # Check status of providers
$ prox find # Print proxies to the terminal

For help with a specific command, run

$ prox help <command name>

Usage

The library provides a high-level interface (Pools) and a lower-level interface (Providers) to the proxy providers. For most uses, the higher-level Pool implementation is the better choice.

Key Terms

A Provider is just a source of proxies. It can be a website or a static list stored somewhere on the machine. The following providers are available.

Name            Source
FreeProxyLists  http://freeproxylists.com/
ProxyScrape     https://proxyscrape.com/
Static          Stored in providers/data/proxies, accessed using go-bindata

High Level (Pools)

A pool is a collection of providers combined together that keeps track of which proxies have been used and which haven't. There are two types of pool: SimplePool and ComplexPool.

Simple Pool

To create a new SimplePool, call the NewSimplePool function with one or more providers.

pool := prox.NewSimplePool(prox.FreeProxyLists)

You then need to load the pool:

pool := prox.NewSimplePool(prox.FreeProxyLists)
if err := pool.Load(); err != nil {
    panic(err)
}

By default, loading the proxies will take a maximum of about 15 seconds. Most of the time, it is much faster than this. The following methods are then available:

proxy, err := pool.New()   // Fetch a new, unused proxy. Returns an error if there are no unused proxies left.
proxy, err = pool.Random() // Fetch a random proxy, used or unused. It is still marked as used, so pool.New() will not return it afterwards.

pool.SetTimeout(10 * time.Second) // Set the timeout for fetching the proxy list.

pool.SizeAll()    // Get the number of proxies in the pool.
pool.SizeUnused() // Get the number of unused proxies in the pool.

pool.Filter(
    prox.FilterAllowCountries([]string{"GB", "US"}),    // Only allow the specified countries in the pool.
    prox.FilterDisallowCountries([]string{"GB", "US"}), // Allow anything but the specified countries.
    prox.FilterProxyConnection(),                       // Only allow proxies that can be connected to. If they take longer than 10 seconds to connect to, they are PRESUMED TO BE WORKING.
    prox.FilterProxySpeed(5 * time.Second),             // Only allow proxies that can be connected to in the given timeframe. Presumed to not be working if it takes longer than the timeout.
    prox.FilterProxyTypes("HTTP", "SOCKS4", "SOCKS5"),  // Only allow proxies of those types in the pool.
)

Note that a filter only applies to the proxies that are currently loaded. If you call .Load() again, proxies which don't fit the filters given are still allowed into the pool.
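To make the used/unused bookkeeping concrete, here is a minimal sketch of how a pool could track which proxies have been handed out. The toyPool type and its string addresses are hypothetical and only mirror the method names described above; this is not prox's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// toyPool is a hypothetical stand-in for a pool; it is not prox's implementation.
type toyPool struct {
	all    []string        // every loaded proxy address
	unused map[string]bool // addresses that have not been handed out yet
}

func newToyPool(addrs ...string) *toyPool {
	p := &toyPool{unused: make(map[string]bool)}
	for _, a := range addrs {
		p.all = append(p.all, a)
		p.unused[a] = true
	}
	return p
}

// New returns an unused address and marks it as used.
func (p *toyPool) New() (string, error) {
	for _, a := range p.all {
		if p.unused[a] {
			delete(p.unused, a)
			return a, nil
		}
	}
	return "", errors.New("no unused proxies left")
}

// Random returns any address, used or not, and marks it as used.
func (p *toyPool) Random() (string, error) {
	if len(p.all) == 0 {
		return "", errors.New("pool is empty")
	}
	a := p.all[rand.Intn(len(p.all))]
	delete(p.unused, a)
	return a, nil
}

func (p *toyPool) SizeAll() int    { return len(p.all) }
func (p *toyPool) SizeUnused() int { return len(p.unused) }

func main() {
	pool := newToyPool("10.0.0.1:8080", "10.0.0.2:8080")
	first, _ := pool.New()
	fmt.Println(first, pool.SizeAll(), pool.SizeUnused())
}
```

Because Random also marks its result as used, a later call to New can never return the same address, which matches the behaviour described for pool.Random().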

The proxies themselves (the ones returned after a call to .New() or .Random()) have the following methods:

proxy, err := pool.New()
if err != nil {
    panic(err)
}

canConnect := proxy.CheckConnection() // Checks that a proxy can be connected to. Again, it is PRESUMED TO BE WORKING if it cannot connect within 10 seconds. This isn't ideal.
canConnectSpeed := proxy.CheckSpeed(5 * time.Second) // Checks that a proxy can be connected to within the given timeframe.
httpClient, err := proxy.Client() // Gets the proxy as a *http.Client.
proxy.PrettyPrint() // Prints a proxy's info.

Complex Pool

ComplexPools are like SimplePools, but offer more options, such as automatically refreshing the pool when it is empty and using fallback providers when the primary ones do not work.

A ComplexPool can be created with the NewComplexPool function or NewPool for short.

pool := prox.NewPool(
    prox.UseProvider(prox.FreeProxyLists), // Use a provider, or...
    prox.UseProviders(prox.FreeProxyLists, prox.ProxyScrape), // a list of providers

    prox.UseFallbackProviders(prox.Static), // Providers to "fall back" on if the primary providers do not work or return an error.
    prox.OptionFallbackToBackupProviders(true), // Toggle this option. By default it is true.

    prox.OptionFallbackToCached(true), // Keep a backup of the previously loaded proxies. If the providers can't be accessed, use the cached list of proxies instead. Defaults to false.

    prox.OptionReloadWhenEmpty(true), // If there are no proxies left in the pool when .New() or .Random() is called, load the proxies again. Defaults to false.

    prox.OptionAddFilters(
        // Filters are identical to the ones used in SimplePool.
        // These filters are applied every time the pool is loaded, unlike SimplePool.
    ),
)

The following methods are then available on the ComplexPool:

err := pool.Load() // Load the proxies.

proxy, err := pool.New()   // Fetch a new, unused proxy. Returns an error if there are no unused proxies left.
proxy, err = pool.Random() // Fetch a random proxy, used or unused. It is still marked as used, so pool.New() will not return it afterwards.

err = pool.Option(prox.OptionReloadWhenEmpty(true)) // Set another option on the pool.
pool.Filter(prox.FilterProxyTypes("HTTP"))          // Apply one or more filters to the proxies in the pool. These are not permanent.

pool.SetTimeout(10 * time.Second) // Set a timeout for fetching the proxies.

pool.SizeAll()    // Number of proxies in the pool, used or unused.
pool.SizeUnused() // Number of unused proxies.

err = pool.ApplyCache() // Use the previously available cache. Returns an error if no cache is available.

Low Level (Providers & Sets)

A lower-level interface to the proxy providers is also available through the providers/ package. The Pool implementation wraps the providers package to provide the additional functionality.

The providers.Proxy type

The providers.Proxy type is basically identical to the prox.Proxy type. The only reason the prox.Proxy type exists is to provide additional functionality to the lower level implementation (like the .Client() method) and prevent circular imports.

It has the following fields:

type Proxy struct {
    URL      *url.URL `json:"url"`
    Provider string   `json:"providers"`
    Country  string   `json:"country"`

    Used bool
}

The providers.Set type

The providers.Set type is a concurrency-safe set implementation which can store proxies. Proxies can be added from several goroutines at once without causing race conditions.
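As a rough illustration of what "concurrency-safe" means here, the following sketch guards a map with a sync.RWMutex. The safeSet type and its string keys are assumptions made for the example; the real providers.Set stores providers.Proxy values and may be implemented differently.

```go
package main

import (
	"fmt"
	"sync"
)

// safeSet is a hypothetical mutex-guarded set of proxy addresses.
type safeSet struct {
	mu    sync.RWMutex
	items map[string]struct{}
}

func newSafeSet() *safeSet {
	return &safeSet{items: make(map[string]struct{})}
}

// Add inserts an address; adding the same address twice has no effect.
func (s *safeSet) Add(addr string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[addr] = struct{}{}
}

// In reports whether the address is present.
func (s *safeSet) In(addr string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.items[addr]
	return ok
}

// Length returns the number of stored addresses.
func (s *safeSet) Length() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.items)
}

func main() {
	set := newSafeSet()
	var wg sync.WaitGroup
	// 50 goroutines add overlapping addresses at the same time.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			set.Add(fmt.Sprintf("10.0.0.%d:8080", n%10))
		}(i)
	}
	wg.Wait()
	fmt.Println(set.Length()) // prints 10: only distinct addresses are kept
}
```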

set := providers.NewSet()

// p is a providers.Proxy
set.Add(p)    // Add a proxy to the set
set.All()     // Get all the proxies in the set as a slice
set.In(p)     // Check if a proxy is in the set
set.Length()  // Get the length of the set
set.Remove(p) // Remove a proxy from the set

The providers.Provider type

This is the definition of a provider. It has the following type signature:

type Provider func(proxies *providers.Set, timeout time.Duration) ([]providers.Proxy, error)

Simply put, it is a function which takes a set and a timeout and returns a list of proxies and an error if one occurs.

For example, the implementation of FreeProxyLists is:

func FreeProxyLists(proxies *providers.Set, timeout time.Duration) ([]providers.Proxy, error) {
    // do stuff like proxies.Add()
    return proxies.All(), nil
}

The reason it is implemented this way is so that the list of proxies can be accessed as the function runs.

set := providers.NewSet()
go FreeProxyLists(set, 10 * time.Second)

for {
    println(set.Length())
    time.Sleep(1 * time.Second)
}

// Output:
// 0
// 0
// 0
// 231
// 873
// 2563
// ...
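The same pattern can be tried without network access. The sketch below uses a hypothetical addrSet and a fakeProvider that only pretends to scrape; both names are made up for illustration. The caller watches the set grow while the provider is still running and stops polling once the provider returns.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// addrSet is a hypothetical, minimal concurrency-safe set.
type addrSet struct {
	mu    sync.Mutex
	items map[string]bool
}

func (s *addrSet) Add(a string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[a] = true
}

func (s *addrSet) Length() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.items)
}

func (s *addrSet) All() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]string, 0, len(s.items))
	for a := range s.items {
		out = append(out, a)
	}
	return out
}

// fakeProvider mimics the provider shape: it writes into the shared set as it
// goes, stops once the timeout has passed, and returns everything in the set.
func fakeProvider(set *addrSet, timeout time.Duration) ([]string, error) {
	deadline := time.Now().Add(timeout)
	for i := 0; i < 5 && time.Now().Before(deadline); i++ {
		set.Add(fmt.Sprintf("10.0.0.%d:8080", i))
		time.Sleep(10 * time.Millisecond) // pretend to scrape a page
	}
	return set.All(), nil
}

func main() {
	set := &addrSet{items: make(map[string]bool)}
	done := make(chan struct{})
	go func() {
		defer close(done)
		fakeProvider(set, time.Second)
	}()

	for {
		select {
		case <-done:
			fmt.Println("final:", set.Length())
			return
		case <-time.After(15 * time.Millisecond):
			fmt.Println("so far:", set.Length())
		}
	}
}
```

Unlike the endless loop above, this version closes a channel when the provider finishes so the polling loop can exit.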

Bugs

  • HTTPS proxies

    Creating HTTPS clients doesn't work. Requests always fail with either

    • net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), or
    • proxyconnect tcp: tls: first record does not look like a TLS handshake

    The consequence of this is that all HTTPS proxies are marked as inaccessible when filtering. To remove all HTTPS proxies so that you don't run into this problem, use:

    pool.Filter(prox.FilterProxyTypes("HTTP", "SOCKS4", "SOCKS5"))
    

This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com.

Documentation

Variables

var FreeProxyLists = Provider{"FreeProxyLists", providers.FreeProxyLists}

FreeProxyLists defines the 'FreeProxyLists' provider.

var GetProxyList = Provider{"GetProxyList", providers.GetProxyList}

GetProxyList defines the 'GetProxyList' provider.

var Providers = map[string]Provider{
	"FreeProxyLists": FreeProxyLists,
	"ProxyScrape":    ProxyScrape,
	"GetProxyList":   GetProxyList,
	"Static":         Static,
}

Providers is a global variable which allows translation between the names of providers and the provider functions themselves.

var ProxyScrape = Provider{"ProxyScrape", providers.ProxyScrape}

ProxyScrape defines the 'ProxyScrape' provider.

var Static = Provider{"Static", providers.Static}

Static defines the 'Static' provider.

Functions

func ApplyFilters

func ApplyFilters(proxies []providers.Proxy, filters []Filter) []providers.Proxy

ApplyFilters will apply filters to a list of proxies, and will return a new proxy list.

func FreezeProvider

func FreezeProvider(providerName string, timeout time.Duration) providers.Provider

FreezeProvider will gather proxies from the provider given one last time and then use those instead of new ones.
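One way to picture "freezing" is a closure that calls the underlying provider once and then keeps returning that snapshot. The sketch below is an assumption about the idea only: the provider type here is simplified to a function with no arguments, and freeze is a hypothetical name, not the library's FreezeProvider.

```go
package main

import "fmt"

// provider is a hypothetical, simplified provider: it returns addresses or an error.
type provider func() ([]string, error)

// freeze calls p once and returns a provider that keeps serving that snapshot.
func freeze(p provider) provider {
	snapshot, err := p()
	return func() ([]string, error) {
		if err != nil {
			return nil, err
		}
		// Return a copy so callers cannot modify the snapshot.
		return append([]string(nil), snapshot...), nil
	}
}

func main() {
	calls := 0
	live := func() ([]string, error) {
		calls++
		return []string{fmt.Sprintf("10.0.0.%d:8080", calls)}, nil
	}

	frozen := freeze(live)
	a, _ := frozen()
	b, _ := frozen()
	fmt.Println(a, b, calls) // [10.0.0.1:8080] [10.0.0.1:8080] 1
}
```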

func InitLog

func InitLog(l *logrus.Logger)

InitLog initialises the logger with options specified.

Types

type ComplexPool

type ComplexPool struct {
	Config struct {
		FallbackToBackupProviders bool
		FallbackToCached          bool

		ReloadWhenEmpty bool
	}

	All    *providers.Set
	Unused *providers.Set

	CacheAvailable bool
	CacheAll       *providers.Set
	CacheUnused    *providers.Set
	// contains filtered or unexported fields
}

ComplexPool is an implementation of a pool with lots of extra settings, including filtering and ensuring the pool always has proxies to provide.

func NewComplexPool

func NewComplexPool(opts ...Option) *ComplexPool

NewComplexPool creates a new complex pool from the options given and using defaults if options aren't provided.

func NewPool

func NewPool(opts ...Option) *ComplexPool

NewPool creates a new complex pool from the options given and using defaults if options aren't provided. It's an alias for NewComplexPool.

func (*ComplexPool) ApplyCache

func (pool *ComplexPool) ApplyCache() error

ApplyCache will revert the pool to the previous cache.

func (*ComplexPool) Fetch

func (pool *ComplexPool) Fetch() error

Fetch fetches the proxies from its internal providers and stores them.

func (*ComplexPool) FetchFallback

func (pool *ComplexPool) FetchFallback() error

FetchFallback fetches the proxies from its fallback providers and stores them.

func (*ComplexPool) Filter

func (pool *ComplexPool) Filter(filters ...Filter)

Filter applies the filter to the proxies inside the pool.

func (*ComplexPool) Load

func (pool *ComplexPool) Load() error

Load will fetch the proxies like a call to Fetch(), but, depending on options, it will fall back to a proxy cache or use the fallback providers.
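The fallback behaviour can be sketched as a chain of attempts. The order used below (primary, then fallback providers, then cache) is an assumption made for the example, since the documentation does not state which fallback is tried first. The loader type and helper functions are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type fetchFunc func() ([]string, error)

// loader is a hypothetical model of a pool's loading configuration.
type loader struct {
	primary, fallback fetchFunc
	useFallback       bool
	useCache          bool
	cache             []string
}

// load tries the primary source, then (in this sketch's assumed order) the
// fallback source, then the cache.
func (l *loader) load() ([]string, error) {
	proxies, err := l.primary()
	if err == nil {
		l.cache = proxies // remember the last good result
		return proxies, nil
	}
	if l.useFallback && l.fallback != nil {
		if proxies, ferr := l.fallback(); ferr == nil {
			return proxies, nil
		}
	}
	if l.useCache && len(l.cache) > 0 {
		return l.cache, nil
	}
	return nil, fmt.Errorf("load failed: %w", err)
}

// unreachable simulates a provider that cannot be contacted.
func unreachable() ([]string, error) {
	return nil, errors.New("provider unreachable")
}

// staticList simulates a provider backed by a bundled list.
func staticList() ([]string, error) {
	return []string{"10.0.0.9:3128"}, nil
}

func main() {
	l := &loader{primary: unreachable, fallback: staticList, useFallback: true}
	fmt.Println(l.load())
}
```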

func (*ComplexPool) New

func (pool *ComplexPool) New() (Proxy, error)

New fetches a new, unused proxy. Depending on options, it will attempt to reload the proxy pool if there are no proxies left inside the pool.

func (*ComplexPool) NewFromCountries

func (pool *ComplexPool) NewFromCountries(countries []string) (Proxy, error)

NewFromCountries gets a new, unused proxy whose location is one of the countries specified. Depending on options, it will attempt to reload the proxy pool if there are no proxies left inside the pool.

func (*ComplexPool) Option

func (pool *ComplexPool) Option(opts ...Option) (err error)

Option sets the pool options specified.

func (*ComplexPool) Random

func (pool *ComplexPool) Random() (Proxy, error)

Random fetches a random proxy. It doesn't care if the proxy has been used already. It still marks a proxy as used.

func (*ComplexPool) SetTimeout

func (pool *ComplexPool) SetTimeout(timeout time.Duration)

SetTimeout sets a timeout for the provider. By default, it is set to 15s by NewSimplePool.

func (*ComplexPool) SizeAll

func (pool *ComplexPool) SizeAll() int

SizeAll returns the number of proxies that are currently loaded, used or unused.

func (*ComplexPool) SizeUnused

func (pool *ComplexPool) SizeUnused() int

SizeUnused returns the number of proxies that are currently unused.

type Filter

type Filter func(p *Proxy) bool

Filter is a function that will either allow or not allow a proxy. A filter returns true if a proxy "succeeds", and false if it is not allowed.
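A minimal sketch of how such predicate filters compose is shown below. The proxyInfo type and the lowercase helper names are hypothetical, and the sketch assumes a proxy is kept only when every filter accepts it.

```go
package main

import "fmt"

// proxyInfo is a hypothetical, trimmed-down proxy record.
type proxyInfo struct {
	Addr    string
	Country string
}

// filter reports whether a proxy is allowed to stay.
type filter func(p *proxyInfo) bool

// allowCountries builds a filter that accepts only the listed country codes.
func allowCountries(codes ...string) filter {
	allowed := make(map[string]bool, len(codes))
	for _, c := range codes {
		allowed[c] = true
	}
	return func(p *proxyInfo) bool { return allowed[p.Country] }
}

// applyFilters keeps the proxies that every filter accepts.
func applyFilters(proxies []proxyInfo, filters ...filter) []proxyInfo {
	var kept []proxyInfo
	for i := range proxies {
		ok := true
		for _, f := range filters {
			if !f(&proxies[i]) {
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, proxies[i])
		}
	}
	return kept
}

func main() {
	proxies := []proxyInfo{
		{"10.0.0.1:8080", "GB"},
		{"10.0.0.2:8080", "DE"},
		{"10.0.0.3:8080", "US"},
	}
	for _, p := range applyFilters(proxies, allowCountries("GB", "US")) {
		fmt.Println(p.Addr, p.Country)
	}
}
```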

func FilterAllowCountries

func FilterAllowCountries(countries []string) Filter

FilterAllowCountries creates a filter that only allows the countries specified. The countries provided must be in the ISO 3166-1 alpha-2 format (GB, US, etc.).

func FilterDisallowCountries

func FilterDisallowCountries(countries []string) Filter

FilterDisallowCountries creates a filter that rejects proxies located in any of the countries in the list.

func FilterProxyConnection

func FilterProxyConnection() Filter

FilterProxyConnection creates a filter that rejects a proxy only if it is known not to work. A timeout of 10 seconds is applied, but a proxy that times out is still treated as working.

func FilterProxySpeed

func FilterProxySpeed(speed time.Duration) Filter

FilterProxySpeed creates a filter that only allows proxies if they can make a successful request in a given timeframe.

func FilterProxyTypes

func FilterProxyTypes(ptypes ...string) Filter

FilterProxyTypes creates a filter that only allows specific types of proxies, such as HTTP or SOCKS5.

type Option

type Option func(*ComplexPool) error

Option is an option that can be provided to configure a complex pool. See https://commandcenter.blogspot.com/2014/01/self-referential-functions-and-design.html for more info
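For readers unfamiliar with the pattern, the sketch below shows functional options on a made-up miniPool type. The option names, the validation rule in withTimeout, and the constructor returning an error are illustrative assumptions, not prox's API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// miniPool and its options are hypothetical names for this sketch.
type miniPool struct {
	timeout         time.Duration
	reloadWhenEmpty bool
}

// option mutates a pool and may reject invalid settings.
type option func(*miniPool) error

func withTimeout(d time.Duration) option {
	return func(p *miniPool) error {
		if d <= 0 {
			return errors.New("timeout must be positive")
		}
		p.timeout = d
		return nil
	}
}

func withReloadWhenEmpty(on bool) option {
	return func(p *miniPool) error {
		p.reloadWhenEmpty = on
		return nil
	}
}

// newMiniPool starts from defaults and applies each option in order.
func newMiniPool(opts ...option) (*miniPool, error) {
	p := &miniPool{timeout: 15 * time.Second}
	for _, opt := range opts {
		if err := opt(p); err != nil {
			return nil, err
		}
	}
	return p, nil
}

func main() {
	p, err := newMiniPool(withTimeout(5*time.Second), withReloadWhenEmpty(true))
	fmt.Println(p.timeout, p.reloadWhenEmpty, err) // 5s true <nil>
}
```

The appeal of this design is that new settings can be added later without changing the constructor's signature, and callers only mention the settings they care about.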

func OptionAddFilter

func OptionAddFilter(filter Filter) Option

OptionAddFilter adds a single filter to the pool.

func OptionAddFilters

func OptionAddFilters(filters ...Filter) Option

OptionAddFilters adds a list of filters to the pool

func OptionFallbackToBackupProviders

func OptionFallbackToBackupProviders(setting bool) Option

OptionFallbackToBackupProviders sets the option to use the fallback providers if there is an error during loading.

func OptionFallbackToCached

func OptionFallbackToCached(setting bool) Option

OptionFallbackToCached sets the option to use cached proxies when there is an error during loading.

func OptionReloadWhenEmpty

func OptionReloadWhenEmpty(setting bool) Option

OptionReloadWhenEmpty sets the option to attempt to load new proxies into the pool if there are no proxies left in the pool on a call to .Random() or .New()

func UseFallbackProvider

func UseFallbackProvider(provider Provider) Option

UseFallbackProvider adds a provider that will only be used if the other providers fail. If the provider name is invalid, it will panic.

func UseFallbackProviders

func UseFallbackProviders(givenProviders ...Provider) Option

UseFallbackProviders adds providers that will only be used if the other providers do not work. If any of the provider names are invalid, it will panic.

func UseProvider

func UseProvider(provider Provider) Option

UseProvider will add a provider to pool. If the provider name is invalid, it will panic.

func UseProviders

func UseProviders(givenProviders ...Provider) Option

UseProviders adds providers to the pool. If any of the provider names is invalid, it will panic.

type Pool

type Pool interface {
	Load() error
	Filter() error

	SizeAll() int
	SizeUnused() int

	New() (Proxy, error)
	Random() (Proxy, error)
}

Pool represents a collection/store of proxies.

type Provider

type Provider struct {
	Name             string
	InternalProvider providers.Provider
}

Provider is a wrapper around the providers.Provider type, giving information about the provider along with the actual provider function itself.

func GetProvider

func GetProvider(providerName string) Provider

GetProvider gets the provider by name.

func GetProviders

func GetProviders(providerNames ...string) []Provider

GetProviders gets multiple providers by name.

func MultiProvider

func MultiProvider(givenProviders ...Provider) Provider

MultiProvider creates a new hybrid-provider from a set of existing ones. It will fetch all the proxies from all providers asynchronously.
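Fetching from several providers concurrently and merging the results might look roughly like the following. The source type and merge function are hypothetical, and the choice to silently skip failing sources is an assumption of this sketch rather than documented behaviour.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// source is a hypothetical, simplified provider function.
type source func() ([]string, error)

// merge runs every source in its own goroutine and combines the results,
// dropping duplicates. Sources that fail are skipped in this sketch.
func merge(sources ...source) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[string]bool)
	)
	for _, src := range sources {
		wg.Add(1)
		go func(s source) {
			defer wg.Done()
			addrs, err := s()
			if err != nil {
				return
			}
			mu.Lock()
			defer mu.Unlock()
			for _, a := range addrs {
				seen[a] = true
			}
		}(src)
	}
	wg.Wait()

	out := make([]string, 0, len(seen))
	for a := range seen {
		out = append(out, a)
	}
	sort.Strings(out) // deterministic order for printing
	return out
}

func main() {
	a := func() ([]string, error) { return []string{"10.0.0.1:8080", "10.0.0.2:8080"}, nil }
	b := func() ([]string, error) { return []string{"10.0.0.2:8080", "10.0.0.3:1080"}, nil }
	fmt.Println(merge(a, b)) // [10.0.0.1:8080 10.0.0.2:8080 10.0.0.3:1080]
}
```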

type Proxy

type Proxy struct {
	URL      *url.URL
	Provider string
	Country  string
	// contains filtered or unexported fields
}

Proxy holds information about a proxy. It is like providers.Proxy, but it contains more methods.

func CastProxy

func CastProxy(p providers.Proxy) *Proxy

CastProxy will convert a providers.Proxy type into prox.Proxy type.

func NewProxy

func NewProxy(rawip string, provider string, country string) (Proxy, error)

NewProxy will create a new proxy type.

func (*Proxy) AsHTTPClient

func (p *Proxy) AsHTTPClient() (*http.Client, error)

AsHTTPClient will return the proxy as a http.Client struct. It panics if the proxy's type is not HTTP.

func (*Proxy) AsHTTPSClient

func (p *Proxy) AsHTTPSClient() (*http.Client, error)

AsHTTPSClient will return the proxy as a http.Client struct. It panics if the proxy's type is not HTTPS.

func (*Proxy) AsSOCKS4Client

func (p *Proxy) AsSOCKS4Client() (*http.Client, error)

AsSOCKS4Client will return the proxy as a http.Client struct. It panics if the proxy's type is not SOCKS4.

func (*Proxy) AsSOCKS5Client

func (p *Proxy) AsSOCKS5Client() (*http.Client, error)

AsSOCKS5Client will return the proxy as a http.Client struct. It panics if the proxy's type is not SOCKS5.

func (*Proxy) CheckConnection

func (p *Proxy) CheckConnection() bool

CheckConnection checks that a connection to a proxy can be formed. It still marks a proxy as successful if the check times out. If you want to reject proxies that time out, use CheckSpeed(10 * time.Second), which is otherwise equivalent.

func (*Proxy) CheckSpeed

func (p *Proxy) CheckSpeed(timeout time.Duration) bool

CheckSpeed checks that a connection to a proxy can be formed. It accepts a timeout, and will mark a proxy as unavailable if it doesn't respond within that time.

func (*Proxy) Client

func (p *Proxy) Client() (*http.Client, error)

Client gets the http.Client associated with the given proxy.
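For comparison, this is how an *http.Client that routes through a proxy can be built with the standard library alone. It is a sketch of the general technique, not the library's Client implementation; clientVia and its timeout parameter are names chosen for the example. The standard http.Transport handles http, https and socks5 proxy URLs, so SOCKS4 would need a separate dialer.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// clientVia returns an *http.Client that sends requests through the proxy at
// rawURL. The function name and the timeout parameter are illustrative.
func clientVia(rawURL string, timeout time.Duration) (*http.Client, error) {
	proxyURL, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("parse proxy URL: %w", err)
	}
	transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	return &http.Client{Transport: transport, Timeout: timeout}, nil
}

func main() {
	client, err := clientVia("http://10.0.0.1:8080", 10*time.Second)
	if err != nil {
		panic(err)
	}
	// No request is sent here; client.Get(...) would go through the proxy.
	fmt.Println(client.Timeout)
}
```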

func (*Proxy) PrettyPrint

func (p *Proxy) PrettyPrint()

PrettyPrint prints some information in a nicely-formatted way.

type SimplePool

type SimplePool struct {
	All    *providers.Set
	Unused *providers.Set
	// contains filtered or unexported fields
}

SimplePool is an implementation of a pool without much added functionality. It is a simple wrapper for a provider.

func NewSimplePool

func NewSimplePool(givenProviders ...Provider) *SimplePool

NewSimplePool returns a new SimplePool struct.

func (*SimplePool) Filter

func (pool *SimplePool) Filter(filters ...Filter)

Filter applies the filter to the proxies inside the pool.

func (*SimplePool) Load

func (pool *SimplePool) Load() error

Load fetches the proxies from its internal provider and stores them.

func (*SimplePool) New

func (pool *SimplePool) New() (Proxy, error)

New fetches a new, unused proxy. It returns an error if there are no unused proxies left.

func (*SimplePool) Random

func (pool *SimplePool) Random() (Proxy, error)

Random fetches a random proxy. It doesn't care if the proxy has been used already. It still marks a proxy as used.

func (*SimplePool) SetTimeout

func (pool *SimplePool) SetTimeout(timeout time.Duration)

SetTimeout sets a timeout for the provider. By default, it is set to 15s by NewSimplePool.

func (*SimplePool) SizeAll

func (pool *SimplePool) SizeAll() int

SizeAll returns the number of proxies that are currently loaded, used or unused.

func (*SimplePool) SizeUnused

func (pool *SimplePool) SizeUnused() int

SizeUnused returns the number of proxies that are currently unused.

Directories

Path Synopsis
cmd
