spider

Version: v0.0.0-...-4127dee

Note: This package is not in the latest version of its module.
Published: Aug 10, 2016 License: MIT Imports: 16 Imported by: 0

README

DHT Network Crawler


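A high-performance DHT network crawler. On a $5 single-core VPS with 768 MB of RAM, it handles more than 12K UDP requests per second, uses no more than 100 MB of memory, and collects tens of millions of deduplicated infohashes per day.

Key Features

  • Memory reuse
  • Map-based deduplication
  • Evenly distributed IDs
  • Dynamically adjusted find_node rate
  • Rate limiting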
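The README does not show how map-based deduplication and memory reuse are implemented. A minimal sketch, assuming a set of infohashes that is emptied in place once it reaches a fixed capacity; the `dedup` type and its `maxSeen` field are hypothetical (the `HasFoundSize` constant may serve a similar purpose, but that is an assumption):

```go
package main

import "fmt"

// dedup is a hypothetical bounded set of infohashes.
// It is emptied in place once it reaches maxSeen entries.
type dedup struct {
	seen    map[string]struct{}
	maxSeen int
}

func newDedup(maxSeen int) *dedup {
	return &dedup{seen: make(map[string]struct{}, maxSeen), maxSeen: maxSeen}
}

// Add reports whether infohash has not been seen since the last reset.
func (d *dedup) Add(infohash string) bool {
	if _, ok := d.seen[infohash]; ok {
		return false
	}
	if len(d.seen) >= d.maxSeen {
		// Clear in place: Go maps never shrink, so deleting every key
		// keeps the allocated buckets and reuses that memory.
		for k := range d.seen {
			delete(d.seen, k)
		}
	}
	d.seen[infohash] = struct{}{}
	return true
}

func main() {
	d := newDedup(2)
	fmt.Println(d.Add("a"), d.Add("a"), d.Add("b"), d.Add("c"), d.Add("a"))
}
```

Clearing the set trades occasionally re-reporting an infohash for a hard ceiling on memory.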

Examples

See example.

Installation

go get github.com/btlike/spider

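Related Projects

Traffic Graph

A traffic graph is attached.

FAQ

Once the crawler was finally running, various Linux problems appeared within a few minutes. The first was the ulimit limit, which is easy to fix; see this article. Next, the kernel began logging large numbers of "nf_conntrack: table full, dropping packet" errors; see this article for that problem as well. The cause is as follows:

nf_conntrack/ip_conntrack is the kernel's connection-tracking module, used by NAT. It records tracked connections (UDP flows included) in a hash table. nf_conntrack was introduced in kernel 2.6.15, and ip_conntrack was removed in 2.6.22. When the hash table is full, the kernel drops packets and logs: nf_conntrack: table full, dropping packet.

The fix is simple: exclude the crawler's ports from connection tracking. For example, if we run 100 nodes listening on ports 20000 through 20099, the following commands are all we need: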

iptables -A INPUT -m state --state UNTRACKED -j ACCEPT
iptables -t raw -A PREROUTING -p udp -m udp --dport 20000 -j NOTRACK
# ... repeat for every port from 20000 through 20099, one rule per line
iptables -t raw -A PREROUTING -p udp -m udp --dport 20099 -j NOTRACK
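Writing one rule per port by hand is tedious. As a sketch, this short Go program generates the same commands for any port range so they can be piped into a shell; the `notrackRules` helper is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// notrackRules builds the FAQ's iptables commands for every UDP port in
// [first, last]: accept untracked packets, then exempt each listening
// port from connection tracking in the raw table.
func notrackRules(first, last int) []string {
	rules := []string{"iptables -A INPUT -m state --state UNTRACKED -j ACCEPT"}
	for port := first; port <= last; port++ {
		rules = append(rules, fmt.Sprintf(
			"iptables -t raw -A PREROUTING -p udp -m udp --dport %d -j NOTRACK", port))
	}
	return rules
}

func main() {
	// 100 nodes listening on ports 20000-20099, as in the example.
	fmt.Println(strings.Join(notrackRules(20000, 20099), "\n"))
}
```

The iptables udp match also accepts a port range in `--dport first:last` form, so a single rule with `--dport 20000:20099` should have the same effect.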

Documentation

Index

Constants

const (
	TableSize    = 1024
	HasFoundSize = 100000
)

Package constants.

Variables

var BOOTSTRAP = []string{
	"router.bittorrent.com:6881",
	"router.utorrent.com:6881",
	"dht.transmissionbt.com:6881",
}

BOOTSTRAP lists the well-known DHT bootstrap routers used to join the network.

var RateLimit int64 = 100

RateLimit sets the speed limit.

Functions

func ConvertByteStream

func ConvertByteStream(nodes []*KNode) []byte

ConvertByteStream converts nodes to a byte stream.

func Monitor

func Monitor()

Monitor the network

func Neightor

func Neightor(id, tableID string) string

Neightor returns a neighbor ID.

Types

type AnnounceData

type AnnounceData struct {
	Infohash       string
	IP             net.IP
	Port           int
	ImpliedPort    int
	IsAnnouncePeer bool
}

AnnounceData defines the data to be stored.

type DhtNode

type DhtNode struct {
	// contains filtered or unexported fields
}

DhtNode defines a DHT node.

func NewDhtNode

func NewDhtNode(id *ID, outHashIDChan chan AnnounceData, address string) *DhtNode

NewDhtNode creates a node.

func (*DhtNode) FindNode

func (dhtNode *DhtNode) FindNode(v map[string]interface{}, args map[string]string, node *KNode)

FindNode find node

func (*DhtNode) NodeFinder

func (dhtNode *DhtNode) NodeFinder()

NodeFinder node finder

func (*DhtNode) Run

func (dht *DhtNode) Run()

Run runs the spider.

type ID

type ID []byte

ID defines a node ID.

func GenerateID

func GenerateID() ID

GenerateID generates an ID.

func GenerateIDList

func GenerateIDList(count int64) (ids []ID)

GenerateIDList generates count IDs spread evenly across the ID space.
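One way to spread IDs evenly is to divide the 160-bit ID space into `count` equal slices and take the start of each, so the i-th ID is i * floor(2^160 / count). A sketch using `math/big`, assuming 20-byte IDs; `evenIDs` is hypothetical and not necessarily how GenerateIDList works:

```go
package main

import (
	"fmt"
	"math/big"
)

// evenIDs returns count 20-byte IDs; the i-th is i * floor(2^160 / count).
// count must be positive.
func evenIDs(count int64) [][]byte {
	space := new(big.Int).Lsh(big.NewInt(1), 160) // 2^160
	step := new(big.Int).Div(space, big.NewInt(count))
	ids := make([][]byte, 0, count)
	for i := int64(0); i < count; i++ {
		v := new(big.Int).Mul(step, big.NewInt(i))
		id := make([]byte, 20)
		v.FillBytes(id) // big-endian, left-padded with zeros
		ids = append(ids, id)
	}
	return ids
}

func main() {
	for _, id := range evenIDs(4) {
		fmt.Printf("%x...\n", id[:2])
	}
}
```

Each node then starts in a different region of the keyspace, which is where the XOR metric routes queries for nearby infohashes.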

func (ID) Int

func (id ID) Int() *big.Int

Int returns the ID as a *big.Int.

func (ID) Neighbor

func (id ID) Neighbor(tableID ID) ID

Neighbor returns a neighbor ID.

func (ID) String

func (id ID) String() string

type KNode

type KNode struct {
	ID   ID
	IP   net.IP
	Port int
}

KNode defines a node's ID, IP address and port.

func ParseBytesStream

func ParseBytesStream(data []byte) []*KNode

ParseBytesStream parses a byte stream into nodes.
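ParseBytesStream is the inverse of ConvertByteStream. A minimal sketch, assuming BEP 5 compact node info (26-byte records: 20-byte ID, 4-byte IPv4 address, 2-byte big-endian port); `parseCompactNodes` and the local `KNode` type are illustrative:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// KNode mirrors the package's exported KNode fields, with ID simplified to []byte.
type KNode struct {
	ID   []byte
	IP   net.IP
	Port int
}

// parseCompactNodes splits data into 26-byte records.
// A trailing partial record is ignored.
func parseCompactNodes(data []byte) []*KNode {
	const recLen = 26
	nodes := make([]*KNode, 0, len(data)/recLen)
	for off := 0; off+recLen <= len(data); off += recLen {
		rec := data[off : off+recLen]
		nodes = append(nodes, &KNode{
			ID:   append([]byte(nil), rec[:20]...), // copy so nodes don't alias data
			IP:   net.IPv4(rec[20], rec[21], rec[22], rec[23]),
			Port: int(binary.BigEndian.Uint16(rec[24:26])),
		})
	}
	return nodes
}

func main() {
	rec := make([]byte, 26)
	copy(rec[20:], []byte{10, 0, 0, 1, 0x1a, 0xe1})
	for _, n := range parseCompactNodes(rec) {
		fmt.Println(n.IP, n.Port)
	}
}
```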

type KRPC

type KRPC struct {
	Dht   *DhtNode
	Types map[string]action
	// contains filtered or unexported fields
}

KRPC implements the KRPC message protocol.

func NewKRPC

func NewKRPC(dhtNode *DhtNode) *KRPC

NewKRPC creates a KRPC.

func (*KRPC) Decode

func (krpc *KRPC) Decode(data []byte, val map[string]interface{}, raddr *net.UDPAddr) error

Decode decodes a message.

func (*KRPC) EncodingNodeResult

func (krpc *KRPC) EncodingNodeResult(tid string, token string, nodes []byte) ([]byte, error)

EncodingNodeResult encodes a response that carries nodes.

func (*KRPC) EncodingNormalResult

func (krpc *KRPC) EncodingNormalResult(tid string, id string) ([]byte, error)

EncodingNormalResult encodes a plain response, such as the reply to ping.
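KRPC messages are bencoded dictionaries (BEP 5). A sketch of what an encoder like EncodingNormalResult has to produce, using a minimal hand-written bencoder; `bencode` and `encodeNormalResult` are hypothetical names, and the package may use a bencode library instead:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// bencode writes strings, ints, lists and string-keyed dicts.
// Dict keys are emitted in sorted order, as bencoding requires.
func bencode(sb *strings.Builder, v interface{}) error {
	switch x := v.(type) {
	case string:
		sb.WriteString(strconv.Itoa(len(x)) + ":" + x)
	case int:
		sb.WriteString("i" + strconv.Itoa(x) + "e")
	case []interface{}:
		sb.WriteByte('l')
		for _, e := range x {
			if err := bencode(sb, e); err != nil {
				return err
			}
		}
		sb.WriteByte('e')
	case map[string]interface{}:
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		sb.WriteByte('d')
		for _, k := range keys {
			_ = bencode(sb, k) // string keys cannot fail
			if err := bencode(sb, x[k]); err != nil {
				return err
			}
		}
		sb.WriteByte('e')
	default:
		return fmt.Errorf("bencode: unsupported type %T", v)
	}
	return nil
}

// encodeNormalResult builds a response ("y": "r") carrying only the
// responder's node ID, e.g. the reply to ping.
func encodeNormalResult(tid, id string) ([]byte, error) {
	msg := map[string]interface{}{
		"t": tid,
		"y": "r",
		"r": map[string]interface{}{"id": id},
	}
	var sb strings.Builder
	if err := bencode(&sb, msg); err != nil {
		return nil, err
	}
	return []byte(sb.String()), nil
}

func main() {
	b, _ := encodeNormalResult("aa", "mnopqrstuvwxyz123456")
	fmt.Println(string(b))
}
```

For transaction ID `aa` and the example node ID from BEP 5, this yields `d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re`, the ping response given in that specification.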

func (*KRPC) GenTID

func (krpc *KRPC) GenTID() uint32

GenTID generates a transaction ID.
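GenTID returns a uint32, but how that value is generated and placed in the KRPC `t` field is not documented. A sketch assuming an atomic counter and a 4-byte big-endian encoding (BEP 5 only asks for a short binary string that the responder echoes back); `tidGen` and `tidString` are hypothetical:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync/atomic"
)

// tidGen hands out transaction IDs from an atomic counter, so concurrent
// senders get distinct values until the counter wraps around.
type tidGen struct{ n uint32 }

// Next returns the next numeric transaction ID.
func (g *tidGen) Next() uint32 { return atomic.AddUint32(&g.n, 1) }

// tidString encodes a numeric ID as the 4-byte string for the "t" field.
func tidString(tid uint32) string {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], tid)
	return string(b[:])
}

func main() {
	var g tidGen
	fmt.Printf("%q %q\n", tidString(g.Next()), tidString(g.Next()))
}
```

When a response arrives, its `t` value can be matched against outstanding queries to recover what was asked.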

func (*KRPC) Query

func (krpc *KRPC) Query(msg *KRPCMessage)

Query handles a query message.

func (*KRPC) Response

func (krpc *KRPC) Response(msg *KRPCMessage)

Response handles a response message.

type KRPCMessage

type KRPCMessage struct {
	T      string
	Y      string
	Addion interface{}
	Addr   *net.UDPAddr
}

KRPCMessage defines a KRPC message.

type KTable

type KTable struct {
	Nodes []*KNode

	// used to respond to find_node requests
	Snodes []*KNode
	// contains filtered or unexported fields
}

KTable defines the node table.

func (*KTable) Pop

func (table *KTable) Pop() *KNode

Pop removes and returns a node from the table.

func (*KTable) Put

func (table *KTable) Put(node *KNode)

Put adds a node to the table.
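KTable's Put and Pop signatures suggest a node buffer. A sketch of one plausible design: a mutex-guarded FIFO bounded by a capacity such as TableSize, where the oldest node is evicted when the table is full. The eviction policy and the `kTable` type are assumptions, not the package's code:

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// KNode mirrors the package's exported KNode fields, with ID simplified to []byte.
type KNode struct {
	ID   []byte
	IP   net.IP
	Port int
}

// kTable is a hypothetical bounded FIFO of nodes, safe for concurrent use.
type kTable struct {
	mu    sync.Mutex
	nodes []*KNode
	size  int // capacity, e.g. TableSize
}

// Put appends n, evicting the oldest node when the table is full.
func (t *kTable) Put(n *KNode) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.nodes) > 0 && len(t.nodes) >= t.size {
		t.nodes[0] = nil // drop the reference so it can be collected
		t.nodes = t.nodes[1:]
	}
	t.nodes = append(t.nodes, n)
}

// Pop removes and returns the oldest node, or nil if the table is empty.
func (t *kTable) Pop() *KNode {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.nodes) == 0 {
		return nil
	}
	n := t.nodes[0]
	t.nodes[0] = nil
	t.nodes = t.nodes[1:]
	return n
}

func main() {
	t := &kTable{size: 2}
	for port := 1; port <= 3; port++ {
		t.Put(&KNode{Port: port})
	}
	for n := t.Pop(); n != nil; n = t.Pop() {
		fmt.Println(n.Port)
	}
}
```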

type Network

type Network struct {
	Dht       *DhtNode
	Conn      *net.UDPConn
	RateLimit *ratelimit.Bucket
}

Network defines the UDP network layer.

func NewNetwork

func NewNetwork(dhtNode *DhtNode, address string) *Network

NewNetwork creates a network.

func (*Network) Init

func (nw *Network) Init(address string)

Init initializes the network on the given address.

func (*Network) Listening

func (nw *Network) Listening()

Listening listens for incoming packets.

func (*Network) Send

func (nw *Network) Send(m []byte, addr *net.UDPAddr) error

Send sends m to addr.

type Query

type Query struct {
	Y string
	A map[string]interface{}
}

Query defines a KRPC query.

type Response

type Response struct {
	R map[string]interface{}
}

Response defines a KRPC response.
