import "github.com/xladykiller/gcse"
Package gcse is the core supporting library for go-code-search-engine (GCSE). Its exported types and functions are intended mainly for GCSE's sub-packages. If you want some of the functionality, copy the code out.
Sub-projects:

    crawler   crawls packages
    indexer   creates index data for the web server
    server    provides web services, including the home/top/search services
Data flows between the sub-projects:

    project   Read            Write
    -------   ----            -----
    crawler   fnCrawlerDB     fnCrawlerDB
              fnDocDB         fnDocDB
                              DBOutSegments
    indexer   DBOutSegments   IndexSegments
    server    IndexSegments
Package files:

    config.go crawler.go crawlerdb.go data.go db.go index.go ranking.go segment.go text.go utils.go
    const (
        KindIndex = "index"
        IndexFn   = KindIndex + ".gob"
        KindDocDB = "docdb"

        FnCrawlerDB = "crawler"
        KindPackage = "package"
        KindPerson  = "person"
        KindToCheck = "tocheck"
        FnToCrawl   = "tocrawl"
        FnPackage   = "package"
        FnPerson    = "person"
        // key: RawString, value: DocInfo
        FnDocs    = "docs"
        FnNewDocs = "newdocs"
    )
    const (
        IndexTextField = "text"
        IndexNameField = "name"
        IndexPkgField  = "pkg"
    )

    const (
        DOCS_PARTS = 128
    )
    var (
        ServerAddr       = ":8080"
        ServerRoot       = villa.Path("./server/")
        LoadTemplatePass = ""
        AutoLoadTemplate = false

        DataRoot      = villa.Path("./data/")
        CrawlerDBPath = DataRoot.Join(FnCrawlerDB)
        DocsDBPath    = DataRoot.Join(FnDocs)

        // producer: server, consumer: crawler
        ImportPath     villa.Path
        ImportSegments Segments

        // producer: crawler, consumer: indexer
        DBOutPath     villa.Path
        DBOutSegments Segments

        // producer: indexer, consumer: server.
        // The server never deletes index segments; the indexer clears updated segments.
        IndexPath     villa.Path
        IndexSegments Segments

        // crawler configuration
        CrawlByGodocApi   = true
        CrawlGithubUpdate = true
        CrawlerDuePerRun  = 1 * time.Hour

        /*
            Increase this to ignore the etags of the last versions, so that all
            packages are crawled and parsed again.

            ChangeLog:
                0 First version
                1 Add TestImports/XTestImports to Imports
                2 Parse markdown readme to text before selecting synopsis from it
                3 Add exported tokens to indexes
                4 Move TestImports/XTestImports out of Imports, to TestImports
                5 A bug of checking CrawlerVersion is fixed
        */
        CrawlerVersion = 5
    )
    var (
        ErrPackageNotModifed = errors.New("package not modified")
        ErrInvalidPackage    = errors.New("invalid package")
    )
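Because these are ordinary package-level variables, a deployment can override them before starting any component. A minimal sketch, assuming villa is github.com/daviddengcn/go-villa; the /var/gcse/data path and the 30-minute value are illustrative, and the meaning of CrawlerDuePerRun is inferred from its name:

    package main

    import (
        "time"

        "github.com/daviddengcn/go-villa"
        "github.com/xladykiller/gcse"
    )

    func main() {
        // Relocate the shared data directory and re-derive the DB paths from
        // it, mirroring how the defaults above are built. Paths are illustrative.
        gcse.DataRoot = villa.Path("/var/gcse/data")
        gcse.CrawlerDBPath = gcse.DataRoot.Join(gcse.FnCrawlerDB)
        gcse.DocsDBPath = gcse.DataRoot.Join(gcse.FnDocs)

        // Adjust the crawling cadence (semantics assumed from the name).
        gcse.CrawlerDuePerRun = 30 * time.Minute

        // ... start the crawler/indexer/server main loop here ...
    }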
AppendPackages appends a list of packages to the imports folder for the crawler backend to read.
func DumpMemStats()
func FetchAllPackagesInGodoc(httpClient doc.HttpClient) ([]string, error)
FetchAllPackagesInGodoc fetches the list of all packages on godoc.org.
func GenHttpClient(proxy string) doc.HttpClient
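Combined with FetchAllPackagesInGodoc above, this is enough to pull the full package list. A minimal sketch; treating an empty proxy string as a direct connection is an assumption:

    package main

    import (
        "fmt"
        "log"

        "github.com/xladykiller/gcse"
    )

    func main() {
        client := gcse.GenHttpClient("") // assumption: "" means no proxy
        pkgs, err := gcse.FetchAllPackagesInGodoc(client)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("godoc.org lists %d packages\n", len(pkgs))
    }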
Returns a new instance of DocInfo as a sophie.Sophier
Returns a new instance of *NewDocAction as a Sophier
core project of a package
CrawlerDB includes all crawler entry databases.
LoadCrawlerDB loads PackageDB and PersonDB and returns a new *CrawlerDB
AppendPackage appends a package. If the package does not already exist in either PackageDB or Docs, it is scheduled to be crawled immediately.
AppendPerson appends a person to PersonDB and schedules a new person to be crawled immediately.
SchedulePackage schedules a package to be crawled at a specific time.
SchedulePerson schedules a person to be crawled at a specific time.
Sync syncs both PackageDB and PersonDB. Returns an error if either sync fails.
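The doc lines above omit the full method signatures, so the following is only a hedged sketch of a schedule-and-sync cycle; the parameter lists of LoadCrawlerDB and SchedulePackage are assumptions inferred from the doc comments, not confirmed signatures, and the scheduled package path is illustrative:

    package main

    import (
        "log"
        "time"

        "github.com/xladykiller/gcse"
    )

    func main() {
        // Assumption: LoadCrawlerDB takes no arguments and reads from
        // CrawlerDBPath.
        db := gcse.LoadCrawlerDB()

        // Assumption: SchedulePackage(pkg string, t time.Time, etag string) error.
        if err := db.SchedulePackage("github.com/daviddengcn/go-villa", time.Now(), ""); err != nil {
            log.Fatal(err)
        }

        // Sync persists both PackageDB and PersonDB.
        if err := db.Sync(); err != nil {
            log.Fatal(err)
        }
    }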
    type CrawlingEntry struct {
        ScheduleTime time.Time
        // if gcse.CrawlerVersion is different from this value, etag is ignored
        Version int
        Etag    string
    }
func (c *CrawlingEntry) WriteTo(w sophie.Writer) error
    type DocDB interface {
        Sync() error
        Export(root villa.Path, kind string) error

        Get(key string, data interface{}) bool
        Put(key string, data interface{})
        Delete(key string)
        Iterate(output func(key string, val interface{}) error) error
    }
    type DocInfo struct {
        Name        string
        Package     string
        Author      string
        LastUpdated time.Time
        StarCount   int
        Synopsis    string
        Description string
        ProjectURL  string
        ReadmeFn    string
        ReadmeData  string
        Imports     []string
        TestImports []string
        Exported    []string // exported tokens (funcs/types)
    }
DocInfo is the information stored in the backend docDB.
    type HitInfo struct {
        DocInfo

        Imported           []string
        TestImported       []string
        ImportantSentences []string

        AssignedStarCount float64
        StaticScore       float64
        TestStaticScore   float64
        StaticRank        int // zero-based
    }
HitInfo is the information provided to the frontend.
Count returns the number of entries in the DB.
Export saves the data to external storage without affecting the modified property.
Get fetches the entry for the specified key into data, which must be a pointer. It returns false if the key does not exist.
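Since everything a caller needs is in the DocDB interface itself, code can stay implementation-agnostic. A sketch of an upsert helper written against the interface as declared above; updateDoc and its merge policy are hypothetical, not part of the package:

    package docdbexample

    import "github.com/xladykiller/gcse"

    // updateDoc is a hypothetical helper: it merges a new DocInfo into db
    // under key, using only the DocDB methods declared above.
    func updateDoc(db gcse.DocDB, key string, di gcse.DocInfo) error {
        var old gcse.DocInfo
        if db.Get(key, &old) {
            // The entry already exists; keep the later timestamp
            // (an illustrative merge policy, not the package's).
            if old.LastUpdated.After(di.LastUpdated) {
                di.LastUpdated = old.LastUpdated
            }
        }
        db.Put(key, di)
        return db.Sync()
    }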
If Action equals NDA_DEL, DocInfo is undefined.
func (nda *NewDocAction) WriteTo(w sophie.Writer) error
    type Package struct {
        Package    string
        Name       string
        Synopsis   string
        Doc        string
        ProjectURL string
        StarCount  int

        ReadmeFn   string
        ReadmeData string

        Imports     []string
        TestImports []string
        Exported    []string // exported tokens (funcs/types)

        References []string
        Etag       string
    }
Package stores the information collected by the crawler.
func (db PackedDocDB) Get(key string, data interface{}) bool
func (db PackedDocDB) Iterate(output func(key string, val interface{}) error) error
func (db PackedDocDB) Put(key string, data interface{})
    type Segment interface {
        Name() string
        Join(name string) villa.Path
        IsDone() bool
        Done() error
        ListFiles() ([]villa.Path, error)
        Remove() error
    }
    type Segments interface {
        Watch(watcher *fsnotify.Watcher) error
        ListAll() ([]Segment, error)
        // all done
        ListDones() ([]Segment, error)
        // max done
        FindMaxDone() (Segment, error)
        // generates an arbitrary new segment
        GenNewSegment() (Segment, error)
        // generates a segment greater than all existing ones
        GenMaxSegment() (Segment, error)
        // clears undone segments
        ClearUndones() error
    }
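A sketch of how a consumer stage might drain a finished segment, using only the Segment/Segments methods above; DBOutSegments is the crawler-to-indexer hand-off from the variables section, and treating a nil Segment as "nothing finished yet" is an assumption:

    package segexample

    import (
        "fmt"

        "github.com/xladykiller/gcse"
    )

    // consumeDBOut is a hypothetical consumer: it picks the newest finished
    // segment, lists its files, and removes the segment when done.
    func consumeDBOut() error {
        seg, err := gcse.DBOutSegments.FindMaxDone()
        if err != nil {
            return err
        }
        if seg == nil { // assumption: nil means no finished segment yet
            return nil
        }
        files, err := seg.ListFiles()
        if err != nil {
            return err
        }
        for _, f := range files {
            fmt.Println("processing", f)
        }
        return seg.Remove()
    }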
    type TokenIndexer struct {
        index.TokenIndexer
        sync.RWMutex
        // contains filtered or unexported fields
    }
TokenIndexer is thread-safe.
func NewTokenIndexer(root villa.Path, kind string) *TokenIndexer
func (ti *TokenIndexer) IdsOfToken(token string) []string
func (ti *TokenIndexer) LastModified() time.Time
func (ti *TokenIndexer) Load() error
func (ti *TokenIndexer) Modified() bool
func (ti *TokenIndexer) Put(id string, tokens villa.StrSet)
func (ti *TokenIndexer) Sync() error
func (ti *TokenIndexer) TokensOfId(id string) []string
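The methods above are enough for a full round trip. A short sketch; the root directory, the "tokens" kind, the id/token values, and the villa.NewStrSet constructor (assumed to build a villa.StrSet from its arguments) are all illustrative assumptions:

    package main

    import (
        "fmt"
        "log"

        "github.com/daviddengcn/go-villa"
        "github.com/xladykiller/gcse"
    )

    func main() {
        ti := gcse.NewTokenIndexer(villa.Path("./data/"), "tokens")
        if err := ti.Load(); err != nil {
            log.Fatal(err)
        }

        // Index a document id under a set of tokens.
        ti.Put("github.com/example/pkg", villa.NewStrSet("http", "client"))

        fmt.Println(ti.IdsOfToken("http"))                   // ids carrying "http"
        fmt.Println(ti.TokensOfId("github.com/example/pkg")) // tokens of the id

        if ti.Modified() {
            if err := ti.Sync(); err != nil { // persist under root/kind
                log.Fatal(err)
            }
        }
    }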
Path | Synopsis
---|---
crawler | GCSE Crawler background program.
exps |
indexer |
mergedocs |
server | GCSE HTTP server.
tocrawl |
tools |
Package gcse imports 33 packages. Updated 2017-04-10.