Documentation ¶
Index ¶
- Constants
- Variables
- func MoveAndCompress(src, dst string) error
- func MustGlob(pattern string) []string
- func PrependSchema(s string) string
- func UserHomeDir() string
- type About
- type Client
- type Description
- type DirLaster
- type Doer
- type GetRecord
- type HTTPError
- type Harvest
- type Header
- type Identify
- type Interval
- type Laster
- type ListIdentifiers
- type ListMetadataFormats
- type ListRecords
- type ListSets
- type Metadata
- type MetadataFormat
- type MultiError
- type OAIError
- type Record
- type Repository
- type Request
- type RequestNode
- type Response
- type Set
- type Values
Constants ¶
const (
	// DefaultTimeout on requests.
	DefaultTimeout = 5 * time.Minute
	// DefaultMaxRetries is the default number of retries on a single request.
	DefaultMaxRetries = 8
)
const Day = 24 * time.Hour
Day has 24 hours.
const Version = "0.1.24"
Version of tools.
Variables ¶
var (
	// StdClient is the standard lib http client.
	StdClient = Client{Doer: http.DefaultClient}
	// DefaultClient is the more resilient client, that will retry and timeout.
	DefaultClient = Client{Doer: CreateDoer(DefaultTimeout, DefaultMaxRetries)}
	// DefaultUserAgent to identify crawler, some endpoints do not like the Go
	// default (https://golang.org/src/net/http/request.go#L462), e.g.
	// https://calhoun.nps.edu/oai/request.
	DefaultUserAgent = fmt.Sprintf("metha/%s", Version)
	// ControlCharReplacer helps to deal with broken XML: http://eprints.vu.edu.au/perl/oai2. Add more
	// weird things to be cleaned before XML parsing here. Another faulty:
	// http://digitalcommons.gardner-webb.edu/do/oai/?from=2016-02-29&metadataPrefix=oai_dc&until=2016-03-31&verb=ListRecords.
	// Replace control chars outside XML char range.
	ControlCharReplacer = strings.NewReplacer(
		"\u0001", "", "\u0002", "", "\u0003", "", "\u0004", "",
		"\u0005", "", "\u0006", "", "\u0007", "", "\u0008", "",
		"\u0009", "", "\u000B", "", "\u000C", "", "\u000E", "",
		"\u000F", "", "\u0010", "", "\u0011", "", "\u0012", "",
		"\u0013", "", "\u0014", "", "\u0015", "", "\u0016", "",
		"\u0017", "", "\u0018", "", "\u0019", "", "\u001A", "",
		"\u001B", "", "\u001C", "", "\u001D", "", "\u001E", "",
		"\u001F", "")
)
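As a rough illustration of how a replacer such as ControlCharReplacer might be applied before XML decoding, the following sketch builds a much smaller replacer with the standard library only. The helper name stripControl and the reduced character list are assumptions made for this example and are not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// replacer is a reduced, illustrative replacer; it drops a handful of
// control characters that are not allowed in XML 1.0 documents.
var replacer = strings.NewReplacer(
	"\u0001", "", "\u0008", "", "\u000B", "", "\u001F", "")

// stripControl (hypothetical helper) removes the configured control
// characters from s and leaves everything else untouched.
func stripControl(s string) string {
	return replacer.Replace(s)
}

func main() {
	dirty := "<title>Broken\u0001 XML\u001F</title>"
	fmt.Println(stripControl(dirty))
}
```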
var (
	// BaseDir is where all data is stored.
	BaseDir = filepath.Join(UserHomeDir(), ".metha")
	// ErrAlreadySynced only signals completion.
	ErrAlreadySynced = errors.New("already synced")
	// ErrInvalidEarliestDate for unparsable earliest date.
	ErrInvalidEarliestDate = errors.New("invalid earliest date")
)
Functions ¶
func MoveAndCompress ¶ added in v0.1.6
MoveAndCompress will move src to dst, gzipping in the process.
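A minimal sketch of the move-and-gzip idea, using only the standard library. It is not the package's implementation: the names moveAndCompress, compress and roundTrip are hypothetical, and details such as error handling or atomic renames may differ in the real function.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// compress writes a gzipped copy of src to dst.
func compress(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	zw := gzip.NewWriter(out)
	if _, err := io.Copy(zw, in); err != nil {
		return err
	}
	// Closing the gzip writer flushes the remaining compressed data.
	if err := zw.Close(); err != nil {
		return err
	}
	return out.Close()
}

// moveAndCompress (hypothetical) compresses src into dst and removes src
// only after the compressed copy has been written.
func moveAndCompress(src, dst string) error {
	if err := compress(src, dst); err != nil {
		return err
	}
	return os.Remove(src)
}

// roundTrip is a demo helper: it writes content to a temporary file, moves
// and compresses it, and returns the decompressed content of the result.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "a.xml")
	dst := filepath.Join(dir, "a.xml.gz")
	if err := os.WriteFile(src, []byte(content), 0644); err != nil {
		return "", err
	}
	if err := moveAndCompress(src, dst); err != nil {
		return "", err
	}
	f, err := os.Open(dst)
	if err != nil {
		return "", err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	b, err := io.ReadAll(zr)
	return string(b), err
}

func main() {
	fmt.Println(roundTrip("<record/>"))
}
```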
func PrependSchema ¶
PrependSchema prepends http, if it's missing.
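The behaviour described above could look roughly like the sketch below. The name prependSchema and the exact prefix check are assumptions; the package's own function may decide differently what counts as a missing schema.

```go
package main

import (
	"fmt"
	"strings"
)

// prependSchema (hypothetical) adds "http://" unless s already starts
// with an http or https schema.
func prependSchema(s string) string {
	if strings.HasPrefix(s, "http://") || strings.HasPrefix(s, "https://") {
		return s
	}
	return "http://" + s
}

func main() {
	fmt.Println(prependSchema("example.org/oai"))
	fmt.Println(prependSchema("https://example.org/oai"))
}
```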
Types ¶
type About ¶
type About struct {
Body []byte `xml:",innerxml" json:"body,omitempty"`
}
About has additional record information.
type Client ¶
type Client struct {
Doer Doer
}
Client can execute requests.
func CreateClient ¶
CreateClient creates a client with timeout and retry properties.
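To illustrate what "timeout and retry properties" can mean for something that satisfies a Do method, here is a small sketch of a retrying wrapper. It is an assumption-laden example, not the package's code: doer, retryDoer, flaky and run are hypothetical names, and the retry policy (retry on errors and 5xx responses, fixed pause) is chosen only for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// doer mirrors the shape of the Do method of http.Client.
type doer interface {
	Do(req *http.Request) (*http.Response, error)
}

// retryDoer (hypothetical) retries a request on errors and 5xx responses,
// up to maxRetries additional attempts.
type retryDoer struct {
	next       doer
	maxRetries int
	pause      time.Duration
}

func (d retryDoer) Do(req *http.Request) (*http.Response, error) {
	var (
		resp *http.Response
		err  error
	)
	for attempt := 0; attempt <= d.maxRetries; attempt++ {
		resp, err = d.next.Do(req)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if attempt == d.maxRetries {
			break
		}
		if err == nil {
			resp.Body.Close() // discard the failed response before retrying
		}
		time.Sleep(d.pause)
	}
	return resp, err
}

// flaky is a fake doer that fails a fixed number of times, then succeeds.
type flaky struct{ failures, calls int }

func (f *flaky) Do(req *http.Request) (*http.Response, error) {
	f.calls++
	if f.calls <= f.failures {
		return nil, errors.New("temporary failure")
	}
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
}

// run sends one request through a retryDoer wrapping a flaky doer and
// reports how many calls were made in total.
func run(failures, maxRetries int) (int, error) {
	f := &flaky{failures: failures}
	d := retryDoer{next: f, maxRetries: maxRetries}
	req, err := http.NewRequest("GET", "http://example.org/oai?verb=Identify", nil)
	if err != nil {
		return 0, err
	}
	resp, err := d.Do(req)
	if err != nil {
		return f.calls, err
	}
	defer resp.Body.Close()
	return f.calls, nil
}

func main() {
	fmt.Println(run(2, 4))
}
```

In real use the wrapped doer would typically be an http.Client with its Timeout field set, so that each attempt is bounded in time.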
type Description ¶
type Description struct {
Body []byte `xml:",innerxml"`
}
Description holds information about a set.
func (Description) GoString ¶
func (desc Description) GoString() string
GoString is a formatter for Description content.
type DirLaster ¶
DirLaster extracts the maximum value from the files of a directory. The values are extracted per file via TransformFunc, which gets a filename and returns a token. The tokens are sorted and the lexicographically largest element is returned.
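The transform, sort and take-the-largest steps described above can be sketched as follows. The function names maxToken and lastInDir, the date-prefixed filenames and the rule of skipping empty tokens are assumptions for illustration; the actual DirLaster fields are not shown in this documentation.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// maxToken applies transform to every filename, sorts the resulting tokens
// and returns the lexicographically largest one, or "" if there are none.
func maxToken(filenames []string, transform func(filename string) string) string {
	var tokens []string
	for _, name := range filenames {
		if token := transform(name); token != "" {
			tokens = append(tokens, token)
		}
	}
	if len(tokens) == 0 {
		return ""
	}
	sort.Strings(tokens)
	return tokens[len(tokens)-1]
}

// lastInDir lists the regular entries of dir and delegates to maxToken.
func lastInDir(dir string, transform func(string) string) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	return maxToken(names, transform), nil
}

func main() {
	// Hypothetical filenames that start with a date.
	names := []string{
		"2016-01-31-00000001.xml.gz",
		"2016-03-31-00000001.xml.gz",
		"2016-02-29-00000001.xml.gz",
	}
	datePrefix := func(name string) string {
		if len(name) < 10 {
			return ""
		}
		return name[:10]
	}
	fmt.Println(maxToken(names, datePrefix))
}
```

Lexicographic order only matches chronological order because the tokens are zero-padded ISO dates; other token formats would need a different comparison.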
type GetRecord ¶
type GetRecord struct {
Record Record `xml:"record,omitempty" json:"record,omitempty"`
}
GetRecord returns a single record.
type Harvest ¶
type Harvest struct {
	BaseURL string
	Format  string
	Set     string
	From    string
	Until   string

	MaxRequests                int
	DisableSelectiveHarvesting bool
	CleanBeforeDecode          bool
	IgnoreHTTPErrors           bool
	MaxEmptyResponses          int
	SuppressFormatParameter    bool
	DailyInterval              bool

	Identify *Identify
	Started  time.Time

	// Protects the (rare) case, where we are in the process of renaming
	// harvested files and get a termination signal at the same time.
	sync.Mutex
}
Harvest contains parameters for a mass-download. MaxRequests and CleanBeforeDecode are switches to handle broken token implementations and funny chars in responses. Some repos do not support selective harvesting (e.g. zvdd.org/oai2). Set "DisableSelectiveHarvesting" to try to grab metadata from these repositories. From and Until must always be given with 2006-01-02 layout. TODO(miku): make zero type work (lazily run identify).
func NewHarvest ¶
NewHarvest creates a new harvest. A network connection will be used for an initial Identify request.
func (*Harvest) DateLayout ¶
DateLayout converts the repository endpoint's advertised granularity to Go date format strings.
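OAI-PMH defines two granularity values, YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ. A conversion to Go layouts might look like the sketch below; layoutFor is a hypothetical name, and falling back to day granularity for unknown values is an assumption of this example.

```go
package main

import (
	"fmt"
	"time"
)

// layoutFor maps an OAI-PMH granularity value to a Go time layout.
// Unknown values fall back to day granularity (an assumption).
func layoutFor(granularity string) string {
	switch granularity {
	case "YYYY-MM-DDThh:mm:ssZ":
		return "2006-01-02T15:04:05Z"
	default:
		return "2006-01-02"
	}
}

func main() {
	layout := layoutFor("YYYY-MM-DDThh:mm:ssZ")
	t, err := time.Parse(layout, "2016-03-31T12:30:00Z")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Re-format the same instant with day granularity.
	fmt.Println(t.Format(layoutFor("YYYY-MM-DD")))
}
```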
type Header ¶
type Header struct {
	Status     string   `xml:"status,attr" json:"status,omitempty"`
	Identifier string   `xml:"identifier,omitempty" json:"identifier,omitempty"`
	DateStamp  string   `xml:"datestamp,omitempty" json:"datestamp,omitempty"`
	SetSpec    []string `xml:"setSpec,omitempty" json:"setSpec,omitempty"`
}
A Header is part of other requests.
type Identify ¶
type Identify struct {
	RepositoryName    string        `xml:"repositoryName,omitempty" json:"repositoryName,omitempty"`
	BaseURL           string        `xml:"baseURL,omitempty" json:"baseURL,omitempty"`
	ProtocolVersion   string        `xml:"protocolVersion,omitempty" json:"protocolVersion,omitempty"`
	AdminEmail        []string      `xml:"adminEmail,omitempty" json:"adminEmail,omitempty"`
	EarliestDatestamp string        `xml:"earliestDatestamp,omitempty" json:"earliestDatestamp,omitempty"`
	DeletedRecord     string        `xml:"deletedRecord,omitempty" json:"deletedRecord,omitempty"`
	Granularity       string        `xml:"granularity,omitempty" json:"granularity,omitempty"`
	Description       []Description `xml:"description,omitempty" json:"description,omitempty"`
}
Identify reports information about a repository.
type Interval ¶
Interval represents a span of time.
func (Interval) DailyIntervals ¶ added in v0.1.14
DailyIntervals segments a given interval into daily chunks.
func (Interval) MonthlyIntervals ¶
MonthlyIntervals segments a given interval into monthly chunks.
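Segmenting a span of time into daily or monthly chunks can be sketched as below. The interval type, its field names and the half-open chunk boundaries are assumptions for this example; the package's Interval may handle chunk edges differently.

```go
package main

import (
	"fmt"
	"time"
)

// interval is a hypothetical stand-in for a span of time.
type interval struct {
	begin, end time.Time
}

// split cuts iv into consecutive chunks. next returns the start of the
// period following t; the last chunk is clipped to iv.end.
func (iv interval) split(next func(t time.Time) time.Time) []interval {
	var chunks []interval
	for start := iv.begin; start.Before(iv.end); {
		stop := next(start)
		if stop.After(iv.end) {
			stop = iv.end
		}
		chunks = append(chunks, interval{start, stop})
		start = stop
	}
	return chunks
}

// daily splits at midnight of each following day.
func (iv interval) daily() []interval {
	return iv.split(func(t time.Time) time.Time {
		return time.Date(t.Year(), t.Month(), t.Day()+1, 0, 0, 0, 0, t.Location())
	})
}

// monthly splits at the first day of each following month.
func (iv interval) monthly() []interval {
	return iv.split(func(t time.Time) time.Time {
		return time.Date(t.Year(), t.Month()+1, 1, 0, 0, 0, 0, t.Location())
	})
}

// mustParse parses a 2006-01-02 date and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	iv := interval{mustParse("2016-01-30"), mustParse("2016-02-02")}
	for _, c := range iv.daily() {
		fmt.Println(c.begin.Format("2006-01-02"), c.end.Format("2006-01-02"))
	}
	fmt.Println(len(iv.monthly()), "monthly chunks")
}
```

Smaller chunks mean more requests but smaller responses per request, which can help with endpoints that struggle with large result sets.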
type ListIdentifiers ¶
type ListIdentifiers struct {
	Headers         []Header `xml:"header,omitempty" json:"header,omitempty"`
	ResumptionToken string   `xml:"resumptionToken,omitempty" json:"resumptionToken,omitempty"`
}
ListIdentifiers lists headers only.
type ListMetadataFormats ¶
type ListMetadataFormats struct {
MetadataFormat []MetadataFormat `xml:"metadataFormat,omitempty" json:"metadataFormat,omitempty"`
}
ListMetadataFormats lists supported metadata formats.
type ListRecords ¶
type ListRecords struct {
	Records         []Record `xml:"record" json:"record"`
	ResumptionToken string   `xml:"resumptionToken" json:"resumptionToken"`
}
ListRecords lists records.
type ListSets ¶
type ListSets struct {
	Set             []Set  `xml:"set,omitempty" json:"set,omitempty"`
	ResumptionToken string `xml:"resumptionToken,omitempty" json:"resumptionToken,omitempty"`
}
ListSets lists available sets. TODO(miku): resumptiontoken can have expiration date, etc.
type Metadata ¶
type Metadata struct {
Body []byte `xml:",innerxml"`
}
Metadata contains the actual metadata, conforming to varying schemas.
func (Metadata) MarshalJSON ¶
MarshalJSON marshals the metadata body.
type MetadataFormat ¶
type MetadataFormat struct {
	MetadataPrefix    string `xml:"metadataPrefix,omitempty" json:"metadataPrefix,omitempty"`
	Schema            string `xml:"schema,omitempty" json:"schema,omitempty"`
	MetadataNamespace string `xml:"metadataNamespace,omitempty" json:"metadataNamespace,omitempty"`
}
MetadataFormat holds information about a format.
type MultiError ¶
type MultiError struct {
Errors []error
}
MultiError collects a number of errors.
func (*MultiError) Error ¶
func (e *MultiError) Error() string
Error formats all error strings into a single string.
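Collecting several errors and formatting them as one string might be done as in this sketch. The type multiError, the helper collect and the "; " separator are assumptions made for illustration; the package's own formatting is not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// multiError is a hypothetical collector for a number of errors.
type multiError struct {
	errs []error
}

// Error joins all collected error messages into a single string.
func (e *multiError) Error() string {
	msgs := make([]string, 0, len(e.errs))
	for _, err := range e.errs {
		msgs = append(msgs, err.Error())
	}
	return strings.Join(msgs, "; ")
}

// collect is a demo helper that builds a multiError from plain messages.
func collect(msgs ...string) error {
	me := &multiError{}
	for _, m := range msgs {
		me.errs = append(me.errs, errors.New(m))
	}
	return me
}

func main() {
	fmt.Println(collect("request timed out", "invalid XML"))
}
```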
type OAIError ¶
type OAIError struct {
	Code    string `xml:"code,attr" json:"code,omitempty"`
	Message string `xml:",chardata" json:"message,omitempty"`
}
OAIError is an OAI protocol error.
type Record ¶
type Record struct {
	Header   Header   `xml:"header,omitempty" json:"header,omitempty"`
	Metadata Metadata `xml:"metadata,omitempty" json:"metadata,omitempty"`
	About    About    `xml:"about,omitempty" json:"about,omitempty"`
}
Record represents a single record.
type Repository ¶
type Repository struct {
BaseURL string
}
Repository represents an OAI endpoint.
func (Repository) Formats ¶
func (r Repository) Formats() ([]MetadataFormat, error)
Formats returns a list of metadata formats.
type Request ¶
type Request struct {
	BaseURL                 string
	Verb                    string
	Identifier              string
	MetadataPrefix          string
	From                    string
	Until                   string
	Set                     string
	ResumptionToken         string
	CleanBeforeDecode       bool
	SuppressFormatParameter bool
}
A Request can express any request that can be sent to an OAI server. Not all combinations of values will yield valid requests.
type RequestNode ¶
type RequestNode struct {
	Verb           string `xml:"verb,attr" json:"verb,omitempty"`
	Set            string `xml:"set,attr" json:"set,omitempty"`
	MetadataPrefix string `xml:"metadataPrefix,attr" json:"metadataPrefix,omitempty"`
}
RequestNode carries the request information into the response.
type Response ¶
type Response struct {
	ResponseDate        string              `xml:"responseDate,omitempty" json:"responseDate,omitempty"`
	Request             RequestNode         `xml:"request,omitempty" json:"request,omitempty"`
	Error               OAIError            `xml:"error,omitempty" json:"error,omitempty"`
	GetRecord           GetRecord           `xml:"GetRecord,omitempty" json:"GetRecord,omitempty"`
	Identify            Identify            `xml:"Identify,omitempty" json:"Identify,omitempty"`
	ListIdentifiers     ListIdentifiers     `xml:"ListIdentifiers,omitempty" json:"ListIdentifiers,omitempty"`
	ListMetadataFormats ListMetadataFormats `xml:"ListMetadataFormats,omitempty" json:"ListMetadataFormats,omitempty"`
	ListRecords         ListRecords         `xml:"ListRecords,omitempty" json:"ListRecords,omitempty"`
	ListSets            ListSets            `xml:"ListSets,omitempty" json:"ListSets,omitempty"`
}
Response is the envelope. It can hold any OAI response kind.
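To show how an envelope with struct tags like the ones above decodes an OAI response, the sketch below unmarshals a tiny hand-written ListRecords document with encoding/xml. The lowercase types are trimmed local copies defined only for this example, and the sample XML is made up.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Trimmed, local stand-ins for the envelope types; only a few fields are kept.
type header struct {
	Identifier string `xml:"identifier"`
	DateStamp  string `xml:"datestamp"`
}

type record struct {
	Header   header `xml:"header"`
	Metadata struct {
		Body []byte `xml:",innerxml"` // raw, schema-dependent payload
	} `xml:"metadata"`
}

type response struct {
	ResponseDate string `xml:"responseDate"`
	ListRecords  struct {
		Records         []record `xml:"record"`
		ResumptionToken string   `xml:"resumptionToken"`
	} `xml:"ListRecords"`
}

// sample is a made-up, minimal ListRecords response.
const sample = `<OAI-PMH>
  <responseDate>2016-03-31T00:00:00Z</responseDate>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2016-03-01</datestamp>
      </header>
      <metadata><dc>example</dc></metadata>
    </record>
    <resumptionToken>token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>`

// parse decodes an OAI response document into the trimmed envelope.
func parse(data string) (response, error) {
	var r response
	err := xml.Unmarshal([]byte(data), &r)
	return r, err
}

func main() {
	r, err := parse(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, rec := range r.ListRecords.Records {
		fmt.Println(rec.Header.Identifier, string(rec.Metadata.Body))
	}
	fmt.Println("next token:", r.ListRecords.ResumptionToken)
}
```

Sections that are absent from a given response simply stay at their zero values, which is what lets a single envelope hold any response kind.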
func (*Response) GetResumptionToken ¶
GetResumptionToken returns the resumption token or an empty string if it does not have a token.
func (*Response) HasResumptionToken ¶
HasResumptionToken determines if the response has a ResumptionToken.
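Since ListIdentifiers, ListRecords and ListSets each carry their own ResumptionToken field, a lookup could be sketched as follows. The types and the "first non-empty token wins" rule are assumptions of this example; the package may instead select the token based on the request verb.

```go
package main

import "fmt"

// listPart and response are trimmed, hypothetical stand-ins that keep only
// the token fields of the three list-style response sections.
type listPart struct {
	ResumptionToken string
}

type response struct {
	ListIdentifiers listPart
	ListRecords     listPart
	ListSets        listPart
}

// resumptionToken returns the first non-empty token, or "" if none is set.
func (r *response) resumptionToken() string {
	candidates := []string{
		r.ListIdentifiers.ResumptionToken,
		r.ListRecords.ResumptionToken,
		r.ListSets.ResumptionToken,
	}
	for _, token := range candidates {
		if token != "" {
			return token
		}
	}
	return ""
}

// hasResumptionToken reports whether any section carries a token.
func (r *response) hasResumptionToken() bool {
	return r.resumptionToken() != ""
}

func main() {
	r := &response{ListRecords: listPart{ResumptionToken: "token-1"}}
	fmt.Println(r.hasResumptionToken(), r.resumptionToken())
}
```

A harvesting loop would typically keep requesting with the returned token until hasResumptionToken reports false.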
type Set ¶
type Set struct {
	SetSpec        string      `xml:"setSpec,omitempty" json:"setSpec,omitempty"`
	SetName        string      `xml:"setName,omitempty" json:"setName,omitempty"`
	SetDescription Description `xml:"setDescription,omitempty" json:"setDescription,omitempty"`
}
A Set has a spec, name and description.
type Values ¶
Values enhances the builtin url.Values.
func (Values) EncodeVerbatim ¶
EncodeVerbatim is like Encode(), but does not escape the keys and values.
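The difference between escaped and verbatim encoding can be sketched with a small local type. The type values, its definition on top of url.Values and the sorted key order are assumptions for this example; the package's Values type is not spelled out in this documentation.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// values is a hypothetical type with the same underlying type as url.Values.
type values url.Values

// encodeVerbatim joins keys and values as key=value pairs separated by "&",
// sorted by key, without any percent-escaping.
func (v values) encodeVerbatim() string {
	keys := make([]string, 0, len(v))
	for k := range v {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var parts []string
	for _, k := range keys {
		for _, val := range v[k] {
			parts = append(parts, k+"="+val)
		}
	}
	return strings.Join(parts, "&")
}

func main() {
	v := values{
		"verb":            {"ListRecords"},
		"resumptionToken": {"a/b=c"},
	}
	fmt.Println(url.Values(v).Encode()) // standard, escaped encoding
	fmt.Println(v.encodeVerbatim())     // same pairs, left as-is
}
```

Verbatim encoding can matter for endpoints that expect a resumption token to be sent back exactly as issued, at the cost of producing query strings that are not always valid URLs.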