sherlock

v0.7.1
Published: Mar 10, 2024 License: Apache-2.0 Imports: 33 Imported by: 3

README

Sherlock

Illustration of Sherlock Holmes and Watson in a train car, by Sidney Paget. From Arthur Conan Doyle's 1892 book 'The Adventure of Silver Blaze'

Relentless Metadata Inspector

Sherlock is a Go library that inspects a URL for all available metadata, drawing on whatever metadata formats the page provides, and returns the result as an ActivityStreams 2.0 document.

The goal is to have a standard interface into all web content, regardless of competing data standards.

Supported Formats

ActivityPub/ActivityStreams

MicroFormats

Open Graph

In Progress

🚧 WebFinger

🚧 JSON-LD (Linked)

🚧 Twitter Metadata

🚧 Microdata

🚧 RDFa

🚧 oEmbed data provider

Using Sherlock
client := sherlock.NewClient()

// If you only have a URL, pass it to .Load()
result, err := client.Load("https://my-url-here")

// If you have already downloaded the file, pass the original URL
// and the downloaded content to ParseHTML()
var body bytes.Buffer // holds the HTML you downloaded earlier
result, err = sherlock.ParseHTML("https://original-url", &body)

Using Sherlock with Hannibal

Sherlock can also be used as an HTTP client for Hannibal, the ActivityPub library for Go. This lets many other online resources look as though they are ActivityPub-enabled.

Documentation

Overview

Package sherlock is a library for extracting metadata from web pages. It uses as many methods as possible to extract page data, including:
- ActivityStreams/JSON-LD
- Open Graph
- Microformats2

Coming soon:
- HTML Meta Tags
- oEmbed
- JSON-LD
- Twitter Cards?

Constants

const ContentType = "Content-Type"

ContentType is the string used in the HTTP header to designate a MIME type

const ContentTypeActivityPub = "application/activity+json"

ContentTypeActivityPub is the standard MIME type for ActivityPub content

const ContentTypeAtom = "application/atom+xml"

ContentTypeAtom is the standard MIME Type for Atom Feeds

const ContentTypeForm = "application/x-www-form-urlencoded"

ContentTypeForm is the standard MIME Type for Form encoded content

const ContentTypeHTML = "text/html"

ContentTypeHTML is the standard MIME type for HTML content

const ContentTypeJSON = "application/json"

ContentTypeJSON is the standard MIME Type for JSON content

const ContentTypeJSONFeed = "application/feed+json"

ContentTypeJSONFeed is the standard MIME Type for JSON Feed content https://en.wikipedia.org/wiki/JSON_Feed

const ContentTypeJSONLD = "application/ld+json"

ContentTypeJSONLD is the standard MIME Type for JSON-LD content https://en.wikipedia.org/wiki/JSON-LD

const ContentTypeJSONResourceDescriptor = "application/jrd+json"

ContentTypeJSONResourceDescriptor is the standard MIME Type for JSON Resource Descriptor content which is used by WebFinger: https://datatracker.ietf.org/doc/html/rfc7033#section-10.2
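As an illustration of where this MIME type comes into play, the sketch below builds the WebFinger lookup URL defined by RFC 7033 for a handle such as @username@host.tld. The function name `webFingerURL` is hypothetical and not part of this package; a real lookup would send the request with an Accept header of "application/jrd+json". This is only a sketch of the RFC's URL layout, not a description of how Sherlock performs the lookup.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// webFingerURL is a hypothetical helper (not part of sherlock). It converts
// a handle like "@username@host.tld" into the RFC 7033 lookup URL.
func webFingerURL(handle string) (string, error) {
	// Drop the optional leading "@", then split into username and host.
	username, host, found := strings.Cut(strings.TrimPrefix(handle, "@"), "@")
	if !found || username == "" || host == "" {
		return "", errors.New("handle must look like @username@host.tld")
	}

	// The account is passed as an "acct:" URI in the "resource" parameter.
	query := url.Values{}
	query.Set("resource", "acct:"+username+"@"+host)

	result := url.URL{
		Scheme:   "https",
		Host:     host,
		Path:     "/.well-known/webfinger",
		RawQuery: query.Encode(),
	}
	return result.String(), nil
}

func main() {
	lookup, err := webFingerURL("@sherlock@example.social")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(lookup)
}
```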

const ContentTypePlain = "text/plain"

ContentTypePlain is the default plaintext MIME type

const ContentTypeRSS = "application/rss+xml"

ContentTypeRSS is the standard MIME Type for RSS Feeds

const ContentTypeXML = "application/xml"

ContentTypeXML is the standard MIME Type for XML content

const FormatActivityStream = "ACTIVITYSTREAM"
const FormatJSONFeed = "JSONFEED"
const FormatMicroFormats = "MICROFORMATS"
const FormatRSS = "RSS"

const HTTPHeaderAccept = "Accept"

HTTPHeaderAccept is the string used in the HTTP header to request a response be encoded as a MIME type
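The following sketch shows how the Accept and Content-Type header names combine with the MIME type constants during content negotiation. It is self-contained: the constants are local copies of the values listed on this page, and `exampleHandler` and `negotiate` are hypothetical names that exist only for this illustration. The handler stands in for a remote server and runs in memory, so no network is involved.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Local copies of values documented on this page.
const (
	headerAccept           = "Accept"
	headerContentType      = "Content-Type"
	contentTypeActivityPub = "application/activity+json"
	contentTypeHTML        = "text/html"
)

// exampleHandler is a hypothetical server. It returns ActivityPub JSON when
// the client asks for it, and HTML otherwise.
func exampleHandler(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get(headerAccept) == contentTypeActivityPub {
		w.Header().Set(headerContentType, contentTypeActivityPub)
		io.WriteString(w, `{"type":"Note"}`)
		return
	}
	w.Header().Set(headerContentType, contentTypeHTML)
	io.WriteString(w, "<html></html>")
}

// negotiate sends an in-memory request that asks for the given MIME type,
// and returns the Content-Type and body of the response.
func negotiate(mimeType string) (string, string) {
	request := httptest.NewRequest(http.MethodGet, "https://example.com/note", nil)
	request.Header.Set(headerAccept, mimeType)

	recorder := httptest.NewRecorder()
	exampleHandler(recorder, request)

	return recorder.Result().Header.Get(headerContentType), recorder.Body.String()
}

func main() {
	contentType, body := negotiate(contentTypeActivityPub)
	fmt.Println(contentType, body)
}
```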

const HTTPHeaderCacheControl = "Cache-Control"
const HTTPHeaderLink = "Link"
const LinkRelationAlternate = "alternate"
const LinkRelationFeed = "feed"
const LinkRelationHub = "hub"
const LinkRelationIcon = "icon"
const LinkRelationSelf = "self"
const LoadDocumentTypeActor = 1
const LoadDocumentTypeCollection = 2
const LoadDocumentTypeDocument = 3
const LoadDocumentTypeUnknown = 0

Variables

This section is empty.

Functions

func IsValidAddress added in v0.6.5

func IsValidAddress(address string) bool

IsValidAddress returns TRUE for all values that Sherlock THINKS it SHOULD be able to process. This includes: @username@host.tld and https://host.tld/username addresses. IMPORTANT: Just because this function returns TRUE does NOT mean that the address is valid. It just means that it looks like a valid format, but it will still need to be checked.
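To make the "looks like a valid format" idea concrete, here is a minimal sketch of a shape-only check for the two address styles named above. The function `looksLikeAddress` is hypothetical and is not Sherlock's implementation; the specific rules (a dot in the host, https only) are assumptions chosen to match the documented examples. Like the real function, it never contacts the network, so a true result says nothing about whether the address exists.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// looksLikeAddress is a hypothetical stand-in, NOT Sherlock's implementation.
// It only checks the shape of the value and performs no network requests.
func looksLikeAddress(address string) bool {
	// Handle style: @username@host.tld
	if strings.HasPrefix(address, "@") {
		username, host, found := strings.Cut(address[1:], "@")
		return found &&
			username != "" &&
			strings.Contains(host, ".") &&
			!strings.Contains(host, "@")
	}

	// URL style: https://host.tld/username (https only is an assumption)
	parsed, err := url.Parse(address)
	if err != nil {
		return false
	}
	return parsed.Scheme == "https" && parsed.Host != ""
}

func main() {
	for _, address := range []string{
		"@sherlock@example.social",
		"https://example.social/sherlock",
		"not an address",
	} {
		fmt.Println(address, looksLikeAddress(address))
	}
}
```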

func ParseOEmbed

func ParseOEmbed(reader io.Reader, data mapof.Any)

Types

type Client

type Client struct {
	UserAgent     string          // User-Agent string to send with every request
	RemoteOptions []remote.Option // Additional options to pass to the remote library
}

Client implements the hannibal/streams.Client interface, and is used to load JSON-LD documents from remote servers. The sherlock client maps additional meta-data into a standard ActivityStreams document.

func NewClient

func NewClient(options ...ClientOption) Client

NewClient returns a fully initialized Client object

func (Client) Load

func (client Client) Load(url string, options ...any) (streams.Document, error)

Load retrieves a document from a remote server and returns it as a streams.Document. It uses either the "Actor" or the "Document" method of generating its ActivityStreams result. "Document" treats the URL as a single ActivityStreams document, translating OpenGraph, MicroFormats, and JSON-LD into an ActivityStreams equivalent. "Actor" treats the URL as an Actor, translating RSS, Atom, JSON, and MicroFormats feeds into an ActivityStreams equivalent.
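The "Document" translation can be pictured with a small sketch. The code below maps a few Open Graph properties onto ActivityStreams fields. Everything in it is an assumption made for illustration: the function name `openGraphToActivityStreams`, the choice of the "Page" type, and the property-to-field mapping are not taken from Sherlock's source, and the real library may map these values differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openGraphToActivityStreams is a hypothetical mapping, not Sherlock's own.
// It copies a few Open Graph properties into an ActivityStreams-style map.
func openGraphToActivityStreams(pageURL string, properties map[string]string) map[string]any {
	document := map[string]any{
		"@context": "https://www.w3.org/ns/activitystreams",
		"type":     "Page", // assumed type for a generic web page
		"url":      pageURL,
	}

	// Only copy properties that are actually present on the page.
	if title := properties["og:title"]; title != "" {
		document["name"] = title
	}
	if description := properties["og:description"]; description != "" {
		document["summary"] = description
	}
	if image := properties["og:image"]; image != "" {
		document["image"] = image
	}
	return document
}

func main() {
	document := openGraphToActivityStreams("https://example.com/article", map[string]string{
		"og:title":       "Silver Blaze",
		"og:description": "A missing racehorse and a curious incident.",
	})

	encoded, err := json.MarshalIndent(document, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(encoded))
}
```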

func (*Client) WithOptions added in v0.6.0

func (client *Client) WithOptions(options ...ClientOption)

WithOptions applies one or more ClientOption functions to the client

type ClientOption added in v0.6.0

type ClientOption func(*Client)

ClientOption defines a functional option that modifies a Client object

func WithRemoteOptions added in v0.6.0

func WithRemoteOptions(middleware ...remote.Option) ClientOption

WithRemoteOptions is a ClientOption that appends one or more remote.Option objects to the Client object. RemoteOptions are executed on every remote request.

func WithUserAgent added in v0.6.0

func WithUserAgent(userAgent string) ClientOption

WithUserAgent is a ClientOption that sets the UserAgent property on the Client object
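ClientOption follows the functional options pattern. The sketch below reproduces that pattern with local stand-in types so it can run without the library: `exampleClient`, `exampleOption`, `withUserAgent`, and `newExampleClient` are hypothetical names, and the default User-Agent value is an assumption, not Sherlock's actual default.

```go
package main

import "fmt"

// exampleClient is a hypothetical stand-in for sherlock.Client.
type exampleClient struct {
	UserAgent string
}

// exampleOption mirrors the shape of ClientOption: a function that
// modifies a client in place.
type exampleOption func(*exampleClient)

// withUserAgent returns an option that sets the UserAgent field.
func withUserAgent(userAgent string) exampleOption {
	return func(client *exampleClient) {
		client.UserAgent = userAgent
	}
}

// newExampleClient starts from assumed defaults, then applies each
// option in the order it was given.
func newExampleClient(options ...exampleOption) exampleClient {
	client := exampleClient{UserAgent: "example-default-agent"} // assumed default
	for _, option := range options {
		option(&client)
	}
	return client
}

func main() {
	client := newExampleClient(withUserAgent("my-crawler/1.0"))
	fmt.Println(client.UserAgent)
}
```

With the real package, the equivalent call would presumably be sherlock.NewClient(sherlock.WithUserAgent("my-crawler/1.0")), based on the signatures listed above.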

type LoadConfig added in v0.6.0

type LoadConfig struct {
	DocumentType     int
	MaximumRedirects int
	DefaultValue     map[string]any
}

func NewLoadConfig added in v0.6.0

func NewLoadConfig(options ...any) LoadConfig

type LoadOption added in v0.6.0

type LoadOption func(*LoadConfig)

func AsActor added in v0.6.0

func AsActor() LoadOption

func AsCollection added in v0.6.0

func AsCollection() LoadOption

func AsDocument added in v0.6.0

func AsDocument() LoadOption

func WithDefaultValue added in v0.6.0

func WithDefaultValue(defaultValue map[string]any) LoadOption

func WithMaximumRedirects added in v0.6.0

func WithMaximumRedirects(maximumRedirects int) LoadOption
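The load options are undocumented above, so the following is only a guess at how they fit together, inferred from the signatures. NewLoadConfig accepts values of type any, which suggests that it picks out the LoadOption values and ignores the rest. The sketch uses local stand-in types; all names are hypothetical, and the default redirect limit is an assumption.

```go
package main

import "fmt"

// Local copies of the LoadDocumentType values listed in the Constants section.
const (
	documentTypeUnknown = 0
	documentTypeActor   = 1
)

// exampleLoadConfig is a hypothetical stand-in for LoadConfig.
type exampleLoadConfig struct {
	DocumentType     int
	MaximumRedirects int
	DefaultValue     map[string]any
}

// exampleLoadOption mirrors the shape of LoadOption.
type exampleLoadOption func(*exampleLoadConfig)

// asActor marks the URL to be loaded as an Actor.
func asActor() exampleLoadOption {
	return func(config *exampleLoadConfig) {
		config.DocumentType = documentTypeActor
	}
}

// withMaximumRedirects limits how many redirects may be followed.
func withMaximumRedirects(maximumRedirects int) exampleLoadOption {
	return func(config *exampleLoadConfig) {
		config.MaximumRedirects = maximumRedirects
	}
}

// newExampleLoadConfig applies every argument that is an exampleLoadOption
// and silently skips anything else (assumed behavior).
func newExampleLoadConfig(options ...any) exampleLoadConfig {
	config := exampleLoadConfig{
		DocumentType:     documentTypeUnknown,
		MaximumRedirects: 6, // assumed default
	}
	for _, option := range options {
		if typed, ok := option.(exampleLoadOption); ok {
			typed(&config)
		}
	}
	return config
}

func main() {
	config := newExampleLoadConfig(asActor(), withMaximumRedirects(2), "ignored")
	fmt.Println(config.DocumentType, config.MaximumRedirects)
}
```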
