centiment

v0.1.0
Published: Feb 14, 2018 License: BSD-3-Clause Imports: 22 Imported by: 0

README

🤖 centiment

Centiment is a service that performs sentiment analysis of tweets using Google's Natural Language APIs. It was designed with the goal of searching for cryptocurrency tweets, but can be used to analyze and aggregate sentiments for any search terms.

  • It will search Twitter for tweets matching the configured search terms, and store the aggregate "sentiment" (negative, neutral or positive) and magnitude each time it runs a search.
  • Search terms can be easily added without writing code via cmd/centimentd/search.toml
  • The aggregate results are made available via a REST API.

The goal is to see whether written sentiment about cryptocurrencies correlates with prices, e.g. does a negative sentiment predict or otherwise reinforce a drop in price?

Usage

Centiment relies on Google's Natural Language APIs and Firestore, but otherwise can run anywhere provided it can reach these services.

At a minimum, you'll need to:

Running Locally

You can run Centiment locally with a properly configured Go toolchain and Service Account credentials saved locally.

# Fetch Centiment & its dependencies
go get github.com/elithrar/centiment/...

# Initialize the Firebase SDK & create the required indexes
centiment/ $ firebase login
centiment/ $ firebase deploy --only firestore:indexes

# Set the required configuration as env. variables, or pass via flags (see: `centiment --help`)
export TWITTER_CONSUMER_KEY="key"; \
  export TWITTER_CONSUMER_SECRET="secret"; \
  export TWITTER_ACCESS_TOKEN="at"; \
  export TWITTER_ACCESS_KEY="ak"; \
  export CENTIMENT_PROJECT_ID="your-gcp-project-id"; \
  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/creds.json";

# Run centimentd (the server) in the foreground, provided it's on your PATH:
$ centimentd

Deploy to App Engine Flexible

App Engine Flexible makes running Centiment fairly easy: no need to set up or secure an environment.

  • git clone or go get this repository: git clone https://github.com/elithrar/centiment.git
  • Copy app.example.yaml to app.yaml and add your Twitter API keys under env_variables. Important: don't check these credentials into source control! The .gitignore file included in the repo should help to prevent that.

The service can then be deployed via:

centiment $ cd cmd/centimentd
cmd/centimentd $ gcloud app deploy

Cost

Some notes on running this yourself:

  • The default app.example.yaml included alongside is designed to use the minimum set of resources on App Engine Flex. Centiment is written in Go and runs quickly on a single CPU core + 600MB RAM. At the time of writing (Jan 2018), a 1 CPU / 1GB RAM / 10GB disk App Engine Flex instance costs ~USD$44/month.
  • Cloud Function pricing is fairly cheap for our use-case: if you're running a search every 10 minutes, that's 6 times an hour * 730 hours per month = 4380 invocations per search term per month. That falls into the free tier of Cloud Functions pricing.
  • The Natural Language API is where the majority of the costs will lie if you choose to run Centiment more aggressively (more tweets, more often). Searching for up to 50 tweets (per search term) every 10 minutes is 219,000 Sentiment Analysis records per month, which comes to USD$219 per search term per month (as of Jan 2018), excluding the small free tier (the first 5k records).
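The arithmetic in the notes above can be reproduced with a few lines of Go. This is only a sketch: the function names are hypothetical, the 730 hours per month figure comes from the notes, and the price of USD$1.00 per 1,000 records is an assumption implied by the USD$219 figure (it ignores the free tier and may not match current pricing).

```go
package main

import "fmt"

// hoursPerMonth is the approximation used in the cost notes.
const hoursPerMonth = 730

// runsPerMonth returns how many searches run per month, per search term,
// when a search is started every intervalMinutes minutes.
func runsPerMonth(intervalMinutes int) int {
	return hoursPerMonth * 60 / intervalMinutes
}

// analysesPerMonth returns the number of sentiment-analysis records per
// search term per month, assuming every run analyses maxTweets tweets.
func analysesPerMonth(intervalMinutes, maxTweets int) int {
	return runsPerMonth(intervalMinutes) * maxTweets
}

// nlCostUSD estimates the Natural Language API cost for the given number of
// records. pricePerThousand is an assumption; the free tier is ignored.
func nlCostUSD(records int, pricePerThousand float64) float64 {
	return float64(records) / 1000 * pricePerThousand
}

func main() {
	runs := runsPerMonth(10)
	records := analysesPerMonth(10, 50)
	fmt.Printf("runs=%d records=%d cost=USD$%.2f\n", runs, records, nlCostUSD(records, 1.00))
	// prints "runs=4380 records=219000 cost=USD$219.00"
}
```

Halving the interval or doubling the tweet count doubles the record count, so the cost scales linearly with both settings.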

Note: Make sure to do the math before tweaking the CENTIMENT_RUN_INTERVAL or CENTIMENT_MAX_TWEETS environment variables, or adding search terms to cmd/centimentd/search.toml.

Using BigQuery for Analysis

In order to make analysis easier, you can import data directly into BigQuery after each run via a Cloud Function that is triggered from every database write.

Pre-requisites

You'll need to:

  • Create a BigQuery dataset called "Centiment" and a table called "sentiments". You can opt to use different names, but you will need to make sure to use config:set within the Firebase SDK so that our function works.

# Create an empty table with our schema using the bq CLI tool (installed with the gcloud SDK)
centiment/ $ bq mk --schema bigquery.schema.json -t centiment.sentiments
centiment $ cd _functions
# Log into your Google Cloud Platform account
_functions $ firebase login
# Set the dataset and table names
_functions $ firebase functions:config:set centiment.dataset="Centiment" centiment.table="sentiments"
# Deploy this specific function.
_functions $ firebase deploy --only functions:sentimentsToBQ
# Done!

Docker

TODO(matt): Create a Dockerfile - for this FROM alpine:latest

Running Elsewhere

If you're running Centiment elsewhere, you'll need to provide the application with credentials to reach Firestore and the Natural Language APIs by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the location of your credentials file.

Further, the DB interface allows you to provide alternate backend datastores (e.g. PostgreSQL), if you want to run Centiment on alternative infrastructure.

REST API

Centiment exposes its analysis as JSON via a REST API. Requests are not authenticated by default.

# Get the latest sentiments for the named currency ("bitcoin", in this case)
GET /sentiments/bitcoin

[
  {
    "id": "lwnXwJmNbxRoE0mzXff0",
    "topic": "bitcoin",
    "slug": "bitcoin",
    "query": "bitcoin OR BTC OR #bitcoin OR #BTC -filter:retweets",
    "count": 154,
    "score": 0.11818181921715863,
    "stdDev": 0.3425117817511681,
    "variance": 0.11731432063835981,
    "fetchedAt": "2018-02-12T05:24:15.44671Z"
  }
]
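A client for this endpoint might look like the following sketch. The `sentiment` struct simply mirrors the JSON fields in the sample response; `fetchSentiments` and `newStubServer` are hypothetical helpers, and the stub server stands in for a running centimentd so the example is self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// sentiment mirrors the JSON fields shown in the sample response.
type sentiment struct {
	ID        string    `json:"id"`
	Topic     string    `json:"topic"`
	Slug      string    `json:"slug"`
	Query     string    `json:"query"`
	Count     int64     `json:"count"`
	Score     float64   `json:"score"`
	StdDev    float64   `json:"stdDev"`
	Variance  float64   `json:"variance"`
	FetchedAt time.Time `json:"fetchedAt"`
}

// fetchSentiments GETs /sentiments/<slug> from baseURL and decodes the
// JSON array in the response body.
func fetchSentiments(client *http.Client, baseURL, slug string) ([]sentiment, error) {
	resp, err := client.Get(baseURL + "/sentiments/" + slug)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out []sentiment
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

// newStubServer is a stand-in for centimentd that serves one canned record.
func newStubServer() *httptest.Server {
	const body = `[{"id":"lwnXwJmNbxRoE0mzXff0","topic":"bitcoin","slug":"bitcoin","count":154,"score":0.11818181921715863,"fetchedAt":"2018-02-12T05:24:15.44671Z"}]`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/sentiments/bitcoin" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	got, err := fetchSentiments(srv.Client(), srv.URL, "bitcoin")
	if err != nil {
		panic(err)
	}
	for _, s := range got {
		fmt.Printf("%s: score=%.3f over %d tweets\n", s.Topic, s.Score, s.Count)
	}
}
```

Against a real deployment, replace `srv.URL` with the base URL of your own instance.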

Contributing

PRs are welcome, but any non-trivial changes should be raised as an issue first to discuss the design and avoid having your hard work rejected!

Suggestions for contributors:

  • Additional sentiment analysis adapters (e.g. Azure Cognitive Services, IBM Watson)
  • Alternative backend datastores

License

BSD licensed. See the LICENSE file for details.

Documentation

Constants

This section is empty.

Variables

var (
	// ErrNoResultsFound is returned when a DB cannot find a result in the store.
	ErrNoResultsFound = errors.New("store: no results found")
	// ErrInvalidSlug is returned when a URL slug does not match expected slug format (via slug.IsSlug)
	ErrInvalidSlug = errors.New("store: bad slug format")
)

Functions

func AddHealthCheckEndpoints

func AddHealthCheckEndpoints(r *mux.Router, env *Env) *mux.Router

AddHealthCheckEndpoints adds the health check endpoints to the given router, and returns an instance of the Subrouter.

func AddIndexEndpoints

func AddIndexEndpoints(r *mux.Router, env *Env) *mux.Router

AddIndexEndpoints adds the entrypoint/index handlers to the given router.

func AddMetricEndpoints

func AddMetricEndpoints(r *mux.Router, env *Env) *mux.Router

AddMetricEndpoints adds the metric/debugging endpoints to the given router, and returns an instance of the Subrouter.

func AddSentimentEndpoints

func AddSentimentEndpoints(r *mux.Router, env *Env) *mux.Router

AddSentimentEndpoints adds the sentiment endpoints to the given router, and returns an instance of the Subrouter.

func LogRequest

func LogRequest(logger log.Logger) func(http.Handler) http.Handler

LogRequest logs each HTTP request, using the given logger.

func RunServer

func RunServer(srv *http.Server) func() error

RunServer runs the configured server. TODO(matt): Create a NewServer constructor -> call srv.Run()

Types

type Aggregator

type Aggregator struct {
	// contains filtered or unexported fields
}

Aggregator aggregates results from an analysis run.

func NewAggregator

func NewAggregator(logger log.Logger, db DB) (*Aggregator, error)

NewAggregator creates a new Aggregator: call Run to collect results and save them to the given DB.

func (*Aggregator) Run

func (ag *Aggregator) Run(ctx context.Context, results <-chan *AnalyzerResult) error

Run runs an aggregation on the provided results.

type Analyzer

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer holds the configuration for running analyses against a Natural Language API. An Analyzer should only be initialized via NewAnalyzer.

func NewAnalyzer

func NewAnalyzer(logger log.Logger, client *nl.Client, numWorkers int) (*Analyzer, error)

NewAnalyzer instantiates an Analyzer. Call the Run method to start an analysis.

func (*Analyzer) Run

func (az *Analyzer) Run(ctx context.Context, searched <-chan *SearchResult, analyzed chan<- *AnalyzerResult) error

Run passes the values from searched to the Natural Language API, performs analysis concurrently, and returns the results on the analyzed channel.

Run returns when analyses have completed, and can be cancelled by wrapping the provided context with context.WithCancel and calling the provided CancelFunc.

type AnalyzerResult

type AnalyzerResult struct {
	TweetID    int64
	Score      float32
	Magnitude  float32
	SearchTerm *SearchTerm
}

AnalyzerResult is the result from natural language analysis of a tweet.

type DB

type DB interface {
	SaveSentiment(ctx context.Context, sentiment Sentiment) (string, error)
	GetSentimentByID(ctx context.Context, id string) (*Sentiment, error)
	GetSentimentsBySlug(ctx context.Context, slug string, limit int) ([]*Sentiment, error)
	GetSentimentsByTopic(ctx context.Context, topic string, limit int) ([]*Sentiment, error)
}

DB represents a database for storing & retrieving Sentiments.

type Endpoint

type Endpoint struct {
	Env     *Env
	Handler func(*Env, http.ResponseWriter, *http.Request) error
}

Endpoint represents an application server endpoint. It bundles an error-returning handler and injects our application dependencies.

func (*Endpoint) ServeHTTP

func (ep *Endpoint) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP implements http.Handler for an Endpoint.

type Env

type Env struct {
	DB       DB
	Hostname string
	Logger   log.Logger
}

Env contains the application dependencies. TODO(matt): Make this type Server

type Firestore

type Firestore struct {
	Store *firestore.Client
	// The name of the collection.
	CollectionName string
}

Firestore is an implementation of DB that uses Google Cloud Firestore.

func (*Firestore) GetSentimentByID

func (fs *Firestore) GetSentimentByID(ctx context.Context, id string) (*Sentiment, error)

GetSentimentByID fetches an existing Sentiment by its ID. It will return a nil value and no error if no record was found.

func (*Firestore) GetSentimentsBySlug

func (fs *Firestore) GetSentimentsBySlug(ctx context.Context, topicSlug string, limit int) ([]*Sentiment, error)

GetSentimentsBySlug fetches all historical sentiments for the given slug ("slugified" topic name) up to limit records. Providing a limit of 0 (or less) will fetch all records. Records are ordered from most recent to least recent.

An error (ErrNoResultsFound) will be returned if no records were found.

func (*Firestore) GetSentimentsByTopic

func (fs *Firestore) GetSentimentsByTopic(ctx context.Context, topic string, limit int) ([]*Sentiment, error)

GetSentimentsByTopic fetches all historical sentiments for the given topic, up to limit records. Providing a limit of 0 (or less) will fetch all records. Records are ordered from most recent to least recent.

An error (ErrNoResultsFound) will be returned if no records were found.

func (*Firestore) SaveSentiment

func (fs *Firestore) SaveSentiment(ctx context.Context, sentiment Sentiment) (string, error)

SaveSentiment saves a Sentiment to the datastore, and returns the generated ID of the new record.

type HTTPError

type HTTPError struct {
	Code int   `json:"code"`
	Err  error `json:"error"`
}

HTTPError represents an HTTP error.

func (HTTPError) Error

func (he HTTPError) Error() string

func (HTTPError) JSON

func (he HTTPError) JSON() ([]byte, error)

JSON formats the current HTTPError as JSON.

type SearchResult

type SearchResult struct {
	// contains filtered or unexported fields
}

SearchResult represents the result of a search against Twitter, and encapsulates a Tweet.

type SearchTerm

type SearchTerm struct {
	// The human-readable topic of the search.
	Topic string
	// The Twitter search query
	// Ref: https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators
	Query string
}

SearchTerm represents the term or phrase to search for a given topic.

type Searcher

type Searcher struct {
	// contains filtered or unexported fields
}

Searcher is a worker pool that searches Twitter for the given set of search terms. Call NewSearcher to configure a new pool. Pools are safe to use concurrently.

The "Run" method on Searcher should be used to begin a search.

func NewSearcher

func NewSearcher(logger log.Logger, terms []*SearchTerm, minResults int, maxAge time.Duration, client *anaconda.TwitterApi, db DB) (*Searcher, error)

NewSearcher creates a new Searcher with the given search terms. It will attempt to fetch minResults per search term and return tweets newer than maxAge.

func (*Searcher) Run

func (sr *Searcher) Run(ctx context.Context, searched chan<- *SearchResult) error

Run performs a concurrent search against the configured terms, and returns results onto the provided searched channel.

Run returns when searches have completed, and can be cancelled by wrapping the provided context with context.WithCancel and calling the provided CancelFunc.

type Sentiment

type Sentiment struct {
	ID         string    `json:"id" firestore:"id,omitempty"`
	Topic      string    `json:"topic" firestore:"topic"`
	Slug       string    `json:"slug" firestore:"slug"`
	Query      string    `json:"query" firestore:"query"`
	Count      int64     `json:"count" firestore:"count"`
	Score      float64   `json:"score" firestore:"score"`
	StdDev     float64   `json:"stdDev" firestore:"stdDev"`
	Variance   float64   `json:"variance" firestore:"variance"`
	FetchedAt  time.Time `json:"fetchedAt" firestore:"fetchedAt"`
	LastSeenID int64     `json:"-" firestore:"lastSeenID"`
}

Sentiment represents the aggregated result of performing sentiment analysis against a number (Count) of tweets for a given topic.

Directories

Path Synopsis
cmd