sentimenter

v1.0.0
Published: Aug 19, 2018 License: Apache-2.0 Imports: 19 Imported by: 0

README

sentimenter

Example of a multi-step process leveraging Google Cloud Functions (GCF) and multiple back-end services:

  • Cloud Spanner - horizontally scalable, strongly consistent, relational database service
  • Pub/Sub - Ingest event streams at any scale from anywhere for real-time streaming
  • Cloud Natural Language API - Derive insights from unstructured text using Google ML
  • Stackdriver - Monitoring and management for services, containers, applications, and infrastructure

The sentimenter solution lets the user query public sentiment across the last 100 tweets for any submitted term.

[Figure: overview]

Usage

1. Term Submission

The submitter function, which the user invokes over HTTPS with a search term, creates a job, saves it in the Spanner DB with the Received state, and queues it for processing on the Pub/Sub topic.

HTTPS_TRIGGER_URL=$(gcloud alpha functions describe sentimenter-submitter \
    --format='value(httpsTrigger.url)')
curl "${HTTPS_TRIGGER_URL}?term=serverless"

Returns

{
  "id": "d50ff5b2-2120-4587-a99e-c4aea5c3f592",
  "created_on": "2018-08-16T14:52:20.195459344Z",
  "search_term": "serverless",
  "status": "Received",
  "result": null
}

2. Job Processing (Background)

The processor function is triggered automatically by GCF when a new job arrives on the Pub/Sub topic. The processor changes the state of that job to Processing, retrieves tweets using the Twitter API, and scores each tweet's sentiment using Google's Natural Language API. When done, the score of that job is saved in the Spanner DB and the job status is updated to Processed.

3. Job Status

Throughout the entire process, the user can invoke the status function over HTTPS and get the current state of the submitted job. If the job status is Processed, the status of the job will also include its score.

HTTPS_TRIGGER_URL=$(gcloud alpha functions describe sentimenter-status \
    --format='value(httpsTrigger.url)')
curl "${HTTPS_TRIGGER_URL}?id=6c211819-30ef-4bdb-a723-a5be4979c101"

Returns

{
  "id": "6c211819-30ef-4bdb-a723-a5be4979c101",
  "created_on": "2018-08-16T14:54:02.822679302Z",
  "search_term": "serverless",
  "status": "Processed",
  "status_url": "https://us-central1-s9-demo.cloudfunctions.net/sentimenter-status?id=6c211819-30ef-4bdb-a723-a5be4979c101",
  "result": {
    "processed_on": "2018-08-16T14:54:06.636848268Z",
    "tweets": 100,
    "positive": 9,
    "negative": 0,
    "score": 13.880000105127692
  }
}

Note: while the positive or negative classification of each tweet is reliable, the overall score is derived by combining each tweet's sentiment with its magnitude, which tends to favour longer tweets. As a result, the score is only a relative indicator of the overall strength of the sentiment and is probably meaningless in itself.
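
To make the note concrete, here is a hedged Go sketch of one way such an aggregate could be computed. The `tweetSentiment` and `summary` types, the `aggregate` function, and the formula itself (the sum of score times magnitude) are assumptions for illustration; the source does not show the actual calculation or the threshold used to classify a tweet. What is known is that the Natural Language API reports a score between -1.0 and 1.0 and a non-negative magnitude for each analyzed text.

```go
package main

import "fmt"

// tweetSentiment holds the per-tweet values returned by sentiment
// analysis (hypothetical type for this sketch).
type tweetSentiment struct {
	Score     float64 // -1.0 (negative) to 1.0 (positive)
	Magnitude float64 // overall strength of emotion, >= 0
}

// summary mirrors the counters stored in the job result.
type summary struct {
	Tweets   int64
	Positive int64
	Negative int64
	Score    float64
}

// aggregate counts positive and negative tweets and sums
// score*magnitude. The formula is an assumption, not the
// package's confirmed method.
func aggregate(ts []tweetSentiment) summary {
	s := summary{Tweets: int64(len(ts))}
	for _, t := range ts {
		switch {
		case t.Score > 0:
			s.Positive++
		case t.Score < 0:
			s.Negative++
		}
		s.Score += t.Score * t.Magnitude
	}
	return s
}

func main() {
	// Two tweets with the same score; the one with the larger
	// magnitude (typically the longer text) dominates the total.
	short := tweetSentiment{Score: 0.5, Magnitude: 0.5}
	long := tweetSentiment{Score: 0.5, Magnitude: 4}
	fmt.Printf("%+v\n", aggregate([]tweetSentiment{short, long}))
}
```

Under this assumed formula the total is unbounded and grows with both the number and the length of tweets, which is why it is only useful for comparing one term against another.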

Setup

PubSub Topic

gcloud beta pubsub topics create "sentimenter"

Returns

Created topic [projects/s9-demo/topics/sentimenter]

Spanner DB

First create a Spanner instance

gcloud beta spanner instances create "sentimenter" \
  --config=regional-us-central1 \
  --description="Sentimenter DB" \
  --nodes=1

Returns

Creating instance...done.

Then create the database in the previously created Spanner instance

gcloud spanner databases create "db" --instance="sentimenter"

Returns

Creating database...done.

Finally create the two tables to support the sentimenter solution

gcloud spanner databases ddl update "db" --instance="sentimenter" \
    --ddl='CREATE TABLE jobs (id STRING(MAX), search_term STRING(MAX), created_on TIMESTAMP, status STRING(MAX)) PRIMARY KEY (id)'

gcloud spanner databases ddl update "db" --instance="sentimenter" \
    --ddl='CREATE TABLE results (id STRING(MAX), processed_on TIMESTAMP, tweets INT64, positive INT64, negative INT64, score FLOAT64) PRIMARY KEY (id)'

Returns

Updating database... done.

Functions

First define the necessary environment variables

ENV_VARS="TOPIC_NAME=sentimenter"
ENV_VARS="${ENV_VARS},DB_PATH=projects/s9-demo/instances/sentimenter/databases/db"
ENV_VARS="${ENV_VARS},T_CONSUMER_KEY=${T_CONSUMER_KEY},T_CONSUMER_SECRET=${T_CONSUMER_SECRET}"
ENV_VARS="${ENV_VARS},T_ACCESS_TOKEN=${T_ACCESS_TOKEN},T_ACCESS_SECRET=${T_ACCESS_SECRET}"

Note: I'm keeping the Twitter API credentials out of the commands by pulling them from my local environment variables. You can type the keys here directly if you need to. See the instructions on how to create Twitter API credentials.

Then deploy the three functions using the GCP gcloud command.

gcloud alpha functions deploy sentimenter-submitter \
  --entry-point SubmitFunction \
  --set-env-vars $ENV_VARS \
  --memory 128MB \
  --region us-central1 \
  --runtime go111 \
  --trigger-http

gcloud alpha functions deploy sentimenter-status \
  --entry-point StatusFunction \
  --set-env-vars $ENV_VARS \
  --memory 128MB \
  --region us-central1 \
  --runtime go111 \
  --trigger-http

gcloud alpha functions deploy sentimenter-processor \
  --entry-point ProcessorFunction \
  --set-env-vars $ENV_VARS \
  --memory 256MB \
  --region us-central1 \
  --runtime go111 \
  --trigger-topic=sentimenter \
  --timeout=540s

If everything goes well, you should see a response like this for each of the functions above

Deploying function (may take a while - up to 2 minutes)...done.
availableMemoryMb: 128
entryPoint: StatusFunction
environmentVariables:
  DB_PATH: projects/s9-demo/instances/sentimenter/databases/db
  TOPIC_NAME: sentimenter
httpsTrigger:
  url: https://us-central1-s9-demo.cloudfunctions.net/sentimenter-status
labels:
  deployment-tool: cli-gcloud
name: projects/s9-demo/locations/us-central1/functions/sentimenter-status
runtime: go111
serviceAccountEmail: s9-demo@appspot.gserviceaccount.com
sourceUploadUrl: ...
status: ACTIVE
timeout: 60s
updateTime: '2018-08-16T00:38:33Z'
versionId: '4'

Documentation

Functions

func ProcessorFunction

func ProcessorFunction(ctx context.Context, m PubSubMessage) error

ProcessorFunction processes pubsub messages

func StatusFunction

func StatusFunction(w http.ResponseWriter, r *http.Request)

StatusFunction represents the job status checker functionality

func SubmitFunction

func SubmitFunction(w http.ResponseWriter, r *http.Request)

SubmitFunction represents the request submit functionality

Types

type PubSubMessage

type PubSubMessage struct {
	Data string `json:"data"`
}

PubSubMessage represents PubSub payload

type RequestStatus

type RequestStatus int

RequestStatus represents the sentiment request job status
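
The package exports no constants for this type, so the values below are hypothetical. A minimal sketch of how an integer status could map to the Received, Processing, and Processed strings seen in the JSON responses:

```go
package main

import "fmt"

// RequestStatus mirrors the type documented for this package.
type RequestStatus int

// The constant names and their order are assumptions for this sketch.
const (
	statusUnknown RequestStatus = iota
	statusReceived
	statusProcessing
	statusProcessed
)

// String returns the label used in the job's "status" field.
func (s RequestStatus) String() string {
	switch s {
	case statusReceived:
		return "Received"
	case statusProcessing:
		return "Processing"
	case statusProcessed:
		return "Processed"
	default:
		return "Unknown"
	}
}

func main() {
	// Print each known status with its integer value and label.
	for s := statusReceived; s <= statusProcessed; s++ {
		fmt.Println(int(s), s)
	}
}
```

Implementing `fmt.Stringer` keeps the integer compact in code while letting the string form be used wherever the status is written out.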

type SentimentRequest

type SentimentRequest struct {
	ID     string           `json:"id"`
	On     time.Time        `json:"created_on"`
	Term   string           `json:"search_term"`
	Status string           `json:"status"`
	Result *SentimentResult `json:"result"`
}

SentimentRequest represents the sentiment request job

type SentimentResult

type SentimentResult struct {
	Processed time.Time `json:"processed_on"`
	Tweets    int64     `json:"tweets"`
	Positive  int64     `json:"positive"`
	Negative  int64     `json:"negative"`
	Score     float64   `json:"score"`
}
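
A client can decode the status response into these types. The sketch below illustrates client-side usage and is not part of the package: `parseStatus` and `isDone` are hypothetical helpers, and the JSON is an abbreviated copy of the sample status response shown in the Usage section.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SentimentResult and SentimentRequest mirror the documented types.
type SentimentResult struct {
	Processed time.Time `json:"processed_on"`
	Tweets    int64     `json:"tweets"`
	Positive  int64     `json:"positive"`
	Negative  int64     `json:"negative"`
	Score     float64   `json:"score"`
}

type SentimentRequest struct {
	ID     string           `json:"id"`
	On     time.Time        `json:"created_on"`
	Term   string           `json:"search_term"`
	Status string           `json:"status"`
	Result *SentimentResult `json:"result"`
}

// parseStatus decodes a status response body. Fields the struct does
// not declare, such as status_url, are ignored by encoding/json.
func parseStatus(body []byte) (*SentimentRequest, error) {
	var job SentimentRequest
	if err := json.Unmarshal(body, &job); err != nil {
		return nil, fmt.Errorf("decoding status response: %w", err)
	}
	return &job, nil
}

// isDone reports whether the job finished and carries a result.
func isDone(job *SentimentRequest) bool {
	return job.Status == "Processed" && job.Result != nil
}

const sample = `{
  "id": "6c211819-30ef-4bdb-a723-a5be4979c101",
  "created_on": "2018-08-16T14:54:02.822679302Z",
  "search_term": "serverless",
  "status": "Processed",
  "result": {
    "processed_on": "2018-08-16T14:54:06.636848268Z",
    "tweets": 100,
    "positive": 9,
    "negative": 0,
    "score": 13.880000105127692
  }
}`

func main() {
	job, err := parseStatus([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if isDone(job) {
		fmt.Printf("%q: %d tweets, %d positive, %d negative\n",
			job.Term, job.Result.Tweets, job.Result.Positive, job.Result.Negative)
	}
}
```

Checking `Result` for nil before reading it matters because jobs in the Received or Processing state return `"result": null`.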
