Documentation ¶
Index ¶
- Variables
- func AddToSchedule()
- func CheckDataDir()
- func CopyTrainingData()
- func ListSchedule()
- func MakeInfluxRow(row interface{}, fields []string) [][]interface{}
- func MakeSequenceHash(hash string) uint64
- func SaveConfig(c SocialHarvestConf, f ...string) bool
- type Harvest
- type HarvestConfig
- type HarvestState
- type HypermediaCurie
- type HypermediaForm
- type HypermediaFormField
- type HypermediaFormFieldError
- type HypermediaFormFieldRule
- type HypermediaLink
- type HypermediaMeta
- type HypermediaResource
- type ServicesConfig
- type Settings
- type SocialHarvest
- type SocialHarvestConf
- type SocialHarvestContributorGrowth
- type SocialHarvestDB
- func (database *SocialHarvestDB) CreatePartitionTable(table string) error
- func (database *SocialHarvestDB) GetLastHarvestId(territory string, network string, action string, value string) string
- func (database *SocialHarvestDB) GetLastHarvestTime(territory string, network string, action string, value string) time.Time
- func (database *SocialHarvestDB) HasAccess() bool
- func (database *SocialHarvestDB) SaveSettings(settingsRow Settings)
- func (database *SocialHarvestDB) SetLastHarvestTime(territory string, network string, action string, value string, ...)
- func (database *SocialHarvestDB) StoreRow(row interface{})
- type SocialHarvestHarvest
- type SocialHarvestHashtag
- type SocialHarvestMention
- type SocialHarvestMessage
- type SocialHarvestReport
- type SocialHarvestSchedule
- type SocialHarvestSharedLink
Constants ¶
This section is empty.
Variables ¶
var SeriesCollections = map[string]string{
"SocialHarvestMessage": "messages",
"SocialHarvestSharedLink": "shared_links",
"SocialHarvestMention": "mentions",
"SocialHarvestHashtag": "hashtags",
"SocialHarvestContributorGrowth": "contributor_growth",
"SocialHarvestHarvest": "harvest",
"SocialHarvestReport": "reports",
}
Where to store this stuff (log file, collection, and table names)
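Because the map keys match the Go struct type names, a row's storage target can be derived via reflection. A minimal, self-contained sketch of that lookup (the package's own resolution code isn't shown here, so `collectionFor` is an illustrative helper, not part of the API):

```go
package main

import (
	"fmt"
	"reflect"
)

// Mirrors the package's SeriesCollections map: struct type name -> storage name.
var seriesCollections = map[string]string{
	"SocialHarvestMessage":           "messages",
	"SocialHarvestSharedLink":        "shared_links",
	"SocialHarvestMention":           "mentions",
	"SocialHarvestHashtag":           "hashtags",
	"SocialHarvestContributorGrowth": "contributor_growth",
	"SocialHarvestHarvest":           "harvest",
	"SocialHarvestReport":            "reports",
}

// Stand-in row type for illustration only.
type SocialHarvestMessage struct{ Message string }

// collectionFor resolves the table/collection name for any row value by its type name.
func collectionFor(row interface{}) (string, bool) {
	name, ok := seriesCollections[reflect.TypeOf(row).Name()]
	return name, ok
}

func main() {
	name, ok := collectionFor(SocialHarvestMessage{})
	fmt.Println(name, ok) // messages true
}
```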
Functions ¶
func AddToSchedule ¶
func AddToSchedule()
func CheckDataDir ¶
func CheckDataDir()
Checks to ensure the data directory exists and is writable. It will be created if not. Config and training data go into this directory.
func CopyTrainingData ¶
func CopyTrainingData()
Copies default or configured training data to `sh-data` if it isn't there already.
func ListSchedule ¶
func ListSchedule()
func MakeInfluxRow ¶
func MakeInfluxRow(row interface{}, fields []string) [][]interface{}
Returns data in a series of points for use with InfluxDB, optionally filtering which fields end up in the series.
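The exact point layout isn't documented beyond the signature, but the field filtering can be sketched with reflection. The helper below (`makeRowValues`, a hypothetical stand-in, not the package's code) pulls the named struct fields out of a row, which is one plausible reading of how the filter works:

```go
package main

import (
	"fmt"
	"reflect"
)

// makeRowValues extracts the listed struct fields (by Go field name) from a
// row value, skipping names that don't exist on the struct.
func makeRowValues(row interface{}, fields []string) []interface{} {
	v := reflect.ValueOf(row)
	out := make([]interface{}, 0, len(fields))
	for _, f := range fields {
		fv := v.FieldByName(f)
		if fv.IsValid() {
			out = append(out, fv.Interface())
		}
	}
	return out
}

// Illustrative row type.
type point struct {
	Network string
	Message string
	Likes   int
}

func main() {
	p := point{Network: "twitter", Message: "hello", Likes: 3}
	fmt.Println(makeRowValues(p, []string{"Network", "Likes"})) // [twitter 3]
}
```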
func MakeSequenceHash ¶
func MakeSequenceHash(hash string) uint64
For InfluxDB. This hash (a crc64 checksum) should not easily repeat in combination with the time field. Time is recorded to the second in most cases, so hashing the message id (Twitter's id_str and Facebook's Id values are strings) should avoid dupes just in case a message is processed twice.
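The implementation isn't shown; a minimal sketch of the described approach, assuming the string message id is what gets checksummed (the polynomial choice is also an assumption):

```go
package main

import (
	"fmt"
	"hash/crc64"
)

// The crc64 table is computed once; the ISO polynomial is used for this sketch.
var crcTable = crc64.MakeTable(crc64.ISO)

// makeSequenceHash mirrors the documented idea: checksum the string message id
// (Twitter's id_str, Facebook's string Id) so two rows harvested within the
// same second still get distinct sequence values.
func makeSequenceHash(id string) uint64 {
	return crc64.Checksum([]byte(id), crcTable)
}

func main() {
	fmt.Println(makeSequenceHash("476542882977382400"))
	fmt.Println(makeSequenceHash("476542882977382401"))
}
```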
func SaveConfig ¶
func SaveConfig(c SocialHarvestConf, f ...string) bool
Saves the current configuration to the shared data directory on disk as to not overwrite the original. Unless removed, this will be used should the application be restarted (overwriting the default config).
Types ¶
type HarvestConfig ¶
type HarvestConfig struct {
    QuestionRegex string `json:"questionRegex"`
    Territories   []struct {
        Services ServicesConfig `json:"-"`
        Name     string         `json:"name"`
        Content  struct {
            Options struct {
                KeepMessage          bool   `json:"keepMessage"`
                Lang                 string `json:"lang"`
                TwitterGeocode       string `json:"twitterGeocode"`
                OnlyUseInstagramTags bool   `json:"onlyUseInstagramTags"`
            } `json:"options"`
            Keywords      []string `json:"keywords"`
            Urls          []string `json:"urls"`
            InstagramTags []string `json:"instagramTags"`
        } `json:"content"`
        Accounts struct {
            Twitter    []string `json:"twitter"`
            Facebook   []string `json:"facebook"`
            GooglePlus []string `json:"googlePlus"`
            YouTube    []string `json:"youTube"`
            Instagram  []string `json:"instagram"`
        } `json:"accounts"`
        Schedule struct {
            Everything struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"everything"`
            Twitter struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"twitter"`
            Facebook struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"facebook"`
            GooglePlus struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
            } `json:"googlePlus"`
            YouTube struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"youTube"`
        } `json:"schedule"`
        Limits struct {
            MaxResultsPages int    `json:"maxResultsPages"`
            ResultsPerPage  string `json:"resultsPerPage"`
        } `json:"limits"`
    } `json:"territories"`
}
type HarvestState ¶
type HypermediaCurie ¶
type HypermediaCurie struct {
    Name      string `json:"name,omitempty"`
    Href      string `json:"href,omitempty"`
    Templated bool   `json:"templated,omitempty"`
}
Defines a CURIE
type HypermediaForm ¶
type HypermediaForm struct {
    Name          string `json:"name,omitempty"`
    Method        string `json:"method,omitempty"`
    Enctype       string `json:"enctype"`
    AcceptCharset string `json:"accept-charset,omitempty"`
    Target        string `json:"target,omitempty"`
    Action        string `json:"action,omitempty"`
    Autocomplete  bool   `json:"autocomplete,omitempty"`
    Fields        map[string]HypermediaFormField `json:"_fields,omitempty"`
}
Form structure defines attributes that match HTML. This tells applications how to work with resources in order to manipulate state. Any attribute not found in HTML should be prefixed with an underscore (for example, "_fields").
type HypermediaFormField ¶
type HypermediaFormField struct {
    Name         string `json:"name,omitempty"`
    Value        string `json:"value,omitempty"`
    Type         string `json:"type,omitempty"`
    Src          string `json:"src,omitempty"`
    Checked      bool   `json:"checked,omitempty"`
    Disabled     bool   `json:"disabled,omitempty"`
    ReadOnly     bool   `json:"readonly,omitempty"`
    Required     bool   `json:"required,omitempty"`
    Autocomplete bool   `json:"autocomplete,omitempty"`
    Tabindex     int    `json:"tabindex,omitempty"`
    Multiple     bool   `json:"multiple,omitempty"`
    Accept       string `json:"accept,omitempty"`
    Errors       map[string]HypermediaFormFieldError `json:"_errors,omitempty"`
    Rules        map[string]HypermediaFormFieldRule  `json:"_rules,omitempty"`
}
Defines properties for a field (HTML attributes) and also holds the "_errors" and validation "_rules" for that field. "_rules" keys map to HypermediaFormField.Name, like { "fieldName": HypermediaFormFieldRule }, and the rules themselves are named. "_errors" keys also map to HypermediaFormField.Name.
type HypermediaFormFieldError ¶
type HypermediaFormFieldError struct {
    Name    string `json:"name"`
    Failed  bool   `json:"failed"`
    Message string `json:"message,omitempty"`
}
Error messages from validation failures (optional). "name" here is the HypermediaFormFieldRule.Name and "message" is returned on failure.
type HypermediaFormFieldRule ¶
type HypermediaFormFieldRule struct {
    Name        string `json:"name"`
    Description string `json:"description,omitempty"`
    Pattern     string `json:"pattern"`
    Function    func(value string) (fail bool, message string) // not for JSON
}
Simple validation rules, easily nested into "_rules" on "_fields" (optional). Of course, front-end validation is merely a convenience and not a trustworthy process, so remember to sanitize and validate any data on the server side. However, it does help tremendously in reducing the number of HTTP requests to the API.
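A self-contained sketch of how a rule might be populated and evaluated; the struct is copied from the definition above, while the e-mail rule itself and its pattern are illustrative assumptions:

```go
package main

import (
	"fmt"
	"regexp"
)

// Copied from the package's type definition; Function is not marshaled to JSON.
type HypermediaFormFieldRule struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	Pattern     string `json:"pattern"`
	Function    func(value string) (fail bool, message string) // not for JSON
}

func main() {
	// Hypothetical rule: a loose e-mail check, mirrored in Pattern for the
	// front-end and in Function for server-side evaluation.
	pattern := `^[^@\s]+@[^@\s]+$`
	rule := HypermediaFormFieldRule{
		Name:        "email",
		Description: "must look like an e-mail address",
		Pattern:     pattern,
		Function: func(value string) (bool, string) {
			if !regexp.MustCompile(pattern).MatchString(value) {
				return true, "invalid e-mail address"
			}
			return false, ""
		},
	}
	fail, msg := rule.Function("not-an-email")
	fmt.Println(fail, msg) // true invalid e-mail address
}
```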
type HypermediaLink ¶
type HypermediaLink struct {
    Href        string `json:"href,omitempty"`
    Type        string `json:"type,omitempty"`
    Deprecation string `json:"deprecation,omitempty"`
    Name        string `json:"name,omitempty"`
    Profile     string `json:"profile,omitempty"`
    Title       string `json:"title,omitempty"`
    Hreflang    string `json:"hreflang,omitempty"`
    Templated   bool   `json:"templated,omitempty"`
}
A simple web link structure (somewhat modeled after HAL's links and http://tools.ietf.org/html/rfc5988). NOTE: in HAL format, links can be an array with aliases - our format has no such support, but this doesn't break HAL compatibility. Why not support it? Because that changes {} to [], and changing data types is a burden for others. Plus we have HTTP 301/302. Also, each "_links" key name using this struct should be one of: http://www.iana.org/assignments/link-relations/link-relations.xhtml unless using CURIEs.
type HypermediaMeta ¶
type HypermediaMeta struct {
    Success      bool    `json:"success"`
    Message      string  `json:"message"`
    ResponseTime float32 `json:"responseTime,omitempty"`
    To           string  `json:"to,omitempty"`
    From         string  `json:"from,omitempty"`
    // contains filtered or unexported fields
}
The Meta structure provides some common information helpful to the application and also resource state.
type HypermediaResource ¶
type HypermediaResource struct {
    Meta     HypermediaMeta                `json:"_meta"`
    Links    map[string]HypermediaLink     `json:"_links,omitempty"`
    Curies   map[string]HypermediaCurie    `json:"_curies,omitempty"`
    Data     map[string]interface{}        `json:"_data,omitempty"`
    Embedded map[string]HypermediaResource `json:"_embedded,omitempty"`
    Forms    map[string]HypermediaForm     `json:"_forms,omitempty"`
}
A resource is the root-level item being returned. It can contain embedded resources if necessary, and it's possible (though uncommon) to return more than one resource at a time. Within each resource there is "_meta" data.
func NewHypermediaResource ¶
func NewHypermediaResource() *HypermediaResource
Conveniently sets a few things up for a resource
func (*HypermediaResource) AddCurie ¶
func (h *HypermediaResource) AddCurie(name string, href string, templated bool)
func (*HypermediaResource) End ¶
func (h *HypermediaResource) End(message ...string) *HypermediaResource
Conveniently sets a few things before returning the resource and optionally allows a passed string to set HypermediaResource.Meta.Message
func (*HypermediaResource) Success ¶
func (h *HypermediaResource) Success()
Not necessary... But there may be some other functions that make sense...
type ServicesConfig ¶
type ServicesConfig struct {
    Twitter struct {
        ApiKey            string `json:"apiKey"`
        ApiSecret         string `json:"apiSecret"`
        AccessToken       string `json:"accessToken"`
        AccessTokenSecret string `json:"accessTokenSecret"`
    } `json:"twitter"`
    Facebook struct {
        AppId     string `json:"appId"`
        AppSecret string `json:"appSecret"`
        AppToken  string `json:"appToken"`
    } `json:"facebook"`
    Google struct {
        ServerKey string `json:"serverKey"`
    } `json:"google"`
    Instagram struct {
        ClientId     string `json:"clientId"`
        ClientSecret string `json:"clientSecret"`
    } `json:"instagram"`
    MapQuest struct {
        ApplicationKey string `json:"applicationKey"`
    } `json:"mapQuest"`
}
type Settings ¶
type Settings struct {
    Key      string    `json:"key" db:"key" bson:"key"`
    Value    string    `json:"value" db:"value" bson:"value"`
    Modified time.Time `json:"modified" db:"modified" bson:"modified"`
}
An optional settings table/collection that holds Social Harvest configurations and configured dashboards, for persistence and for clustered servers. It is more or less a key/value store; data is stored as a JSON string. The Social Harvest config JSON string should map easily to the SocialHarvestConf struct. Other values could be for JavaScript on the front-end.
type SocialHarvest ¶
type SocialHarvest struct {
    Config   SocialHarvestConf
    Schedule *SocialHarvestSchedule
    Database *SocialHarvestDB
}
type SocialHarvestConf ¶
type SocialHarvestConf struct {
    HarvesterServer struct {
        Port int `json:"port"`
        Cors struct {
            AllowedOrigins []string `json:"allowedOrigins"`
        } `json:"cors"`
        AuthKeys []string `json:"authKeys"`
        Disabled bool     `json:"disabled"`
    } `json:"server"`
    ReporterServer struct {
        Port int `json:"port"`
        Cors struct {
            AllowedOrigins []string `json:"allowedOrigins"`
        } `json:"cors"`
        AuthKeys []string `json:"authKeys"`
        Disabled bool     `json:"disabled"`
    } `json:"reporterServer"`
    Database struct {
        Type          string `json:"type"`
        Host          string `json:"host"`
        Port          int    `json:"port"`
        Socket        string `json:"socket"`
        User          string `json:"user"`
        Password      string `json:"password"`
        Database      string `json:"database"`
        RetentionDays int    `json:"retentionDays"`
        PartitionDays int    `json:"partitionDays"`
    } `json:"database"`
    Schema struct {
        Compact bool `json:"compact"`
    } `json:"schema"`
    Logs struct {
        Directory string `json:"directory"`
    } `json:"logs"`
    Debug struct {
        WebProfile bool `json:"webProfile"`
        Bugsnag    struct {
            ApiKey       string `json:"apiKey"`
            ReleaseStage string `json:"releaseStage"`
        } `json:"bugsnag"`
    } `json:"debug"`
    Services ServicesConfig `json:"services"`
    Harvest  HarvestConfig  `json:"harvest"`
}
The configuration structure mapping from JSON
type SocialHarvestContributorGrowth ¶
type SocialHarvestContributorGrowth struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    // We can look up additional contributor details (like name, location, website URL, etc.) via service API calls as needed. It doesn't change often.
    // So storing in the database would really be wasteful.
    ContributorId string `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    // Facebook specific (mostly)
    Likes        int `json:"likes" db:"likes" bson:"likes"`
    TalkingAbout int `json:"talking_about" db:"talking_about" bson:"talking_about"`
    WereHere     int `json:"were_here" db:"were_here" bson:"were_here"`
    Checkins     int `json:"checkins" db:"checkins" bson:"checkins"`
    Views        int `json:"views" db:"views" bson:"views"`
    // Twitter uses status updates, but Instagram uses "media" and YouTube channels use "videoCount" - this field is used for any count of (primary) content posted.
    StatusUpdates int `json:"status_udpates" db:"status_updates" bson:"status_updates"`
    // Twitter specific (mostly)
    Listed    int `json:"listed" db:"listed" bson:"listed"`
    Favorites int `json:"favorites" db:"favorites" bson:"favorites"`
    // Many social networks have the sense of followers/following (ie. Google+ calls it circledByCount for People, YouTube uses subscriberCount)
    Followers int `json:"followers" db:"followers" bson:"followers"`
    Following int `json:"following" db:"following" bson:"following"`
    // Google+ specific
    PlusOnes int `json:"plus_ones" db:"plus_ones" bson:"plus_ones"`
    // YouTube specific (though comment count seems like it'll appear elsewhere)
    Comments int `json:"comments" db:"comments" bson:"comments"`
}
Changes in growth and reach over time for a contributor. It would be interesting to track all of this for every contributor discovered, but API rate limits prevent that, so this only tracks accounts under the "accounts" section of the harvest configuration. NOTE: contributor details (like location, about, website URL, etc.) can be obtained when necessary via the service's API on the front-end; a lot of that data changes. A note about convention: "_count" is not kept on field names. It's superfluous - assume these are all counts (the int type makes that obvious) and shorter field names are preferred. Also, field names are shared across networks, e.g. "followers" - saying "followers_count" would be very Twitter-specific and therefore possibly misleading.
type SocialHarvestDB ¶
type SocialHarvestDB struct {
    Postgres *sqlx.DB
    InfluxDB *influxdb.Client
    MonetDB  *sqlx.DB
    Series   []string
    Schema   struct {
        Compact bool `json:"compact"`
    }
    RetentionDays int
    PartitionDays int
}
func NewDatabase ¶
func NewDatabase(config SocialHarvestConf) *SocialHarvestDB
Initializes the database and returns the client, setting it to `database.Postgres` in the current package scope
func (*SocialHarvestDB) CreatePartitionTable ¶
func (database *SocialHarvestDB) CreatePartitionTable(table string) error
Creates a partition table in a Postgres database. NOTE: If this fails to run ahead of time, we have a problem... Checking with a trigger on every insert carries too much overhead, so I'm going to look into columnar-store databases in hopes of finding performance there: FDWs for Postgres, and also MonetDB (which should be SQL compatible). Though I imagine partitioning will still be a really nice thing to have in the future. Come back to this... TODO: Look at this: https://github.com/keithf4/pg_partman ... probably should just use that.
func (*SocialHarvestDB) GetLastHarvestId ¶
func (database *SocialHarvestDB) GetLastHarvestId(territory string, network string, action string, value string) string
Gets the last harvest id for a given task, param, and network. TODO: Support InfluxDB
func (*SocialHarvestDB) GetLastHarvestTime ¶
func (database *SocialHarvestDB) GetLastHarvestTime(territory string, network string, action string, value string) time.Time
Gets the last harvest time for a given action, value, and network (NOTE: This doesn't necessarily need to have been set, it could be empty...check with time.IsZero()). TODO: Support InfluxDB
func (*SocialHarvestDB) HasAccess ¶
func (database *SocialHarvestDB) HasAccess() bool
Checks access to the database
func (*SocialHarvestDB) SaveSettings ¶
func (database *SocialHarvestDB) SaveSettings(settingsRow Settings)
Saves a settings key/value pair (the Social Harvest config, dashboard settings, etc. - anything that needs configuration data can optionally store it using this function). TODO: Maybe just make this update the JSON file OR save to some sort of local store so the settings don't go into the database where data is harvested.
func (*SocialHarvestDB) SetLastHarvestTime ¶
func (database *SocialHarvestDB) SetLastHarvestTime(territory string, network string, action string, value string, lastTimeHarvested time.Time, lastIdHarvested string, itemsHarvested int)
Sets the last harvest time for a given action, value, and network set. For example: "facebook" "publicPostsByKeyword" "searchKeyword" 1402260944. The time can be passed to future searches - in Facebook's case, an "until" param that tells Facebook not to give us anything before the last harvest date... assuming we already have it for that particular search query. Multiple params are separated by a colon.
func (*SocialHarvestDB) StoreRow ¶
func (database *SocialHarvestDB) StoreRow(row interface{})
Stores a harvested row of data into the configured database.
type SocialHarvestHarvest ¶
type SocialHarvestHarvest struct {
    Territory         string    `json:"territory" db:"territory" bson:"territory"`
    Network           string    `json:"network" db:"network" bson:"network"`
    Action            string    `json:"action" db:"action" bson:"action"`
    Value             string    `json:"value" db:"value" bson:"value"`
    LastTimeHarvested time.Time `json:"last_time_harvested" db:"last_time_harvested" bson:"last_time_harvested"`
    LastIdHarvested   string    `json:"last_id_harvested" db:"last_id_harvested" bson:"last_id_harvested"`
    ItemsHarvested    int       `json:"items_harvested" db:"items_harvested" bson:"items_harvested"`
    HarvestTime       time.Time `json:"harvest_time" db:"harvest_time" bson:"harvest_time"`
}
Used for efficiently harvesting (help avoid gathering duplicate data), running through paginated results from APIs, as well as information about harvester performance.
type SocialHarvestHashtag ¶
type SocialHarvestHashtag struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    MessageId string    `json:"message_id" db:"message_id" bson:"message_id"`
    Tag       string    `json:"tag" db:"tag" bson:"tag"`
    Keyword   string    `json:"keyword" db:"keyword" bson:"keyword"`
    // Much of this becomes redundant if using a JOIN, but we want to stay flexible (a little more data stored for a lot more performance and options)
    ContributorId             string  `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    ContributorScreenName     string  `json:"contributor_screen_name" db:"contributor_screen_name" bson:"contributor_screen_name"`
    ContributorName           string  `json:"contributor_name" db:"contributor_name" bson:"contributor_name"`
    ContributorGender         int     `json:"contributor_gender" db:"contributor_gender" bson:"contributor_gender"`
    ContributorType           string  `json:"contributor_type" db:"contributor_type" bson:"contributor_type"`
    ContributorLongitude      float64 `json:"contributor_longitude" db:"contributor_longitude" bson:"contributor_longitude"`
    ContributorLatitude       float64 `json:"contributor_latitude" db:"contributor_latitude" bson:"contributor_latitude"`
    ContributorGeohash        string  `json:"contributor_geohash" db:"contributor_geohash" bson:"contributor_geohash"`
    ContributorLang           string  `json:"contributor_lang" db:"contributor_lang" bson:"contributor_lang"`
    ContributorCountry        string  `json:"contributor_country" db:"contributor_country" bson:"contributor_country"`
    ContributorCity           string  `json:"contributor_city" db:"contributor_city" bson:"contributor_city"`
    ContributorCityPopulation int32   `json:"contributor_city_pop" db:"contributor_city_pop" bson:"contributor_city_pop"`
    ContributorRegion         string  `json:"contributor_region" db:"contributor_region" bson:"contributor_region"`
}
Hashtags are not quite Twitter-specific; they're used all over, and other networks have their own conventions too (their APIs return the tags). So these are all "tags" really, but the series is called hashtags (in part to avoid confusion with a generic "tags" term). To be less confusing, there is a "keyword" field where extracted keywords can be stored; only a few are taken per message and stop words are ignored. These keywords could assist people in creating new, actual hashtags to use in their social media marketing. Of course this series also holds (without an association) the contributor's details, so popular keywords (and hashtags) can be determined by geolocation, gender, etc. This series will likely be joined to messages, though it can be analyzed by itself too.
type SocialHarvestMention ¶
type SocialHarvestMention struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    MessageId string    `json:"message_id" db:"message_id" bson:"message_id"`
    ContributorId         string  `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    ContributorScreenName string  `json:"contributor_screen_name" db:"contributor_screen_name" bson:"contributor_screen_name"`
    ContributorName       string  `json:"contributor_name" db:"contributor_name" bson:"contributor_name"`
    ContributorGender     int     `json:"contributor_gender" db:"contributor_gender" bson:"contributor_gender"`
    ContributorType       string  `json:"contributor_type" db:"contributor_type" bson:"contributor_type"`
    ContributorLongitude  float64 `json:"contributor_longitude" db:"contributor_longitude" bson:"contributor_longitude"`
    ContributorLatitude   float64 `json:"contributor_latitude" db:"contributor_latitude" bson:"contributor_latitude"`
    ContributorGeohash    string  `json:"contributor_geohash" db:"contributor_geohash" bson:"contributor_geohash"`
    ContributorLang       string  `json:"contributor_lang" db:"contributor_lang" bson:"contributor_lang"`
    MentionedId           string  `json:"mentioned_id" db:"mentioned_id" bson:"mentioned_id"`
    MentionedScreenName   string  `json:"mentioned_screen_name" db:"mentioned_screen_name" bson:"mentioned_screen_name"`
    MentionedName         string  `json:"mentioned_name" db:"mentioned_name" bson:"mentioned_name"`
    MentionedGender       int     `json:"mentioned_gender" db:"mentioned_gender" bson:"mentioned_gender"`
    MentionedType         string  `json:"mentioned_type" db:"mentioned_type" bson:"mentioned_type"`
    MentionedLongitude    float64 `json:"mentioned_longitude" db:"mentioned_longitude" bson:"mentioned_longitude"`
    MentionedLatitude     float64 `json:"mentioned_latitude" db:"mentioned_latitude" bson:"mentioned_latitude"`
    MentionedGeohash      string  `json:"mentioned_geohash" db:"mentioned_geohash" bson:"mentioned_geohash"`
    MentionedLang         string  `json:"mentioned_lang" db:"mentioned_lang" bson:"mentioned_lang"`
}
When contributors mention other contributors (and from where - useful for tracking a customer base, for example). This series tells a good story visually (hopefully on a map). Note: "Type" is directly applicable to Facebook (users vs. pages), but it can be expanded upon (there is a network value too), so things like "business" or "product" can be added. This would be helpful if a user wanted to filter for any companies being mentioned on Twitter, for example, despite Twitter not having a "type"... This would require a special process on the data of course, but that's OK; it's set up to do that now, and it can be expanded from there. A case could be made for even more fields here, but this is OK for now. Yes, there is repeated information that doesn't change (like gender, etc.), but that's also OK. It may require more storage in the database, but it makes for a more efficient query.
type SocialHarvestMessage ¶
type SocialHarvestMessage struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    MessageId string    `json:"message_id" db:"message_id" bson:"message_id"`
    // contributor information (some transient information, we take note at the time of the message - can help with a contributor's influence at the time of message - or we can track how certain messages helped a contributor gain influence - OR we can say only show me messages from contributors who have X followers, etc.)
    ContributorId             string  `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    ContributorScreenName     string  `json:"contributor_screen_name" db:"contributor_screen_name" bson:"contributor_screen_name"`
    ContributorName           string  `json:"contributor_name" db:"contributor_name" bson:"contributor_name"`
    ContributorGender         int     `json:"contributor_gender" db:"contributor_gender" bson:"contributor_gender"`
    ContributorType           string  `json:"contributor_type" db:"contributor_type" bson:"contributor_type"`
    ContributorLongitude      float64 `json:"contributor_longitude" db:"contributor_longitude" bson:"contributor_longitude"`
    ContributorLatitude       float64 `json:"contributor_latitude" db:"contributor_latitude" bson:"contributor_latitude"`
    ContributorGeohash        string  `json:"contributor_geohash" db:"contributor_geohash" bson:"contributor_geohash"`
    ContributorLang           string  `json:"contributor_lang" db:"contributor_lang" bson:"contributor_lang"`
    ContributorCountry        string  `json:"contributor_country" db:"contributor_country" bson:"contributor_country"`
    ContributorCity           string  `json:"contributor_city" db:"contributor_city" bson:"contributor_city"`
    ContributorCityPopulation int32   `json:"contributor_city_pop" db:"contributor_city_pop" bson:"contributor_city_pop"`
    ContributorRegion         string  `json:"contributor_region" db:"contributor_region" bson:"contributor_region"`
    // Data that changes, think about the value of having it...maybe remove it... API calls can always be made to get this current info.
    // But this kinda gives a user an idea for influencers (at the harvest time at least). So while it's definitely dated...It could be used as a
    // decent filter, ie. only show users who have over a million followers, etc.
    ContributorLikes         int `json:"contributor_likes" db:"contributor_likes" bson:"contributor_likes"`
    ContributorStatusesCount int `json:"contributor_statuses_count" db:"contributor_statuses_count" bson:"contributor_statuses_count"`
    ContributorListedCount   int `json:"contributor_listed_count" db:"contributor_listed_count" bson:"contributor_listed_count"`
    ContributorFollowers     int `json:"contributor_followers" db:"contributor_followers" bson:"contributor_followers"`
    // This value is technically stateful, but can be treated as stateless because it doesn't really get revoked and change back...
    ContributorVerified int `json:"contributor_verified" db:"contributor_verified" bson:"contributor_verified"` // Twitter for sure, but I think other networks too?
    Message    string `json:"message" db:"message" bson:"message"`
    IsQuestion int    `json:"is_question" db:"is_question" bson:"is_question"`
    Category   string `json:"category" db:"category" bson:"category"`
    // Note these values are at the time of harvest. it may be confusing enough to not need these values stored...but how long can we track each message? API rate limits...
    // TODO: Maybe remove these? (think on it) also these technically don't need prefixes because we have the "network" field.
    TwitterRetweetCount  int `json:"twitter_retweet_count" db:"twitter_retweet_count" bson:"twitter_retweet_count"`
    TwitterFavoriteCount int `json:"twitter_favorite_count" db:"twitter_favorite_count" bson:"twitter_favorite_count"`
    // Instagram (and I suppose Facebook if possible)
    LikeCount int `json:"like_count" db:"like_count" bson:"like_count"`
    // Google+
    GooglePlusOnes int64 `json:"google_plus_ones" db:"google_plus_ones" bson:"google_plus_ones"`
}
Posts, status updates, comments, etc.
type SocialHarvestReport ¶
type SocialHarvestReport struct { }
Social Harvest reports are generated for several reasons and are designed specifically for the Social Harvest Dashboard tool. 1: Performance
- reports contain aggregate data, real-time queries over the potential amount of data would be silly (slow UX/dashboard)
- the dashboard's widgets can pretty much all share the same JSON response from the API now
2: Storage
- reports are smaller, and raw data can be removed, reducing database size and lowering hosting costs
3: Consistency
- this makes the available data to the front-end dashboard reliable and well defined (we know what we'll have)
Reports are always on an hourly basis. However, the current (partial) hour can be queried as if it were already built as a report. This ensures the dashboard shows data up to the minute. If a higher resolution is desired (i.e. aggregating data by the minute), custom tools would need to be built. This is of course possible since Social Harvest has all that data... but it is outside the design of the Social Harvest dashboard and requires custom code. However, the dashboard lets users look through messages, so we still need to keep (at least some of) that data around.
type SocialHarvestSchedule ¶
func NewSchedule ¶
func NewSchedule(config SocialHarvestConf) *SocialHarvestSchedule
Set up the schedule so it is accessible by others and start it
type SocialHarvestSharedLink ¶
type SocialHarvestSharedLink struct {}
Shared URLs. The "type" will tell us if it's media (video, photo, etc.) or HTML; it's about content type, not necessarily "blog" or the like. TODO: Possibly scrape those pages to get extra information - semantic data being discussed/shared for a particular territory. This would enrich things like "type"...