Documentation ¶
Index ¶
- Variables
- func AddToSchedule()
- func CheckDataDir()
- func CopyTrainingData()
- func ListSchedule()
- func MakeInfluxRow(row interface{}, fields []string) [][]interface{}
- func MakeSequenceHash(hash string) uint64
- func SaveConfig(c SocialHarvestConf, f ...string) bool
- type Harvest
- type HarvestConfig
- type HarvestState
- type HypermediaCurie
- type HypermediaForm
- type HypermediaFormField
- type HypermediaFormFieldError
- type HypermediaFormFieldRule
- type HypermediaLink
- type HypermediaMeta
- type HypermediaResource
- type ServicesConfig
- type Settings
- type SocialHarvest
- type SocialHarvestConf
- type SocialHarvestContributorGrowth
- type SocialHarvestDB
- func (database *SocialHarvestDB) CreatePartitionTable(table string) error
- func (database *SocialHarvestDB) GetLastHarvestId(territory string, network string, action string, value string) string
- func (database *SocialHarvestDB) GetLastHarvestTime(territory string, network string, action string, value string) time.Time
- func (database *SocialHarvestDB) HasAccess() bool
- func (database *SocialHarvestDB) SaveSettings(settingsRow Settings)
- func (database *SocialHarvestDB) SetLastHarvestTime(territory string, network string, action string, value string, ...)
- func (database *SocialHarvestDB) StoreRow(row interface{})
- type SocialHarvestHarvest
- type SocialHarvestHashtag
- type SocialHarvestMention
- type SocialHarvestMessage
- type SocialHarvestReport
- type SocialHarvestSchedule
- type SocialHarvestSharedLink
Constants ¶
This section is empty.
Variables ¶
var SeriesCollections = map[string]string{
"SocialHarvestMessage": "messages",
"SocialHarvestSharedLink": "shared_links",
"SocialHarvestMention": "mentions",
"SocialHarvestHashtag": "hashtags",
"SocialHarvestContributorGrowth": "contributor_growth",
"SocialHarvestHarvest": "harvest",
"SocialHarvestReport": "reports",
}
Where to store this stuff (log file, collection, and table names)
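Because the map keys match the Go struct type names, a row's storage target can be derived via reflection. A minimal, self-contained sketch of that lookup (the package's own resolution code isn't shown here, so `collectionFor` is an illustrative helper, not part of the API):

```go
package main

import (
	"fmt"
	"reflect"
)

// Mirrors the package's SeriesCollections map: struct type name -> storage name.
var seriesCollections = map[string]string{
	"SocialHarvestMessage":           "messages",
	"SocialHarvestSharedLink":        "shared_links",
	"SocialHarvestMention":           "mentions",
	"SocialHarvestHashtag":           "hashtags",
	"SocialHarvestContributorGrowth": "contributor_growth",
	"SocialHarvestHarvest":           "harvest",
	"SocialHarvestReport":            "reports",
}

// Stand-in row type for illustration only.
type SocialHarvestMessage struct{ Message string }

// collectionFor resolves the table/collection name for any row value by its type name.
func collectionFor(row interface{}) (string, bool) {
	name, ok := seriesCollections[reflect.TypeOf(row).Name()]
	return name, ok
}

func main() {
	name, ok := collectionFor(SocialHarvestMessage{})
	fmt.Println(name, ok) // messages true
}
```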
Functions ¶
func AddToSchedule ¶
func AddToSchedule()
func CheckDataDir ¶
func CheckDataDir()
Checks to ensure the data directory exists and is writable. It will be created if not. Config and training data go into this directory.
func CopyTrainingData ¶
func CopyTrainingData()
Copies default or configured training data to `sh-data` if it isn't there already.
func ListSchedule ¶
func ListSchedule()
func MakeInfluxRow ¶
func MakeInfluxRow(row interface{}, fields []string) [][]interface{}
Returns data in a series of points for use with InfluxDB, optionally filtering which fields end up in the series.
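The exact point layout isn't documented beyond the signature, but the field filtering can be sketched with reflection. The helper below (`makeRowValues`, a hypothetical stand-in, not the package's code) pulls the named struct fields out of a row, which is one plausible reading of how the filter works:

```go
package main

import (
	"fmt"
	"reflect"
)

// makeRowValues extracts the listed struct fields (by Go field name) from a
// row value, skipping names that don't exist on the struct.
func makeRowValues(row interface{}, fields []string) []interface{} {
	v := reflect.ValueOf(row)
	out := make([]interface{}, 0, len(fields))
	for _, f := range fields {
		fv := v.FieldByName(f)
		if fv.IsValid() {
			out = append(out, fv.Interface())
		}
	}
	return out
}

// Illustrative row type.
type point struct {
	Network string
	Message string
	Likes   int
}

func main() {
	p := point{Network: "twitter", Message: "hello", Likes: 3}
	fmt.Println(makeRowValues(p, []string{"Network", "Likes"})) // [twitter 3]
}
```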
func MakeSequenceHash ¶
func MakeSequenceHash(hash string) uint64
For InfluxDB. This hash (a crc64 checksum) should not easily repeat in combination with the time field. Time is recorded to the second in most cases, so hashing the message id (Twitter's id_str and Facebook's Id values are strings) should avoid dupes just in case a message is processed twice.
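The implementation isn't shown; a minimal sketch of the described approach, assuming the string message id is what gets checksummed (the polynomial choice is also an assumption):

```go
package main

import (
	"fmt"
	"hash/crc64"
)

// The crc64 table is computed once; the ISO polynomial is used for this sketch.
var crcTable = crc64.MakeTable(crc64.ISO)

// makeSequenceHash mirrors the documented idea: checksum the string message id
// (Twitter's id_str, Facebook's string Id) so two rows harvested within the
// same second still get distinct sequence values.
func makeSequenceHash(id string) uint64 {
	return crc64.Checksum([]byte(id), crcTable)
}

func main() {
	fmt.Println(makeSequenceHash("476542882977382400"))
	fmt.Println(makeSequenceHash("476542882977382401"))
}
```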
func SaveConfig ¶
func SaveConfig(c SocialHarvestConf, f ...string) bool
Saves the current configuration to the shared data directory on disk as to not overwrite the original. Unless removed, this will be used should the application be restarted (overwriting the default config).
Types ¶
type HarvestConfig ¶
type HarvestConfig struct {
    QuestionRegex string `json:"questionRegex"`
    Territories   []struct {
        Services ServicesConfig `json:"-"`
        Name     string         `json:"name"`
        Content  struct {
            Options struct {
                KeepMessage          bool   `json:"keepMessage"`
                Lang                 string `json:"lang"`
                TwitterGeocode       string `json:"twitterGeocode"`
                OnlyUseInstagramTags bool   `json:"onlyUseInstagramTags"`
            } `json:"options"`
            Keywords      []string `json:"keywords"`
            Urls          []string `json:"urls"`
            InstagramTags []string `json:"instagramTags"`
        } `json:"content"`
        Accounts struct {
            Twitter    []string `json:"twitter"`
            Facebook   []string `json:"facebook"`
            GooglePlus []string `json:"googlePlus"`
            YouTube    []string `json:"youTube"`
            Instagram  []string `json:"instagram"`
        } `json:"accounts"`
        Schedule struct {
            Everything struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"everything"`
            Twitter struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"twitter"`
            Facebook struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"facebook"`
            GooglePlus struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
            } `json:"googlePlus"`
            YouTube struct {
                Content  string `json:"content"`
                Accounts string `json:"accounts"`
                Streams  string `json:"streams"`
            } `json:"youTube"`
        } `json:"schedule"`
        Limits struct {
            MaxResultsPages int    `json:"maxResultsPages"`
            ResultsPerPage  string `json:"resultsPerPage"`
        } `json:"limits"`
    } `json:"territories"`
}
type HarvestState ¶
type HypermediaCurie ¶
type HypermediaCurie struct {
    Name      string `json:"name,omitempty"`
    Href      string `json:"href,omitempty"`
    Templated bool   `json:"templated,omitempty"`
}
Defines a CURIE
type HypermediaForm ¶
type HypermediaForm struct {
    Name          string `json:"name,omitempty"`
    Method        string `json:"method,omitempty"`
    Enctype       string `json:"enctype"`
    AcceptCharset string `json:"accept-charset,omitempty"`
    Target        string `json:"target,omitempty"`
    Action        string `json:"action,omitempty"`
    Autocomplete  bool   `json:"autocomplete,omitempty"`
    Fields        map[string]HypermediaFormField `json:"_fields,omitempty"`
}
Form structure defines attributes that match HTML. This tells applications how to work with resources in order to manipulate state. Any attribute not found in HTML should be prefixed with an underscore (for example, "_fields").
type HypermediaFormField ¶
type HypermediaFormField struct {
    Name         string `json:"name,omitempty"`
    Value        string `json:"value,omitempty"`
    Type         string `json:"type,omitempty"`
    Src          string `json:"src,omitempty"`
    Checked      bool   `json:"checked,omitempty"`
    Disabled     bool   `json:"disabled,omitempty"`
    ReadOnly     bool   `json:"readonly,omitempty"`
    Required     bool   `json:"required,omitempty"`
    Autocomplete bool   `json:"autocomplete,omitempty"`
    Tabindex     int    `json:"tabindex,omitempty"`
    Multiple     bool   `json:"multiple,omitempty"`
    Accept       string `json:"accept,omitempty"`
    Errors       map[string]HypermediaFormFieldError `json:"_errors,omitempty"`
    Rules        map[string]HypermediaFormFieldRule  `json:"_rules,omitempty"`
}
Defines properties for a field (HTML attributes) and also holds the "_errors" and validation "_rules" for that field. "_rules" keys map to HypermediaFormField.Name, like { "fieldName": HypermediaFormFieldRule }, and the rules themselves are named. "_errors" keys also map to HypermediaFormField.Name.
type HypermediaFormFieldError ¶
type HypermediaFormFieldError struct {
    Name    string `json:"name"`
    Failed  bool   `json:"failed"`
    Message string `json:"message,omitempty"`
}
Error messages from validation failures (optional). "name" here is the HypermediaFormFieldRule.Name and "message" is returned on failure.
type HypermediaFormFieldRule ¶
type HypermediaFormFieldRule struct {
    Name        string `json:"name"`
    Description string `json:"description,omitempty"`
    Pattern     string `json:"pattern"`
    Function    func(value string) (fail bool, message string) // not for JSON
}
Simple validation rules, easily nested into "_rules" on "_fields" (optional). Of course, front-end validation is merely a convenience and not a trustworthy process, so remember to sanitize and validate any data on the server side. However, it does help tremendously in reducing the number of HTTP requests to the API.
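A self-contained sketch of how a rule might be populated and evaluated; the struct is copied from the definition above, while the e-mail rule itself and its pattern are illustrative assumptions:

```go
package main

import (
	"fmt"
	"regexp"
)

// Copied from the package's type definition; Function is not marshaled to JSON.
type HypermediaFormFieldRule struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	Pattern     string `json:"pattern"`
	Function    func(value string) (fail bool, message string) // not for JSON
}

func main() {
	// Hypothetical rule: a loose e-mail check, mirrored in Pattern for the
	// front-end and in Function for server-side evaluation.
	pattern := `^[^@\s]+@[^@\s]+$`
	rule := HypermediaFormFieldRule{
		Name:        "email",
		Description: "must look like an e-mail address",
		Pattern:     pattern,
		Function: func(value string) (bool, string) {
			if !regexp.MustCompile(pattern).MatchString(value) {
				return true, "invalid e-mail address"
			}
			return false, ""
		},
	}
	fail, msg := rule.Function("not-an-email")
	fmt.Println(fail, msg) // true invalid e-mail address
}
```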
type HypermediaLink ¶
type HypermediaLink struct {
    Href        string `json:"href,omitempty"`
    Type        string `json:"type,omitempty"`
    Deprecation string `json:"deprecation,omitempty"`
    Name        string `json:"name,omitempty"`
    Profile     string `json:"profile,omitempty"`
    Title       string `json:"title,omitempty"`
    Hreflang    string `json:"hreflang,omitempty"`
    Templated   bool   `json:"templated,omitempty"`
}
A simple web link structure (somewhat modeled after HAL's links and http://tools.ietf.org/html/rfc5988). NOTE: in HAL format, links can be an array with aliases - our format has no such support, but this doesn't break HAL compatibility. Why not support it? Because that changes {} to [], and changing data types is a burden for others. Plus we have HTTP 301/302. Also, each "_links" key name using this struct should be one of: http://www.iana.org/assignments/link-relations/link-relations.xhtml unless using CURIEs.
type HypermediaMeta ¶
type HypermediaMeta struct {
    Success      bool    `json:"success"`
    Message      string  `json:"message"`
    ResponseTime float32 `json:"responseTime,omitempty"`
    To           string  `json:"to,omitempty"`
    From         string  `json:"from,omitempty"`
    // contains filtered or unexported fields
}
The Meta structure provides some common information helpful to the application and also resource state.
type HypermediaResource ¶
type HypermediaResource struct {
    Meta     HypermediaMeta                `json:"_meta"`
    Links    map[string]HypermediaLink     `json:"_links,omitempty"`
    Curies   map[string]HypermediaCurie    `json:"_curies,omitempty"`
    Data     map[string]interface{}        `json:"_data,omitempty"`
    Embedded map[string]HypermediaResource `json:"_embedded,omitempty"`
    Forms    map[string]HypermediaForm     `json:"_forms,omitempty"`
}
A resource is the root-level item being returned. It can contain embedded resources if necessary, and it's possible (though uncommon) to return more than one resource at a time. Within each resource there is "_meta" data.
func NewHypermediaResource ¶
func NewHypermediaResource() *HypermediaResource
Conveniently sets a few things up for a resource
func (*HypermediaResource) AddCurie ¶
func (h *HypermediaResource) AddCurie(name string, href string, templated bool)
func (*HypermediaResource) End ¶
func (h *HypermediaResource) End(message ...string) *HypermediaResource
Conveniently sets a few things before returning the resource and optionally allows a passed string to set HypermediaResource.Meta.Message
func (*HypermediaResource) Success ¶
func (h *HypermediaResource) Success()
Not necessary... But there may be some other functions that make sense...
type ServicesConfig ¶
type ServicesConfig struct {
    Twitter struct {
        ApiKey            string `json:"apiKey"`
        ApiSecret         string `json:"apiSecret"`
        AccessToken       string `json:"accessToken"`
        AccessTokenSecret string `json:"accessTokenSecret"`
    } `json:"twitter"`
    Facebook struct {
        AppId     string `json:"appId"`
        AppSecret string `json:"appSecret"`
        AppToken  string `json:"appToken"`
    } `json:"facebook"`
    Google struct {
        ServerKey string `json:"serverKey"`
    } `json:"google"`
    Instagram struct {
        ClientId     string `json:"clientId"`
        ClientSecret string `json:"clientSecret"`
    } `json:"instagram"`
    MapQuest struct {
        ApplicationKey string `json:"applicationKey"`
    } `json:"mapQuest"`
}
type Settings ¶
type Settings struct {
    Key      string    `json:"key" db:"key" bson:"key"`
    Value    string    `json:"value" db:"value" bson:"value"`
    Modified time.Time `json:"modified" db:"modified" bson:"modified"`
}
An optional settings table/collection that holds Social Harvest configurations and configured dashboards, for persistence and for clustered servers. It is more or less a key/value store; data is stored as a JSON string. The Social Harvest config JSON string should map easily to the SocialHarvestConf struct. Other values could be for JavaScript on the front-end.
type SocialHarvest ¶
type SocialHarvest struct {
    Config   SocialHarvestConf
    Schedule *SocialHarvestSchedule
    Database *SocialHarvestDB
}
type SocialHarvestConf ¶
type SocialHarvestConf struct {
    HarvesterServer struct {
        Port int `json:"port"`
        Cors struct {
            AllowedOrigins []string `json:"allowedOrigins"`
        } `json:"cors"`
        AuthKeys []string `json:"authKeys"`
        Disabled bool     `json:"disabled"`
    } `json:"server"`
    ReporterServer struct {
        Port int `json:"port"`
        Cors struct {
            AllowedOrigins []string `json:"allowedOrigins"`
        } `json:"cors"`
        AuthKeys []string `json:"authKeys"`
        Disabled bool     `json:"disabled"`
    } `json:"reporterServer"`
    Database struct {
        Type          string `json:"type"`
        Host          string `json:"host"`
        Port          int    `json:"port"`
        Socket        string `json:"socket"`
        User          string `json:"user"`
        Password      string `json:"password"`
        Database      string `json:"database"`
        RetentionDays int    `json:"retentionDays"`
        PartitionDays int    `json:"partitionDays"`
    } `json:"database"`
    Schema struct {
        Compact bool `json:"compact"`
    } `json:"schema"`
    Logs struct {
        Directory string `json:"directory"`
    } `json:"logs"`
    Debug struct {
        WebProfile bool `json:"webProfile"`
        Bugsnag    struct {
            ApiKey       string `json:"apiKey"`
            ReleaseStage string `json:"releaseStage"`
        } `json:"bugsnag"`
    } `json:"debug"`
    Services ServicesConfig `json:"services"`
    Harvest  HarvestConfig  `json:"harvest"`
}
The configuration structure mapping from JSON
type SocialHarvestContributorGrowth ¶
type SocialHarvestContributorGrowth struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    // We can look up additional contributor details (like name, location, website URL, etc.) via service API calls as needed. It doesn't change often.
    // So storing in the database would really be wasteful.
    ContributorId string `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    // Facebook specific (mostly)
    Likes        int `json:"likes" db:"likes" bson:"likes"`
    TalkingAbout int `json:"talking_about" db:"talking_about" bson:"talking_about"`
    WereHere     int `json:"were_here" db:"were_here" bson:"were_here"`
    Checkins     int `json:"checkins" db:"checkins" bson:"checkins"`
    Views        int `json:"views" db:"views" bson:"views"`
    // Twitter uses status updates, but Instagram uses "media" and YouTube channels use "videoCount" - this field is used for any count of (primary) content posted.
    StatusUpdates int `json:"status_udpates" db:"status_updates" bson:"status_updates"`
    // Twitter specific (mostly)
    Listed    int `json:"listed" db:"listed" bson:"listed"`
    Favorites int `json:"favorites" db:"favorites" bson:"favorites"`
    // Many social networks have the sense of followers/following (ie. Google+ calls it circledByCount for People, YouTube uses subscriberCount)
    Followers int `json:"followers" db:"followers" bson:"followers"`
    Following int `json:"following" db:"following" bson:"following"`
    // Google+ specific
    PlusOnes int `json:"plus_ones" db:"plus_ones" bson:"plus_ones"`
    // YouTube specific (though comment count seems like it'll appear elsewhere)
    Comments int `json:"comments" db:"comments" bson:"comments"`
}
Changes in growth and reach over time for a contributor. It would be interesting to track all of this for every contributor discovered, but API rate limits prevent that, so this only tracks accounts under the "accounts" section of the harvest configuration. NOTE: contributor details (like location, about, website URL, etc.) can be obtained when necessary via the service's API on the front-end; a lot of that data changes. A note about convention: "_count" is not kept on field names. It's superfluous - assume these are all counts (the int type makes that obvious) and shorter field names are preferred. Also, field names are shared across networks, e.g. "followers" - saying "followers_count" would be very Twitter-specific and therefore possibly misleading.
type SocialHarvestDB ¶
type SocialHarvestDB struct {
    Postgres *sqlx.DB
    InfluxDB *influxdb.Client
    MonetDB  *sqlx.DB
    Series   []string
    Schema   struct {
        Compact bool `json:"compact"`
    }
    RetentionDays int
    PartitionDays int
}
func NewDatabase ¶
func NewDatabase(config SocialHarvestConf) *SocialHarvestDB
Initializes the database and returns the client, setting it to `database.Postgres` in the current package scope
func (*SocialHarvestDB) CreatePartitionTable ¶
func (database *SocialHarvestDB) CreatePartitionTable(table string) error
Creates a partition table in a Postgres database. NOTE: If this fails to run ahead of time, we have a problem... Checking with a trigger on every insert carries too much overhead, so I'm going to look into columnar-store databases in hopes of finding performance there: FDWs for Postgres, and also MonetDB (which should be SQL compatible). Though I imagine partitioning will still be a really nice thing to have in the future. Come back to this... TODO: Look at this: https://github.com/keithf4/pg_partman ... probably should just use that.
func (*SocialHarvestDB) GetLastHarvestId ¶
func (database *SocialHarvestDB) GetLastHarvestId(territory string, network string, action string, value string) string
Gets the last harvest id for a given task, param, and network. TODO: Support InfluxDB
func (*SocialHarvestDB) GetLastHarvestTime ¶
func (database *SocialHarvestDB) GetLastHarvestTime(territory string, network string, action string, value string) time.Time
Gets the last harvest time for a given action, value, and network (NOTE: This doesn't necessarily need to have been set, it could be empty...check with time.IsZero()). TODO: Support InfluxDB
func (*SocialHarvestDB) HasAccess ¶
func (database *SocialHarvestDB) HasAccess() bool
Checks access to the database
func (*SocialHarvestDB) SaveSettings ¶
func (database *SocialHarvestDB) SaveSettings(settingsRow Settings)
Saves a settings key/value pair (the Social Harvest config, dashboard settings, etc. - anything that needs configuration data can optionally store it using this function). TODO: Maybe just make this update the JSON file OR save to some sort of local store so the settings don't go into the database where data is harvested.
func (*SocialHarvestDB) SetLastHarvestTime ¶
func (database *SocialHarvestDB) SetLastHarvestTime(territory string, network string, action string, value string, lastTimeHarvested time.Time, lastIdHarvested string, itemsHarvested int)
Sets the last harvest time for a given action, value, and network set. For example: "facebook" "publicPostsByKeyword" "searchKeyword" 1402260944. The time can be passed to future searches - in Facebook's case, an "until" param that tells Facebook not to give us anything before the last harvest date... assuming we already have it for that particular search query. Multiple params are separated by a colon.
func (*SocialHarvestDB) StoreRow ¶
func (database *SocialHarvestDB) StoreRow(row interface{})
Stores a harvested row of data into the configured database.
type SocialHarvestHarvest ¶
type SocialHarvestHarvest struct {
    Territory         string    `json:"territory" db:"territory" bson:"territory"`
    Network           string    `json:"network" db:"network" bson:"network"`
    Action            string    `json:"action" db:"action" bson:"action"`
    Value             string    `json:"value" db:"value" bson:"value"`
    LastTimeHarvested time.Time `json:"last_time_harvested" db:"last_time_harvested" bson:"last_time_harvested"`
    LastIdHarvested   string    `json:"last_id_harvested" db:"last_id_harvested" bson:"last_id_harvested"`
    ItemsHarvested    int       `json:"items_harvested" db:"items_harvested" bson:"items_harvested"`
    HarvestTime       time.Time `json:"harvest_time" db:"harvest_time" bson:"harvest_time"`
}
Used for efficiently harvesting (help avoid gathering duplicate data), running through paginated results from APIs, as well as information about harvester performance.
type SocialHarvestHashtag ¶
type SocialHarvestHashtag struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    MessageId string    `json:"message_id" db:"message_id" bson:"message_id"`
    Tag       string    `json:"tag" db:"tag" bson:"tag"`
    Keyword   string    `json:"keyword" db:"keyword" bson:"keyword"`
    // Much of this becomes redundant if using a JOIN, but we want to stay flexible (a little more data stored for a lot more performance and options)
    ContributorId             string  `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    ContributorScreenName     string  `json:"contributor_screen_name" db:"contributor_screen_name" bson:"contributor_screen_name"`
    ContributorName           string  `json:"contributor_name" db:"contributor_name" bson:"contributor_name"`
    ContributorGender         int     `json:"contributor_gender" db:"contributor_gender" bson:"contributor_gender"`
    ContributorType           string  `json:"contributor_type" db:"contributor_type" bson:"contributor_type"`
    ContributorLongitude      float64 `json:"contributor_longitude" db:"contributor_longitude" bson:"contributor_longitude"`
    ContributorLatitude       float64 `json:"contributor_latitude" db:"contributor_latitude" bson:"contributor_latitude"`
    ContributorGeohash        string  `json:"contributor_geohash" db:"contributor_geohash" bson:"contributor_geohash"`
    ContributorLang           string  `json:"contributor_lang" db:"contributor_lang" bson:"contributor_lang"`
    ContributorCountry        string  `json:"contributor_country" db:"contributor_country" bson:"contributor_country"`
    ContributorCity           string  `json:"contributor_city" db:"contributor_city" bson:"contributor_city"`
    ContributorCityPopulation int32   `json:"contributor_city_pop" db:"contributor_city_pop" bson:"contributor_city_pop"`
    ContributorRegion         string  `json:"contributor_region" db:"contributor_region" bson:"contributor_region"`
}
Hashtags are not quite Twitter-specific; they're used all over, and other networks have their own conventions too (their APIs return the tags). So these are all "tags" really, but the series is called hashtags (in part to avoid confusion with a generic "tags" term). To be less confusing, there is a "keyword" field where extracted keywords can be stored; only a few are taken per message and stop words are ignored. These keywords could assist people in creating new, actual hashtags to use in their social media marketing. Of course this series also holds (without an association) the contributor's details, so popular keywords (and hashtags) can be determined by geolocation, gender, etc. This series will likely be joined to messages, though it can be analyzed by itself too.
type SocialHarvestMention ¶
type SocialHarvestMention struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    MessageId string    `json:"message_id" db:"message_id" bson:"message_id"`
    ContributorId         string  `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    ContributorScreenName string  `json:"contributor_screen_name" db:"contributor_screen_name" bson:"contributor_screen_name"`
    ContributorName       string  `json:"contributor_name" db:"contributor_name" bson:"contributor_name"`
    ContributorGender     int     `json:"contributor_gender" db:"contributor_gender" bson:"contributor_gender"`
    ContributorType       string  `json:"contributor_type" db:"contributor_type" bson:"contributor_type"`
    ContributorLongitude  float64 `json:"contributor_longitude" db:"contributor_longitude" bson:"contributor_longitude"`
    ContributorLatitude   float64 `json:"contributor_latitude" db:"contributor_latitude" bson:"contributor_latitude"`
    ContributorGeohash    string  `json:"contributor_geohash" db:"contributor_geohash" bson:"contributor_geohash"`
    ContributorLang       string  `json:"contributor_lang" db:"contributor_lang" bson:"contributor_lang"`
    MentionedId           string  `json:"mentioned_id" db:"mentioned_id" bson:"mentioned_id"`
    MentionedScreenName   string  `json:"mentioned_screen_name" db:"mentioned_screen_name" bson:"mentioned_screen_name"`
    MentionedName         string  `json:"mentioned_name" db:"mentioned_name" bson:"mentioned_name"`
    MentionedGender       int     `json:"mentioned_gender" db:"mentioned_gender" bson:"mentioned_gender"`
    MentionedType         string  `json:"mentioned_type" db:"mentioned_type" bson:"mentioned_type"`
    MentionedLongitude    float64 `json:"mentioned_longitude" db:"mentioned_longitude" bson:"mentioned_longitude"`
    MentionedLatitude     float64 `json:"mentioned_latitude" db:"mentioned_latitude" bson:"mentioned_latitude"`
    MentionedGeohash      string  `json:"mentioned_geohash" db:"mentioned_geohash" bson:"mentioned_geohash"`
    MentionedLang         string  `json:"mentioned_lang" db:"mentioned_lang" bson:"mentioned_lang"`
}
When contributors mention other contributors (and from where - useful for tracking a customer base, for example). This series tells a good story visually (hopefully on a map). Note: "Type" is directly applicable to Facebook (users vs. pages), but it can be expanded upon (there is a network value too), so things like "business" or "product" can be added. This would be helpful if a user wanted to filter for any companies being mentioned on Twitter, for example, despite Twitter not having a "type"... This would require a special process on the data of course, but that's OK; it's set up to do that now, and it can be expanded from there. A case could be made for even more fields here, but this is OK for now. Yes, there is repeated information that doesn't change (like gender, etc.), but that's also OK. It may require more storage in the database, but it makes for a more efficient query.
type SocialHarvestMessage ¶
type SocialHarvestMessage struct {
    Time      time.Time `json:"time" db:"time" bson:"time"`
    HarvestId string    `json:"harvest_id" db:"harvest_id" bson:"harvest_id"`
    Territory string    `json:"territory" db:"territory" bson:"territory"`
    Network   string    `json:"network" db:"network" bson:"network"`
    MessageId string    `json:"message_id" db:"message_id" bson:"message_id"`
    // contributor information (some transient information, we take note at the time of the message - can help with a contributor's influence at the time of message - or we can track how certain messages helped a contributor gain influence - OR we can say only show me messages from contributors who have X followers, etc.)
    ContributorId             string  `json:"contributor_id" db:"contributor_id" bson:"contributor_id"`
    ContributorScreenName     string  `json:"contributor_screen_name" db:"contributor_screen_name" bson:"contributor_screen_name"`
    ContributorName           string  `json:"contributor_name" db:"contributor_name" bson:"contributor_name"`
    ContributorGender         int     `json:"contributor_gender" db:"contributor_gender" bson:"contributor_gender"`
    ContributorType           string  `json:"contributor_type" db:"contributor_type" bson:"contributor_type"`
    ContributorLongitude      float64 `json:"contributor_longitude" db:"contributor_longitude" bson:"contributor_longitude"`
    ContributorLatitude       float64 `json:"contributor_latitude" db:"contributor_latitude" bson:"contributor_latitude"`
    ContributorGeohash        string  `json:"contributor_geohash" db:"contributor_geohash" bson:"contributor_geohash"`
    ContributorLang           string  `json:"contributor_lang" db:"contributor_lang" bson:"contributor_lang"`
    ContributorCountry        string  `json:"contributor_country" db:"contributor_country" bson:"contributor_country"`
    ContributorCity           string  `json:"contributor_city" db:"contributor_city" bson:"contributor_city"`
    ContributorCityPopulation int32   `json:"contributor_city_pop" db:"contributor_city_pop" bson:"contributor_city_pop"`
    ContributorRegion         string  `json:"contributor_region" db:"contributor_region" bson:"contributor_region"`
    // Data that changes, think about the value of having it...maybe remove it... API calls can always be made to get this current info.
    // But this kinda gives a user an idea for influencers (at the harvest time at least). So while it's definitely dated...It could be used as a
    // decent filter, ie. only show users who have over a million followers, etc.
    ContributorLikes         int `json:"contributor_likes" db:"contributor_likes" bson:"contributor_likes"`
    ContributorStatusesCount int `json:"contributor_statuses_count" db:"contributor_statuses_count" bson:"contributor_statuses_count"`
    ContributorListedCount   int `json:"contributor_listed_count" db:"contributor_listed_count" bson:"contributor_listed_count"`
    ContributorFollowers     int `json:"contributor_followers" db:"contributor_followers" bson:"contributor_followers"`
    // This value is technically stateful, but can be treated as stateless because it doesn't really get revoked and change back...
    ContributorVerified int `json:"contributor_verified" db:"contributor_verified" bson:"contributor_verified"` // Twitter for sure, but I think other networks too?
    Message    string `json:"message" db:"message" bson:"message"`
    IsQuestion int    `json:"is_question" db:"is_question" bson:"is_question"`
    Category   string `json:"category" db:"category" bson:"category"`
    // Note these values are at the time of harvest. it may be confusing enough to not need these values stored...but how long can we track each message? API rate limits...
    // TODO: Maybe remove these? (think on it) also these technically don't need prefixes because we have the "network" field.
    TwitterRetweetCount  int `json:"twitter_retweet_count" db:"twitter_retweet_count" bson:"twitter_retweet_count"`
    TwitterFavoriteCount int `json:"twitter_favorite_count" db:"twitter_favorite_count" bson:"twitter_favorite_count"`
    // Instagram (and I suppose Facebook if possible)
    LikeCount int `json:"like_count" db:"like_count" bson:"like_count"`
    // Google+
    GooglePlusOnes int64 `json:"google_plus_ones" db:"google_plus_ones" bson:"google_plus_ones"`
}
Posts, status updates, comments, etc.
type SocialHarvestReport ¶
type SocialHarvestReport struct { }
Social Harvest reports are generated for several reasons and are designed specifically for the Social Harvest Dashboard tool. 1: Performance
- reports contain aggregate data, real-time queries over the potential amount of data would be silly (slow UX/dashboard)
- the dashboard's widgets can pretty much all share the same JSON response from the API now
2: Storage
- reports are smaller, and raw data can be removed, reducing database size and lowering hosting costs
3: Consistency
- this makes the available data to the front-end dashboard reliable and well defined (we know what we'll have)
Reports are always on an hourly basis. However, the current (partial) hour can be queried as if it were already built as a report. This ensures the dashboard shows data up to the minute. If a higher resolution is desired (i.e. aggregating data by the minute), custom tools would need to be built. This is of course possible since Social Harvest has all that data... but it is outside the design of the Social Harvest dashboard and requires custom code. However, the dashboard lets users look through messages, so we still need to keep (at least some of) that data around.
type SocialHarvestSchedule ¶
func NewSchedule ¶
func NewSchedule(config SocialHarvestConf) *SocialHarvestSchedule
Set up the schedule so it is accessible by others and start it
type SocialHarvestSharedLink ¶
type SocialHarvestSharedLink struct {}
Shared URLs. The "type" will tell us if it's media (video, photo, etc.) or HTML; it's about content type, not necessarily "blog" or the like. TODO: Possibly scrape those pages to get extra information - semantic data being discussed/shared for a particular territory. This would enrich things like "type"...