retrosheet

module
v0.0.0-...-d2b6d94 Latest
Published: Oct 9, 2021 License: MIT

README

RETROSHEET!

Introduction

Have you ever wondered what lets sportscasters pull up stats on all sorts of things during a game? Did you ever want to have something like that for free? And do you prefer programming in Go? TypeScript? Keep reading.

Retrosheet.org

There are various sources for baseball data that can be accessed online. A good list of the main ones is discussed at sabr.org. Some of those sources offer data that is already processed: you can run queries online, but you can't download the actual databases.

Some have APIs available. Another source that I have used is mysportsfeed.com. It is excellent and provides live game data as well as archives via an API. It isn't free, but if you want live data it's worth a look. I wrote a phone app that used this service to show game stats, using Dart and Flutter.

What I ended up wanting was one of these sources that would let me host the database myself and provide an API for using it. Not because I want to compete with the online services, but as an exercise in putting all the pieces together: a set of data (retrosheet.org), a database (MongoDB) to hold it, a GraphQL API server (written in Go), and a client for looking at the data (React/Redux/TypeScript).

One of the sources mentioned at sabr.org that fits the bill is retrosheet.org. It has comprehensive data about games, personnel and teams. Their datasets go back as far as 1871! Best of all, their data is free to download and use with the proper attribution. So I started this project using their data as the basis. Just to be clear, I am not affiliated with retrosheet.org; I am just using their data and putting a wrapper around it.

I suggest you peruse retrosheet.org to get a feel for what it provides. The folks at Retrosheet are all volunteers and they put a lot of work into making this data available. I thank them for that!

     The information used here was obtained free of
     charge from and is copyrighted by Retrosheet.  Interested
     parties may contact Retrosheet at "www.retrosheet.org". 
The Data

The data from Retrosheet I am using includes a database of all professional games from 1871 to the most recent completed season. That's the big one. It also has a couple of tables for personnel and teams. I am using those three to begin with. Retrosheet also has play-by-play data for pro teams from 1920 to 2020. To keep things simple at first I am skipping that one, but the methodology I am using to incorporate the data applies to it too.

Here are the files you should download if you want to follow along. You can find Go definitions of the data records in the directory src/jsontypes, or the TypeScript definitions in src/retro/src/query/query/queryTypes.ts.

  • Game Logs

    • The gamelogs include scores, player info and other details. They are organized by season and can be downloaded piecemeal. I downloaded the whole set, but my code doesn't require every year; just download the ones you want. A later chapter describes an option to upload the gamelog data to MongoDB Cloud. The free tier of MongoDB has a 512MB limit and the full gamelog set exceeds it, so you would have to delete about 2/3 of the csv data before proceeding.
    • Each gamelog is a csv file with one row per game. A description of the columns is here. There is one gamelog file per year.
type Game struct {
	Date                    string `json:"Date"`
	GameNumber              string `json:"GameNumber"`
	DayOfWeek               string `json:"DayOfWeek"`
	VisitorTeam             string `json:"VisitorTeam"`
	VisitorLeague           string `json:"VisitorLeague"`
	VisitorGameNumber       int    `json:"VisitorGameNumber"`
	HomeTeam                string `json:"HomeTeam"`
	HomeLeague              string `json:"HomeLeague"`
	HomeGameNumber          int    `json:"HomeGameNumber"`
	VisitorScore            int    `json:"VisitorScore"`
	HomeScore               int    `json:"HomeScore"`
	// ... the full record has more fields than shown here
}
  • Team Data
    • This data has just a few fields but is useful for queries because its values can serve as keys for finding data in the game logs. There is only one file.
    • The Team Data File is here.
interface Team {
  Abbr: string; // the key field used in the other data sets
  League: string;
  City: string;
  Nickname: string;
  FirstYear: string;
  LastYear: string;
}
  • Personnel Data
    • This data has just a few fields but again gives you keys that can be used for lookups in the game logs. There is only one file.
    • This data doesn't have a separate download; you need to copy it from the web page itself.
type Person struct {
	ID             string `json:"ID"` // the key field used in other datasets
	Last           string `json:"Last"`
	First          string `json:"First"`
	PlayerDebut    string `json:"PlayerDebut"`
	ManagerDebut   string `json:"ManagerDebut"`
	CoachDebut     string `json:"CoachDebut"`
	UmpireDebut    string `json:"UmpireDebut"`
}
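
To make the gamelog layout concrete, here is a minimal Go sketch that parses the first eleven columns of a gamelog row into the Game fields listed above. It assumes the CSV columns appear in the same order as the struct fields. The sample row and the helper names parseGame and parseGames are illustrative and are not taken from the repo, which does this conversion in JavaScript.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Game holds only the first eleven gamelog columns shown above.
type Game struct {
	Date              string `json:"Date"`
	GameNumber        string `json:"GameNumber"`
	DayOfWeek         string `json:"DayOfWeek"`
	VisitorTeam       string `json:"VisitorTeam"`
	VisitorLeague     string `json:"VisitorLeague"`
	VisitorGameNumber int    `json:"VisitorGameNumber"`
	HomeTeam          string `json:"HomeTeam"`
	HomeLeague        string `json:"HomeLeague"`
	HomeGameNumber    int    `json:"HomeGameNumber"`
	VisitorScore      int    `json:"VisitorScore"`
	HomeScore         int    `json:"HomeScore"`
}

// parseGame converts one CSV record into a Game.
// Assumption: columns are in the same order as the struct fields.
func parseGame(rec []string) (Game, error) {
	if len(rec) < 11 {
		return Game{}, fmt.Errorf("expected at least 11 columns, got %d", len(rec))
	}
	var firstErr error
	// atoi parses an integer column and remembers the first failure.
	atoi := func(i int) int {
		n, err := strconv.Atoi(strings.TrimSpace(rec[i]))
		if err != nil && firstErr == nil {
			firstErr = fmt.Errorf("column %d: %w", i+1, err)
		}
		return n
	}
	g := Game{
		Date: rec[0], GameNumber: rec[1], DayOfWeek: rec[2],
		VisitorTeam: rec[3], VisitorLeague: rec[4], VisitorGameNumber: atoi(5),
		HomeTeam: rec[6], HomeLeague: rec[7], HomeGameNumber: atoi(8),
		VisitorScore: atoi(9), HomeScore: atoi(10),
	}
	if firstErr != nil {
		return Game{}, firstErr
	}
	return g, nil
}

// parseGames reads every row of gamelog-style CSV text.
func parseGames(data string) ([]Game, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // do not enforce a fixed column count
	recs, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	games := make([]Game, 0, len(recs))
	for _, rec := range recs {
		g, err := parseGame(rec)
		if err != nil {
			return nil, err
		}
		games = append(games, g)
	}
	return games, nil
}

func main() {
	// Illustrative row only, truncated to the first eleven columns.
	const sample = `"20210401","0","Thu","TOR","AL",1,"NYA","AL",1,3,2` + "\n"
	games, err := parseGames(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, g := range games {
		fmt.Printf("%s %d at %s %d on %s\n", g.VisitorTeam, g.VisitorScore, g.HomeTeam, g.HomeScore, g.Date)
	}
}
```

Keeping the team abbreviations as plain strings is what lets them act as lookup keys against the team data described above.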
The Stack

Here's what I use for my stack:

  • Visual Studio Code
    • running on Windows, with a remote connection to either my Linux box or WSL2/Debian on Windows.
  • Dev Platform
    • All my coding and execution is on Linux. It should work on Windows or Mac with some manual work.
    • Some antivirus software on Windows seems to treat Go executables as malware and will block them, depending on your system.
    • For Windows, the Linux Bash scripts would have to be reworked.
  • Data Processing
    • JavaScript
    • Go
    • Bash
  • Database
    • MongoDB
      • the data is read-only
      • the Retrosheet.org data is organized in .csv files with some overlap between the different datasets
      • the data seemed suited for a NoSQL document store. A relational database would have required a lot of refactoring of the data.
      • see the src/api/db.go file for the database code.
  • GraphQL Server
    • graph-gophers/graphql-go
      • this seemed the most straightforward of the Go GraphQL libraries
      • it is straight GraphQL without any abstractions added on top of it
      • worked flawlessly
      • see the src/api directory for the implementations for Retrosheet
    • Go
  • Client
The Process

This is a general description. Go to the Quickstart section below for the step-by-step instructions.

The data and code are processed in a few steps:

  1. Download the dataset to a directory outside the repo
    • Bash script in retrosheet/setup/setup.sh
  2. Read the .csv files, convert to JSON and store in the data directory
    • Run bash script in retrosheet/script/init.sh
    • JavaScript code in retrosheet/src/transform
  3. Run 'retrosheet' utility to upload JSON to MongoDB
    • Run executable retrosheet/bin/retrosheet -p
    • main program in retrosheet/cmd/retrosheet.go
    • uses 'loader' library in retrosheet/src/loader
  4. Run GraphQL backend server
    • Run server retrosheet/bin/server
    • main program in retrosheet/cmd/server.go
    • uses 'query' library in retrosheet/src/query/*.go
      • database access
    • uses 'api' library in retrosheet/src/api/*.go
  5. Run web client
    • Client is a React app in retrosheet/src/retro
    • uses Redux/Redux-toolkit for data store
    • uses graphql-request
      • for client side GraphQL
    • styling is Tailwindcss
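
The CSV-to-JSON step can be pictured with a small Go sketch. The repo performs this step with the JavaScript code in src/transform; the version below only illustrates the idea for the team file, assuming six header-less columns in the same order as the team record shown earlier. The function name teamsToJSON and the sample row are hypothetical.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// Team mirrors the fields of the team record shown earlier.
type Team struct {
	Abbr      string `json:"Abbr"`
	League    string `json:"League"`
	City      string `json:"City"`
	Nickname  string `json:"Nickname"`
	FirstYear string `json:"FirstYear"`
	LastYear  string `json:"LastYear"`
}

// teamsToJSON converts header-less CSV text into a JSON array of teams.
// Assumption: every row has exactly six columns in Team field order.
func teamsToJSON(csvText string) ([]byte, error) {
	r := csv.NewReader(strings.NewReader(csvText))
	r.FieldsPerRecord = 6 // reject rows with a different column count
	recs, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	teams := make([]Team, 0, len(recs))
	for _, rec := range recs {
		teams = append(teams, Team{
			Abbr:      rec[0],
			League:    rec[1],
			City:      rec[2],
			Nickname:  rec[3],
			FirstYear: rec[4],
			LastYear:  rec[5],
		})
	}
	return json.Marshal(teams)
}

func main() {
	// Illustrative row only; real rows come from the downloaded team file.
	const sample = `"ANA","AL","Anaheim","Angels","1997","2004"` + "\n"
	out, err := teamsToJSON(sample)
	if err != nil {
		fmt.Println("transform error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Because the JSON keys match the struct tags, documents produced this way could be decoded straight back into the same Go type before being uploaded.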

Quickstart With Docker

The easiest way to get all the pieces of this system running is to use Docker. There is a complete procedure in Docker-Setup.md.

Detailed Startup (not using docker)

If you want to do a deeper dive into how the pieces work, you can use the following detailed procedure.

Prerequisites
  • Use Linux for best results
  • You must have the following installed:
    • Go version 1.16 or later (due to the deprecation of io/ioutil)
    • MongoDB set up and running
      • make a note of the connection string
      • the MongoDB server can be running on any system as long as you can connect to it
    • Node.js version 14 or later
      • not tested with earlier versions, might work
  • Linux utils
    • unzip : sudo apt install unzip
    • build-essential : sudo apt install build-essential
      • process needs 'make'
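
The Go 1.16 requirement matters because that release moved the io/ioutil helpers into os and io. As a hedged illustration (not the repo's jsontypes code), the sketch below reads a JSON file with os.ReadFile and decodes it into Person records. It assumes the file holds a JSON array; the file name, the sample record, and the helpers loadPeople and writeSample are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Person mirrors the personnel record shown earlier.
type Person struct {
	ID           string `json:"ID"`
	Last         string `json:"Last"`
	First        string `json:"First"`
	PlayerDebut  string `json:"PlayerDebut"`
	ManagerDebut string `json:"ManagerDebut"`
	CoachDebut   string `json:"CoachDebut"`
	UmpireDebut  string `json:"UmpireDebut"`
}

// loadPeople reads a JSON array of Person documents from path.
func loadPeople(path string) ([]Person, error) {
	data, err := os.ReadFile(path) // Go 1.16+ replacement for ioutil.ReadFile
	if err != nil {
		return nil, err
	}
	var people []Person
	if err := json.Unmarshal(data, &people); err != nil {
		return nil, fmt.Errorf("decode %s: %w", path, err)
	}
	return people, nil
}

// writeSample writes a one-record file into a temporary directory
// and returns its path. The record is invented for this example.
func writeSample() (string, error) {
	dir, err := os.MkdirTemp("", "retrosheet-sketch") // also Go 1.16+
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, "people.json")
	const sample = `[{"ID":"doexj001","Last":"Doe","First":"John","PlayerDebut":"04/01/2000","ManagerDebut":"","CoachDebut":"","UmpireDebut":""}]`
	return path, os.WriteFile(path, []byte(sample), 0o644)
}

func main() {
	path, err := writeSample()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	defer os.RemoveAll(filepath.Dir(path)) // clean up the temporary directory

	people, err := loadPeople(path)
	if err != nil {
		fmt.Println("load error:", err)
		return
	}
	for _, p := range people {
		fmt.Printf("%s: %s %s\n", p.ID, p.First, p.Last)
	}
}
```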
Procedure
  1. directory setup
    • in a project workspace, create a directory named baseball
    • cd into baseball
    • clone the retrosheet repo into this directory
    • configure retrosheet/script/env.sh and source it to set the proper environment variables
  2. download the data
    • create a directory named data in the baseball directory
      • this puts data next to retrosheet, one level up from inside the repo
    • cd into the data directory
    • copy the contents of the retrosheet/setup directory into data
      • cp -r ../retrosheet/setup/. .
    • run the script setup.sh
      • this will download the required data files and put them in the proper directory structure
  3. run the transform script
    • cd back to retrosheet
    • run script/init.sh
      • runs 'npm install' for the transform programs, then runs them
      • reads all the csv files and converts them to JSON
      • processing the gamelogs may take a few minutes
      • the script performs a few simple tests to check that the data is there after processing
  4. build the executables
    • verify you are in the top level retrosheet directory
    • if 'make' is not installed on your Linux system, run sudo apt install build-essential
    • if you are on Windows, you would need to look at the Makefile and perform the operations manually
    • if you haven't already, you will also need the Go tools 'go vet' (shipped with the Go toolchain) and 'staticcheck'
      • go install honnef.co/go/tools/cmd/staticcheck@latest
    • run 'make'
    • this builds bin/retrosheet and bin/server
  5. start MongoDB
    • note the IP:PORT it runs on
    • be sure to have that set in the RETROSHEET_MONGO environment variable
    • MongoDB can run anywhere as long as your server can connect to it
  6. populate the database
    • verify you are in the top level retrosheet directory
    • run bin/retrosheet -p
    • the personnel and teams upload is pretty quick; uploading the game data can take a while
  7. start the GraphQL server
    • verify you are in the top level retrosheet directory
    • run bin/server
      • note the URLs it prints out
  8. run the client
    • verify you are in the top level retrosheet directory
    • the client needs to know the URL of the GraphQL server. That value is set by the call to the setServerURL function in the file client/retro/src/index.tsx. Set the value in that call as needed.
    • the API server must be running
    • the MongoDB database must be running
    • cd into client/retro
    • npm install (if it's the first time you are running this)
    • npm start (to run the dev server)
  9. connect to the client with your preferred browser (except IE)
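
After the "start the GraphQL server" step, a quick way to confirm the server answers is to POST a query to it without going through the React client. The sketch below uses the common GraphQL-over-HTTP convention (a JSON body with a "query" key) and the standard { __typename } meta-field, so it does not depend on the project's schema. Whether bin/server accepts exactly this request shape, and at which URL path, is an assumption; use the URL the server prints. A stub server stands in for bin/server so the example runs on its own, and the names postGraphQL and newStubServer are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// postGraphQL sends a query as a JSON POST body and returns the raw reply.
func postGraphQL(endpoint, query string) (string, error) {
	body, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("server returned %s: %s", resp.Status, data)
	}
	return string(data), nil
}

// newStubServer stands in for bin/server so the sketch is self-contained.
// It only checks that a query was sent and returns a canned reply.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Query string `json:"query"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Query == "" {
			http.Error(w, "missing query", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"data":{"__typename":"Query"}}`)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	// Against the real server, pass the URL it prints at startup
	// instead of srv.URL.
	reply, err := postGraphQL(srv.URL, "{ __typename }")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(reply)
}
```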
Server Side Data

A file containing prerendered data about teams and parks is stored in retrosheet/client/retro/src/SSrData.ts. This data is used for server side rendering of two relatively small datasets, which avoids having to make queries at runtime. A default version of this file is included when you check out the repo. It is based on the data available as of this writing (September 2021). If newer data is published, the file can be rebuilt with the following process.

  • cd into retrosheet/client/retro/ssr
  • the API server must be running
  • the MongoDB database must be running
  • cd into client/retro
  • execute "make clean && make"
Environment Variables

The scripts and executables depend on the following environment variables. Set them to the appropriate values for your system. An example is in retrosheet/script/env.sh.


# base path of RETROSHEET project code
export RETROSHEET="${HOME}/<...>/baseball/retrosheet"

# url to mongodb server
# local
export RETROSHEET_MONGO="mongodb://<ip:port>"
# example for mongodb atlas
# export RETROSHEET_MONGO="mongodb+srv://<username>:<password>@cluster0.<cluster id>.mongodb.net/<database name>?retryWrites=true&w=majority"

# base path to retrosheet data files
export RETROSHEET_DATA="${HOME}/<...>/baseball/data"

# graphql server ip:port
export RETROSHEET_SERVER="<ip:port>"
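
A small check like the following can catch a missing variable before any script or executable runs. It is a sketch, not part of the repo: the variable names come from the list above, while missingVars and its lookup parameter are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars lists the environment variables named above.
var requiredVars = []string{
	"RETROSHEET",
	"RETROSHEET_MONGO",
	"RETROSHEET_DATA",
	"RETROSHEET_SERVER",
}

// missingVars returns the required names that lookup reports as unset
// or empty. Passing the lookup function in keeps the check easy to test.
func missingVars(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range requiredVars {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingVars(os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("all required environment variables are set")
}
```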

  baseball
  ├── data (where Retrosheet data is downloaded)
  │   ├── other data files
  │   └── gamelogs (unzipped to here)
  └── retrosheet
      ├── bin (output binaries)
      │   ├── retrosheet
      │   └── server
      ├── client (React client)
      │   └── retro
      ├── cmd (top level source)
      │   ├── retrosheet
      │   └── server
      ├── script (scripts and programs for downloading retrosheet data)
      ├── setup (scripts for setup)
      ├── src (libraries source)
      │   ├── api (API server and database access)
      │   ├── jsontypes (type defs for the retrosheet data)
      │   ├── loader (utilities for uploading to MongoDB)
      │   ├── query (GraphQL query utilities for the retrosheet data types)
      │   ├── transform (JavaScript code for transforming downloaded files into JSON)
      │   └── util (misc utilities)
      └── test (basic test functions)
          ├── js (tests for db access - JavaScript)
          └── query (tests for API access - TypeScript)

Directories

Path Synopsis
cmd
src
api
Package api provides the mapping of the JSON documents to GraphQL.
jsontypes
Package jsontypes provides support for reading the JSON documents produced by the csv-transform module, converting them to Go types, and making them available to the loader package.
loader
Package loader uses the functions and types from jsontypes to upload the json files to a mongodb server
query
Package query contains the functions that access the database directly.
util
Package util contains utility functions, including log helpers.
test
