baseball-stats-db

module
v0.4.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 5, 2018 License: Apache-2.0

README

Baseball Stats DB

The goal of the project is to provide the data provided by the Baseball Databank files in databases to make using the data easier. Currently there are three officially supported databases PostgreSQL, SQLite and MongoDB. If you are wondering why MySQL isn't support read this.

Another source of data that will be a part of this database is from Retrosheet.org. The Retrosheet project has game logs and other data points that the Baseball Databank project does not. Currently the data will be in two separate databases but future releases will include a combined database.

Schemas

The schema file for all the supported databases live in the schemas directory. Each database will have a similar name convention. The All schema files follow this naming convention:

Baseball Databank

The Baseball Databank schema has the following naming convention:

(postgres|sqlite)_schema_<season>_<github-commit-hash>.sql

Where <season> the most recent season in the databases <githhub-commit-hash> is the hash of the latest Baseball Databank repository update. The SQLite schema for the most recent commit, as of 2017-11-22, would be sqlite_schema_2016_4a64a55.sql.

Retrosheet

The Retrosheet.org database schema has the following naming convention:

`(mysql|postgres|sqlite)schema_retrosheet.sql

Backups

If want the schema AND the data you'll find backup files in the backups/ directory. The backup file naming convention is:

(postgres|sqlite|mongodb)_backup_<date of backup>_<season>_<github-commit-hash>.tgz

If I had a backup file done today would have the name postgres_backup_20171122_2016_4a64a55.tgz

Utilities

In addition to the schema and backup files there are utilities that you can use to create load the databases yourself.

  • databank-dbloader is used to load Baseball Databank into a database of your choosing. For more information see the README.mdl
  • retsched-dbloader is used to load Retrosheet's schedules file into a database of your choosing. For more information see its README.md.

Licensing & Acknowledgments

Baseball Databank is a compilation of historical baseball data in a
convenient, tidy format, distributed under Open Data terms.

This work is licensed under a Creative Commons Attribution-ShareAlike
3.0 Unported License.  For details see:
http://creativecommons.org/licenses/by-sa/3.0/

Person identification and demographics data are provided by
Chadwick Baseball Bureau (http://www.chadwick-bureau.com),
from its Register of baseball personnel.

Player performance data for 1871 through 2014 is based on the
Lahman Baseball Database, version 2015-01-24, which is 
Copyright (C) 1996-2015 by Sean Lahman.

The tables Parks.csv and HomeGames.csv are based on the game logs
and park code table published by Retrosheet.
This information is available free of charge from and is copyrighted
by Retrosheet.  Interested parties may contact Retrosheet at 
http://www.retrosheet.org.
The information used here was obtained free of
charge from and is copyrighted by Retrosheet.  Interested
parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Directories

Path Synopsis
cmd
internal

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL