lobbyist-lookup

command module
v0.0.0-...-e732a77
Published: Aug 1, 2017 License: MIT Imports: 19 Imported by: 0

README

Unified Congress Lobbyist Disclosure Scraper and Lookup

  • Record Retrieval

    - Latest current year House lobby disclosure filings available on [House.gov](http://disclosures.house.gov/).
    • Using the web browser based search may result in the error

      Cannot download more than 2000 records. Please refine search.

    • The past filings download link ultimately leads here, where filings can be downloaded in XML format.

      • The House.gov site serves the archive files through an input element that POSTs to an ASP page. Because the site runs on ASP, ViewState and EventValidation are enforced to prevent CSRF. This makes programmatic POST requests more complicated: a POST is only valid if it carries valid ViewState and EventValidation values.
        • This Go program first sends a GET request to the ASP server and parses the hidden ViewState and EventValidation input values out of the response. With those values it constructs a valid POST request, to which the ASP server replies with a file stream. The file stream is written to a specified file.
          • houseRetrieve.go uses the code.google.com/p/go.net/html package to tokenize the HTML.
          • houseRetrieve.go contains the archive downloading portion of the code and can be repurposed to send and receive requests with other ASP sites that use CSRF protection.
    • XXXX Registration archives contain new registrations for that year. XXXX N Quarter archives contain the filings due for quarter N.

      • This program will download all archives for the current year.
    • Uses the predictable file naming convention for Senate filings on Senate.gov.

      • The Senate provides XML files with up to 1000 filings per file.
        • The XML files are in UTF-16, and Go expects UTF-8.
          • Used code.google.com/p/go-charset/charset to convert UTF-16 to UTF-8.
    • Interesting Info

      • House has ~90k filings versus Senate's ~130k filings.
      • Each House filing is in its own XML file, whereas Senate filings come up to 1000 per file.
      • Funnily enough, Senate filings therefore parse faster.
    • Retrieves lobbyist filings every day.

      • Heroku cycles dynos every 24 hrs, which also refreshes the list ;)
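
The UTF-16 handling noted for the Senate files can be sketched with the standard library alone. This is a minimal sketch, not the project's code: the README names go-charset for the conversion, while this example swaps in `unicode/utf16`. The helper names (`utf16ToUTF8`, `encodeUTF16LE`), the little-endian default when no byte order mark is present, and the `Filings`/`Filing`/`ID` element names in the demo are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"unicode/utf16"
)

// utf16ToUTF8 decodes UTF-16 bytes into a UTF-8 Go string. A leading byte
// order mark selects the endianness and is dropped; without one,
// little-endian is assumed (an assumption of this sketch).
func utf16ToUTF8(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", fmt.Errorf("odd byte count %d for UTF-16 input", len(b))
	}
	var order binary.ByteOrder = binary.LittleEndian
	switch {
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		b = b[2:]
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		order, b = binary.BigEndian, b[2:]
	}
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = order.Uint16(b[2*i:])
	}
	return string(utf16.Decode(units)), nil
}

// encodeUTF16LE is a hypothetical helper that builds sample input for the demo.
func encodeUTF16LE(s string) []byte {
	raw := []byte{0xFF, 0xFE} // little-endian byte order mark
	for _, u := range utf16.Encode([]rune(s)) {
		raw = append(raw, byte(u), byte(u>>8))
	}
	return raw
}

func main() {
	// Made-up document; the element names are not the Senate schema.
	src := `<?xml version="1.0" encoding="UTF-16"?><Filings><Filing ID="A1"/><Filing ID="B2"/></Filings>`
	text, err := utf16ToUTF8(encodeUTF16LE(src))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	dec := xml.NewDecoder(strings.NewReader(text))
	// The declaration still says UTF-16, so encoding/xml asks for a charset
	// reader. The text is already UTF-8, so the input is passed through as-is.
	dec.CharsetReader = func(label string, in io.Reader) (io.Reader, error) { return in, nil }
	var doc struct {
		Filings []struct {
			ID string `xml:"ID,attr"`
		} `xml:"Filing"`
	}
	if err := dec.Decode(&doc); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(doc.Filings), "filings parsed")
}
```

The pass-through `CharsetReader` matters: without it, `encoding/xml` rejects a document whose declaration names an encoding other than UTF-8, even when the bytes have already been converted.
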

POST parameters sent to the House.gov archive page:

Parameter            Comment
__VIEWSTATE          extracted token
__EVENTVALIDATION    extracted token
selFilesXML          requested archive filename from the page's HTML input element
btnDownloadXML       needed to tell ASP to serve the file?

Documentation

There is no documentation for this package.

Directories

Path Synopsis
Godeps
_workspace/src/code.google.com/p/go-charset/charset
The charset package implements translation between character sets.
_workspace/src/code.google.com/p/go-charset/charset/iconv
The iconv package provides an interface to the GNU iconv character set conversion library (see http://www.gnu.org/software/libiconv/).
_workspace/src/code.google.com/p/go-charset/data
The data package embeds all the charset data files as Go data.
_workspace/src/code.google.com/p/go.net/html
Package html implements an HTML5-compliant tokenizer and parser.
_workspace/src/code.google.com/p/go.net/html/atom
Package atom provides integer codes (also known as atoms) for a fixed set of frequently occurring HTML strings: tag names and attribute keys such as "p" and "id".
_workspace/src/code.google.com/p/go.net/html/charset
Package charset provides common text encodings for HTML documents.
godepback
_workspace/src/code.google.com/p/go.net/html
Package html implements an HTML5-compliant tokenizer and parser.
_workspace/src/code.google.com/p/go.net/html/atom
Package atom provides integer codes (also known as atoms) for a fixed set of frequently occurring HTML strings: tag names and attribute keys such as "p" and "id".
_workspace/src/code.google.com/p/go.net/html/charset
Package charset provides common text encodings for HTML documents.
