debugging_wordcount is an example that verifies word counts in Shakespeare and includes Beam best practices.
This example, debugging_wordcount, is the third in a series of four successively more detailed 'word count' examples. You may want to look at minimal_wordcount and wordcount first. After this example, see the windowed_wordcount pipeline, which introduces additional concepts.
Basic concepts, also in the minimal_wordcount and wordcount examples: Reading text files; counting a PCollection; executing a Pipeline both locally and using a selected runner; defining DoFns.
New Concepts:
1. Using the richer struct DoFn form and accessing optional arguments.
2. Logging using the Beam log package, even in a distributed environment.
3. Testing your Pipeline via passert.
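The three concepts above can be illustrated without the Beam SDK. The following is a minimal sketch using only the Go standard library, not the example's actual code: the names filterFn and runFilter, the use of the standard log package in place of the Beam log package, the plain slice comparison in place of passert, and the sample counts are all assumptions made for illustration. It mimics the shape of a struct-based function object that is configured through a field, prepared once in Setup, and then applied to each element through an emit callback.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
	"sort"
)

// filterFn is a hypothetical struct-style function object. The exported
// field carries configuration; the unexported field is state built in Setup.
type filterFn struct {
	Filter string
	re     *regexp.Regexp
}

// Setup compiles the pattern once, before any element is processed.
func (f *filterFn) Setup() {
	f.re = regexp.MustCompile(f.Filter)
}

// ProcessElement emits the (word, count) pair only when the word matches
// the pattern, and logs the decision either way.
func (f *filterFn) ProcessElement(word string, count int, emit func(string, int)) {
	if f.re.MatchString(word) {
		log.Printf("Matched: %v", word)
		emit(word, count)
		return
	}
	log.Printf("Did not match: %v", word)
}

// runFilter drives filterFn over in-memory counts and returns the
// formatted results in sorted order so they can be compared reliably.
func runFilter(pattern string, counts map[string]int) []string {
	fn := &filterFn{Filter: pattern}
	fn.Setup()
	var out []string
	for w, c := range counts {
		fn.ProcessElement(w, c, func(word string, n int) {
			out = append(out, fmt.Sprintf("%s: %d", word, n))
		})
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative sample data, not counts read from the real input file.
	counts := map[string]int{"Flourish": 3, "stomach": 1, "king": 5}
	got := runFilter("Flourish|stomach", counts)

	// A stand-in for a pipeline assertion: compare against expected output.
	want := []string{"Flourish: 3", "stomach: 1"}
	if fmt.Sprint(got) != fmt.Sprint(want) {
		log.Fatalf("got %v, want %v", got, want)
	}
	fmt.Println(got)
}
```

In a real pipeline the struct would be handed to the SDK rather than driven by a loop, and the comparison would be expressed as an assertion on a PCollection, so that it is checked wherever the pipeline runs.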
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --input.
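As a rough sketch of how the --runner and --input flags could be handled, the following uses only the standard flag package. The options type, the parseOptions helper, the "direct" default runner, and the placeholder default input path are assumptions for illustration; the example's real flag definitions and defaults may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two settings described above.
type options struct {
	Runner string
	Input  string
}

// parseOptions parses args with its own FlagSet so it can be called more
// than once. Both default values are placeholders, not the real defaults.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("debugging_wordcount", flag.ContinueOnError)
	var o options
	fs.StringVar(&o.Runner, "runner", "direct", "pipeline runner (assumed default)")
	fs.StringVar(&o.Input, "input", "path/to/kinglear.txt", "file to read (placeholder default)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseOptions([]string{"--runner=YOUR_SELECTED_RUNNER", "--input=my.txt"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("runner=%s input=%s\n", o.Runner, o.Input)
}
```

Omitting a flag leaves its default in place, which matches the behavior described above: the input falls back to the default text unless --input is given.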
Package main imports 11 packages. Updated 2018-04-24.