Command wordcount

wordcount is an example that counts words in Shakespeare and includes Beam best practices.

This example is the second in a series of four successively more detailed 'word count' examples. You may first want to take a look at minimal_wordcount. After you've looked at this example, see the debugging_wordcount pipeline for an introduction to additional concepts.

For a detailed walkthrough of this example, see

Basic concepts, also in the minimal_wordcount example: Reading text files; counting a PCollection; writing to text files

New Concepts:

1. Executing a Pipeline both locally and using the selected runner
2. Defining your own pipeline options
3. Using ParDo with static DoFns defined out-of-line
4. Building a composite transform

Concept #1: You can execute this pipeline either locally or by selecting another runner. These choices are now command-line options added by the 'beamx' package rather than hard-coded as they were in the minimal_wordcount example. The 'beamx' package also registers all included runners and filesystems as a convenience.

To change the runner, specify:


To execute this pipeline, specify a local output file (if using the 'direct' runner) or a remote file on a supported distributed file system.


The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --input.

Package main imports 10 packages. Updated 2020-11-01.