osmpbf

v0.8.0
Published: Jan 8, 2024 License: MIT Imports: 14 Imported by: 21

README

osm/osmpbf

Package osmpbf provides a scanner for decoding large OSM PBF files. Such files are typically available from planet.osm.org or Geofabrik Download.

Example:

file, err := os.Open("./delaware-latest.osm.pbf")
if err != nil {
	panic(err)
}
defer file.Close()

// The third parameter is the number of parallel decoders to use.
scanner := osmpbf.New(context.Background(), file, runtime.GOMAXPROCS(-1))
defer scanner.Close()

for scanner.Scan() {
	switch o := scanner.Object().(type) {
	case *osm.Node:
		_ = o // o is an *osm.Node
	case *osm.Way:
		_ = o // o is an *osm.Way
	case *osm.Relation:
		_ = o // o is an *osm.Relation
	}
}

if err := scanner.Err(); err != nil {
	panic(err)
}

Note: Scanners are not safe for parallel use. One should feed the objects into a channel and have workers read from that.

Skipping Types

Sometimes only ways or relations are needed. In this case, reading and creating the other objects can be skipped completely. After creating the Scanner, set the appropriate fields to true.

type Scanner struct {
	// Skip element types that are not needed. The data is skipped
	// at the encoded protobuf level, but each block still
	// needs to be decompressed.
	SkipNodes     bool
	SkipWays      bool
	SkipRelations bool

	// contains filtered or unexported fields
}

Filtering Elements

The above skips all elements of a type. To filter based on the element's tags or other values, use the filter functions. These filter functions are called in parallel and not in a predefined order. This can be a performant way to filter for elements with a certain set of tags.

type Scanner struct {
	// If a Filter function returns false, the element will be skipped
	// at the decoding level. The functions should be fast because they
	// block the decoder; there are `procs` concurrent decoders.
	// Elements can be stored if the function returns true. Memory is
	// reused if the filter returns false.
	FilterNode     func(*osm.Node) bool
	FilterWay      func(*osm.Way) bool
	FilterRelation func(*osm.Relation) bool

	// contains filtered or unexported fields
}

OSM PBF files with node locations on ways

This package supports reading OSM PBF files where the ways have been annotated with the coordinates of each node. Such files can be generated using osmium, with the add-locations-to-ways subcommand. This feature makes it possible to work with the ways and their geometries without having to keep all node locations in some index (which takes work and memory resources).

Coordinates are stored in the Lat and Lon fields of each WayNode. There is no need to specify an explicit option; when the node locations are present on the ways, they are loaded automatically. For more info about the OSM PBF format extension, see the original blog post.

Using cgo/czlib for decompression

OSM PBF files are a set of blocks that are zlib compressed. When using the pure Go implementation, decompression can account for about 1/3 of the read time. When cgo is enabled, the package will use czlib.

$ CGO_ENABLED=0 go test -bench . > disabled.txt
$ CGO_ENABLED=1 go test -bench . > enabled.txt
$ benchcmp disabled.txt enabled.txt
benchmark                        old ns/op     new ns/op     delta
BenchmarkLondon-12               312294630     229927205     -26.37%
BenchmarkLondon_nodes-12         246562457     160021768     -35.10%
BenchmarkLondon_ways-12          216803544     134747327     -37.85%
BenchmarkLondon_relations-12     158722633     80560144      -49.24%

benchmark                        old allocs     new allocs     delta
BenchmarkLondon-12               2469128        2416804        -2.12%
BenchmarkLondon_nodes-12         1056166        1003850        -4.95%
BenchmarkLondon_ways-12          1845032        1792716        -2.84%
BenchmarkLondon_relations-12     509090         456772         -10.28%

benchmark                        old bytes     new bytes     delta
BenchmarkLondon-12               963734544     954877896     -0.92%
BenchmarkLondon_nodes-12         658337435     649482060     -1.35%
BenchmarkLondon_ways-12          441674734     432819378     -2.00%
BenchmarkLondon_relations-12     187941609     179086389     -4.71%

Documentation

Overview

Example (Stats)

ExampleStats demonstrates how to read a full file and gather some stats. This is similar to `osmconvert --out-statistics`.

package main

import (
	"context"
	"fmt"
	"math"
	"os"
	"time"

	"github.com/paulmach/osm"
	"github.com/paulmach/osm/osmpbf"
)

// ExampleStats demonstrates how to read a full file and gather some stats.
// This is similar to `osmconvert --out-statistics`
func main() {
	f, err := os.Open("../testdata/delaware-latest.osm.pbf")
	if err != nil {
		fmt.Printf("could not open file: %v", err)
		os.Exit(1)
	}
	defer f.Close()

	nodes, ways, relations := 0, 0, 0
	stats := newElementStats()

	minLat, maxLat := math.MaxFloat64, -math.MaxFloat64
	minLon, maxLon := math.MaxFloat64, -math.MaxFloat64

	minTS, maxTS := time.Date(2100, 1, 1, 0, 0, 0, 0, time.UTC), time.Time{}

	var (
		maxNodeRefs   int
		maxNodeRefsID osm.WayID
	)

	var (
		maxRelRefs   int
		maxRelRefsID osm.RelationID
	)

	scanner := osmpbf.New(context.Background(), f, 3)
	defer scanner.Close()

	for scanner.Scan() {
		var ts time.Time

		switch e := scanner.Object().(type) {
		case *osm.Node:
			nodes++
			ts = e.Timestamp
			stats.Add(e.ElementID(), e.Tags)

			if e.Lat > maxLat {
				maxLat = e.Lat
			}

			if e.Lat < minLat {
				minLat = e.Lat
			}

			if e.Lon > maxLon {
				maxLon = e.Lon
			}

			if e.Lon < minLon {
				minLon = e.Lon
			}
		case *osm.Way:
			ways++
			ts = e.Timestamp
			stats.Add(e.ElementID(), e.Tags)

			if l := len(e.Nodes); l > maxNodeRefs {
				maxNodeRefs = l
				maxNodeRefsID = e.ID
			}
		case *osm.Relation:
			relations++
			ts = e.Timestamp
			stats.Add(e.ElementID(), e.Tags)

			if l := len(e.Members); l > maxRelRefs {
				maxRelRefs = l
				maxRelRefsID = e.ID
			}
		}

		if ts.After(maxTS) {
			maxTS = ts
		}

		if ts.Before(minTS) {
			minTS = ts
		}
	}

	if err := scanner.Err(); err != nil {
		fmt.Printf("scanner returned error: %v", err)
		os.Exit(1)
	}

	fmt.Println("timestamp min:", minTS.Format(time.RFC3339))
	fmt.Println("timestamp max:", maxTS.Format(time.RFC3339))
	fmt.Printf("lon min: %0.7f\n", minLon)
	fmt.Printf("lon max: %0.7f\n", maxLon)
	fmt.Printf("lat min: %0.7f\n", minLat)
	fmt.Printf("lat max: %0.7f\n", maxLat)
	fmt.Println("nodes:", nodes)
	fmt.Println("ways:", ways)
	fmt.Println("relations:", relations)
	fmt.Println("version max:", stats.MaxVersion)
	fmt.Println("node id min:", stats.Ranges[osm.TypeNode].Min)
	fmt.Println("node id max:", stats.Ranges[osm.TypeNode].Max)
	fmt.Println("way id min:", stats.Ranges[osm.TypeWay].Min)
	fmt.Println("way id max:", stats.Ranges[osm.TypeWay].Max)
	fmt.Println("relation id min:", stats.Ranges[osm.TypeRelation].Min)
	fmt.Println("relation id max:", stats.Ranges[osm.TypeRelation].Max)
	fmt.Println("keyval pairs max:", stats.MaxTags)
	fmt.Println("keyval pairs max object:", stats.MaxTagsID.Type(), stats.MaxTagsID.Ref())
	fmt.Println("noderefs max:", maxNodeRefs)
	fmt.Println("noderefs max object: way", maxNodeRefsID)
	fmt.Println("relrefs max:", maxRelRefs)
	fmt.Println("relrefs max object: relation", maxRelRefsID)

}

// elementStats is a shared bit of code to accumulate stats from the element ids.
type elementStats struct {
	Ranges     map[osm.Type]*idRange
	MaxVersion int

	MaxTags   int
	MaxTagsID osm.ElementID
}

type idRange struct {
	Min, Max int64
}

func newElementStats() *elementStats {
	return &elementStats{
		Ranges: map[osm.Type]*idRange{
			osm.TypeNode:     {Min: math.MaxInt64},
			osm.TypeWay:      {Min: math.MaxInt64},
			osm.TypeRelation: {Min: math.MaxInt64},
		},
	}
}

func (s *elementStats) Add(id osm.ElementID, tags osm.Tags) {
	s.Ranges[id.Type()].Add(id.Ref())

	if v := id.Version(); v > s.MaxVersion {
		s.MaxVersion = v
	}

	if l := len(tags); l > s.MaxTags {
		s.MaxTags = l
		s.MaxTagsID = id
	}
}

func (r *idRange) Add(ref int64) {
	if ref > r.Max {
		r.Max = ref
	}

	if ref < r.Min {
		r.Min = ref
	}
}
Output:

timestamp min: 2007-10-16T15:59:24Z
timestamp max: 2016-08-10T17:32:02Z
lon min: -76.1748935
lon max: -74.4929376
lat min: 38.0273717
lat max: 39.9688859
nodes: 723870
ways: 73144
relations: 1644
version max: 421
node id min: 75385503
node id max: 4343778904
way id min: 9650669
way id max: 436488690
relation id min: 82010
relation id max: 6462005
keyval pairs max: 276
keyval pairs max object: relation 148838
noderefs max: 1811
noderefs max object: way 318739264
relrefs max: 7177
relrefs max object: relation 4799100

Types

type Header struct {
	Bounds               *osm.Bounds
	RequiredFeatures     []string
	OptionalFeatures     []string
	WritingProgram       string
	Source               string
	ReplicationTimestamp time.Time
	ReplicationSeqNum    uint64
	ReplicationBaseURL   string
}

Header contains the contents of the header in the pbf file.

type Scanner

type Scanner struct {
	// Skip element types that are not needed. The data is skipped
	// at the encoded protobuf level, but each block still needs to be decompressed.
	SkipNodes     bool
	SkipWays      bool
	SkipRelations bool

	// If a Filter function returns false, the element will be skipped
	// at the decoding level. The functions should be fast because they
	// block the decoder; there are `procs` concurrent decoders.
	// Elements can be stored if the function returns true. Memory is
	// reused if the filter returns false.
	FilterNode     func(*osm.Node) bool
	FilterWay      func(*osm.Way) bool
	FilterRelation func(*osm.Relation) bool
	// contains filtered or unexported fields
}

Scanner provides a convenient interface for reading a stream of OSM data from a file or URL. Successive calls to the Scan method will step through the data.

Scanning stops unrecoverably at EOF, the first I/O error, the first decoding error, or when the context is cancelled. When a scan stops, the reader may have advanced arbitrarily far past the last token.

The Scanner API is based on bufio.Scanner https://golang.org/pkg/bufio/#Scanner

func New

func New(ctx context.Context, r io.Reader, procs int) *Scanner

New returns a new Scanner to read from r. procs indicates the amount of parallelism to use when reading blocks, which offloads the unzipping and decoding to multiple CPUs.

func (*Scanner) Close

func (s *Scanner) Close() error

Close cleans up all the reading goroutines; it does not close the underlying reader.

func (*Scanner) Err

func (s *Scanner) Err() error

Err returns the first non-EOF error that was encountered by the Scanner.

func (*Scanner) FullyScannedBytes

func (s *Scanner) FullyScannedBytes() int64

FullyScannedBytes returns the number of bytes that have been read and fully scanned. OSM protobuf files typically contain data blocks of up to 8000 elements each. The returned value counts the bytes of the blocks that have been fully scanned.

A user can use this number to seek forward in a file and begin reading mid-data. Note that while elements are usually sorted by Type, ID, Version in OSM protobuf files, versions of a given element may span blocks.

func (*Scanner) Header

func (s *Scanner) Header() (*Header, error)

Header returns the pbf file header with interesting information about how it was created.

func (*Scanner) Object

func (s *Scanner) Object() osm.Object

Object returns the most recent token generated by a call to Scan as a new osm.Object. Currently osm.pbf files only contain nodes, ways and relations. This method returns an osm.Object to match the osm.Scanner interface, which allows this Scanner to share an interface with osmxml.Scanner.

func (*Scanner) PreviousFullyScannedBytes added in v0.1.0

func (s *Scanner) PreviousFullyScannedBytes() int64

PreviousFullyScannedBytes returns the previous value of FullyScannedBytes. This is useful because it is not always clear whether a feature spans a block boundary. For example, if one quits after finding the first relation, upon restarting there is no way of knowing whether that relation is complete, so one would skip it. But if that relation is the first relation in the file, a full relation would be skipped.

func (*Scanner) Scan

func (s *Scanner) Scan() bool

Scan advances the Scanner to the next element, which will then be available through the Object method. It returns false when the scan stops, either by reaching the end of the input, an I/O error, a decoding error, or the context being cancelled. After Scan returns false, the Err method will return any error that occurred during scanning, except that if it was io.EOF, Err will return nil.
