strinterp

package module
v0.0.0-...-f875426
Published: Apr 23, 2016 License: MIT Imports: 6 Imported by: 0

README

strinterp - "String/Stream Interpolation"

Morally correct string/stream interpolation.

go get github.com/thejerf/strinterp

This code is posted in support of a blog post about why we continue to write insecure software, which I recommend reading in order to understand the design and the purpose of the design. At the moment, I wouldn't particularly propose that you use it in real code; I don't. After all, this represents ~20 hours of screwing around rather than something I'd ship directly. However, it does do what it does, so if you are moved to use it, I won't object. As the LICENSE says, if it breaks you get to keep both pieces. If enough pull requests come in to turn this into a real library I won't complain.

But even as just some musings meant for support of a blog post, I could not stand publishing this without the jerf-standard full godoc, including examples, usage, and everything else you might otherwise expect this README.md to cover on GitHub, plus full test coverage and golint-cleanliness.

Commit Signing

Starting with the commit after d46cc8a8c2a22, I will be signing this repository with the "jerf" keybase account. If you are viewing this repository through GitHub, you should see the commits as showing as "verified" in the commit view.

(Bear in mind that due to the nature of how git commit signing works, there may be runs of unverified commits; what matters is that the top one is signed.)

Documentation

Overview

Package strinterp provides a demonstration of morally correct string interpolation.

This package was created in support of a blog post about why we are still writing insecure software in 2015: http://www.jerf.org/iri/post/2942

It's the result of about 20 hours of screwing around. I meant to keep it shorter, but I started to have too much fun.

"Morally" correct means that I intend this to demonstrate a point about API and language design, and that any actual utility is a bit coincidental.

That said, as this developed it became potentially more useful than I had initially intended, because instead of expressing all the interpolations in terms of strings, they are all expressed in terms of io.Writers. Since this library also permits inputting the strings to be interpolated in the form of io.Readers, this means that this entire library is fully capable of string interpolation in the middle of streams, not just strings. Or, if you prefer, this is a *stream* interpolator. The "str" in "strinterp" is pleasingly ambiguous.

This documentation focuses on usage; for the reasoning behind the design, consult the blog post.

Using String Interpolators

To use this package, create an interpolator object:

i := strinterp.NewInterpolator()

You can then use it to interpolate strings. The simplest case is concatenation:

concated, err := i.InterpStr("concatenated: %RAW;%RAW;", str1, str2)

See the blog post for a discussion of why this is deliberately a bit heavyweight and *designed* to call attention to the use of "RAW", rather than making such usage a simple and quiet default behavior.

The "format string", the first element of the call, has the following syntax:

  • Each interpolation begins with % and ends with an unescaped ;
  • Immediately after the % comes the formatter/encoder name
  • The name may be followed by a colon, then args for that formatter
  • That may then be followed by a pipe, and further specifications of encoders with optional arguments

You may backslash-escape any of the pipe, colon, or semicolon to pass them through as arguments to the formatter/encoder, or backslash itself to pass it through. (The formatter/encoder will of course receive the decoded bytes without the escaping backslash.) To emit a raw %, use "%%;".

Here is an example of a format string that uses a formatter, a pipe, and an encoder argument:

result, err := i.InterpStr("copy and paste: %json|base64:url;", obj)

This will result in the standard encoding/json encoding being used on the obj, then it will be converted to base64, which will use the encoding/base64 URLEncoding due to the "url" argument being passed. You can continue piping to further encoders indefinitely.
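The syntax rules above can be sketched as a small parser. This is only an illustration of the grammar as described here, not the library's actual parser: the names `stage` and `splitSpec` are hypothetical, and it assumes a second colon inside the arguments is treated as a literal. It takes the text between the leading % and the closing unescaped ; and splits it into pipeline stages.

```go
package main

import "fmt"

// stage is a hypothetical name for one step of a pipeline: a
// formatter/encoder name plus its decoded arguments.
type stage struct {
	Name string
	Args string
}

// splitSpec splits the text between "%" and the closing ";" into stages.
// A backslash makes the next byte literal, so "\|", "\:", "\;" and "\\"
// can appear inside a name or its arguments.
// Unlike the real package, this sketch does not distinguish "no colon"
// from "colon with empty arguments".
func splitSpec(spec string) []stage {
	var stages []stage
	var name, args []byte
	inArgs := false

	// emit appends a literal byte to whichever part is being read.
	emit := func(c byte) {
		if inArgs {
			args = append(args, c)
		} else {
			name = append(name, c)
		}
	}
	// flush finishes the current stage and resets the state.
	flush := func() {
		stages = append(stages, stage{Name: string(name), Args: string(args)})
		name, args, inArgs = nil, nil, false
	}

	for i := 0; i < len(spec); i++ {
		c := spec[i]
		switch {
		case c == '\\' && i+1 < len(spec):
			i++ // skip the backslash, keep the escaped byte
			emit(spec[i])
		case c == '|':
			flush()
		case c == ':' && !inArgs:
			inArgs = true
		default:
			emit(c)
		}
	}
	flush()
	return stages
}

func main() {
	for _, s := range splitSpec(`json|base64:url`) {
		fmt.Printf("name=%q args=%q\n", s.Name, s.Args)
	}
}
```

For "json|base64:url" this yields two stages: "json" with no arguments, then "base64" with the argument "url".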

There are two different kinds of interpolators you can write, formatters and encoders.

Formatters

A "formatter" is a routine that takes a Go value of some sort and converts it to some bytes to be written out via a provided io.Writer. A formatter has the function signature defined by the Formatter type, which is:

func (w io.Writer, arg interface{}, params []byte) error

When called, the function should first examine the parameters. If it doesn't like the parameters, it should return ErrUnknownArguments, properly filled out. (Note: It is important to be strict on the parameters; if they don't make perfect sense, this is your only chance to warn a user about that.) It should then take the arg and write it out to the io.Writer in whatever manner makes sense, then return either the error obtained during writing or nil if it was fully successful.

You want to write a Formatter when you are trying to convert something that isn't already a string, []byte, or io.Reader into output. A formatter therefore only makes sense as the first element of a pipeline (the "json" in the previous example), because only a formatter can handle arbitrary objects.

See the Formatter documentation below for more gritty details.
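As a rough sketch of the steps described above (check the parameters strictly, then write the value, then return any write error), here is a standalone formatter with the documented signature. `IntFormatter` and `formatToString` are hypothetical names, and `ErrUnknownArguments` is redeclared locally only so the example compiles without the package; the text of its Error message is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// ErrUnknownArguments mirrors the struct documented by the package. It is
// redeclared here so the sketch is self-contained; the message format
// below is made up for illustration.
type ErrUnknownArguments struct {
	Arguments []byte
	ErrorStr  string
}

func (ua ErrUnknownArguments) Error() string {
	return fmt.Sprintf("unknown arguments %q: %s", ua.Arguments, ua.ErrorStr)
}

// IntFormatter is a hypothetical formatter. It writes an int in decimal,
// or in hexadecimal when the parameter is "hex".
func IntFormatter(w io.Writer, arg interface{}, params []byte) error {
	// Step 1: be strict about the parameters.
	verb := "%d"
	switch string(params) {
	case "":
		// no parameters: keep decimal
	case "hex":
		verb = "%x"
	default:
		return ErrUnknownArguments{
			Arguments: params,
			ErrorStr:  `expected no argument or "hex"`,
		}
	}

	// Step 2: convert the value and write it out.
	n, ok := arg.(int)
	if !ok {
		return fmt.Errorf("IntFormatter: expected int, got %T", arg)
	}
	_, err := fmt.Fprintf(w, verb, n)

	// Step 3: return the write error, or nil on success.
	return err
}

// formatToString is a small helper for trying the formatter out.
func formatToString(arg interface{}, params string) (string, error) {
	var buf bytes.Buffer
	err := IntFormatter(&buf, arg, []byte(params))
	return buf.String(), err
}

func main() {
	out, err := formatToString(255, "hex")
	fmt.Println(out, err)
}
```

With the real package, a function like this would be registered through AddFormatter under whatever name you want to use in format strings.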

Encoders

An "encoder" is a routine that receives incoming io.Writer requests, modifies them in a suitable manner, and passes them down to the next io.Writer in the chain. In other words it takes []byte and generates further []byte from them.

You want to write an Encoder when either you want to transform input going through it (like escaping), or when you know the only valid input coming in will be in the form of a string, []byte, or io.Reader, which strinterp will automatically handle feeding down the encoder pipeline.

See the Encoder documentation below for more gritty details.

Configuring Your Interpolators

To configure your interpolator, you will need to add additional formatters and encoders to the interpolator so it is aware of them. NewInterpolator will return a bare *Interpolator with only the "%" and "RAW" primitives. NewDefaultInterpolator is also provided; it returns an interpolator preconfigured for some HTML- and JSON-type tasks. Consulting the "examples.go" file in the godoc file listing below will highlight these formatters and encoders for your cribbing convenience.

Use the AddFormatter and AddEncoder functions to add these to your interpolator to configure it.

(Since I find people often get a sort of mental block around this, remember that, for instance, even though I provide you a default JSON streamer based on the standard encoding/json library, if you have something else you prefer, you can always specify a *different* json formatter for your own usage.)

Once configured, for maximum utility I recommend putting string interpolation into your environment object. See http://www.jerf.org/iri/post/2929 .

Direct Encoder Usage

It is also possible to directly use the Encoders, as their type signature tends to imply (note how you don't have to pass them any *Interpolator or any other context). Ideally you instantiate a WriterStack around your target io.Writer and .Push encoders on top of that, as WriterStack handles some corner cases around Encoders that want to be "Close"d, then call .Finish() on the WriterStack when done, which DOES NOT close the underlying io.Writer. This is probably the maximally-performing way to do this sort of encoding in a stream.
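The corner case around "Close" is easiest to see with the standard library's base64 encoder, which buffers partial groups and only emits the tail when closed. The sketch below is a much-reduced, hypothetical stand-in for WriterStack (the names `miniStack`, `push`, `finish`, and `encodeVia` are not the package's API); it shows wrappers being closed from the outermost inward while the base writer is left open.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
)

// miniStack is a hypothetical, simplified stand-in for WriterStack.
type miniStack struct {
	top     io.Writer
	closers []io.Closer // pushed wrappers that need closing, in push order
}

func newMiniStack(base io.Writer) *miniStack {
	return &miniStack{top: base}
}

// push wraps the current top writer with whatever wrap returns.
func (s *miniStack) push(wrap func(io.Writer) io.Writer) {
	s.top = wrap(s.top)
	if c, ok := s.top.(io.Closer); ok {
		s.closers = append(s.closers, c)
	}
}

// Write sends bytes into the outermost wrapper.
func (s *miniStack) Write(b []byte) (int, error) {
	return s.top.Write(b)
}

// finish closes the pushed wrappers from the outermost inward, so each
// one flushes into the writer below it. The base writer is never closed.
func (s *miniStack) finish() error {
	for i := len(s.closers) - 1; i >= 0; i-- {
		if err := s.closers[i].Close(); err != nil {
			return err
		}
	}
	return nil
}

// encodeVia base64-encodes input through the stack and returns the result.
func encodeVia(input string) (string, error) {
	var out bytes.Buffer
	st := newMiniStack(&out)
	st.push(func(w io.Writer) io.Writer {
		return base64.NewEncoder(base64.StdEncoding, w)
	})
	if _, err := io.WriteString(st, input); err != nil {
		return "", err
	}
	// Without this call the final partial group would never be written.
	if err := st.finish(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	s, err := encodeVia("hi")
	fmt.Println(s, err)
}
```

Because the base writer stays open, the same buffer or connection can keep receiving output after the encoded section ends.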

Security Note

This is true of all string interpolators, but even more so of strinterp since it can be hooked up to arbitrary formatters and encoders. You MUST NOT feed user input as the interpolation source string. In fact I'd suggest that one could make a good case that the first parameter to strinterp should always be a constant string in the source code base, and if I were going to write a local validation routine to plug into go vet or something I'd probably add that as a rule.

Again, let me emphasize, this is NOT special to strinterp. You shouldn't let users feed into the first parameter of fmt.Sprintf, or any other such string, in any language for that matter. It's possible some are "safe" to do that in, but given the wide range of havoc done over the years by letting users control interpolation strings, I would just recommend against it unconditionally. Even when "safe" it probably isn't what you mean.
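A small illustration of the point, using only fmt from the standard library. The `account` type and both function names are made up for this example. When the caller controls the format string, a verb such as %+v can expose fields the program never meant to show; keeping the format constant and passing the user's text as data avoids that.

```go
package main

import "fmt"

// account is a made-up type with one field that should stay private.
type account struct {
	Name  string
	Token string // not meant to be shown to users
}

// renderUnsafe lets the caller choose the format string. Do not do this.
func renderUnsafe(userTemplate string, a account) string {
	return fmt.Sprintf(userTemplate, a)
}

// renderSafe keeps the format string constant and treats the user's
// text purely as data.
func renderSafe(userText string, a account) string {
	return fmt.Sprintf("%s: %s", a.Name, userText)
}

func main() {
	a := account{Name: "ann", Token: "s3cr3t"}
	fmt.Println(renderUnsafe("%+v", a)) // the token leaks into the output
	fmt.Println(renderSafe("%+v", a))   // the verb is printed as plain text
}
```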

Care should also be taken in the construction of filters. If they get much "smarter" than a for loop iterating over bytes/runes and doing "something" with them, you're starting to ask for trouble if user input passes through them. Generally the entire point of strinterp is to handle potentially untrusted input in a safe manner, so if you start "interpreting" user input you could be creating openings for attackers.

Contributing

I'm interested in pull requests for more Formatters and Encoders for the "default Interpolator", though ideally only for things in the standard library.

Constants

This section is empty.

Variables

var ErrNotGiven = errors.New("value not given")

ErrNotGiven will be passed to a Formatter as the value it is encoding, if the caller did not give enough arguments to the InterpStr or InterpWriter calls.

This is public so your formatter can check for it.

var NotGiven = NotGivenType{}

NotGiven is the token passed to the formatters to indicate the value was not given. This distinguishes the value from "nil", which may well be perfectly legitimate.
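A minimal sketch of how a formatter might tell a missing value apart from a nil one. `NotGivenType` and `NotGiven` are redeclared here to mirror the documented declarations so the example stands alone; `describe` is a hypothetical helper.

```go
package main

import "fmt"

// NotGivenType and NotGiven mirror the documented declarations; they are
// redeclared so this sketch compiles without the package.
type NotGivenType struct{}

var NotGiven = NotGivenType{}

// describe reports which of the three cases an argument falls into.
func describe(arg interface{}) string {
	switch {
	case arg == NotGiven:
		// The caller supplied fewer values than the format string needs.
		return "not given"
	case arg == nil:
		// The caller explicitly passed nil, which may be legitimate.
		return "nil"
	default:
		return fmt.Sprintf("value %v", arg)
	}
}

func main() {
	fmt.Println(describe(NotGiven))
	fmt.Println(describe(nil))
	fmt.Println(describe(7))
}
```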

Functions

func Base64

func Base64(w io.Writer, args []byte) (io.Writer, error)

Base64 defines an Encoder that implements base64 encoding.

It takes as a parameter either "std" or "url", to select between Standard or URL base64 encoding. If no parameter is given, Standard is chosen. Any other parameter results in ErrUnknownArguments.
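The difference between the two alphabets only shows up for certain byte values. The sketch below uses encoding/base64 directly; `pickEncoding` and `encodeWith` are hypothetical helpers that imitate the parameter handling described above, with a plain error standing in for ErrUnknownArguments.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// pickEncoding imitates the documented parameter handling: "std" or no
// parameter selects the standard alphabet, "url" the URL-safe one.
func pickEncoding(args []byte) (*base64.Encoding, error) {
	switch string(args) {
	case "", "std":
		return base64.StdEncoding, nil
	case "url":
		return base64.URLEncoding, nil
	}
	// The real package would return ErrUnknownArguments here.
	return nil, fmt.Errorf("unknown base64 argument %q", args)
}

// encodeWith encodes data with the encoding selected by arg.
func encodeWith(arg string, data []byte) (string, error) {
	enc, err := pickEncoding([]byte(arg))
	if err != nil {
		return "", err
	}
	return enc.EncodeToString(data), nil
}

func main() {
	data := []byte{0xfb, 0xff} // bytes chosen to hit the differing symbols
	for _, arg := range []string{"std", "url"} {
		s, err := encodeWith(arg, data)
		fmt.Println(arg, s, err)
	}
}
```

The standard alphabet produces "+" and "/" for these bytes, while the URL alphabet substitutes "-" and "_", which are safe in URLs and file names.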

func CDATA

func CDATA(inner io.Writer, args []byte) (io.Writer, error)

CDATA defines an HTML CDATA escaper, which is to say, the type of data that appears as "text" within HTML.

There's a lot of history and browser variations here. By default this is a very aggressive encoding function suitable for use in all the parts of HTML that permit "CDATA" that I know of, including attribute values. (Some browsers do not like literal newlines in attributes, considering it to terminate the tag.) However, this aggression may result in difficult-to-read HTML. If you are outputting HTML text as text (as opposed to attribute values), you can pass the argument "nocrlf" to avoid encoding CR and LF as entities.

func JSON

func JSON(w io.Writer, val interface{}, params []byte) error

JSON defines a formatter that uses the standard encoding/json package to output JSON.

Types

type Encoder

type Encoder func(io.Writer, []byte) (io.Writer, error)

An Encoder is a function that takes an "inner" io.Writer and returns an io.Writer that wraps that writer, such that calls to the returned Writer will produce the desired encoding behavior. See examples.go.

In addition to conforming to the io.Writer interface, Encoders must also never cut up Unicode characters between calls. This technically means that existing io.Writer transformers *may* not conform to this interface, though most if not all probably do by accident. Encoders thus may also count on the fact that they will not receive partial Unicode characters, which may permit stateless Encoders to be written. This is facilitated with the provided WriterFunc type as well.

type ErrUnknownArguments

type ErrUnknownArguments struct {
	Arguments []byte
	ErrorStr  string
}

ErrUnknownArguments is the error that is returned when you pass arguments to a formatter/encoder that it doesn't understand. This is public so your formatters and encoders can reuse it.

func (ErrUnknownArguments) Error

func (ua ErrUnknownArguments) Error() string

type Formatter

type Formatter func(io.Writer, interface{}, []byte) error

A Formatter is a function that takes the argument interface{} and writes the corresponding bytes to the io.Writer, based on the arguments. This is generally useful for doing non-trivial transforms on arbitrary objects, such as JSON-encoding them. If your argument is anything other than a string, []byte, or io.Reader, you'll need a Formatter.

The []byte is any additional parameters passed via the colon mechanism, containing only those extra parameters (i.e., no colon or semicolon). Interpreting them is entirely up to the function. This is nil if no colon was used. (Note this can be distinguished from blank, though that seems like a bad idea. Note also the len of a nil slice is 0, which makes that the easiest thing to check.)

interface{} is the value. If the value was not given to the interpolator at all (i.e., more format strings given than values), the value will be == NotGiven, a singleton value used for this case.

If the formatting could be completed successfully, the bytes should all be written to the io.Writer by the time the formatter returns. If the formatting could not be completed successfully, an error should be returned. In that case there are no guarantees about how much of the stream may have been written, which is fundamental to a stream-style library.

type Interpolator

type Interpolator struct {
	// contains filtered or unexported fields
}

An Interpolator represents an object that can perform string interpolation.

Interpolators are created via NewInterpolator.

Interpolators are designed to be fully initialized, with all desired format string handlers added, in a single goroutine. Once initialized, the interpolator can be freely used in any number of goroutines.

func NewDefaultInterpolator

func NewDefaultInterpolator() *Interpolator

NewDefaultInterpolator returns a new Interpolator set up with some more format strings available:

json: the JSON formatter
base64: the Base64 encoder
cdata: the HTML CDATA encoder

More things may be added in future versions of this library. The safest long-term thing to do is to use NewInterpolator and configure it yourself. But this is convenient for demos and such.

func NewInterpolator

func NewInterpolator() *Interpolator

NewInterpolator returns a new Interpolator, with only the default load of interpolation primitives.

These are:

"%": Yields a literal % without consuming an arg
"RAW": interpolates the given string, []byte, or io.Reader directly
  (if an io.Reader, io.Copy is used)
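The three input kinds accepted by "RAW" map naturally onto a type switch. The following is only a sketch of that idea, not the package's implementation; `writeRaw`, `rawAll`, and `sampleArgs` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// writeRaw copies a string, []byte, or io.Reader to w unchanged.
func writeRaw(w io.Writer, arg interface{}) error {
	switch v := arg.(type) {
	case string:
		_, err := io.WriteString(w, v)
		return err
	case []byte:
		_, err := w.Write(v)
		return err
	case io.Reader:
		// Readers are streamed rather than loaded into memory first.
		_, err := io.Copy(w, v)
		return err
	}
	return fmt.Errorf("writeRaw: unsupported type %T", arg)
}

// rawAll writes every argument in order and returns the combined output.
func rawAll(args ...interface{}) (string, error) {
	var buf bytes.Buffer
	for _, a := range args {
		if err := writeRaw(&buf, a); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

// sampleArgs returns one value of each supported kind.
func sampleArgs() []interface{} {
	return []interface{}{
		"a string, ",
		[]byte("some bytes, "),
		strings.NewReader("a reader"),
	}
}

func main() {
	out, err := rawAll(sampleArgs()...)
	fmt.Println(out, err)
}
```

Anything that is not one of these three kinds needs a formatter, which is why the sketch rejects other types.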

func (*Interpolator) AddEncoder

func (i *Interpolator) AddEncoder(format string, handler Encoder) error

AddEncoder adds an encoder type to the interpolator.

If the format string is already registered, an error will be returned.

func (*Interpolator) AddFormatter

func (i *Interpolator) AddFormatter(format string, handler Formatter) error

AddFormatter adds an interpolation format to the interpolator.

If the format string is already registered, an error will be returned.

func (*Interpolator) InterpStr

func (i *Interpolator) InterpStr(format string, args ...interface{}) (string, error)

InterpStr is a convenience function that does interpolation on a format string and returns the resulting string.

func (*Interpolator) InterpWriter

func (i *Interpolator) InterpWriter(w io.Writer, formatBytes []byte, args ...interface{}) error

InterpWriter interpolates the format []byte into the passed io.Writer.

type NotGivenType

type NotGivenType struct{}

NotGivenType uniquely identifies the token passed to formatters when an argument is not given for the formatter.

type WriterFunc

type WriterFunc func([]byte) (int, error)

WriterFunc is a function type that implements the io.Writer interface by calling itself from .Write. This allows Encoders to easily return stateless functions as their implementation. See several examples in examples.go.

func (WriterFunc) Write

func (wf WriterFunc) Write(b []byte) (int, error)

This implements the Write method of io.Writer.

type WriterStack

type WriterStack struct {
	io.Writer
	// contains filtered or unexported fields
}

A WriterStack allows us to wrap Encoders around a given io.Writer.

WriterStack solves the problem of some of the Encoders potentially wanting to be .Close()d, even if the underlying io.Writer is not closable, or you do not wish to close the underlying writer so it can be reused later. This can, for instance, be seen in the base64 encoder shipped by this library. By calling .Finish() on this object you can safely use these Encoders. .Finish() should always be called to end a WriterStack's output.

WriterStack can be used without any other strinterp functionality.

func NewWriterStack

func NewWriterStack(w io.Writer) *WriterStack

NewWriterStack returns a new *WriterStack with the argument being used as the lowest-level writer.

func (*WriterStack) Finish

func (ws *WriterStack) Finish() error

Finish will finish the WriterStack's work, which may flush intermediate encoders by calling .Close() on them. This will not close the base io.Writer.

func (*WriterStack) Push

func (ws *WriterStack) Push(enc Encoder, args []byte) error

Push wraps a writer on top of the stack, which will process any bytes and send them to any subsequent writers.

If the Push returns an error, the WriterStack is no longer valid to use.
