wavecarve

v0.0.0-...-0852a48
Published: Jun 12, 2023 License: BSD-3-Clause Imports: 10 Imported by: 0

README

Seam carving audio

[image: wave]

Here's the idea

What if audio could be converted to an image, seam carving could be applied to the image, and the image could then be converted back to audio? Would it sound interesting? Would it be useful somehow as a sound design tool?

TL;DR

It didn't quite work out.

Process

With some string, glue, experience with Go and some output from GPT-4, I created the wavecarve package for Go. This package provides these functions:

  • A function for reading a .wav file: ReadWavFile(filePath string) ([]int16, WAVHeader, error)
  • A function for creating and writing to a .wav file: WriteWavFile(filePath string, int16s []int16, header WAVHeader) error
  • A function for converting audio to an image (more or less, the conversion is a bit lossy, unfortunately): CreateSpectrogramFromAudio(int16s []int16) (*image.RGBA, error)
  • A function for removing the least interesting parts of the image, using the excellent github.com/esimov/caire package: CarveSeams(img *image.RGBA, newWidthInPercentage float64) (*image.RGBA, error)
  • And finally, a function for converting the image back to audio: CreateAudioFromSpectrogram(img *image.RGBA) ([]int16, error)

These functions are used by the utilities that are included in the cmd directory, which are:

  • cmd/spectrogram - a utility that reads input.wav, creates a visual representation of the audio (a spectrogram with phase information) and outputs the image to spectrogram.png.
  • cmd/recreate - a utility that reads input.wav, creates a visual representation of the audio, uses this representation to try to re-create the audio (a lossy process), and outputs output.wav.
  • cmd/carve - a utility that reads input.wav, creates a visual representation, seam carves the image to remove the least interesting parts, writes this image to carved.png and then creates audio from the image and outputs output.wav.

Note that the generated .wav files are unnecessarily large: they contain a little audio at the start followed by a lot of silence, and need to be trimmed manually after they have been generated. This might be fixed in a future version.
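Until that is fixed, the manual trimming step could be automated along these lines. This is a minimal sketch, not part of the wavecarve package: trimTrailingSilence and its threshold parameter are hypothetical names, and it assumes the unwanted tail consists of samples at or near zero.

```go
package main

import "fmt"

// trimTrailingSilence (hypothetical helper, not part of wavecarve) returns
// the samples with the trailing run of near-silent values removed. A sample
// counts as silent when its absolute value is at most threshold.
func trimTrailingSilence(samples []int16, threshold int16) []int16 {
	end := len(samples)
	for end > 0 {
		// Widen to int so that negating -32768 cannot overflow.
		s := int(samples[end-1])
		if s < 0 {
			s = -s
		}
		if s > int(threshold) {
			break
		}
		end--
	}
	return samples[:end]
}

func main() {
	samples := []int16{0, 1200, -3400, 500, 0, 3, -2, 0, 0}
	trimmed := trimTrailingSilence(samples, 10)
	fmt.Println(len(trimmed)) // 4
}
```

The trimmed slice could then be passed to WriteWavFile. Whether the size fields in the WAVHeader need to be adjusted by hand afterwards depends on how WriteWavFile treats the header, which the documentation does not say.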

Results

Here is the example audio I used as an input file:

https://github.com/xyproto/wavecarve/raw/main/wav/example.wav

Here is a spectrogram created with cmd/spectrogram:

[image: spectrogram]

Here is a seam carved version of the spectrogram, reduced to 50% of the width:

[image: carved]

And here is the re-created audio from example.wav, created with cmd/recreate and then converted to .mp3. The audio has lost quite a bit of quality in the process and is not particularly pleasing to listen to:

https://github.com/xyproto/wavecarve/raw/main/mp3/output.mp3

If the carved image is used to re-create audio instead, this is the result:

https://github.com/xyproto/wavecarve/raw/main/mp3/carved.mp3

Even though the audio is of low quality, one can get a hint of what effect seam carving has on audio. It's not particularly pleasing.

Conclusions

  • A spectrogram + phase information is not a great representation of audio, since converting audio to this format and back is a very lossy process.
  • Seam carving does not produce a particularly interesting effect on the audio, using these types of spectrograms.
  • There might be other visual representations of audio that give much better results, though.
  • It's not a great sound design tool, unless a lot of filters are applied afterwards, perhaps.

Documentation

Constants

const (
	// Assume 44100 Hz sample rate
	SampleRate = 44100

	// Assume 16-bit depth
	BitsPerSample = 16

	// Assume mono audio
	NumChannels = 1

	// Assume PCM audio
	AudioFormat = 1

	// Compute other header values
	ByteRate   = SampleRate * NumChannels * BitsPerSample / 8
	BlockAlign = NumChannels * BitsPerSample / 8

	// Size of the FFT
	FFTSize = 1024
)
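
As a worked example of the derived values, the sketch below repeats the constants locally (so it runs without importing the package) and prints what the expressions evaluate to. The frameMillis helper is hypothetical and only illustrates how much audio a single window of FFTSize samples covers at the assumed sample rate; it says nothing about how the package overlaps its windows.

```go
package main

import "fmt"

// Local copies of the documented constants, repeated here so that the
// example is self-contained.
const (
	SampleRate    = 44100
	BitsPerSample = 16
	NumChannels   = 1
	FFTSize       = 1024

	ByteRate   = SampleRate * NumChannels * BitsPerSample / 8
	BlockAlign = NumChannels * BitsPerSample / 8
)

// frameMillis (hypothetical helper) returns the duration in milliseconds
// of one block of FFTSize samples at SampleRate.
func frameMillis() float64 {
	return float64(FFTSize) / float64(SampleRate) * 1000
}

func main() {
	fmt.Println(ByteRate)                  // 88200 bytes per second
	fmt.Println(BlockAlign)                // 2 bytes per sample frame
	fmt.Printf("%.1f ms\n", frameMillis()) // 23.2 ms
}
```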

Variables

This section is empty.

Functions

func CarveSeams

func CarveSeams(img *image.RGBA, newWidthInPercentage float64) (*image.RGBA, error)

CarveSeams removes seams from the image to reduce its width by the given percentage.

func CreateAudioFromSpectrogram

func CreateAudioFromSpectrogram(img *image.RGBA) ([]int16, error)

CreateAudioFromSpectrogram creates audio from a spectrogram and extracts the length of the audio data from the image.

func CreateSpectrogramFromAudio

func CreateSpectrogramFromAudio(int16s []int16) (*image.RGBA, error)

CreateSpectrogramFromAudio creates a spectrogram from an []int16 and encodes the length of the audio data into the image.

func WriteWavFile

func WriteWavFile(filePath string, int16s []int16, header WAVHeader) error

WriteWavFile writes the given samples and header to a .wav file at filePath.

Types

type WAVHeader

type WAVHeader struct {
	ChunkID       [4]byte
	ChunkSize     uint32
	Format        [4]byte
	Subchunk1ID   [4]byte
	Subchunk1Size uint32
	AudioFormat   uint16
	NumChannels   uint16
	SampleRate    uint32
	ByteRate      uint32
	BlockAlign    uint16
	BitsPerSample uint16
	Subchunk2ID   [4]byte
	Subchunk2Size uint32
}

WAVHeader represents the header of a WAV file.
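
The fields correspond to the canonical 44-byte RIFF/WAVE header. The sketch below copies the struct locally and fills it in for mono 16-bit PCM, then serializes it with encoding/binary to confirm the size. The newPCMHeader and encodeHeader helpers are hypothetical and are not part of the package; how ReadWavFile and WriteWavFile handle the header internally is not documented.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// WAVHeader is a local copy of the documented struct.
type WAVHeader struct {
	ChunkID       [4]byte
	ChunkSize     uint32
	Format        [4]byte
	Subchunk1ID   [4]byte
	Subchunk1Size uint32
	AudioFormat   uint16
	NumChannels   uint16
	SampleRate    uint32
	ByteRate      uint32
	BlockAlign    uint16
	BitsPerSample uint16
	Subchunk2ID   [4]byte
	Subchunk2Size uint32
}

// newPCMHeader (hypothetical) builds a header for mono 16-bit PCM audio.
func newPCMHeader(numSamples int, sampleRate uint32) WAVHeader {
	const channels, bits = 1, 16
	dataSize := uint32(numSamples) * channels * bits / 8
	return WAVHeader{
		ChunkID:       [4]byte{'R', 'I', 'F', 'F'},
		ChunkSize:     36 + dataSize, // header bytes after this field plus the data
		Format:        [4]byte{'W', 'A', 'V', 'E'},
		Subchunk1ID:   [4]byte{'f', 'm', 't', ' '},
		Subchunk1Size: 16, // size of the fmt chunk for PCM
		AudioFormat:   1,  // PCM
		NumChannels:   channels,
		SampleRate:    sampleRate,
		ByteRate:      sampleRate * channels * bits / 8,
		BlockAlign:    channels * bits / 8,
		BitsPerSample: bits,
		Subchunk2ID:   [4]byte{'d', 'a', 't', 'a'},
		Subchunk2Size: dataSize,
	}
}

// encodeHeader (hypothetical) serializes the header as little-endian bytes,
// the byte order used by WAV files.
func encodeHeader(h WAVHeader) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	h := newPCMHeader(44100, 44100) // one second of audio
	b, err := encodeHeader(h)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(len(b), h.Subchunk2Size) // 44 88200
}
```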

func ReadWavFile

func ReadWavFile(filePath string) ([]int16, WAVHeader, error)

ReadWavFile reads the .wav file at filePath and returns its samples and header.

Directories

Path Synopsis
cmd
