xtractr

package module
v0.0.0-...-dde40d8
Published: Sep 11, 2023 License: MIT Imports: 15 Imported by: 0

README

xtractr with zip bomb protection :)

Here you can read a bit about zip bombs:

Go Library for Queuing and Extracting ZIP, RAR, GZ, BZ2, TAR, TGZ, TBZ2, 7Z, ISO files. Can also be used ad-hoc for direct decompression and extraction. See docs.

  • GoDoc
  • Works on Linux, Windows, FreeBSD and macOS without Cgo.
  • Supports 32 and 64 bit architectures.
  • Decrypts RAR and 7-Zip archives with passwords.

Interface

This library provides a queue, and a common interface to extract files. It does not do the heavy lifting, and relies on these libraries to extract files:

Zip, Gzip, Tar and Bzip are all handled by the standard Go library.

Examples

Example 1 - Queue

package main

import (
	"log"
	"os"
	"strings"

	"golift.io/xtractr"
)

// Logger satisfies the xtractr.Logger interface.
type Logger struct {
	xtractr *log.Logger
	debug   *log.Logger
	info    *log.Logger
}

// Printf satisfies the xtractr.Logger interface.
func (l *Logger) Printf(msg string, v ...interface{}) {
	l.xtractr.Printf(msg, v...)
}

// Debugf satisfies the xtractr.Logger interface.
func (l *Logger) Debugf(msg string, v ...interface{}) {
	l.debug.Printf(msg, v...)
}

// Infof prints an info line.
func (l *Logger) Infof(msg string, v ...interface{}) {
	l.info.Printf(msg, v...)
}

func main() {
	log := &Logger{
		xtractr: log.New(os.Stdout, "[XTRACTR] ", 0),
		debug:   log.New(os.Stdout, "[DEBUG] ", 0),
		info:    log.New(os.Stdout, "[INFO] ", 0),
	}
	q := xtractr.NewQueue(&xtractr.Config{
		Suffix:   "_xtractd",
		Logger:   log,
		Parallel: 1,
		FileMode: 0644, // ignored for tar files.
		DirMode:  0755,
	})
	defer q.Stop() // Stop() waits until all extractions finish.

	response := make(chan *xtractr.Response)
	// This sends an item into the extraction queue (buffered channel).
	q.Extract(&xtractr.Xtract{
		Name:      "my archive",                          // name is not important to this library.
		Filter:    xtractr.Filter{Path: "/tmp/archives"}, // can also be a direct file.
		CBChannel: response,                              // queue responses are sent here.
	})

	// The queue always sends two responses: one on start and one when finished (error or not).
	resp := <-response
	// Archives is a map[string][]string, so print it with %v.
	log.Infof("Extraction started: %v", resp.Archives)

	resp = <-response
	if resp.Error != nil {
		// There is possibly more data in the response that is useful even on error.
		// ie you may want to cleanup any partial extraction.
		log.Printf("Error: %v", resp.Error)
	}

	log.Infof("Extracted Files:\n - %s", strings.Join(resp.NewFiles, "\n - "))
}

Example 2 - Direct

This example shows ExtractFile() with a very simple XFile. You can choose output path, as well as file and dir modes. Failing to provide OutputDir results in unexpected behavior. ExtractFile() attempts to identify the type of file. If you know the file type you may call the direct method instead:

  • ExtractZIP(*XFile)
  • ExtractRAR(*XFile)
  • ExtractTar(*XFile)
  • ExtractGzip(*XFile)
  • ExtractBzip(*XFile)
  • ExtractTarGzip(*XFile)
  • ExtractTarBzip(*XFile)
  • Extract7z(*XFile)
  • ExtractISO(*XFile)

package main

import (
	"log"
	"strings"

	"golift.io/xtractr"
)

func main() {
	x := &xtractr.XFile{
		FilePath:  "/tmp/myfile.zip",
		OutputDir: "/tmp/myfile", // do not forget this.
	}

	// size is how many bytes were written.
	// files may be nil, but will contain any files written (even with an error).
	// The third return value (the list of archives processed) is ignored here.
	size, files, _, err := xtractr.ExtractFile(x)
	if err != nil || files == nil {
		log.Fatal(size, files, err)
	}

	log.Println("Bytes written:", size, "Files Extracted:\n -", strings.Join(files, "\n -"))
}

This is what XFile looks like (today at least):

// XFile defines the data needed to extract an archive.
type XFile struct {
	FilePath  string      // Path to archive being extracted.
	OutputDir string      // Folder to extract archive into.
	FileMode  os.FileMode // Write files with this mode.
	DirMode   os.FileMode // Write folders with this mode.
	Password  string      // (RAR/7z) Archive password. Blank for none.
}

Documentation

Overview

Package xtractr provides methods and procedures to extract compressed archive files. It can be used in two ways. The simplest method is to pass an archive file path and an output path to ExtractFile(). This decompresses the provided file and returns some information about the data written.

The other, more sophisticated way to extract files is with a queue. The queue method allows you to send an Xtract into a channel where it's queued up and extracted in order. The number of concurrent extractions is configured when the queue is created. A provided callback method is run when a queued Xtract begins and it's run again when the Xtract finishes.

Index

Constants

const (
	DefaultDirMode  = 0o755
	DefaultFileMode = 0o644
	DefaultSuffix   = "_xtractr"
	// DefaultBufferSize is the size of the extraction buffer.
	// ie. How many jobs can be queued before things get slow.
	DefaultBufferSize = 1000
)

Sane defaults.

Variables

var (
	ErrQueueStopped       = fmt.Errorf("extractor queue stopped, cannot extract")
	ErrNoCompressedFiles  = fmt.Errorf("no compressed files found")
	ErrUnknownArchiveType = fmt.Errorf("unknown archive file type")
	ErrInvalidPath        = fmt.Errorf("archived file contains invalid path")
	ErrInvalidHead        = fmt.Errorf("archived file contains invalid header file")
	ErrQueueRunning       = fmt.Errorf("extractor queue running, cannot start")
	ErrNoConfig           = fmt.Errorf("call NewQueue() to initialize a queue")
	ErrNoLogger           = fmt.Errorf("xtractr.Config.Logger must be non-nil")
)

Custom errors returned by this module.
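
ErrInvalidPath suggests that archive entries whose paths would land outside the output folder are rejected. The source does not show how that check is done; the following is only a minimal sketch of one common approach, using a hypothetical safeJoin helper and a local errInvalidPath value that are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// errInvalidPath is a local stand-in for xtractr.ErrInvalidPath (illustration only).
var errInvalidPath = errors.New("archived file contains invalid path")

// safeJoin is a hypothetical helper: it joins an archive entry name onto
// outputDir and rejects any result that escapes outputDir.
func safeJoin(outputDir, entryName string) (string, error) {
	base := filepath.Clean(outputDir)
	target := filepath.Join(base, entryName) // Join also cleans ".." segments.

	if target != base && !strings.HasPrefix(target, base+string(filepath.Separator)) {
		return "", errInvalidPath
	}

	return target, nil
}

func main() {
	for _, name := range []string{"sub/file.txt", "../../etc/passwd"} {
		target, err := safeJoin("/tmp/out", name)
		fmt.Printf("%q -> %q, err: %v\n", name, target, err)
	}
}
```

A caller would run such a check on every entry name before creating the file or folder.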

var ErrPotentialZipBomb = errors.New("compress ratio or file size limit exceeded, possibly zip bomb")
var MaxNormalCompressRatio = int64(1032)
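
The default of 1032 is roughly the highest ratio DEFLATE can reach on ordinary data. The source does not document how the limit is enforced, so the following is only a sketch of the general idea: compare uncompressed size to compressed size and fail when the ratio exceeds a limit. The names checkRatio, buildZip and errRatioExceeded are hypothetical and not part of this package.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"log"
)

// errRatioExceeded is hypothetical; it plays the role of ErrPotentialZipBomb.
var errRatioExceeded = errors.New("compress ratio limit exceeded")

// checkRatio returns an error when uncompressed/compressed is above limit.
// Treating a non-positive limit as "check disabled" is an assumption of this sketch.
func checkRatio(compressed, uncompressed uint64, limit int64) error {
	if limit <= 0 || uncompressed == 0 {
		return nil
	}

	if compressed == 0 || uncompressed/compressed > uint64(limit) {
		return errRatioExceeded
	}

	return nil
}

// buildZip returns an in-memory zip archive holding one file of size zero bytes.
func buildZip(size int) ([]byte, error) {
	var buf bytes.Buffer

	w := zip.NewWriter(&buf)

	f, err := w.Create("zeros.bin")
	if err != nil {
		return nil, err
	}

	if _, err := f.Write(make([]byte, size)); err != nil {
		return nil, err
	}

	if err := w.Close(); err != nil {
		return nil, err
	}

	return buf.Bytes(), nil
}

func main() {
	data, err := buildZip(1 << 20) // 1 MiB of zeros compresses extremely well.
	if err != nil {
		log.Fatal(err)
	}

	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		log.Fatal(err)
	}

	for _, f := range r.File {
		err := checkRatio(f.CompressedSize64, f.UncompressedSize64, 100)
		fmt.Printf("%s: %d -> %d bytes, check: %v\n",
			f.Name, f.CompressedSize64, f.UncompressedSize64, err)
	}
}
```

Sizes stored in archive headers can be forged, so a robust implementation would also count the bytes actually written during extraction and stop once a limit is passed.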

Functions

func Difference

func Difference(slice1 []string, slice2 []string) []string

Difference returns all the strings that are in slice2 but not in slice1. Used to find new files in a file list from a path. ie. those we extracted. This is a helper method and only exposed for convenience. You do not have to call this.
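
The described behavior amounts to a set difference between two file listings. A minimal sketch, assuming nothing about the real implementation beyond the description above (the lowercase difference function is a hypothetical stand-in):

```go
package main

import "fmt"

// difference is a hypothetical stand-in for xtractr.Difference: it returns
// the strings present in after but absent from before, in their original order.
func difference(before, after []string) []string {
	seen := make(map[string]struct{}, len(before))
	for _, s := range before {
		seen[s] = struct{}{}
	}

	var added []string

	for _, s := range after {
		if _, ok := seen[s]; !ok {
			added = append(added, s)
		}
	}

	return added
}

func main() {
	before := []string{"/tmp/a/archive.zip"}
	after := []string{"/tmp/a/archive.zip", "/tmp/a/readme.txt", "/tmp/a/data.bin"}

	// Listing a folder before and after extraction yields the new files.
	fmt.Println(difference(before, after))
}
```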

func Extract7z

func Extract7z(xFile *XFile) (int64, []string, []string, error)

Extract7z extracts a 7zip archive. Volumes: https://github.com/bodgit/sevenzip/issues/54

func ExtractBzip

func ExtractBzip(xFile *XFile) (int64, []string, error)

ExtractBzip extracts a bzip2-compressed file. That is, a single file.

func ExtractFile

func ExtractFile(xFile *XFile) (int64, []string, []string, error)

ExtractFile calls the correct procedure for the type of file being extracted. Returns size of extracted data, list of extracted files, list of archives processed, and/or error.

func ExtractGzip

func ExtractGzip(xFile *XFile) (int64, []string, error)

ExtractGzip extracts a gzip-compressed file. That is, a single file.

func ExtractISO

func ExtractISO(xFile *XFile) (int64, []string, error)

ExtractISO writes an ISO's contents to disk.

func ExtractRAR

func ExtractRAR(xFile *XFile) (int64, []string, []string, error)

func ExtractTar

func ExtractTar(xFile *XFile) (int64, []string, error)

ExtractTar extracts a raw (non-compressed) tar archive.

func ExtractTarBzip

func ExtractTarBzip(xFile *XFile) (int64, []string, error)

ExtractTarBzip extracts a bzip2-compressed tar archive.

func ExtractTarGzip

func ExtractTarGzip(xFile *XFile) (int64, []string, error)

ExtractTarGzip extracts a gzip-compressed tar archive.

func ExtractZIP

func ExtractZIP(xFile *XFile) (int64, []string, error)

ExtractZIP extracts a zip file to a destination. Simple enough.

func FindCompressedFiles

func FindCompressedFiles(filter Filter) map[string][]string

FindCompressedFiles returns all the rar and zip files in a path. This attempts to grab only the first file in a multi-part archive. Sometimes there are multiple archives, so if the archive does not have "part" followed by a number in the name, then it will be considered an independent archive. Some packagers seem to use different naming schemes, so this will need to be updated as time progresses. So far it's working well. This is a helper method and only exposed for convenience. You do not have to call this.

func SetCompressRatioLimit

func SetCompressRatioLimit(newLimit int64)

Types

type Config

type Config struct {
	// Size of the extraction channel buffer. Default=1000.
	// Use -1 for unbuffered channel. Not recommended.
	BuffSize int
	// Number of concurrent extractions allowed.
	Parallel int
	// Filemode used when writing files, tar ignores this, so does Windows.
	FileMode os.FileMode
	// Filemode used when writing folders, tar ignores this.
	DirMode os.FileMode
	// The suffix used for temporary folders.
	Suffix string
	// Logs are sent to this Logger.
	Logger
}

Config is the input data to configure the Xtract queue. Fill this out and pass it into NewQueue() to create a queue for archive extractions.

type Exclude

type Exclude []string

Exclude represents an exclusion list.

func (Exclude) Has

func (e Exclude) Has(test string) bool

Has returns true if the test has an excluded suffix.
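
A suffix check of this kind can be sketched as follows. The exclude type and has method are hypothetical mirrors of Exclude and Has, and the case-insensitive comparison is an assumption of this sketch, not something the source states.

```go
package main

import (
	"fmt"
	"strings"
)

// exclude mirrors the shape of xtractr.Exclude ([]string) for illustration.
type exclude []string

// has reports whether name ends with any suffix in the list.
// Comparing case-insensitively is an assumption of this sketch.
func (e exclude) has(name string) bool {
	lower := strings.ToLower(name)

	for _, suffix := range e {
		if strings.HasSuffix(lower, strings.ToLower(suffix)) {
			return true
		}
	}

	return false
}

func main() {
	skip := exclude{".iso", ".7z"}

	for _, name := range []string{"disk.ISO", "backup.7z", "photos.zip"} {
		fmt.Printf("%s excluded: %v\n", name, skip.has(name))
	}
}
```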

type Filter

type Filter struct {
	// This is the path to search in for archives.
	Path string
	// Any files with this suffix are ignored. ie. ".7z" or ".iso"
	ExcludeSuffix Exclude
}

Filter is the input to find compressed files.

type Logger

type Logger interface {
	Printf(string, ...interface{})
	Debugf(string, ...interface{})
}

Logger allows this library to write logs. Use this to capture them in your own flow.

type Response

type Response struct {
	// Extract Started (false) or Finished (true).
	Done bool
	// Size of data written.
	Size int64
	// Temporary output folder.
	Output string
	// Items still in queue.
	Queued int
	// When this extract began.
	Started time.Time
	// Elapsed extraction duration. ie. How long it took.
	Elapsed time.Duration
	// Extra archives extracted from within an archive.
	Extras map[string][]string
	// Initial archives found and extracted.
	Archives map[string][]string
	// Files written to final path.
	NewFiles []string
	// Error encountered, only when done=true.
	Error error
	// Copied from input data.
	X *Xtract
}

Response is sent to the call-back function. The first CBFunction call is just a notification that the extraction has started. You can determine it's the first call by checking Response.Done: false = started, true = finished. When Done=false, the only other meaningful data provided is re.Archives, re.Output and re.Queued.
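
The two-notification pattern can be illustrated without the library. The sketch below uses a trimmed, hypothetical response type and a fake worker (runJob); it only shows how a consumer might tell the "started" message from the "finished" one, and is not how the queue itself is implemented.

```go
package main

import (
	"fmt"
	"time"
)

// response is a trimmed, hypothetical stand-in for xtractr.Response.
type response struct {
	Done    bool          // false = started, true = finished.
	Elapsed time.Duration // only meaningful when Done is true.
	Err     error         // only set when Done is true.
}

// runJob imitates a queue worker: it notifies once before the work
// starts and once after it ends, on the same channel.
func runJob(work func() error, cb chan<- *response) {
	start := time.Now()
	cb <- &response{Done: false}

	err := work()
	cb <- &response{Done: true, Elapsed: time.Since(start), Err: err}
}

// waitForResult reads from the channel until the Done=true message arrives.
func waitForResult(cb <-chan *response) *response {
	for resp := range cb {
		if resp.Done {
			return resp
		}
	}

	return nil
}

func main() {
	cb := make(chan *response)

	go runJob(func() error { return nil }, cb)

	resp := waitForResult(cb)
	fmt.Println("finished:", resp.Done, "error:", resp.Err)
}
```

Looping until Done is true, rather than assuming a fixed message order, keeps the consumer correct when several jobs share one channel.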

type XFile

type XFile struct {
	// Path to archive being extracted.
	FilePath string
	// Folder to extract archive into.
	OutputDir string
	// Write files with this mode.
	FileMode os.FileMode
	// Write folders with this mode.
	DirMode os.FileMode
	// (RAR/7z) Archive password. Blank for none. Gets prepended to Passwords, below.
	Password string
	// (RAR/7z) Archive passwords (to try multiple).
	Passwords []string
}

XFile defines the data needed to extract an archive.

func (*XFile) Extract

func (x *XFile) Extract() (int64, []string, []string, error)

Extract calls the correct procedure for the type of file being extracted. Returns size of extracted data, list of extracted files, list of archives processed, and/or error.

func (*XFile) Size

func (x *XFile) Size() (int64, error)

type Xtract

type Xtract struct {
	// Unused in this app; exposed for calling library.
	Name string
	// Archive password. Only supported with RAR and 7zip files. Prepended to Passwords.
	Password string
	// Archive passwords (try multiple). Only supported with RAR and 7zip files.
	Passwords []string
	// Folder path and filters describing where and how to find archives.
	Filter
	// Set DisableRecursion to true if you want to avoid extracting archives inside archives.
	DisableRecursion bool
	// Set RecurseISO to true if you want to recursively extract archives in ISO files.
	// If ISOs and other archives are found, none will extract recursively if this is false.
	RecurseISO bool
	// Folder to extract data. Default is same level as SearchPath with a suffix.
	ExtractTo string
	// Leave files in temporary folder? false=move files back to Searchpath
	// Moving files back will cause the "extracted files" returned to only contain top-level items.
	TempFolder bool
	// Delete Archives after successful extraction? Be careful.
	DeleteOrig bool
	// Create a log (.txt) file of the extraction information.
	LogFile bool
	// Callback Function, runs twice per queued item.
	CBFunction func(*Response)
	// Callback Channel, msg sent twice per queued item.
	CBChannel chan *Response
	// Total Written Bytes for current work
	TotalUnpacked int64
}

Xtract defines the queue input data: data needed to extract files in a path. Fill this out to create a queued extraction and pass it into Xtractr.Extract(). If a CBFunction is provided, it runs when the queued extract begins, with Response.Done=false. The CBFunction is called again when the extraction finishes, with Response.Done=true. The CBChannel works the exact same way, except it's a channel instead of a blocking function.

type Xtractr

type Xtractr struct {
	// contains filtered or unexported fields
}

Xtractr is what you get from NewQueue(). This is the main app struct. Use this struct to call Xtractr.Extract() to queue an extraction.

func NewQueue

func NewQueue(config *Config) *Xtractr

NewQueue returns a new Xtractr Queue you can send Xtract jobs into. This is where to start if you're creating an extractor queue. You must provide a Logger in the config, everything else is optional.

func (*Xtractr) DeleteFiles

func (x *Xtractr) DeleteFiles(files ...string)

DeleteFiles obliterates things and logs. Use with caution.

func (*Xtractr) Extract

func (x *Xtractr) Extract(extract *Xtract) (int, error)

Extract is how external code begins an extraction process against a path. To add an item to the extraction queue, create an Xtract struct with the search path set and pass it to this method. The current queue size is returned.

func (*Xtractr) GetFileList

func (x *Xtractr) GetFileList(path string) ([]string, error)

GetFileList returns all the files in a path. This is non-recursive and only returns files _in_ the base path provided. This is a helper method and only exposed for convenience. You do not have to call this.
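
A non-recursive listing of this kind can be sketched with os.ReadDir. The listFiles and demoList functions are hypothetical; whether the real method also returns sub-folder names is not stated in the source, and this sketch assumes it returns every direct entry.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// listFiles is a hypothetical helper: it returns the full paths of the
// entries directly inside dir, without descending into sub-folders.
func listFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, fmt.Errorf("reading %s: %w", dir, err)
	}

	files := make([]string, 0, len(entries))
	for _, entry := range entries {
		files = append(files, filepath.Join(dir, entry.Name()))
	}

	return files, nil
}

// demoList builds a temp folder with one file and one sub-folder holding
// a nested file, then returns how many entries listFiles reports.
func demoList() (int, error) {
	dir, err := os.MkdirTemp("", "listfiles")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	sub := filepath.Join(dir, "sub")
	if err := os.Mkdir(sub, 0o755); err != nil {
		return 0, err
	}

	if err := os.WriteFile(filepath.Join(dir, "top.txt"), []byte("x"), 0o644); err != nil {
		return 0, err
	}

	if err := os.WriteFile(filepath.Join(sub, "nested.txt"), []byte("y"), 0o644); err != nil {
		return 0, err
	}

	files, err := listFiles(dir)

	return len(files), err
}

func main() {
	count, err := demoList()
	if err != nil {
		log.Fatal(err)
	}

	// The nested file is not counted because the listing does not recurse.
	fmt.Println("entries in base path:", count)
}
```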

func (*Xtractr) MoveFiles

func (x *Xtractr) MoveFiles(fromPath string, toPath string, overwrite bool) ([]string, error)

MoveFiles relocates files then removes the folder they were in. Returns the new file paths. This is a helper method and only exposed for convenience. You do not have to call this.

func (*Xtractr) Rename

func (x *Xtractr) Rename(oldpath, newpath string) error

Rename is an attempt to deal with "invalid cross-device link" errors on weird file systems.
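
The source does not say how Rename works around the error. A common technique for this class of problem is to try os.Rename first and fall back to copy-then-delete, sketched below with hypothetical helpers (moveFile, copyThenRemove, demoMove). This sketch handles regular files only.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
)

// moveFile tries a plain rename and falls back to copy-then-delete,
// which also works when the two paths are on different file systems.
func moveFile(oldpath, newpath string) error {
	if err := os.Rename(oldpath, newpath); err == nil {
		return nil
	}

	return copyThenRemove(oldpath, newpath)
}

// copyThenRemove copies a regular file to newpath, then deletes the original.
func copyThenRemove(oldpath, newpath string) error {
	src, err := os.Open(oldpath)
	if err != nil {
		return err
	}

	info, err := src.Stat()
	if err != nil {
		src.Close()
		return err
	}

	dst, err := os.OpenFile(newpath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, info.Mode().Perm())
	if err != nil {
		src.Close()
		return err
	}

	_, copyErr := io.Copy(dst, src)
	closeErr := dst.Close()
	src.Close() // close before removing; some systems refuse to delete open files.

	if copyErr != nil {
		return copyErr
	}

	if closeErr != nil {
		return closeErr
	}

	return os.Remove(oldpath)
}

// demoMove writes a file in a temp folder, moves it with the given
// function, and returns the content found at the new path.
func demoMove(move func(oldpath, newpath string) error) (string, error) {
	dir, err := os.MkdirTemp("", "movefile")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	oldpath := filepath.Join(dir, "old.txt")
	newpath := filepath.Join(dir, "new.txt")

	if err := os.WriteFile(oldpath, []byte("payload"), 0o644); err != nil {
		return "", err
	}

	if err := move(oldpath, newpath); err != nil {
		return "", err
	}

	data, err := os.ReadFile(newpath)

	return string(data), err
}

func main() {
	content, err := demoMove(moveFile)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("moved file contains:", content)
}
```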

func (*Xtractr) Start

func (x *Xtractr) Start() error

Start restarts the queue. This can be called only after you call Stop().

func (*Xtractr) Stop

func (x *Xtractr) Stop()

Stop shuts down the extractor routines. Call this to shut things down.

Directories

Path    Synopsis
cmd/xt  Package main is a binary used for demonstration purposes.
