gonativeextractor

v1.1.0
Published: Mar 14, 2024 License: MIT Imports: 7 Imported by: 0

README

NativeExtractor module for Golang

This is the official Go binding for the NativeExtractor project.

Installation

Requirements

Usage

Creating an Extractor

The following parameters are required to create an Extractor:

  • batch - number of logical symbols to be analyzed in the stream; if negative, it defaults to 2^16,
  • threads - number of threads for miners to run on; if negative, it defaults to the maximum number of threads available,
  • flags - initial flags, more about Flags.

The new Extractor must be deallocated with the Destroy method after use.

Returns a pointer to a new instance of Extractor.

More about Extractor.

e := gonativeextractor.NewExtractor(-1, -1, 0)

Adding miner

The AddMinerSo function adds a miner from a Shared Object (.so library).

The following parameters are required to add a miner:

  • sodir - a path to the Shared Object,
  • symbol - the Shared Object symbol,
  • params - optional miner parameters (i.e. what the miner mines); may be nil or an empty array, but if present, must be terminated with \x00.

The default directory for miner .so libraries is /usr/lib/nativeextractor_miners, available as the DEFAULT_MINERS_PATH constant.

Returns an error if the path points to a non-existent file or if the symbol names a non-existent miner.

More about Miners.

err := e.AddMinerSo(gonativeextractor.DEFAULT_MINERS_PATH+"/glob_entities.so", "match_glob", []byte("world\x00"))
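Since params (and BufferStream buffers) must end with \x00, a small helper can guard against a forgotten terminator. The following is a minimal sketch in plain Go; the name nulTerminated is hypothetical and not part of gonativeextractor.

```go
package main

import "fmt"

// nulTerminated returns b with a trailing zero byte. It appends one only if
// b is non-empty and does not already end with \x00; nil or empty input is
// returned unchanged, since params may be nil or an empty array.
// (Hypothetical helper, not part of the library.)
func nulTerminated(b []byte) []byte {
	if len(b) == 0 || b[len(b)-1] == 0 {
		return b
	}
	// Copy into a fresh slice so the caller's backing array is never modified.
	out := make([]byte, len(b)+1)
	copy(out, b)
	return out
}

func main() {
	fmt.Printf("%q\n", nulTerminated([]byte("world")))     // "world\x00"
	fmt.Printf("%q\n", nulTerminated([]byte("world\x00"))) // "world\x00"
	fmt.Println(nulTerminated(nil) == nil)                 // true
}
```

The result could then be passed as the params argument in place of a hand-written literal.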

Streams

There are two types of streams: FileStream and BufferStream.

  • When creating a FileStream, the path to the file is required.
  • When creating a BufferStream, a byte array terminated with "\x00" is required.

// Either stream from a file...
st, err := gonativeextractor.NewFileStream("./fixtures/create_stream.txt")
// ...or from an in-memory buffer.
st, err := gonativeextractor.NewBufferStream([]byte("Hello world byte\x00"))

The stream needs to be attached to the Extractor.

err = e.SetStream(st)

At this point you have an Extractor created, a miner added, and a stream created and attached.

Flags

It is also possible to explicitly set and unset flags.

An Extractor may have these flags enabled:

  • E_NO_ENCLOSED_OCCURRENCES
  • E_SORT_RESULTS

More about Flags.

err = e.SetFlags(gonativeextractor.E_SORT_RESULTS)
err = e.UnsetFlags(gonativeextractor.E_SORT_RESULTS)
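The flag constants are distinct bits (1 << 0 and 1 << 1), which suggests they can be combined as an ordinary bitmask. The sketch below shows only the bit arithmetic, using local copies of the documented values so it runs without the native library. The helper names set, unset and has are hypothetical, and how SetFlags and UnsetFlags combine flags internally is an assumption, not something the documentation states.

```go
package main

import "fmt"

// Local copies of the documented flag values.
const (
	noEnclosedOccurrences uint32 = 1 << 0 // E_NO_ENCLOSED_OCCURRENCES
	sortResults           uint32 = 1 << 1 // E_SORT_RESULTS
)

// set returns mask with the given flags switched on.
func set(mask, flags uint32) uint32 { return mask | flags }

// unset returns mask with the given flags switched off.
func unset(mask, flags uint32) uint32 { return mask &^ flags }

// has reports whether every bit of flags is present in mask.
func has(mask, flags uint32) bool { return mask&flags == flags }

func main() {
	mask := set(0, noEnclosedOccurrences|sortResults)
	fmt.Printf("%02b\n", mask) // 11

	mask = unset(mask, sortResults)
	fmt.Printf("%02b\n", mask)          // 01
	fmt.Println(has(mask, sortResults)) // false
}
```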

Occurrences and batches

Now you can iterate over the batches and, within each batch, over its occurrences: a loop within a loop.

To iterate over the batches, use the Extractor's Eof function, which returns a bool indicating whether the stream attached to the Extractor has ended. In each iteration, call the Extractor's Next function, which gives the next batch of found entities and returns the first found occurrence and an error, if any.

The inner loop iterates over the occurrences of one batch. Here Eof is called on the Occurrence and reports whether all occurrences have been read. Next, also called on the Occurrence, moves the pointer to the next occurrence and returns nothing; at EOF it does nothing.

From one occurrence you can get the following:

  • Str - Creates a string containing found occurrence,
  • Pos - Casts position of the found occurrence to Go integer type,
  • Upos - Same as Pos but with UTF position,
  • Len - Casts length of the found occurrence to Go integer type,
  • Ulen - Same as Len but with UTF length,
  • Label - Casts label of the found occurrence to Go string,
  • Prob - Casts probability of the found occurrence to Go float type.

for !e.Eof() {
    r, err := e.Next()
    if err != nil {
        return err
    }
    for !r.Eof() {
        fmt.Println(r.Str())   // "world"
        fmt.Println(r.Pos())   // 6
        fmt.Println(r.Upos())  // 6
        fmt.Println(r.Len())   // 5
        fmt.Println(r.Ulen())  // 5
        fmt.Println(r.Label()) // "Glob"
        fmt.Println(r.Prob())  // 1

        r.Next()
    }
}
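The inner Eof/Next loop can be tried without the native library by running it against a stand-in. In this sketch, occurrencer mirrors a subset of the documented Occurrencer interface, while hit, sliceOccurrence and collect are hypothetical names invented for illustration.

```go
package main

import "fmt"

// occurrencer mirrors the part of the documented Occurrencer interface
// that the loop needs.
type occurrencer interface {
	Next()
	Eof() bool
	Str() string
	Pos() uint64
}

// hit and sliceOccurrence are hypothetical stand-ins for native results.
type hit struct {
	text string
	pos  uint64
}

type sliceOccurrence struct {
	hits []hit
	i    int
}

// Eof reports whether every hit has been read.
func (o *sliceOccurrence) Eof() bool { return o.i >= len(o.hits) }

// Next advances the cursor; at EOF it does nothing, as documented.
func (o *sliceOccurrence) Next() {
	if !o.Eof() {
		o.i++
	}
}

func (o *sliceOccurrence) Str() string { return o.hits[o.i].text }
func (o *sliceOccurrence) Pos() uint64 { return o.hits[o.i].pos }

// collect drains one batch using the Eof/Next pattern.
func collect(o occurrencer) []string {
	var out []string
	for ; !o.Eof(); o.Next() {
		out = append(out, fmt.Sprintf("%s@%d", o.Str(), o.Pos()))
	}
	return out
}

func main() {
	batch := &sliceOccurrence{hits: []hit{{"world", 6}, {"world", 18}}}
	fmt.Println(collect(batch)) // [world@6 world@18]
}
```

Putting Next in the for statement's post clause keeps the advance step from being skipped by an early continue.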

Meta

There is also a Meta function that gives you meta information about the miners added to the Extractor, such as:

  • Ldpath - Path to the .so library,
  • Ldsymb - Miner function name,
  • Meta - Meta info about miner functions and labels,
  • Params - Miner parameters,
  • Ldptr - Pointer to the loaded .so library.

meta := e.Meta()
fmt.Println(meta[0].Params)  // "world"
fmt.Println(meta[0].Meta)    // the miner's meta strings ([]string)
fmt.Println(meta[0].Meta[0]) // "match_glob"
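For logging, the fields listed above can be folded into a single line per miner. This is a sketch only: minerInfo is a hypothetical local mirror of the string fields of DlSymbol, describe is an invented helper, and the sample values are illustrative rather than output captured from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// minerInfo mirrors the string fields of the documented DlSymbol struct.
// Ldptr is left out because it is only meaningful to the native library.
type minerInfo struct {
	Ldpath string
	Ldsymb string
	Meta   []string
	Params string
}

// describe renders one miner as a single log-friendly line.
func describe(m minerInfo) string {
	return fmt.Sprintf("%s (%s) params=%q meta=[%s]",
		m.Ldsymb, m.Ldpath, m.Params, strings.Join(m.Meta, ", "))
}

func main() {
	// Illustrative values, not real library output.
	miners := []minerInfo{{
		Ldpath: "/usr/lib/nativeextractor_miners/glob_entities.so",
		Ldsymb: "match_glob",
		Meta:   []string{"match_glob"},
		Params: "world",
	}}
	for _, m := range miners {
		fmt.Println(describe(m))
	}
}
```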

Destroy

Finally, the Extractor must be destroyed with Destroy. This function unsets the stream if it has not been unset yet, closes it, and then destroys the Extractor itself.

err = e.Destroy()

To call UnsetStream explicitly, call the stream's Close function first.

err = st.Close()
e.UnsetStream()

Documentation

Constants

const (
	// Disables enclosed occurrence feature.
	E_NO_ENCLOSED_OCCURRENCES = 1 << 0
	// Enables ascending sorting of occurrence records.
	E_SORT_RESULTS = 1 << 1
)

Constants for valid NativeExtractor flags.

const DEFAULT_MINERS_PATH = "/usr/lib/nativeextractor_miners"

Default path to .so libs representing miners.

const POINTER_SIZE = bits.UintSize / 8

Size of pointer in bytes.
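The expression divides the bit width of a uint by 8 to get a size in bytes. The sketch below repeats that expression locally and cross-checks it against the size of uintptr. It assumes a platform where uint and uintptr have the same width, which holds on common Go targets; pointerSize and uintptrSize are names local to this example.

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// pointerSize repeats the documented expression: bits in a uint, divided by 8.
const pointerSize = bits.UintSize / 8

// uintptrSize reports the size of a uintptr in bytes as seen by the compiler.
func uintptrSize() int { return int(unsafe.Sizeof(uintptr(0))) }

func main() {
	fmt.Println(pointerSize) // 8 on 64-bit platforms, 4 on 32-bit ones
	fmt.Println(pointerSize == uintptrSize())
}
```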

Types

type BufferStream

type BufferStream struct {
	Buffer []byte
	Ptr    *C.struct_stream_buffer_c
}

Structure representing buffer over heap memory.

func NewBufferStream

func NewBufferStream(buffer []byte) (*BufferStream, error)

Creates a new BufferStream.

Parameters:

  • buffer - byte array for stream initialization (has to be terminated with \x00).

Returns:

  • pointer to a new instance of BufferStream.
  • error if any occurred, nil otherwise.

func (*BufferStream) Check

func (ego *BufferStream) Check() bool

Checks if an error occurred in a BufferStream.

Returns:

  • true if an error occurred, false otherwise.

func (*BufferStream) Close

func (ego *BufferStream) Close() error

Closes a BufferStream.

Returns:

  • error if the stream has already been closed, nil otherwise.

func (*BufferStream) GetStream

func (ego *BufferStream) GetStream() *C.struct_stream_c

Gets the inner stream structure.

Returns:

  • pointer to the C struct stream_c.

type DlSymbol

type DlSymbol struct {
	// Path to the .so library
	Ldpath string
	// Miner function name
	Ldsymb string
	// Meta info about miner functions and labels
	Meta []string
	// Miner params
	Params string
	// Pointer to the loaded .so library
	Ldptr unsafe.Pointer
}

Structure with information about a miner.

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Structure representing an Extractor, which analyzes the attached stream batch by batch with its miners.

func NewExtractor

func NewExtractor(batch int, threads int, flags uint32) *Extractor

Creates a new Extractor. It has to be deallocated with the Destroy method after use.

Parameters:

  • batch - number of logical symbols to be analyzed in the stream (if negative, defaults to 2^16),
  • threads - number of threads for miners to run on (if negative, defaults to maximum threads available),
  • flags - initial flags.

Returns:

  • pointer to a new instance of Extractor.

func (*Extractor) AddMinerSo

func (ego *Extractor) AddMinerSo(sodir string, symbol string, params []byte) error

Loads a Miner from a Shared Object (.so library).

Parameters:

  • sodir - a path to the shared object,
  • symbol - shared object symbol,
  • params - optional (may be empty array or nil, but if present, has to be terminated with \x00).

Returns:

  • error if any occurred, nil otherwise.

func (*Extractor) Destroy

func (ego *Extractor) Destroy() error

Destroys the Extractor.

Returns:

  • error if the extractor has already been closed, nil otherwise.

func (*Extractor) Eof

func (ego *Extractor) Eof() bool

Checks if the stream attached to the Extractor ended.

Returns:

  • true if there is nothing to read (stream ended or no stream set), false otherwise.

func (*Extractor) GetLastError

func (ego *Extractor) GetLastError() error

Gives the last error which occurred in Extractor.

Returns:

  • error, nil if no error occurred.

func (*Extractor) Meta

func (ego *Extractor) Meta() []DlSymbol

Gives the meta information about Extractor.

Returns:

  • slice of structures with information about miners.

func (*Extractor) Next

func (ego *Extractor) Next() (Occurrencer, error)

Gives the next batch of found entities.

Returns:

  • the first found occurrence,
  • error, if any occurred.

func (*Extractor) SetFlags

func (ego *Extractor) SetFlags(flags uint32) error

Sets NativeExtractor flags.

Parameters:

  • flags - use constants defined above.

Returns:

  • error if any occurred, nil otherwise.

func (*Extractor) SetStream

func (ego *Extractor) SetStream(stream Streamer) error

Sets a stream to the Extractor.

Parameters:

  • stream - an instance of Streamer interface.

Returns:

  • error if any occurred, nil otherwise.

func (*Extractor) UnsetFlags

func (ego *Extractor) UnsetFlags(flags uint32) error

Unsets NativeExtractor flags.

Parameters:

  • flags - use constants defined above.

Returns:

  • error if any occurred, nil otherwise.

func (*Extractor) UnsetStream

func (ego *Extractor) UnsetStream()

Detaches the stream from the Extractor.

type FileStream

type FileStream struct {
	Ptr  *C.struct_stream_file_c
	Path string
}

Structure representing stream from file.

func NewFileStream

func NewFileStream(path string) (*FileStream, error)

Creates a new FileStream.

Parameters:

  • path - path to a file.

Returns:

  • pointer to a new instance of FileStream.
  • error if any occurred, nil otherwise.

func (*FileStream) Check

func (ego *FileStream) Check() bool

Checks if an error occurred in a FileStream.

Returns:

  • true if an error occurred, false otherwise.

func (*FileStream) Close

func (ego *FileStream) Close() error

Closes a FileStream.

Returns:

  • error if the stream has already been closed, nil otherwise.

func (*FileStream) GetStream

func (ego *FileStream) GetStream() *C.struct_stream_c

Gets the inner stream structure.

Returns:

  • pointer to the C struct stream_c.

type Occurrence

type Occurrence struct {
	// contains filtered or unexported fields
}

Struct implementing Occurrencer, contains a pointer to the current occurrence (C struct).

func (*Occurrence) Eof

func (ego *Occurrence) Eof() bool

Checks if all occurrences have been read.

Returns:

  • true if there is nothing to read (current pointer is nil), false otherwise.

func (*Occurrence) Label

func (ego *Occurrence) Label() string

Casts label of the found occurrence to Go string.

Returns:

  • label of the found entity.

func (*Occurrence) Len

func (ego *Occurrence) Len() uint32

Casts length of the found occurrence to Go integer type.

Returns:

  • length of the found occurrence (in bytes).

func (*Occurrence) Next

func (ego *Occurrence) Next()

Moves the pointer to the next occurrence. If EOF, does nothing.

func (*Occurrence) Pos

func (ego *Occurrence) Pos() uint64

Casts position of the found occurrence to Go integer type.

Returns:

  • position of the found occurrence (in bytes).

func (*Occurrence) Prob

func (ego *Occurrence) Prob() float64

Casts probability of the found occurrence to Go float type.

Returns:

  • probability of the occurrence.

func (*Occurrence) Str

func (ego *Occurrence) Str() string

Creates a string containing found occurrence.

Returns:

  • found occurrence.

func (*Occurrence) Ulen

func (ego *Occurrence) Ulen() uint32

Casts UTF length of the found occurrence to Go integer type.

Returns:

  • length of the found occurrence (in unicode characters).

func (*Occurrence) Upos

func (ego *Occurrence) Upos() uint64

Casts UTF position of the found occurrence to Go integer type.

Returns:

  • position of the found occurrence (in unicode characters).
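Pos and Len are measured in bytes, while Upos and Ulen are measured in Unicode characters, so the two pairs diverge as soon as the text contains multi-byte characters. The sketch below reproduces that difference with the standard library only. It assumes "unicode characters" means code points, and runeOffset is a hypothetical helper, not part of gonativeextractor.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeOffset converts a byte offset within s into the number of Unicode
// code points that precede it. byteOff must fall on a rune boundary.
func runeOffset(s string, byteOff int) int {
	return utf8.RuneCountInString(s[:byteOff])
}

func main() {
	text := "H\u00e9llo w\u00f6rld" // "Héllo wörld"
	needle := "w\u00f6rld"          // "wörld"

	pos := strings.Index(text, needle)     // position in bytes
	upos := runeOffset(text, pos)          // position in code points
	length := len(needle)                  // length in bytes
	ulen := utf8.RuneCountInString(needle) // length in code points

	fmt.Println(pos, upos, length, ulen) // 7 6 6 5
}
```

For pure ASCII input, such as the "Hello world byte" buffer used in the README, both measurements coincide.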

type Occurrencer

type Occurrencer interface {
	Next()
	Eof() bool
	Str() string
	Pos() uint64
	Upos() uint64
	Len() uint32
	Ulen() uint32
	Label() string
	Prob() float64
}

Interface allowing access to a found occurrence.

type Streamer

type Streamer interface {
	GetStream() *C.struct_stream_c
	Check() bool
	io.Closer
}

Interface for streams.
