exporter

Package exporter, v1.3.1
Warning: This package is not in the latest version of its module.
Published: Dec 12, 2023 | License: MIT | Imports: 7 | Imported by: 0

Documentation

Overview

Package exporter provides tools for extracting and converting chat session data from JSON files into various formats, such as CSV and JSON datasets.

This package facilitates tasks like data visualization, reporting, and machine learning data preparation.

The exporter package defines types to represent chat sessions, messages, and associated metadata.

It includes functions to:

  • Read chat session data from JSON files
  • Convert sessions to CSV with different formatting options
  • Create separate CSV files for sessions and messages
  • Extract sessions to a JSON format for Hugging Face datasets

The package also handles fields in the source JSON that may be represented as either strings or integers by using the custom StringOrInt type.

Additionally, ConvertSessionsToCSV accepts a context.Context, so a long-running conversion can be cancelled or bounded by a deadline.

The StringOrInt type decodes mixed values with the following UnmarshalJSON method:

func (soi *StringOrInt) UnmarshalJSON(data []byte) error {
	// Try unmarshalling into a string
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		// If there is an error, try unmarshalling into an int
		var i int64
		if err := json.Unmarshal(data, &i); err != nil {
			return err // Return the error if it is not a string or int
		}
		// Convert int to string and assign it to the custom type
		*soi = StringOrInt(strconv.FormatInt(i, 10))
		return nil
	}
	// If no error, assign the string value to the custom type
	*soi = StringOrInt(s)
	return nil
}
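To see both branches in action, the following standalone sketch re-declares StringOrInt with the same logic (reordered to test the string case first) and decodes a hypothetical, trimmed-down maskID struct standing in for Mask. A non-integer number such as 1.5 fails both attempts, so the error from the integer attempt is returned.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// StringOrInt re-declares the package's type for this standalone sketch.
type StringOrInt string

// UnmarshalJSON accepts a JSON string as-is, or a JSON integer converted to
// its decimal string form. Other values such as 1.5 or true return an error.
func (soi *StringOrInt) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		*soi = StringOrInt(s)
		return nil
	}
	var i int64
	if err := json.Unmarshal(data, &i); err != nil {
		return err
	}
	*soi = StringOrInt(strconv.FormatInt(i, 10))
	return nil
}

// maskID is a hypothetical, trimmed-down stand-in for the package's Mask type.
type maskID struct {
	ID StringOrInt `json:"id"`
}

// decodeID decodes a JSON object and returns its "id" field.
func decodeID(raw string) (StringOrInt, error) {
	var m maskID
	err := json.Unmarshal([]byte(raw), &m)
	return m.ID, err
}

func main() {
	for _, raw := range []string{`{"id": 42}`, `{"id": "abc-123"}`, `{"id": 1.5}`} {
		id, err := decodeID(raw)
		fmt.Printf("%s -> %q (err: %v)\n", raw, id, err)
	}
}
```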

Usage examples:

To read chat sessions from a JSON file and convert them to CSV, with the conversion bounded by a five-minute context timeout:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

store, err := exporter.ReadJSONFromFile("path/to/chat-sessions.json")
if err != nil {
    log.Fatal(err)
}
err = exporter.ConvertSessionsToCSV(ctx, store.ChatNextWebStore.Sessions, exporter.FormatOptionInline, "output.csv")
if err != nil {
    log.Fatal(err)
}

To create separate CSV files for sessions and messages:

err = exporter.CreateSeparateCSVFiles(store.ChatNextWebStore.Sessions, "sessions.csv", "messages.csv")
if err != nil {
    log.Fatal(err)
}

To extract chat sessions to a JSON dataset:

datasetJSON, err := exporter.ExtractToDataset(store.ChatNextWebStore.Sessions)
if err != nil {
    log.Fatal(err)
}
fmt.Println(datasetJSON)

Copyright (c) 2023 H0llyW00dzZ


Constants

const (
	// FormatOptionInline specifies the format where messages are displayed inline.
	FormatOptionInline = iota + 1

	// FormatOptionPerLine specifies the format where each message is on a separate line.
	FormatOptionPerLine

	// FormatOptionJSON specifies the format where messages are encoded as JSON.
	FormatOptionJSON

	// OutputFormatSeparateCSVFiles specifies the option to create separate CSV files for sessions and messages.
	OutputFormatSeparateCSVFiles
)
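Because the constants start at iota + 1, the options take the values 1 through 4, and the zero value of an unset int matches none of them (presumably so that a missing option is treated as invalid rather than silently selecting a format). A quick check, copying the const block as documented:

```go
package main

import "fmt"

// Copied from the documented const block. Starting at iota + 1 means the
// zero value of an int (an unset option) matches none of the constants.
const (
	FormatOptionInline = iota + 1
	FormatOptionPerLine
	FormatOptionJSON
	OutputFormatSeparateCSVFiles
)

func main() {
	fmt.Println(FormatOptionInline, FormatOptionPerLine, FormatOptionJSON, OutputFormatSeparateCSVFiles)
	// prints: 1 2 3 4
}
```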


Functions

func ConvertSessionsToCSV

func ConvertSessionsToCSV(ctx context.Context, sessions []Session, formatOption int, outputFilePath string) error

ConvertSessionsToCSV writes a slice of Session objects into a CSV file with support for context cancellation.

It passes the sessions to a format-specific writer selected by the formatOption argument.

The outputFilePath parameter specifies the path to the output CSV file.

It returns an error if the context is cancelled, the format option is invalid, or writing to the CSV fails.

func CreateSeparateCSVFiles

func CreateSeparateCSVFiles(sessions []Session, sessionsFileName string, messagesFileName string) (err error)

CreateSeparateCSVFiles creates two separate CSV files for sessions and messages from a slice of Session objects.

It takes the file names as parameters and returns an error if the files cannot be created or if writing the data fails.

Errors from closing files or flushing data to the CSV writers are captured and will be returned after all operations are attempted.

Error messages are logged to the console.

func ExtractToDataset

func ExtractToDataset(sessions []Session) (string, error)

ExtractToDataset converts a slice of Session objects into a JSON-formatted string suitable for use as a dataset in machine learning applications.

It returns an error if marshaling the sessions into JSON format fails.
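The documentation does not specify the dataset layout. The sketch below assumes a hypothetical record that keeps only each session's topic and its role/content pairs, dropping IDs and dates; extractToDataset, datasetRecord, and turn are illustrative names, not the package's API, and the real output may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down stand-ins for the package's types.
type Message struct {
	ID      string
	Date    string
	Role    string
	Content string
}

type Session struct {
	ID       string
	Topic    string
	Messages []Message
}

// Hypothetical dataset shapes: only conversational fields are kept.
type turn struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type datasetRecord struct {
	Topic    string `json:"topic"`
	Messages []turn `json:"messages"`
}

// extractToDataset is an illustrative sketch, not the package's function.
func extractToDataset(sessions []Session) (string, error) {
	// make(..., 0, n) ensures an empty input encodes as [] rather than null.
	records := make([]datasetRecord, 0, len(sessions))
	for _, s := range sessions {
		turns := make([]turn, 0, len(s.Messages))
		for _, m := range s.Messages {
			turns = append(turns, turn{Role: m.Role, Content: m.Content})
		}
		records = append(records, datasetRecord{Topic: s.Topic, Messages: turns})
	}
	b, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := extractToDataset([]Session{{
		ID: "s1", Topic: "greeting",
		Messages: []Message{{ID: "m1", Role: "user", Content: "hi"}},
	}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```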

func WriteHeaders (added in v1.1.8)

func WriteHeaders(csvWriter *csv.Writer, headers []string) error

WriteHeaders writes the provided headers to the csv.Writer.

func WriteMessageData (added in v1.1.8)

func WriteMessageData(csvWriter *csv.Writer, sessions []Session) error

WriteMessageData writes message data to the provided csv.Writer.

func WriteSessionData (added in v1.1.8)

func WriteSessionData(csvWriter *csv.Writer, sessions []Session) error

WriteSessionData writes session data to the provided csv.Writer.

Types

type ChatNextWebStore

type ChatNextWebStore struct {
	ChatNextWebStore Store `json:"chat-next-web-store"`
}

ChatNextWebStore is a wrapper for Store that aligns with the expected JSON structure for a chat-next-web-store object.

func ReadJSONFromFile

func ReadJSONFromFile(filePath string) (ChatNextWebStore, error)

ReadJSONFromFile reads a JSON file from the given file path and unmarshals it into a ChatNextWebStore struct.

It returns an error if the file cannot be opened, the JSON is invalid, or the JSON format does not match the expected ChatNextWebStore format.

type Mask

type Mask struct {
	ID        StringOrInt `json:"id"` // Use the custom type for ID
	Avatar    string      `json:"avatar"`
	Name      string      `json:"name"`
	Lang      string      `json:"lang"`
	CreatedAt int64       `json:"createdAt"` // Assuming it's a Unix timestamp
}

Mask represents an anonymization mask for a participant in a chat session, including the participant's ID, avatar link, name, language, and creation timestamp.

type Message

type Message struct {
	ID      string `json:"id"`
	Date    string `json:"date"`
	Role    string `json:"role"`
	Content string `json:"content"`
}

Message represents a single message within a chat session, including metadata like the ID, date, role of the sender, and the content of the message itself.

type Session

type Session struct {
	ID                 string    `json:"id"`
	Topic              string    `json:"topic"`
	MemoryPrompt       string    `json:"memoryPrompt"`
	Stat               Stat      `json:"stat"`
	LastUpdate         int64     `json:"lastUpdate"` // Changed to int64 assuming it's a Unix timestamp
	LastSummarizeIndex int       `json:"lastSummarizeIndex"`
	Mask               Mask      `json:"mask"`
	Messages           []Message `json:"messages"`
}

Session represents a single chat session, including session metadata, statistics, messages, and the mask for the participant.

type Stat

type Stat struct {
	TokenCount int `json:"tokenCount"`
	WordCount  int `json:"wordCount"`
	CharCount  int `json:"charCount"`
}

Stat represents statistics for a chat session, such as the count of tokens, words, and characters.

type Store

type Store struct {
	Sessions []Session `json:"sessions"`
}

Store encapsulates a collection of chat sessions.

type StringOrInt

type StringOrInt string

StringOrInt is a custom type to handle JSON values that can be either strings or integers (Magic Golang 🎩 🪄).

It implements the json.Unmarshaler interface to handle this mixed type when unmarshaling JSON data.

func (*StringOrInt) UnmarshalJSON

func (soi *StringOrInt) UnmarshalJSON(data []byte) error

UnmarshalJSON is a custom unmarshaler for StringOrInt that tries to unmarshal JSON data as a string, and if that fails, as an integer, which is then converted to a string.
