rtfparser

package
v2.0.0-...-98876da Latest
Warning: This package is not in the latest version of its module.

Published: Mar 24, 2024 | License: GPL-3.0 | Imports: 10 | Imported by: 0

Documentation

Overview

Package rtfparser implements a parser for the Rich Text Format.

This code extracts text from RTF files. It is forked from https://github.com/IntelligenceX/fileconversion, which itself is a fork of https://github.com/J45k4/rtf-go.

I ported it from the standard library's regexp package to github.com/dlclark/regexp2, hoping that using regexp2's FindNextMatch() instead of regexp's FindAllStringSubmatch() would lower memory requirements when processing large files. While this seems to be the case, the parser is still very inefficient for larger files (e.g. those containing images).

Index

Constants

This section is empty.

Variables

var ErrNoRtf error = errors.New("rtfparser: document is not an RTF")

Functions

func IsFileRTF

func IsFileRTF(data []byte) bool

IsFileRTF checks whether the data indicates an RTF file. RTF files have the signature 7B 5C 72 74 66 31, i.e. the string "{\rtf1".

func Rtf2SingleLine

func Rtf2SingleLine(inputRtf string) string

Rtf2SingleLine converts RTF-formatted input to plain text without preserving any layout or formatting, returning one long string.

func Rtf2Text

func Rtf2Text(inputRtf string) string

Rtf2Text removes RTF markup from the input string and returns the resulting text. Unlike Rtf2SingleLine, it retains some of the layout, e.g. paragraphs/newlines, tabs, and tables.
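The difference between Rtf2Text and Rtf2SingleLine can be illustrated with a deliberately tiny converter. This is a toy sketch, not the package's implementation: it understands only braces, `\par`, `\line`, `\tab`, and the escapes `\\`, `\{`, `\}`; it silently drops every other control word and control symbol, and it does not skip destinations such as the font table. The helper `toyText` and its `keepLayout` flag are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

func isLetter(b byte) bool { return b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z' }
func isDigit(b byte) bool  { return b >= '0' && b <= '9' }

// toyText converts a tiny subset of RTF to plain text. With keepLayout,
// \par and \line become newlines and \tab a tab; without it, all output is
// collapsed onto one line separated by single spaces.
func toyText(rtf string, keepLayout bool) string {
	var out strings.Builder
	for i := 0; i < len(rtf); i++ {
		c := rtf[i]
		switch {
		case c == '{' || c == '}' || c == '\r' || c == '\n':
			// Group delimiters and raw line breaks in the source are not text.
		case c == '\\' && i+1 < len(rtf) && strings.IndexByte(`\{}`, rtf[i+1]) >= 0:
			// Escaped literal: \\, \{ or \}.
			i++
			out.WriteByte(rtf[i])
		case c == '\\':
			j := i + 1
			for j < len(rtf) && isLetter(rtf[j]) {
				j++
			}
			word := rtf[i+1 : j]
			if word == "" {
				// Other control symbol (e.g. \~ or \*): drop it and the next byte.
				i = j
				continue
			}
			// Skip an optional numeric parameter and one delimiting space.
			for j < len(rtf) && (rtf[j] == '-' || isDigit(rtf[j])) {
				j++
			}
			if j < len(rtf) && rtf[j] == ' ' {
				j++
			}
			switch word {
			case "par", "line":
				out.WriteByte('\n')
			case "tab":
				out.WriteByte('\t')
			}
			i = j - 1 // the loop's i++ moves on to j
		default:
			out.WriteByte(c)
		}
	}
	if keepLayout {
		return out.String()
	}
	return strings.Join(strings.Fields(out.String()), " ")
}

func main() {
	doc := `{\rtf1\ansi Name\tab Qty\par Apples\tab 3\par}`
	fmt.Printf("%q\n", toyText(doc, true))  // "Name\tQty\nApples\t3\n"
	fmt.Printf("%q\n", toyText(doc, false)) // "Name Qty Apples 3"
}
```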

Types

type RichTextDoc

type RichTextDoc struct {
	// contains filtered or unexported fields
}

func NewFromBytes

func NewFromBytes(data []byte) (d *RichTextDoc, err error)

func (*RichTextDoc) Close

func (d *RichTextDoc) Close()

Close is a no-op for RTF documents.

func (*RichTextDoc) MetadataMap

func (d *RichTextDoc) MetadataMap() map[string]string

func (*RichTextDoc) StreamText

func (d *RichTextDoc) StreamText(w io.Writer)

func (*RichTextDoc) Text

func (d *RichTextDoc) Text() string

type RtfMetadata

type RtfMetadata struct {
	Author, Comment, Company, Category, Operator, Subject, Title string
	// Created and last modified times have no time zone information attached in RTF files.
	// This package returns them as local times, effectively attaching the running system's time zone.
	Created, Modified *time.Time
}

func GetRtfInfo

func GetRtfInfo(inputRtf string) (m RtfMetadata, err error)

GetRtfInfo extracts some metadata from the RTF input string.
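String fields such as the title and author live in groups inside `{\info ...}`, e.g. `{\title Quarterly report}` or `{\author Jane Doe}`. A minimal sketch of pulling them out with a regular expression follows. It assumes plain-text values without nested groups or escapes, and the helper `infoField` is hypothetical; a robust implementation would also need to decode escapes such as `\'e9` and Unicode `\uN` sequences.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// infoField returns the text of a {\<name> ...} group, or "" if absent.
// Assumes the value contains no nested braces or backslash escapes.
// The pattern is compiled per call for brevity.
func infoField(rtf, name string) string {
	re := regexp.MustCompile(`\{\\` + regexp.QuoteMeta(name) + `\s+([^{}\\]*)\}`)
	if m := re.FindStringSubmatch(rtf); m != nil {
		return strings.TrimSpace(m[1])
	}
	return ""
}

func main() {
	doc := `{\rtf1{\info{\title Quarterly report}{\author Jane Doe}{\company ACME}}Body}`
	for _, f := range []string{"title", "author", "company", "subject"} {
		fmt.Printf("%s=%q\n", f, infoField(doc, f))
	}
	// title="Quarterly report", author="Jane Doe", company="ACME", subject=""
}
```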
