Documentation
Overview
Package rtfparser implements a parser for the Rich Text Format.
This code, which extracts text from RTF files, is forked from https://github.com/IntelligenceX/fileconversion, which is itself forked from https://github.com/J45k4/rtf-go.
I ported it from the standard library's regexp package to github.com/dlclark/regexp2, hoping that using FindNextMatch() instead of FindAllStringSubmatch() would lower memory requirements when processing large files. While this seems to be the case, the parser is still very inefficient for larger files (e.g. those containing images).
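The memory difference comes from how matches are collected: FindAllStringSubmatch returns every match at once as a [][]string, while an iterator hands back one match at a time. The sketch below shows that one-at-a-time shape using only the standard library's regexp (so it runs without the regexp2 module); regexp2 expresses the same idea as FindStringMatch followed by repeated FindNextMatch calls. The controlWord pattern and the eachMatch helper are hypothetical and not taken from this package.

```go
package main

import (
	"fmt"
	"regexp"
)

// controlWord matches an RTF control word such as \rtf1 or \fs-24,
// capturing the name and the optional numeric parameter.
var controlWord = regexp.MustCompile(`\\([a-z]+)(-?\d+)?`)

// eachMatch calls fn for one match at a time instead of building the
// full [][]string that FindAllStringSubmatch would return.
func eachMatch(re *regexp.Regexp, s string, fn func(sub []string)) {
	pos := 0
	for pos <= len(s) {
		loc := re.FindStringSubmatchIndex(s[pos:])
		if loc == nil {
			return
		}
		sub := make([]string, len(loc)/2)
		for i := range sub {
			if loc[2*i] >= 0 { // -1 marks a group that did not participate
				sub[i] = s[pos+loc[2*i] : pos+loc[2*i+1]]
			}
		}
		fn(sub)
		if loc[1] == 0 {
			pos++ // empty match at the current position: step past it
		} else {
			pos += loc[1]
		}
	}
}

func main() {
	eachMatch(controlWord, `{\rtf1\ansi\fs-24 Hello}`, func(sub []string) {
		fmt.Printf("name=%s param=%q\n", sub[1], sub[2])
	})
}
```

Because each search starts at s[pos:], patterns relying on ^ or \b right after a previous match can behave differently than with FindAllStringSubmatch; the control-word pattern here depends on neither.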
Constants
This section is empty.
Variables
var ErrNoRtf error = errors.New("rtfparser: document is not an RTF")
Functions
func IsFileRTF
IsFileRTF checks whether the data indicates an RTF file. RTF has a signature of 7B 5C 72 74 66 31, or "{\rtf1" as a string.
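The signature of IsFileRTF is not shown here, so the sketch below assumes it takes a []byte and only checks a prefix; isFileRTF and rtfSignature are hypothetical names. It spells the signature out byte by byte to show that 7B 5C 72 74 66 31 is exactly "{\rtf1".

```go
package main

import (
	"bytes"
	"fmt"
)

// rtfSignature is the byte sequence 7B 5C 72 74 66 31, i.e. "{\rtf1".
var rtfSignature = []byte{0x7B, 0x5C, 0x72, 0x74, 0x66, 0x31}

// isFileRTF is a hypothetical stand-in for IsFileRTF: it reports whether
// data starts with the RTF signature. It does not skip leading whitespace
// or a byte order mark.
func isFileRTF(data []byte) bool {
	return bytes.HasPrefix(data, rtfSignature)
}

func main() {
	fmt.Println(isFileRTF([]byte(`{\rtf1\ansi Hello}`))) // true
	fmt.Println(isFileRTF([]byte("Hello")))              // false
}
```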
func Rtf2SingleLine
Rtf2SingleLine converts RTF-formatted input to plain text without preserving any layout or formatting, returning one long string.
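Rtf2SingleLine's signature is not shown above, and the package itself works with regexp2 patterns. As an independent illustration of what flattening RTF to one line involves, here is a hand-written byte scanner (a swapped-in technique, not this package's code). The function name, the list of skipped destinations, and decoding \'hh escapes as Latin-1 are all assumptions; \uN Unicode escapes are not decoded.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// skipDestinations lists some RTF destinations whose contents are not body
// text. The list is an illustrative assumption and far from complete.
var skipDestinations = map[string]bool{
	"fonttbl": true, "colortbl": true, "stylesheet": true, "info": true, "pict": true,
}

func isLetter(c byte) bool { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// rtfToSingleLine is a hypothetical, simplified flattener: it drops control
// words and skipped destinations, turns \par, \line and \tab into spaces,
// and collapses all whitespace so the result is one line.
func rtfToSingleLine(rtf string) string {
	var out strings.Builder
	skip := []bool{false} // one "ignore text" flag per open group
	for i := 0; i < len(rtf); {
		top := len(skip) - 1
		switch c := rtf[i]; c {
		case '{':
			skip = append(skip, skip[top]) // nested groups inherit the flag
			i++
		case '}':
			if top > 0 {
				skip = skip[:top]
			}
			i++
		case '\\':
			i++
			if i >= len(rtf) {
				break
			}
			if !isLetter(rtf[i]) { // control symbol: \\ \{ \} \* \'hh ...
				sym := rtf[i]
				i++
				switch {
				case sym == '*':
					skip[top] = true // {\* ...} marks an ignorable destination
				case sym == '\'' && i+2 <= len(rtf):
					// \'hh: decoded as Latin-1, an assumption about the code page
					if v, err := strconv.ParseUint(rtf[i:i+2], 16, 8); err == nil && !skip[top] {
						out.WriteRune(rune(v))
					}
					i += 2
				case (sym == '\\' || sym == '{' || sym == '}') && !skip[top]:
					out.WriteByte(sym)
				}
				break
			}
			start := i
			for i < len(rtf) && isLetter(rtf[i]) {
				i++
			}
			word := rtf[start:i]
			if i < len(rtf) && rtf[i] == '-' {
				i++
			}
			for i < len(rtf) && isDigit(rtf[i]) {
				i++
			}
			if i < len(rtf) && rtf[i] == ' ' {
				i++ // the delimiting space belongs to the control word
			}
			if skipDestinations[word] {
				skip[top] = true
			} else if !skip[top] && (word == "par" || word == "line" || word == "tab") {
				out.WriteByte(' ')
			}
		case '\r', '\n':
			i++ // raw line breaks in RTF source are not text
		default:
			if !skip[top] {
				out.WriteByte(c)
			}
			i++
		}
	}
	return strings.Join(strings.Fields(out.String()), " ")
}

func main() {
	in := `{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard Caf\'e9 au lait\par second line\line end}`
	fmt.Println(rtfToSingleLine(in)) // Café au lait second line end
}
```

Skipping \pict groups in a single pass avoids collecting image data at all, which is one place where a scanner can be cheaper than repeated regular-expression passes over large files.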
Types
type RichTextDoc
type RichTextDoc struct {
// contains filtered or unexported fields
}
func NewFromBytes
func NewFromBytes(data []byte) (d *RichTextDoc, err error)
func (*RichTextDoc) MetadataMap
func (d *RichTextDoc) MetadataMap() map[string]string
func (*RichTextDoc) StreamText
func (d *RichTextDoc) StreamText(w io.Writer)
func (*RichTextDoc) Text
func (d *RichTextDoc) Text() string
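StreamText writes to any io.Writer, while Text returns a string; whether Text is built on StreamText internally is not documented. Judging only from the signatures, *RichTextDoc should satisfy a one-method interface like the hypothetical textStreamer below, so callers can either stream to a file or os.Stdout (avoiding one large string) or buffer into a strings.Builder. fakeDoc is a stand-in used only so the sketch runs without the package.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// textStreamer matches the documented signature of (*RichTextDoc).StreamText.
type textStreamer interface {
	StreamText(w io.Writer)
}

// textOf buffers everything a textStreamer writes and returns it as a string.
func textOf(s textStreamer) string {
	var b strings.Builder
	s.StreamText(&b)
	return b.String()
}

// fakeDoc is a hypothetical stand-in used only to exercise the helpers.
type fakeDoc struct{ parts []string }

func (d fakeDoc) StreamText(w io.Writer) {
	for _, p := range d.parts {
		io.WriteString(w, p)
	}
}

func main() {
	doc := fakeDoc{parts: []string{"Hello, ", "world"}}
	doc.StreamText(os.Stdout) // streams without building a string first
	fmt.Println()
	fmt.Println(textOf(doc)) // Hello, world
}
```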
type RtfMetadata
type RtfMetadata struct {
Author, Comment, Company, Category, Operator, Subject, Title string
// Created and last modified times have no timezone information attached in RTF files.
// This package returns them as local times, effectively attaching the running system's timezone information.
Created, Modified *time.Time
}
func GetRtfInfo
func GetRtfInfo(inputRtf string) (m RtfMetadata, err error)
GetRtfInfo extracts some metadata from the RTF input string.