gocorenlp

Version: v0.6.2 | Published: Feb 4, 2024 | License: AGPL-3.0

A Go (Golang) client for Stanford CoreNLP server.

Installation

As with other Go libraries, run go get to fetch the latest version:

go get github.com/donyori/gocorenlp@latest

Or specify a particular version (for example, v0.1.0):

go get github.com/donyori/gocorenlp@v0.1.0

For more information, see the go get documentation.

Usage

0. Prerequisites

We assume that you are familiar with the Stanford CoreNLP server. If not, see its official page.

Before using this library, you need to have Stanford CoreNLP installed or have a Stanford CoreNLP server available somewhere. To install Stanford CoreNLP, see its download page.


1. Start with the client package

The client package provides functionality to annotate your human language text with the CoreNLP server. See the documentation of package client for details.

Now we assume that you have already launched a CoreNLP server on 127.0.0.1:9000. (If your server is elsewhere, see Section 2.)

Here is a simple example of using this server to annotate the following text:

The quick brown fox jumped over the lazy dog.

package main

import (
	"fmt"

	"github.com/donyori/gocorenlp/client"
	"github.com/donyori/gocorenlp/model/v4.5.6-eb50467fa8e3/pb"
)

func main() {
	text := "The quick brown fox jumped over the lazy dog."
	annotators := "tokenize,ssplit,pos,lemma"

	// Specify the document model.
	// Depending on your CoreNLP version, use the appropriate model.
	// See package github.com/donyori/gocorenlp/model for details.
	doc := new(pb.Document)

	// Annotate the text with the specified annotators
	// and store the result in doc.
	err := client.AnnotateString(text, annotators, doc)
	if err != nil {
		panic(err) // handle error
	}

	// Print some annotation results.
	fmt.Println("Original text:", doc.GetText())
	fmt.Println("+--------+-----+--------+")
	fmt.Println("| Word   | POS | Lemma  |")
	fmt.Println("+--------+-----+--------+")
	for _, token := range doc.GetSentence()[0].GetToken() {
		fmt.Printf(
			"| %-7s| %-4s| %-7s|\n",
			token.GetWord(),
			token.GetPos(),
			token.GetLemma(),
		)
	}
	fmt.Println("+--------+-----+--------+")
}

It outputs:

Original text: The quick brown fox jumped over the lazy dog.
+--------+-----+--------+
| Word   | POS | Lemma  |
+--------+-----+--------+
| The    | DT  | the    |
| quick  | JJ  | quick  |
| brown  | JJ  | brown  |
| fox    | NN  | fox    |
| jumped | VBD | jump   |
| over   | IN  | over   |
| the    | DT  | the    |
| lazy   | JJ  | lazy   |
| dog    | NN  | dog    |
| .      | .   | .      |
+--------+-----+--------+
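
To make the request behind such an annotation call more concrete, here is a minimal sketch of how a request URL for the CoreNLP server's HTTP API could be assembled with the standard library alone. It assumes the server reads its settings from a JSON object in the properties query parameter and receives the text as the body of an HTTP POST, as the CoreNLP server documentation describes. annotateURL and propertiesOf are hypothetical helpers that are not part of gocorenlp, and the sketch asks for JSON output rather than the ProtoBuf serialization this library requests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// annotateURL returns base with a "properties" query parameter that
// carries the annotators and the output format as a JSON object.
// It is a hypothetical helper for illustration, not a gocorenlp API.
func annotateURL(base, annotators, outputFormat string) (string, error) {
	props, err := json.Marshal(map[string]string{
		"annotators":   annotators,
		"outputFormat": outputFormat,
	})
	if err != nil {
		return "", err
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("properties", string(props))
	u.RawQuery = q.Encode() // percent-encodes the JSON object
	return u.String(), nil
}

// propertiesOf decodes the "properties" parameter of rawURL again,
// which is handy for checking what would be sent.
func propertiesOf(rawURL string) (map[string]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	var m map[string]string
	err = json.Unmarshal([]byte(u.Query().Get("properties")), &m)
	return m, err
}

func main() {
	u, err := annotateURL("http://127.0.0.1:9000/", "tokenize,ssplit,pos,lemma", "json")
	if err != nil {
		panic(err) // handle error
	}
	// The text to annotate would be sent as the body of an HTTP POST to u.
	fmt.Println(u)
}
```

In practice the client package takes care of all this; the sketch is only meant to show roughly what travels over the wire.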

2. Create a client with custom settings

The previous example uses the default client, which can only connect to the CoreNLP server on 127.0.0.1:9000.

If you want to use a CoreNLP server elsewhere, you need to create a new client.

To create a client, use the function New. The supported options are listed in the struct Options.

Here is an example snippet:

// This snippet needs the imports "time" and "github.com/donyori/gocorenlp/client".
c, err := client.New(&client.Options{
	Hostname:   "localhost", // Set the hostname here. If omitted, "127.0.0.1" is used.
	Port:       8080,        // Set the port number here. If omitted, 9000 is used.
	StatusPort: 8081,        // Set the port number of the status server here. If omitted, it is the same as Port.

	ClientTimeout: time.Second * 15,      // Set a timeout for each request here.
	Charset:       "utf-8",               // Set the charset of your text here. If omitted, "utf-8" is used.
	Annotators:    "tokenize,ssplit,pos", // Set the default annotators here.

	// Set the username and password here
	// if your server requires a basic auth.
	Username: "Alice",
	Password: "Alice's password",

	// If your server has a server ID
	// (i.e., server name, set by -server_id),
	// set it here.
	ServerID: "CoreNLPServer",
})
if err != nil {
	panic(err) // handle error
}
// Now you can work with the new client c.
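
The fallback rules stated in the comments above (hostname "127.0.0.1", port 9000, status port equal to the port) can be pictured with a small sketch. The type serverAddr and the function withDefaults are hypothetical names invented for this illustration; they only mirror the documented defaults and are not the library's implementation.

```go
package main

import "fmt"

// serverAddr is a hypothetical stand-in for the address-related
// fields of client.Options; it exists only for this sketch.
type serverAddr struct {
	Hostname   string
	Port       uint16
	StatusPort uint16
}

// withDefaults fills in omitted (zero-valued) fields following the
// rules documented for client.Options.
func withDefaults(a serverAddr) serverAddr {
	if a.Hostname == "" {
		a.Hostname = "127.0.0.1"
	}
	if a.Port == 0 {
		a.Port = 9000
	}
	if a.StatusPort == 0 {
		a.StatusPort = a.Port // the status server shares the main port
	}
	return a
}

func main() {
	fmt.Printf("%+v\n", withDefaults(serverAddr{}))
	// prints: {Hostname:127.0.0.1 Port:9000 StatusPort:9000}
	fmt.Printf("%+v\n", withDefaults(serverAddr{Hostname: "localhost", Port: 8080}))
	// prints: {Hostname:localhost Port:8080 StatusPort:8080}
}
```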

3. Check status and stop server

You can check if the server is online (liveness) and ready to accept connections (readiness) using functions/methods Live and Ready:

if err := client.Live(); err == nil {
	fmt.Println("Server is live.")
} else {
	fmt.Println("Server is offline.")
}

if err := client.Ready(); err == nil {
	fmt.Println("Server is ready to accept connections.")
} else {
	fmt.Println("Server is not ready.")
}
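
For readers curious what such a check amounts to, the following is a rough sketch of a liveness and readiness probe over plain HTTP. It assumes the server exposes /live and /ready endpoints that answer 200 OK when the respective condition holds, as the CoreNLP server documentation describes. probe and newFakeServer are hypothetical helpers, and the fake server (live but not ready) is invented so the example runs without a real CoreNLP server; it is not how gocorenlp implements Live and Ready.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe sends GET base+path and returns an error unless the server
// answers with 200 OK within the timeout.
func probe(base, path string) error {
	c := &http.Client{Timeout: 5 * time.Second}
	resp, err := c.Get(base + path)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: unexpected status %s", path, resp.Status)
	}
	return nil
}

// newFakeServer starts a local stand-in server that reports itself
// as live but not yet ready. The caller must Close it.
func newFakeServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/live", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "live")
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, "server is not ready", http.StatusServiceUnavailable)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	if err := probe(srv.URL, "/live"); err == nil {
		fmt.Println("Server is live.")
	}
	if err := probe(srv.URL, "/ready"); err != nil {
		fmt.Println("Server is not ready:", err)
	}
}
```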

In addition, you can shut down the server through the client.

To shut down a local server, use the function Shutdown or the client's method ShutdownLocal:

err := client.Shutdown()
if err == nil {
	fmt.Println("Server has been shut down.")
} else {
	panic(err) // handle error
}

To shut down a remote server, you need to provide the shutdown key and use the client's method Shutdown. (If you are not sure what the shutdown key is, see the Stanford CoreNLP server documentation.)
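
As a rough illustration of what a remote shutdown request might look like, the sketch below builds a shutdown URL from a base address and a key. It assumes the server accepts the key as a key query parameter on a /shutdown endpoint, which is how the CoreNLP server documentation describes stopping the server; verify this against your server version. shutdownURL is a hypothetical helper and not part of gocorenlp.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shutdownURL returns the URL of the assumed /shutdown endpoint on
// base, with key passed as the "key" query parameter. Surrounding
// whitespace (such as a trailing newline from a key file) is trimmed.
func shutdownURL(base, key string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/shutdown"
	q := url.Values{}
	q.Set("key", strings.TrimSpace(key))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// "my-key" is a placeholder; a real key comes from the server.
	u, err := shutdownURL("http://example.com:9000", "my-key\n")
	if err != nil {
		panic(err) // handle error
	}
	fmt.Println(u) // prints: http://example.com:9000/shutdown?key=my-key
}
```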


4. Cache annotation results

You can cache the annotation results for future use. To do this, use functions/methods AnnotateRaw or AnnotateStringRaw.

Here is an example snippet to cache the annotation results in a local file:

// Create a file to save the annotation result.
filename := "./annotation.ann"
f, err := os.Create(filename)
if err != nil {
	panic(err) // handle error
}
defer f.Close()

// Annotate the text with the specified annotators
// and store the result in f.
_, err = client.AnnotateStringRaw(text, annotators, f)
if err != nil {
	panic(err) // handle error
}
// Then you can use the annotation by reading it from the file.

AnnotateRaw and AnnotateStringRaw output the response data without parsing it. You can decode that data into the document model with the function DecodeResponseBody in our package model:

doc := new(pb.Document) // specify your document model
err := model.DecodeResponseBody(data, doc) // data is the output of AnnotateRaw or AnnotateStringRaw
if err != nil {
	panic(err) // handle error
}

If you put many annotation results together in a large file, you can decode them using ResponseBodyDecoder:

// Open your annotation result file.
filename := "./annotation.ann"
f, err := os.Open(filename)
if err != nil {
	panic(err) // handle error
}
defer f.Close()

// Create a ResponseBodyDecoder on it.
dec := model.NewResponseBodyDecoder(f)

var doc pb.Document // specify your document model
// Decode the annotation results until EOF.
for {
	err := dec.Decode(&doc)
	if err != nil {
		if errors.Is(err, io.EOF) {
			break
		}
		panic(err) // handle error
	}
	// Work with doc.
}
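
To give an intuition for how several results can sit back to back in one file and still be separated again, here is a minimal sketch of length-delimited framing using only the standard library. It assumes each result is stored as a message preceded by its length as a varint, the framing that CoreNLP's ProtoBuf serializer is understood to use; that is an assumption of this sketch, not a statement about how ResponseBodyDecoder works internally. writeFrame, readFrames, roundTrip, and maxFrameSize are hypothetical names, and the messages are plain strings rather than real ProtoBuf data.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxFrameSize guards against huge allocations on corrupt input.
const maxFrameSize = 64 << 20 // 64 MiB, an arbitrary limit for this sketch

// writeFrame writes msg to w, prefixed with its length as a varint.
func writeFrame(w io.Writer, msg []byte) error {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(msg)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err := w.Write(msg)
	return err
}

// readFrames reads length-prefixed frames from r until a clean EOF.
func readFrames(r io.Reader) ([][]byte, error) {
	br := bufio.NewReader(r)
	var frames [][]byte
	for {
		size, err := binary.ReadUvarint(br)
		if err != nil {
			if errors.Is(err, io.EOF) {
				return frames, nil // no more frames
			}
			return frames, err
		}
		if size > maxFrameSize {
			return frames, fmt.Errorf("frame of %d bytes exceeds limit", size)
		}
		msg := make([]byte, size)
		if _, err := io.ReadFull(br, msg); err != nil {
			return frames, fmt.Errorf("truncated frame: %w", err)
		}
		frames = append(frames, msg)
	}
}

// roundTrip writes msgs as consecutive frames into memory and reads
// them back, returning the recovered messages.
func roundTrip(msgs ...string) ([]string, error) {
	var buf bytes.Buffer
	for _, m := range msgs {
		if err := writeFrame(&buf, []byte(m)); err != nil {
			return nil, err
		}
	}
	frames, err := readFrames(&buf)
	if err != nil {
		return nil, err
	}
	out := make([]string, len(frames))
	for i, f := range frames {
		out[i] = string(f)
	}
	return out, nil
}

func main() {
	got, err := roundTrip("first result", "second result")
	if err != nil {
		panic(err) // handle error
	}
	fmt.Println(len(got), got) // prints: 2 [first result second result]
}
```

The length prefix is what lets a reader stop at the right byte without knowing anything about the message contents.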

5. Annotation data model

The CoreNLP server provides several forms to present annotation results, such as JSON, XML, text, and serialized formats (including ProtoBuf). (See this page for details.)

Our client asks the CoreNLP server to serialize the results in Protocol Buffers (ProtoBuf).

At the current stage, we provide the models supporting CoreNLP 3.6.0, and 4.0.0 to 4.5.6. These models are organized into subpackages of model named in the form

github.com/donyori/gocorenlp/model/vX.Y.Z-abcdefabcdef/pb

where vX.Y.Z-abcdefabcdef means:

  • X.Y.Z is the version of CoreNLP.
  • abcdefabcdef is a 12-character prefix of the commit hash of the retrieved ProtoBuf file in the Stanford CoreNLP project.

For example, the model for CoreNLP 4.5.0 is in

github.com/donyori/gocorenlp/model/v4.5.0-45b47e245c36/pb
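
The naming rule can be expressed in a few lines of Go. The helper modelImportPath below is hypothetical and written only to illustrate the scheme; the version and commit hash in main are the ones documented for the CoreNLP 4.5.0 model.

```go
package main

import (
	"errors"
	"fmt"
)

// modelRoot is the import path of the model package.
const modelRoot = "github.com/donyori/gocorenlp/model"

// modelImportPath builds the import path of a model subpackage from
// the CoreNLP version (X.Y.Z) and the commit hash of the ProtoBuf
// file, using the first 12 characters of the hash.
func modelImportPath(version, commitHash string) (string, error) {
	if len(commitHash) < 12 {
		return "", errors.New("commit hash must have at least 12 characters")
	}
	return fmt.Sprintf("%s/v%s-%s/pb", modelRoot, version, commitHash[:12]), nil
}

func main() {
	p, err := modelImportPath("4.5.0", "45b47e245c367663bba2e81a26ea7c29262ad0d8")
	if err != nil {
		panic(err) // handle error
	}
	fmt.Println(p)
	// prints: github.com/donyori/gocorenlp/model/v4.5.0-45b47e245c36/pb
}
```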

See the documentation of package model for details.

The following table shows the correspondence between models and CoreNLP versions:

Model Package                  CoreNLP Version
model/v3.6.0-29765338a2e8/pb   3.6.0
model/v4.0.0-2b3dd38abe00/pb   4.0.0
model/v4.1.0-a1427196ba6e/pb   4.1.0
model/v4.2.0-3ad83fc2e42e/pb   4.2.0
model/v4.2.1-d8d09b2c81a5/pb   4.2.1, 4.2.2
model/v4.3.0-f885cd198767/pb   4.3.0, 4.3.1, 4.3.2
model/v4.4.0-e90f30f13c40/pb   4.4.0
model/v4.5.0-45b47e245c36/pb   4.5.0, 4.5.1
model/v4.5.2-9c3dfee5af50/pb   4.5.2
model/v4.5.3-5250f9faf9f1/pb   4.5.3, 4.5.4
model/v4.5.5-f1b929e47a57/pb   4.5.5
model/v4.5.6-eb50467fa8e3/pb   4.5.6

(CoreNLP 4.2.1 and 4.2.2 use an identical ProtoBuf model. The same is true within each of these groups: 4.3.0, 4.3.1, and 4.3.2; 4.5.0 and 4.5.1; 4.5.3 and 4.5.4.)
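
If you need to pick a model package programmatically, for instance in a build or configuration script, the correspondence above can be captured as a simple lookup. The map modelByVersion and the function modelFor are hypothetical names for this sketch; the entries are copied from the table.

```go
package main

import "fmt"

// modelByVersion maps a CoreNLP version to its model package,
// relative to github.com/donyori/gocorenlp.
var modelByVersion = map[string]string{
	"3.6.0": "model/v3.6.0-29765338a2e8/pb",
	"4.0.0": "model/v4.0.0-2b3dd38abe00/pb",
	"4.1.0": "model/v4.1.0-a1427196ba6e/pb",
	"4.2.0": "model/v4.2.0-3ad83fc2e42e/pb",
	"4.2.1": "model/v4.2.1-d8d09b2c81a5/pb",
	"4.2.2": "model/v4.2.1-d8d09b2c81a5/pb",
	"4.3.0": "model/v4.3.0-f885cd198767/pb",
	"4.3.1": "model/v4.3.0-f885cd198767/pb",
	"4.3.2": "model/v4.3.0-f885cd198767/pb",
	"4.4.0": "model/v4.4.0-e90f30f13c40/pb",
	"4.5.0": "model/v4.5.0-45b47e245c36/pb",
	"4.5.1": "model/v4.5.0-45b47e245c36/pb",
	"4.5.2": "model/v4.5.2-9c3dfee5af50/pb",
	"4.5.3": "model/v4.5.3-5250f9faf9f1/pb",
	"4.5.4": "model/v4.5.3-5250f9faf9f1/pb",
	"4.5.5": "model/v4.5.5-f1b929e47a57/pb",
	"4.5.6": "model/v4.5.6-eb50467fa8e3/pb",
}

// modelFor returns the model package for the given CoreNLP version
// and reports whether the version is listed in the table.
func modelFor(version string) (string, bool) {
	p, ok := modelByVersion[version]
	return p, ok
}

func main() {
	for _, v := range []string{"4.5.4", "4.3.2", "3.9.2"} {
		if p, ok := modelFor(v); ok {
			fmt.Printf("CoreNLP %s -> %s\n", v, p)
		} else {
			fmt.Printf("CoreNLP %s -> no bundled model\n", v)
		}
	}
}
```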

Our library also accepts a ProtoBuf model that you generate yourself. You can retrieve the CoreNLP ProtoBuf file (edu.stanford.nlp.pipeline.CoreNLP.proto) from your CoreNLP JAR file (usually stanford-corenlp-X.Y.Z-sources.jar), edit it, and compile it to a .go file. Then pass your Document struct to our API to make it work with your CoreNLP server:

doc := new(mymodel.Document) // mymodel.Document is the document struct compiled by you
err := client.AnnotateString(text, annotators, doc)
if err != nil {
	panic(err) // handle error
}

You can also retrieve the CoreNLP ProtoBuf file from its GitHub repository.

For how to compile ProtoBuf to Go, see this tutorial.


For more documentation about this library, see pkg.go.dev.

License

The GNU Affero General Public License 3.0 (AGPL-3.0) - Yuan Gao. See the LICENSE file for details.

Contact

You can contact me by email: <donyoridoyodoyo@outlook.com>.

Documentation

Overview

Package gocorenlp provides a Go (Golang) client for Stanford CoreNLP server.

Please start with its subpackage client.

Directories

Path                           Synopsis
client                         Package client provides an HTTP client for the Stanford CoreNLP server.
errors                         Package errors defines some error types and provides functions to identify the type of specified error.
internal/pbtest                Package pbtest provides functions for testing ProtoBuf models.
model                          Package model defines structures for Stanford CoreNLP in its subpackages and provides functions to parse them from ProtoBuf wire encoding.
model/v3.6.0-29765338a2e8      Package v3_6_0_29765338a2e8 corresponds to Stanford CoreNLP 3.6.0, with commit hash 29765338a2e8d82fc8cef5b34a5cf56a69b0669f.
model/v3.6.0-29765338a2e8/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 3.6.0.
model/v4.0.0-2b3dd38abe00      Package v4_0_0_2b3dd38abe00 corresponds to Stanford CoreNLP 4.0.0, with commit hash 2b3dd38abe002bf8407bb22e9fd6d0fa78e7f985.
model/v4.0.0-2b3dd38abe00/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.0.0.
model/v4.1.0-a1427196ba6e      Package v4_1_0_a1427196ba6e corresponds to Stanford CoreNLP 4.1.0, with commit hash a1427196ba6efc79a60279dd95d9bf2baa8a3549.
model/v4.1.0-a1427196ba6e/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.1.0.
model/v4.2.0-3ad83fc2e42e      Package v4_2_0_3ad83fc2e42e corresponds to Stanford CoreNLP 4.2.0, with commit hash 3ad83fc2e42e9658f808e10619abc4f4cbc22069.
model/v4.2.0-3ad83fc2e42e/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.2.0.
model/v4.2.1-d8d09b2c81a5      Package v4_2_1_d8d09b2c81a5 corresponds to Stanford CoreNLP 4.2.1 and 4.2.2, with commit hash d8d09b2c81a5094b83f1275af362329d495e7170.
model/v4.2.1-d8d09b2c81a5/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.2.1 and 4.2.2.
model/v4.3.0-f885cd198767      Package v4_3_0_f885cd198767 corresponds to Stanford CoreNLP 4.3.0, 4.3.1, and 4.3.2, with commit hash f885cd198767219875f08479d3819493bacc8637.
model/v4.3.0-f885cd198767/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.3.0, 4.3.1, and 4.3.2.
model/v4.4.0-e90f30f13c40      Package v4_4_0_e90f30f13c40 corresponds to Stanford CoreNLP 4.4.0, with commit hash e90f30f13c40fc00c41f67d48900c8760453c046.
model/v4.4.0-e90f30f13c40/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.4.0.
model/v4.5.0-45b47e245c36      Package v4_5_0_45b47e245c36 corresponds to Stanford CoreNLP 4.5.0 and 4.5.1, with commit hash 45b47e245c367663bba2e81a26ea7c29262ad0d8.
model/v4.5.0-45b47e245c36/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.5.0 and 4.5.1.
model/v4.5.2-9c3dfee5af50      Package v4_5_2_9c3dfee5af50 corresponds to Stanford CoreNLP 4.5.2, with commit hash 9c3dfee5af50a2279429ae9e010ba51c8f91b351.
model/v4.5.2-9c3dfee5af50/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.5.2.
model/v4.5.3-5250f9faf9f1      Package v4_5_3_5250f9faf9f1 corresponds to Stanford CoreNLP 4.5.3 and 4.5.4, with commit hash 5250f9faf9f192a2350000b7fecf65d1d5b63e13.
model/v4.5.3-5250f9faf9f1/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.5.3 and 4.5.4.
model/v4.5.5-f1b929e47a57      Package v4_5_5_f1b929e47a57 corresponds to Stanford CoreNLP 4.5.5, with commit hash f1b929e47a57d9ff0a17b2d6789fe73705ad24b3.
model/v4.5.5-f1b929e47a57/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.5.5.
model/v4.5.6-eb50467fa8e3      Package v4_5_6_eb50467fa8e3 corresponds to Stanford CoreNLP 4.5.6, with commit hash eb50467fa8e3f44b5aee53394231d2f68e6d130b.
model/v4.5.6-eb50467fa8e3/pb   Package pb provides auto-generated structures for the data set of Stanford CoreNLP 4.5.6.
