backend

package
v0.0.17
Published: Dec 13, 2022 License: Apache-2.0 Imports: 22 Imported by: 0

Documentation

Overview

TODO: should this be a subpackage?

Constants

const (
	// BitbakeMaxBytesLine sets a larger maximum for file line scanning than
	// the default of bufio.MaxScanTokenSize which is sort of small.
	BitbakeMaxBytesLine = 1024 * 1024 * 8 // 8 MiB

	// BitbakeLicensePrefix is the string we look for when trying to find a
	// license.
	BitbakeLicensePrefix = `LICENSE = "`

	// BitbakeLicenseSuffix is the terminating string at the end of the
	// line. We must not include the newline here.
	BitbakeLicenseSuffix = `"`

	// BitbakeFilenameSuffix is the file extension used by the bitbake
	// files.
	BitbakeFilenameSuffix = ".bb"
)
const (
	// CranLicensePrefix is the string we look for when trying to find a
	// license.
	CranLicensePrefix = "License"

	// CranFilename is the filename used by the R metadata files.
	CranFilename = "DESCRIPTION"
)
const (
	// AskalonoConfidenceError is the error string askalono returns for when
	// it doesn't have high enough confidence in a file.
	AskalonoConfidenceError = "Confidence threshold not high enough for any known license"
)
const (
	// PomFilename is the file name used by the pomfiles.
	PomFilename = "pom.xml"
)
const (
	// RegexpMaxBytesLine sets a larger maximum for file line scanning than
	// the default of bufio.MaxScanTokenSize which is sort of small.
	RegexpMaxBytesLine = 1024 * 1024 * 8 // 8 MiB
)
const (
	// ScancodeProgram is the name of the scancode executable.
	ScancodeProgram = "scancode"
)
const (
	// SpdxMaxBytesLine sets a larger maximum for file line scanning than
	// the default of bufio.MaxScanTokenSize which is sort of small.
	SpdxMaxBytesLine = 1024 * 1024 * 8 // 8 MiB
)

Variables

var (
	// ErrInvalidLicenseFormat is an error used in the
	// CranDescriptionFileSubParser when licenses with invalid format are
	// found.
	ErrInvalidLicenseFormat = errors.New("invalid format in License(s)")
)

Functions

func CranDescriptionFileSubParser added in v0.0.7

func CranDescriptionFileSubParser(input string) ([]string, error)

CranDescriptionFileSubParser is used to parse the License field in DESCRIPTION files.

Types

type Askalono added in v0.0.2

type Askalono struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir
	// contains filtered or unexported fields
}

Askalono is based on the Rust askalono project, which uses the Sørensen–Dice coefficient for license comparison. It would be fairly easy, and preferable, to use one of the many existing Go implementations of Sørensen–Dice and have a pure Go solution. However, it is useful to have at least one backend that execs out to a separate process, and since this one is fairly self-contained, it is a good example to work from before wrapping something more complicated like scancode. See: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
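To make the comparison metric concrete, here is a minimal pure Go sketch of the Sørensen–Dice coefficient. It is not the askalono implementation: the use of character bigrams, the lowercasing, and the whitespace normalization are all assumptions made for illustration, and the function names bigrams and dice are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// bigrams counts the adjacent rune pairs in s after lowercasing it and
// collapsing runs of whitespace into single spaces. (Assumed tokenization.)
func bigrams(s string) map[string]int {
	r := []rune(strings.Join(strings.Fields(strings.ToLower(s)), " "))
	counts := make(map[string]int)
	for i := 0; i+1 < len(r); i++ {
		counts[string(r[i:i+2])]++
	}
	return counts
}

// dice returns 2*|A∩B| / (|A|+|B|), where A and B are the bigram multisets
// of a and b. It returns 0 when neither input has any bigrams.
func dice(a, b string) float64 {
	x, y := bigrams(a), bigrams(b)
	total, overlap := 0, 0
	for _, n := range x {
		total += n
	}
	for k, n := range y {
		total += n
		// The multiset intersection keeps the smaller of the two counts.
		if m := x[k]; m < n {
			overlap += m
		} else {
			overlap += n
		}
	}
	if total == 0 {
		return 0
	}
	return 2 * float64(overlap) / float64(total)
}

func main() {
	fmt.Println(dice("MIT License", "mit   license")) // identical after normalizing
	fmt.Println(dice("night", "nacht"))               // one shared bigram out of eight
}
```

A score of 1 means the two bigram multisets are identical, which matches the "1.00 is a perfect match" convention used by the Score field in AskalonoResult.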

func (*Askalono) ScanPath added in v0.0.2

func (obj *Askalono) ScanPath(ctx context.Context, path safepath.Path, info *interfaces.Info) (*interfaces.Result, error)

func (*Askalono) Setup added in v0.0.2

func (obj *Askalono) Setup(ctx context.Context) error

func (*Askalono) String added in v0.0.2

func (obj *Askalono) String() string

type AskalonoLicense added in v0.0.2

type AskalonoLicense struct {
	// Name is the SPDX name of the license found.
	Name string `json:"name"`

	// Kind is some sort of license tag. So far I've found "original".
	Kind string `json:"kind"`

	// Aliases is probably aliases for this license. I've not found this
	// output anywhere atm, so I've left it as an interface.
	Aliases []interface{} `json:"aliases"`
}

AskalonoLicense is the format of the license struct returned by askalono.

type AskalonoOutput added in v0.0.2

type AskalonoOutput struct {
	// Path is an absolute file path to the file being scanned.
	Path string `json:"path"`

	// Result specifies what it found.
	Result *AskalonoResultContaining `json:"result"`

	// Error is a string returned instead of Result on askalono error.
	Error string
}

AskalonoOutput is modelled after the askalono output format.

example:

{
	"path": "/home/ANT.AMAZON.COM/purple/code/license-finder-repo/spdx.go",
	"result": {
		"score": 0.9310345,
		"license": {
			"name": "MIT",
			"kind":"original",
			"aliases": []
		},
		"containing": [
			{
				"score":0.993865,
				"license": {
					"name":"MIT",
					"kind":"original",
					"aliases": []
				},
				"line_range":[17,26]
			}
		]
	}
}
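The example above can be decoded with encoding/json. The sketch below redeclares the documented structs locally so that it is self-contained; the decode helper and the file path in the sample are hypothetical. Note that the embedded *AskalonoResult fields are flattened by encoding/json, so "score" and "license" are read from the same JSON object as "containing" or "line_range".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types, so this sketch compiles on its own.
type AskalonoLicense struct {
	Name    string        `json:"name"`
	Kind    string        `json:"kind"`
	Aliases []interface{} `json:"aliases"`
}

type AskalonoResult struct {
	Score   float64          `json:"score"`
	License *AskalonoLicense `json:"license"`
}

type AskalonoResultRanged struct {
	*AskalonoResult
	LineRangeRaw []int64 `json:"line_range"`
}

type AskalonoResultContaining struct {
	*AskalonoResult
	Containing []*AskalonoResultRanged `json:"containing"`
}

type AskalonoOutput struct {
	Path   string                    `json:"path"`
	Result *AskalonoResultContaining `json:"result"`
	Error  string
}

// decode parses one askalono-style JSON document. (Hypothetical helper.)
func decode(data []byte) (*AskalonoOutput, error) {
	out := &AskalonoOutput{}
	if err := json.Unmarshal(data, out); err != nil {
		return nil, err
	}
	return out, nil
}

// sample is a shortened version of the example, with a made-up path.
const sample = `{
	"path": "/tmp/example/spdx.go",
	"result": {
		"score": 0.9310345,
		"license": {"name": "MIT", "kind": "original", "aliases": []},
		"containing": [
			{
				"score": 0.993865,
				"license": {"name": "MIT", "kind": "original", "aliases": []},
				"line_range": [17, 26]
			}
		]
	}
}`

func main() {
	out, err := decode([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(out.Path, out.Result.License.Name, out.Result.Score)
	for _, c := range out.Result.Containing {
		fmt.Println(c.License.Name, c.Score, c.LineRangeRaw)
	}
}
```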

type AskalonoResult added in v0.0.2

type AskalonoResult struct {
	// Score is the matching score found. A 1.00 is a perfect match.
	Score float64 `json:"score"`

	// License points to the license information attached with this find.
	License *AskalonoLicense `json:"license"`
}

AskalonoResult is the generic result format returned by askalono. It is usually augmented by an additional field, as in AskalonoResultRanged or AskalonoResultContaining.

type AskalonoResultContaining added in v0.0.2

type AskalonoResultContaining struct {
	*AskalonoResult

	// Containing has some further information about the output. It isn't
	// always populated, and I think it is only used when --optimize is used
	// *and* it didn't find an exact match. It lists all the other matches
	// it found.
	Containing []*AskalonoResultRanged `json:"containing"`
}

AskalonoResultContaining is a version of the AskalonoResult that also contains a list of additional AskalonoResultRanged matches.

type AskalonoResultRanged added in v0.0.2

type AskalonoResultRanged struct {
	*AskalonoResult

	// LineRangeRaw specifies where the match was found.
	LineRangeRaw []int64 `json:"line_range"`
}

AskalonoResultRanged is a version of the AskalonoResult that also contains the line range information.

type Bitbake added in v0.0.2

type Bitbake struct {
	Debug bool
	Logf  func(format string, v ...interface{})
}

Bitbake is a license backend for bitbake .bb files, which are very common in the Yocto Project. It uses a trivial string parser to find license lines. This could be improved significantly if people write fancier .bb files, but it should get us 99% of the way there.
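As an illustration of what a trivial string parser over these constants could look like, here is a minimal sketch. It is an assumption about the approach, not the backend's actual code: the findLicenses name is hypothetical, the three constants mirror BitbakeMaxBytesLine, BitbakeLicensePrefix, and BitbakeLicenseSuffix, and the returned values are the raw strings between the quotes with no further interpretation.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

const (
	maxBytesLine  = 1024 * 1024 * 8 // mirrors BitbakeMaxBytesLine
	licensePrefix = `LICENSE = "`   // mirrors BitbakeLicensePrefix
	licenseSuffix = `"`             // mirrors BitbakeLicenseSuffix
)

// findLicenses returns the raw value of every LICENSE = "..." line in data.
func findLicenses(data []byte) ([]string, error) {
	scanner := bufio.NewScanner(bytes.NewReader(data))
	// Raise the per-line limit above bufio.MaxScanTokenSize (64 KiB).
	scanner.Buffer(make([]byte, 0, bufio.MaxScanTokenSize), maxBytesLine)

	var found []string
	for scanner.Scan() {
		line := scanner.Text() // Text never includes the trailing newline
		if len(line) < len(licensePrefix)+len(licenseSuffix) {
			continue
		}
		if !strings.HasPrefix(line, licensePrefix) || !strings.HasSuffix(line, licenseSuffix) {
			continue
		}
		value := strings.TrimSuffix(strings.TrimPrefix(line, licensePrefix), licenseSuffix)
		found = append(found, value)
	}
	return found, scanner.Err()
}

func main() {
	recipe := []byte("SUMMARY = \"demo recipe\"\nLICENSE = \"MIT & GPL-2.0-only\"\n")
	licenses, err := findLicenses(recipe)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Println(licenses)
}
```

A parser this simple misses variants such as LICENSE:append or values that continue across lines, which is the kind of "fancier" file the description refers to.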

func (*Bitbake) ScanData added in v0.0.2

func (obj *Bitbake) ScanData(ctx context.Context, data []byte, info *interfaces.Info) (*interfaces.Result, error)

func (*Bitbake) String added in v0.0.2

func (obj *Bitbake) String() string

type Cran added in v0.0.7

type Cran struct {
	Debug bool
	Logf  func(format string, v ...interface{})
}

Cran is a backend for DESCRIPTION files, which store R package metadata. License names are taken from the License field of the file.
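The following sketch shows one way the License field could be pulled out of a DESCRIPTION file. It is not the code used by this backend or by CranDescriptionFileSubParser: the licenseField and alternatives helpers are hypothetical, continuation lines are ignored, and splitting on "|" relies on the R convention of listing alternative licenses separated by a vertical bar.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// licenseField returns the value of the first "License:" line in data.
// The trailing colon keeps fields such as License_is_FOSS from matching.
func licenseField(data []byte) (string, bool) {
	scanner := bufio.NewScanner(bytes.NewReader(data))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "License:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "License:")), true
		}
	}
	return "", false
}

// alternatives splits a License value on "|" and trims each part.
func alternatives(field string) []string {
	var out []string
	for _, part := range strings.Split(field, "|") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	description := []byte("Package: demo\nVersion: 1.0\nLicense: MIT + file LICENSE | GPL-3\n")
	field, ok := licenseField(description)
	if !ok {
		fmt.Println("no License field found")
		return
	}
	for _, alt := range alternatives(field) {
		fmt.Println(alt)
	}
}
```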

func (*Cran) ScanData added in v0.0.7

func (obj *Cran) ScanData(ctx context.Context, data []byte, info *interfaces.Info) (*interfaces.Result, error)

ScanData extracts license IDs from data and returns the licenses that those IDs identify.

func (*Cran) String added in v0.0.7

func (obj *Cran) String() string

String method returns the name of the backend.

type LicenseClassifier

type LicenseClassifier struct {
	Debug bool
	Logf  func(format string, v ...interface{})

	// XXX: also match with .header files
	// XXX: what default value do we want here?
	// XXX: what exactly does this do?
	IncludeHeaders bool

	// UseDefaultConfidence specifies whether we should use the default
	// confidence threshold that this library seems to use all the time.
	// I've noticed that without it, it misidentifies a lot of things. But
	// with it, it misses some things entirely, even if it incorrectly
	// identifies them.
	UseDefaultConfidence bool

	// SkipZeroResults tells this backend to avoid erroring when we aren't
	// able to determine if a file matches a known license. Since this
	// particular backend is not good at general file identification, and
	// only good at being presented with actual licenses, this is useful if
	// file filtering is not enabled.
	SkipZeroResults bool
}

LicenseClassifier is based on the licenseclassifier project.

func (*LicenseClassifier) ScanPath

func (obj *LicenseClassifier) ScanPath(ctx context.Context, path safepath.Path, info *interfaces.Info) (*interfaces.Result, error)

func (*LicenseClassifier) String

func (obj *LicenseClassifier) String() string

type Pom added in v0.0.5

type Pom struct {
	Debug bool
	Logf  func(format string, v ...interface{})
}

Pom is a backend for POM (Project Object Model) files. A POM is an XML file used by the Maven project, normally named pom.xml. License names are obtained by parsing that file.

func (*Pom) ScanData added in v0.0.5

func (obj *Pom) ScanData(ctx context.Context, data []byte, info *interfaces.Info) (*interfaces.Result, error)

ScanData extracts license IDs from data and returns the licenses that those IDs identify.

func (*Pom) String added in v0.0.5

func (obj *Pom) String() string

String method returns the name of the backend.

type PomLicenses added in v0.0.5

type PomLicenses struct {
	// Names holds the license names read from pom.xml.
	Names []string `xml:"licenses>license>name"`
}

PomLicenses stores the license names found in the licenses element of a pom.xml file.
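The struct tag licenses>license>name tells encoding/xml to collect the text of every name element nested under licenses and license. A minimal sketch of that decoding follows; the parsePom helper and the sample document are hypothetical, and the struct is redeclared locally so the example is self-contained.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// PomLicenses is a local copy of the documented struct.
type PomLicenses struct {
	Names []string `xml:"licenses>license>name"`
}

// parsePom returns the license names declared in a pom.xml document.
func parsePom(data []byte) ([]string, error) {
	var pom PomLicenses
	if err := xml.Unmarshal(data, &pom); err != nil {
		return nil, err
	}
	return pom.Names, nil
}

// samplePom is a made-up, heavily shortened pom.xml.
const samplePom = `<project>
  <modelVersion>4.0.0</modelVersion>
  <licenses>
    <license>
      <name>Apache License, Version 2.0</name>
      <url>https://www.apache.org/licenses/LICENSE-2.0.txt</url>
    </license>
    <license>
      <name>MIT License</name>
    </license>
  </licenses>
</project>`

func main() {
	names, err := parsePom([]byte(samplePom))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```

The names come back as free-form strings, so mapping them to license IDs is a separate step that this sketch does not attempt.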

type Regexp added in v0.0.2

type Regexp struct {
	*RegexpCore

	// Filename is an absolute path to a file that we will read the patterns
	// from. The struct is described below and an example is available in
	// the examples folder.
	Filename string
}

Regexp is a simple backend that uses regular expressions to find certain license strings. It wraps the RegexpCore backend and adds the file input code.

func (*Regexp) ScanData added in v0.0.2

func (obj *Regexp) ScanData(ctx context.Context, data []byte, info *interfaces.Info) (*interfaces.Result, error)

func (*Regexp) Setup added in v0.0.2

func (obj *Regexp) Setup(ctx context.Context) error

func (*Regexp) String added in v0.0.2

func (obj *Regexp) String() string

type RegexpConfig added in v0.0.2

type RegexpConfig struct {
	// Rules is the list of regexp and license id rules.
	Rules []*RegexpLicenseRule `json:"rules"`

	// Origin is the SPDX origin string if we want to have a custom
	// namespace for non-SPDX license ID's.
	Origin string `json:"origin"`

	// Comment adds a user friendly comment for this file. We could use it
	// to add a version string or maybe that could be a separate field.
	Comment string `json:"comment"`
}

RegexpConfig is the structure of the pattern config file.
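Based only on the JSON tags above, a pattern config file could look like the document in the sketch below. The rules shown are invented for illustration and are not the ones shipped in the examples folder; loadConfig is a hypothetical helper that decodes the file and checks that every pattern compiles.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Local copies of the documented types, so this sketch compiles on its own.
type RegexpLicenseRule struct {
	Pattern string `json:"pattern"`
	ID      string `json:"id"`
}

type RegexpConfig struct {
	Rules   []*RegexpLicenseRule `json:"rules"`
	Origin  string               `json:"origin"`
	Comment string               `json:"comment"`
}

// loadConfig decodes a config document and validates each pattern.
func loadConfig(data []byte) (*RegexpConfig, error) {
	cfg := &RegexpConfig{}
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	for i, rule := range cfg.Rules {
		if rule == nil {
			return nil, fmt.Errorf("rule %d is null", i)
		}
		if _, err := regexp.Compile(rule.Pattern); err != nil {
			return nil, fmt.Errorf("rule %d (%s): %w", i, rule.ID, err)
		}
	}
	return cfg, nil
}

// sampleConfig is a made-up config. Backslashes are doubled because the
// patterns are JSON strings.
const sampleConfig = `{
	"comment": "illustrative rules only",
	"origin": "yesiscan.awslabs.github.com",
	"rules": [
		{"pattern": "Licensed under the Apache License,? Version 2\\.0", "id": "Apache-2.0"},
		{"pattern": "(?i)permission is hereby granted, free of charge", "id": "MIT"}
	]
}`

func main() {
	cfg, err := loadConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, rule := range cfg.Rules {
		fmt.Printf("%s: %s\n", rule.ID, rule.Pattern)
	}
}
```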

type RegexpCore added in v0.0.2

type RegexpCore struct {
	Debug bool
	Logf  func(format string, v ...interface{})

	// Rules is a list of regexp license rules.
	Rules []*RegexpLicenseRule

	// Origin is the license field origin which is used if a non-SPDX ID is
	// specified. You can use this blank if you want. These are commonly
	// expressed in "reverse-dns" notation to provide a unique identifier
	// when naming your license. Eg: "yesiscan.awslabs.github.com".
	Origin string

	// MultipleMatch is set to true if you want the same regexp to be
	// allowed to match more than once in the same file. This is useful if
	// you want to be able to pull out every range where the pattern is
	// seen, even if you will keep getting the same license answer. Most of
	// the time you probably want to leave this as false.
	MultipleMatch bool
	// contains filtered or unexported fields
}

RegexpCore is a simple backend that uses regular expressions to find certain license strings. You should probably not use this backend directly, but wrap it with one of the other ones like Regexp.

func (*RegexpCore) ScanData added in v0.0.2

func (obj *RegexpCore) ScanData(ctx context.Context, data []byte, info *interfaces.Info) (*interfaces.Result, error)

func (*RegexpCore) Setup added in v0.0.2

func (obj *RegexpCore) Setup(ctx context.Context) error

func (*RegexpCore) String added in v0.0.2

func (obj *RegexpCore) String() string

type RegexpLicenseRule added in v0.0.2

type RegexpLicenseRule struct {
	// Pattern is the expression we want to match. This uses the stock
	// golang regexp engine.
	Pattern string `json:"pattern"`

	// ID is the license ID we should use when the above pattern matches. It
	// should be an SPDX ID, but other strings are supported, they just
	// won't be treated as SPDX if they aren't in our database of allowed
	// license identifiers.
	ID string `json:"id"`
}

RegexpLicenseRule represents the data required for a regexp license rule. As a reminder, Go strings can be quoted with backticks, which is particularly helpful when entering regular expressions into structs.
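The sketch below shows rules written with backtick-quoted patterns and a simple matcher built on the standard regexp package. It is an assumption about how such rules might be applied, not the RegexpCore implementation: the rule and match types and the scan function are hypothetical, and the multiple flag only approximates the documented MultipleMatch behaviour by switching between the first match and all matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a local stand-in for RegexpLicenseRule.
type rule struct {
	Pattern string
	ID      string
}

// match records which license ID matched and at which byte offsets.
type match struct {
	ID         string
	Start, End int
}

// scan applies every rule to data. With multiple set to false, each rule
// reports at most its first match.
func scan(data []byte, rules []rule, multiple bool) ([]match, error) {
	limit := 1
	if multiple {
		limit = -1 // a negative limit means no limit in FindAllIndex
	}
	var out []match
	for _, r := range rules {
		re, err := regexp.Compile(r.Pattern)
		if err != nil {
			return nil, fmt.Errorf("rule %s: %w", r.ID, err)
		}
		for _, loc := range re.FindAllIndex(data, limit) {
			out = append(out, match{ID: r.ID, Start: loc[0], End: loc[1]})
		}
	}
	return out, nil
}

func main() {
	// Backticks avoid doubling the backslashes in \s and \b.
	rules := []rule{
		{Pattern: `SPDX-License-Identifier:\s*MIT\b`, ID: "MIT"},
	}
	data := []byte("// SPDX-License-Identifier: MIT\npackage a\n// SPDX-License-Identifier: MIT\n")

	for _, multiple := range []bool{false, true} {
		matches, err := scan(data, rules, multiple)
		if err != nil {
			fmt.Println("scan error:", err)
			return
		}
		fmt.Println(multiple, len(matches))
	}
}
```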

type Scancode added in v0.0.2

type Scancode struct {
	Debug bool
	Logf  func(format string, v ...interface{})
}

Scancode is based on the Python scancode project. It uses that project's heuristics to identify licenses and other things. It would probably be fairly easy to take the core license identification heuristic and implement it in pure Go. At the moment, this backend is not as efficient as it could be because it spawns many slow, separate Python processes to scan. Note that the project spells its name ScanCode, but this package uses Scancode.

func (*Scancode) ScanPath added in v0.0.2

func (obj *Scancode) ScanPath(ctx context.Context, path safepath.Path, info *interfaces.Info) (*interfaces.Result, error)

func (*Scancode) Setup added in v0.0.2

func (obj *Scancode) Setup(ctx context.Context) error

func (*Scancode) String added in v0.0.2

func (obj *Scancode) String() string

type ScancodeFileResult added in v0.0.2

type ScancodeFileResult struct {
	// Path is the absolute path of the file scanned. It's only absolute if
	// we run scancode with the --full-root arg which we do.
	Path string `json:"path"`

	// Type is the type of file scanned. The most common string result is
	// "file".
	Type string `json:"type"`

	// Licenses is the list of licenses found.
	Licenses []*ScancodeLicenseResult `json:"licenses"`

	// LicenseExpressions is the list of licenses found. I think these are
	// all SPDX ID's, or rather the scancode version of this.
	LicenseExpressions []string `json:"license_expressions"`

	// PercentageOfLicenseText is some sort of a scoring result. It's not
	// clear if it's a measure of "what percentage of this file contains
	// this license?" vs. "how accurate is the match to this license?". In
	// any case, remember to divide by 100 if you want the more useful ratio
	// value.
	PercentageOfLicenseText float64 `json:"percentage_of_license_text"`

	// Copyrights is unused here at this time.
	Copyrights []interface{} `json:"copyrights"`

	// Holders is unused here at this time.
	Holders []interface{} `json:"holders"`

	// Authors is unused here at this time.
	Authors []interface{} `json:"authors"`

	// Summary is unused here at this time.
	Summary interface{} `json:"summary"`

	// ScanErrors is unused here at this time.
	// XXX: maybe we should check this?
	ScanErrors []interface{} `json:"scan_errors"`
}

ScancodeFileResult is the struct returned for each scanned file.

type ScancodeLicenseResult added in v0.0.2

type ScancodeLicenseResult struct {

	// Key is the license ID; I think this is the scancode version of the
	// SPDX ID.
	Key string `json:"key"`

	// Score is the confidence score of the match, I think. It is a
	// percentage as well, so divide by 100 for the more useful ratio.
	Score float64 `json:"score"`

	// Name is the long name of the license, eg: "Apache License 2.0".
	Name string `json:"name"`

	// ShortName is the short name of the license, eg: "Apache 2.0".
	ShortName string `json:"short_name"`

	// Category is the license category, eg: "Permissive". We don't use this
	// classification in this project.
	Category string `json:"category"`

	// IsException needs to be defined here. TODO: what is this?
	IsException bool `json:"is_exception"`

	// IsUnknown needs to be defined here. TODO: what is this?
	IsUnknown bool `json:"is_unknown"`

	// Owner is the author of the license. Eg: "Apache Software Foundation".
	// TODO: Is this correct?
	Owner string `json:"owner"`

	// HomepageUrl is the home of the license.
	HomepageUrl string `json:"homepage_url"`

	// TextUrl is the location where you can find the license.
	TextUrl string `json:"text_url"`

	// ReferenceUrl is the reference link in the aboutcode.org database.
	// TODO: Is this correct?
	ReferenceUrl string `json:"reference_url"`

	// ScancodeTextUrl is the location of the scancode text for this
	// license. Eg: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE
	ScancodeTextUrl string `json:"scancode_text_url"`

	// ScancodeDataUrl is the location of the scancode data for this
	// license. Eg: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml
	ScancodeDataUrl string `json:"scancode_data_url"`

	// SpdxLicenseKey is the SPDX ID of this license. Eg: "Apache-2.0".
	SpdxLicenseKey string `json:"spdx_license_key"`

	// SpdxUrl is the location of the SPDX page for this license. Eg:
	// https://spdx.org/licenses/Apache-2.0
	SpdxUrl string `json:"spdx_url"`

	// StartLine is the line number for the start of the license match.
	StartLine int64 `json:"start_line"`

	// EndLine is the line number for the end of the license match.
	EndLine int64 `json:"end_line"`

	// MatchedRule is a big struct describing how this match was done. This
	// is currently not being used here.
	MatchedRule interface{} `json:"matched_rule"`
}

ScancodeLicenseResult is the struct returned for each license entry in the ScancodeFileResult.

type ScancodeOutput added in v0.0.2

type ScancodeOutput struct {
	// Headers are some output about scancode itself mostly.
	Headers interface{} `json:"headers"`

	// Summary is a top-level overview. We can build something similar
	// ourselves, so we don't need this.
	Summary interface{} `json:"summary"`

	// Files is the list of per-file results, one for each file scanned.
	Files []*ScancodeFileResult `json:"files"`
}

ScancodeOutput is modelled after the scancode output format.

example:

{
 "headers": [
   {
     "tool_name": "scancode-toolkit",
     "tool_version": "30.1.0",
     "options": {
       "input": [
         "/home/ANT.AMAZON.COM/purple/code/yesiscan/COPYING"
       ],
       "--copyright": true,
       "--full-root": true,
       "--json-pp": "-",
       "--license": true,
       "--only-findings": true,
       "--summary-with-details": true
     },
     "notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
     "start_timestamp": "2022-05-16T173951.395171",
     "end_timestamp": "2022-05-16T173953.393255",
     "output_format_version": "1.0.0",
     "duration": 1.9980971813201904,
     "message": null,
     "errors": [],
     "extra_data": {
       "spdx_license_list_version": "3.14",
       "OUTDATED": "WARNING: Outdated ScanCode Toolkit version! You are using an outdated version of ScanCode Toolkit: 30.1.0 released on: 2021-09-24. A new version is available with important improvements including bug and security fixes, updated license, copyright and package detection, and improved scanning accuracy. Please download and install the latest version of ScanCode. Visit https://github.com/nexB/scancode-toolkit/releases for details.",
       "files_count": 1
     }
   }
 ],
 "summary": {
   "license_expressions": [
     {
       "value": "apache-2.0",
       "count": 1
     }
   ],
   "copyrights": [
     {
       "value": null,
       "count": 1
     }
   ],
   "holders": [
     {
       "value": null,
       "count": 1
     }
   ],
   "authors": [
     {
       "value": null,
       "count": 1
     }
   ]
 },
 "files": [
   {
     "path": "/home/ANT.AMAZON.COM/purple/code/yesiscan/COPYING",
     "type": "file",
     "licenses": [
       {
         "key": "apache-2.0",
         "score": 100,
         "name": "Apache License 2.0",
         "short_name": "Apache 2.0",
         "category": "Permissive",
         "is_exception": false,
         "is_unknown": false,
         "owner": "Apache Software Foundation",
         "homepage_url": "http://www.apache.org/licenses/",
         "text_url": "http://www.apache.org/licenses/LICENSE-2.0",
         "reference_url": "https://scancode-licensedb.aboutcode.org/apache-2.0",
         "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE",
         "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml",
         "spdx_license_key": "Apache-2.0",
         "spdx_url": "https://spdx.org/licenses/Apache-2.0",
         "start_line": 2,
         "end_line": 202,
         "matched_rule": {
           "identifier": "apache-2.0.LICENSE",
           "license_expression": "apache-2.0",
           "licenses": [
             "apache-2.0"
           ],
           "referenced_filenames": [],
           "is_license_text": true,
           "is_license_notice": false,
           "is_license_reference": false,
           "is_license_tag": false,
           "is_license_intro": false,
           "has_unknown": false,
           "matcher": "1-hash",
           "rule_length": 1581,
           "matched_length": 1581,
           "match_coverage": 100,
           "rule_relevance": 100
         }
       }
     ],
     "license_expressions": [
       "apache-2.0"
     ],
     "percentage_of_license_text": 100,
     "copyrights": [],
     "holders": [],
     "authors": [],
     "summary": {
       "license_expressions": [
         {
           "value": "apache-2.0",
           "count": 1
         }
       ],
       "copyrights": [
         {
           "value": null,
           "count": 1
         }
       ],
       "holders": [
         {
           "value": null,
           "count": 1
         }
       ],
       "authors": [
         {
           "value": null,
           "count": 1
         }
       ]
     },
     "scan_errors": []
   }
 ]
}
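To show how output of this shape might be consumed, the sketch below decodes only the fields it needs and converts the percentage score into a ratio, as the Score comment suggests. The trimmed-down structs, the finding type, the findings helper, and the path in the sample are all hypothetical; keys that the structs do not declare, such as headers and summary, are simply ignored by encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down stand-ins for ScancodeOutput and its nested types.
type licenseResult struct {
	Key            string  `json:"key"`
	Score          float64 `json:"score"`
	SpdxLicenseKey string  `json:"spdx_license_key"`
	StartLine      int64   `json:"start_line"`
	EndLine        int64   `json:"end_line"`
}

type fileResult struct {
	Path     string           `json:"path"`
	Licenses []*licenseResult `json:"licenses"`
}

type output struct {
	Files []*fileResult `json:"files"`
}

// finding is a flattened view of one license match.
type finding struct {
	Path      string
	SpdxID    string
	Ratio     float64 // Score divided by 100
	StartLine int64
	EndLine   int64
}

// findings decodes scancode-style JSON and flattens the license matches.
func findings(data []byte) ([]finding, error) {
	var out output
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	var found []finding
	for _, f := range out.Files {
		if f == nil {
			continue
		}
		for _, l := range f.Licenses {
			if l == nil {
				continue
			}
			found = append(found, finding{
				Path:      f.Path,
				SpdxID:    l.SpdxLicenseKey,
				Ratio:     l.Score / 100,
				StartLine: l.StartLine,
				EndLine:   l.EndLine,
			})
		}
	}
	return found, nil
}

// sample is a shortened version of the example, with a made-up path.
const sample = `{
	"headers": [{"tool_name": "scancode-toolkit"}],
	"files": [{
		"path": "/tmp/example/COPYING",
		"type": "file",
		"licenses": [{
			"key": "apache-2.0",
			"score": 100,
			"spdx_license_key": "Apache-2.0",
			"start_line": 2,
			"end_line": 202
		}]
	}]
}`

func main() {
	found, err := findings([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, f := range found {
		fmt.Printf("%s: %s (%.2f) lines %d-%d\n", f.Path, f.SpdxID, f.Ratio, f.StartLine, f.EndLine)
	}
}
```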

type Spdx added in v0.0.2

type Spdx struct {
	Debug bool
	Logf  func(format string, v ...interface{})
}

Spdx is based on the Software Package Data Exchange project. It is built with a slightly objectionable parser as prescribed in the official tools repo.

func (*Spdx) ScanData added in v0.0.2

func (obj *Spdx) ScanData(ctx context.Context, data []byte, info *interfaces.Info) (*interfaces.Result, error)

func (*Spdx) String added in v0.0.2

func (obj *Spdx) String() string
