mediawiki

package
v0.3.2
Published: Mar 1, 2020 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package mediawiki contains custom queries which tap into the MediaWiki API using go-mwclient (gitlab.com/melancholera/go-mwclient).

Bot passwords

Users can log in to their mwclient.Client instance with a bot password. See https://www.mediawiki.org/wiki/Manual:Bot_passwords for more information.

Because most functions in this package require the user to be logged in, users of this package should take care not to expose their passwords online.

Liability

While the author of this package has tried their best to make this package work as intended, responsibility still lies with the user. The author is not, and cannot be held, responsible for any damages, intentional or otherwise, caused by users of this package.

Normalization and denormalization

Normalization and denormalization functions are written so that links in a page can be matched even when they are written in different ways. For instance, `[[Some title]]`, `[[Some_title]]`, `[[:Some title]]`, and `[[_ _:Some _ _ title _ ]]` all refer to the same page titled "Some title", but `[[Some Title]]` and `[[Some title with text]]` do not. The denormalization strategy is somewhat naive: for instance, it does not detect URL-encoded links.

Refer to https://www.mediawiki.org/wiki/Manual:Title.php#Title_structure for more information.

Constants

const (
	NookipediaAPIUrl = "https://nookipedia.com/w/api.php"
	DefaultUserAgent = "[https://gitlab.com/melancholera/sonchou/ sonchou]"
)

Constants that can be reused.

const WikiSpecial = `\^$.|?*+()[]{}`

WikiSpecial lists the characters that are special in regular expressions and that can also appear in MediaWiki titles.

Variables

var (
	WikiTextBlank       = regexp.MustCompile(`[\t _\xA0\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]+`)
	WikiTextBlankOrBidi = regexp.MustCompile(`[\t _\xA0\x{1680}\x{180E}\x{2000}-\x{200B}\x{200E}\x{200F}\x{2028}-\x{202F}\x{205F}\x{3000}]*`)
)

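WikiTextBlank matches a run of one or more blank characters (tabs, spaces, underscores, and Unicode spaces). WikiTextBlankOrBidi matches a possibly empty run of blank characters or bidirectional marks.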

Functions

func BatchUpload

func BatchUpload(folder string, client *mwclient.Client, paramFunc func(string) params.Values) error

BatchUpload uploads all files in folder. paramFunc is a function that takes in a filename and returns a map of upload parameters. If paramFunc is nil, files are uploaded with default options, that is, with the same filename as the local file and with no extra parameters.

Subfolders of folder are ignored. If an error is encountered for some file, BatchUpload prints the error and stack trace to glog.Warning but continues uploading the other files.

paramFunc can override the remote filename by returning a params.Values that has the "filename" parameter. A file is skipped if paramFunc sets "filename" to "" or returns nil.
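The skip rules can be sketched as follows. This is an illustration, not the package's implementation: Values is a local stand-in for params.Values (assumed to be a map of strings), resolveUpload is a hypothetical helper, and treating a missing "filename" key as "keep the local name" is an assumption inferred from the example in this section.

```go
package main

import "fmt"

// Values stands in for params.Values from go-mwclient (assumed here to be a
// map from parameter names to string values).
type Values map[string]string

// resolveUpload is a hypothetical helper that applies the skip rules.
// It returns the remote filename, the parameters, and whether to upload.
func resolveUpload(local string, paramFunc func(string) Values) (string, Values, bool) {
	if paramFunc == nil {
		return local, Values{}, true // default options
	}
	values := paramFunc(local)
	if values == nil {
		return "", nil, false // nil result: skip the file
	}
	remote, set := values["filename"]
	if !set {
		return local, values, true // no override: keep the local name (assumption)
	}
	if remote == "" {
		return "", nil, false // explicit empty filename: skip the file
	}
	return remote, values, true
}

func main() {
	onlyPNG := func(name string) Values {
		if len(name) < 4 || name[len(name)-4:] != ".png" {
			return nil
		}
		return Values{"comment": "batch upload"}
	}
	for _, name := range []string{"Chair WW.png", "notes.txt"} {
		remote, _, ok := resolveUpload(name, onlyPNG)
		fmt.Printf("%q upload=%v remote=%q\n", name, ok, remote)
	}
}
```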

Warnings and glog

This function uses the Google logging module https://github.com/golang/glog for fine-tuned logging. As the function iterates through the folder, it may encounter some problems.

It prints to Info if a file is actually a folder, or if a file has been successfully uploaded.

It prints to Warning if there is an error in uploading some file. Note that it does not return an error immediately.

Example

The following example is what I have used in https://nookipedia.com/wiki/Category:Animal_Crossing:_Wild_World_furniture_items. Do not forget to handle errors.

folder := "folder to open"
client, err := mwclient.New(mediawiki.NookipediaAPIUrl, mediawiki.DefaultUserAgent)
if err != nil {
	err = errors.Wrap(err, "could not instantiate mwclient object")
	return
}

err = client.Login("username", "password")
if err != nil {
	err = errors.Wrap(err, "could not log in")
	return
}

err = mediawiki.BatchUpload(folder, client, func(filename string) params.Values {
	// skip files whose names do not end in "WW.png"
	values := make(params.Values)
	if !strings.HasSuffix(filename, "WW.png") {
		values["filename"] = "" // skip!
	}

	values["comment"] = fmt.Sprintf("Uploaded via %v", mediawiki.DefaultUserAgent)

	// generate the upload text
	text := new(strings.Builder)
	text.WriteString("Image of the ")
	text.WriteString(strings.TrimSuffix(filename, "WW.png"))
	text.WriteString(" from {{WW}}. Taken from [https://animalcrossingwiki.de/acww/katalog/einrichtung Einrichtungsgegenstände] from [https://animalcrossingwiki.de/ Animal Crossing Wiki].\n\n")
	text.WriteString("== Licensing ==\n{{game_sprite}}\n\n[[Category:Animal Crossing: Wild World furniture items]]")
	values["text"] = text.String()

	return values
})

return

func DenormalizeLink

func DenormalizeLink(interwiki, localized, canonical, title string) string

DenormalizeLink creates a regex string that matches links that refer to a specific page with some specific interwiki prefix, localized and canonical namespaces, and title. For example, localized:Fichier, canonical:File, title:Example.png will find [[ : File: Example.png ]], [[Fichier:_Example.png]], and [[File:Example.png|some text]]. Its first match for the above would be " : File: Example.png ", "Fichier:_Example.png", and "File:Example.png".

Note that canonical must be non-empty for categories to be detected. localized may be empty.

func DenormalizeNamespace added in v0.3.2

func DenormalizeNamespace(localized, canonical string) string

DenormalizeNamespace creates a regex string that matches either the localized or canonical namespace. If the two are equal, or if localized is blank, it will denormalize the canonical namespace. Its first match is the namespace.

func DenormalizeTitle added in v0.3.2

func DenormalizeTitle(title string) string

DenormalizeTitle denormalizes a title and returns a string that encodes a regex string. Its first match is the title.

func NormalizeLink

func NormalizeLink(interwiki, localized, canonical, title string) string

NormalizeLink creates the normalized version of a link given an interwiki prefix, localized and canonical namespaces, and a title. It does not add a colon after "[[": it produces, e.g., "[[Category:Something]]" rather than "[[:Category:Something]]".

func NormalizeNamespace added in v0.3.2

func NormalizeNamespace(namespace string) string

NormalizeNamespace turns a namespace into its normalized form. It sets all characters to lowercase and the first character to uppercase.
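The rule described above can be sketched with the standard library alone. normalizeNamespaceSketch is a hypothetical helper written for illustration and may differ from the package's function in edge cases.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// normalizeNamespaceSketch lowercases the namespace and then uppercases
// its first character.
func normalizeNamespaceSketch(namespace string) string {
	lower := strings.ToLower(namespace)
	r, size := utf8.DecodeRuneInString(lower)
	if size == 0 {
		return "" // empty namespace
	}
	return string(unicode.ToUpper(r)) + lower[size:]
}

func main() {
	fmt.Println(normalizeNamespaceSketch("fILE"))     // prints File
	fmt.Println(normalizeNamespaceSketch("CATEGORY")) // prints Category
}
```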

func NormalizeTitle added in v0.3.2

func NormalizeTitle(title string) string

NormalizeTitle normalizes a title by setting its first character to uppercase and replacing all whitespace with a space. For example, NormalizeTitle("New_Year's Day") returns "New Year's Day".
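A minimal sketch of this normalization, assuming that runs of blanks collapse to a single space and that surrounding blanks are trimmed (the documentation above does not state either detail). normalizeTitleSketch is hypothetical and reuses the WikiTextBlank pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

// blankRun copies the WikiTextBlank pattern from this package.
var blankRun = regexp.MustCompile(`[\t _\xA0\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]+`)

// normalizeTitleSketch is a hypothetical stand-in for NormalizeTitle.
func normalizeTitleSketch(title string) string {
	// Collapse every run of blanks (including underscores) into one space.
	t := strings.TrimSpace(blankRun.ReplaceAllString(title, " "))
	if t == "" {
		return ""
	}
	// Uppercase the first character and keep the rest unchanged.
	r, size := utf8.DecodeRuneInString(t)
	return string(unicode.ToUpper(r)) + t[size:]
}

func main() {
	fmt.Println(normalizeTitleSketch("New_Year's Day")) // prints New Year's Day
	fmt.Println(normalizeTitleSketch(" some__title "))  // prints Some title
}
```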

func OpenAndUpload

func OpenAndUpload(local, remote string, client *mwclient.Client, params params.Values) (*jason.Object, error)

OpenAndUpload opens the local file local and uploads it under the filename remote with the given parameters. Note that remote does not need to start with "File:". It returns the API response and an error.

func Perform added in v0.3.0

func Perform(pagename string, function PageFunction, client *mwclient.Client, optional ...interface{}) (err error)

Perform runs a PageFunction on a single page and returns an error. If the PageFunction panics, Perform catches the panic and returns an error describing it.
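Turning a panic into an error is usually done with a deferred recover. The sketch below shows that pattern in isolation; pageFunc is a simplified, hypothetical signature without the *mwclient.Client argument, and the error text is illustrative rather than what Perform actually returns.

```go
package main

import "fmt"

// pageFunc is a simplified stand-in for PageFunction (no client argument).
type pageFunc func(pagename string, optional ...interface{}) error

// performSketch runs fn and converts any panic into a returned error.
func performSketch(pagename string, fn pageFunc, optional ...interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// The named return value lets the deferred function set the error.
			err = fmt.Errorf("page function panicked on %q: %v", pagename, r)
		}
	}()
	return fn(pagename, optional...)
}

func main() {
	needsInt := func(pagename string, optional ...interface{}) error {
		_ = optional[0].(int) // panics if optional is empty or not an int
		return nil
	}
	fmt.Println(performSketch("Some page", needsInt, 3) == nil) // prints true
	fmt.Println(performSketch("Some page", needsInt) != nil)    // prints true
}
```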

func PerformInCategory added in v0.2.1

func PerformInCategory(category string, function PageFunction, client *mwclient.Client, optional ...interface{}) []error

PerformInCategory performs a PageFunction in all pages and subcategories in a category and will return a slice of all errors it encounters. The category parameter does not need to include the prefix "Category:" or its localized equivalent.

PageFunction

PerformInCategory takes in an arbitrary number of values optional that will be passed as optional values in the PageFunction. Care must be taken on using the appropriate optional values on a specified PageFunction. If the function panics it will continue iterating through the pages in the category, and information about the panic will be written to the slice of errors.

Errors and glog

This function uses the Google logging module https://github.com/golang/glog for fine-tuned logging. As the function iterates through the category pages, it may encounter some problems.

It prints to Info if function returns a DidNothingError for a particular page, or if the function performed successfully.

It prints to Error if it couldn't process a particular page, or if the PageFunction panics as per Perform.

func PerformInFiles added in v0.3.0

func PerformInFiles(title string, function PageFunction, client *mwclient.Client, optional ...interface{}) []error

PerformInFiles performs a PageFunction in all images and files found in title and will return a slice of all errors it encounters.

PageFunction

PerformInFiles takes in an arbitrary number of values optional that will be passed as optional values in the PageFunction. Care must be taken on using the appropriate optional values on a specified PageFunction. If the function panics it will continue iterating through the pages, and information about the panic will be written to the slice of errors.

Errors and glog

This function uses the Google logging module https://github.com/golang/glog for fine-tuned logging. As the function iterates through the files, it may encounter some problems.

It prints to Info if function returns a DidNothingError for a particular page, or if the function performed successfully.

It prints to Error if it couldn't process a particular page, or if the PageFunction panics as per Perform.

Multiple pages on title

If title contains multiple pages (e.g., "Title One|Title Two"), function would still work through the files of both pages, although this may result in undefined behavior, especially when returning errors. Please do not do this.

func PerformInLinks

func PerformInLinks(title string, function PageFunction, client *mwclient.Client, optional ...interface{}) []error

PerformInLinks performs a PageFunction on all pages that title links to and returns a slice of all errors it encounters. The links of a page do not necessarily include the files in said page.

PageFunction

PerformInLinks takes in an arbitrary number of values optional that will be passed as optional values in the PageFunction. Care must be taken to use the appropriate optional values for a specified PageFunction. If the function panics, it will continue iterating through the linked pages, and information about the panic will be written to the slice of errors.

Errors and glog

This function uses the Google logging module https://github.com/golang/glog for fine-tuned logging. As the function iterates through the links, it may encounter some problems.

It prints to Info if function returns a DidNothingError for a particular page, or if the function performed successfully.

It prints to Error if it couldn't process a particular page, or if the PageFunction panics as per Perform.

Multiple pages on title

If title contains multiple pages (e.g., "Title One|Title Two"), function would still work through the links of both pages, although this may result in undefined behavior, especially when returning errors. Please do not do this.
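Callers who build titles from user input may want to reject multi-page titles before calling PerformInLinks or PerformInFiles. The guard below is a hypothetical helper, not part of this package; it relies on "|" being the separator that the MediaWiki API uses between titles.

```go
package main

import (
	"fmt"
	"strings"
)

// validateSingleTitle is a hypothetical guard that rejects titles naming
// more than one page.
func validateSingleTitle(title string) error {
	if strings.Contains(title, "|") {
		return fmt.Errorf("title %q names more than one page", title)
	}
	return nil
}

func main() {
	fmt.Println(validateSingleTitle("Title One") == nil)           // prints true
	fmt.Println(validateSingleTitle("Title One|Title Two") == nil) // prints false
}
```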

func PerformInPageChannel added in v0.3.1

func PerformInPageChannel(pages <-chan string, function PageFunction, client *mwclient.Client, optional ...interface{}) chan error

PerformInPageChannel performs a PageFunction in all pages on a channel of strings until such channel closes. It returns a chan error containing all errors that may occur during the processing, which will only be closed once all pages have been processed.

If PageFunction returns a DidNothingError, it will not be counted as an error and won't be sent on the error channel.

Errors and glog

This function uses the Google logging module https://github.com/golang/glog for fine-tuned logging. As the function iterates through the pages, it may encounter some problems.

It prints to Info if function returns a DidNothingError for a particular page, or if the function performed successfully.

It prints to Error if it couldn't process a particular page using the Perform function.

func PerformInPages added in v0.3.0

func PerformInPages(pages []string, function PageFunction, client *mwclient.Client, optional ...interface{}) []error

PerformInPages performs a PageFunction in all pages in a slice of strings and returns a slice of errors it encounters.

If PageFunction returns a DidNothingError it will not be counted as an error and won't be appended to the error slice.

Errors and glog

This function uses the Google logging module https://github.com/golang/glog for fine-tuned logging. As the function iterates through the pages, it may encounter some problems.

It prints to Info if function returns a DidNothingError for a particular page, or if the function performed successfully.

It prints to Error if it couldn't process a particular page using the Perform function.

func QueryCategoryMembers added in v0.3.0

func QueryCategoryMembers(category string, client *mwclient.Client) chan string

QueryCategoryMembers returns a channel containing the pages, subcategories, and files in a category. The channel is closed afterwards. Note that category does not need the prefix "Category:". If there are any errors in querying, they will be printed to glog.Error, but they will not be returned.
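Since the Query functions return a channel that is closed when the results run out, a plain range loop is enough to consume them. The sketch below uses a hypothetical fakeQuery producer in place of a real API call, and collect is an illustrative helper for callers who want a slice (for example, to pass to PerformInPages).

```go
package main

import "fmt"

// fakeQuery is a hypothetical producer that behaves like the Query functions:
// it sends names on a channel and closes the channel when done.
func fakeQuery(names ...string) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, name := range names {
			out <- name
		}
	}()
	return out
}

// collect drains a channel of page names into a slice.
func collect(pages <-chan string) []string {
	var result []string
	for page := range pages { // the loop ends when the channel is closed
		result = append(result, page)
	}
	return result
}

func main() {
	members := collect(fakeQuery("Page one", "Category:Sub", "File:Image.png"))
	fmt.Println(len(members), members)
}
```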

func QueryFiles added in v0.3.0

func QueryFiles(title string, client *mwclient.Client) chan string

QueryFiles returns a channel containing the names of the files used in a page and will close it afterwards. Note that the output strings will start with "File:".

If there are any errors in querying, they will be printed to glog.Error, but they will not be returned.

func QueryLinks

func QueryLinks(title string, client *mwclient.Client) chan string

QueryLinks returns a channel containing the names of all links of a page and will close it afterwards. If there are any errors in querying, they will be printed to glog.Error, but they will not be returned.

func QuerySpecialPage added in v0.3.1

func QuerySpecialPage(client *mwclient.Client, qppage string) chan string

QuerySpecialPage returns a channel containing pages in some special page qppage. The channel will close once all pages have been printed.

Note that it is case sensitive (e.g., "Ancientpages", "Lonelypages"). See https://www.mediawiki.org/wiki/API:Querypage for possible values of qppage.

If there are any errors in querying, they will be printed to glog.Error, but they will not be returned.

func QueryUncategorizedImages added in v0.3.1

func QueryUncategorizedImages(client *mwclient.Client) chan string

QueryUncategorizedImages returns a channel containing all the names of the files that have been uncategorized (Special:UncategorizedFiles) and will close it afterwards. Note that the output strings will start with "File:".

If there are any errors in querying, they will be printed to glog.Error, but they will not be returned.

func QueryUnusedImages added in v0.3.1

func QueryUnusedImages(client *mwclient.Client) chan string

QueryUnusedImages returns a channel containing all the names of the files that have been unused (Special:UnusedFiles) and will close it afterwards. Note that the output strings will start with "File:".

If there are any errors in querying, they will be printed to glog.Error, but they will not be returned.

func Recategorize added in v0.3.1

func Recategorize(pagename string, client *mwclient.Client, optional ...interface{}) error

Recategorize is a PageFunction that takes in two optional parameters: a slice of strings containing the categories it wishes to remove (optional[0]), and a slice of strings containing the categories it wishes to add (optional[1]). If type assertion could not be done, or if optional is too short, the function will panic.

Note that either optional[0] or optional[1] may be an empty string slice ([]string{}), meaning that no categories are removed or added respectively, but neither may be nil.

The slices for old and new categories do not need to start with "Category:".

Methodology for categorization

Recategorization uses the normalization-denormalization technique.

Do note that "Category:" will be replaced by the localized namespace that the client uses. In addition, the function will first check if "[[Category:NEWCAT]]" is in the string before adding one at the very end.

After replacing said text, it will then edit the page with the edit summary "Recategorized: -[[:Category:OldCat1]] -[[:Category:OldCat2]] ... +[[:Category:NewCat1]] +[[:Category:NewCat2]] ... using client.UserAgent". If there were no changes performed, it will return a DidNothingError.

The edit will be marked as a Bot edit.

Bugs

This function will not check URL encoded strings, for example, [[Category:Some %2F thing]].

func RecategorizeContent added in v0.3.2

func RecategorizeContent(content string, oldCat, newCat []string, localized string) (string, error)

RecategorizeContent changes a text by removing all categories found in oldCat and appending all categories in newCat. It returns an error if it cannot compile a regular expression. If localized is blank, the category namespace will be the canonical "Category:".

Removal will also try to remove any spaces or newlines before or after the declaration of [[Category:Some category]] for categories in oldCat. Appending categories from newCat will be done at the very end of the text where the categories are in their canonical form.
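A much simplified sketch of the remove-then-append idea follows. It is not the package's implementation: recategorizeSketch is hypothetical, handles only the canonical "Category" namespace, treats only spaces and underscores as blanks, and strips whitespace before a removed category but not after it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// recategorizeSketch removes the categories in oldCat and appends those in
// newCat that are not already present in canonical form.
func recategorizeSketch(content string, oldCat, newCat []string) (string, error) {
	for _, cat := range oldCat {
		// Escape the title, then let spaces match spaces or underscores.
		title := strings.ReplaceAll(regexp.QuoteMeta(cat), " ", "[ _]+")
		re, err := regexp.Compile(`\s*\[\[[Cc]ategory:[ _]*` + title + `[ _]*(\|[^\]]*)?\]\]`)
		if err != nil {
			return content, fmt.Errorf("could not compile pattern for %q: %w", cat, err)
		}
		content = re.ReplaceAllString(content, "")
	}
	for _, cat := range newCat {
		link := "[[Category:" + cat + "]]"
		if !strings.Contains(content, link) {
			content += "\n" + link
		}
	}
	return content, nil
}

func main() {
	text := "Some text\n\n[[Category:New_Leaf images]]"
	out, err := recategorizeSketch(text,
		[]string{"New Leaf images"},
		[]string{"New Leaf villager houses"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", out)
}
```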

Types

type DidNothingError added in v0.2.2

type DidNothingError struct {
	Pagename string
}

DidNothingError is an "error" type that a PageFunction can return if it encounters a page and it explicitly chooses to do nothing.

func (DidNothingError) Error added in v0.2.2

func (e DidNothingError) Error() string

type PageFunction added in v0.2.1

type PageFunction func(pagename string, client *mwclient.Client, optional ...interface{}) error

PageFunction is a function that takes in a MediaWiki page's name, a *mwclient.Client, and optional parameters, and returns an error.

Usage

PageFunction may take in arbitrary values, more specifically, empty interfaces. Due to the unsafe nature of empty interfaces, care must be taken in specifying the types of these optional parameters or risk a panic.

Namespaces

PageFunction does not necessarily know what namespace pagename belongs to, although it can be inferred manually using its prefix (e.g., "File:").

See the variables for sample PageFunctions.

var CategorizeToNLVillagerHouses PageFunction = func(pagename string, client *mwclient.Client, optional ...interface{}) error {
	if !strings.HasPrefix(pagename, "File:House of ") ||
		!(strings.HasSuffix(pagename, " NL.jpg") ||
			strings.HasSuffix(pagename, " NL.jpeg") ||
			strings.HasSuffix(pagename, " NL.png")) {
		return DidNothingError{Pagename: pagename}
	}

	content, timestamp, err := client.GetPageByName(pagename)
	if err != nil {
		return errors.Wrapf(err, "could not get page %v", pagename)
	}
	if strings.Contains(content, "[[Category:New Leaf villager houses]]") {
		return DidNothingError{Pagename: pagename}
	}

	content = strings.Replace(content, "[[Category:New Leaf images]]", "[[Category:New Leaf villager houses]]", 1)
	editParams := make(params.Values)
	editParams["title"] = pagename
	editParams["text"] = content
	editParams["summary"] = "Recategorized to villager images. Edited via " + client.UserAgent
	editParams["basetimestamp"] = timestamp
	editParams["bot"] = "1"
	if err = client.Edit(editParams); err != nil {
		return errors.Wrapf(err, "could not edit page %v", pagename)
	}
	return nil
}

CategorizeToNLVillagerHouses is an example of a PageFunction used to categorize images to https://nookipedia.com/wiki/Category:New_Leaf_villager_houses. It ignores any optional parameters.

This function takes in a page name and categorizes the page by naïvely replacing "[[Category:New Leaf images]]" with "[[Category:New Leaf villager houses]]". If pagename does not start with "File:House of " or does not end with " NL.jpg", " NL.jpeg", or " NL.png", or if its contents already contain "[[Category:New Leaf villager houses]]", it will return a DidNothingError. Otherwise it will return whatever error occurs while fetching or editing the page.

var PrintPageAndContents PageFunction = func(pagename string, client *mwclient.Client, optional ...interface{}) error {
	lineCount := optional[0].(int)
	contentRaw, _, err := client.GetPageByName(pagename)
	if err != nil {
		return err
	}
	fmt.Println("Page name:", pagename)
	content := strings.Split(contentRaw, "\n")
	for ii := 0; ii < lineCount; ii++ {
		fmt.Println(content[ii])
	}
	return nil
}

PrintPageAndContents writes to standard output the pagename and its contents. It takes only one optional parameter: an int, that determines how many lines of contents it will print out. If optional is length zero, if the value is not an int, or if there are fewer lines than the provided parameter it will panic.

This is written as a test to see if the Perform function manages to catch the panic.

var PrintPageAndContentsSafe PageFunction = func(pagename string, client *mwclient.Client, optional ...interface{}) error {
	if len(optional) < 1 {
		return errors.New("no optional parameters")
	}
	lineCount, ok := optional[0].(int)
	if !ok {
		return errors.Errorf("%v not an int", optional[0])
	}
	fmt.Println("Page name:", pagename)
	contentRaw, _, err := client.GetPageByName(pagename)
	if err != nil {
		return err
	}
	content := strings.Split(contentRaw, "\n")
	for ii := 0; ii < lineCount && ii < len(content); ii++ {
		fmt.Println(content[ii])
	}
	return nil
}

PrintPageAndContentsSafe is like PrintPageAndContents but has additional checks to make sure that this function does not panic.
