parser

package module
v0.1.8 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 18, 2022 License: MIT Imports: 8 Imported by: 0

README


Metadata Parser

⚡️ Extract data from web links, including title, description, photos, videos, and more [via OpenGraph].

Golang GitHub issues GitHub pull requests GitHub License GitHub Contributions

AboutInstallationUsageRoadmapContributingCreditsSupportLicense

About

This package lets you to use Facebook OpenGraph tags to extract information from a website (url / link) and retrieve metadata like title, description, photos, videos, and more.

Installation


go get -u github.com/LinkviteApp/metaparser

Usage

After installing the package, create a go file and paste the example below to get started.


package main

import (
    "fmt"

    parser "github.com/LinkviteApp/metaparser"
)

func main() {
    var link string

    // You can provide your own link.
    link = "https://www.twitter.com/tryLinkvite"

    // Optional: Or use GetLink(). This runs on the terminal.
    link = parser.GetLink()

    // Optional: Pass custom headers parameters.
    // See the table below for the available parameters.
    headers := make(map[string]string)
	headers["Accept-Language"] = "en-US"
	headers["User-Agent"] = "googlebot"

    // Pass the required URL and optional parameters.
    options := parser.Parameters{
		URL:           link,
		Timeout:       10,
		AllowRedirect: false,
		Headers:       headers,
	}

    data, err := parser.ParseLink(options)

    if err != nil {
        panic(err)
    }

    //optional: Convert the parsed data to JSON
    result := parser.ToJSON(data)
    fmt.Println(result)
}

Options

Aside the required URL, you can pass optional parameters which should add more functionality in the parsing of the provided link.

Property Name Result
URL (required) The URL to parse. (Must start with http or https)
Headers (optional) (ex: { 'user-agent': 'googlebot', 'Accept-Language': 'en-US' }) Add request headers to fetch call
Timeout (optional) (ex: 1000) (default 10) Timeout for the request to fail
AllowRedirect (optional) (default false) For security reasons, the library does not automatically follow redirects, a malicious agent can exploit redirects to steal data, turn this on at your own risk

In your root directory, run

go run .

Once the program is running, you'll get


👋 Enter the url of the web page 👇

================================================================

Next provide the link you want to parse


👋 Enter the url of the web page 👇

================================================================

https://github.com

✅ Successful Response

If you provided a valid url, you will get a response that looks like this:


✅ Valid URL provided.

================================================================

✅ Generated metadata template.

================================================================

⏳ Updating metadata from html document...

================================================================

✅ Updated metadata from html document.

================================================================

⏱ Total time taken: 494 milliseconds.

================================================================

📋 Metadata:

================================================================

{
    "name": "GitHub",

    "title": "GitHub: Where the world builds software",

    "description": "GitHub is where over 83 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and feat...",

    "domain": "github.com",

    "url": "https://github.com/",

    "type": "website",

    "images": ["https://github.githubassets.com/images/modules/site/social-cards/github-social.png"],

    "favicons": ["https://github.githubassets.com/favicons/favicon.svg"]
}

❌ Error Response

PS All links must be of scheme http or https. An error response would look like this:


👋 Enter the url of the web page 👇

================================================================

github.com

================================================================

❌ Failed to parse the url. Reason: The url must be of scheme http or https.

================================================================

Roadmap

  • Parse any website

  • Return custom reponse

  • Retrieve favicons

  • Retrieve multiple images

  • Your awesome feature 😉

See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

Don't forget to give the project a star! Thanks again!

  1. Fork the Project

  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)

  3. Commit your Changes (git commit -m 'Add some AmazingFeature')

  4. Push to the Branch (git push origin feature/AmazingFeature)

  5. Open a Pull Request

Credits

Support

Feel free to reach out on twitter @tryLinkvite or @kayode0x.

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetLink() string

Get the link from the user.

func PrintError

func PrintError(why string) error

Make the error function available to the package.

func ToJSON

func ToJSON(v interface{}) string

Covert struct to JSON.

func ValidateLink(link string) (l string, e error)

Validate the link

Return the domain name if valid or error if invalid.

Types

type Link struct {
	URL    string
	Domain string
}
func NewLink(link, domain string) *Link

Instantiate a new link object.

type MetaData

type MetaData struct {
	Title       string   `json:"title"`
	Description string   `json:"description"`
	SiteName    string   `json:"name"`
	Domain      string   `json:"domain"`
	URL         string   `json:"url"`
	SiteType    string   `json:"type"`
	Images      []string `json:"images"`
	Favicons    []string `json:"favicons"`
}

func NewMetaData

func NewMetaData() *MetaData

Instantiate a new metadata object.

func ParseLink(p Parameters) (MetaData, error)

Return the metadata of the page or error.

type Parameters added in v0.1.8

type Parameters struct {
	URL           string
	Timeout       int
	AllowRedirect bool
	Headers       map[string]string
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL