recipe

package module
v0.4.0
Published: Jan 7, 2023 License: Apache-2.0 Imports: 2 Imported by: 2

README

go-recipe

go-recipe is a Go library that scrapes recipes from websites.

Installation

$ go get github.com/kkyr/go-recipe@latest

Usage

package main

import (
	"fmt"

	"github.com/kkyr/go-recipe/pkg/recipe"
)

func main() {
	url := "https://minimalistbaker.com/quick-pickled-jalapenos/"

	recipe, err := recipe.ScrapeURL(url)
	if err != nil {
		// handle err
	}

	// The second return value reports whether the field was found.
	ingredients, _ := recipe.Ingredients()
	instructions, _ := recipe.Instructions()
	// ... & more fields available

	fmt.Println(ingredients, instructions)
}

Scraping

The go-recipe default scraper looks for an ld+json-encoded Schema Recipe on the target website and can retrieve most fields defined in the schema. However, some websites have incomplete Schema data or simply do not encode their recipe in that format.
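
For illustration, the snippet below is not taken from this library; it sketches the kind of ld+json markup the default scraper looks for and decodes a couple of fields with the standard library. The schema.org property names are real, while the values and the struct are invented:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// ldJSON illustrates the kind of markup found inside a
// <script type="application/ld+json"> tag on a recipe page. The property
// names (name, recipeIngredient, cookTime, ...) are standard schema.org
// Recipe fields; the values are invented for this example.
const ldJSON = `{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Quick Pickled Jalapeños",
  "recipeIngredient": ["10 jalapeños, thinly sliced", "1 cup white vinegar"],
  "cookTime": "PT5M"
}`

func main() {
	// Decode just enough of the schema to show where the data comes from.
	var doc struct {
		Name             string   `json:"name"`
		RecipeIngredient []string `json:"recipeIngredient"`
	}
	if err := json.Unmarshal([]byte(ldJSON), &doc); err != nil {
		log.Fatal(err)
	}
	fmt.Println(doc.Name, doc.RecipeIngredient)
}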

For such websites, custom scrapers exist that target specific sites. These scrapers can build on the default scraper, so custom scraping logic only has to be defined for the fields that the default scraper could not find data for.

The custom scrapers are registered in pkg/recipe/scrapers.go and are identified by host name, i.e. the website they are used for. When a client gives go-recipe a link to scrape, the host name is extracted from the link and used to look up the corresponding custom scraper; the default scraper is used if no custom scraper is registered.
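
The snippet below is only a sketch of that lookup; the map entries, value type, and helper function are hypothetical stand-ins for what actually lives in pkg/recipe/scrapers.go:

package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostToScraper sketches the registry described above. The real map lives in
// pkg/recipe/scrapers.go; the entries and value type here are hypothetical.
var hostToScraper = map[string]string{
	"copykat.com":         "CopyKatScraper",
	"minimalistbaker.com": "MinimalistBakerScraper",
}

// scraperFor shows the lookup pattern: extract the host from the link, try
// the custom scrapers, and fall back to the default scraper otherwise.
func scraperFor(link string) (string, error) {
	u, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	host := strings.TrimPrefix(u.Hostname(), "www.")
	if name, ok := hostToScraper[host]; ok {
		return name, nil
	}
	return "DefaultScraper", nil
}

func main() {
	name, _ := scraperFor("https://copykat.com/dunkin-donuts-caramel-iced-coffee")
	fmt.Println(name) // CopyKatScraper
}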

Contributing

Contributions are welcome! You can contribute in a few ways: by adding a custom scraper, patching a bug, implementing a feature from the roadmap (see further below), or adding any other feature that you'd like to see. For that last case, please open a discussion first so that we can align on the change before you start coding.

Custom scrapers

Creating a custom scraper is easy as pie thanks to the code generator that's included in this package.

The generator requires two arguments: a link to a recipe on a website and the domain of the website that the recipe is hosted on. The domain is used to generate the source code (particularly the file and struct names), and the link is used to scrape recipe data, which is then used to generate a fully functioning unit test. If the generator is unable to scrape recipe data (which can happen if the website does not contain a Schema Recipe), a test will still be generated but test assertions will be made against empty fields.

To use the code generator, run the following command while inside the go-recipe package:

$ go run cmd/scrapergen/*.go \
  -d CopyKat \
  -u https://copykat.com/dunkin-donuts-caramel-iced-coffee

(replace the domain and link with your own)

Important: the domain should be provided in PascalCase so that the generated structs are correctly cased. Otherwise, your PR will not get approved.

The sample command above would generate the following files:

go-recipe/internal/html/scrape/custom/copykat.go
go-recipe/internal/html/scrape/custom/copykat_test.go
go-recipe/internal/html/scrape/custom/testdata/copykat.com.html

The generator can't do everything (at least not yet), so there are some final touches that you must put in yourself:

  1. Register your custom scraper in the hostToScraper map located in pkg/recipe/scrapers.go. Please maintain alphabetical ordering.
  2. (Optional) Modify the custom scraper to add your own scraping logic (see the sketch after this list).
  3. Verify that the generated test is correct.
  4. Verify that make lint and make test are passing.
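
As referenced in step 2, here is a hedged sketch of how a custom scraper can build on the default scraper, overriding only what the schema data did not provide. The type names are hypothetical and the real generated code in internal/html/scrape/custom/ looks different; this only illustrates the pattern.

package main

import (
	"fmt"
	"time"
)

// defaultScraper stands in for go-recipe's internal ld+json scraper; both
// type names in this sketch are hypothetical.
type defaultScraper struct{}

func (defaultScraper) Name() (string, bool)            { return "", false } // missing from the schema data
func (defaultScraper) CookTime() (time.Duration, bool) { return 20 * time.Minute, true }

// copyKatScraper embeds the default scraper and only overrides the fields
// that the schema data did not provide, as described in the Scraping section.
type copyKatScraper struct {
	defaultScraper
}

func (copyKatScraper) Name() (string, bool) {
	// Site-specific logic (e.g. reading the page's <h1>) would go here.
	return "Dunkin Donuts Caramel Iced Coffee", true
}

func main() {
	var s copyKatScraper
	name, _ := s.Name()     // served by the custom override
	cook, _ := s.CookTime() // falls through to the embedded default scraper
	fmt.Println(name, cook)
}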

You should now be ready to send a PR!

Roadmap
  • Refactor scrapergen and add unit tests.
  • Modify scrapergen so that it also adds the new scraper to the hostToScraper map.
  • Add an option for the user to specify the HTTP client timeout.
  • Add an option for the user to specify a "strict" mode: in this mode, only custom scrapers are used; if none is defined for the host, the scrape fails.
  • Add more Schema Recipe fields.
  • Add CLI wrapper over the scraper, providing output in JSON.

Acknowledgements

go-recipe is heavily inspired by recipe-scrapers.

Documentation

Overview

Package recipe is a Go library that scrapes recipes from websites.

The go-recipe default scraper looks for an `ld+json`-encoded Schema Recipe on the target website and can retrieve most fields defined in the schema. However, some websites have incomplete Schema data or simply do not encode their recipe in that format.

For such websites, custom scrapers exist that target specific sites. These scrapers can build on the default scraper, so custom scraping logic only has to be defined for the fields that the default scraper could not find data for.

The custom scrapers are registered in scrapers.go and are identified by host name, i.e. the website they are used for. When a client gives go-recipe a link to scrape, the host name is extracted from the link and used to look up the corresponding custom scraper; the default scraper is used if no custom scraper is registered.

Example

package main

import (
	"fmt"

	"github.com/kkyr/go-recipe/pkg/recipe"
)

func main() {
	url := "https://minimalistbaker.com/quick-pickled-jalapenos/"

	recipe, err := recipe.ScrapeURL(url)
	if err != nil {
		// handle err
	}

	// The second return value reports whether the field was found.
	ingredients, _ := recipe.Ingredients()
	instructions, _ := recipe.Instructions()
	// ... & more fields available

	fmt.Println(ingredients, instructions)
}

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Diet

type Diet int

Diet is a diet restricted to certain foods or preparations.

const (
	UnknownDiet Diet = iota
	DiabeticDiet
	GlutenFreeDiet
	HalalDiet
	HinduDiet
	KosherDiet
	LowCalorieDiet
	LowFatDiet
	LowLactoseDiet
	LowSaltDiet
	VeganDiet
	VegetarianDiet
)

func (Diet) String

func (i Diet) String() string

type Nutrition

type Nutrition struct {
	// The number of calories.
	Calories float32
	// The number of grams of carbohydrates.
	CarbohydrateGrams float32
	// The number of milligrams of cholesterol.
	CholesterolMilligrams float32
	// The number of grams of fat.
	FatGrams float32
	// The number of grams of fiber.
	FiberGrams float32
	// The number of grams of protein.
	ProteinGrams float32
	// The number of grams of saturated fat.
	SaturatedFatGrams float32
	// The serving size, in terms of volume or mass.
	ServingSize string
	// The number of milligrams of sodium.
	SodiumMilligrams float32
	// The number of grams of sugar.
	SugarGrams float32
	// The number of grams of trans fat.
	TransFatGrams float32
	// The number of grams of unsaturated fat.
	UnsaturatedFatGrams float32
}

Nutrition represents nutritional information about a recipe.
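
As a hedged illustration, and assuming (as in the Example above) that a scraper is obtained via pkg/recipe's ScrapeURL, the nutrition fields can be read like this; whether they are populated depends on the source website:

package main

import (
	"fmt"
	"log"

	"github.com/kkyr/go-recipe/pkg/recipe"
)

func main() {
	r, err := recipe.ScrapeURL("https://minimalistbaker.com/quick-pickled-jalapenos/")
	if err != nil {
		log.Fatal(err)
	}

	// The boolean reports whether the source page provided nutrition data.
	if n, ok := r.Nutrition(); ok {
		fmt.Printf("%.0f kcal, %.1f g protein per %s\n",
			n.Calories, n.ProteinGrams, n.ServingSize)
	}
}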

type Scraper

type Scraper interface {
	// Author is the author of the recipe.
	Author() (string, bool)
	// Categories are the categories of the recipe, e.g. appetizer, entrée, etc.
	Categories() ([]string, bool)
	// CookTime is the time it takes to actually cook the dish.
	CookTime() (time.Duration, bool)
	// Cuisine is the cuisine of the recipe, e.g. mexican-inspired, french, etc.
	Cuisine() ([]string, bool)
	// Description is the description of the recipe.
	Description() (string, bool)
	// ImageURL is a URL to an image of the dish.
	ImageURL() (string, bool)
	// Ingredients are all the ingredients used in the recipe.
	Ingredients() ([]string, bool)
	// Instructions are all the steps in making the recipe.
	Instructions() ([]string, bool)
	// Language is the language used in the recipe expressed in IETF BCP 47 standard.
	Language() (string, bool)
	// Name is the name of the recipe.
	Name() (string, bool)
	// Nutrition is nutritional information about the dish.
	Nutrition() (Nutrition, bool)
	// PrepTime is the length of time it takes to prepare the items to be used in the instructions.
	PrepTime() (time.Duration, bool)
	// SuitableDiets indicates dietary restrictions or guidelines for which the recipe is suitable.
	SuitableDiets() ([]Diet, bool)
	// TotalTime is the total time required to perform the instructions (including the prep time).
	TotalTime() (time.Duration, bool)
	// Yields is the quantity that results from the recipe.
	Yields() (string, bool)
}

Scraper is a type that returns recipe information from an underlying data source.
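
As a usage sketch, a consumer can probe a Scraper field by field and rely on the boolean to distinguish found from missing values. The aliased import paths and the assumption that pkg/recipe's ScrapeURL returns a value satisfying this interface (as the Example above suggests) are assumptions for illustration, not documented behavior:

package main

import (
	"fmt"
	"log"

	gorecipe "github.com/kkyr/go-recipe"
	"github.com/kkyr/go-recipe/pkg/recipe"
)

// printSummary works with any Scraper, whether it is the default ld+json
// scraper or a site-specific custom one; absent fields are simply skipped.
func printSummary(s gorecipe.Scraper) {
	if name, ok := s.Name(); ok {
		fmt.Println("Name:", name)
	}
	if total, ok := s.TotalTime(); ok {
		fmt.Println("Total time:", total)
	}
	if diets, ok := s.SuitableDiets(); ok {
		for _, d := range diets {
			if d == gorecipe.VeganDiet {
				fmt.Println("Suitable for a vegan diet")
			}
		}
	}
}

func main() {
	// Assumes, as the Example above suggests, that ScrapeURL returns a value
	// implementing this package's Scraper interface.
	s, err := recipe.ScrapeURL("https://minimalistbaker.com/quick-pickled-jalapenos/")
	if err != nil {
		log.Fatal(err)
	}
	printSummary(s)
}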

Directories

cmd
internal
url
pkg
