scraply

module

v0.0.0-...-ff7c352 Latest Latest Go to latest Published: Jul 7, 2022 License: Apache-2.0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

README ¶

Scraply

Scraply, is a very simple html scraping tool, if you know css & jQuery then you can use it!, scraply should be simple and tiny as well it could be used as a component in a large system something like this use-case

Overview

you can use scraply within your stack via cli or http.

# here is the CLI usage

# extracting the title and the description from scraply github repo page
$ scraply extract \
    -u "https://github.com/alash3al/scraply" \
    -x title='$("title").text()' \
    -x description='$("meta[name=description]").attr("content")'

# same thing but with custom user agent
$ scraply extract \
    -u "https://github.com/alash3al/scraply" \
    -ua "OptionalCustomUserAgent"\
    -x title='$("title").text()' \
    -x description='$("meta[name=description]").attr("content")'

# same thing but with asking scraply to return the response body for debugging purposes
$ scraply extract \
    --return-body \
    -u "https://github.com/alash3al/scraply" \
    -x title='$("title").text()' \
    -x description='$("meta[name=description]").attr("content")'

for http usage, we will run the http server then using any http client to interact with it.

# running the http server
# by default it listens on address ":8010" which equals to "0.0.0.0:8010"
# for more information execute `$ scraply help`
$ scraply serve

# then in another shell let's execute the following curl 
$ curl http://localhost:8010/extract \
    -H "Content-Type: application/json" \
    -s \
    -d '{"url": "https://github.com/alash3al/scraply", "extractors": {"title": "$(\"title\").text()"}, "return_body": false, "user_agent": "CustomeUserAgent"}'

for debugging, there is shell

$ scraply shell -u https://github.com/alash3al/scraply
➜ (scraply) > $("title").text()
GitHub - alash3al/scraply: Scraply a simple dom scraper to fetch information from any html based website and convert that info to JSON APIs

➜ (scraply) > request.url
https://github.com/alash3al/scraply

➜ (scraply) > response.status_code
200

➜ (scraply) > response.url
https://github.com/alash3al/scraply

➜ (scraply) > response.body
<html>.....

Download ?

you can go to the releases page and pick the latest version. or you can $ docker run --rm -it ghcr.io/alash3al/scraply scraply help

Contribution ?

for sure you can contribute, how?

clone the repo
create your fix/feature branch
create a pull request

nothing else, enjoy!

About

I'm Mohamed Al Ashaal, a software engineer :)

Directories ¶

Path	Synopsis
cmd
commands
pkg
fetch

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL