mamarkt-scraper

command module
v0.0.0-...-279ffdf Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 1, 2023 License: MIT Imports: 10 Imported by: 0

README

Mamarkt Scraper

This Go application is designed to collect product information from various websites, scrape data from web pages, and store the information in a DynamoDB database. It uses the Colly library for collecting XML data and the Chromedp library for web scraping.

This repository is the part of early stages of the actual mamarkt and it is uploaded to give a brief idea how it works. This code is from the times that I run the crawler as an AWS ECS TASK.

Table of Contents

Features

  • Collect product information URLs from a sitemap.
  • Scrape product information, including name and price, from web pages.
  • Batch store product information in a DynamoDB database.
  • Concurrent processing to improve performance.

Getting Started

Prerequisites

Before running the application, make sure you have the following prerequisites installed:

Installation
  1. Clone the repository to your local machine:

    git clone https://github.com/ozdemirrulass/mamarkt-scraper.git
    
  2. Change the directory to the project's root:

    cd gishbulk-scraper
    
  3. Install the required dependencies:

    go get ./...
    

Usage

To run the application, use the following command:

go run main.go

Contributing

Contributions are welcome! If you'd like to improve the application or fix any issues, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and test thoroughly.
  4. Commit your changes.
  5. Push your changes to your fork.
  6. Create a pull request to the original repository.

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL