godeclutter

command module
v1.0.5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 30, 2022 License: MIT Imports: 8 Imported by: 0

README

godeclutter

Declutters URLs in a lightning fast and flexible way, for improving input for web hacking automations such as crawlers and vulnerability scans.

godeclutter is a very simple tool that will take a list of URLs, clean those URLs, and output the unique URLs that are present. This will reduce the number of requests you will have to make to your target website, and also filter URLs that are most likely uninteresting.

Features

godeclutter will perform the following steps on your URL host section:

  • Clean http:// URLs pointing to the default SSL port (443) and vice-versa, since they are mostly CDN error pages.
  • Clean port notation of URLs pointing to the default protocol ports, since those ports are already implied by the protocol scheme. (such as :443 and :80)
  • Clean http:// URLs if a https:// to the same host and port is present, since 99,9% of those cases will just be a redirect to https://.
  • Remove uninteresting media extensions such as png, jpg, css, etc. (This one will keep .js files since those are sometimes interesting)
  • Remove urls with uninteresting words such as bootstrap, jquery, node_modules, etc.
  • Sort query parameters
  • Lowercase all schemes and hostnames, since upper-casing is irrelevant for those.
  • Replace all lower-case URI encoding escapes to upper-case, to maintain a standard.
  • Decode unnecessary escapes for characters that are not special in the URL context (i.e http://example.com/%41 ).
  • Remove empty query strings (i.e http://example.com/? )
  • Remove trailing slashes, this is rather aggressive but filters a lot of un-interesting duplicates on the majority of the cases. (i.e http://host/path/ -> http://host/path)
  • Normalize dot segments, also rather aggressive but useful when working with dirty sources. (http://host/path/./a/b/../c -> http://host/path/a/c)

Install

go install github.com/c3l3si4n/godeclutter@HEAD

Basic Usage

You can send URLs by sending them to stdin.

user@arch ~/D/g/godeclutter (main)> cat test_urls.txt 
https://1.1.1.1:443/
http://1.1.1.1/
http://1.1.1.1:80/
https://1.1.1.1:443/
https://1.1.1.1:80/
https://1.1.1.1:8443/
https://1.1.1.1:8443/
https://1.1.1.1:443/
https://1.1.1.1:8443/
https://1.1.1.1:8443/
https://1.1.1.1:443/?
http://1.1.1.1:443/?
https://1.1.1.1:80/?a
https://1.1.1.1:443/?
https://1.1.1.1:443/?
https://1.1.1.1:443/?1=1
https://1.1.1.1:443/?a=a&b=1
https://1.1.1.1:443/?a=a&b=1
https://1.1.1.1:443/a.js?a=a&b=1
https://1.1.1.1:443/fiqef.html?a=a&b=1
https://1.1.1.1:443/fmef.jpg?b=1&a=a
https://1.1.1.1:443/?b=1&a=a
https://1.1.1.1:443/a.js?
https://1.1.1.1:443/a.jpg
https://1.1.1.1:8443/
https://1.1.1.1:443/
https://1.1.1.1:443/node_modules/
https://1.1.1.1:443/path/scripts/jquery.js
user@arch ~/D/g/godeclutter (main)> cat test_urls.txt | ./godeclutter -bw -be -c -p
https://1.1.1.1/
https://1.1.1.1:8443/
https://1.1.1.1/?1=1
https://1.1.1.1/?a=a&b=1
https://1.1.1.1/a.js?a=a&b=1
https://1.1.1.1/fiqef.html?a=a&b=1
https://1.1.1.1/a.js

Arguments

$> ./godeclutter -h
Usage of ./godeclutter:
  -be	Blacklist Extensions - clean some uninteresting extensions. (default true)
  -bw	Blacklist words - clean some uninteresting words. (default true)
  -c	Clean URLs - Aggressively clean/normalize URLs before outputting them.
  -p	Prefer HTTPS - If there's a https url present, don't print the http for it. (since it will probably just redirect to https)

Aknowledgements

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL