link

package
v0.0.0-...-55065b5
Published: Apr 5, 2021 License: MIT Imports: 3 Imported by: 0

README

Exercise details

In this exercise your goal is to create a package that makes it easy to parse an HTML file and extract all of the links (<a href="">...</a> tags). For each extracted link you should return a data structure that includes both the href and the text inside the link. Any HTML inside of the link can be stripped out, along with any extra whitespace, including newlines, back-to-back spaces, etc.

Links will be nested in different HTML elements, and it is very possible that you will have to deal with HTML similar to the code below.

<a href="/dog">
  <span>Something in a span</span>
  Text not in a span
  <b>Bold text!</b>
</a>

In situations like these we want to get output that looks roughly like:

Link{
  Href: "/dog",
  Text: "Something in a span Text not in a span Bold text!",
}
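
The whitespace cleanup described above can be sketched with the standard library alone. This is a minimal illustration, not the exercise's official solution: cleanText is a hypothetical helper name, and the raw string stands in for whatever text your parser collects from inside the link.

```go
package main

import (
	"fmt"
	"strings"
)

// Link mirrors the structure shown in the expected output above.
type Link struct {
	Href string
	Text string
}

// cleanText is a hypothetical helper: strings.Fields splits on any run of
// whitespace (spaces, tabs, newlines), so joining the pieces with a single
// space collapses the runs and trims both ends.
func cleanText(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Roughly the text you might collect from the <a href="/dog"> example.
	raw := "\n  Something in a span\n  Text not in a span\n  Bold text!\n"
	l := Link{Href: "/dog", Text: cleanText(raw)}
	fmt.Printf("%+v\n", l)
}
```

Collapsing whitespace only once, after all of the text has been gathered, avoids having to decide at each node whether a separating space is needed.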

Once you have a working program, try to write some tests for it to practice using the testing package in Go.

Notes

1. Use the x/net/html package

I recommend checking out the x/net/html package for this task. It is provided by the Go team but isn't included in the standard library, so you will need to go get it. It makes parsing HTML files a little easier.

2. Ignore nested links

You can ignore any links nested inside of another link. For example, with the following HTML:

<a href="#">
  Something here <a href="/dog">nested dog link</a>
</a>

It is okay if your code returns only the outside link.

3. Get something working before focusing on edge-cases

Don't worry about having perfect code. Chances are there will be a lot of edge cases here that will be kinda tricky to handle. Just try to cover the most basic use cases first and then improve on that.

4. A few HTML examples have been provided

I created a few simpler HTML files and included them in this repo to help with testing. They won't cover all potential use cases, but should help you start testing out your code.

5. The fourth example will help you remove comments from your link text

Chances are your first version will include the text from comments inside a link tag. Mine did. Use ex4.html to test that case out and fix the bug.

Hint: See NodeType constants and look for the types that you can ignore.
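
To illustrate the hint, here is a rough sketch of collecting link text while skipping comment nodes. To keep it runnable without extra dependencies, nodeType, node, rawText and linkText are simplified stand-ins invented for this example. They are not the x/net/html types, though the idea of switching on the node type should carry over.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeType and node are simplified stand-ins for this sketch only.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
	commentNode
)

type node struct {
	Type     nodeType
	Data     string // text content, tag name, or comment body
	Children []*node
}

// rawText returns the text under n. Text nodes contribute their data,
// element nodes contribute the text of their children, and every other
// node type (comments included) contributes nothing.
func rawText(n *node) string {
	switch n.Type {
	case textNode:
		return n.Data
	case elementNode:
		var sb strings.Builder
		for _, c := range n.Children {
			sb.WriteString(rawText(c))
		}
		return sb.String()
	default:
		return ""
	}
}

// linkText collapses the whitespace in the collected text.
func linkText(n *node) string {
	return strings.Join(strings.Fields(rawText(n)), " ")
}

func main() {
	// A hypothetical link whose body contains a comment.
	a := &node{Type: elementNode, Data: "a", Children: []*node{
		{Type: textNode, Data: "dog cat "},
		{Type: commentNode, Data: " this comment should be ignored "},
	}}
	fmt.Printf("%q\n", linkText(a))
}
```

Listing the node types you want to keep, rather than the ones you want to drop, means any type you did not think about is ignored by default.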

External Resources

In the solution for this exercise I end up using a DFS, which is a graph theory algorithm. If you want to learn a little more about that, I have discussed it on YouTube here - https://www.youtube.com/watch?v=zboCGDMnU3I
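
As a rough idea of what a DFS over a parsed document can look like, the sketch below walks a small tree and collects the link nodes. The node type and the linkNodes function are hypothetical stand-ins written for this example, not the x/net/html API and not necessarily how the exercise solution is structured. Because the search stops descending once it reaches a link, a link nested inside another link is not returned, which matches note 2 above.

```go
package main

import "fmt"

// node is a simplified stand-in for a parsed HTML node (sketch only).
type node struct {
	Tag      string            // element name; empty for non-element nodes
	Attr     map[string]string // attributes such as href
	Children []*node
}

// linkNodes performs a depth-first search from n and returns every <a>
// node it reaches, in document order. It does not look inside a link it
// has already found, so nested links are skipped.
func linkNodes(n *node) []*node {
	if n.Tag == "a" {
		return []*node{n}
	}
	var found []*node
	for _, c := range n.Children {
		found = append(found, linkNodes(c)...)
	}
	return found
}

func main() {
	// body > [ a(#) > a(/dog), div > a(/cat) ]
	root := &node{Tag: "body", Children: []*node{
		{Tag: "a", Attr: map[string]string{"href": "#"}, Children: []*node{
			{Tag: "a", Attr: map[string]string{"href": "/dog"}},
		}},
		{Tag: "div", Children: []*node{
			{Tag: "a", Attr: map[string]string{"href": "/cat"}},
		}},
	}}
	for _, l := range linkNodes(root) {
		fmt.Println(l.Attr["href"])
	}
}
```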

There is a whole series on algorithms and graph theory, though at this time it is somewhat incomplete. I never have enough time in the day 🙁. Hopefully one day Let's Learn Algorithms will be its own series like Gophercises.

Bonus

The only bonuses here are to improve your tests and edge-case coverage.

Documentation

Types

type Link struct {
	Href string
	Text string
}

Link represents a link (<a href="...">) in an HTML document.

func Parse

func Parse(r io.Reader) ([]Link, error)

Parse will take in an HTML document and will return a slice of links parsed from it.

Directories

Path Synopsis
examples
ex1
ex2
ex3
ex4