package link

Version: v0.0.0-...-55065b5 | Published: Apr 5, 2021 | License: MIT | Imports: 3 | Imported by: 0


Repository

github.com/2n3g5c9/gophercises


README

Exercise #4: HTML Link Parser

Exercise details

In this exercise your goal is to create a package that makes it easy to parse an HTML file and extract all of the links (<a href="">...</a> tags). For each extracted link, you should return a data structure that includes both the href and the text inside the link. Any HTML inside the link can be stripped out, along with any extra whitespace, including newlines and back-to-back spaces.

Links will be nested in different HTML elements, and it is very likely that you will have to deal with HTML similar to the code below.

<a href="/dog">
  <span>Something in a span</span>
  Text not in a span
  <b>Bold text!</b>
</a>

In situations like these we want to get output that looks roughly like:

Link{
  Href: "/dog",
  Text: "Something in a span Text not in a span Bold text!",
}
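One way to get from the raw text of the /dog link to the single-line Text shown above is to concatenate the link's text and then collapse every run of whitespace into a single space. The sketch below assumes the tags have already been stripped; collapseWhitespace is a hypothetical helper name, not part of this package's API.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseWhitespace (hypothetical helper) splits s on any run of Unicode
// whitespace and rejoins the pieces with single spaces. This also trims
// leading and trailing whitespace.
func collapseWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Text of the /dog link with its <span> and <b> tags already removed.
	raw := "\n  Something in a span\n  Text not in a span\n  Bold text!\n"
	fmt.Println(collapseWhitespace(raw))
	// Prints: Something in a span Text not in a span Bold text!
}
```

Because strings.Fields treats newlines, tabs, and repeated spaces alike, a single pass handles all the whitespace cases listed above.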

Once you have a working program, try writing some tests for it to practice using Go's testing package.

Notes

1. Use the x/net/html package

I recommend checking out the x/net/html package (golang.org/x/net/html) for this task; you will need to fetch it with go get. It is provided by the Go team but isn't included in the standard library, and it makes parsing HTML files a little easier.
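In x/net/html, an element's attributes are stored as a slice of html.Attribute values (with Namespace, Key, and Val fields), so getting the href means scanning that slice. The sketch below uses a hypothetical attribute type that mirrors only Key and Val, so it builds with the standard library alone; with the real package you would range over a node's Attr field instead.

```go
package main

import "fmt"

// attribute is a hypothetical stand-in for html.Attribute from
// golang.org/x/net/html; only the Key and Val fields are kept.
type attribute struct {
	Key, Val string
}

// hrefOf returns the value of the first href attribute and whether one was found.
func hrefOf(attrs []attribute) (string, bool) {
	for _, a := range attrs {
		if a.Key == "href" {
			return a.Val, true
		}
	}
	return "", false
}

func main() {
	// Attributes as they might appear on <a class="nav" href="/dog">.
	attrs := []attribute{{Key: "class", Val: "nav"}, {Key: "href", Val: "/dog"}}
	if href, ok := hrefOf(attrs); ok {
		fmt.Println(href) // /dog
	}
}
```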

2. Ignore nested links

You can ignore any links nested inside another link. For example, with the following HTML:

<a href="#">
  Something here <a href="/dog">nested dog link</a>
</a>

It is okay if your code returns only the outside link.
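A simple way to get exactly that behavior is to stop descending as soon as a depth-first walk reaches an <a> element. The sketch below assumes a hypothetical node type that mirrors the FirstChild and NextSibling links of html.Node; node, el, text, and linkNodes are illustrative names, not this package's code.

```go
package main

import "fmt"

// node is a hypothetical, trimmed-down mirror of html.Node: Tag is the
// element name ("" for text nodes) and FirstChild/NextSibling link the tree.
type node struct {
	Tag         string
	Href        string
	Text        string
	FirstChild  *node
	NextSibling *node
}

// el builds an element node and chains kids together as its children.
func el(tag, href string, kids ...*node) *node {
	n := &node{Tag: tag, Href: href}
	for i := len(kids) - 1; i >= 0; i-- {
		kids[i].NextSibling = n.FirstChild
		n.FirstChild = kids[i]
	}
	return n
}

// text builds a text node.
func text(s string) *node { return &node{Text: s} }

// linkNodes walks the tree depth-first and collects <a> elements. It never
// descends into a link it has already collected, so a link nested inside
// another link is skipped and only the outer one is returned.
func linkNodes(n *node) []*node {
	if n.Tag == "a" {
		return []*node{n}
	}
	var out []*node
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		out = append(out, linkNodes(c)...)
	}
	return out
}

func main() {
	// <a href="#">Something here <a href="/dog">nested dog link</a></a>
	doc := el("body", "",
		el("a", "#",
			text("Something here "),
			el("a", "/dog", text("nested dog link")),
		),
	)
	for _, l := range linkNodes(doc) {
		fmt.Println(l.Href) // prints only "#"
	}
}
```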

3. Get something working before focusing on edge-cases

Don't worry about having perfect code. Chances are there will be a lot of edge cases here that will be kinda tricky to handle. Just try to cover the most basic use cases first and then improve on that.

4. A few HTML examples have been provided

I created a few simpler HTML files and included them in this repo to help with testing. They won't cover all potential use cases, but should help you start testing out your code.

5. The fourth example will help you remove comments from your link text

Chances are your first version will include the text from comments inside a link tag. Mine did. Use ex4.html to test that case out and fix the bug.

Hint: See the NodeType constants in x/net/html and look for the types that you can ignore.
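In x/net/html, comments are parsed into nodes of type html.CommentNode while visible text becomes html.TextNode, so one fix is to skip comment nodes while gathering text. The sketch below is a minimal illustration under that assumption; nodeType, node, mk, and linkText are hypothetical stand-ins so the example builds without the external module, and the sample HTML is made up rather than taken from ex4.html.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeType is a hypothetical stand-in for html.NodeType; only the kinds
// this sketch needs are defined.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
	commentNode
)

// node is a hypothetical, trimmed-down mirror of html.Node. Data holds the
// tag name for elements and the raw text for text and comment nodes.
type node struct {
	Type        nodeType
	Data        string
	FirstChild  *node
	NextSibling *node
}

// mk builds a node and chains kids together as its children.
func mk(t nodeType, data string, kids ...*node) *node {
	n := &node{Type: t, Data: data}
	for i := len(kids) - 1; i >= 0; i-- {
		kids[i].NextSibling = n.FirstChild
		n.FirstChild = kids[i]
	}
	return n
}

// linkText concatenates the text nodes under n depth-first, skipping
// comment nodes, then collapses runs of whitespace into single spaces.
func linkText(n *node) string {
	var b strings.Builder
	var walk func(cur *node)
	walk = func(cur *node) {
		switch cur.Type {
		case commentNode:
			return // comments never contribute to the link text
		case textNode:
			b.WriteString(cur.Data)
		}
		for c := cur.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(n)
	return strings.Join(strings.Fields(b.String()), " ")
}

func main() {
	// <a href="/about">About <!-- old label: Info --> us</a>
	a := mk(elementNode, "a",
		mk(textNode, "About "),
		mk(commentNode, " old label: Info "),
		mk(textNode, " us"),
	)
	fmt.Println(linkText(a)) // About us
}
```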

External Resources

In the solution for this exercise I end up using a depth-first search (DFS), which is a graph theory algorithm. If you want to learn a little more about it, I have discussed it on YouTube here: https://www.youtube.com/watch?v=zboCGDMnU3I

There is a whole series on algorithms and graph theory, though at this time it is somewhat incomplete. I never have enough time in the day 🙁. Hopefully one day Let's Learn Algorithms will be its own series like Gophercises.

Bonus

The only bonuses here are to improve your tests and edge-case coverage.

Documentation

Index

- type Link
  - func Parse(r io.Reader) ([]Link, error)

Types

type Link

type Link struct {
	Href string
	Text string
}

Link represents a link (<a href="...">) in an HTML document.

func Parse

func Parse(r io.Reader) ([]Link, error)

Parse takes in an HTML document and returns a slice of the links parsed from it.

Source Files

parse.go

Directories

examples/ex1
examples/ex2
examples/ex3
examples/ex4
