Parser
Parser is a tiny Go library that fetches a web page and extracts its title, meta description, links, and body text. It also tokenises the text into words.
It normalises URLs, strips scripts and styles, filters out mailto, tel, and similar non-page links, and keeps only words longer than 2 characters when tokenising.
Technical Details
- Go standard library + `goquery` for HTML parsing/selection
- Single struct return: title, description, links, text, tokens
- URL normalisation (resolve relative, strip fragments)
- HTML sanitisation (strip `<script>`, `<style>`)
- Word tokenisation with minimum length filter