pup
This article is a stub. You can help the IndieWeb wiki by expanding it.
pup is a command-line tool for parsing and transforming HTML using CSS selectors.
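As a rough sketch of typical usage: pup reads HTML on standard input and takes the selector as its argument. The Python wrapper below is illustrative only. The `run_pup` name and the sample markup are hypothetical, it assumes a `pup` binary is available on the `PATH`, and it returns `None` when it is not. The `text{}` display function is taken from pup's README.

```python
import shutil
import subprocess
from typing import Optional


def run_pup(html: str, selector: str) -> Optional[str]:
    """Pipe `html` through pup with `selector`; return None if pup is not installed."""
    if shutil.which("pup") is None:
        return None
    completed = subprocess.run(
        ["pup", selector],    # roughly equivalent to: echo "$html" | pup 'selector'
        input=html,
        capture_output=True,
        text=True,
        check=False,          # leave error handling to the caller
    )
    return completed.stdout


# Hypothetical sample markup, not taken from any real page.
SAMPLE_HTML = '<html><body><h1 class="p-name">Example</h1></body></html>'

if __name__ == "__main__":
    print(run_pup(SAMPLE_HTML, "h1.p-name text{}"))
```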
pup can return either the matched HTML elements themselves or a JSON representation of them, so that other tools such as jq can extract further information without falling back to regular expressions or other hand-written finite-state machines (FSMs).
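To illustrate the JSON route, here is a minimal sketch that walks pup-style JSON in Python rather than jq. The sample data is hypothetical, and its shape (a list of objects with a `tag` key, attribute keys such as `href`, a `text` key, and nested `children`) is an assumption based on the `json{}` examples in pup's README, so check it against real output before relying on it.

```python
import json

# Hypothetical sample in the shape pup's json{} output is assumed to have.
PUP_JSON = """
[
  {
    "tag": "ul",
    "children": [
      {"tag": "li", "children": [{"tag": "a", "href": "/one", "text": "One"}]},
      {"tag": "li", "children": [{"tag": "a", "href": "/two", "text": "Two"}]}
    ]
  }
]
"""


def find_links(nodes):
    """Yield (href, text) for every <a> node, visiting children depth first."""
    for node in nodes:
        if node.get("tag") == "a" and "href" in node:
            yield node["href"], node.get("text", "")
        yield from find_links(node.get("children", []))


links = list(find_links(json.loads(PUP_JSON)))
print(links)
```

Working on structured data this way avoids pattern-matching against raw markup, which is the point the paragraph above makes about jq.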
It does not seem to follow in-browser JavaScript DOM conventions, such as the innerText or innerHTML properties, which some users may find surprising.
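One way to picture the difference, as a sketch under stated assumptions: if a JSON node's `text` key holds only that element's own text and nested elements sit under `children` (an assumption about pup's JSON output, not something this article confirms), an innerText-like string has to be rebuilt by walking the tree. The `inner_text` helper and the sample node are hypothetical, and the result is only an approximation, because the original interleaving of text and child elements cannot be recovered from this shape.

```python
def inner_text(node):
    """Approximate DOM innerText: the node's own text, then each child's, space-joined."""
    parts = [node.get("text", "")]
    parts.extend(inner_text(child) for child in node.get("children", []))
    return " ".join(part for part in parts if part)


# Hypothetical node, standing in for <p>Hello <em>world</em></p>.
NODE = {
    "tag": "p",
    "text": "Hello",
    "children": [{"tag": "em", "text": "world"}],
}

print(NODE["text"])       # own text only
print(inner_text(NODE))   # own text plus descendants
```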
Selecting elements by tag hierarchy alone is documented (see below) to have caused problems for some users.
See Also
- Lewis Cowles writes an introduction.