web-page-extract
Fetch a web page and return cleaned page content, metadata, and normalized links.
Why install it
Most agent systems need a reliable first step for web ingestion. This tool gives you one packageable contract for fetching and cleaning a page, so you do not have to rebuild the same fetch/parse glue inside every agent.
Inputs
- url: page URL to fetch
- format: markdown or text
- include_links: include extracted links
- timeout_ms: request timeout
- max_chars: output content cap
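As a rough illustration of the input contract, the sketch below checks a request object before it is handed to the tool. The helper name `buildRequest`, the default values, and the assumed types (a boolean for include_links, positive integers for timeout_ms and max_chars) are not taken from this project; they are assumptions made for the example, and the tool's own validation may differ.

```javascript
// Hypothetical helper: the defaults and type rules below are illustrative
// assumptions, not the tool's documented behavior.
function buildRequest(input) {
  const {
    url,
    format = "markdown",
    include_links = true,
    timeout_ms = 10000,
    max_chars = 20000,
  } = input;

  // Accept only absolute http(s) URLs.
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    throw new TypeError(`url is not a valid absolute URL: ${url}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new TypeError(`unsupported protocol: ${parsed.protocol}`);
  }

  // format is limited to the two values listed under Inputs.
  if (format !== "markdown" && format !== "text") {
    throw new TypeError(`format must be "markdown" or "text", got: ${format}`);
  }
  if (typeof include_links !== "boolean") {
    throw new TypeError("include_links must be a boolean");
  }
  for (const [name, value] of [["timeout_ms", timeout_ms], ["max_chars", max_chars]]) {
    if (!Number.isInteger(value) || value <= 0) {
      throw new TypeError(`${name} must be a positive integer`);
    }
  }

  return { url, format, include_links, timeout_ms, max_chars };
}

// Serialize the checked request as the JSON payload to send to the tool.
console.log(JSON.stringify(buildRequest({ url: "https://example.com", format: "text" })));
```

Validating on the caller's side keeps malformed requests from reaching the network step, at the cost of duplicating rules the tool may already enforce.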
Outputs
- url: original requested URL
- final_url: final fetched URL after redirects
- title
- canonical_url
- published_at
- byline
- excerpt
- format
- content
- links
- metadata
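A caller might reduce the output fields above to a few values an agent can act on. The sketch below is one way to do that. The helper names `sameUrl` and `summarizeResult` are hypothetical, and the code assumes that optional fields may be missing, null, or empty strings and that content is a string; none of this is confirmed by the tool itself.

```javascript
// Compare two URLs after parsing, so "https://example.com" and
// "https://example.com/" are treated as the same address.
function sameUrl(a, b) {
  try {
    return new URL(a).href === new URL(b).href;
  } catch {
    return a === b; // fall back to a plain string comparison
  }
}

// Hypothetical consumer of a result object shaped like the Outputs list.
// Assumes optional fields may be missing, null, or empty.
function summarizeResult(result) {
  return {
    // Prefer the canonical URL, then the post-redirect URL, then the request URL.
    preferredUrl: result.canonical_url || result.final_url || result.url,
    redirected: Boolean(result.final_url) && !sameUrl(result.url, result.final_url),
    title: result.title || "(untitled)",
    contentLength: typeof result.content === "string" ? result.content.length : 0,
  };
}

// Hand-written sample for illustration; this is not real tool output.
const sample = {
  url: "https://example.com",
  final_url: "https://example.com/",
  title: "Example Domain",
  canonical_url: "",
  format: "text",
  content: "Example Domain",
};
console.log(summarizeResult(sample));
```

Comparing parsed URLs rather than raw strings avoids reporting a redirect when the only difference is a trailing slash.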
Local development
To run the tests with Node's built-in test runner:
node --test
To build the packaged entrypoint:
npm run build
Example invocation
printf '%s' '{"url":"https://example.com","format":"markdown"}' | node dist/index.js