document-convert
Convert local documents into normalized markdown or plain text with lightweight metadata.
Why install it
Knowledge workflows need a common representation for mixed source files. This tool gives an agent a simple document normalization primitive for text, markdown, HTML, JSON, and CSV inputs.
Inputs
path: path to the local document to convert
to_format: markdown or text
extract_metadata: whether to include file metadata such as size and line count
Outputs
path: original file path
media_type: detected media type for the input
content: converted document body
metadata: file and conversion metadata
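To make the output shape concrete, here is a minimal sketch of a result record with the four fields listed above. It is not the tool's actual implementation: the function name `describe_document`, the metadata keys `size_bytes` and `line_count`, and the use of `mimetypes` for detection are all assumptions made for illustration.

```python
import mimetypes


def describe_document(path, content, extract_metadata=True):
    """Build a result dict with the four output fields (illustrative only)."""
    # Guess the media type from the file extension; the real tool may detect it differently.
    media_type, _ = mimetypes.guess_type(path)
    result = {
        "path": path,
        "media_type": media_type or "application/octet-stream",
        "content": content,
        "metadata": {},
    }
    if extract_metadata:
        # Hypothetical metadata keys; here size is measured on the content, not the file on disk.
        result["metadata"] = {
            "size_bytes": len(content.encode("utf-8")),
            "line_count": len(content.splitlines()),
        }
    return result
```

Calling `describe_document("fixtures/sample.html", body)` would return a dictionary whose `media_type` is guessed from the `.html` extension and whose `metadata` carries the size and line count of `body`.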
Local development
The source code for this tool can be found here
Test:
python -m unittest discover -s tests -p 'test_*.py'
Example invocation
python -u document_convert/__main__.py < input.json
With input.json containing:
{
  "path": "fixtures/sample.html",
  "to_format": "markdown"
}
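If the request is generated by a script rather than written by hand, a small helper along these lines could build the same JSON. This is a sketch under assumptions: `build_request` and `ALLOWED_FORMATS` are hypothetical names, and the check on `to_format` only reflects the two values listed under Inputs.

```python
import json

# The two target formats named in the Inputs section.
ALLOWED_FORMATS = {"markdown", "text"}


def build_request(path, to_format="markdown", extract_metadata=None):
    """Return a JSON request string for the converter (hypothetical helper)."""
    if to_format not in ALLOWED_FORMATS:
        raise ValueError(f"to_format must be one of {sorted(ALLOWED_FORMATS)}")
    payload = {"path": path, "to_format": to_format}
    # Only include extract_metadata when the caller sets it explicitly.
    if extract_metadata is not None:
        payload["extract_metadata"] = bool(extract_metadata)
    return json.dumps(payload, indent=2)
```

The returned string can be saved as input.json or piped to the command above on standard input.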