
Overview
A Python utility that converts Compiled HTML Help (.chm) files into clean, AI-ready Markdown, with a profile system, automatic encoding detection, code-block preservation, and generated search/lookup indexes. Built for the Autodesk Revit API docs; the generic profile handles any CHM.
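The encoding detection mentioned above can be pictured as a short decode chain. This is a minimal sketch, not the tool's code. It assumes three things: strict UTF-8 is tried first, chardet's guess is trusted only above a confidence threshold (0.7 here, an arbitrary choice), and GB2312/GBK guesses are widened to GB18030, which is a superset of both. `decode_page` and `normalize_encoding` are hypothetical names.

```python
try:
    import chardet  # the tool depends on chardet; optional here so the sketch still runs
except ImportError:
    chardet = None

# GB18030 is a superset of GBK, which extends GB2312, so widening these guesses
# avoids decode errors on characters outside the narrower charset.
_WIDEN = {"gb2312": "gb18030", "gbk": "gb18030"}


def normalize_encoding(name: str) -> str:
    """Lower-case a codec name and widen GB2312/GBK to GB18030."""
    key = name.lower()
    return _WIDEN.get(key, key)


def decode_page(raw: bytes, min_confidence: float = 0.7) -> tuple[str, str]:
    """Decode one page's bytes; return (text, codec_used)."""
    # 1. Strict UTF-8 first (utf-8-sig also strips a BOM if present):
    #    legacy CJK bytes rarely form valid UTF-8.
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. Trust chardet only when it reports reasonable confidence.
    if chardet is not None:
        guess = chardet.detect(raw)
        if guess["encoding"] and (guess["confidence"] or 0) >= min_confidence:
            codec = normalize_encoding(guess["encoding"])
            try:
                return raw.decode(codec), codec
            except (UnicodeDecodeError, LookupError):
                pass  # wrong or unknown codec; fall through
    # 3. CJK fallback, then a lossy decode so one bad page never aborts a run.
    try:
        return raw.decode("gb18030"), "gb18030"
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace"), "utf-8"
```

Trying UTF-8 first is cheap, and the final lossy decode means a single mis-encoded page degrades to replacement characters instead of stopping the whole conversion.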
Role
Author & maintainer (open source)
Stack
- Python
- asyncio
- chardet
- 7-Zip
- Markdown
- MCP
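7-Zip appears in the stack, presumably to unpack the CHM container into its HTML pages; how the tool invokes it is not described. A minimal sketch, assuming the full 7-Zip console binary (`7z`, or `7zz` in newer builds) is on PATH: `x` extracts with the archive's internal paths, `-o` takes the output directory with no space, and `-y` answers overwrite prompts. `build_extract_cmd` and `extract_chm` are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_cmd(chm: Path, out_dir: Path, exe: str = "7z") -> list[str]:
    """Build a 7-Zip command that unpacks `chm` into `out_dir`."""
    # x: extract with internal paths; -o<dir>: output dir (no space); -y: yes to all prompts.
    return [exe, "x", str(chm), f"-o{out_dir}", "-y"]


def extract_chm(chm: Path, out_dir: Path) -> list[Path]:
    """Unpack a CHM with 7-Zip and return its HTML pages, sorted by path."""
    # Assumption: the full 7-Zip console is installed; the reduced 7za build
    # supports only a handful of formats and does not handle CHM.
    exe = shutil.which("7z") or shutil.which("7zz")
    if exe is None:
        raise FileNotFoundError("7-Zip (7z or 7zz) not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(chm, out_dir, exe), check=True, capture_output=True)
    # Keep only HTML topics; CHM internals such as #SYSTEM or #TOPICS are skipped.
    return sorted(p for p in out_dir.rglob("*") if p.suffix.lower() in {".htm", ".html"})
```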
Highlights
- Profile-based conversion: a generic profile for any CHM and a tuned profile for Revit API docs
- Automatic encoding detection (UTF-8, GB18030/GBK/GB2312) via chardet, with CJK fallbacks
- Code-block language detection across C#, Python, C++, F#, Java, JS/TS, SQL, XML, and JSON
- Async, batched pipeline with bounded concurrency; handles CHM files of 6,000+ pages without running out of memory
- Generates file_index.json, id_lookup.json, and index.md for search and AI integration
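The listing doesn't say how code-block languages are detected. One plausible approach is regex scoring: every signature pattern that matches adds a point, and the highest-scoring language tags the fenced block. The sketch below takes that approach; the patterns, tag names, and helper names are illustrative assumptions, and a real rule set would need tuning (for example, `=>` appears in both C# and JavaScript).

```python
import re

# Illustrative signature patterns (assumptions); each matching pattern scores one point.
SIGNATURES: dict[str, list[str]] = {
    "java": [r"\bimport java\.", r"\bSystem\.out\.println\b", r"\bpublic static void main\b"],
    "csharp": [r"\busing System\b", r"\bpublic (?:class|void|static)\b", r"\bnamespace \w+"],
    "python": [r"^\s*def \w+\(.*\):", r"^\s*import [\w.]+\s*$", r"^\s*from [\w.]+ import\b", r"\bself\."],
    "cpp": [r"#include\s*<", r"\bstd::", r"->"],
    "fsharp": [r"^\s*let \w+", r"\|>", r"^\s*open \w+"],
    "typescript": [r":\s*(?:string|number|boolean)\b", r"\binterface \w+\s*\{"],
    "javascript": [r"\bfunction \w+\(", r"\bconsole\.log\(", r"=>"],
    "sql": [r"(?i)\bSELECT\b.+\bFROM\b", r"(?i)\bINSERT\s+INTO\b", r"(?i)\bWHERE\b"],
    "xml": [r"^\s*<\?xml", r"</\w+>"],
    "json": [r'^\s*[\[{]\s*"', r'"\w+"\s*:'],
}
FENCE = "`" * 3  # built this way so this sketch can itself sit inside a Markdown fence


def detect_language(code: str) -> str:
    """Return the best-scoring language tag, or "" if no pattern matches."""
    scores = {
        lang: sum(bool(re.search(p, code, re.MULTILINE)) for p in patterns)
        for lang, patterns in SIGNATURES.items()
    }
    best = max(scores, key=scores.__getitem__)  # ties go to the earlier entry
    return best if scores[best] > 0 else ""


def to_markdown_block(code: str) -> str:
    """Wrap a code sample in a fenced block tagged with the detected language."""
    return f"{FENCE}{detect_language(code)}\n{code.rstrip()}\n{FENCE}"
```

Scoring every language, rather than stopping at the first match, lets a snippet that trips one shared pattern still be won by the language with more specific hits.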
Outcomes
- 6,000+ pages per CHM
- MCP: AI-ready output
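The 6,000+ page figure depends on an async, batched pipeline with bounded concurrency. A minimal asyncio sketch of that pattern follows. It assumes a per-page `convert` coroutine that writes its Markdown to disk and returns only small metadata, so the collected results stay small. Batching caps how many page tasks exist at once, and an `asyncio.Semaphore` caps how many run concurrently within a batch. `convert_all`, `batch_size`, `max_concurrency`, and their defaults are hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def convert_all(
    pages: Sequence[T],
    convert: Callable[[T], Awaitable[R]],
    batch_size: int = 200,      # hypothetical default
    max_concurrency: int = 16,  # hypothetical default
) -> list[R]:
    """Convert pages batch by batch, with at most `max_concurrency` in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(page: T) -> R:
        async with sem:
            return await convert(page)

    results: list[R] = []
    for batch in batched(pages, batch_size):
        # Only this batch's coroutines exist at once; gather keeps input order.
        results.extend(await asyncio.gather(*(guarded(p) for p in batch)))
    return results
```

Because asyncio runs on one thread, CPU-heavy HTML parsing inside `convert` would still execute serially. Offloading it with `asyncio.to_thread` or a process pool is a common complement, though the listing doesn't say whether the tool does this.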