DTDucas™
P06 · 2025

CHM Converter — CHM docs → clean, AI-ready Markdown + vector DB
Overview

A Python utility that converts Compiled HTML Help (.chm) files into clean, AI-ready Markdown. It provides a profile system, automatic encoding detection, code-block preservation, and generated search/lookup indexes. Built for the Autodesk Revit API docs; the generic profile handles any CHM.
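
To make the automatic encoding detection concrete, here is a minimal sketch of a detect-then-fall-back decoder. It is not the project's actual code: the function names, the 0.8 confidence threshold, the fallback order, and the pluggable `detector` parameter are all assumptions made for illustration. The only third-party call used is `chardet.detect`, and the sketch still runs if chardet is not installed.

```python
from typing import Callable, List, Optional, Tuple

# Assumed fallback order: strict UTF-8 first, then GB18030,
# which is a superset of GBK and GB2312.
FALLBACK_ENCODINGS = ("utf-8", "gb18030")


def detect_with_chardet(raw: bytes) -> Optional[str]:
    """Return chardet's guess, or None if chardet is missing or unsure."""
    try:
        import chardet  # third-party; optional in this sketch
    except ImportError:
        return None
    guess = chardet.detect(raw)
    # 0.8 is an arbitrary illustrative threshold, not a project setting.
    if guess["encoding"] and (guess["confidence"] or 0.0) >= 0.8:
        return guess["encoding"]
    return None


def decode_page(
    raw: bytes,
    detector: Callable[[bytes], Optional[str]] = detect_with_chardet,
) -> Tuple[str, str]:
    """Decode one extracted HTML page, returning (text, encoding_used)."""
    candidates: List[str] = []
    detected = detector(raw)
    if detected:
        candidates.append(detected)
    candidates.extend(FALLBACK_ENCODINGS)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: substitute undecodable bytes instead of failing the page.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Trying strict UTF-8 before GB18030 matters in this sketch because GB18030 accepts many byte sequences, so placing it first could silently mis-decode pages that are really UTF-8.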

Role

Author & maintainer (open source)

Stack
  • Python
  • asyncio
  • chardet
  • 7-Zip
  • Markdown
  • MCP
Highlights
  • 01 Profile-based conversion: a generic profile for any CHM, a tuned profile for Revit API docs
  • 02 Automatic encoding detection (UTF-8, GB18030/GBK/GB2312) via chardet with CJK fallbacks
  • 03 Code-block language detection across C#, Python, C++, F#, Java, JS/TS, SQL, XML, JSON
  • 04 Async, batched pipeline with bounded concurrency; handles 6,000+ page CHM files without OOM
  • 05 Generates file_index.json / id_lookup.json / index.md for search and AI integration
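
The async, batched pipeline with bounded concurrency from the highlights can be sketched roughly as follows. This is an illustrative pattern, not the project's implementation: `convert_all`, `convert_page`, and the default values for `max_concurrency` and `batch_size` are hypothetical names and numbers chosen for the example. It uses only `asyncio.Semaphore` and `asyncio.gather` from the standard library.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def convert_all(
    pages: Iterable[str],
    convert_page: Callable[[str], Awaitable[str]],
    max_concurrency: int = 8,
    batch_size: int = 100,
) -> List[str]:
    """Convert pages batch by batch, with at most max_concurrency in flight.

    convert_page is assumed to write its Markdown to disk and return only
    the output name, so page bodies are not accumulated in memory.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(page: str) -> str:
        # The semaphore caps how many conversions run at the same time.
        async with semaphore:
            return await convert_page(page)

    async def run_batch(batch: List[str]) -> List[str]:
        # gather returns results in the same order as its inputs.
        return list(await asyncio.gather(*(guarded(p) for p in batch)))

    results: List[str] = []
    batch: List[str] = []
    for page in pages:
        batch.append(page)
        if len(batch) == batch_size:
            results.extend(await run_batch(batch))
            batch = []
    if batch:  # flush the final partial batch
        results.extend(await run_batch(batch))
    return results
```

Batching limits how many pending tasks exist at once, while the semaphore limits how many of them do work simultaneously; together they keep memory use roughly proportional to `batch_size` rather than to the total page count.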
Outcomes

  • 6,000+ pages per CHM
  • MCP: AI-ready output
© 2026 DTDucas. All rights reserved. contact@dtducas.com