Doc ⇄ MD Converter

A native macOS app that converts Word, Excel, PowerPoint, and PDF documents to clean Markdown, and converts Markdown back to Word, PowerPoint, or PDF. It runs entirely offline; no data leaves your machine.

What it does

Doc ⇄ MD Converter takes your documents and turns them into clean, readable Markdown:

| Input         | Output | What's preserved                      |
|---------------|--------|---------------------------------------|
| .docx / .doc  | .md    | Headings, bold, italic, tables, lists |
| .xlsx / .xls  | .md    | All sheets as Markdown tables         |
| .pptx / .ppt  | .md    | Slide titles, bullet points, tables   |
| .pdf          | .md    | Page text, embedded tables            |

It also converts Markdown back into documents:

| Input           | Output | How it maps                                  |
|-----------------|--------|----------------------------------------------|
| .md / .markdown | .docx  | Headings, bold/italic, bullets, tables, code |
| .md / .markdown | .pptx  | Each top-level heading becomes a slide       |
| .md / .markdown | .pdf   | Styled headings, bullets, and gridded tables |
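
As a rough illustration of the "each top-level heading becomes a slide" mapping, the sketch below splits Markdown text into slides at every `# ` heading. It is not taken from the app's source: the `Slide` class and `split_into_slides` function are hypothetical names, and the real converter may treat fenced code, blank lines, or text before the first heading differently.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Hypothetical container: one slide per top-level heading."""
    title: str
    body: list[str] = field(default_factory=list)


def split_into_slides(markdown: str) -> list[Slide]:
    """Start a new slide at every top-level ('# ') heading."""
    slides: list[Slide] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            # A '#' inside a fenced code block is a comment, not a heading.
            in_fence = not in_fence
        if not in_fence and line.startswith("# "):
            slides.append(Slide(title=line[2:].strip()))
        elif slides and line.strip():
            # Non-blank lines belong to the most recent slide.
            slides[-1].body.append(line)
        # Text before the first heading is dropped in this sketch.
    return slides
```

Lower-level headings such as `## Q1` stay in the body of the current slide, so a renderer could turn them into bold lines or sub-bullets.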

Common use cases:

  • Feeding documents into AI tools that accept Markdown (ChatGPT, Claude, Notion AI)
  • Importing legacy Word/PowerPoint content into Obsidian, Notion, or Logseq
  • Converting Excel reports into readable Markdown tables for wikis or documentation
  • Archiving presentations as plain text for version control

Features

  • Two-way conversion — documents → Markdown, and Markdown → .docx / .pptx / .pdf, split across tabs
  • Native macOS UI — built with SwiftUI, looks and behaves like a proper Mac app
  • Drag & drop — drag files directly onto the window
  • Batch conversion — add multiple files, convert all in one click
  • Any output folder — choose exactly where your .md files are saved
  • Progress tracking — per-file status and a live progress bar
  • 100% offline — no internet connection, no API keys, no subscriptions
  • No telemetry — zero data collection of any kind

Requirements

  • macOS 12 Monterey or later (Apple Silicon or Intel)
  • Xcode Command Line Tools — for Swift compilation:
    xcode-select --install
  • Python 3.9+, available as /usr/bin/python3 once the Xcode Command Line Tools are installed (no separate install needed)

Installation

One-command build and install

git clone https://github.com/ankit2101/convertToMarkDown.git
cd convertToMarkDown
chmod +x build.sh
./build.sh

This will:

  1. Compile the Swift UI binary (~332 KB)
  2. Create a Python virtual environment and install conversion libraries
  3. Bundle everything into Doc to MD.app
  4. Install to /Applications (prompts for your password once)

After installation, find the app in Launchpad or via Spotlight (⌘ Space → "Doc to MD").

Rebuilding after code changes

./build.sh

How it works

The app has two layers that communicate via subprocess:

┌─────────────────────────────┐
│  SwiftUI (native macOS UI)  │  ← window, drag-drop, file list, progress
└────────────┬────────────────┘
             │  Process() — one subprocess per file
┌────────────▼────────────────┐
│  Python (conversion engine) │  ← mammoth, openpyxl, python-pptx, pdfplumber
└─────────────────────────────┘

Swift layer (swift/DocToMD.swift):

  • Renders the native SwiftUI window with drag-and-drop support
  • Handles file selection, output folder picker, and conversion progress
  • Spawns /usr/bin/python3 as a subprocess per file, with the bundled venv on PYTHONPATH
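
The per-file invocation described above can be sketched in Python for illustration (the app itself does this from Swift with `Process()`). Several details here are assumptions rather than facts from the source: that `convert_single.py` sits in `Contents/Resources/`, that the venv's packages live under `venv/lib/python3.*/site-packages`, and that exit code 0 signals success. `build_invocation` and `convert_one` are hypothetical helper names.

```python
import os
import subprocess
from pathlib import Path


def build_invocation(bundle: Path, source: Path, out_dir: Path) -> tuple[list[str], dict[str, str]]:
    """Build the command and environment for converting one file."""
    resources = bundle / "Contents" / "Resources"
    # Assumed venv layout: venv/lib/python3.X/site-packages
    site_dirs = list((resources / "venv" / "lib").glob("python3.*/site-packages"))
    env = dict(os.environ)
    if site_dirs:
        # Put the bundled packages on PYTHONPATH for the system interpreter.
        env["PYTHONPATH"] = str(site_dirs[0])
    # Assumed script location; argument order follows the CLI usage in this README.
    cmd = ["/usr/bin/python3", str(resources / "convert_single.py"), str(source), str(out_dir)]
    return cmd, env


def convert_one(bundle: Path, source: Path, out_dir: Path) -> bool:
    """Run one subprocess for one file; treat exit code 0 as success (assumption)."""
    cmd, env = build_invocation(bundle, source, out_dir)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.returncode == 0
```

Running one subprocess per file keeps a crash or hang in one conversion from taking down the rest of a batch, which is a plausible reason for this design.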

Python layer (converter.py, convert_single.py):

  • mammoth converts Word XML to HTML; markdownify renders it as clean Markdown
  • openpyxl reads each Excel sheet and produces Markdown tables
  • python-pptx walks slides extracting titles, body text, and table shapes
  • pdfplumber extracts text and table data page-by-page from PDFs
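
To make the Excel step concrete, here is a minimal sketch of turning worksheet rows into Markdown tables with openpyxl. It is an illustration under assumptions, not the code in converter.py: the function names are hypothetical, the first non-empty row is treated as the header, and fully empty rows are skipped.

```python
from pathlib import Path


def cell_text(value) -> str:
    """Render one cell: empty string for None, pipes escaped, newlines flattened."""
    if value is None:
        return ""
    return str(value).replace("|", "\\|").replace("\n", " ")


def rows_to_markdown(rows) -> str:
    """Build a pipe table, treating the first non-empty row as the header."""
    rows = [list(r) for r in rows if any(v is not None for v in r)]
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    padded = [r + [None] * (width - len(r)) for r in rows]
    lines = ["| " + " | ".join(cell_text(v) for v in row) + " |" for row in padded]
    lines.insert(1, "|" + "---|" * width)  # separator row under the header
    return "\n".join(lines)


def workbook_to_markdown(path: Path) -> str:
    """Emit one '## <sheet name>' section per worksheet."""
    from openpyxl import load_workbook  # third-party, imported lazily

    # data_only=True returns cached formula results instead of formula text.
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        sections = [
            f"## {ws.title}\n\n{rows_to_markdown(ws.iter_rows(values_only=True))}"
            for ws in wb.worksheets
        ]
    finally:
        wb.close()
    return "\n\n".join(sections)
```

Escaping `|` inside cells matters because an unescaped pipe would otherwise split one cell into two columns in the rendered table.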

The Python venv lives inside the .app bundle at Contents/Resources/venv/ — no system-wide packages are modified.


Project structure

convertToMarkDown/
├── swift/
│   └── DocToMD.swift        # Complete SwiftUI app (UI + subprocess calls)
├── converter.py             # Document → Markdown conversion logic
├── convert_single.py        # CLI wrapper invoked by the Swift subprocess
├── requirements.txt         # Python dependencies
├── build.sh                 # Build + install script
└── README.md

Python dependencies

| Package     | Purpose                         |
|-------------|---------------------------------|
| mammoth     | Word (.docx) → HTML → Markdown  |
| markdownify | HTML → clean Markdown           |
| openpyxl    | Excel (.xlsx/.xls) reading      |
| python-pptx | PowerPoint (.pptx/.ppt) reading |
| pdfplumber  | PDF text and table extraction   |

All packages install into a local venv inside the app bundle — nothing is installed globally.


Markdown output examples

From a Word document

# Q3 Financial Report

## Executive Summary

Revenue grew **23%** year-over-year, driven primarily by...

| Region | Q3 Revenue | Growth |
|--------|-----------|--------|
| North  | $4.2M     | +18%   |
| South  | $3.1M     | +31%   |

From an Excel spreadsheet

## Sheet1

| Name       | Department  | Start Date |
|------------|-------------|------------|
| Alice Chen | Engineering | 2021-03-15 |
| Bob Kumar  | Marketing   | 2020-07-01 |

From a PowerPoint presentation

## Slide 1

### Product Roadmap 2025

- Q1: Launch v2.0 with new onboarding flow
- Q2: Mobile app beta release
- Q3: Enterprise tier rollout
- Q4: International expansion
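
A slide layout like the one above could be produced roughly as follows with python-pptx. This is a hedged sketch, not the project's implementation: `format_slide` and `read_slides` are hypothetical names, table shapes are left out for brevity, and every non-empty paragraph is rendered as a flat bullet.

```python
def format_slide(number, title, bullets) -> str:
    """Render one slide as '## Slide N', an optional '### title', and bullets."""
    parts = [f"## Slide {number}"]
    if title:
        parts.append(f"### {title}")
    if bullets:
        parts.append("\n".join(f"- {b}" for b in bullets))
    return "\n\n".join(parts)


def read_slides(path):
    """Yield (title, bullets) for each slide in a .pptx file."""
    from pptx import Presentation  # third-party (python-pptx), imported lazily

    for slide in Presentation(str(path)).slides:
        title_shape = slide.shapes.title  # None if the slide has no title placeholder
        title = title_shape.text_frame.text if title_shape is not None else None
        bullets = []
        for shape in slide.shapes:
            if title_shape is not None and shape.shape_id == title_shape.shape_id:
                continue  # the title is already captured above
            if shape.has_text_frame:
                bullets += [p.text for p in shape.text_frame.paragraphs if p.text.strip()]
        yield title, bullets


def pptx_to_markdown(path) -> str:
    """Join all slides into one Markdown document."""
    return "\n\n".join(
        format_slide(n, title, bullets)
        for n, (title, bullets) in enumerate(read_slides(path), start=1)
    )
```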

From a PDF

## Page 1

Technical Specification v3.2

This document describes the API contract between the frontend
and backend services...

| Endpoint        | Method | Auth |
|-----------------|--------|------|
| /api/users      | GET    | JWT  |
| /api/users/{id} | PUT    | JWT  |

Privacy

No data ever leaves your machine.

  • No internet connection is made during conversion
  • No API keys, accounts, or subscriptions required
  • No analytics, crash reporting, or telemetry of any kind
  • Documents are read from disk and written back to disk — that's it

Verify it yourself while a conversion runs:

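lsof -i -n -P | grep -i -E "doc|python"
# → no lines for the app or its python3 subprocess = zero network connections
# (lsof escapes spaces in process names, so grepping for "Doc to MD" never matches;
#  unrelated processes whose names contain "doc" or "python" may also show up)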

Using the converter from the command line

You can also run the Python converter directly without the UI:

# Clone and set up
git clone https://github.com/ankit2101/convertToMarkDown.git
cd convertToMarkDown
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Convert a single file
python3 convert_single.py path/to/document.docx ./output/

# Use the converter module directly in Python
from converter import convert_file
from pathlib import Path
convert_file(Path("report.xlsx"), Path("./output"))
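
For batch use from a script, a small wrapper along these lines may be convenient. It is a sketch under assumptions: `convert_folder` is a hypothetical helper, the extension list is copied from the table at the top of this README, and it assumes the converter raises an exception on failure, which this README does not confirm.

```python
from pathlib import Path
from typing import Callable

# Input types listed in this README's "What it does" table.
SUPPORTED = {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".pdf"}


def convert_folder(
    src_dir: Path,
    out_dir: Path,
    convert: Callable[[Path, Path], object],
) -> dict[str, str]:
    """Convert every supported file in src_dir and report per-file status."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, str] = {}
    for path in sorted(src_dir.iterdir()):
        if path.suffix.lower() not in SUPPORTED:
            continue  # skip unsupported types and subdirectories
        try:
            convert(path, out_dir)
            results[path.name] = "ok"
        except Exception as exc:  # keep going so one bad file does not stop the batch
            results[path.name] = f"failed: {exc}"
    return results
```

With the repository on the import path, it could be called as `convert_folder(Path("./docs"), Path("./output"), convert_file)` after `from converter import convert_file`.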

Limitations

  • Images in Word/PowerPoint are not extracted (text and tables only)
  • Complex Word formatting (text boxes, SmartArt, WordArt) is simplified or skipped
  • Password-protected files are not supported
  • Scanned PDFs (image-only, no text layer) produce no output — run OCR first (e.g. Tesseract)
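
One way to spot the scanned-PDF case before converting is to check which pages have no extractable text. The sketch below uses pdfplumber for that; the helper names are hypothetical, and it is a pre-check you could run yourself, not something the app is known to do.

```python
def pages_without_text(page_texts) -> list[int]:
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()
    ]


def find_image_only_pages(path) -> list[int]:
    """List pages of a PDF that appear to have no text layer."""
    import pdfplumber  # third-party, imported lazily

    with pdfplumber.open(path) as pdf:
        return pages_without_text(page.extract_text() for page in pdf.pages)
```

If every page number comes back, the file is probably image-only and needs OCR before conversion.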

Contributing

Pull requests welcome. For large changes, open an issue first to discuss.

git clone https://github.com/ankit2101/convertToMarkDown.git
cd convertToMarkDown

# Test the Python converter without building the Swift app
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python3 convert_single.py path/to/test.docx ./output/

License

MIT — free to use, modify, and distribute.


Related tools

| Tool       | What it does                                                      |
|------------|-------------------------------------------------------------------|
| Pandoc     | Universal document converter (CLI, cross-platform)                |
| Marker     | High-accuracy PDF → Markdown with layout awareness                |
| markitdown | Microsoft's Office → Markdown CLI tool                            |
| Obsidian   | Markdown-based knowledge base (great place to import your output) |
