axiomape.com  ·  HTML to Markdown Converter

Your data never leaves your device. This tool processes all code locally in your browser using client-side JavaScript (DOM Parsing). We do not store, save, or transmit your sensitive documents or web code to any server.

The Ultimate Guide to Converting HTML to Markdown

Everything developers, writers, and content creators need to know

Markdown was created in 2004 by John Gruber as a way to write text that is readable as-is but can also be converted to clean HTML. Today the conversion often runs the other way: much of the web is built in HTML first, and developers, technical writers, and content strategists regularly need to turn that HTML back into Markdown. Whether you are pulling content from a CMS, archiving web pages, migrating a blog, or preparing documentation for a GitHub repository, converting HTML to Markdown by hand is slow, error-prone, and tedious. This tool automates the process with Turndown.js, an open-source JavaScript library that converts HTML to Markdown directly in the browser, so you get clean Markdown in real time without leaving the page.

The conversion process works through what is called DOM Parsing. The term "DOM" stands for Document Object Model - it is the internal tree structure a web browser builds when it reads HTML. Each tag (<h1>, <p>, <ul>, etc.) becomes a node in that tree. When you paste HTML into the input box, this tool builds that same tree in memory, then walks through every node and replaces each HTML element with its Markdown equivalent: <strong> becomes **bold**, <h2> becomes ## Heading, <a href="..."> becomes [text](url), and so on. The result is clean, portable, human-readable Markdown - free of any HTML syntax.
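To make the tree walk concrete, here is a minimal Python sketch of the same idea. It is not Turndown's implementation: it uses the standard library's xml.etree.ElementTree, so it assumes a well-formed, XML-compatible fragment (no unclosed tags, no named entities such as &nbsp;), and it covers only a handful of elements. The wrapper element `root` and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def convert(node: ET.Element) -> str:
    """Return the Markdown for one element, converting its children first."""
    tag = node.tag
    if tag == "ul":
        # Each <li> emits its own line; whitespace between items is ignored.
        return "".join(convert(li) for li in node if li.tag == "li") + "\n"

    # Gather the element's own text, then each child's Markdown plus the
    # text that follows that child (its "tail"), in document order.
    inner = node.text or ""
    for child in node:
        inner += convert(child) + (child.tail or "")

    if tag in HEADINGS:
        return "#" * int(tag[1]) + " " + inner.strip() + "\n\n"
    if tag == "p":
        return inner.strip() + "\n\n"
    if tag in ("strong", "b"):
        return "**" + inner + "**"
    if tag in ("em", "i"):
        return "_" + inner + "_"
    if tag == "code":
        return "`" + inner + "`"
    if tag == "a":
        return "[" + inner + "](" + node.get("href", "") + ")"
    if tag == "li":
        return "- " + inner.strip() + "\n"
    # Anything else (div, span, the wrapper root) contributes only its content.
    return inner


def html_to_markdown(fragment: str) -> str:
    # Wrap the fragment so it parses as a single XML document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    return convert(root).strip() + "\n"
```

For example, `html_to_markdown('<h2>Install</h2><p>Run <code>npm i</code> <strong>first</strong>.</p>')` yields a `## Install` heading followed by the paragraph ``Run `npm i` **first**.``. A production converter such as Turndown relies on the browser's forgiving HTML parser instead and ships rules for many more elements (images, blockquotes, ordered lists, code blocks).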

Markdown is dramatically lighter and more portable than HTML. A Markdown file can be read by a human without any rendering - the syntax is intuitive enough that # Heading and **bold** are obvious at a glance. HTML, by contrast, is verbose and intended for machines. Here are the most common real-world reasons developers make this conversion:

  • Documentation and README files: GitHub, GitLab, and Bitbucket render Markdown natively, making it the standard language for project documentation, wikis, and README files.
  • Static site generators: Tools like Jekyll, Hugo, Eleventy, and Astro use Markdown as their primary content format, so migrating an old HTML site to one of them usually starts with converting its pages to Markdown.
  • Content migration: Moving content from WordPress, Squarespace, or a legacy CMS often means extracting raw HTML that then needs to be cleaned up into Markdown for a new system.
  • AI and LLM prompts: Markdown usually needs far fewer tokens than the equivalent HTML, so more of a model's context window is spent on actual content rather than markup overhead.
  • Version control: Markdown files diff cleanly in Git because an edit to the content usually changes only the affected lines of text. In HTML, the same edit is often buried among tags and attributes that obscure what actually changed.

Standard Markdown (the original spec from John Gruber) is intentionally minimal. It handles headings, bold, italics, links, images, code, and lists - but it does not define a standard way to render tables, strikethrough text, or syntax-highlighted code blocks. Different platforms filled that gap in different ways, creating fragmentation.

GitHub Flavored Markdown - abbreviated as GFM - is GitHub's formal extension of Markdown, published as a specification in 2017 and defined as a strict superset of CommonMark. It has since become the de facto standard across the developer ecosystem. The key additions GFM introduces are:

  • Tables: Using pipe characters (|) to define rows and columns, with a separator row to indicate header cells.
  • Strikethrough: Wrapping text in double tildes (~~like this~~) to render struck-through text.
  • Task lists: Using - [ ] and - [x] syntax to create interactive checkboxes in GitHub issues and PRs.
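  • Fenced code blocks: Using triple backticks (```) with an optional language label (like ```javascript) to enable syntax highlighting. Strictly speaking, fenced code blocks come from CommonMark, the base spec GFM extends, rather than being a GFM-only addition.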

Enabling GFM in this tool ensures that HTML <table> elements are converted to proper pipe-delimited Markdown tables, and that <del> or <s> tags become ~~strikethrough~~ syntax. If your target platform does not support GFM (rare, but possible), you can toggle it off and the converter will handle those elements differently or skip them.
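The table conversion can be sketched the same way. The Python function below is a hypothetical, simplified illustration (again using xml.etree.ElementTree on well-formed markup), not the code of Turndown's GFM plugin: it treats the first row as the header, since GFM tables require one, pads short rows, and escapes literal pipes inside cells. Merged cells (colspan/rowspan) and nested tables are not handled.

```python
import xml.etree.ElementTree as ET


def cell_text(cell: ET.Element) -> str:
    # Flatten nested markup to plain text, collapse whitespace, and escape
    # "|" so a literal pipe does not split the cell.
    text = " ".join("".join(cell.itertext()).split())
    return text.replace("|", "\\|")


def table_to_gfm(html: str) -> str:
    table = ET.fromstring(html)
    # iter("tr") finds rows inside <thead>/<tbody> as well as direct children.
    rows = [
        [cell_text(c) for c in tr if c.tag in ("th", "td")]
        for tr in table.iter("tr")
    ]
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad ragged rows

    header = "| " + " | ".join(rows[0]) + " |"
    separator = "|" + "|".join([" --- "] * width) + "|"
    body = ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join([header, separator] + body)
```

Given a two-column table with a `Name`/`Role` header and one data row, this produces the header line, a `| --- | --- |` separator row, and one pipe-delimited line per data row. Note that flattening cell contents to plain text discards inline formatting such as bold or links inside cells.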

CSS and JavaScript are both completely removed during conversion, and this is intentional. Markdown is a plain-text format. It carries meaning through structure (headings, lists, bold text, links) but it has no concept of color, font size, layout, or interactivity. There is no Markdown equivalent for color: red or font-weight: 700 applied inline - those styles simply cannot be expressed in the format.

Here is exactly what happens to each element type during the conversion process:

  • Inline CSS (style="..." attributes): Stripped entirely. The visual formatting is lost; only the semantic structure of the tag is preserved.
  • CSS class and ID attributes (class="...", id="..."): Stripped. Markdown has no class system.
  • <style> blocks: Removed completely - the converter processes content nodes, not stylesheet declarations.
  • <script> blocks: Removed completely. JavaScript has no place in a Markdown document.
  • Data attributes (data-*): Stripped. Custom data attributes are HTML-specific and have no Markdown analogue.
  • Semantic meaning is preserved: An <h3 style="color:red"> becomes a Markdown ### Heading - the heading level is preserved even though the red color is discarded.
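The stripping rules above amount to a small pre-pass over the tree before any Markdown is produced. This Python sketch assumes a well-formed fragment parsed with xml.etree.ElementTree; it deletes <script> and <style> elements while keeping any text that followed them, and drops every attribute except an assumed allow-list of ones whose values can end up in Markdown (href, src, alt, title). Tag names are left untouched, which is why an <h3> can still become a level-3 heading afterward. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DROP_TAGS = {"script", "style"}
# Assumed allow-list: attributes whose values can appear in Markdown output.
KEEP_ATTRS = {"href", "src", "alt", "title"}


def strip_presentation(elem: ET.Element) -> None:
    """Remove scripts, styles, and non-content attributes in place."""
    previous = None  # last child that was kept
    for child in list(elem):  # iterate over a copy because elem is mutated
        if child.tag in DROP_TAGS:
            # Text after the removed element is reattached to the previous
            # kept sibling, or to the parent's own text if there is none.
            tail = child.tail or ""
            if previous is None:
                elem.text = (elem.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            elem.remove(child)
        else:
            strip_presentation(child)
            previous = child
    # Drop style, class, id, data-* and any other attribute not allowed.
    for name in list(elem.attrib):
        if name not in KEEP_ATTRS:
            del elem.attrib[name]


def clean(fragment: str) -> str:
    root = ET.fromstring(fragment)
    strip_presentation(root)
    return ET.tostring(root, encoding="unicode")
```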

This stripping behavior is a feature, not a bug. The output Markdown will be leaner, cleaner, and significantly more portable than the original HTML.

HTML that is "messy" - meaning it has unclosed tags, deeply nested formatting wrappers, redundant inline styles, or mixed indentation - can produce Markdown that is technically correct but structurally confusing. The parser does its best using the browser's own HTML engine (which is extremely forgiving), but garbage in tends to produce garbage out at a structural level.

Common sources of messy HTML include content copied from Microsoft Word or Google Docs (which can add hundreds of redundant inline styles and wrapper <span> elements), HTML exported by visual page builders (which wrap every sentence in <div> after <div>), or legacy content from old CMS platforms that used <font> tags and table-based layouts. When converting this kind of HTML, you may notice:

  • Extra blank lines in the Markdown output from deeply nested empty div structures.
  • Code blocks that appear around content not intended to be code, because of unusual <pre>/<code> nesting.
  • Escaped special characters (like \* or \_) appearing in the output to prevent unintended Markdown formatting.
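Escaping works by putting a backslash in front of characters that Markdown would otherwise interpret. The Python function below is a simplified illustration that covers an assumed subset of cases: emphasis, code, and link characters anywhere in a line, plus heading, blockquote, and list markers at the start of a line. Turndown's own escaping rules are more detailed, and the names here are hypothetical.

```python
import re

# Characters that can start emphasis, code spans, or links anywhere in a line
# (a simplified, assumed subset of what a full converter escapes).
INLINE_SPECIALS = re.compile(r"([\\`*_\[\]])")
# Markers that only matter at the start of a line: headings (#), blockquotes
# (>), bullet lists (- or +), and ordered lists (1.), when followed by
# whitespace or the end of the line.
LINE_START = re.compile(r"^([ \t]*)([#>+-]|\d+\.)(?=\s|$)", re.MULTILINE)


def _escape_marker(match: re.Match) -> str:
    indent, marker = match.group(1), match.group(2)
    if marker[0].isdigit():
        return indent + marker[:-1] + "\\."  # "1." becomes "1\."
    return indent + "\\" + marker  # "#" becomes "\#"


def escape_markdown(text: str) -> str:
    """Backslash-escape characters that Markdown would otherwise interpret."""
    text = INLINE_SPECIALS.sub(r"\\\1", text)
    return LINE_START.sub(_escape_marker, text)
```

Because escaping is applied to the text content of the HTML rather than to the Markdown the converter generates, a source sentence such as `2 * 3 = 6` comes out as `2 \* 3 = 6`, while a real `<strong>` element still becomes unescaped `**bold**`.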

The best practice is to paste only the inner content HTML (the part between <body> tags) rather than full page HTML with <head>, scripts, and navigation. This produces the cleanest, most focused Markdown output. You can then use the output as a starting point and make any final adjustments directly in the output box if you enable the "Make Output Editable" option.
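If you are starting from a saved full page, the body content can also be pulled out programmatically before pasting. This Python sketch uses the standard library's html.parser to record where the first <body> start tag ends and where </body> begins, then slices the original text between those points. It assumes a single body element and returns the input unchanged (trimmed) when no <body> tag is present; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BodyLocator(HTMLParser):
    """Record where the first <body> start tag and the </body> end tag sit."""

    def __init__(self) -> None:
        super().__init__()
        self.body_start = None  # (line, column) of "<body ...>"
        self.tag_length = 0     # length of that start tag's source text
        self.body_end = None    # (line, column) of "</body>"

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.body_start is None:
            self.body_start = self.getpos()
            self.tag_length = len(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body" and self.body_end is None:
            self.body_end = self.getpos()


def to_index(text: str, pos: tuple) -> int:
    """Convert HTMLParser's 1-based line and 0-based column to a string index."""
    line, column = pos
    return sum(len(row) + 1 for row in text.split("\n")[: line - 1]) + column


def body_inner_html(page: str) -> str:
    locator = BodyLocator()
    locator.feed(page)
    locator.close()
    if locator.body_start is None:
        return page.strip()  # no <body>: treat the input as a fragment already
    start = to_index(page, locator.body_start) + locator.tag_length
    if locator.body_end is not None:
        end = to_index(page, locator.body_end)
    else:
        end = len(page)  # tolerate a missing </body>
    return page[start:end].strip()
```

Slicing the original string, instead of re-serializing parsed nodes, keeps the body markup byte-for-byte as it was, so nothing is lost before the converter sees it.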

Markdown supports two distinct syntaxes for headings, and both are part of the original spec. ATX style uses hash symbols at the start of the line - one hash for H1, two for H2, up to six for H6. For example: # Main Title, ## Section, ### Subsection. This style works for all six heading levels and is the universally supported modern standard. Almost every Markdown editor, static site generator, and documentation platform handles ATX headings correctly.

Setext style is an older approach that only supports two heading levels. H1 headings are indicated by underlining the text with equals signs (===), and H2 headings are underlined with hyphens (---). There is no Setext equivalent for H3 through H6, so any deeper headings automatically fall back to ATX hash syntax anyway. Setext headings can look more readable in raw text form for some writers, but they are less flexible: any document with headings deeper than H2 ends up mixing two heading styles.

The recommendation for nearly all use cases is to stick with the default ATX style. It is consistent across all heading levels, works everywhere, and is what the vast majority of Markdown tools and linters expect. Setext is primarily useful if you are targeting a very specific legacy system or have a personal preference for the visual appearance of raw Markdown files.
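The difference between the two styles is easy to see in code. The hypothetical Python helper below renders a heading either way; in Setext mode, levels 3 to 6 fall back to ATX hashes, as described above. Sizing the underline to match the heading text is a cosmetic choice, since any run of = or - characters is valid.

```python
def heading(text: str, level: int, style: str = "atx") -> str:
    """Render a Markdown heading in ATX ("#") or Setext (underline) style."""
    if not 1 <= level <= 6:
        raise ValueError("Markdown headings only go from level 1 to 6")
    if style == "setext" and level <= 2:
        # Setext: "=" underlines an H1, "-" underlines an H2.
        underline = "=" if level == 1 else "-"
        return text + "\n" + underline * len(text)
    # ATX style, which is also the fallback Setext uses for H3 to H6.
    return "#" * level + " " + text
```

For instance, `heading("Usage", 2, "setext")` gives `Usage` over a line of five hyphens, while `heading("Details", 3, "setext")` gives `### Details`. In Turndown, the library this tool is built on, the equivalent switch is the `headingStyle` option (`'atx'` or `'setext'`).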