Conversion Mapping Rules: HTML Elements to Markdown
Overview
This skill documents how html-to-markdown maps 60+ HTML element types to their Markdown equivalents. The conversion logic respects Markdown syntax variations (ATX vs Setext headings, fenced vs indented code, etc.) and maintains semantic accuracy.
Heading Elements (h1-h6)
ATX Style (Default)
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
Implementation:
- Option: `HeadingStyle::Atx` (default)
- A heading at level n uses n hashes
- A single space is required after the hashes
- Trailing hashes are optional (the ATX closed style adds them)
HTML Example:
<h1>Title</h1> → # Title
<h2 id="intro">Intro</h2> → ## Intro
<h3>Detail</h3> → ### Detail
Setext/Underlined Style
Heading 1
=========

Heading 2
---------
Implementation:
- Option: `HeadingStyle::Underlined`
- H1: `=` characters for full line width
- H2: `-` characters for full line width
- H3+ not supported in Setext (fallback to ATX)
HTML Example:
<h1>Main Title</h1> → Main Title\n===========
<h2>Subtitle</h2> → Subtitle\n---------
<h3>Detail</h3> → ### Detail (fallback to ATX)
ATX Closed Style
# Heading 1 #
## Heading 2 ##
### Heading 3 ###
Implementation:
- Option: `HeadingStyle::AtxClosed`
- Closing hashes must match the opening count
- Single space before the closing hashes
- Less common, but valid Markdown
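The three heading styles above can be sketched as a single function. This is a minimal illustration, not the crate's actual code: the `HeadingStyle` variant names come from this document, while `render_heading` and its signature are hypothetical. It also assumes the Setext underline is as long as the heading text, which is one reading of "full line width".

```rust
/// Heading styles named in this document.
#[derive(Clone, Copy, Debug, PartialEq)]
enum HeadingStyle {
    Atx,
    Underlined,
    AtxClosed,
}

/// Hypothetical helper: render a heading of `level` (1..=6) whose inline
/// content has already been converted to Markdown.
fn render_heading(level: usize, text: &str, style: HeadingStyle) -> String {
    let level = level.clamp(1, 6);
    let hashes = "#".repeat(level);
    match style {
        // Setext only covers H1 and H2. Assumption: underline width = text width.
        HeadingStyle::Underlined if level <= 2 => {
            let underline = if level == 1 { "=" } else { "-" };
            format!("{}\n{}\n", text, underline.repeat(text.chars().count()))
        }
        // Closing hashes match the opening count, separated by single spaces.
        HeadingStyle::AtxClosed => format!("{} {} {}\n", hashes, text, hashes),
        // Plain ATX, which is also the fallback for Underlined at H3 and deeper.
        HeadingStyle::Atx | HeadingStyle::Underlined => format!("{} {}\n", hashes, text),
    }
}
```

Under these assumptions, `render_heading(3, "Detail", HeadingStyle::Underlined)` falls back to the ATX form `### Detail`.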
Block-Level Elements
Paragraph (<p>)
Mapping:
- Text content extracted and escaped
- Leading/trailing whitespace trimmed
- Single newline after paragraph
Example:
<p>This is a paragraph with <strong>bold</strong> text.</p> → This is a paragraph with **bold** text.\n
Division (<div>)
Behavior:
- Transparent wrapper for Markdown
- Content treated as block-level
- No wrapping markers in output
- Preserves child semantics
Example:
<div> <p>Paragraph inside div</p> </div> → Paragraph inside div\n
Blockquote (<blockquote>)
Mapping:
- Each line prefixed with `>`
- Nested blockquotes: `> >`
- Handles multiple paragraphs
Example:
<blockquote> <p>Quote line 1</p> <p>Quote line 2</p> </blockquote> → > Quote line 1\n>\n> Quote line 2\n
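The prefixing rule can be illustrated with a small helper that takes the already-converted inner Markdown and adds the quote marker line by line. The name `quote_block` is hypothetical and not taken from the crate; this is a sketch of the rule, not the real implementation.

```rust
/// Hypothetical helper: prefix every line of already-converted Markdown
/// with a blockquote marker.
fn quote_block(inner: &str) -> String {
    let mut out = String::new();
    for line in inner.trim_end_matches('\n').split('\n') {
        if line.is_empty() {
            // Blank separator lines carry a bare marker so the quote stays contiguous.
            out.push_str(">\n");
        } else {
            out.push_str("> ");
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Applying the helper to its own output produces the nested `> >` form, which is how nested blockquotes fall out of the same rule.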
Preformatted Text (<pre>)
Behavior:
- Whitespace preserved exactly
- Treated as code block (see Code Blocks below)
- No entity decoding in content
- Trimmed and indented
Example:
<pre> code with spaces</pre> → (indented code or fenced, depends on CodeBlockStyle)
Code Blocks
Indented Style (Default):
    line 1
    line 2
    line 3
Implementation:
- Option: `CodeBlockStyle::Indented`
- Each line prefixed with 4 spaces
- Requires a blank line before and after
- CommonMark default
Fenced Backtick Style:
```language
code here
```
Implementation:
- Option: `CodeBlockStyle::Backticks`
- Triple backticks with optional language specifier
- Language from HTML class (e.g., `language-rust` → `rust`)
- Can contain blank lines
Fenced Tilde Style:
~~~rust
code here
~~~
Implementation:
- Option: `CodeBlockStyle::Tildes`
- Triple tildes with optional language specifier
- Less common variant of fenced style
HTML Mapping:
<pre><code>simple code</code></pre>
<pre><code class="language-python">def foo(): pass</code></pre>
<pre>indented code</pre>
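As a rough sketch of how the three code block styles and the `language-*` class lookup could fit together: the `CodeBlockStyle` variants are the ones named above, but `language_from_class` and `render_code_block` are hypothetical names, and the real converter may handle edge cases (such as code that itself contains a fence) differently.

```rust
#[derive(Clone, Copy)]
enum CodeBlockStyle {
    Indented,
    Backticks,
    Tildes,
}

/// Hypothetical helper: pull "rust" out of a class attribute like "hl language-rust".
fn language_from_class(class_attr: &str) -> Option<&str> {
    class_attr
        .split_whitespace()
        .find_map(|c| c.strip_prefix("language-"))
        .filter(|lang| !lang.is_empty())
}

/// Hypothetical helper: render the text of a <pre> element in the chosen style.
fn render_code_block(code: &str, class_attr: Option<&str>, style: CodeBlockStyle) -> String {
    let lang = class_attr.and_then(language_from_class).unwrap_or("");
    let body = code.trim_end_matches('\n');
    // Fenced styles: three marker characters, optional language, body, closing fence.
    let fenced = |marker: &str| {
        let fence = marker.repeat(3);
        format!("{}{}\n{}\n{}\n", fence, lang, body, fence)
    };
    match style {
        CodeBlockStyle::Indented => {
            // Four spaces per line; blank lines stay empty.
            let lines: Vec<String> = body
                .split('\n')
                .map(|l| if l.is_empty() { String::new() } else { format!("    {}", l) })
                .collect();
            lines.join("\n") + "\n"
        }
        CodeBlockStyle::Backticks => fenced("`"),
        CodeBlockStyle::Tildes => fenced("~"),
    }
}
```

Note that the indented style has no place to put the language, so in this sketch the class attribute only affects the two fenced styles.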
Horizontal Rule (<hr>)
Output: ---\n (three dashes)
Alternatives: `***` and `___` are equally valid Markdown, but output is standardized to `---`
List Elements
Unordered Lists (<ul>)
Default Syntax (dashes):
- Item 1
- Item 2
  - Nested item
    - Deeply nested
Implementation:
- `-` marker (could be `*` or `+`, but `-` is the default)
- Indentation for nesting: spaces or tabs
- Option: `ListIndentType::Spaces` (default) or `ListIndentType::Tabs`
Ordered Lists (<ol>)
1. First item
2. Second item
3. Third item
Implementation:
- `1.` through `9.` for the first 9 items (numbering resets per list)
- Number must be followed by `. ` (dot, space)
- Indentation matches unordered lists for nesting
List Items (<li>)
Behavior:
- Content can include block elements (paragraphs, code blocks)
- Continuation lines indented to match marker
- Multi-line items:
- First paragraph

  Second paragraph (indented)
HTML Example:
<ul>
<li>
<p>Item with paragraph</p>
<p>Second paragraph</p>
</li>
</ul>
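The nesting and indentation rules for lists can be sketched with a small recursive renderer. `ListIndentType` is the option named above; the `Item` tree, the `item` constructor, and `render_list` are hypothetical and exist only for illustration. The sketch assumes that, with spaces, children are indented by the width of the parent's marker, which is one way to satisfy "indented to match marker".

```rust
#[derive(Clone, Copy)]
enum ListIndentType {
    Spaces,
    Tabs,
}

/// Hypothetical list tree: one line of text plus nested items.
struct Item {
    text: String,
    children: Vec<Item>,
}

fn item(text: &str, children: Vec<Item>) -> Item {
    Item { text: text.to_string(), children }
}

/// Hypothetical renderer: `prefix` is the indentation accumulated so far.
fn render_list(items: &[Item], ordered: bool, prefix: &str, indent: ListIndentType, out: &mut String) {
    for (i, it) in items.iter().enumerate() {
        // Ordered lists restart at 1 for every list, including nested ones.
        let marker = if ordered { format!("{}. ", i + 1) } else { "- ".to_string() };
        out.push_str(prefix);
        out.push_str(&marker);
        out.push_str(&it.text);
        out.push('\n');
        // Children line up with the parent's content (spaces) or get one tab per level.
        let child_prefix = match indent {
            ListIndentType::Spaces => format!("{}{}", prefix, " ".repeat(marker.len())),
            ListIndentType::Tabs => format!("{}\t", prefix),
        };
        render_list(&it.children, ordered, &child_prefix, indent, out);
    }
}
```

Deriving the child indentation from the marker width keeps nested items inside their parent for both `- ` and `1. ` markers without a separate width setting.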
Definition Lists (<dl>, <dt>, <dd>)
Term
: Definition
Another Term
: Definition 1
: Definition 2
Implementation:
- `<dt>`: Term on its own line
- `<dd>`: Definition with `:` prefix and indentation
- Multiple definitions per term supported
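A minimal sketch of the term/definition layout, assuming a `: ` prefix (colon plus one space) as in the example above. The helper name `render_definitions` is hypothetical, and the exact spacing the crate emits after the colon is not confirmed here.

```rust
/// Hypothetical helper: one <dt> followed by any number of <dd> entries.
fn render_definitions(term: &str, definitions: &[&str]) -> String {
    // The term sits on its own line.
    let mut out = format!("{}\n", term);
    for definition in definitions {
        // Assumption: each definition is prefixed with a colon and a single space.
        out.push_str(": ");
        out.push_str(definition);
        out.push('\n');
    }
    out
}
```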
Tables (<table>, <tr>, <td>, <th>)
Mapping:
| Header 1 | Header 2 |
|----------|----------|
| Cell 1   | Cell 2   |
| Cell 3   | Cell 4   |
Implementation:
- `<table>` → GFM (GitHub Flavored Markdown) table
- `<thead>` content becomes header row
- `<tbody>` rows become data rows
- Cells separated by `|` pipes
- Separator row: `|---|---|` (minimum 3 dashes)
- Alignment: left `:---`, center `:---:`, right `---:`
Cell Content:
- Escaped for pipe characters (`|` → `\|`)
- Nested elements converted (e.g., `<strong>` → `**`)
- Newlines converted to `<br>` representation
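The table rules above (pipe-separated cells, a separator row, escaped pipes, newlines as `<br>`) can be sketched as follows. `escape_cell` and `render_table` are hypothetical names; column alignment and width padding are left out to keep the example short.

```rust
/// Hypothetical helper: keep a cell on one row by escaping pipes and
/// replacing newlines with <br>.
fn escape_cell(cell: &str) -> String {
    cell.replace('|', "\\|").replace('\n', "<br>")
}

/// Join already-escaped cells into one pipe-delimited row.
fn row_line(cells: &[String]) -> String {
    format!("| {} |\n", cells.join(" | "))
}

/// Hypothetical helper: render a GFM table from a header row and data rows.
fn render_table(header: &[&str], rows: &[Vec<&str>]) -> String {
    let escaped_header: Vec<String> = header.iter().map(|c| escape_cell(c)).collect();
    let mut out = row_line(&escaped_header);
    // Separator row: three dashes per column, no alignment colons.
    let separator = vec!["---".to_string(); header.len()];
    out.push_str(&row_line(&separator));
    for row in rows {
        let escaped: Vec<String> = row.iter().map(|c| escape_cell(c)).collect();
        out.push_str(&row_line(&escaped));
    }
    out
}
```

Escaping happens per cell before the row is joined, so a literal `|` inside content can never be mistaken for a column boundary.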
Semantic HTML5 Elements
Article (<article>)
- Treated as transparent block wrapper
- No semantic markers in Markdown
- Content flows as-is
Section (<section>)
- Transparent block wrapper
- Could insert heading separator in future
Nav (<nav>)
- List-like wrapper
- Children converted normally
- Could insert navigation markers
Aside (<aside>)
- Optional blockquote prefix (configurable)
- Or treated as transparent block
Header (<header>)
- Transparent wrapper
- Content converted normally
Footer (<footer>)
- Transparent wrapper
- Could insert footer marker (e.g., `---\n`)
Main (<main>)
- Transparent wrapper
- Content flows normally
Inline Elements
Emphasis (<em>, <i>)
Mapping: *text* or _text_
Implementation:
- Default: `*` (asterisk italic)
- No underscore escaping needed in this context
- Trimmed of excess whitespace
Example:
<em>emphasized</em> → *emphasized*
<i>italic</i> → *italic*
Strong (<strong>, <b>)
Mapping: **text**
Implementation:
- Double asterisks (bold)
- Trimmed of excess whitespace
- Can be nested with emphasis
Example:
<strong>bold</strong> → **bold**
<b>bold</b> → **bold**
<strong><em>bold italic</em></strong> → ***bold italic***
Code (<code>)
Mapping: `text` (backtick inline code)
Implementation:
- Single backticks for inline code
- Escaped if backticks are present in the content
- No entity decoding within code
Example:
<code>variable_name</code> → `variable_name`
<code>don't</code> → `don't`
<code>`already_quoted`</code> → `` `already_quoted` ``
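The backtick handling in the last example follows the usual CommonMark approach: use a delimiter one backtick longer than the longest run inside the content, and pad with spaces when the content touches a backtick at either end. The helper below is a hypothetical sketch of that rule, not the crate's function.

```rust
/// Hypothetical helper: wrap text in an inline code span, widening the
/// delimiter when the content itself contains backticks.
fn inline_code(content: &str) -> String {
    // Find the longest run of consecutive backticks inside the content.
    let mut longest = 0usize;
    let mut current = 0usize;
    for ch in content.chars() {
        current = if ch == '`' { current + 1 } else { 0 };
        longest = longest.max(current);
    }
    let delimiter = "`".repeat(longest + 1);
    // Pad with a space when the content starts or ends with a backtick,
    // so the delimiter and the content do not merge.
    let pad = if content.starts_with('`') || content.ends_with('`') { " " } else { "" };
    format!("{}{}{}{}{}", delimiter, pad, content, pad, delimiter)
}
```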
Link (<a href>)
Mapping: [link text](url "title")
Implementation:
- `href` attribute becomes URL
- Text content becomes link text
- `title` attribute becomes optional title (in quotes)
- URL preserved as-is (no extra encoding)
- Special link types:
  - `href="#section"` → Anchor link
  - `href="/page"` → Internal link (relative)
  - `href="https://external.com"` → External link
  - `href="mailto:user@example.com"` → Email link
  - `href="tel:+1234567890"` → Phone link
Examples:
<a href="https://example.com">Link</a> → [Link](https://example.com)
<a href="/page" title="My Page">Internal</a> → [Internal](/page "My Page")
<a href="#section">Anchor</a> → [Anchor](#section)
<a href="mailto:test@example.com">Email</a> → [Email](mailto:test@example.com)
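The link mapping and the link-type distinctions listed above can be sketched like this. `LinkKind`, `classify_href`, and `render_link` are hypothetical names introduced for illustration. Escaping double quotes inside the title is an added assumption, since this document does not say how titles containing quotes are handled.

```rust
/// Hypothetical classification mirroring the "special link types" list.
#[derive(Debug, PartialEq)]
enum LinkKind {
    Anchor,
    Email,
    Phone,
    External,
    Internal,
}

fn classify_href(href: &str) -> LinkKind {
    if href.starts_with('#') {
        LinkKind::Anchor
    } else if href.starts_with("mailto:") {
        LinkKind::Email
    } else if href.starts_with("tel:") {
        LinkKind::Phone
    } else if href.starts_with("http://") || href.starts_with("https://") {
        LinkKind::External
    } else {
        LinkKind::Internal
    }
}

/// Hypothetical helper: the URL is passed through unchanged.
fn render_link(text: &str, href: &str, title: Option<&str>) -> String {
    match title {
        // Assumption: escape double quotes so the title cannot end early.
        Some(t) if !t.is_empty() => {
            format!("[{}]({} \"{}\")", text, href, t.replace('"', "\\\""))
        }
        _ => format!("[{}]({})", text, href),
    }
}
```

All five link kinds render through the same `[text](url)` form; the classification only matters if a caller wants to treat, say, external links differently.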
Image (<img>)
Mapping: ![alt text](url "title")
Implementation:
- `src` attribute becomes URL
- `alt` attribute becomes alt text
- `title` attribute becomes optional title
- Dimensions (`width`, `height`) captured in metadata
- Data URIs: `![alt](data:image/png;base64,...)`
- Relative paths preserved
Examples:
<img src="photo.jpg" alt="A photo"> → ![A photo](photo.jpg)
<img src="image.png" alt="Image" title="My Image" width="200" height="150"> → ![Image](image.png "My Image")
<img src="data:image/png;base64,..." alt="Embedded"> → ![Embedded](data:image/png;base64,...)
Line Break (<br>)
Mapping:
- Two spaces + newline: `  \n`
- Or backslash + newline: `\\\n`
Option: NewlineStyle::Spaces (default) or NewlineStyle::Backslash
Example:
<p>Line 1<br>Line 2</p> → Line 1  \nLine 2\n
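The two hard line break forms are small enough to show directly. `NewlineStyle` and its variants are the option named above; `render_br` is a hypothetical helper name.

```rust
#[derive(Clone, Copy)]
enum NewlineStyle {
    Spaces,
    Backslash,
}

/// Hypothetical helper: the Markdown emitted for a single <br>.
fn render_br(style: NewlineStyle) -> &'static str {
    match style {
        // Two trailing spaces, then the newline.
        NewlineStyle::Spaces => "  \n",
        // A literal backslash, then the newline.
        NewlineStyle::Backslash => "\\\n",
    }
}
```

The backslash form is easier to see in diffs and survives editors that strip trailing whitespace, which is a common reason to prefer it over the default.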
Strikethrough (<s>, <del>, <strike>)
Mapping: ~~strikethrough~~
Implementation:
- GFM strikethrough syntax (double tilde)
- Not standard Markdown, but widely supported
- Trimmed of excess whitespace
Example:
<del>removed text</del> → ~~removed text~~
<s>strikethrough</s> → ~~strikethrough~~
Subscript/Superscript (<sub>, <sup>)
Behavior:
- No native Markdown support
- Typically converted to plain text or HTML passthrough
- Implementation: Extract text content, no markup
Example:
H<sub>2</sub>O → H2O (plain text)
E=mc<sup>2</sup> → E=mc2 (plain text)
Mark/Highlight (<mark>)
Options:
- `HighlightStyle::DoubleEqual`: `==text==`
- `HighlightStyle::Html`: `<mark>text</mark>`
- `HighlightStyle::Bold`: `**text**`
- `HighlightStyle::None`: plain text
Example:
<mark>highlighted</mark>
→ ==highlighted== (DoubleEqual mode)
→ <mark>highlighted</mark> (Html mode)
→ **highlighted** (Bold mode)
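The four highlight options map naturally onto a single `match`. The `HighlightStyle` variants are taken from the list above; `render_mark` is a hypothetical function name used only for this sketch.

```rust
#[derive(Clone, Copy)]
enum HighlightStyle {
    DoubleEqual,
    Html,
    Bold,
    None,
}

/// Hypothetical helper: render the already-converted content of a <mark>.
fn render_mark(text: &str, style: HighlightStyle) -> String {
    match style {
        HighlightStyle::DoubleEqual => format!("=={}==", text),
        // HTML passthrough keeps the original tag around the text.
        HighlightStyle::Html => format!("<mark>{}</mark>", text),
        HighlightStyle::Bold => format!("**{}**", text),
        // No marker at all: the highlight is dropped, the text is kept.
        HighlightStyle::None => text.to_string(),
    }
}
```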
Ruby Annotations (<ruby>, <rt>, <rp>)
Mapping:
- Japanese ruby text support
- Format: `text {rt_text}` or similar
- Implementation: Extract base text with rt annotation
Example:
<ruby>漢字<rt>かんじ</rt></ruby> → 漢字 (かんじ)
Media Elements
Audio (<audio>)
Behavior:
- No direct Markdown equivalent
- Typically extracted as metadata or skipped
- Could insert a link to the source if the `src` attribute is present
Handling:
<audio src="sound.mp3">Audio</audio> → (Skipped or converted to link in metadata)
Video (<video>)
Behavior:
- Similar to audio
- Could extract `poster` image
- Typically skipped in markdown output
Picture/Source (<picture>, <source>)
Behavior:
- Responsive image container
- Extract from child `<img>` inside
- Or use first source `src`
Form Elements
Input (<input>)
Behavior:
- Generally skipped or marked as form element
- Could convert to metadata about form structure
- Types: text, checkbox, radio, button, hidden
Implementation:
- Placeholder preserved in metadata
- Value not typically included in markdown
Select/Option (<select>, <option>)
Behavior:
- Converted to list or metadata
- Option text extracted
- Selected state noted
Button (<button>)
Behavior:
- Text content extracted (ignores `<button>` wrapper)
- Click handlers ignored
- Treated as inline text
Textarea (<textarea>)
Behavior:
- Content treated as code block or preformatted
- Whitespace preserved
Special Elements
SVG (<svg>)
Behavior:
- Can be preserved as inline image or skipped
- Feature: `inline-images` can extract inline SVG
- Typically rendered as-is in compatible markdown renderers
MathML (<math>)
Behavior:
- Skipped in standard markdown
- Could be preserved with feature gate
- Converted to LaTeX or plain text fallback
iframe (<iframe>)
Behavior:
- Generally skipped
- Could extract as metadata (video embeds, etc.)
- URL captured if needed
Whitespace and Formatting Context
Whitespace Mode
Normalized (default):
- Multiple spaces collapsed to single space
- Multiple newlines → single newline
- Leading/trailing whitespace trimmed per element
Strict:
- All whitespace preserved exactly
- Multiple spaces and newlines intact
- Useful for poetry, ASCII art, etc.
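The two whitespace modes can be sketched as below. The enum name `WhitespaceMode` and the function `apply_whitespace` are assumptions made for illustration (this document only gives the mode names "Normalized" and "Strict"). The sketch also assumes that a whitespace run containing a newline collapses to one newline, and any other run collapses to one space.

```rust
#[derive(Clone, Copy)]
enum WhitespaceMode {
    Normalized,
    Strict,
}

/// Hypothetical helper: apply the chosen whitespace mode to one element's text.
fn apply_whitespace(text: &str, mode: WhitespaceMode) -> String {
    match mode {
        // Strict: everything is kept exactly as written.
        WhitespaceMode::Strict => text.to_string(),
        WhitespaceMode::Normalized => {
            let mut out = String::new();
            // Collapsed separator waiting to be written before the next visible char.
            let mut pending: Option<char> = None;
            for ch in text.chars() {
                if ch == '\n' {
                    // A newline wins over spaces within the same run.
                    pending = Some('\n');
                } else if ch.is_whitespace() {
                    if pending.is_none() {
                        pending = Some(' ');
                    }
                } else {
                    if let Some(sep) = pending.take() {
                        // Skipping the separator on empty output trims leading whitespace.
                        if !out.is_empty() {
                            out.push(sep);
                        }
                    }
                    out.push(ch);
                }
            }
            // Trailing whitespace is never flushed, so it is trimmed.
            out
        }
    }
}
```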
Text Escaping
Options:
- `escape_asterisks`: `*` → `\*`
- `escape_underscores`: `_` → `\_`
- `escape_misc`: special characters `\`, `&`, `<`, `[`, `>`, `~`, `#`, `=`, `+`, `|`, `-`, and the backtick
- `escape_ascii`: All ASCII punctuation (CommonMark spec)
Example:
<p>Price: $10 & free shipping *limited time*</p>
escape_misc=true: → Price: $10 \& free shipping *limited time*
escape_asterisks=true: → Price: $10 & free shipping \*limited time\*
escape_ascii=true: → Price: \$10 \& free shipping \*limited time\*
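A minimal sketch of how the per-character escape options might combine. The field names mirror the option names above, but the `EscapeOptions` struct and `escape_text` function are hypothetical. The `escape_misc` character set, including the backtick, is an assumption based on the list in this section, and `escape_ascii` is left out of the sketch.

```rust
/// Hypothetical options struct; field names follow the options listed above.
#[derive(Default)]
struct EscapeOptions {
    escape_asterisks: bool,
    escape_underscores: bool,
    escape_misc: bool,
}

/// Hypothetical helper: backslash-escape characters according to the options.
fn escape_text(text: &str, opts: &EscapeOptions) -> String {
    // Assumed escape_misc set: \ & < ` [ > ~ # = + | -
    const MISC: &str = "\\&<`[>~#=+|-";
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        let needs_escape = (opts.escape_asterisks && ch == '*')
            || (opts.escape_underscores && ch == '_')
            || (opts.escape_misc && MISC.contains(ch));
        if needs_escape {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

With only `escape_misc` set, the sample text above keeps its asterisks and gains a backslash before the ampersand, matching the first example line.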
Implementation Details Location
Key Files:
- `/crates/html-to-markdown/src/converter.rs` - Element dispatch and conversion
- `/crates/html-to-markdown/src/options.rs` - Style configuration enums
- `/crates/html-to-markdown/src/text.rs` - Text escaping and normalization
Element Dispatch Example
// From converter.rs pattern
match element.tag_name() {
    "h1" | "h2" | "h3" | "h4" | "h5" | "h6" => convert_heading(...),
    "p" => convert_paragraph(...),
    "a" => convert_link(...),
    "img" => convert_image(...),
    "strong" | "b" => convert_strong(...),
    "em" | "i" => convert_em(...),
    "code" => convert_code(...),
    "pre" => convert_pre(...),
    "blockquote" => convert_blockquote(...),
    "ul" | "ol" => convert_list(...),
    "li" => convert_list_item(...),
    "table" => convert_table(...),
    "br" => convert_br(...),
    "hr" => convert_hr(...),
    // ... 40+ more elements
    _ => convert_generic_element(...),
}
Complete Element Reference
See /crates/html-to-markdown/src/visitor.rs for exhaustive NodeType enum covering all 60+ supported elements.
Quick Reference Table
| HTML Element | Markdown Output | Notes |
|---|---|---|
| `<h1>` | `# text` | ATX style default |
| `<p>` | `text\n` | Paragraph |
| `<strong>` | `**text**` | Bold |
| `<em>` | `*text*` | Italic |
| `<a href>` | `[text](url)` | Link |
| `<img>` | `![alt](src)` | Image |
| `<ul>` | `- item` | Unordered list |
| `<ol>` | `1. item` | Ordered list |
| `<code>` | `` `text` `` | Inline code |
| `<pre>` | Indented or fenced | Code block |
| `<blockquote>` | `> text` | Quote |
| `<table>` | GFM table | Pipe-delimited |
| `<br>` | two spaces + `\n` | Line break |
| `<hr>` | `---` | Horizontal rule |
| `<del>` | `~~text~~` | Strikethrough |
| `<mark>` | `==text==` | Highlight (configurable) |