Cleaning ChatGPT output for your website

You paste AI output into your CMS and the formatting breaks. Escaped entities, markdown artifacts, code block wrappers. Here is how to fix it.

You ask ChatGPT to write a product description. It looks great in the chat window. You copy it, paste it into your CMS, hit publish, and your page now has visible markdown asterisks, escaped HTML entities, and a stray code fence at the top.

This is not a CMS bug. It is a format mismatch between what AI chat interfaces produce and what your website expects.

Why it happens

AI chat tools render markdown in their interface. When you copy from the chat window, you are not copying the rendered HTML. You are copying a mix of plain text, markdown syntax, and invisible formatting characters that your browser added during the copy operation.

Your CMS then interprets that pasted content according to its own rules. Some strip formatting entirely. Some try to parse the markdown. Some treat it as raw HTML. The result is unpredictable and almost always wrong.

What each tool leaves behind

ChatGPT

ChatGPT's copy behaviour varies depending on whether you copy from the rendered output or use the copy button. Common artifacts include:

  • Bold text pasted as **text** with literal asterisks
  • Lists that lose their indentation and become run-on text
  • Code blocks wrapped in triple backtick fences that render as visible text
  • Smart quotes and special Unicode characters
  • Escaped HTML entities like &amp;amp;, &amp;lt;, and &amp;nbsp;

Claude

Claude tends to produce cleaner plain text on copy, but still leaves behind:

  • Markdown header syntax (##, ###) as literal characters
  • Numbered lists that lose their formatting
  • Em-dashes and typographic characters
  • Occasional HTML tags mixed into plain text output

Gemini

Gemini's output often includes:

  • Bold markers as literal asterisks
  • Bullet points that paste as special Unicode bullet characters
  • Inconsistent line breaks and spacing
  • Mixed formatting when the response includes both text and code
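
The literal asterisks and header markers that show up in all three lists can be turned into HTML with a few substitutions. Here is a minimal Python sketch, assuming only `**bold**` and `#` headings need handling. The function names are hypothetical, and this is not how any particular tool does it. A full markdown parser covers far more cases.

```python
import html
import re

def _inline(text: str) -> str:
    # Escape &, < and > first so pasted text cannot inject tags.
    text = html.escape(text, quote=False)
    # Assumption: bold is always written as **text** on a single line.
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)

def markdown_to_html(text: str) -> str:
    """Convert headings and bold markers; every other line becomes a <p>."""
    lines = []
    for line in text.splitlines():
        heading = re.match(r"(#{1,6})\s+(.*)", line)
        if heading:
            level = len(heading.group(1))
            lines.append(f"<h{level}>{_inline(heading.group(2))}</h{level}>")
        elif line.strip():
            lines.append(f"<p>{_inline(line.strip())}</p>")
    return "\n".join(lines)
```

Calling `markdown_to_html("## Features\nThis is **soft** & warm.")` gives an `<h2>` line followed by a `<p>` line with the bold word wrapped in `<strong>` and the ampersand escaped.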

Auto-detection matters

The problem is that no two pastes are the same. Sometimes you get clean text. Sometimes you get markdown. Sometimes you get HTML with escaped entities. A good cleanup tool needs to detect what it is looking at before it can fix it.

Our HTML tools auto-detect the format of pasted content. If they find markdown syntax, they convert it. If they find escaped entities, they decode them. If they find raw HTML, they tidy it. You do not need to say what you pasted. The tools figure it out.
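
To give a feel for what detection involves, here is a rough Python sketch. The patterns, the labels, and the order of the checks are all assumptions made for illustration. They are not the rules our tools use, and real pastes often mix formats.

```python
import re

# Hypothetical heuristics. Each pattern looks for one tell-tale sign.
HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(?:\s[^<>]*)?/?>")
ESCAPED_ENTITY = re.compile(r"&(?:amp|lt|gt|quot|nbsp|#\d+|#x[0-9a-fA-F]+);")
MARKDOWN = re.compile(
    r"\*\*[^*\n]+\*\*"    # **bold**
    r"|^#{1,6}\s+\S"      # ## heading
    r"|^`{3}"             # code fence
    r"|^\s*[-*+]\s+\S",   # - list item
    re.MULTILINE,
)

def detect_format(text: str) -> str:
    """Return 'html', 'escaped_entities', 'markdown', or 'plain'."""
    if HTML_TAG.search(text):
        return "html"                # real tags win, entities inside are normal
    if ESCAPED_ENTITY.search(text):
        return "escaped_entities"    # entities but no tags: probably double-escaped
    if MARKDOWN.search(text):
        return "markdown"
    return "plain"
```

Checking for real tags first matters, because valid HTML legitimately contains entities and should not be decoded a second time.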

What gets cleaned vs. what gets preserved

Getting this right is the hard part. You want to fix formatting artifacts without destroying intentional content.

Cleaned:

  • Markdown syntax converted to clean HTML or plain text
  • Escaped HTML entities decoded to their real characters
  • Smart quotes normalised to straight quotes
  • Invisible Unicode characters stripped
  • Extra whitespace and blank lines condensed
  • Code fence wrappers removed when the content is not actually code
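
Several of the steps above are plain text substitutions. A minimal Python sketch, assuming the output is headed for a plain text field (decoding entities is not safe if the result will be inserted as HTML). The function name and the exact character lists are illustrative choices.

```python
import html
import re

SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
})
# Zero-width space, non-joiner, joiner, word joiner, byte order mark.
# Note: stripping the zero-width joiner also breaks combined emoji.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

def normalise_text(text: str) -> str:
    text = html.unescape(text)               # &amp; becomes &, &lt; becomes <
    text = text.replace("\u00a0", " ")       # non-breaking space to plain space
    text = text.translate(SMART_QUOTES)
    text = INVISIBLE.sub("", text)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()
```

Keeping one blank line rather than none is deliberate. It preserves paragraph breaks while removing the extra gaps.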

Preserved:

  • Actual code blocks (when they contain real code)
  • Intentional HTML structure
  • Links and URLs
  • Line breaks that reflect real paragraph structure
  • Lists and their hierarchy
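
Deciding whether a fenced block holds real code is a judgement call. One crude approach, sketched in Python: unwrap the fence only when the whole paste is a single block, the language label suggests prose, and nothing in the body looks like code. The label list and the code hints are assumptions, and they will misjudge some inputs.

```python
import re

# A paste that is exactly one fenced block, with an optional language label.
WHOLE_FENCE = re.compile(r"\A\s*`{3}([\w+-]*)[ \t]*\n(.*?)\n`{3}\s*\Z", re.DOTALL)
PROSE_LABELS = {"", "text", "plaintext", "markdown", "md"}
# Crude signs of code: a line ending in {, } or ;, or starting with a keyword.
CODE_HINTS = re.compile(
    r"[{};]\s*$|^\s*(?:def|class|import|function|const|let|var)\b",
    re.MULTILINE,
)

def unwrap_prose_fence(text: str) -> str:
    """Remove the fence when it wraps prose; otherwise return text unchanged."""
    match = WHOLE_FENCE.match(text)
    if match is None:
        return text
    label, body = match.group(1).lower(), match.group(2)
    if "`" * 3 in body:
        return text                  # several fences: leave untouched
    if label in PROSE_LABELS and not CODE_HINTS.search(body):
        return body
    return text                      # labelled or code-like: preserve
```

When in doubt the sketch keeps the fence, since a leftover wrapper is easier to spot and fix than code that has been merged into a paragraph.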

Other sources of messy HTML

AI chat tools are not the only source. If you build websites, you deal with messy input from everywhere.

Microsoft Word

Copy-pasting from Word produces some of the worst HTML on the internet. Nested <span> tags with inline styles, mso- prefixed CSS properties, conditional comments for different Office versions, and XML namespaces. A single paragraph from Word can generate 40 lines of HTML.
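
As a rough illustration of what cleaning Word output involves, here is a Python sketch that removes the pieces named above with regular expressions. The patterns are assumptions about typical Word markup, and regex is a blunt instrument for HTML. A real cleaner would work on a parsed document.

```python
import re

# <!--[if gte mso 9]> ... <![endif]--> blocks
CONDITIONAL_COMMENT = re.compile(r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
# Namespaced tags such as <o:p> and </o:p>
NAMESPACED_TAG = re.compile(r"</?[a-z]+:[^>]*>", re.IGNORECASE)
# Declarations such as "mso-margin-top-alt: auto;" inside a style attribute
MSO_DECLARATION = re.compile(r"mso-[\w-]+\s*:[^;\"']*;?\s*", re.IGNORECASE)
# style="" left behind once every declaration has been removed
EMPTY_STYLE = re.compile(r"\s+style=([\"'])\s*\1")

def strip_word_markup(markup: str) -> str:
    markup = CONDITIONAL_COMMENT.sub("", markup)
    markup = NAMESPACED_TAG.sub("", markup)
    markup = MSO_DECLARATION.sub("", markup)
    markup = EMPTY_STYLE.sub("", markup)
    return markup
```

Note that the mso- pattern is not limited to style attributes, so body text that happens to contain something like "mso-foo: bar" would be altered too.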

Google Docs

Better than Word but still problematic. Docs pastes tend to include <span> wrappers with inline font-weight and font-style declarations, <p> tags with margin styles, and IDs on every element. The HTML is valid but bloated.

Klaviyo

This one is worth calling out specifically. Klaviyo's email editor produces deeply nested <span> tags that serve no purpose. A single styled word can be wrapped in three or four nested spans, each applying one CSS property. When you copy content from a Klaviyo email template to edit it elsewhere, the nesting makes the HTML nearly unreadable.

The Klaviyo span nesting problem looks like this:

<span style="font-family: Arial;">
  <span style="font-size: 14px;">
    <span style="color: #333333;">
      <span style="font-weight: bold;">
        One word.
      </span>
    </span>
  </span>
</span>

Four nested spans for a bold word. Our HTML cleaner collapses these into a single element with combined styles, or strips the inline styles entirely if you want clean semantic HTML.
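
Collapsing can be sketched in Python as a repeated merge, starting from the innermost pair. This assumes each span has a double-quoted style as its only attribute and that the outer span contains nothing but the inner one. The names are hypothetical and this is not a description of our cleaner's internals.

```python
import re

# An outer span whose only child is an inner span with no spans inside it.
NESTED_PAIR = re.compile(
    r'<span style="([^"]*)">\s*'
    r'<span style="([^"]*)">((?:(?!</?span).)*)</span>'
    r'\s*</span>',
    re.DOTALL,
)

def _merge(match: re.Match) -> str:
    outer, inner, content = match.groups()
    # Outer declarations go first so inner ones still win on conflict.
    parts = [d.strip() for d in (outer + ";" + inner).split(";") if d.strip()]
    return f'<span style="{"; ".join(parts)};">{content.strip()}</span>'

def collapse_spans(markup: str) -> str:
    previous = None
    while markup != previous:        # repeat until nothing is left to merge
        previous = markup
        markup = NESTED_PAIR.sub(_merge, markup)
    return markup
```

Run on the four-span example above, this produces one span whose style lists font-family, font-size, color, and font-weight in that order, wrapping "One word."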

Output format options

Different destinations need different formats. Our HTML tools support several output modes:

  • Clean HTML - Semantic, minimal markup. Good for CMSs that accept HTML.
  • Plain text - All formatting stripped. Good for text fields, meta descriptions, and anywhere that does not render HTML.
  • Markdown - Clean markdown syntax. Good for static site generators, GitHub READMEs, and documentation platforms.

Pick the format that matches where the text is going. A Shopify product description needs different output than a markdown blog post.
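
The plain text mode is the simplest of the three to illustrate. A minimal Python sketch using the standard library's `html.parser`, assuming a line break before each block-level tag is enough structure. The class name and the tag list are illustrative, and the contents of `<script>` or `<style>` elements are not filtered out here.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Tags that should start a new line in the plain text output.
    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; arrives as &
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_plain_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For a meta description you would probably join the lines with spaces instead, since that field has no use for line breaks.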

The workflow

  1. Copy text from ChatGPT, Claude, Gemini, Word, Docs, or Klaviyo
  2. Paste it into our HTML tools
  3. The tool auto-detects the format and shows you a preview
  4. Choose your output format
  5. Copy the clean result into your CMS

Everything runs in your browser. The text you paste never leaves your device. There is no server processing, no account required, and no limit on how much you can clean.

Pair it with AI detection

If you are cleaning AI-generated text for your website, you probably also want to check how detectable it is. Run it through Unwrite GPT to remove the linguistic fingerprints that mark it as AI-written, then use text comparison to see exactly what changed.

Format cleanup and AI humanisation are different problems, but they often show up together. Clean the formatting first, then address the voice.