Cleaning ChatGPT output for your website

You ask ChatGPT to write a product description. It looks great in the chat window. You copy it, paste it into your CMS, hit publish, and your page now has visible markdown asterisks, escaped HTML entities, and a stray code fence at the top.

This is not a CMS bug. It is a format mismatch between what AI chat interfaces produce and what your website expects.

Why it happens

AI chat tools render markdown in their interface. When you copy from the chat window, you are not copying the rendered HTML. You are copying a mix of plain text, markdown syntax, and invisible formatting characters that your browser added during the copy operation.

Your CMS then interprets that pasted content according to its own rules. Some strip formatting entirely. Some try to parse the markdown. Some treat it as raw HTML. The result is unpredictable and almost always wrong.

What each tool leaves behind

ChatGPT

ChatGPT's copy behaviour varies depending on whether you copy from the rendered output or use the copy button. Common artifacts include:

Bold text pasted as **text** with literal asterisks
Lists that lose their indentation and become run-on text
Code blocks wrapped in triple backtick fences that render as visible text
Smart quotes and special Unicode characters
Escaped HTML entities like &amp;amp;, &amp;lt;, and &amp;gt;

Claude

Claude tends to produce cleaner plain text on copy, but still leaves behind:

Markdown header syntax (##, ###) as literal characters
Numbered lists that lose their formatting
Em-dashes and typographic characters
Occasional HTML tags mixed into plain text output

Gemini

Gemini's output often includes:

Bold markers as literal asterisks
Bullet points that paste as special Unicode bullet characters
Inconsistent line breaks and spacing
Mixed formatting when the response includes both text and code

Auto-detection matters

The problem is that no two pastes are the same. Sometimes you get clean text. Sometimes you get markdown. Sometimes you get HTML with escaped entities. A good cleanup tool needs to detect what it is looking at before it can fix it.

Our HTML tools auto-detect the format of pasted content. If they find markdown syntax, they convert it. If they find escaped entities, they decode them. If they find raw HTML, they tidy it. You do not need to say what you pasted.
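As a rough illustration of what format detection can look like, here is a minimal sketch. It is not how our tools are implemented: the function name detect_format, the regular expressions, and the four labels are all assumptions made for this example, and a real detector would need to handle mixed content rather than return a single label.

```python
import re

# Hypothetical heuristics for illustration only.
HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(?:\s[^<>]*)?>")
ESCAPED_ENTITY = re.compile(r"&(?:amp|lt|gt|quot|#\d+|#x[0-9a-fA-F]+);")
MARKDOWN = re.compile(
    r"(\*\*[^*\n]+\*\*)"     # **bold**
    r"|(^#{1,6}\s+\S)"       # ## heading
    r"|(^```)"               # code fence
    r"|(^\s*[-*+]\s+\S)",    # - list item
    re.MULTILINE,
)


def detect_format(text: str) -> str:
    """Return a single best-guess label for a pasted string."""
    # Literal tags are checked first, so real HTML that happens to
    # contain &amp; is not mistaken for escaped markup.
    if HTML_TAG.search(text):
        return "html"
    if ESCAPED_ENTITY.search(text):
        return "escaped-html"
    if MARKDOWN.search(text):
        return "markdown"
    return "plain"


if __name__ == "__main__":
    for sample in ["**Bold** claim", "&lt;p&gt;Hi&lt;/p&gt;", "<p>Hi</p>"]:
        print(repr(sample), "->", detect_format(sample))
```

The order of the checks is a design choice: each test is cheap, and the first match wins, so the most specific signal should come first.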

What gets cleaned vs. what gets preserved

Getting this right is the hard part. You want to fix formatting artifacts without destroying intentional content.

Cleaned:

Markdown syntax converted to clean HTML or plain text
Escaped HTML entities decoded to their real characters
Smart quotes normalised to straight quotes
Invisible Unicode characters stripped
Extra whitespace and blank lines condensed
Code fence wrappers removed when the content is not actually code

Preserved:

Actual code blocks (when they contain real code)
Intentional HTML structure
Links and URLs
Line breaks that reflect real paragraph structure
Lists and their hierarchy
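Several of the cleaning steps listed above can be sketched with the Python standard library. This is an illustrative example under stated assumptions, not our tool's code: the name clean_text and the exact set of characters treated as invisible are choices made for this sketch. Zero-width joiners are deliberately left alone because some scripts and emoji sequences rely on them.

```python
import html
import re

# Curly quotes mapped to their straight equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

# Zero-width space, word joiner, and byte order mark (assumed set).
INVISIBLE = re.compile("[\u200b\u2060\ufeff]")


def clean_text(text: str) -> str:
    text = html.unescape(text)               # &amp; -> &, &lt; -> <
    text = text.translate(SMART_QUOTES)      # normalise smart quotes
    text = INVISIBLE.sub("", text)           # strip invisible characters
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


if __name__ == "__main__":
    print(clean_text("\u201cFast\u201d &amp; light\u200b.\n\n\n\nNext   line "))
```

Paragraph breaks survive because a single blank line is kept; only runs of three or more newlines are condensed.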

Other sources of messy HTML

AI chat tools are not the only source. If you build websites, you deal with messy input from everywhere.

Microsoft Word

Copy-pasting from Word produces some of the worst HTML on the internet. Nested <span> tags with inline styles, mso- prefixed CSS properties, conditional comments for different Office versions, and XML namespaces. A single paragraph from Word can generate 40 lines of HTML.
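To make the Word case concrete, here is a hedged sketch that removes three of the artifacts named above: conditional comments, namespaced Office tags, and mso- declarations inside style attributes. The function name strip_word_markup and the regular expressions are assumptions for this example. Regex-based cleanup is fragile on real-world HTML, so treat this as a demonstration of the idea rather than a robust cleaner.

```python
import re

# <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# Namespaced tags such as <o:p> or </w:WordDocument>
OFFICE_TAG = re.compile(r"</?[a-z]+:[a-z0-9]+[^>]*>", re.IGNORECASE)
# Double-quoted style attributes only (an assumption of this sketch)
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"', re.IGNORECASE)


def _drop_mso(match: re.Match) -> str:
    """Rebuild a style attribute without its mso- declarations."""
    declarations = [d.strip() for d in match.group(1).split(";")]
    kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
    if not kept:
        return ""  # nothing left, so drop the attribute entirely
    return ' style="' + "; ".join(kept) + '"'


def strip_word_markup(markup: str) -> str:
    markup = CONDITIONAL_COMMENT.sub("", markup)
    markup = OFFICE_TAG.sub("", markup)
    return STYLE_ATTR.sub(_drop_mso, markup)


if __name__ == "__main__":
    sample = (
        '<p class="MsoNormal" style="mso-margin-top-alt: auto; color: red;">'
        "Hello<o:p></o:p></p><!--[if gte mso 9]><xml>x</xml><![endif]-->"
    )
    print(strip_word_markup(sample))
```

Non-Office declarations such as color are kept, since they may be intentional styling.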

Google Docs

Better than Word but still problematic. Docs pastes tend to include <span> wrappers with inline font-weight and font-style declarations, <p> tags with margin styles, and IDs on every element. The HTML is valid but bloated.

Klaviyo

This one is worth calling out specifically. Klaviyo's email editor produces deeply nested <span> tags that serve no purpose. A single styled word can be wrapped in three or four nested spans, each applying one CSS property. When you copy content from a Klaviyo email template to edit it elsewhere, the nesting makes the HTML nearly unreadable.

The Klaviyo span nesting problem looks like this:

<span style="font-family: Arial;">
  <span style="font-size: 14px;">
    <span style="color: #333333;">
      <span style="font-weight: bold;">
        One word.
      </span>
    </span>
  </span>
</span>

Four nested spans for a bold word. Our HTML cleaner collapses these into a single element with combined styles, or strips the inline styles entirely if you want clean semantic HTML.
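One way to collapse the nesting shown above is to merge an outer span with the single inner span it wraps, and repeat until nothing changes. The sketch below is an assumption about how such a step could work, not our cleaner's actual code; the names collapse_spans and NESTED_SPAN are hypothetical. It only handles spans with a double-quoted style attribute and no other attributes, and it trims whitespace around the text, which suits indented markup like the example.

```python
import re

# An outer span whose only child is an inner span with no spans inside it.
NESTED_SPAN = re.compile(
    r'<span style="([^"]*)">\s*'
    r'<span style="([^"]*)">((?:(?!</?span\b).)*)</span>'
    r"\s*</span>",
    re.DOTALL,
)


def _merge(match: re.Match) -> str:
    outer, inner, content = match.groups()
    # Outer declarations come first so that, on a conflict, the inner
    # (later) declaration wins, as it would in the nested original.
    combined = outer + ";" + inner
    declarations = [d.strip() for d in combined.split(";") if d.strip()]
    style = "; ".join(declarations) + ";"
    return '<span style="' + style + '">' + content.strip() + "</span>"


def collapse_spans(markup: str) -> str:
    previous = None
    while previous != markup:  # each pass merges the innermost pairs
        previous = markup
        markup = NESTED_SPAN.sub(_merge, markup)
    return markup


if __name__ == "__main__":
    nested = (
        '<span style="font-family: Arial;">\n'
        '  <span style="font-size: 14px;">\n'
        '    <span style="color: #333333;">\n'
        '      <span style="font-weight: bold;">\n'
        "        One word.\n"
        "      </span>\n"
        "    </span>\n"
        "  </span>\n"
        "</span>"
    )
    print(collapse_spans(nested))
```

A span that contains text alongside the inner span is left untouched, because merging it would change which text the styles apply to.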

Output format options

Different destinations need different formats. Our HTML tools support several output modes:

Clean HTML - Semantic, minimal markup. Good for CMSs that accept HTML.
Plain text - All formatting stripped. Good for text fields, meta descriptions, and anywhere that does not render HTML.
Markdown - Clean markdown syntax. Good for static site generators, GitHub READMEs, and documentation platforms.

Pick the format that matches where the text is going. A Shopify product description needs different output than a markdown blog post.
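The difference between the three modes is easiest to see on a single line of input. The following sketch handles only bold markers and headings, and the function name convert_line and the mode strings are assumptions made for this example rather than options from our tools.

```python
import html
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def convert_line(line: str, mode: str) -> str:
    """Render one markdown line as 'markdown', 'plain', or 'html'."""
    if mode == "markdown":
        return line  # already in the target syntax

    heading = HEADING.match(line)
    body = heading.group(2) if heading else line

    if mode == "plain":
        return BOLD.sub(r"\1", body)  # drop the asterisks, keep the words

    if mode == "html":
        # Escape &, <, > first so the text cannot be read as markup.
        inner = BOLD.sub(r"<strong>\1</strong>", html.escape(body, quote=False))
        if heading:
            level = len(heading.group(1))
            return f"<h{level}>{inner}</h{level}>"
        return f"<p>{inner}</p>"

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("html", "plain", "markdown"):
        print(mode, "->", convert_line("## Care **guide**", mode))
```

A meta description field would take the plain result, while a CMS that accepts HTML would take the html one.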

The workflow

1. Copy text from ChatGPT, Claude, Gemini, Word, Docs, or Klaviyo
2. Paste it into our HTML tools
3. The tool auto-detects the format and shows you a preview
4. Choose your output format
5. Copy the clean result into your CMS

Everything runs in your browser. The text you paste never leaves your device. There is no server processing, no account required, and no limit on how much you can clean.

Pair it with AI detection

If you are cleaning AI-generated text for your website, you probably also want to check how detectable it is. Run it through Unwrite GPT to remove the linguistic fingerprints that mark it as AI-written, then use text comparison to see exactly what changed.

Format cleanup and AI humanisation are different problems, but they often show up together. Clean the formatting first, then address the voice.