How would one modify the filtering Gutenberg applies to pasted content?

Note: I’m adding information to this question as I discover anything that seems to lead toward a resolution.

The Problem:

When content is pasted into Gutenberg from an external source, some HTML/CSS formatting is lost.[1] While Gutenberg retains most semantic HTML elements, it drops CSS styling (non-semantic presentation, such as inline style attributes). This means that properties such as font size, text alignment, and text color are all removed during the paste event.

Not the Problem:

We could discuss the plugins, custom HTML blocks, etc. (e.g., Wordable, Jetpack) available for converting external content sources (e.g., Google Docs) to WP-friendly content, but this question is decidedly not about those solutions. Instead, this question is exclusively focused on how to programmatically alter Gutenberg’s paste handling behavior.

Seeing the Problem in Action

This problem occurs in many circumstances. For example, try pasting the following block of HTML into the paragraph block in Gutenberg:

<p style="color:red">Hello WordPress StackExchange!</p>

Then view the HTML for that paragraph block and you’ll see:

<p>Hello WordPress StackExchange!</p>

The style="color:red" has been stripped out.

Looking at the Paragraph Block

One of the blocks that suffers from this stripping is the paragraph block (/gutenberg/packages/block-library/src/paragraph). This block[2] uses the RichText component (/gutenberg/packages/block-editor/rich-text) to implement its rich text editing functionality.

Looking at the RichText Component

In /rich-text/index.js we find the onPaste method which the paragraph block inherits. This function in turn calls the pasteHandler function (/gutenberg/packages/blocks/src/api/raw-handling/paste-handler.js).

Looking at the Paste Handler

The pasteHandler function “Converts an HTML string to known blocks. Strips everything else.” according to the JSDoc.

This function takes five parameters:

  • HTML = The source content to convert if in HTML format
  • plainText = The source content to convert if in text format
  • mode = Whether to paste the content in as blocks or inline content in existing block.
  • tagName = What tag we are inserting the content into.
  • canUserUseUnfilteredHTML = Initially I thought this determined whether one could use any HTML/CSS one desired, but it appears to be more limited. AFAIK it only determines whether the iframeRemover function is run against the pasted content, which is only tangentially relevant.
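
To make the parameter list concrete, here is a minimal sketch of a call that passes all five options. In the editor, pasteHandler would come from `import { pasteHandler } from '@wordpress/blocks'`; here it is passed in as an argument so the snippet runs on its own. The wrapper name pasteParagraph, the stand-in recordingHandler, and the specific option values are assumptions for illustration only, not something taken from Gutenberg.

```javascript
// Hypothetical wrapper (not part of Gutenberg): forwards pasted content to
// whatever pasteHandler implementation it is given.
function pasteParagraph( pasteHandler, html, plainText ) {
  return pasteHandler( {
    HTML: html, // source content, if available as HTML
    plainText, // source content, if available as plain text
    mode: 'BLOCKS', // paste as blocks rather than inline content
    tagName: 'p', // the tag the content is being inserted into
    canUserUseUnfilteredHTML: false, // so iframeRemover would run
  } );
}

// Stand-in for the real pasteHandler so the call shape can be inspected.
const recordingHandler = ( options ) => options;

const options = pasteParagraph(
  recordingHandler,
  '<p style="color:red">Hello WordPress StackExchange!</p>',
  'Hello WordPress StackExchange!'
);
console.log( Object.keys( options ) );
```

Swapping recordingHandler for the real pasteHandler inside the editor should return the converted blocks instead of the options object.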

We can see that pasteHandler is imported (index.js):

import {
  pasteHandler,
  children as childrenSource,
  // ...
} from '@wordpress/blocks';

pasteHandler is then called from onPaste:

onPaste( { value, onChange, html, plainText, files } ) {
  // ...

  if ( files && files.length && ! html ) {
    const content = pasteHandler( {
      HTML: filePasteHandler( files ),
      mode: 'BLOCKS',
      // ...
    } );
    // ...
  }

  // ...

  const content = pasteHandler( {
    HTML: html,
    // ...
  } );

  // ...
}



For our purposes, we are interested in only a portion of the pasteHandler function:

const rawTransforms = getRawTransformations();
const phrasingContentSchema = getPhrasingContentSchema( 'paste' );
const blockContentSchema = getBlockContentSchema( rawTransforms, phrasingContentSchema, true );

const blocks = compact( flatMap( pieces, ( piece ) => {
    // ...

    if ( ! canUserUseUnfilteredHTML ) {
        // Should run before `figureContentReducer`.
        filters.unshift( iframeRemover );
    }

    const schema = {
        // ...
        // Keep top-level phrasing content, normalised by `normaliseBlocks`.
        // ...
    };

    piece = deepFilterHTML( piece, filters, blockContentSchema );
    piece = removeInvalidHTML( piece, schema );
    piece = normaliseBlocks( piece );
    piece = deepFilterHTML( piece, [
        // ...
    ], blockContentSchema );

    // ...

    return htmlToBlocks( { html: piece, rawTransforms } );
} ) );

Even here, most of what occurs is not relevant to our current issue. We don’t care, for example, about Google Docs UIDs being removed or Word lists being converted.
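
The ordering of the filter steps can be pictured with a toy model. The real filters operate on DOM nodes via deepFilterHTML; the version below works on plain strings and uses made-up filters (stripIframes, collapseSpaces) and made-up helpers (buildFilters, runFilters), so it only illustrates how `unshift` places one filter ahead of the others depending on canUserUseUnfilteredHTML.

```javascript
// Toy stand-ins for Gutenberg's node filters; these names are hypothetical.
const stripIframes = ( html ) => html.replace( /<iframe[^>]*>.*?<\/iframe>/gi, '' );
const collapseSpaces = ( html ) => html.replace( /\s+/g, ' ' ).trim();

function buildFilters( canUserUseUnfilteredHTML ) {
  const filters = [ collapseSpaces ];
  if ( ! canUserUseUnfilteredHTML ) {
    // Mirrors `filters.unshift( iframeRemover )`: runs before the rest.
    filters.unshift( stripIframes );
  }
  return filters;
}

// Apply each filter in order, feeding the output of one into the next.
function runFilters( piece, filters ) {
  return filters.reduce( ( current, filter ) => filter( current ), piece );
}

const pasted = '<p>Hello</p>  <iframe src="https://example.com"></iframe>';
const restricted = runFilters( pasted, buildFilters( false ) );
const unrestricted = runFilters( pasted, buildFilters( true ) );
console.log( restricted );
console.log( unrestricted );
```

In this model only the restricted run loses the iframe, which matches the reading that canUserUseUnfilteredHTML has nothing to do with inline styles.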

Instead we are interested in:

  • rawTransforms – contains the results of a call to getRawTransformations, also defined in paste-handler.js.

    • I don’t think this code is involved, but maybe someone can help me understand what it does? 🙂
  • phrasingContentSchema – contains the results of calling getPhrasingContentSchema, defined in phrasing-content.js.

    • This appears to remove a few invisible elements (u, abbr, data, etc.), which could be part of this problem, but the more likely issues folks will run into are with the CSS styles, not these elements.
  • blockContentSchema – contains the results of a call to getBlockContentSchema, defined in utils.js.

    • Again, I’m not entirely sure I understand what it does, but I don’t think it is involved.
  • phrasingContentReducer – one of the filters, defined in phrasing-content-reducer.js.

    • I’m unsure but I suspect this snippet may be involved:
if ( node.nodeName === 'SPAN' && ... ) {
  const {
    ...
  } = ...;
  ...
}
  • deepFilterHTML – defined in utils.js, essentially a wrapper for deepFilterNodeList, also found in utils.js.

    • Again, I’m not sure I understand this segment of code; it could be involved.
  • removeInvalidHTML – defined in utils.js, essentially a wrapper for cleanNodeList, also found in utils.js.

    • I believe this is involved; the cleanNodeList JSDoc states, “Given a schema, unwraps or removes nodes, attributes and classes on a node”.

You’ll notice several functions that did not make the list – after reviewing their code, I don’t believe they are involved in the current problem (e.g., normaliseBlocks, brRemover, emptyParagraphRemover, etc).
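
Since removeInvalidHTML and cleanNodeList look like the step that drops the style attribute, here is a simplified model of schema-based attribute cleaning. It is not Gutenberg’s implementation: nodes are plain objects rather than DOM nodes, the schema shape (a tag name mapped to an `attributes` allow-list) is an assumption based on the JSDoc quoted above, and cleanNode is a hypothetical helper. Unwrapping or removing disallowed nodes and classes is left out.

```javascript
// Hypothetical helper: keeps only the attributes that the schema allows
// for this tag, then cleans the children the same way.
function cleanNode( node, schema ) {
  const rule = schema[ node.tag ] || {};
  const allowed = rule.attributes || [];
  const attributes = {};
  for ( const name of Object.keys( node.attributes || {} ) ) {
    if ( allowed.includes( name ) ) {
      attributes[ name ] = node.attributes[ name ];
    }
  }
  const children = ( node.children || [] ).map( ( child ) =>
    typeof child === 'string' ? child : cleanNode( child, schema )
  );
  return { tag: node.tag, attributes, children };
}

// Plain-object stand-in for <p style="color:red">Hello WordPress StackExchange!</p>
const pastedParagraph = {
  tag: 'p',
  attributes: { style: 'color:red' },
  children: [ 'Hello WordPress StackExchange!' ],
};

// A schema with no allowed attributes drops `style`...
const strict = cleanNode( pastedParagraph, { p: { attributes: [] } } );
// ...while a schema that lists `style` keeps it.
const relaxed = cleanNode( pastedParagraph, { p: { attributes: [ 'style' ] } } );

console.log( strict.attributes );
console.log( relaxed.attributes );
```

If the real schema behaves similarly, the open question becomes where a custom schema entry could be supplied, which this sketch does not answer.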

The Conclusion

I just rewrote most of this question. I’ll try to refine it a bit later and, when I have a chance to look more closely, share what the specific snippets of code that I did not understand actually do. I’m hoping this may be helpful to others, or that someone may be able to explain what I am missing…or I can keep slogging away. 🙂

[1] Technically, this isn’t always true. Some blocks may accept most/all content pasted into them, for example the HTML block. But retaining pasted content is the exception, not the rule.

[2] You can find the reference in /paragraph/edit.js in the ParagraphBlock function.


In my blocks.js, located in wp-includes/js/dist/, I find the removeInvalidHTML function, which seems to be responsible for removing styles from pasted HTML.

/**
 * Given a schema, unwraps or removes nodes, attributes and classes on HTML.
 *
 * @param {string} HTML   The HTML to clean up.
 * @param {Object} schema Schema for the HTML.
 * @param {Object} inline Whether to clean for inline mode.
 * @return {string} The cleaned up HTML.
 */
function removeInvalidHTML(HTML, schema, inline) {
  var doc = document.implementation.createHTMLDocument('');
  doc.body.innerHTML = HTML;
  cleanNodeList(doc.body.childNodes, doc, schema, inline);
  return doc.body.innerHTML;
}

As you can see, this function runs cleanNodeList against the parsed HTML using the given schema and then returns only the resulting innerHTML. Modifying (creating a customized version of) this function would likely solve your problem.
