When does WordPress wrap inline scripts in CDATA?

I’m debugging a problem with one of our third-party scripts, which WordPress users embed by copying and pasting a snippet of script and HTML into their post bodies, like this (a contrived example, of course):

<script>
window.foobar = window.foobar || { hello: function(){ console.log('Hello World'); } };
window.foobar.hello();
</script>

I noticed that some WordPress installations wrap this in CDATA and some don’t (probably by doing some sort of DOCTYPE checking, although all themes I tested this on were using an HTML5 doctype).

When the script is wrapped in CDATA, however, users get bitten by the following bug: https://core.trac.wordpress.org/ticket/3670 (the closing > is incorrectly replaced by &gt;), which leads to the browser ignoring the script content:

<script>// <![CDATA[  window.foobar = window.foobar || { hello: function(){ console.log('Hello World'); } }; window.foobar.hello();  // ]]&gt;</script>

I don’t own too much WP-Fu myself and googling only led me to identifying the problem as is, so my question would be: when exactly does WordPress wrap inline scripts into CDATA sections? Can the user somehow prevent this behavior? Can the user somehow work around the above bug without modifying WP core?

Actually, it is not WordPress that inserts the CDATA tags but the visual editor, TinyMCE. The details of TinyMCE are off-topic here, but you can read a solution to this on Stack Overflow.

That said, stopping TinyMCE may not be the full solution you want. WordPress itself also has a function for adding CDATA tags, wxr_cdata, which is used when outputting a valid XML file, for instance if you want to export the content or use it in an RSS feed. Themes and/or plugins could decide to attach it as a filter to the content if they want the document to be valid XHTML.

This is where you run into the bug, which was first documented twelve years ago and remains unsolved. It comes down to these three lines in the_content():

$content = apply_filters( 'the_content', $content );
$content = str_replace( ']]>', ']]&gt;', $content );
echo $content;

As you can see, the str_replace is hardcoded, immediately followed by the echo. There’s no way to intercept this replacement.
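To illustrate why a the_content filter cannot undo the replacement, here is a small standalone simulation of the order of operations. It is only a sketch: foobar_simulate_the_content() is a hypothetical stand-in for the three lines above, not a WordPress function, and the plain callable stands in for apply_filters.

```php
<?php
// Simulation only: mimics the order of operations in the three lines above.
// foobar_simulate_the_content() is a hypothetical name, not part of WordPress.
function foobar_simulate_the_content( string $content, callable $filter ): string {
    $content = $filter( $content );                      // stands in for apply_filters( 'the_content', ... )
    $content = str_replace( ']]>', ']]&gt;', $content ); // the hard-coded replacement
    return $content;                                     // the real function echoes at this point
}

// A filter that tries to restore the CDATA terminator.
$restore = function ( string $html ): string {
    return str_replace( ']]&gt;', ']]>', $html );
};

// The filter runs first, so the replacement is applied after it has finished.
$out = foobar_simulate_the_content( '<script>// <![CDATA[ hello(); // ]]></script>', $restore );
```

Because the filter runs before the replacement, $out still ends in ]]&gt;</script> no matter what the filter returns.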

What you can do, however, if you control your theme, is buffer the_content and reverse the replacement. Like this:

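// Capture what the_content() would normally print.
ob_start();
the_content();
$content = ob_get_clean();
// Reverse the hard-coded replacement (note the trailing semicolon in the entity).
$content = str_replace( ']]&gt;', ']]>', $content );
echo $content;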
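If the template calls the_content() in several places, the same idea could be wrapped in a small helper. This is a minimal sketch: both function names are made up for illustration and are not part of WordPress, and the helper takes a generic callable so it can be tried outside a theme as well.

```php
<?php
// Hypothetical helper names; neither function is part of WordPress.

/**
 * Undo the "]]>" to "]]&gt;" replacement in already-rendered HTML.
 */
function foobar_restore_cdata_close( string $html ): string {
    return str_replace( ']]&gt;', ']]>', $html );
}

/**
 * Run a callback that echoes markup, capture its output, and return it
 * with the CDATA terminators restored.
 */
function foobar_capture_and_restore( callable $printer ): string {
    ob_start();
    $printer();
    return foobar_restore_cdata_close( (string) ob_get_clean() );
}
```

Inside a theme template this would be used as echo foobar_capture_and_restore( 'the_content' ); in place of the plain the_content() call. Keep in mind that the replacement is applied to the whole output, so a literal ]]&gt; that an author typed into the post text would be converted as well.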
