Why escape if the_content isn’t?

The built-in function the_content runs through several filters, but does not escape output. It would be difficult for it to do so, as HTML and even some scripts must be allowed through.

When outputting, the_content seems to run through these filters (as of 5.0):

add_filter( 'the_content', 'do_blocks', 9 );
add_filter( 'the_content', 'wptexturize' );
add_filter( 'the_content', 'convert_smilies', 20 );
add_filter( 'the_content', 'wpautop' );
add_filter( 'the_content', 'shortcode_unautop' );
add_filter( 'the_content', 'prepend_attachment' );
add_filter( 'the_content', 'wp_make_content_images_responsive' );

(and)

add_filter( 'the_content', 'capital_P_dangit' );
add_filter( 'the_content', 'do_shortcode' );

It also does a simple string replace:

$content = str_replace( ']]>', ']]&gt;', $content );

And then get_the_content does a tiny bit of processing related to the “more” link and a bug with foreign languages.

None of those prevent XSS script injection, right?

When saving, the data is sanitized through wp_kses_post. But as this is an expensive process, I understand why it’s not used on output.

The rule of thumb for WordPress escaping is that everything needs to be escaped, regardless of input sanitization, and as late as possible. I’ve read several articles saying this, because the database is not to be considered a trusted source.

But for the reasons above, the_content doesn’t follow that. Nor do the core themes (e.g. Twenty Nineteen) add additional escaping on output.

So…why is it helping anything to escape elsewhere? If I were a hacker with access to the database, wouldn’t I just add my code to a post’s content?

If I were a hacker with access to the database, wouldn’t I just add my
code to a post’s content?

If you’ve got access to the database, chances are that you’ve got enough access that escaping isn’t going to stop you. Escaping is not going to help you if you’ve been hacked. It’s not supposed to. There are other reasons to escape. The two main ones that I can think of are:

To deal with unsanitized input

WordPress post content is sanitized when it’s saved, but not everything else is. Content passed via a query string in the URL isn’t sanitized, for example. Neither is content in translation files, necessarily. Both those are sources of content that have nothing to do with the site being compromised. So translatable text and content pulled from the URL need to be escaped.
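As a minimal sketch of that first case, assume a search term arrives in a hypothetical `s` query-string parameter and is echoed back on the page. Inside WordPress you would call the real `esc_html()` (or `esc_html__()` for translatable strings). The stand-in defined below exists only so the snippet runs outside WordPress; it approximates the core function with `htmlspecialchars()` and is not its actual implementation. The helper name `example_search_heading` is made up for illustration.

```php
<?php
// Stand-in so the sketch runs outside WordPress. Inside WordPress the real
// esc_html() is already defined and this block is skipped.
if ( ! function_exists( 'esc_html' ) ) {
	function esc_html( $text ) {
		// Rough approximation of the core function, not its actual implementation.
		return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
	}
}

// Hypothetical helper: builds a heading from an untrusted search term.
function example_search_heading( $term ) {
	// Escape at the moment of output, not earlier.
	return '<h2>Results for: ' . esc_html( $term ) . '</h2>';
}

// Simulated value of $_GET['s'] taken from a crafted URL.
$term = '<script>alert(1)</script>';

echo example_search_heading( $term ), "\n";
```

The script tag ends up as inert text in the page rather than executable markup, and nobody needed database access to attempt the injection.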

To prevent users accidentally breaking markup

Escaping isn’t just for security. You also need it to prevent users accidentally breaking their site’s markup. For example, if a user placing quotes or > symbols in some content in your plugin would break the markup, then you should escape that output. You don’t want to be over-aggressive in sanitizing on input, because there are perfectly valid reasons a user might want to use those characters.
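To illustrate the markup-breaking case, here is a small sketch that assumes a plugin setting is printed into an input's `value` attribute. In WordPress the call would be the real `esc_attr()`. The fallback below only approximates it with `htmlspecialchars()` so the snippet can run on its own, and the helper name and field name are hypothetical.

```php
<?php
// Stand-in so the sketch runs outside WordPress. Inside WordPress the real
// esc_attr() is already defined and this block is skipped.
if ( ! function_exists( 'esc_attr' ) ) {
	function esc_attr( $text ) {
		// Rough approximation of the core function, not its actual implementation.
		return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
	}
}

// Hypothetical helper: renders a text input pre-filled with a saved setting.
function example_text_input( $value ) {
	return '<input type="text" name="example_label" value="' . esc_attr( $value ) . '">';
}

// A perfectly legitimate value that contains quotes and a > symbol.
$label = 'The "best" widgets > all others';

echo example_text_input( $label ), "\n";
```

Without the escaping call, the first double quote in the value would close the attribute early and the rest of the text would spill into the tag. With it, the user keeps their quotes and the markup stays intact.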


“Escaping isn’t only about protecting from bad guys. It’s just making
our software durable. Against random bad input, against malicious
input, or against bad weather.”

That’s from the WordPress VIP guidelines on escaping. They have a lot more to say on this matter, and you should give them a read.
