Why is esc_html() returning nothing given a string containing a high-bit character?

In PHP 5.2, filter_var() sanitizes text. In WP, esc_html() sanitizes text. The former works with a high-bit character in the text string, e.g. à, but the latter doesn’t. esc_html() seems to swallow any string containing a high-bit character entirely. Here’s the example, written as a simple WP plugin:

<?php
/*
Plugin Name: bugz tester
*/
class bugz_tester { 
    function __construct() {
        if ( ! is_admin() )
            return;

        add_action('admin_menu', array($this,'admin_page'));
    }

    function admin_page() { 
        add_options_page('Bugz tester', 'bugz', 'edit_posts', 'bugz_sheet', array($this,'test_page'));
    }


    function test_page() {    
        ?>
        <div class="wrap">
        <?php
        $ts="blah à blah";
        echo "original: " . $ts . "<br/>" ;
        echo  "PHP sanitized: " . $this->sanitize_txt( $ts ) . "<br/>" ;
        echo  "WP sanitized: " . esc_html( $ts ) . "<br/>";               
        die();
        ?>
        </div>
        <?php
    }

    function sanitize_txt ( $text ) {
        $san_text = filter_var($text, FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH | FILTER_FLAG_STRIP_LOW ) ;
        return $san_text;
    }   

}
new bugz_tester();
?>

Here’s the output:

original: blah � blah
PHP sanitized: blah à blah
WP sanitized:

I’m not obsessed with using esc_html(). But if I use filter_var() instead, the string vanishes when I add it to a WP custom field. Somehow WP sanitization is killing the string.

I’m mystified. Would be grateful for a clue.

2 Answers

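Perhaps because the à in your string is not valid UTF-8? If the plugin file is saved as ISO-8859-1 or Windows-1252, à is stored as the single byte 0xE0, which is not a valid UTF-8 sequence. The � in your "original" output line points the same way.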

Here’s what esc_html() does:

function esc_html( $text ) {
      $safe_text = wp_check_invalid_utf8( $text );
      $safe_text = _wp_specialchars( $safe_text, ENT_QUOTES );
      return apply_filters( 'esc_html', $safe_text, $text );
}

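When the blog charset is UTF-8, wp_check_invalid_utf8() returns an empty string for text that is not valid UTF-8 (unless it is asked to strip the bad bytes), which would match your empty output. If it’s not that, then the text is being altered by _wp_specialchars(), which encodes special characters and quotes and, by default, does not double-encode existing entities.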
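To check the encoding theory outside WordPress, here is a minimal sketch. It assumes the file was saved as ISO-8859-1 (a guess on my part) and that the mbstring extension is available. The helper names is_valid_utf8() and to_utf8() are made up for illustration and are not WordPress functions.

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress.

// "\xE0" is "à" as a single ISO-8859-1 byte, which is what a source file
// saved as Latin-1 would contain. On its own it is not valid UTF-8.
$latin1 = "blah \xE0 blah";

function is_valid_utf8( $text ) {
    // With the /u modifier PCRE validates the subject as UTF-8 first;
    // preg_match() returns false when the subject is not valid UTF-8.
    return false !== @preg_match( '//u', $text );
}

function to_utf8( $text, $from = 'ISO-8859-1' ) {
    // Assumption: any text that is not valid UTF-8 is in $from.
    // Requires the mbstring extension.
    return is_valid_utf8( $text ) ? $text : mb_convert_encoding( $text, 'UTF-8', $from );
}

$utf8 = to_utf8( $latin1 );

var_dump( is_valid_utf8( $latin1 ) ); // bool(false)
var_dump( is_valid_utf8( $utf8 ) );   // bool(true)

// Once the bytes are valid UTF-8, escaping keeps the character intact.
echo htmlspecialchars( $utf8, ENT_QUOTES, 'UTF-8' ), "\n"; // blah à blah
```

If the first check prints false for your string, re-saving the plugin file as UTF-8 (without a BOM) or converting the text before passing it to esc_html() should give you non-empty output.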

For reference:

1) esc_html() in source

2) _wp_specialchars() in source
