How to stop the HTML editor from adding <p> tags to shortcodes, images, etc.

I know this is covered in individual topics, but what I am looking for is a single function that stops the WordPress editor from wrapping <p> tags around shortcodes and images, and that also removes empty <p> tags. For example:

  • remove <p> tags that get wrapped around shortcodes, i.e., <p>[shortcode]</p>
  • remove empty <p></p> tags
  • remove <p> tags that contain only whitespace, e.g., <p> </p>
  • remove <p> tags that get wrapped around <img> tags, i.e., <p><img blabla /></p>

I have been using a few different functions that take $content and strip out the <p> tags via preg_replace(), but I cannot figure out how to get all four requirements to work in a single function.

It seems like too much overhead to run $content through four different functions before $content is displayed.

Also, can you recommend a good source for learning how to use all the arguments of preg_replace()?

3 Answers

1. Filtering the content

Here’s a one-function content filter to meet the above four requirements:

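// Priority 20 runs after wpautop() (default priority 10), which adds the <p> tags.
add_filter( 'the_content', 'strip_some_paragraphs', 20 );
/**
 * Removes whitespace-only paragraphs and unwraps paragraphs that
 * contain nothing but a single <img> tag or a single shortcode.
 */
function strip_some_paragraphs( $content ) {

    $content = preg_replace(
        '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/',
        '$3', // group 3 = the image or shortcode; empty for blank paragraphs
        $content
    );

    return $content;
}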
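
To try the pattern outside WordPress, here is a minimal sketch that applies the same regex to one sample string per requirement. The function name strip_some_paragraphs_demo and the sample strings are made up for illustration, and the add_filter() call is left out because it only exists inside WordPress.

```php
<?php
// Hypothetical standalone copy of the callback above (no WordPress needed).
function strip_some_paragraphs_demo( $content ) {
    return preg_replace(
        '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/',
        '$3',
        $content
    );
}

// Illustrative inputs: one per requirement, plus a normal paragraph.
$samples = array(
    '<p>[gallery id="1"]</p>',
    '<p></p>',
    '<p> </p>',
    '<p><img src="a.png" /></p>',
    '<p>Regular text stays wrapped.</p>',
);

foreach ( $samples as $sample ) {
    // Print each result so the before/after can be compared by eye.
    var_dump( strip_some_paragraphs_demo( $sample ) );
}
```

The first four samples should lose their <p> wrapper, while the last one should come back unchanged because it contains ordinary text.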

2. Resources for Regular Expressions

  • regular-expressions.info – Manual, Basic & Advanced Tutorials
  • regexpal.com – on the fly regex tester, very helpful
  • as for PHP’s regex functions, the manual remains the best resource

3. The RegEx in 1. explained

The actual regex at hand is <p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>. In the code, ' denotes the beginning and end of the string, as usual, and / is the pattern delimiter.

You’ve mentioned four cases in which you want to remove <p> tags. So first off, our pattern must start with one such tag <p> and end with its closing companion </p>. That goes for all four cases. Inside, we want to allow for several different options to be valid matches. We group those options in parentheses and use the pipe character | to separate them. | matches either side of it and can be strung together. You can think of it as “OR”.

Now for the options:

Let’s begin with whitespace. \s denotes the whitespace character class (spaces, tabs, and line breaks). We append the star quantifier *, giving [\s]*, to match zero or more characters of that class.
So now we match all empty paragraph tags. As a side effect, we have reduced the number of cases to three – zero or more takes care of both <p></p> and <p> </p>. Nice.
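
As a small sketch of just this first option, the snippet below uses a reduced pattern that handles only whitespace-only paragraphs. The variable names and the sample HTML are assumptions for illustration, not part of the filter above.

```php
<?php
// Illustrative pattern covering only the "empty paragraph" case.
$empty_paragraph = '/<p>[\s]*<\/p>/';

// Three whitespace-only paragraphs (empty, space, newline + tab) and one with text.
$html = "<p></p><p> </p><p>\n\t</p><p>text</p>";

// The whitespace-only paragraphs are removed; the one with text survives.
$result = preg_replace( $empty_paragraph, '', $html );
echo $result, PHP_EOL;
```

Note that \s only matches actual whitespace characters. A paragraph containing the literal entity &nbsp; is ordinary text as far as the regex is concerned, so this pattern leaves it alone.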

As for the other two, we will wrap both in further [\s]*, so that not only <p>[shortcode]</p>, but also for instance

<p>
    [shortcode] </p>

is matched.
What we have left to do now is come up with patterns to match shortcodes and img tags. Here we make use of character class negation. The caret ^ at the beginning of a character class negates it. Hence, [^>] matches any character that is not >.
We start the pattern for the images with an opening tag <img and for the shortcode with a square bracket \[. The latter must be escaped with a backslash, since it is a regex special character.
Now we use the above mentioned negated character class with the star quantifier. [^>]* for the img and [^\]]* for the shortcode, matching anything but the respective closing character. Then we match that very closing character once and are done.

So we get <img[^>]*> for the images and \[[^\]]*\] for the shortcode.
We wrap those in optional whitespace: [\s]*<img[^>]*>[\s]* and [\s]*\[[^\]]*\][\s]*
Grouping those two, and adding the whitespace-only pattern as the first option, yields the inside of the parentheses, which we finally wrap in the paragraph tags.
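
The assembly described above can be sketched by building the pattern from its pieces. The variable names ($ws, $img, $shortcode, $inner) are illustrative only; the resulting string should be identical to the regex used in the filter.

```php
<?php
// Sub-patterns from the walkthrough (variable names are illustrative).
$ws        = '[\s]*';       // zero or more whitespace characters
$img       = '<img[^>]*>';  // an <img ...> tag: anything but ">" up to the closing ">"
$shortcode = '\[[^\]]*\]';  // a [shortcode ...]: anything but "]" up to the closing "]"

// Option 1: whitespace only. Option 2: an image or shortcode padded by whitespace.
$inner   = '(' . $ws . ')|' . $ws . '(' . $img . '|' . $shortcode . ')' . $ws;
$pattern = '/<p>(' . $inner . ')<\/p>/';

// The multi-line paragraph from the explanation above.
$multiline = "<p>\n    [shortcode] </p>";
$unwrapped = preg_replace( $pattern, '$3', $multiline );

echo $pattern, PHP_EOL;
echo $unwrapped, PHP_EOL;
```

Building the pattern this way is only meant to make the structure visible; in the filter itself a single literal string is simpler.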

For the replacement we use the backreference $3, which makes sure the actual image and shortcode tags do not disappear. So that the surrounding whitespace does not remain, we’ve made two subgroups of the possible options: group 2 captures the whitespace-only option and group 3 captures the image or shortcode. Only the latter is targeted by the backreference.
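
To see why $3 is used rather than $1, this sketch inspects the capture groups for a padded shortcode. The input string and variable names are made up for illustration.

```php
<?php
$pattern = '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/';
$input   = '<p> [shortcode] </p>';

preg_match( $pattern, $input, $groups );

// $groups[1] is the whole inner alternative, padding included.
// $groups[3] is only the image or shortcode itself.
var_dump( $groups[1], $groups[3] );

// Replacing with $1 keeps the surrounding spaces, $3 drops them.
$with_group_1 = preg_replace( $pattern, '$1', $input );
$with_group_3 = preg_replace( $pattern, '$3', $input );
var_dump( $with_group_1, $with_group_3 );
```

For a whitespace-only paragraph, group 3 does not take part in the match, so $3 expands to an empty string and the paragraph vanishes entirely.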

4. Sidenote

This question is borderline out of scope for WPSE – it’s mostly about PHP and regular expressions. It might have been better asked on Stack Overflow.
Anyhow, now it’s answered.
