I came across a weird issue.

Say you access a random URL three or more levels deep:

http://example.com/a/b/c
http://example.com/a/b/c/d
...

Then is_404() is true. So far so good. But for some reason the latest posts are queried.

The value of $wp_query->request is:

SELECT SQL_CALC_FOUND_ROWS wp_posts.ID 
    FROM wp_posts 
    WHERE 1=1 
        AND wp_posts.post_type="post" 
        AND (
            wp_posts.post_status="publish" 
            OR wp_posts.post_status="private"
            ) 
    ORDER BY wp_posts.post_date DESC 
    LIMIT 0, 5

This of course makes have_posts() return true, and so on. Can someone explain this?

What I found out so far:

This only kicks in at three or more levels deep because at shallower levels WP looks for posts and attachments, which somehow results in different behaviour.

It seems that even though WP recognises the request as a 404 at one point, it then fetches the most recent posts. With help from @kaiser and @G.M., I've tracked this down to somewhere from /wp-includes/class-wp.php:608 onward.


You may be surprised, but there is nothing strange there.

First of all, let's clarify that in WordPress, when you visit a frontend URL, you trigger a query. Always.

That query is just a standard WP_Query, just like the ones run via:

$query = new WP_Query( $args );

There is only one difference: the $args are generated by WordPress using the WP::parse_request() method. That method just looks at the URL and at the rewrite rules, and converts the URL into an array of arguments.

But what happens when that method can't do that because the URL is invalid? The query args are just an array like this:

array( 'error' => '404' );

(Source here and here).

So that array is passed to WP_Query.

Now try to do:

$query = new WP_Query( array( 'error' => '404' ) );
var_dump( $query->request );

Are you surprised that the query is exactly the one in the OP? I'm not.

So,

  1. parse_request() builds an array with an error key
  2. That array is passed to WP_Query, which just runs it
  3. handle_404(), which runs after the query, looks at the 'error' parameter and sets is_404() to true
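To watch these three steps happen on a live site, a small debugging hook can log each stage. This is a sketch that assumes a WordPress install with error logging enabled (for example WP_DEBUG_LOG); it hooks the 'wp' action, which WordPress fires at the end of WP::main(), after the main query and handle_404() have run, and which receives the WP object as its argument. Put it in a plugin or a theme's functions.php, visit one of the deep URLs, and check the log.

```php
<?php
/*
 * Debugging sketch: assumes WordPress with error logging enabled (e.g. WP_DEBUG_LOG).
 * The 'wp' action fires at the end of WP::main(), after the main query
 * and handle_404() have run, and passes the WP object to the callback.
 */
add_action( 'wp', function ( $wp ) {
    global $wp_query;

    // Step 1: the args WP::parse_request() built from the URL,
    // e.g. array( 'error' => '404' ) when no rewrite rule matched.
    error_log( 'query_vars: ' . print_r( $wp->query_vars, true ) );

    // Step 2: the SQL that WP_Query actually ran with those args.
    error_log( 'request: ' . $wp_query->request );

    // Step 3: the 404 flag, plus how many posts the query still returned.
    error_log( 'is_404: ' . var_export( $wp_query->is_404(), true ) );
    error_log( 'post_count: ' . $wp_query->post_count );
} );
```

On an unmatched URL you would expect the 'error' key, SQL like the one in the question, is_404 as true and a non-zero post_count, though the exact output depends on your WordPress version and content.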

So have_posts() and is_404() are not related. The problem is that WP_Query has no mechanism to short-circuit the query when something goes wrong: once the object is built and given some args, the query runs.

Edit:

There are 2 ways to overcome this problem:

  • Create a 404.php template: WordPress loads it for 404 URLs, and there you don't have to check have_posts()
  • Force $wp_query to be empty on 404, something like:

    add_action( 'wp', function() {
        global $wp_query;
        if ( $wp_query->is_404() ) {
            $wp_query->init();
            $wp_query->is_404 = true; // init() resets is_404 as well, so restore it
        }
    } );
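For the first option, a 404.php can be very small. In this sketch the markup, the wording and the 'textdomain' text domain are placeholders; get_header(), get_footer(), get_search_form() and esc_html_e() are standard WordPress template functions. Because the template hierarchy only picks this file when is_404() is true, it never needs to call have_posts().

```php
<?php
/*
 * 404.php: minimal sketch of a theme 404 template.
 * 'textdomain' is a placeholder; use your theme's own text domain.
 * No have_posts() loop here, so the posts fetched by the main query are ignored.
 */
get_header(); ?>

<main>
    <h1><?php esc_html_e( 'Page not found', 'textdomain' ); ?></h1>
    <p><?php esc_html_e( 'Nothing was found at this address. Try a search?', 'textdomain' ); ?></p>
    <?php get_search_form(); ?>
</main>

<?php get_footer();
```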
    
