Make Google index the entire post if it is separated into several pages

While this is a question, it is also a serious warning to publishers who split their posts using <!--nextpage-->.

Bear with me. I run a high-traffic, multi-author platform that ranks very high on search engines. While making SEO-related improvements, I noticed that Google indexes only the first page of an article that is split into several pages. How did I notice this?

Because someone grabbed the content of the unindexed pages of an article and posted it on their own site. As a result, they ranked higher than me for that particular keyword. Actually, let me rephrase that: my site's article is not even indexed beyond the first page.

My question is: how can I make search engines index the whole article if it is split into several pages? Ideally, I would like a script that displays the full article if the visitor is a search engine (provided that would not be harmful for SEO). Other suggestions are welcome.

On a side note, if I were into black-hat techniques, there would be a goldmine of content out there that I could exploit. Think about it.

And on a personal note, this issue should be addressed in a future update of WordPress.

Update: A similar question, which detailed how WordPress creates the same canonical URL for all pages in a paginated sequence, was asked here. However, the answer that was posted and marked as correct does not answer the question: that solution works for paged comments only, not paginated posts.

Update 2: According to this blog post by Google, we can use rel="next" and rel="prev" to indicate the relationship between pages. For instance, this is what we should include in <head> on page 2:

<link rel="canonical" href="http://domain.com/article/2" />
<link rel="prev" href="http://domain.com/article/1" />
<link rel="next" href="http://domain.com/article/3" />

And there should not be a rel="prev" or rel="next" if there is no page before or after it.
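
As a rough illustration of that rule, here is a minimal standalone PHP sketch (no WordPress required). The function name sketch_pagination_links and the "/article/N" URL scheme are assumptions made for illustration only; they mirror the example URLs above and are not part of any existing API.

```php
<?php
/**
 * Hypothetical helper: build the <head> link tags for one page of a
 * paginated article. Assumes sub-pages live at "$base_url/N".
 *
 * @param  string $base_url URL of the article without a page number
 * @param  int    $page     current page, 1-based
 * @param  int    $numpages total number of pages
 * @return string[]
 */
function sketch_pagination_links( $base_url, $page, $numpages )
{
    # build and escape the URL of page $n
    $url = function ( $n ) use ( $base_url ) {
        return htmlspecialchars( rtrim( $base_url, '/' ) . '/' . $n, ENT_QUOTES );
    };

    # every page points to itself as canonical
    $links = array( '<link rel="canonical" href="' . $url( $page ) . '" />' );

    # no rel="prev" on the first page
    if ( 1 < $page )
        $links[] = '<link rel="prev" href="' . $url( $page - 1 ) . '" />';

    # no rel="next" on the last page
    if ( $page < $numpages )
        $links[] = '<link rel="next" href="' . $url( $page + 1 ) . '" />';

    return $links;
}

# page 2 of 3: canonical, prev and next are all present
echo implode( "\n", sketch_pagination_links( 'http://domain.com/article', 2, 3 ) ), "\n";
```

Calling it with page 1 or with the last page should return only two tags, because the missing neighbour is skipped.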


The basic problem for a script solution is: rel_canonical does not offer any useful filter. So we have to replace that function:

remove_action( 'wp_head', 'rel_canonical' );
add_action( 'wp_head', 't5_canonical_subpages' );

The next problem: $GLOBALS['numpages'] is empty before setup_postdata() has run. We could call that function here already, but it might have side effects.
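
To see why the page count is only known once the post content has been parsed, here is a simplified standalone sketch that derives a page count from the <!--nextpage--> marker. The function name sketch_count_pages is hypothetical, and this is not WordPress's actual implementation, which handles further details that this sketch ignores.

```php
<?php
/**
 * Hypothetical helper: count the pages of a post by splitting its
 * content at the <!--nextpage--> marker.
 *
 * @param  string $content raw post content
 * @return int
 */
function sketch_count_pages( $content )
{
    # explode() always returns at least one element,
    # so content without a marker counts as one page
    return count( explode( '<!--nextpage-->', $content ) );
}

$content = 'Intro<!--nextpage-->Middle<!--nextpage-->End';
echo sketch_count_pages( $content ), "\n"; # prints 3
```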

Here is a solution that also adds correct prev/next links and prevents conflicting relations caused by adjacent_posts_rel_link_wp_head. We have to hook into wp_head no later than priority 9 to deactivate the latter hook, because it runs at the default priority 10.

remove_action( 'wp_head', 'rel_canonical' );
add_action(    'wp_head', 't5_canonical_subpages', 9 );

/**
 * Extended version of the native function rel_canonical()
 *
 * @wp-hook wp_head
 * @return  void
 */
function t5_canonical_subpages()
{
    if ( ! is_singular() )
        return;

    if ( ! $id = $GLOBALS['wp_the_query']->get_queried_object_id() )
        return;

    $post = get_post( $id );
    setup_postdata( $post );

    # let WordPress do all the work
    if ( empty ( $GLOBALS['page'] ) )
        return rel_canonical();

    $permalink = get_permalink( $id );
    $canonical = t5_page_permalink( $permalink, $GLOBALS['page'] );
    echo "<link rel='canonical' href='$canonical' />";

    # next and prev links
    if ( 1 < $GLOBALS['page'] )
    {
        $prev = t5_page_permalink( $permalink, $GLOBALS['page'] - 1 );
        print "<link rel='prev' href='$prev' />";
    }

    if ( isset ( $GLOBALS['numpages'] ) && $GLOBALS['page'] < $GLOBALS['numpages'] )
    {
        $next = t5_page_permalink( $permalink, $GLOBALS['page'] + 1 );
        print "<link rel='next' href='$next' />";
    }

    # avoid conflicting prev/next links
    remove_action( 'wp_head', 'adjacent_posts_rel_link_wp_head' );
}

/**
 * Helper to get correct permalinks for sub-pages.
 *
 * @param  string $permalink
 * @param  int    $page
 * @return string
 */
function t5_page_permalink( $permalink, $page )
{
    if ( 1 == $page )
        return $permalink;

    # no pretty permalinks
    if ( '' === get_option( 'permalink_structure' ) )
        return add_query_arg( 'page', $page, $permalink );

    # make sure there is exactly one slash between permalink and page number
    return trailingslashit( $permalink ) . user_trailingslashit( $page, 'single_paged' );
}
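
To make the two URL shapes the helper deals with more concrete, here is a standalone approximation that runs without WordPress. The function sketch_subpage_url and its $pretty flag are hypothetical stand-ins for the permalink_structure check, and the sketch assumes that pretty permalinks end with a trailing slash; real sites may be configured differently.

```php
<?php
/**
 * Hypothetical stand-in for the helper above, without WordPress functions.
 *
 * @param  string $permalink URL of the first page
 * @param  int    $page      page number, 1-based
 * @param  bool   $pretty    whether pretty permalinks are in use (assumption)
 * @return string
 */
function sketch_subpage_url( $permalink, $page, $pretty )
{
    # the first page is the plain permalink
    if ( 1 == $page )
        return $permalink;

    # no pretty permalinks: append a "page" query argument
    if ( ! $pretty )
    {
        $glue = ( false === strpos( $permalink, '?' ) ) ? '?' : '&';
        return $permalink . $glue . 'page=' . (int) $page;
    }

    # pretty permalinks: append the page number as a path segment
    return rtrim( $permalink, '/' ) . '/' . (int) $page . '/';
}

echo sketch_subpage_url( 'http://domain.com/article/', 2, true ), "\n";
# http://domain.com/article/2/
echo sketch_subpage_url( 'http://domain.com/?p=42', 2, false ), "\n";
# http://domain.com/?p=42&page=2
```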
