Resource Issues: Importing external data into a custom post type and keeping records up to date

I have a custom post type that imports external data nightly. When the import script runs for the first time, it imports approximately 4,000 records (the data set will grow). Each record consists of a post title, a body, and 15 additional meta fields.

After the import has run at least once, every subsequent night it updates at least one meta field on each record to indicate that the record still exists in the external data set and is up to date.

The external data contains a modified-date field that indicates when a record has changed outside of WordPress. If the import script detects that a record has changed, it updates all of its fields.

At the very end of the import, an internal script looks at all records and finds the ones that weren't updated. If a record wasn't updated, a delete flag is set and its published status is changed; eventually that record is deleted.

Lastly, each record can be overridden in WordPress, which sets a flag so that the post is not updated during an import. This was one of the reasons we decided to place the records in a custom post type: it gives us control over the data before a record is displayed on the site.

I'm using wp_insert_post to insert and update records, update_post_meta to insert and update meta fields, and a WP_Query to fetch all current and expired records; finally, I delete the flagged records with wp_delete_post and delete_post_meta.
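In rough outline, the loop currently looks like this (the post type name, meta keys, and `$external_records` are placeholders, not my real names):

```php
// Rough sketch of the current (unbatched) sync loop. The post type
// 'external_record' and the '_last_synced' meta key are placeholders.
foreach ( $external_records as $record ) {
    $post_id = wp_insert_post( array(
        'ID'           => $record['existing_post_id'], // 0 inserts a new post, non-zero updates
        'post_type'    => 'external_record',
        'post_title'   => $record['title'],
        'post_content' => $record['body'],
        'post_status'  => 'publish',
    ) );

    // One round trip per meta field: 15+ queries per record.
    foreach ( $record['meta'] as $key => $value ) {
        update_post_meta( $post_id, $key, $value );
    }
    update_post_meta( $post_id, '_last_synced', time() );
}
```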

The problem is that this script is too resource-intensive: eventually the request times out and the import never completes.

I figure I have two options:

  1. I break this script into batches. For example, I could process 1,000 records at a time and store an index so the script knows where to pick up on the next run (see the sketch after this list).
  2. I could try using different functions. For example, during the initial import I call update_post_meta once per field (15 times per record); I could probably combine those into a single custom query.
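
Something like this is what I have in mind for option #1, assuming the import is kicked off by WP-Cron; the option name, batch size, and hook name are placeholders:

```php
// Sketch of option #1: process the records in batches and persist an offset
// between runs so the next run picks up where the last one stopped.
function run_import_batch( array $external_records, $batch_size = 1000 ) {
    $offset = (int) get_option( 'my_import_offset', 0 );

    foreach ( array_slice( $external_records, $offset, $batch_size ) as $record ) {
        // ... wp_insert_post() / update_post_meta() as in the current script ...
    }

    if ( $offset + $batch_size >= count( $external_records ) ) {
        delete_option( 'my_import_offset' ); // finished; start from 0 next night
    } else {
        update_option( 'my_import_offset', $offset + $batch_size );
        // Queue the next batch shortly, rather than looping within one request.
        wp_schedule_single_event( time() + 60, 'my_import_batch_event' );
    }
}
```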

My question is whether anyone has run into resource issues in WordPress when synchronizing data with an external source, and how they mitigated those issues. Was it via option #1, option #2, or an entirely different method?

1 Answer

One big help I have found when importing or updating large quantities of posts is to use InnoDB tables and transactions, in batches of 200-500 posts (with more postmeta per post, use smaller batches). This greatly improves performance, since much of the work consists of individual MySQL statements, each of which must hit the database transaction log at some point. So (a sketch follows the list):

  • issue a $wpdb->query('START TRANSACTION');
  • after each batch of 200-500 posts, issue a $wpdb->query('COMMIT'); followed by another $wpdb->query('START TRANSACTION');
  • on completion, issue a $wpdb->query('COMMIT');
  • on failure / exception, issue a $wpdb->query('ROLLBACK');
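
A minimal sketch of that pattern, assuming your tables are InnoDB (on MyISAM these statements are silently ignored); import_one_record() is a placeholder for your existing insert/update logic:

```php
global $wpdb;

$batch_size = 300; // within the suggested 200-500 range; lower it with heavy postmeta
$count      = 0;

$wpdb->query( 'START TRANSACTION' );
try {
    foreach ( $external_records as $record ) {
        import_one_record( $record ); // placeholder for your wp_insert_post()/meta logic

        // Commit every $batch_size posts and open a fresh transaction.
        if ( 0 === ++$count % $batch_size ) {
            $wpdb->query( 'COMMIT' );
            $wpdb->query( 'START TRANSACTION' );
        }
    }
    $wpdb->query( 'COMMIT' ); // commit the final partial batch
} catch ( Exception $e ) {
    $wpdb->query( 'ROLLBACK' ); // abandon the current batch on failure
    throw $e;
}
```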

After doing that, look at your option #2 for the postmeta: build a single SQL statement that inserts (or updates) all postmeta for a post at once, because that will also make a big difference once you have transactions.
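
For example, something along these lines for freshly inserted posts (insert_all_postmeta() is a made-up name; note that writing to the table directly bypasses WordPress's meta cache, so the sketch clears it afterwards):

```php
function insert_all_postmeta( $post_id, array $meta ) {
    global $wpdb;

    $placeholders = array();
    $values       = array();

    foreach ( $meta as $key => $value ) {
        $placeholders[] = '(%d, %s, %s)';
        $values[]       = $post_id;
        $values[]       = $key;
        $values[]       = maybe_serialize( $value );
    }

    // One multi-row INSERT instead of 15 separate update_post_meta() calls.
    $sql = "INSERT INTO {$wpdb->postmeta} (post_id, meta_key, meta_value) VALUES "
         . implode( ', ', $placeholders );

    $wpdb->query( $wpdb->prepare( $sql, $values ) );

    // The meta cache may now hold stale data for this post; clear it.
    wp_cache_delete( $post_id, 'post_meta' );
}
```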
