Let’s assume that there is a plugin that displays 20 related posts (for each post) using a very complex query, and then builds a complex HTML layout from the data that query returns. It should also be noted that the plugin is public and can be installed on any server with any configuration.
Something like:
/* complex and large query */
$related_posts = get_posts( ... );
$html_output = '';
foreach ( $related_posts as $key => $item ) {
    /* complex layout rendering logic (but not as slow as the previous query) */
    $html_output .= ...;
}
So my questions are:
- What’s the safest and the most correct way to cache such data?
- Should I use Transient API to cache the $related_posts array, or the $html_output string? If I cache the $html_output string, will it reach some max-size limit? Should I maybe gzip it before saving?
- Should I use Transient API at all here?
Should I use Transient API at all here?
No.
In a stock WordPress install, transients are stored in the wp_options table, and historically were only cleaned up during core upgrades (WordPress 4.9 and later also delete expired transients through a daily cron event). Suppose you have 50,000 posts: that’s at least 50,000 additional rows in the options table. Obviously they’re set to autoload=no, so they’re not going to consume all your memory, but there’s another caveat.
The autoload field in the options table does not have an index, which means that the call to wp_load_alloptions() is going to perform a full table scan. The more rows you have, the longer it will take. The more often you write to the options table, the less efficient MySQL’s internal caches are.
If the cached data is directly related to a post, you’re better off storing it in post meta. This will also save you a query every time you need to display the cached content, because post meta caches are (usually) primed during the post retrieval in WP_Query.
Your data structure for the meta value can vary. For example, you can store a timestamp alongside the value and rerun your expensive query only if the cached value is outdated, much like a transient would behave.
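To make the timestamp idea concrete, here is a minimal sketch. The meta key, the TTL constant and myplugin_run_expensive_query() are hypothetical names I made up for illustration; only get_post_meta() and update_post_meta() are actual WordPress functions.

```php
<?php
// Hypothetical names: the meta key, the TTL and myplugin_run_expensive_query()
// are assumptions for illustration, not part of any real plugin.
const MYPLUGIN_META_KEY = '_myplugin_related_cache';
const MYPLUGIN_TTL      = 86400; // 24 hours, in seconds.

/**
 * Pure check: is the cached entry present and younger than the TTL?
 */
function myplugin_cache_is_fresh( $entry, $now, $ttl ) {
    return is_array( $entry )
        && isset( $entry['time'], $entry['data'] )
        && ( $now - (int) $entry['time'] ) < $ttl;
}

/**
 * Return the cached related data for a post, rebuilding it when stale.
 */
function myplugin_get_related( $post_id ) {
    // Returns an empty string when the meta key does not exist yet.
    $entry = get_post_meta( $post_id, MYPLUGIN_META_KEY, true );

    if ( myplugin_cache_is_fresh( $entry, time(), MYPLUGIN_TTL ) ) {
        return $entry['data'];
    }

    // Placeholder for the "complex and large query" from the question.
    $data = myplugin_run_expensive_query( $post_id );

    // Arrays are serialized by WordPress when stored as meta.
    update_post_meta( $post_id, MYPLUGIN_META_KEY, array(
        'time' => time(),
        'data' => $data,
    ) );

    return $data;
}
```

Keeping the freshness check separate from the WordPress calls makes it easy to unit test without loading WordPress.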
One other important thing to keep in mind is that WordPress transients can be volatile in environments with persistent object caching. This means that if you store your cached data for 24 hours in a transient, there’s absolutely no guarantee it will be available in 23 hours, or 12, or even 5 minutes. The object cache backend for many installs is an in-memory key-value store such as Redis or Memcached, and if there’s not enough allocated memory to fit newer objects, older items will be evicted. Post meta lives in the database and is not subject to that eviction, which is a huge win for the meta storage approach.
Invalidation can be smarter too, i.e. why are you invalidating related posts caches in X hours? Is it because some content has changed? A new post has been added? A new tag has been assigned? Depending on your “complex and large query” you may choose to invalidate ONLY if something happened that is going to alter the results of your query.
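As a rough sketch of event-based invalidation, assuming the related posts depend on shared tags or categories and the cache sits in a hypothetical '_myplugin_related_cache' meta key (both are my assumptions, your query may depend on other things):

```php
<?php
/**
 * Pure helper: does a change in this taxonomy alter the related-posts query?
 * Assumption: the query only looks at tags and categories.
 */
function myplugin_taxonomy_affects_related( $taxonomy ) {
    return in_array( $taxonomy, array( 'post_tag', 'category' ), true );
}

/**
 * Drop the cached value for one post (hypothetical meta key).
 */
function myplugin_flush_related_cache( $post_id ) {
    delete_post_meta( $post_id, '_myplugin_related_cache' );
}

/**
 * Runs on the core 'set_object_terms' action whenever terms are assigned.
 */
function myplugin_on_terms_set( $object_id, $terms, $tt_ids, $taxonomy ) {
    if ( myplugin_taxonomy_affects_related( $taxonomy ) ) {
        myplugin_flush_related_cache( $object_id );
    }
}

// The guard only exists so the file can be loaded outside WordPress,
// for example in a unit test.
if ( function_exists( 'add_action' ) ) {
    add_action( 'set_object_terms', 'myplugin_on_terms_set', 10, 4 );
}
```

Note that this only flushes the post whose terms changed. Other posts that list it as related may need flushing too; delete_post_meta_by_key() is the blunt way to clear every post at once.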
Should I use Transient API to cache the $related_posts array, or the $html_output string? If I cache the $html_output string, will it reach some max-size limit? Should I maybe gzip it before saving?
It depends a lot on the size of your string, since that’s the data that’s going to be flowing between PHP, MySQL, etc. You’ll need to try very hard to reach MySQL’s limits, but Memcached’s default per-object limit, for example, is only 1 MB.
How long does your “complex layout rendering logic” actually take? Run it through a profiler to find out. Chances are that it’s very fast and will never become a bottleneck.
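If you don’t have a profiler at hand, a crude timing helper gives a first impression. This is only a sketch, not a replacement for a real profiler; myplugin_time_callable() is a name I made up, and hrtime() requires PHP 7.3 or later.

```php
<?php
/**
 * Average wall-clock time of a callable, in milliseconds.
 * Hypothetical helper for quick measurements only.
 */
function myplugin_time_callable( callable $fn, $runs = 10 ) {
    $start = hrtime( true ); // Monotonic clock, in nanoseconds.

    for ( $i = 0; $i < $runs; $i++ ) {
        $fn();
    }

    // Nanoseconds per run, converted to milliseconds.
    return ( hrtime( true ) - $start ) / $runs / 1e6;
}
```

Wrap the rendering loop in a closure, pass it in, and compare the result with the same measurement for the query.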
If that’s the case, I would suggest caching the post IDs. Not the WP_Post objects, because those will contain the full post contents, but just an array of post IDs. Then just use a WP_Query with a post__in argument, which will result in a very fast MySQL query by primary key.
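Something along these lines, as a sketch. The function names are hypothetical; the WP_Query arguments (post__in, orderby, no_found_rows and so on) are real ones, but which of them you need depends on your plugin.

```php
<?php
/**
 * Pure helper: reduce a list of posts (objects or IDs) to unique integer IDs,
 * which is the only thing worth caching here.
 */
function myplugin_extract_ids( array $posts ) {
    $ids = array();
    foreach ( $posts as $post ) {
        $ids[] = is_object( $post ) ? (int) $post->ID : (int) $post;
    }
    // Drop zeros and duplicates, then reindex.
    return array_values( array_unique( array_filter( $ids ) ) );
}

/**
 * Turn the cached IDs back into posts with a primary-key lookup.
 */
function myplugin_query_cached_ids( array $ids ) {
    if ( empty( $ids ) ) {
        // An empty post__in does not restrict the query, so bail out early.
        return array();
    }

    $query = new WP_Query( array(
        'post_type'           => 'any',      // Assumption: related items may be any type.
        'post__in'            => $ids,
        'orderby'             => 'post__in', // Keep the cached order.
        'posts_per_page'      => count( $ids ),
        'ignore_sticky_posts' => true,
        'no_found_rows'       => true,       // Skip the pagination count query.
    ) );

    return $query->posts;
}
```

If the expensive query itself can run with 'fields' => 'ids', you get the array of IDs directly and don’t need the extraction step.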
That said, if the data needed per item is fairly simple, perhaps the title, thumbnail URL and permalink, then you can store just those three, without the overhead of an extra round-trip to MySQL, and without the overhead of caching very long HTML strings.
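A sketch of that last variant, assuming those three fields are all the layout needs. The function names and the markup are made up; in a real plugin you would probably escape with esc_url() and esc_html() rather than htmlspecialchars(), which I use here only so the rendering part runs without WordPress.

```php
<?php
/**
 * Collect the small per-item payload to cache instead of HTML or WP_Post objects.
 */
function myplugin_collect_items( array $post_ids ) {
    $items = array();
    foreach ( $post_ids as $id ) {
        $items[] = array(
            'title'     => get_the_title( $id ),
            // Returns false when there is no thumbnail; cast to an empty string.
            'thumb'     => (string) get_the_post_thumbnail_url( $id, 'thumbnail' ),
            'permalink' => get_permalink( $id ),
        );
    }
    return $items;
}

/**
 * Pure helper: render the cached items. Cheap enough to run on every request.
 */
function myplugin_render_items( array $items ) {
    $html = '<ul class="related-posts">';
    foreach ( $items as $item ) {
        $html .= sprintf(
            '<li><a href="%s"><img src="%s" alt="">%s</a></li>',
            htmlspecialchars( $item['permalink'], ENT_QUOTES, 'UTF-8', false ),
            htmlspecialchars( $item['thumb'], ENT_QUOTES, 'UTF-8', false ),
            // double_encode is off because titles may already contain entities.
            htmlspecialchars( $item['title'], ENT_QUOTES, 'UTF-8', false )
        );
    }
    return $html . '</ul>';
}
```

The cached array stays small and backend-friendly, and the markup can change without invalidating anything.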
Wow that’s a lot of words, hope that helps.