I’m trying to write a oneboxing routine that gives WordPress blog entries special treatment. So, given a plain, unadorned URL in content, such as
http://blog.stackoverflow.com/2011/03/a-new-name-for-stack-overflow-with-surprise-ending/
How would I detect that this is a WordPress installation, ideally without doing a full HTTP GET on every URL I see?
There are certainly common WordPress URL conventions we could start with, which would eliminate at least some URLs from contention. In this case the pattern is …
http://example.com/year/month/slug-goes-here
But that isn’t a universal constant either.
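As a cheap first-pass filter that needs no network request at all, the URL path can be matched against the /year/month/slug convention. This is a minimal sketch, assuming only that one permalink shape matters; the function name looks_like_date_permalink and the DATE_PERMALINK regex are hypothetical, and a miss proves nothing, since WordPress sites can use other permalink structures.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern covering only the /YYYY/MM/slug permalink convention.
# A non-match does NOT mean the site isn't WordPress.
DATE_PERMALINK = re.compile(r"^/\d{4}/\d{2}/[^/]+/?$")


def looks_like_date_permalink(url: str) -> bool:
    """Return True if the URL path is /YYYY/MM/slug, with an optional trailing slash."""
    path = urlparse(url).path
    return DATE_PERMALINK.match(path) is not None
```

For example, looks_like_date_permalink("http://blog.stackoverflow.com/2011/03/a-new-name-for-stack-overflow-with-surprise-ending/") returns True, while a path such as /questions/123 returns False.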
I tried looking at the headers for that URL with an HTTP HEAD request, and I see:
Connection: Keep-Alive
Content-Encoding: gzip
Content-Length: 18340
Content-Type: text/html; charset=UTF-8
Date: Thu, 07 Jun 2012 07:07:38 GMT
Keep-Alive: timeout=15, max=100
Server: Apache/2.2.9 (Ubuntu) DAV/2 PHP/5.2.6-2ubuntu4.2 with Suhosin-Patch mod_ssl/2.2.9 OpenSSL/0.9.8g
Vary: Cookie,Accept-Encoding
WP-Super-Cache: Served legacy cache file
X-Pingback: http://blog.stackoverflow.com/xmlrpc.php
X-Powered-By: PHP/5.2.6-2ubuntu4.2
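A HEAD request like this one can be issued from Python’s standard library, which returns the status line and headers without downloading the page body. This is a minimal sketch; the helper names build_head_request and fetch_headers and the 10-second default timeout are hypothetical choices, and collapsing headers into a dict keeps only one value per repeated header name.

```python
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request so the server sends headers only, no body."""
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url: str, timeout: float = 10.0) -> dict:
    """Send a HEAD request and return the response headers as a plain dict.

    Repeated header names collapse to a single entry in the dict.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
        return dict(response.headers.items())
```

Calling fetch_headers on the blog URL above should produce a mapping with entries like the ones listed, although what a given server actually returns may differ.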
I don’t think the presence of WP-Super-Cache is a reliable signal, and it’s the only thing in these headers that looks helpful, so maybe there are no HTTP headers common to every WordPress install?
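For completeness, checking for the WP-Super-Cache header amounts to a case-insensitive name lookup on whatever headers were fetched. This is a minimal sketch with a hypothetical function name; it can only detect sites running the WP Super Cache plugin, so, as noted above, a False result says nothing about whether a site runs WordPress.

```python
from typing import Mapping


def has_wp_super_cache_header(headers: Mapping[str, str]) -> bool:
    """Return True if any header name equals WP-Super-Cache, ignoring case.

    Only sites running the WP Super Cache plugin send this header, so a
    False result does not rule out WordPress.
    """
    return any(name.lower() == "wp-super-cache" for name in headers)
```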