What is the correct way for plugins to create tables with special charset/collation considerations?

I’m using WordPress >=3.8. The default charset/collation values in wp-config.php are

define('DB_CHARSET', 'utf8');
define('DB_COLLATE', ''); // Becomes utf8_general_ci

I am in a situation where utf8_general_ci would be adequate, but something more specific (utf8_danish_ci) will always be able to describe my data better. It is also the case that the data, which is fetched from an external source, will always be UTF-8.

I am developing a plugin that creates some table

CREATE TABLE plugin_table(
    some_id INT NOT NULL,
    some_data VARCHAR(100) NOT NULL,
    PRIMARY KEY  (some_id)
) ENGINE=InnoDB;

What is the correct way to handle charset and/or collation?

One solution is to use the above statement, falling back on global configuration. As mentioned, this will work in my situation, and from a developer perspective I think it is reasonable to require that either utf8 or utf8mb4 is specified.

The other solution is to explicitly set the charset/collation settings:

...) ENGINE=InnoDB [DEFAULT CHARSET=utf8] [COLLATE utf8_danish_ci];

This will allow me to configure the table in the most appropriate way for the data I need to store, and in testing I have confirmed that the letter Ă…, which is treated differently by utf8_general_ci and utf8_danish_ci, is compared correctly. This appears to work, but I don’t know if it is good or bad practice, and, in particular, I don’t know if this could cause complications elsewhere (e.g. should I be changing settings when querying the table?).

This question has several cousins but this doesn’t seem to be a duplicate. The answer that comes closest also mentions the explicit method above, with a comment that it may be more appropriate to use the default values “so the user can override them if they have good reason”.

If I were going to go out of my way to use the core configuration, then it seems silly to specify these settings in the first place.

It is important to understand that there will never be a “good reason” to use non-UTF-8. With respect to collation, the question is less clear.

The question boils down to the following: given some non-UTF-8 core configuration, which would be considered unsupported, should I take steps to make my plugin behave correctly?

1 Answer
1

I think this issue is handled by WooCommerce in their plugin. It also creates its own tables and this is how they take care of Collation part. (I am referring to file class-wc-install.php of WooCommerce). They have written following code in the create_tables function

global $wpdb;

$collate="";

if ( $wpdb->has_cap( 'collation' ) ) {
    if ( ! empty($wpdb->charset ) ) {
        $collate .= "DEFAULT CHARACTER SET $wpdb->charset";
    }
    if ( ! empty($wpdb->collate ) ) {
        $collate .= " COLLATE $wpdb->collate";
    }
}

And then this $collate variable is appended at the end of CREATE TABLE query.

So in this case, query may look like this:

CREATE TABLE plugin_table(
    some_id INT NOT NULL,
    some_data VARCHAR(100) NOT NULL,
    PRIMARY KEY  (some_id)
) $collate;

So they find out WordPress’s Collate and charset and append them.

Hope it helps 🙂

Leave a Comment