I have post names in Thai, which use UTF-8 characters. Many of them become very long once percent-encoded into ASCII, e.g. วิธีการหลีกเลี่ยงข้อผิดพลาดทั้ง-8-ในชีวิตการแต่งงาน ("how to avoid all 8 mistakes in married life").
I've used phpMyAdmin to change the type of `post_name` to VARCHAR(1000) and its collation to utf8_unicode_ci.
However, in my WordPress backend editor, the above post name is still automatically cut to วิธีการหลีกเลี่ยงข้อผิ when I try to save the URL.
There is a plugin that lifts the character limit, but it's in Thai, which I can't read.
Any ideas?
It happens because when you save a post, WordPress calls the `sanitize_title` function to sanitize your title. This function applies the `sanitize_title` filter.

One of the core hooks on the `sanitize_title` filter is the `sanitize_title_with_dashes` function, which checks whether the title is UTF-8 by calling `seems_utf8`; if it is, it calls `utf8_uri_encode`.
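The `utf8_uri_encode` function takes `$utf8_string` and `$length` as its first two arguments. The first is your title and the second is the length the encoded title must not exceed. The length is counted on the percent-encoded result, so one Thai character (3 bytes in UTF-8) uses up 9 characters: ว, for example, becomes `%e0%b8%a7`.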
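To see the truncation concretely, here is a small standalone PHP sketch of that length rule, using the limit of 200 that `sanitize_title_with_dashes` passes. `demo_utf8_uri_encode` is a hypothetical, simplified stand-in written for illustration, not WordPress's `utf8_uri_encode`; it assumes, as described above, that each `%xx` triplet counts as 3 characters toward `$length`, and it runs without WordPress.

```php
<?php
/**
 * Hypothetical, simplified stand-in for WordPress's utf8_uri_encode().
 * It percent-encodes every byte of each non-ASCII UTF-8 character
 * (lowercase hex), keeps ASCII as-is, and stops before any character
 * whose encoded form would push the result past $length.
 */
function demo_utf8_uri_encode(string $utf8_string, int $length = 0): string
{
    $result = '';
    // Split into whole UTF-8 characters so a character is never cut in half.
    // Invalid UTF-8 makes preg_split() return false; treat that as "no characters".
    $chars = preg_split('//u', $utf8_string, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($chars as $char) {
        if (strlen($char) === 1) {
            // ASCII: one character in, one character out.
            $encoded = $char;
        } else {
            // Multibyte: each byte becomes "%xx", i.e. 3 characters per byte.
            $encoded = strtolower(rawurlencode($char));
        }

        if ($length > 0 && strlen($result) + strlen($encoded) > $length) {
            break; // the next character would overflow the limit
        }
        $result .= $encoded;
    }

    return $result;
}

$title = 'วิธีการหลีกเลี่ยงข้อผิดพลาดทั้ง-8-ในชีวิตการแต่งงาน';

$short = demo_utf8_uri_encode($title, 200);  // core's limit
$long  = demo_utf8_uri_encode($title, 1000); // raised limit

echo strlen($short), ' ', rawurldecode($short), PHP_EOL;
echo strlen($long), ' ', rawurldecode($long), PHP_EOL;
```

Run it with the PHP CLI: the first call stops at the same place as in the question (22 Thai characters, 198 encoded characters), while the second keeps the whole title.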
`sanitize_title_with_dashes` calls it with a limit of 200 characters, which leaves room for only 22 Thai characters (22 × 9 = 198); that is exactly where your slug gets cut. So if you want to change the limit, you have to replace the standard hook on the `sanitize_title` filter. This is a somewhat dirty solution, but it should help:
```php
// First, remove the standard hook.
remove_filter( 'sanitize_title', 'sanitize_title_with_dashes' );
// Add our custom hook.
add_filter( 'sanitize_title', 'wpse8170_sanitize_title_with_dashes', 10, 3 );

function wpse8170_sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
    $title = strip_tags( $title );
    // Preserve escaped octets.
    $title = preg_replace( '|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title );
    // Remove percent signs that are not part of an octet.
    $title = str_replace( '%', '', $title );
    // Restore octets.
    $title = preg_replace( '|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title );

    if ( seems_utf8( $title ) ) {
        if ( function_exists( 'mb_strtolower' ) ) {
            $title = mb_strtolower( $title, 'UTF-8' );
        }
        $title = utf8_uri_encode( $title, 1000 ); // <--- here is the trick: 1000 instead of 200
    }

    $title = strtolower( $title );
    $title = preg_replace( '/&.+?;/', '', $title ); // kill entities
    $title = str_replace( '.', '-', $title );

    if ( 'save' == $context ) {
        // Convert nbsp, ndash and mdash to hyphens
        $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );

        // Strip these characters entirely
        $title = str_replace( array(
            // iexcl and iquest
            '%c2%a1', '%c2%bf',
            // angle quotes
            '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
            // curly quotes
            '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
            '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
            // copy, reg, deg, hellip and trade
            '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
            // grave accent, acute accent, macron, caron
            '%cc%80', '%cc%81', '%cc%84', '%cc%8c',
        ), '', $title );

        // Convert times to x
        $title = str_replace( '%c3%97', 'x', $title );
    }

    $title = preg_replace( '/[^%a-z0-9 _-]/', '', $title );
    $title = preg_replace( '/\s+/', '-', $title );
    $title = preg_replace( '|-+|', '-', $title );
    $title = trim( $title, '-' );

    return $title;
}
```
As you can see, this is exactly the same `sanitize_title_with_dashes` function with one change: instead of passing 200 as the limit for the title, we pass 1000.
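As a rough check of what the new limit buys, assuming every character is Thai (3 UTF-8 bytes, so 9 characters once percent-encoded), and counting the example slug as 48 Thai characters plus the 3 ASCII characters of `-8-`:

```latex
\begin{aligned}
\text{one Thai character} &= 3~\text{bytes} \times 3~\text{chars per \%xx} = 9~\text{chars}\\
\lfloor 200 / 9 \rfloor &= 22~\text{Thai characters (core limit)}\\
\lfloor 1000 / 9 \rfloor &= 111~\text{Thai characters (new limit)}\\
\text{example slug: } 48 \times 9 + 3 &= 435 \le 1000
\end{aligned}
```

The percent-encoded slug is what gets stored in `post_name`, whose default type is `varchar(200)`, so the column must be at least as wide as the limit you pass; the VARCHAR(1000) change from the question matches the 1000 used here.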