I’m trying to inject some data into blocks via PHP, but I’m running into trouble with parse_blocks/serialize_blocks breaking my content.
I’m using the default 2020 theme and have no plugins installed.
add_action('wp', function() {
$oPost = get_post(119);
printf("<h1>Post Content</h1><p>%s</p>", var_dump($oPost->post_content));
$aBlocks = parse_blocks($oPost->post_content);
printf("<h1>Parsed Blocks</h1><pre>%s</pre>", print_r($aBlocks, true));
$sSerialisedBlocks = serialize_blocks($aBlocks);
printf("<h1>Serialised Blocks</h1><p>%s</p>", var_dump($sSerialisedBlocks));
}, PHP_INT_MAX);
The first print (just outputting the post content) contains this text…
<h3>What types of accommodation are available in xxxx?<\/h3>
The second (after parsing into blocks) contains this…
<h3>What types of accommodation are available in xxxx?</h3>
But after re-serialising the blocks I get this…
\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e
Could someone tell me what I’m doing wrong?
EDIT
OK, so I followed the source code for serialize_blocks, and it does seem like this is intentional, with serialize_block_attributes explicitly converting some characters.
My question is: why, then, are these characters showing up in the WYSIWYG instead of being correctly converted back?
1 Answer
This happens in serialize_block_attributes; the docblock explains why:
/**
...
* The serialized result is a JSON-encoded string, with unicode escape sequence
* substitution for characters which might otherwise interfere with embedding
* the result in an HTML comment.
...
*/
So this is done as an encoding measure to avoid attributes accidentally closing an HTML comment and breaking the format of the document.
Without this, an HTML comment inside a block attribute would break the block and the rest of the content after it.
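To make that concrete, here is a minimal standalone sketch of the idea in plain PHP. It is not the WordPress implementation: the function name sketch_encode_attributes, the block name wp:example/block, and the exact set of characters replaced are assumptions chosen for illustration. It JSON-encodes an attribute array, swaps the comment-sensitive characters for Unicode escape sequences, and then shows that json_decode restores the original value.

```php
<?php
// Hypothetical helper for illustration only, NOT the WordPress function.
// JSON-encode the attributes, then replace characters that could open or
// close an HTML comment with their Unicode escape sequences.
function sketch_encode_attributes( array $attributes ): string {
	$json = json_encode( $attributes );

	// Single-quoted strings keep the backslash literal, so the output
	// contains the six-character sequence \u003c rather than a "<".
	return str_replace(
		array( '--', '<', '>' ),
		array( '\u002d\u002d', '\u003c', '\u003e' ),
		$json
	);
}

// An attribute value that would close the surrounding comment if left as-is.
$attributes = array( 'content' => '<h3>Title</h3> --> oops' );
$encoded    = sketch_encode_attributes( $attributes );

// The encoded attributes can now sit inside a block comment delimiter
// (the block name here is made up) without ending the comment early.
$comment = '<!-- wp:example/block ' . $encoded . ' /-->';

// Any JSON decoder turns the escape sequences back into the original text.
$decoded = json_decode( $encoded, true );
```

The escaped form is still valid JSON, which is why nothing is lost: the escape sequences are just another spelling of the same characters.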
But How Do I Stop The Mangling?!!!
No, it isn’t mangled. It’s just encoding certain characters by replacing them with unicode escaped versions to prevent breakage.
Proof 1
Let’s take the original code block from the question, and add the following fixes:
- Wrap all in <pre> tags
- Use esc_html so we can see the tags properly
- Fix the printf by removing var_dump and using var_export with the second parameter so it returns rather than outputs
- Add a final test case where we re-parse and re-serialize 10 times to compare the final result with the original
/**
 * Parse and re-serialize block content repeatedly, returning the final result.
 */
function reparse_reserialize( string $content, int $loops = 10 ) : string {
	$final_content = $content;
	// Run exactly $loops round trips (the original `<=` ran one extra pass).
	for ( $x = 0; $x < $loops; $x++ ) {
		$blocks        = parse_blocks( $final_content );
		$final_content = serialize_blocks( $blocks );
	}
	return $final_content;
}

add_action(
	'wp',
	function() {
		$p = get_post( 1 );

		// var_export with `true` returns the string instead of printing it,
		// and esc_html makes the tags visible in the browser.
		echo '<p>Original content:</p>';
		echo '<pre>' . esc_html( var_export( $p->post_content, true ) ) . '</pre>';

		$final = reparse_reserialize( $p->post_content );
		echo '<p>10 parse and serialize loops later:</p>';
		echo '<pre>' . esc_html( var_export( $final, true ) ) . '</pre>';
		echo '<hr/>';
	},
	PHP_INT_MAX
);
Running that, we see that the content survived being parsed and re-serialized 10 times. If mangling were occurring, we would see it get progressively worse with each pass.
Proof 2
If we take the mangled markup:
\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e
Turn it into a JSON string, then decode it:
$json = '"\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e"';
echo '<pre>' . esc_html( json_decode( $json ) ) . '</pre>';
We get the original HTML:
<h3>What types of accommodation are available in xxxx?</h3>
So no mangling has taken place.
Summary
There is no mangling or corruption. It’s just encoding the < and > to prevent breakage. JSON processors handle the unicode escape characters just fine.
If you are seeing these encoded characters in the block editor, then that is a bug, either in the block or in the ACF plugin. You should report it as such.