Sunday, March 25, 2012

Are you losing your angle brackets? - php and libxml2

problem: php processes some xml and your angle brackets in said xml vanish!

TLDR: use the CDATA tag to wrap your character data to avoid having angle brackets vanish.

I first encountered this bug while importing a Cacti xml graph template via the Cacti web UI. On the surface the import seemed to go well, but nothing was graphing, and looking deeper it was clear that the config for the graph(s) was broken due to missing < > angle brackets.
The brackets were correctly encoded in the xml; it seems that somewhere between php and certain versions of libxml2 the encoded angle brackets get stripped out.

Online there are a few bug reports, but no single central bug id that I could find. One of the more useful finds was a bug detail report for a closed google code project which provides Cacti mysql templates. Here is the bug detail, very useful info from Elan there.

During my search for solutions, it seemed likely that a regression or a new bug was introduced in libxml2, but that isn't certain. It would seem that the latest stable php 4.2 on Debian squeeze with libxml2 (2.7.8.dfsg-2+squeeze3 at the time of writing) still has the bug.

There is also some useful info in a bug report for the MediaWiki project, entitled: Import strips angle brackets on some installations (libxml2 entity bug). To summarise, the consensus seems to be that it's an upstream bug in libxml2. The evidence I have found agrees with this.

Currently my systems are pinned on PHP 4.2 packages; perhaps this bug is not a factor on non-pinned Debian squeeze systems?

impact: wastes time - fixing things that shouldn't really be broken.

solution: use the XML CDATA tag
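
As a rough illustration of the workaround, here is a minimal sketch of a helper that wraps arbitrary text in a CDATA section before it goes into an xml document. The function name cdata_wrap and the element name x are my own placeholders for this example and are not part of php or Cacti. The only wrinkle it handles is that the text itself may contain the ]]> terminator, which has to be split across two sections.

```php
<?php
// Hypothetical helper (not a PHP built-in): wrap text in a CDATA section
// so that characters such as < and > need no entity encoding.
function cdata_wrap($text)
{
    // A CDATA section ends at the first "]]>", so split any occurrence
    // of that sequence across two adjacent sections.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

// Usage: the element name <x> is only a placeholder for this example.
echo '<x>' . cdata_wrap('<path_php_binary>') . '</x>', "\n";
```

Whether this is practical depends on whether you control the xml being generated; for a template exported by someone else you would have to edit the file by hand instead.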

What I can provide is a way of easily checking whether your setup has the bug or not. Props to Elan for this.
$ php -r '$p = xml_parser_create(); xml_parse_into_struct($p, "<x>&lt;path_php_binary&gt;</x>", $vals, $index); print_r($vals);'
A system suffering from the bug will include the output:
[value] => path_php_binary
A system NOT suffering from the bug will output:
[value] => <path_php_binary>
Now add the CDATA tag and see if the bug goes away. You don't even need to use entities when using the CDATA tag.
$ php -r '$p = xml_parser_create(); xml_parse_into_struct($p, "<x><![CDATA[<path_php_binary>]]></x>", $vals, $index); print_r($vals);'
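
If you would rather run the same check from a script than from a one-liner, something along these lines should work. This is only a sketch: the function names parsed_value and has_entity_bug and the wrapper element x are made up for the example; the only real php calls are xml_parser_create and xml_parse_into_struct.

```php
<?php
// Hypothetical helper: parse a tiny document and return the text value
// of its root element, or null if no value was parsed.
function parsed_value($xml)
{
    $parser = xml_parser_create();
    $vals = array();
    $index = array();
    xml_parse_into_struct($parser, $xml, $vals, $index);
    return isset($vals[0]['value']) ? $vals[0]['value'] : null;
}

// Hypothetical check: true when entity-encoded angle brackets do not
// survive parsing on this php/libxml2 combination.
function has_entity_bug()
{
    return parsed_value('<x>&lt;path_php_binary&gt;</x>') !== '<path_php_binary>';
}

echo has_entity_bug() ? "bug present\n" : "bug not present\n";

// The CDATA variant needs no entities at all.
echo parsed_value('<x><![CDATA[<path_php_binary>]]></x>'), "\n";
```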

citation:

Props to:
Elan Ruusamae for their bug detail report on the mysql cacti templates project.
