LibXML2 namespace bug

April 3, 2008

Problem: we have an XML document with multiple namespaces, one of which (the default namespace) has no prefix, so unprefixed XPath expressions such as //record match nothing:

<collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">

 <record>

 	<controlfield tag="001">714400</controlfield>

 	<datafield tag="245" ind1="1" ind2="0">

 		<subfield code="a">Crete</subfield>

 		<subfield code="h">[electronic resource] /</subfield>

 		<subfield code="c">by D.M. Davin.</subfield>

 	</datafield>

 </record>

</collection>

Answer: use XML::LibXML::XPathContext and register a prefix for the default namespace, once on the document context and again on each per-node context:

use XML::LibXML;

use XML::LibXML::XPathContext;

my $parserTitles = XML::LibXML->new;

my $structAuthors = $parserTitles->parse_file( 'NZETC_marc.exp.200706211556.xml' );

my $rootTitles = XML::LibXML::XPathContext->new($structAuthors);

$rootTitles->registerNs('xsi', 'http://www.w3.org/2001/XMLSchema-instance');

$rootTitles->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

my $titleNodes = ($rootTitles->findnodes("//m21:record/m21:datafield[attribute::tag='245']"));

if ($titleNodes)

{

 foreach my $titleNode ($titleNodes->get_nodelist)

 {

 	$titleNode = XML::LibXML::XPathContext->new( $titleNode );

 	$titleNode->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

 	my $titlesControlFieldNode = ($titleNode->findnodes("../m21:controlfield[attribute::tag='001']"))[0];

 	my $bbid = $titlesControlFieldNode->findvalue('.');

 	my $titlesRecordNode = ($titleNode->findnodes("ancestor::m21:record"))[0];

 	my $titlesTitle = $titleNode->findvalue("m21:subfield[attribute::code='a']/.");

 	print "$titlesTitle [$bbid]\n";

 }

}
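
The second registration inside the loop can probably be avoided. The findnodes and findvalue methods of XML::LibXML::XPathContext accept an optional context node as a second argument, so a single context with a single registerNs call can serve the whole loop. What follows is a minimal sketch, assuming a reasonably recent XML::LibXML (one that bundles XPathContext and provides load_xml). The inline $xml string and the titles_with_ids helper are illustrative names, not part of the original script.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative stand-in for the MARCXML file used above.
my $xml = <<'XML';
<collection xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <controlfield tag="001">714400</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
   <subfield code="a">Crete</subfield>
  </datafield>
 </record>
</collection>
XML

# Hypothetical helper: returns a list of "title [id]" strings.
sub titles_with_ids {
    my ($doc) = @_;

    # One context, one prefix registration for the default namespace.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

    my @out;
    for my $field ($xpc->findnodes('//m21:record/m21:datafield[@tag="245"]')) {
        # The second argument is the node the relative XPath starts from.
        my $bbid  = $xpc->findvalue('../m21:controlfield[@tag="001"]', $field);
        my $title = $xpc->findvalue('m21:subfield[@code="a"]', $field);
        push @out, "$title [$bbid]";
    }
    return @out;
}

my $doc = XML::LibXML->load_xml(string => $xml);
print "$_\n" for titles_with_ids($doc);
```

Passing the context node explicitly keeps the loop variable an ordinary element node instead of overwriting it with a new XPathContext object.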

Unfortunately, this nasty hack also seems to be necessary when the document uses only a single namespace, if that namespace is declared without a prefix.
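
The single-namespace case can be reproduced with a few lines. In XPath 1.0 an unprefixed element name matches only elements that are in no namespace, which is why the prefix has to be registered even though the document itself never uses one. The sketch below assumes a recent XML::LibXML with load_xml; the one-record document is a made-up minimal example, and the prefix m21 is arbitrary because only the namespace URI has to match.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up minimal document: one default namespace, no prefixes at all.
my $doc = XML::LibXML->load_xml(string =>
    '<collection xmlns="http://www.loc.gov/MARC21/slim"><record/></collection>');

# Unprefixed query: looks for <record> in no namespace, so nothing matches.
my @plain = $doc->findnodes('//record');

# Prefixed query through a context that maps m21 to the default namespace URI.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('m21', 'http://www.loc.gov/MARC21/slim');
my @prefixed = $xpc->findnodes('//m21:record');

printf "unprefixed: %d, prefixed: %d\n", scalar @plain, scalar @prefixed;
# prints "unprefixed: 0, prefixed: 1"
```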