LibXML2 namespace bug
April 3, 2008
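Problem: we have an XML document with multiple namespaces, one of which (the MARC21 namespace) is declared as the default namespace, with no prefix. XPath queries such as //record then match nothing: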
<collection xmlns="http://www.loc.gov/MARC21/slim"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <record>
    <controlfield tag="001">714400</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Crete</subfield>
      <subfield code="h">[electronic resource] /</subfield>
      <subfield code="c">by D.M. Davin.</subfield>
    </datafield>
  </record>
</collection>
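To make the symptom concrete, here is a minimal sketch that parses a trimmed copy of the record above from a string (the inline document and the variable names are illustrative only) and counts matches for an unprefixed and a prefixed query. The prefix m21 is an arbitrary choice; only the namespace URI has to match the one in the document.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed copy of the MARCXML above: MARC21 is the default (unprefixed) namespace.
my $xml = <<'XML';
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">714400</controlfield>
  </record>
</collection>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# In XPath 1.0 an unprefixed name means "no namespace", so this matches nothing.
my $unprefixed = $xpc->findnodes('//record')->size;

# Bind a prefix of our own choosing to the default namespace URI, then use it.
$xpc->registerNs('m21', 'http://www.loc.gov/MARC21/slim');
my $prefixed = $xpc->findnodes('//m21:record')->size;

print "unprefixed: $unprefixed, prefixed: $prefixed\n";    # unprefixed: 0, prefixed: 1
```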
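Answer: use XML::LibXML::XPathContext to bind a prefix of your own (here m21) to the default namespace URI, and use that prefix in every step of the XPath expression. In this code the prefix has to be registered twice: once on the document-level context, and again on the per-node context created inside the loop: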
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $parser = XML::LibXML->new;
my $doc    = $parser->parse_file('NZETC_marc.exp.200706211556.xml');

# Document-level context: bind the m21 prefix to the default MARC21 namespace.
my $docContext = XML::LibXML::XPathContext->new($doc);
$docContext->registerNs('xsi', 'http://www.w3.org/2001/XMLSchema-instance');
$docContext->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

# Title fields (MARC tag 245); an empty result simply skips the loop.
my $titleNodes = $docContext->findnodes("//m21:record/m21:datafield[attribute::tag='245']");
foreach my $titleNode ($titleNodes->get_nodelist)
{
    # A new context rooted at this node, so the m21 prefix must be registered again.
    my $titleContext = XML::LibXML::XPathContext->new($titleNode);
    $titleContext->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

    # Record ID from the sibling controlfield 001 (empty if the record has none).
    my $controlField = ($titleContext->findnodes("../m21:controlfield[attribute::tag='001']"))[0];
    my $bbid = $controlField ? $controlField->textContent : '';

    # Title proper: subfield a of the 245 field.
    my $title = $titleContext->findvalue("m21:subfield[attribute::code='a']");
    print "$title [$bbid]\n";
}
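The second registration can probably be avoided. The sketch below assumes an XML::LibXML version in which XPathContext's findnodes and findvalue accept an optional context node as their second argument (current releases document this): a single context carries the m21 prefix, and each relative query is evaluated against the node passed in. The inline document stands in for the NZETC export file and is illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Stand-in for the export file; MARC21 is the default namespace.
my $xml = <<'XML';
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">714400</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Crete</subfield>
      <subfield code="c">by D.M. Davin.</subfield>
    </datafield>
  </record>
</collection>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# One context, one prefix registration.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

my @titles;
for my $field ($xpc->findnodes('//m21:record/m21:datafield[@tag="245"]')) {
    # The second argument makes $field the context node for these relative paths.
    my $bbid  = $xpc->findvalue('../m21:controlfield[@tag="001"]', $field);
    my $title = $xpc->findvalue('m21:subfield[@code="a"]', $field);
    push @titles, "$title [$bbid]";
    print "$title [$bbid]\n";    # Crete [714400]
}
```

Passing the context node per call keeps the namespace bindings in one place, so a typo in the URI cannot creep into only one of the two registrations.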
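Unfortunately this nasty hack also seems to be necessary when the document has only a single namespace, if that namespace has no prefix. This follows from XPath 1.0 itself: it has no notion of a default namespace, so an unprefixed name in an expression always refers to an element in no namespace and never matches elements in a default namespace.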
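A minimal sketch of the single-namespace case, using a made-up namespace URI (http://example.org/ns) and element names chosen purely for illustration. It shows the prefix registration still being required, next to a commonly used alternative, matching on local-name(), which needs no registration but also does not check which namespace an element belongs to.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical document: a single namespace, declared as the default (no prefix).
my $doc = XML::LibXML->new->parse_string(
    '<items xmlns="http://example.org/ns"><item>one</item><item>two</item></items>'
);

# Still required: the expression needs a prefix bound to the default namespace.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('x', 'http://example.org/ns');
my @via_prefix = map { $_->textContent } $xpc->findnodes('//x:item');

# Alternative: match on the local name only; no registration, namespace not checked.
my @via_local_name = map { $_->textContent } $doc->findnodes('//*[local-name()="item"]');

print "@via_prefix | @via_local_name\n";    # one two | one two
```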