LibXML2 namespace bug

April 3, 2008

Problem: we have an XML document with multiple namespaces, one of which (the default namespace) has no prefix, so XPath queries on unprefixed element names such as record find nothing:

<collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <record>
    <controlfield tag="001">714400</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Crete</subfield>
      <subfield code="h">[electronic resource] /</subfield>
      <subfield code="c">by D.M. Davin.</subfield>
    </datafield>
  </record>
</collection>

Answer: use XML::LibXML::XPathContext and register a prefix (here m21) for the default namespace, once on the document-level context and again on each per-node context:

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $parserTitles  = XML::LibXML->new;
my $structAuthors = $parserTitles->parse_file( 'NZETC_marc.exp.200706211556.xml' );

# Document-level context: bind prefixes to both namespace URIs.
# The unprefixed MARC21 default namespace still needs a prefix (m21) in XPath.
my $rootTitles = XML::LibXML::XPathContext->new($structAuthors);
$rootTitles->registerNs('xsi', 'http://www.w3.org/2001/XMLSchema-instance');
$rootTitles->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

# In list context findnodes() returns the matching nodes directly,
# so an empty result simply skips the loop
foreach my $titleNode ($rootTitles->findnodes("//m21:record/m21:datafield[attribute::tag='245']"))
{
    # A new context rooted at this datafield knows nothing about the
    # prefixes registered above, so m21 has to be registered again
    my $titleContext = XML::LibXML::XPathContext->new( $titleNode );
    $titleContext->registerNs('m21', 'http://www.loc.gov/MARC21/slim');

    # Record ID from the sibling 001 control field (empty string if absent)
    my $bbid = $titleContext->findvalue("../m21:controlfield[attribute::tag='001']");

    # Title proper from subfield $a
    my $titlesTitle = $titleContext->findvalue("m21:subfield[attribute::code='a']");

    print "$titlesTitle [$bbid]\n";
}

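Unfortunately this nasty hack also seems to be necessary when a document has only a single namespace, if that namespace has no prefix. The reason is that XPath 1.0 has no concept of a default namespace. An unprefixed name in an XPath expression only matches elements in no namespace, so every namespaced element needs a registered prefix in the query.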
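For comparison, the same rule applies to XPath in browser JavaScript. Document.evaluate() takes a namespace resolver, and the m21 prefix has to be mapped to the MARC21 URI there too. This is a minimal sketch, assuming an XML document parsed with DOMParser. The names marcResolver and listTitles are illustrative and do not come from the original code.

```javascript
// Prefix-to-URI map: the prefixes are our own choice; only the URIs
// have to match the namespaces declared in the document
const NAMESPACES = new Map([
  ["m21", "http://www.loc.gov/MARC21/slim"],
  ["xsi", "http://www.w3.org/2001/XMLSchema-instance"],
]);

// An XPath namespace resolver can be a plain function: prefix in, URI (or null) out
function marcResolver(prefix) {
  return NAMESPACES.get(prefix) ?? null;
}

// Browser-only: returns "title [id]" strings for every 245 datafield
function listTitles(xmlDoc) {
  const fields = xmlDoc.evaluate(
    "//m21:record/m21:datafield[@tag='245']",
    xmlDoc, marcResolver, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const out = [];
  for (let i = 0; i < fields.snapshotLength; i++) {
    const field = fields.snapshotItem(i);
    // Evaluate relative paths with the datafield as the context node
    const id = xmlDoc.evaluate("string(../m21:controlfield[@tag='001'])",
      field, marcResolver, XPathResult.STRING_TYPE, null).stringValue;
    const title = xmlDoc.evaluate("string(m21:subfield[@code='a'])",
      field, marcResolver, XPathResult.STRING_TYPE, null).stringValue;
    out.push(title + " [" + id + "]");
  }
  return out;
}
```

In a browser, calling listTitles(new DOMParser().parseFromString(xmlText, "application/xml")) on the sample record above should give ["Crete [714400]"]. Unlike the Perl version, a single resolver serves every evaluate() call, whatever the context node.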


This, I think, is going to be the title of my ENGL444 essay. It turns out that New Zealand, although it doesn't have an abundance of school-story authors, does have at least one, and possibly two, worthy of study:

  • Phillis Garrard
    • Hilda at School: A New Zealand Story (1929)
    • The Doings of Hilda (1932)
    • Hilda’s Adventures (1938)
    • Hilda Fifteen (1944)
  • Clare Mallory (Winifred Hall, née McQuilkan), 1913–1991
    • Merry Begins (1947)
    • Merry Again (1947)
    • Merry Marches On (1947)
    • Leith and Friends (1950)
    • The Pen and Pencil Girls (1949)
    • Juliet Overseas (1948)
    • The New House at Winwood (1949)
    • Tony Against the Prefects (1949)
    • The Two Linties (1950)
    • The League of the Smallest (1951)

As for boys’ school stories, there appears to be only one author who fits the genre: C. R. Allen, with A Poor Scholar: A Tale of Progress (1936).

Clare Mallory in particular sounds interesting:

  • Her books were regarded as better written than most of the genre, and although at times they imitated earlier books (Juliet Overseas is inspired by Brenda Page’s Schoolgirl Rivals, while Leith and Friends is modelled on Josephine Elder’s Evelyn Finds Herself), they surpassed their models. Interestingly, both of these antecedents were published in 1927, so Winifred would most likely have read them as a 14-year-old schoolgirl.

There are also a few school stories among the Whitcombe and Tombs Story Books output:

  • Hilda Bridges
    • Bobby’s First Term: A School Boy’s Story (1925)
    • Connie of the Fourth Form (1930)
  • Lillian Maxwell Pyke
    • Squirmy & Bubbles: A School Story for Girls (1924)
  • Josephine Howe
    • The School in Cigam Square (1946)


Notes on setting up Cocoon to use XSLT 2.0 under Tomcat are already on the web, but in the spirit of repetition:

  • For XSLT 2.0 support, you need to add and configure the Saxon 9 libraries. Get the Saxon-B download (the free edition of Saxon 9), then extract all the .jar libraries into cocoon/WEB-INF/lib.
  • Now we need to configure Cocoon so that Saxon can be called. First, open cocoon/WEB-INF/cocoon.xconf, and find the bit that refers to Saxon XSLT, which is commented out by default. Uncomment the code and change it according to the instructions in the file, so that it enables Saxon 9:
          <component logger="core.xslt"
                     role="org.apache.excalibur.xml.xslt.XSLTProcessor/saxon"
                     class="org.apache.cocoon.components.xslt.TraxProcessor">
            <parameter name="use-store" value="true"/>
            <parameter name="transformer-factory" value="net.sf.saxon.TransformerFactoryImpl"/>
          </component>
  • Now we need to edit cocoon/sitemap.xmap to enable the Saxon transformer. In the <map:transformers> section, add this below the other XSLT transformers:
          <map:transformer name="saxon" pool-grow="2" pool-max="32" pool-min="8"
                           src="org.apache.cocoon.transformation.TraxTransformer">
            <use-request-parameters>false</use-request-parameters>
            <use-browser-capabilities-db>false</use-browser-capabilities-db>
            <xslt-processor-role>saxon</xslt-processor-role>
          </map:transformer>
  • Add a suitable match to the pipeline:
       <map:match pattern="*.xml">
         <map:generate src="text/{1}.txt" type="text"/>
         <map:transform type="saxon" src="xsl/tokenise-string-to-xml.xsl"/>
         <map:serialize type="xml"/>
       </map:match>

I’ve long been a user of UltraEdit for writing scripts where there’s no particularly good language-specific IDE (and by good I mean responsive and with features that don’t get in your way, rather than overloaded with features but slow as a result).

Writing Perl scripts in UltraEdit is made easier with syntax highlighting and the ability to hook the command line into UltraEdit so that I can quickly perform syntax-checking.

As I’m in the process of learning Ruby, I thought I’d set UltraEdit up in a similar fashion, so that I can enjoy the same features while writing Ruby scripts. It’s easy as:

  1. Set up a tool configuration to do the syntax checking:
    [Screenshot: UltraEdit Ruby Configuration]
  2. Add the following to UltraEdit’s word file, C:\Program Files\UltraEdit\WORDFILE.TXT:

/L10"Ruby" Line Comment Num = 2# Block Comment On = =begin Block Comment Off = =end String Chars='" Escape Char = \ File Extensions = rb rbw
/Indent Strings = "do" "begin" "{" "|"
/Unindent Strings = "}" "end"
/Delimiters = ~^[]{}()<>.,+ *|/' "
/Function String = "%[ ^t]++^(module[ ^t]+[a-z0-9_.]+^)[ ^p^r^n]"
/Function String 1 = "%[ ^t]++^(class[ ^t]+[a-z0-9_.]+^)[ ^p^r^n]"
/Function String 2 = "%[ ^t]++^(def[ ^t]+[a-z0-9_.]+^)[ ^p^r^n(]"
/C1"Ruby Keywords"
(
)
#
#{
{
}
__FILE__ __LINE__
alias and
begin break
case class
def defined? do
else elsif end ensure
false for
if in
module
next nil not
or
quit
redo rescue retry return
self super
then true
undef unless until
when while
yield
BEGIN
END
/C2"Ruby Classes/Exceptions"
`
ArgumentError Array
Bignum Binding
Class Complex ConditionVariable Continuation
DelegateClass Dir
English EOFError Errno::ENOENT Errno::EPERM Exception
FalseClass Fatal File File::Stat Fixnum Float FloatDomainError
GetoptLong
Hash
IndexError Integer Interrupt IO IOError
LoadError LocalJumpError
MatchData Method Module Mutex
NameError NilClass NoMemoryError NotImplementedError Numeric
Object
Proc Pstore
Range RangeError Regexp RegexpError RuntimeError
ScriptError SecurityError SimpleDelegator Singleton StandardError String
Struct Struct::Tms Symbol
SyntaxError SystemCallError SystemExit SystemStackError
Tempfile Thread ThreadGroup Time TrueClass TypeError
WeakRef
ZeroDivisionError
/C3"Ruby Libraries/Modules"
mkmf
win32api win32ole
BasicSocket Benchmark
CGI Comparable Config CONFIG
DATA Date
Enumerable Errno
FALSE FileTest Find FTP
GC
HTTP HTTPResponse
IPSocket
Kernel
Marshal
Math
NET Net::FTP Net::HTTP Net::HTTPResponse Net::POP Net::APOP Net::POPMail
Net::SMTP Net::Telnet NIL
ObjectSpace
Observable
ParseDate POP POPMail Process
Session SMTP Socket SOCKSSocket Stat STDERR STDIN STDOUT
TCPServer TCPSocket Telnet Tms TOPLEVEL_BINDING TRUE
UDPSocket UNIXServer UNIXSocket
Win32API WIN32OLE WIN32OLE_EVENT
/C4"Ruby Constants/Strings"
"
cutime cstime
domain
expires
secure stime
AF_APPLETALK AF_AX25 AF_INET6 AF_INET AF_IPX AF_UNIX AF_UNSPEC
AI_ALL AI_CANONNAME AI_MASK AI_NUMERICHOST AI_PASSIVE AI_V4MAPPED_CFG
ARGF ARGV
Complex::I
Default
E EXTENDED
EAI_ADDRFAMILY EAI_AGAIN EAI_BADFLAGS EAI_BADHINTS EAI_FAIL EAI_FAMILY
EAI_MAX EAI_MEMORY
EAI_NODATA EAI_NONAME EAI_PROTOCOL EAI_SERVICE EAI_SOCKTYPE EAI_SYSTEM
FTP_PORT
IGNORECASE
IP_ADD_MEMBERSHIP IP_DEFAULT_MULTICAST_LOOP IP_DEFAULT_MULTICAST_TTL
IP_MAX_MEMBERSHIPS IP_MULTICAST_IF IP_MULTICAST_LOOP IP_MULTICAST_TTL
LOOKUP_INET6 LOOKUP_INET LOOKUP_UNSPEC
MSG_DONTROUTE MSG_OOB MSG_PEEK
MULTILINE
PF_APPLETALK PF_AX25 PF_INET6 PF_INET PF_IPX PF_UNIX PF_UNSPEC
PI PLATFORM PRIO_PGRP PRIO_PROCESS PRIO_USER
RUBY_PLATFORM RUBY_RELEASE_DATE RUBY_VERSION
SOCK_DGRAM SOCK_PACKET SOCK_RAW SOCK_RDM SOCK_SEQPACKET SOCK_STREAM
SOL_ATALK SOL_AX25 SOL_IPX SOL_IP SOL_SOCKET SOL_TCP SOL_UDP
SOPRI_BACKGROUND SOPRI_INTERACTIVE SOPRI_NORMAL
SO_BROADCAST SO_DEBUG SO_DONTROUTE SO_ERROR SO_KEEPALIVE SO_LINGER
SO_NO_CHECK SO_OOBINLINE SO_PRIORITY SO_RCVBUF SO_REUSEADDR SO_SNDBUF
SO_TYPE
TCP_MAXSEG TCP_NODELAY
WIN32OLE::VERSION WNOHANG WUNTRACED
/C5"Ruby Methods"
. .. ...
! !=
~
% %= %q %w %Q %W
@
& && &=
* ** *= **=
+ += +@
- -= -@
> >= >> >>=
< <= <<= <> << <=>
= == === => =~
[ []=
]
| || |= ||=
^ ^=
::
// /= /
_id2ref __id__ __send__
abort abort_on_exception abort_on_exception! abs abs2 add alias_method
alive? ancestors
append_features arg arity asctime assoc at atan2 atime
attr attr_accessor attr_reader attr_writer at_exit autoload
backtrace basename between? binding binmode blksize blockdev? block_given?
blocks
broadcast
call callcc caller capitalize capitalize! casefold? catch ceil center chomp
chomp! chop chop!
chardev? chr chdir chmod chown chroot class_eval class_variables clear clone
close closed? close_read close_write cmp coerce collect collect!
compact compact! compare compile concat conjugate
const_defined? const_get const_set constants copy cp cos count
create_makefile critical critical= crypt ctime current
day default default= define_finalizer delete delete! delete_at delete_if
detect dev
directory? dirname dir_config disable display divmod downcase downcase!
downto dump dup
each each_byte each_index each_key each_line each_object each_pair
each_with_index
egid egid= empty? enable england entries
eof eof? eql? equal? error? error_message escape euid euid= eval exception
exclude_end?
exec executable? executable_real?
exist? exist2? existw? exit exit! exp expand_path extend extend_object
fail fcntl fetch file? fileno fill find find_all find_library finite? first
flatten flatten!
flock flush foreach fork format freeze frexp frozen? ftype
garbage_collect get getc getogrp getpriority gets getwd get_option
gid gid= glob global_variables
gm gmt? gmtime gregorian gregorian_leap? grep grpowned? gsub gsub!
has_key? has_value? hash have_func have_header have_library hex hour
id id2name image include include? included_modules index indexes indices
inherited initialize integer? iterator?
intern ino inspect install
instance_eval instance_methods instance_of? instance_variables ioctl is_a?
isatty isdst italy
jd join julian julian_leap?
kcode key? keys kill kind_of?
lambda last last_match ldexp leap? length lineno lineno= link list ljust
load local local_variables
localtime lock locked? log log10 loop lstat
main makedirs makepath map map! match max mday member? members message
method
method_added method_defined? method_missing methods min mjd
mkdir mktime mode module_eval module_function modulo mon month move mv mtime
name nan? nesting new new1 new2 new3 newsg neww next! nil? nitems nlink
nonzero? now ns?
oct open ordering ordering= os? owned?
p pack parsedate pass path pid pipe pipe? polar pop popen pos pos= ppid
print printf priority priority=
private private_methods proc protected protected_methods prune
public_methods
private_class_method private_instance_methods protected_instance_methods
public_class_method public_instance_methods
public push putc puts pwd
quiet quiet= quiet? quote
raise rand rassoc reject! read readable? readable_real?
readchar readline readlines readlink real rehash reject reject!
remainder remove_const remove_method rename reopen replace require
respond_to? restore reverse reverse! reverse_each rewind rdev rindex rjust
rmdir
rm_f round run
safe_level safe_unlink scan sec seek select send setgid? setpgid setpgrp
setpriority
setsid setuid? set_backtrace set_options set_trace_func sg
shift signal sin singleton_methods singleton_method_added size size? sleep
slice slice! socket? sort sort! source split sprintf squeeze squeeze! sqrt
srand start
stat status step sticky? stop stop? store strftime strip strip! sub sub!
succ succ! superclass
symlink symlink? sync synchronize sync= syscall syscopy sysread system
syswrite swapcase swapcase!
taint tainted? tell terminate test throw timeout times today to_a to_ary
to_f to_i
to_io to_proc to_r to_s to_str tr tr! trace_var trap try_lock tr_s tr_s!
truncate tty? tv_sec tv_usec type
uid uid= umask undef_method ungetc uniq uniq! unlink unlock unpack unshift
untaint untrace_var
upcase upcase! update upto usec utc utc? utime
value? values var
wait wait2 waitpid waitpid2 wakeup wday weakref_alive? write writable?
writable_real?
yday year
zero? zone
Comparisons
/C6"Ruby Library Methods"
a accept addr add_observer all
base bind binmode= blockquote
caption changed changed? checkbox checkbox_group cmd code const_load
content_type cookies connect count_observers
debug_mode debug_mode= delete_observer delete_observers
dir do_not_reverse_lookup do_not_reverse_lookup=
escapeElement escapeHTML
file_field form for_fd
getaddress getaddrinfo getbinaryfile gethostbyaddr gethostbyname gethostname
getnameinfo getservbyname getpeername getsockname getsockopt gettextfile
head header hidden html
img img_button invoke
lastresp listen local_path login lookup_order lookup_order= ls
mails message_loop multipart_form
notify_observers
on_event original_filename out
pair params parse passive passive= password_field peeraddr popup_menu port
post pretty
putbinaryfile puttextfile
radio_button radio_group ready recv recvfrom reset resume resume=
retrbinary retrlines return_code rfc1123_date
scrolling_list sendmail setsockopt shutdown socketpair storbinary storlines
submit
telnetmode telnetmode= text_field textarea top
uidl unescape unescapeElement unescapeHTML
waitfor welcome
Call
/C7"Block-to-Proc Variable/Instance Variable/Class Variable"
** &a &b &c &d &e &f &g &h &i &j &k &l &m &n &o &p &q &r &s &t &u &v &w &x &y &z
** @a @b @c @d @e @f @g @h @i @j @k @l @m @n @o @p @q @r @s @t @u @v @w @x
@y @z @@a @@b @@c @@d @@e @@f @@g @@h @@i @@j @@k @@l @@m @@n @@o @@p @@q @@r
@@s @@t @@u @@v @@w @@x @@y @@z
/C8"Constant/Global/Symbol"
** A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
** $
** :

Yesterday I had a discussion about tag clouds, prompted by a heads-up about an NZETC example tag cloud generated on the Many Eyes site. I realised that tag clouds could be very useful for visualising the names mentioned in each of the texts on the NZETC site, and they are also pretty easy to implement.

After finding a post about a JavaScript algorithm for tag clouds, I rolled a simple example to show how easy it actually is:

<html>

<head>
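<script>
// Scale each link in the cloud by the count stored after the last colon
// in its title attribute (e.g. title='James Cook:80').
// max is the largest count expected: a count of 0 gives 75%, a count of max gives 300%.
function processCloud(id, max) {
	var cloud = document.getElementById(id);
	if (!cloud) return;

	var tags = cloud.getElementsByTagName("a");
	for (var i = 0; i < tags.length; i++) {
		var tag = tags[i];
		var title = tag.getAttribute("title");
		if (!title) continue;                 // skip links without a count
		var f = parseFloat(title.substring(title.lastIndexOf(":") + 1));
		if (isNaN(f)) continue;
		var fontSize = (150.0 * (1.0 + (1.5 * f - max / 2) / max)) + "%";
		tag.style.fontSize = fontSize;
	}
}
</script>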
</head>

<body onload="processCloud('cloud', 100);">

	<div id='cloud'>
		<a href='#' title='Joseph Banks:50'>Joseph Banks</a> 
		<a href='#' title='William Bligh:2'>William Bligh</a> 
		<a href='#' title='James Cook:80'>James Cook</a> 
		<a href='#' title='David Samwell:12'>David Samwell</a> 
		<a href='#' title='Tobias Furneaux:20'>Tobias Furneaux</a> 
		<a href='#' title='Omai:3'>Omai</a> 
	</div>

</body>

</html>
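To drive the cloud from real data, such as names counted in an NZETC text, the markup can be generated rather than written by hand. This is a minimal sketch that reuses the same scaling formula. The function names are hypothetical, and it assumes the input is a plain object mapping each name to its count. It also makes a different design choice from the example above: it uses the largest count as max rather than a fixed 100, so the most frequent name always gets the maximum size (300%).

```javascript
// Same scaling formula as processCloud(): percent font size for count f
function cloudFontSize(f, max) {
  return 150.0 * (1.0 + (1.5 * f - max / 2) / max);
}

// Escape text for safe inclusion in HTML attributes and content
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

// counts: { name: count } (hypothetical input shape)
function buildCloud(counts) {
  const entries = Object.entries(counts);
  // Largest count becomes max; at least 1 to avoid dividing by zero
  const max = Math.max(1, ...entries.map(([, n]) => n));
  const links = entries.map(([name, n]) => {
    const size = cloudFontSize(n, max).toFixed(1);
    return `<a href='#' title='${escapeHtml(name)}:${n}' style='font-size:${size}%'>${escapeHtml(name)}</a>`;
  });
  return `<div id='cloud'>\n${links.join("\n")}\n</div>`;
}
```

For example, buildCloud({ "James Cook": 80, "Omai": 3 }) gives James Cook a font size of 300.0%, because 80 is the largest count.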

NZETC: Trackbacks now added

October 17, 2007

Just been adding Haloscan trackbacks to the NZETC blog.

So, as a test, I think I’ll link to the post about using iPod Touch devices to view eBooks.

Which reminds me that I need to add a post to the NZETC blog about Freebase (great technology, bad name). I must say, it feels a little funny to be creating circular blog references.

wxJavascript: a great find

October 5, 2007

I’ve just been installing and trying out wxJavascript, a very useful server-side JavaScript library that, for me, seems to solve the problem of running JavaScript server-side for the online word processor project (current code-name: “Remote Writer”).

Amongst other things, it has an Apache module (mod_wxjs) and support for SQLite, both quite important for this project.

The configuration wasn’t too bad, and consisted of following the quick start instructions on the website, as well as:

  1.  Working around a problem with loading the wxWidgets libraries in the Apache httpd.conf file. I ended up commenting those LoadFile lines out:
    # mod_wxjs server-side javascript stuff (see: http://www.wxjavascript.net/mod_wxjs/index.html)
    LoadFile d:/wxjs/bin/libapreq2.dll
    LoadModule apreq_module "d:/wxjs/bin/mod_apreq2.so"
    #LoadFile "d:/wxjs/bin/wxmsw28ud_core_vc_custom.dll"
    #LoadFile "d:/wxjs/bin/wxbase28ud_net_vc_custom.dll"
    #LoadFile "d:/wxjs/bin/wxmsw28ud_adv_vc_custom.dll"
    LoadModule wxjs_module "d:/wxjs/bin/mod_wxjs.dll"
    AddHandler wxjs .wxjs
    wxJS_Modules "d:/wxjs/bin/modules.xml"
    wxJS_RtSize 1M
    wxJS_CtxSize 8K

    This is not too bad, I think, as hopefully I won’t need wxWidgets for the work I want to do.

  2.  Playing around with the modules.xml file, moving the entry for wxJS_gui.dll to the bottom of the list. Otherwise, when Apache tried to load the modules, it produced this error message:
    Error: Failed to load shared library 'd:\\wxjs\\modules\\wxJS_gui.dll'
    (error 126: the specified module could not be found.)
    Error: wxJS: Module gui(d:\\wxjs\\modules\\wxJS_gui.dll )not loaded

    It then apparently refused to load the modules listed after it, which I only detected in the log after trying to run a script using the sqlite module:

    Error: wxJS: D:\\Program Files\\...\\maori-bibliography\\dbtest.wxjs(3) :
    ReferenceError: sqlite is not defined

I then wanted to run the following test script to check the SQLite connection:

var exists = false;
var created = false;
var handle = new sqlite.Database("htdocs\\gears\\editor\\tinymce\\maori-bibliography\\store.db");
if ( handle.opened )
{
    // user_version is 0 in a newly created database
    var pragmaStmt = handle.prepare("PRAGMA user_version");
    var pragma = pragmaStmt.fetchArray();
    if ( pragma[0] == 0 )
    {
        // Create the tables and insert some example data
        handle.exec("CREATE TABLE authors(id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)");
        handle.exec("CREATE TABLE books(id INTEGER PRIMARY KEY, title TEXT, fk_author_id INTEGER)");
        // Record schema version 1 so later runs skip this block
        handle.exec("PRAGMA user_version = 1");
        handle.exec("INSERT INTO authors(id, firstname, lastname) VALUES(1, 'JRR', 'Tolkien')");
        handle.exec("INSERT INTO authors(id, firstname, lastname) VALUES(2, 'John', 'Grisham')");
        // books links to authors through fk_author_id (there is no "author" column)
        handle.exec("INSERT INTO books(id, title, fk_author_id) VALUES(1, 'The Lord of the Rings', 1)");
        handle.exec("INSERT INTO books(id, title, fk_author_id) VALUES(2, 'The Firm', 2)");
        created = true;
    } else {
        exists = true;
    }
}
else
{
        handle = null;
}
response.print("<html><head><title>wxJS database test program</title></head><body>");
response.print("<b>Was the database already present?:" + exists + "</b><br/>");
response.print("<b>Were the tables created?:" + created + "</b><br/>");
response.print("</body></html>");

However, running this test script wasn’t without some apparent wrinkles:

  1. Once I had the module loading worked out, I ran the script to create a database. This seemed to work fine, but the SQLite database didn’t appear in the script’s directory, where I had expected it.
    It turns out that mod_wxjs seems to default to creating SQLite databases in the Apache2.2 directory, so relative paths are resolved from there.
    So, in my script I had to specify the subpath to the location where I wanted the database created:

    var handle = new sqlite.Database("htdocs\\gears\\editor\\tinymce\\maori-bibliography\\store.db");
  2. The created database was not accessible via the command-line sqlite3.exe tool, at least not until I’d shut down Apache. Fair enough, I suppose, as it had written journalled entries to an associated file that weren’t applied until Apache shut down.
  3. It seemed to ignore the PRAGMA recording whether the database had been created, and was quite happy to recreate the database. This errant behaviour stopped after I restarted Apache (i.e. after it had written out the journalled entries).

However, so far it seems to be usable, and a great solution at that!
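The PRAGMA user_version check in the test script is a common SQLite idiom for versioning a schema: a value of 0 means a fresh database. The list of schema statements can be kept separate from the wxJS calls, which makes that logic testable outside Apache. This is a minimal sketch in plain JavaScript. The helper schemaStatements and the MIGRATIONS list are hypothetical. The only database call assumed is handle.exec(), as used in the script above.

```javascript
// Each entry upgrades the schema from version (index) to version (index + 1)
const MIGRATIONS = [
  [
    "CREATE TABLE authors(id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)",
    "CREATE TABLE books(id INTEGER PRIMARY KEY, title TEXT, fk_author_id INTEGER)",
  ],
];

// Statements needed to bring a database at currentVersion up to date,
// ending with the PRAGMA that records the new version number
function schemaStatements(currentVersion) {
  const pending = MIGRATIONS.slice(currentVersion);
  if (pending.length === 0) return [];           // already up to date
  const statements = pending.flat();
  statements.push("PRAGMA user_version = " + MIGRATIONS.length);
  return statements;
}
```

In the test script, the table-creation part of the `if ( pragma[0] == 0 )` branch could then become `schemaStatements(pragma[0]).forEach(function (sql) { handle.exec(sql); });`, with the example inserts kept separate. This is untested under mod_wxjs.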