Zdeněk Žabokrtský & Rudolf Rosa
{zabokrtsky,rosa}@ufal.mff.cuni.cz
Tuesday 9.00–11.20
SU2
<book xmlns="http://www.nlp.org/book"
      xmlns:bib="http://www.nlp.org/bibliography">
  ⋮
  <chapter> ⋮ </chapter>
  <bib:book><bib:author> …
</book>
+ XLink, XPointer
XPath expression | Meaning
/book | the root element named book
/book/chapter | all chapter elements that are children of the root element book
/book/* | all child elements of the root element book
/*/*/*/para | all para elements at the fourth level of the tree
//chapter | all chapter elements anywhere in the document
//bold | //italic | all bold or italic elements anywhere in the document (the first | is the XPath union operator)
/book/chapter[2] | the second chapter
/book/chapter[last()] | the last chapter
//chapter[@id="ch2"] | the chapter(s) whose id attribute has the value "ch2"
../@lang | the lang attribute of the parent of the current node
//*[count(para)=3] | all elements that have exactly three para children
//chapter/para[position()<3] | the first two paragraphs of every chapter
//song[@lang="en"]/title | the titles of all English songs
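A few of the expressions above can be tried out with Python's standard library. This is only a sketch: `xml.etree.ElementTree` implements a limited subset of XPath, its paths are evaluated relative to the root element rather than the document node, and the sample document is invented for illustration. Functions such as `count()` are not supported, so that predicate is emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
XML = """<book>
  <chapter id="ch1"><title>Intro</title><para>a</para><para>b</para><para>c</para></chapter>
  <chapter id="ch2"><title>Conclusion</title><para>d</para></chapter>
</book>"""

# 'book' is the root element, so the paths below are relative to it.
book = ET.fromstring(XML)

chapters = book.findall("chapter")             # /book/chapter
second = book.find("chapter[2]")               # /book/chapter[2]
last = book.find("chapter[last()]")            # /book/chapter[last()]
by_id = book.findall(".//chapter[@id='ch2']")  # //chapter[@id="ch2"]
paras = book.findall(".//para")                # //para

# //*[count(para)=3] is outside the supported subset; emulate it in Python.
three_paras = [e.tag for e in book.iter() if len(e.findall("para")) == 3]

print(len(chapters), second.get("id"), last.get("id"), three_paras)
```

For full XPath 1.0 support a libxml2-based binding would be needed; the subset shown here covers only simple paths and predicates.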
xsltproc test.xsl test.xml > test.html
xsl
namespace
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  … declaration of templates …
</xsl:stylesheet>

<xsl:template match="matching_expression">
  … output …
</xsl:template>
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="chapter">
    <h1>
      <xsl:number/>. <xsl:value-of select="title"/> (id = <xsl:value-of select="@id"/>)
    </h1>
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <xsl:template match="title"/><!-- do not process again -->

</xsl:stylesheet>
<book>
  <chapter id="k1">
    <title>Intro</title>
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  </chapter>
  <chapter id="k2">
    <title>Conclusion</title>
    Etiam euismod scelerisque dapibus.
  </chapter>
</book>
Each type of event is processed by a subroutine with a dedicated name:
#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;
use XML::SAX;
use XML::SAX::Writer;

# Pipeline: parser -> counting filter -> writer (serialises into $out).
my $out;
my $writer = XML::SAX::Writer->new(Output => \$out);
my $filter = Element_Attribute_Counter->new(Handler => $writer);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
$parser->parse_uri($ARGV[0]);
print "$out\n", Dumper $filter->get_count;


package Element_Attribute_Counter;
use base qw/XML::SAX::Base/;
use Scalar::Util qw/refaddr/;

my %count;    # per-object counters, keyed by the object's address

# Called by the parser for every opening tag.
sub start_element {
    my ($self, $element) = @_;
    my $addr = refaddr $self;
    $count{$addr}{ $element->{Name} }++;
    my $attributes = $element->{Attributes};
    foreach my $attribute (keys %$attributes) {
        $count{$addr}{ "@" . $attributes->{$attribute}{Name} }++;
    }
    $element->{Name} = uc $element->{Name};    # upper-case the name before passing it on
    $self->SUPER::start_element($element);
} # start_element

sub get_count {
    my $self = shift;
    return $count{ refaddr $self };
} # get_count
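The same event-driven idea can be sketched with Python's standard `xml.sax` module, where the subroutine with a dedicated name is the `startElement` method of a handler class. The class name `ElementAttributeCounter` and the sample input are assumptions made for this sketch. Unlike the Perl filter above, it only counts: it does not rewrite element names or pass the events on to a writer.

```python
import xml.sax
from collections import Counter


class ElementAttributeCounter(xml.sax.ContentHandler):
    """Counts element names and attribute names (attributes prefixed with '@')."""

    def __init__(self):
        super().__init__()
        self.count = Counter()

    def startElement(self, name, attrs):
        # The parser calls this method once for every opening tag.
        self.count[name] += 1
        for attr_name in attrs.getNames():
            self.count["@" + attr_name] += 1


# Hypothetical sample input, invented for this sketch.
SAMPLE = (b'<book><chapter id="k1"><title>Intro</title></chapter>'
          b'<chapter id="k2"><title>Conclusion</title></chapter></book>')

handler = ElementAttributeCounter()
xml.sax.parseString(SAMPLE, handler)
print(dict(handler.count))
```

Other event types (closing tags, character data) would be handled the same way, by defining `endElement` or `characters` on the handler.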
#!/usr/bin/perl
use warnings;
use strict;

use XML::Twig;
use Data::Dumper;

my %count;

# Called for each element once it has been parsed.
sub element_handler {
    my ($self, $current) = @_;
    $count{ $current->name }++;
    $count{ '@' . $_ }++ for $current->att_names;
    $self->purge;    # release the part of the tree processed so far
} # element_handler

my $twig = XML::Twig->new(
    twig_handlers => {
        "*" => \&element_handler,
    }
);
$twig->parsefile($ARGV[0]);
print Dumper \%count;
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;
use Data::Dumper;

my %count;
my $reader = XML::LibXML::Reader->new(location => $ARGV[0])
    or die "Cannot read $ARGV[0]\n";

# The application pulls one element at a time from the reader.
while ($reader->nextElement) {
    $count{ $reader->name }++;
    if ($reader->hasAttributes) {
        $reader->moveToFirstAttribute;
        do {{
            $count{ '@' . $reader->name }++;
        }} while $reader->moveToNextAttribute == 1;
    }
}
print Dumper \%count;
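A rough Python counterpart of the reader loop above is `xml.etree.ElementTree.iterparse`, where the application also drives the loop and asks for the next event. This is only an approximation of a pull parser: unless elements are cleared explicitly, `iterparse` still builds the whole tree in memory. The sample input is invented for the sketch.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample input, invented for this sketch.
SAMPLE = (b'<book><chapter id="k1"><title>Intro</title></chapter>'
          b'<chapter id="k2"><title>Conclusion</title></chapter></book>')

count = Counter()

# Request only "start" events; the tag and attributes are available at that point.
for event, elem in ET.iterparse(io.BytesIO(SAMPLE), events=("start",)):
    count[elem.tag] += 1
    for attr_name in elem.attrib:
        count["@" + attr_name] += 1

print(dict(count))
```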
XML::DOM uses the expat library
#!/usr/bin/perl
use strict;
use warnings;

use XML::DOM;

my $parser = XML::DOM::Parser->new();
my $doc    = $parser->parsefile('book.xml');

foreach my $chap ($doc->getElementsByTagName('chapter')) {
    print 'Attribute id contains: ', $chap->getAttribute('id'), "\n";
    foreach my $child ($chap->getChildNodes) {
        my $type = $child->getNodeType;
        if ($type == ELEMENT_NODE) {
            print 'Element ', $child->getTagName, ' contains: ',
                $child->getFirstChild->getNodeValue, "\n";
        } elsif ($type == TEXT_NODE) {
            print 'Text: ', $child->getNodeValue, "\n";
        }
    }
}
Similar to XML::DOM, but uses the libxml library
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $doc = XML::LibXML->load_xml(location => 'book.xml');

for my $chap ($doc->getElementsByTagName('chapter')) {
    print 'Attribute id contains: ', $chap->getAttribute('id'), "\n";
    foreach my $child ($chap->getChildNodes) {
        my $type = $child->nodeType;
        if ($type == XML_ELEMENT_NODE) {
            print 'Element ', $child->nodeName, ' contains: ',
                $child->getFirstChild->nodeValue, "\n";
        } elsif ($type == XML_TEXT_NODE) {
            print 'Text: ', $child->nodeValue, "\n";
        }
    }
}
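Because the DOM interface is standardised across languages, the two Perl scripts above translate almost line by line into Python's `xml.dom.minidom`. The sketch below parses an inline string instead of `book.xml`; the shortened sample text is an assumption made to keep the example self-contained.

```python
from xml.dom import Node, minidom

# Hypothetical inline sample standing in for book.xml.
XML = ('<book>'
       '<chapter id="k1"><title>Intro</title>Lorem ipsum.</chapter>'
       '<chapter id="k2"><title>Conclusion</title>Etiam euismod.</chapter>'
       '</book>')

doc = minidom.parseString(XML)

lines = []
for chap in doc.getElementsByTagName("chapter"):
    lines.append("Attribute id contains: " + chap.getAttribute("id"))
    for child in chap.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            # The element's text lives in its first child, a text node.
            lines.append(f"Element {child.tagName} contains: {child.firstChild.nodeValue}")
        elif child.nodeType == Node.TEXT_NODE:
            lines.append("Text: " + child.nodeValue)

print("\n".join(lines))
```

As in the Perl versions, the whole document is loaded into memory first, which is convenient for random access but unsuitable for very large files.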
XML::XSH2 is built on top of XML::LibXML
open book.xml ;

my $ch := insert element chapter append book ;
insert attribute 'id="a"' into $ch ;
insert attribute 'nonumber=1' into $ch ;
insert chunk "<title>Appendix</title>" prepend $ch ;
insert text "Quamquam sunt sub aqua, sub aqua maledicere temptant." append $ch ;
insert text {"\n"} after /book/chapter[last()] ;

echo "<html><head></head><body>" ;
my $count = 0;
for my $ch in /book/chapter {
    echo "<h1>" ;
    if not($ch[@nonumber=1]) echo :s {++$count} "." ;
    echo :s $ch/title " (id = " $ch/@id ")</h1>" ;
    for $ch/text() echo "<p>" (.) "</p>" ;
}
echo "</body></html>" ;
Not a markup language, but a human-readable data exchange format:
NPFL092 | Lecture 10 |