Friday, October 12, 2007

Parsing XML in PHP

Traverse the document tree
In this example we have an XML file that consists of a document with a specific version. The document contains persons, each described by a firstname, a lastname and a description. The code snippet finds the document version and prints it. It's pretty straightforward: you check all the top-level nodes of the tree and look for the one named document.


foreach ( $tree->children as $document )
{
    // parse the document
    if ( $document->name == "document" )
    {
        // get the document version attribute
        foreach ( $document->attributes as $documentAttr )
        {
            if ( $documentAttr->name == "version" )
            {
                print( "Found document with version: " . $documentAttr->content . "\n" );
            }
        }

        // find persons here
    }
}


When you've found the document node you can start looking for persons. This is done in the same manner: check the child nodes and look for the ones named person.

To make it simpler to get attribute values, we write a helper function that fetches the value of a named attribute from a node.


function getAttrValue( $node, $attrName )
{
    $ret = false;

    foreach ( $node->attributes as $nodeAttr )
    {
        if ( $nodeAttr->name == $attrName )
        {
            $ret = $nodeAttr->content;
        }
    }
    return $ret;
}
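
To try the helper without the eZ XML library installed, here is a minimal sketch that uses plain stdClass objects as stand-ins for parsed nodes. The makeAttr and makeNode functions are hypothetical; they only mimic the name, content and attributes properties used in this post and are not part of eZ XML.

```php
<?php
// Hypothetical stand-ins for parsed nodes: only the properties used
// in this post (name, content, attributes) are modelled.
function makeAttr( $name, $content )
{
    $attr = new stdClass();
    $attr->name = $name;
    $attr->content = $content;
    return $attr;
}

function makeNode( $name, $attributes = array() )
{
    $node = new stdClass();
    $node->name = $name;
    $node->attributes = $attributes;
    return $node;
}

// Same helper as in the post: returns the attribute value,
// or false if the node has no attribute with that name.
function getAttrValue( $node, $attrName )
{
    $ret = false;

    foreach ( $node->attributes as $nodeAttr )
    {
        if ( $nodeAttr->name == $attrName )
        {
            $ret = $nodeAttr->content;
        }
    }
    return $ret;
}

$firstname = makeNode( "firstname", array( makeAttr( "value", "Bård" ) ) );

var_dump( getAttrValue( $firstname, "value" ) );   // the attribute value
var_dump( getAttrValue( $firstname, "missing" ) ); // false: no such attribute
```

Because a missing attribute yields false, it is safer to compare the result with === when an empty string or "0" could be a legitimate attribute value.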


Now we're ready to parse the information describing the persons in this example. When we've found a person node, we check all its subnodes, look for the ones we want and fetch the information from them.


// parse all persons
foreach ( $document->children as $person )
{
    if ( $person->name == "person" )
    {
        print( "Found a new person\n" );

        // reset the values for each person
        $firstName = "";
        $lastName = "";
        $description = "";

        // get the name and description
        foreach ( $person->children as $personAttribute )
        {
            switch ( $personAttribute->name )
            {
                case "firstname" :
                {
                    $firstName = getAttrValue( $personAttribute, "value" );
                }break;

                case "lastname" :
                {
                    $lastName = getAttrValue( $personAttribute, "value" );
                }break;

                case "description" :
                {
                    // get the description text (type 3 is a text node)
                    foreach ( $personAttribute->children as $descriptionNode )
                    {
                        if ( $descriptionNode->type == 3 )
                        {
                            $description = $descriptionNode->content;
                        }
                    }
                }break;
            }
        }

        print( "The persons firstname is: $firstName\n" );
        print( "The persons lastname is: $lastName\n" );
        print( "The persons description is: $description\n" );
    }
}
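
If you would rather keep the parsed data than print it right away, the same loop can collect the persons into an array. The following is only a sketch under stated assumptions: makeNode, makeText, makeAttr and collectPersons are hypothetical names, the nodes are plain stdClass stand-ins rather than real eZ XML objects, and type 1 for element nodes is assumed (only type 3 for text nodes comes from the code above).

```php
<?php
// Hypothetical stand-in for a parsed node. It only models the properties
// this post reads: name, type, content, attributes and children.
function makeNode( $name, $children = array(), $attributes = array() )
{
    $node = new stdClass();
    $node->name = $name;
    $node->type = 1;            // assumed: 1 marks an element node
    $node->content = "";
    $node->attributes = $attributes;
    $node->children = $children;
    return $node;
}

function makeText( $content )
{
    $node = makeNode( "#text" );
    $node->type = 3;            // 3 marks a text node, as checked in the post
    $node->content = $content;
    return $node;
}

function makeAttr( $name, $content )
{
    $attr = new stdClass();
    $attr->name = $name;
    $attr->content = $content;
    return $attr;
}

// Same helper as in the post.
function getAttrValue( $node, $attrName )
{
    $ret = false;
    foreach ( $node->attributes as $nodeAttr )
    {
        if ( $nodeAttr->name == $attrName )
        {
            $ret = $nodeAttr->content;
        }
    }
    return $ret;
}

// Collects every person below $document into an array of plain arrays.
function collectPersons( $document )
{
    $persons = array();
    foreach ( $document->children as $person )
    {
        if ( $person->name != "person" )
        {
            continue;
        }

        $entry = array( "firstname" => "", "lastname" => "", "description" => "" );
        foreach ( $person->children as $field )
        {
            switch ( $field->name )
            {
                case "firstname":
                case "lastname":
                    $entry[$field->name] = getAttrValue( $field, "value" );
                    break;

                case "description":
                    // keep the content of the text node, without padding
                    foreach ( $field->children as $child )
                    {
                        if ( $child->type == 3 )
                        {
                            $entry["description"] = trim( $child->content );
                        }
                    }
                    break;
            }
        }
        $persons[] = $entry;
    }
    return $persons;
}

$document = makeNode( "document", array(
    makeNode( "person", array(
        makeNode( "firstname", array(), array( makeAttr( "value", "Bård" ) ) ),
        makeNode( "lastname", array(), array( makeAttr( "value", "Farstad" ) ) ),
        makeNode( "description", array( makeText( "Coder." ) ) )
    ) )
) );

$persons = collectPersons( $document );
print( $persons[0]["firstname"] . " " . $persons[0]["lastname"] . ": " . $persons[0]["description"] . "\n" );
```

Run as is, this prints "Bård Farstad: Coder." for the single stand-in person. Returning an array keeps the traversal separate from the output, so the same data can be printed, stored or tested.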


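Putting it all together, the complete example looks like this:

include_once( "ezxml/classes/ezxml.php" );

$xmlDocument =
"<document version='42'>
<person>
<firstname value='Bård' />
<lastname value='Farstad' />
<description>
Coder.
</description>
</person>
<person>
<firstname value='Christoffer A.' />
<lastname value='Elo' />
<description>
Coder.
</description>
</person>
</document>";

$tree =& eZXML::domTree( $xmlDocument, array( "TrimWhiteSpace" => true ) );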

foreach ( $tree->children as $document )
{
    // parse the document
    if ( $document->name == "document" )
    {
        // get the document version attribute
        foreach ( $document->attributes as $documentAttr )
        {
            if ( $documentAttr->name == "version" )
            {
                print( "Found document with version: " . $documentAttr->content . "\n" );
            }
        }

        // parse all persons
        foreach ( $document->children as $person )
        {
            if ( $person->name == "person" )
            {
                print( "Found a new person\n" );

                // reset the values for each person
                $firstName = "";
                $lastName = "";
                $description = "";

                // get the name and description
                foreach ( $person->children as $personAttribute )
                {
                    switch ( $personAttribute->name )
                    {
                        case "firstname" :
                        {
                            $firstName = getAttrValue( $personAttribute, "value" );
                        }break;

                        case "lastname" :
                        {
                            $lastName = getAttrValue( $personAttribute, "value" );
                        }break;

                        case "description" :
                        {
                            // get the description text (type 3 is a text node)
                            foreach ( $personAttribute->children as $descriptionNode )
                            {
                                if ( $descriptionNode->type == 3 )
                                {
                                    $description = $descriptionNode->content;
                                }
                            }
                        }break;
                    }
                }

                print( "The persons firstname is: $firstName\n" );
                print( "The persons lastname is: $lastName\n" );
                print( "The persons description is: $description\n" );
            }
        }
    }
}

/*!
 Function to fetch an attribute value.
 Will return the value of the attribute if found. False if not found.
*/
function getAttrValue( $node, $attrName )
{
    $ret = false;

    foreach ( $node->attributes as $nodeAttr )
    {
        if ( $nodeAttr->name == $attrName )
        {
            $ret = $nodeAttr->content;
        }
    }
    return $ret;
}


This code will produce the following output:

Found document with version: 42
Found a new person
The persons firstname is: Bård
The persons lastname is: Farstad
The persons description is: Coder.
Found a new person
The persons firstname is: Christoffer A.
The persons lastname is: Elo
The persons description is: Coder.


Using XML is simple and straightforward with the eZ XML class. You don't need any external libraries, and the class produces the same document tree as you would get from the XML functions in PHP (which all need external libraries). It is also the only PHP XML parser class which returns the same object tree as the library functions, making it easy to use your programs both on sites where XML support is compiled into PHP and on sites where it isn't.