Most important things a PHP developer has to know about XPath

I had an interesting task earlier this year and wrote down a few notes for myself, intending to turn them into an article later. I might need them again and they might help others too, so here is a list of the most important things you have to know about XPath if you are a PHP developer.

When and how to use XPath in PHP

If you need to load relatively small documents (let's say they easily fit in RAM, including all parsed structures) and then access many of the nodes in random order, then DOM and XPath are probably what you want. Pull, push and SimpleXML parsers are very good, but not necessarily a good fit for this access pattern. If you have to perform many searches on an XML document, XPath is a great fit: the document is parsed once, and queries against the in-memory tree are fast after that.

PHP's DOM extension allows you to load and parse an XML document. DOM is an in-memory representation of the XML data as a tree of nodes. Each node has a list of children, attributes and helpful methods.
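To make the "tree of nodes" idea concrete, here is a minimal sketch that parses a tiny document and walks the children of the root element. The inline XML is made-up data used only for illustration; it is not part of the eBay response discussed later.

```php
<?php
// Made-up inline document so the sketch runs on its own.
$xml = '<items><item id="1"><title>Router A</title></item>'
     . '<item id="2"><title>Router B</title></item></items>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// documentElement is the root element, here <items>
$root = $dom->documentElement;

$summary = [];
foreach ($root->childNodes as $child) {
    // childNodes can also contain text and comment nodes, so keep elements only
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    // getAttribute() reads an attribute, textContent returns the text inside the node
    $summary[] = $child->nodeName . '#' . $child->getAttribute('id') . ': ' . $child->textContent;
}

echo implode("\n", $summary), "\n";
```

Walking childNodes by hand like this gets tedious for deeper structures, which is where XPath queries come in.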

How to get some useful XML to play with

eBay has a great set of web services that let you do almost anything via their APIs. There is a SOAP API, a JSON one and also a REST XML one. All you have to do is register with the eBay developers program. Then on your account page you will see a set of security tokens. You will need to copy the AppId, as this is the one you will be using while searching for items on eBay via the REST XML API.

The next step is to have a look at eBay's basic information page on how to make a first call.

Here is a simple example of how to search for eBay items by keywords (just get rid of the new lines):

http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsByKeywords&
SERVICE-VERSION=1.0.0&
RESPONSE-DATA-FORMAT=XML&REST-PAYLOAD&
SECURITY-APPNAME=YourAppId
&keywords=router
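If you would rather assemble that URL in PHP than join the lines by hand, something like the sketch below should do. It only builds the string and does not send any request. The variable names are my own, and 'YourAppId' is the same placeholder as above.

```php
<?php
$endpoint = 'http://svcs.ebay.com/services/search/FindingService/v1';
$appId    = 'YourAppId'; // placeholder, replace with your own AppId

$url = $endpoint . '?'
    . http_build_query([
        'OPERATION-NAME'       => 'findItemsByKeywords',
        'SERVICE-VERSION'      => '1.0.0',
        'RESPONSE-DATA-FORMAT' => 'XML',
    ], '', '&')
    // REST-PAYLOAD is a flag without a value, so it is appended by hand
    . '&REST-PAYLOAD&'
    . http_build_query([
        'SECURITY-APPNAME' => $appId,
        'keywords'         => 'router',
    ], '', '&');

echo $url, "\n";
```

Using http_build_query also takes care of URL-encoding keywords that contain spaces or special characters.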

That request will give you back an XML document that looks more or less like the sample below. Only a part of it is shown here as a preview:

<?xml version="1.0" encoding="UTF-8"?>
<findItemsByKeywordsResponse xmlns="http://www.ebay.com/marketplace/search/v1/services">
  <ack>Success</ack>
  <version>1.8.0</version>
  <timestamp>2010-10-28T19:15:13.992Z</timestamp>
  <searchResult count="100">
    <item>
      <itemId>260682205733</itemId>
      <title>ZyXEL ZyAIR G-4100 Wireless Router B-Stock!! NEW!!</title>
      <globalId>EBAY-US</globalId>
      <primaryCategory>
        <categoryId>44997</categoryId>
        <categoryName>802.11g</categoryName>
      </primaryCategory>
      <galleryURL>http://thumbs2.ebaystatic.com/pict/2606822057338080_1.jpg</galleryURL>
      <viewItemURL>http://cgi.ebay.com/ZyXEL-ZyAIR-G-4100-Wireless-Router-B-Stock-NEW-/260682205733?pt=COMP_EN_Routers</viewItemURL>
      <productId type="ReferenceID">54847007</productId>
      <paymentMethod>PayPal</paymentMethod>
      <autoPay>false</autoPay>
      <postalCode>60062</postalCode>
      <location>Northbrook,IL,USA</location>
      <country>US</country>
      <shippingInfo>
        <shippingServiceCost currencyId="USD">5.95</shippingServiceCost>
        <shippingType>Flat</shippingType>
        <shipToLocations>US</shipToLocations>
      </shippingInfo>

How to use XPath

Now that we have a nice XML file to play with, we can begin. Here is an example of how to access XML using DOM from PHP and how to execute XPath queries.

    // Read the sample XML file
    $xmlString = file_get_contents("modified.sample.xml");
    // Create a DOM instance
    $xmlDom = new DOMDocument();
    // Load the XML string and parse it
    $xmlDom->loadXML($xmlString);

    // Create the XPath evaluator by passing the DOMDocument instance to the DOMXPath constructor
    $xpath = new DOMXPath($xmlDom);
    // Register a prefix for the namespace declared in the root element, otherwise queries will not match
    $xpath->registerNamespace('eb', "http://www.ebay.com/marketplace/search/v1/services");
    // Run a sample query to get the title elements of all items
    $elems = $xpath->query('//eb:item/eb:title');

    // Print the number of matches, then one title per line
    echo "_" . $elems->length . "_";
    foreach ($elems as $elem) {
        echo $elem->nodeValue . "\n";
    }

Please note that if your XML document uses namespaces (as the eBay response does), you will need to register them as in the example above.
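To see why the registration matters, the sketch below runs the same query with and without a prefix against a trimmed-down copy of the eBay response. The XML is inlined so the code runs on its own. The prefix name 'eb' is arbitrary; only the namespace URI has to match the document.

```php
<?php
// Trimmed-down copy of the eBay response, inlined for the example.
$xml = <<<'XML'
<findItemsByKeywordsResponse xmlns="http://www.ebay.com/marketplace/search/v1/services">
  <searchResult count="1">
    <item>
      <title>ZyXEL ZyAIR G-4100 Wireless Router B-Stock!! NEW!!</title>
    </item>
  </searchResult>
</findItemsByKeywordsResponse>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Without a prefix, XPath looks for elements in no namespace and matches nothing
$withoutPrefix = $xpath->query('//item/title')->length;

// With the namespace registered under a prefix, the same path matches
$xpath->registerNamespace('eb', 'http://www.ebay.com/marketplace/search/v1/services');
$withPrefix = $xpath->query('//eb:item/eb:title')->length;

echo "without prefix: $withoutPrefix, with prefix: $withPrefix\n";
```

The unprefixed query does not fail, it just returns an empty list, which makes this mistake easy to miss.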

Speeding up queries

XPath queries in PHP can be narrowed down to a single node. If you are searching for some elements and you have a reference to their parent node, you can pass it as the second argument to the query method. XPath will then execute the query relative to that node. So if you have a structure like the following, you can first loop over the items and then run more detailed queries per item. It's only useful if you need to perform many calls, have deep structures, or are using recursive functions to crawl the XML.

root
  ->items
    ->item
      ->... 
    ->item
      ->... 
    ->item
      ->... 
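A minimal sketch of that pattern, using a cut-down response with two made-up items: the outer query finds the item nodes, and the inner queries are relative paths evaluated against each item passed as the context node.

```php
<?php
// Cut-down response with two made-up items, inlined for the example.
$xml = <<<'XML'
<findItemsByKeywordsResponse xmlns="http://www.ebay.com/marketplace/search/v1/services">
  <searchResult count="2">
    <item>
      <title>Router A</title>
      <shippingInfo><shippingServiceCost currencyId="USD">5.95</shippingServiceCost></shippingInfo>
    </item>
    <item>
      <title>Router B</title>
      <shippingInfo><shippingServiceCost currencyId="USD">7.50</shippingServiceCost></shippingInfo>
    </item>
  </searchResult>
</findItemsByKeywordsResponse>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('eb', 'http://www.ebay.com/marketplace/search/v1/services');

$rows = [];
foreach ($xpath->query('//eb:searchResult/eb:item') as $item) {
    // Relative paths (no leading slash) are evaluated against $item only
    $title = $xpath->query('eb:title', $item)->item(0)->nodeValue;
    $cost  = $xpath->query('eb:shippingInfo/eb:shippingServiceCost', $item)->item(0)->nodeValue;
    $rows[] = "$title: $cost";
}

echo implode("\n", $rows), "\n";
```

In real code you would want to check that item(0) is not null before reading nodeValue, since not every item is guaranteed to carry every element.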

XPath queries cheat sheet - all you really need

There are just a few queries that you will really need. All the other stuff is just fancy pants stuff. Here is a list of most important bits:

  • expressions starting with / are absolute paths from root of the tree
  • expressions starting with // are searches anywhere in the tree
  • expressions without / or // are relative to the current node (passed as a context parameter to the query method)
  • attributes are accessed with @ sign
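The sketch below tries each of these rules against a small made-up response and counts the matches. One thing worth watching for: an expression starting with // still searches the whole document even when a context node is passed, so use .// if you only want descendants of the context node.

```php
<?php
// Small made-up response with two items, inlined for the example.
$xml = <<<'XML'
<findItemsByKeywordsResponse xmlns="http://www.ebay.com/marketplace/search/v1/services">
  <searchResult count="2">
    <item>
      <title>Router A</title>
      <shippingInfo><shippingServiceCost currencyId="USD">5.95</shippingServiceCost></shippingInfo>
    </item>
    <item>
      <title>Router B</title>
      <shippingInfo><shippingServiceCost currencyId="USD">7.50</shippingServiceCost></shippingInfo>
    </item>
  </searchResult>
</findItemsByKeywordsResponse>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('eb', 'http://www.ebay.com/marketplace/search/v1/services');

$absolutePath = '/eb:findItemsByKeywordsResponse/eb:searchResult/eb:item';
$firstItem = $xpath->query($absolutePath)->item(0);

$counts = [
    // absolute path from the root of the tree
    'absolute'            => $xpath->query($absolutePath)->length,
    // search anywhere in the tree
    'anywhere'            => $xpath->query('//eb:title')->length,
    // relative to the context node: only the first item's own title
    'relative'            => $xpath->query('eb:title', $firstItem)->length,
    // "//" ignores the context node and still searches the whole document
    'anywhere_in_context' => $xpath->query('//eb:title', $firstItem)->length,
    // ".//" searches only below the context node
    'below_context'       => $xpath->query('.//eb:title', $firstItem)->length,
    // attributes are selected with "@"
    'attributes'          => $xpath->query('//@currencyId')->length,
];

print_r($counts);
```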

And a few expressions:

//eb:item

finds all tags named "item", no matter where in the document tree

//eb:item/eb:title

finds all tags named "title" which are children of item tag

//eb:item/*

finds all children of item tag (no matter where item tag is)

//eb:searchResult/eb:item

finds all tags named "item" that are direct children of a searchResult tag.

/eb:findItemsByKeywordsResponse/eb:searchResult

finds the searchResult tag by following an absolute path from the root

//@currencyId

finds all attributes named "currencyId" (no matter where)

//eb:shippingServiceCost/@currencyId

finds all attributes named "currencyId" of shippingServiceCost tags

//eb:productId[@type="ReferenceID"]

finds all productId tags whose "type" attribute equals "ReferenceID"
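Attribute queries return attribute nodes rather than elements, while a predicate in square brackets filters elements by attribute value. Here is a short sketch of both. The second item, with its "OtherID" type and EUR currency, is invented for the example so that the filter has something to exclude.

```php
<?php
// Cut-down response; the second item is invented so the filter has something to skip.
$xml = <<<'XML'
<findItemsByKeywordsResponse xmlns="http://www.ebay.com/marketplace/search/v1/services">
  <searchResult count="2">
    <item>
      <productId type="ReferenceID">54847007</productId>
      <shippingInfo><shippingServiceCost currencyId="USD">5.95</shippingServiceCost></shippingInfo>
    </item>
    <item>
      <productId type="OtherID">123</productId>
      <shippingInfo><shippingServiceCost currencyId="EUR">7.50</shippingServiceCost></shippingInfo>
    </item>
  </searchResult>
</findItemsByKeywordsResponse>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('eb', 'http://www.ebay.com/marketplace/search/v1/services');

// Predicate: keep only productId elements whose type attribute is "ReferenceID"
$referenceIds = [];
foreach ($xpath->query('//eb:productId[@type="ReferenceID"]') as $element) {
    $referenceIds[] = $element->nodeValue;
}

// Attribute query: each result is a DOMAttr node, its value is the attribute text
$currencies = [];
foreach ($xpath->query('//eb:shippingServiceCost/@currencyId') as $attribute) {
    $currencies[] = $attribute->value;
}

echo implode(',', $referenceIds), "\n";
echo implode(',', $currencies), "\n";
```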

You may read more on these pages:

http://msdn.microsoft.com/en-us/library/ms256086.aspx
http://www.w3schools.com/xpath/xpath_syntax.asp


About the author

Artur Ejsmont

Hi, my name is Artur Ejsmont,
welcome to my blog. I am a passionate software engineer living in Sydney and working for Yahoo!
