What is XPath

XPath (XML Path Language) is a powerful query language used to navigate and select nodes from an XML (eXtensible Markup Language) document. It provides a standardized way to address different parts of an XML document and extract specific data based on criteria or patterns.

XPath operates on the hierarchical structure of an XML document, treating it as a tree-like structure. The tree consists of nodes, which can be elements, attributes, text, comments, processing instructions, or namespaces.

The examples that follow use this XML document:

<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>

Here are the key components and features of XPath.

Node types in XPath

XPath defines seven types of nodes: root, element, attribute, text, namespace, processing-instruction, and comment nodes.

  • Elements: An element is the fundamental unit of structure in an XML document. Elements are the building blocks of XML documents and represent specific pieces of information or data within the document. Example: <bookstore>, <book id="1">.
  • Attributes: An attribute is a name-value pair attached to an element. It provides extra details about the element, such as its properties, characteristics, or metadata. Example: id="1".
  • Text: Text is the character content inside an element. It represents the actual value or data stored within the element and can include plain text, numbers, or any other character data. Example: Book 1.
  • Namespaces: A namespace uniquely identifies elements and attributes in an XML document. Namespaces distinguish elements and attributes that share the same local name but belong to different XML vocabularies or document types.
  • Processing instructions: Processing instructions (PIs) are directives embedded in an XML document for the applications that consume it. They are not part of the data content but serve as instructions or metadata for applications or middleware.
  • Comments: Comments add explanatory text for human readers. They are not part of the data content and are typically ignored by XML processors.
  • Root node: The root node is the top of the tree and the ancestor of every other node in the document. In the XPath data model it stands for the document itself rather than for an element: its child is the single outermost element (<bookstore> in the example), often called the root element, which encloses all other elements. A document has exactly one root node.
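
To make the node types more concrete, here is a minimal sketch that reads an element, an attribute, and a text node from the sample document. It assumes Python's standard-library xml.etree.ElementTree module, which is not part of the original article; the variable names are purely illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

# fromstring() returns the outermost element (<bookstore>), not a document node.
root = ET.fromstring(XML)

first_book = root.find("book")

element_name = first_book.tag                # element node: "book"
attributes = dict(first_book.attrib)         # attribute nodes: {"id": "1"}
title_text = first_book.find("title").text   # text node: "Book 1"

print(element_name, attributes, title_text)
```

Note that ElementTree exposes attributes and text as properties of an element rather than as separate node objects, so this only approximates the XPath data model.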

Node Selection

XPath allows you to select nodes based on their location in the XML document. The most basic XPath expression is a forward slash (“/”), which represents the root node of the XML document. The most common path expressions are listed below.

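  • /: selects from the root node
  • //: selects matching nodes at any depth below the starting point (the whole document when the expression begins with //)
  • .: selects the current node
  • ..: selects the parent of the current node
  • @: selects attributes
  • nodename: selects all child elements of the current node that have that name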

For example, /bookstore selects the root element “bookstore” in the XML document. Because that element encloses every other element, the result is the complete document.

/bookstore
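
The basic path expressions can be tried out with a short sketch. It assumes Python's standard-library xml.etree.ElementTree module (an assumption, not something the article prescribes), which supports only a subset of XPath and evaluates paths relative to an element. For that reason the absolute form /bookstore corresponds here to the parsed root element itself, and // is written as .// from that element.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

# The parsed root element plays the role of /bookstore.
root = ET.fromstring(XML)

current = root.findall(".")                  # "."  : the current node
all_titles = root.findall(".//title")        # "//" : title elements at any depth
title_parents = root.findall(".//title/..")  # ".." : parent of each title

print(root.tag)
print([t.text for t in all_titles])
print([b.get("id") for b in title_parents])
```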

XPath Expressions

XPath uses path expressions to navigate through the XML structure and select specific nodes. A path expression consists of a series of steps separated by slashes (“/”).

For instance, /bookstore/book selects all “book” elements that are direct children of the “bookstore” element.

Example:

/bookstore/book
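
As a rough illustration of multi-step paths, the sketch below evaluates the equivalent of /bookstore/book and /bookstore/book/title. It assumes Python's standard-library xml.etree.ElementTree module, where paths are written relative to the root element, so /bookstore/book becomes ./book.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # the <bookstore> element

# One step: direct "book" children of <bookstore>.
books = root.findall("./book")

# Two steps: "title" children of those "book" elements.
titles = root.findall("./book/title")

print([b.get("id") for b in books])   # ['1', '2']
print([t.text for t in titles])       # ['Book 1', 'Book 2']
```

Each step narrows the selection to children of the nodes matched by the previous step, which is why ./title on its own finds nothing here: “title” is not a direct child of “bookstore”.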

XPath supports two wildcards for element and attribute names:

* Matches any element node.
@* Matches any attribute node.
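
A small sketch of the element wildcard, assuming Python's standard-library xml.etree.ElementTree module. ElementTree paths return elements only, so the attribute wildcard is approximated here by reading an element's attrib mapping; that substitution is an illustration, not an XPath feature.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# Equivalent of /bookstore/book/* : every child element of every book.
book_children = [e.tag for e in root.findall("./book/*")]

# Rough stand-in for /bookstore/book[1]/@* : all attributes of the first book.
first_book_attributes = dict(root.find("book").attrib)

print(book_children)
print(first_book_attributes)
```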

XPath Axes

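An axis defines the relationship between the current node and the nodes that a step selects. It is written before the node name and separated from it by two colons, as in child::book. XPath provides several axes, including:

  • child: Selects all children of the current node.
  • parent: Selects the parent of the current node.
  • ancestor: Selects all ancestors of the current node.
  • ancestor-or-self: Selects all ancestors of the current node and the current node itself.
  • attribute: Selects all attributes of the current node.
  • descendant: Selects all descendants of the current node.
  • following: Selects all nodes that come after the current node in document order, excluding its descendants.
  • preceding: Selects all nodes that come before the current node in document order, excluding its ancestors.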
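
The axes listed above can be approximated in plain Python to show which nodes each one reaches. This is a sketch under stated assumptions: it uses the standard-library xml.etree.ElementTree module, which does not evaluate axis syntax such as ancestor::book, so the helper functions ancestors, following, and preceding are hypothetical names written for this example. Only element nodes are modelled.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# Elements keep no reference to their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
# All elements in document order, used for following/preceding.
document_order = list(root.iter())

def ancestors(node):
    """ancestor axis: parent, grandparent, ... (nearest first)."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def following(node):
    """following axis: later elements, excluding the node's descendants."""
    own_subtree = set(node.iter())  # the node itself plus its descendants
    start = document_order.index(node) + 1
    return [e for e in document_order[start:] if e not in own_subtree]

def preceding(node):
    """preceding axis: earlier elements, excluding the node's ancestors."""
    excluded = set(ancestors(node))
    end = document_order.index(node)
    return [e for e in document_order[:end] if e not in excluded]

first_book, second_book = root.findall("book")
first_title = first_book.find("title")

child_axis = [e.tag for e in root]                       # child::*
descendant_axis = [e.tag for e in root.iter()][1:]       # descendant::*
parent_axis = parent_of[first_title].tag                 # parent::*
ancestor_axis = [e.tag for e in ancestors(first_title)]  # ancestor::*
ancestor_or_self_axis = [first_title.tag] + ancestor_axis
attribute_axis = dict(first_book.attrib)                 # attribute::*
following_axis = [e.tag for e in following(first_book)]
preceding_axis = [e.tag for e in preceding(second_book)]

print(ancestor_axis)    # ['book', 'bookstore']
print(following_axis)   # ['book', 'title', 'author']
```

In full XPath the ancestor axis also reaches the root node above “bookstore”, which ElementTree does not represent as a separate object.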

Predicates

To filter nodes based on specific criteria, XPath uses predicates. A predicate is written in square brackets (“[]”) after a step and keeps only the nodes for which its condition is true.

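For example, /bookstore/book[@category='fiction'] selects all “book” elements that have a “category” attribute with a value of “fiction”. The sample document above defines no “category” attribute, so against it this expression matches nothing, whereas /bookstore/book[@id='1'] selects the first “book” element.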
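
The sketch below runs a few predicates against the sample document. It assumes Python's standard-library xml.etree.ElementTree module, which supports a limited set of predicate forms (attribute tests, child-text tests, and positions); paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

def ids(elements):
    """Helper for this example: collect the id attribute of each element."""
    return [e.get("id") for e in elements]

by_attribute = ids(root.findall("./book[@id='1']"))           # attribute value
by_child_text = ids(root.findall("./book[title='Book 2']"))   # child element text
by_position = ids(root.findall("./book[2]"))                  # position, 1-based
last_book = ids(root.findall("./book[last()]"))               # last position
no_match = ids(root.findall("./book[@category='fiction']"))   # attribute absent

print(by_attribute, by_child_text, by_position, last_book, no_match)
```

A predicate that matches nothing yields an empty result rather than an error, as the final query shows.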

Functions

To compute and extract values from nodes, XPath provides a large number of built-in functions.
Functions can perform tasks such as string manipulation, mathematical calculations, node-set operations, and more.
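
To give a feel for what functions such as count(), string(), concat(), and contains() compute, the sketch below reproduces their results in plain Python. It assumes the standard-library xml.etree.ElementTree module, which does not evaluate the XPath function library, so each line is an approximation with the corresponding XPath expression shown in a comment. Third-party libraries such as lxml can evaluate these expressions directly.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book id="1">
    <title>Book 1</title>
    <author>Author 1</author>
  </book>
  <book id="2">
    <title>Book 2</title>
    <author>Author 2</author>
  </book>
</bookstore>"""

root = ET.fromstring(XML)
books = root.findall("./book")

# count(/bookstore/book)
book_count = len(books)

# string(/bookstore/book[1]/title)
first_title = books[0].findtext("title")

# concat(/bookstore/book[1]/title, ' by ', /bookstore/book[1]/author)
label = first_title + " by " + books[0].findtext("author")

# /bookstore/book[contains(title, '2')]
matching_ids = [b.get("id") for b in books if "2" in b.findtext("title")]

print(book_count)     # 2
print(label)          # Book 1 by Author 1
print(matching_ids)   # ['2']
```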

Examples of XPath functions include string(), concat(), count(), position(), and contains().

XPath is used heavily in Extensible Stylesheet Language Transformations (XSLT), XML databases, web scraping, and XML processing libraries. It provides a flexible and expressive way to navigate XML documents and extract the required information efficiently.

When constructing XPath expressions, it’s essential to understand the structure of the XML document and the specific data you want to retrieve. XPath expressions can be as simple as selecting a single element or as complex as combining multiple axes, predicates, and functions to perform advanced queries.