XPath
What is XPath?
XPath (XML Path Language) is a query language used to navigate through elements and attributes in an XML or HTML document. Think of it as a superpower-infused street address system for web pages — it tells you exactly where to find any piece of content in the HTML structure.
XPath is the GPS for HTML. No cap, without it, you’re lost in the DOM jungle. It uses path expressions to select nodes or node-sets in an XML/HTML document, similar to how you’d navigate folders on your computer, but way more powerful.
Why Developers Love XPath
When you’re building a web scraper, the HTML structure is your treasure map. XPath is the compass that gets you to the gold. Unlike CSS selectors (which are simpler but less flexible), XPath can:
- Navigate up and down the document tree (parent, child, sibling)
- Filter elements based on text content (e.g., //div[contains(text(), 'Price')])
- Select elements by attributes and partial matches
- Handle dynamic content and complex nested structures
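To make the list above concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The markup and the names DOC, card, and price are made up for illustration. ElementTree implements only a subset of XPath 1.0: it has exact text matches ([.='Price']) and the parent step (..), but no contains() and no sibling axes, so a full XPath engine such as lxml would be needed for those.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet used only for illustration.
DOC = """
<html><body>
  <div class="product-card">
    <span class="label">Price</span>
    <span class="price">19.99</span>
  </div>
  <div class="product-card">
    <span class="label">Stock</span>
    <span class="price">4</span>
  </div>
</body></html>
""".strip()

root = ET.fromstring(DOC)

# Filter on text content, then step UP to the parent element with "..".
# ElementTree only supports an exact match here, not contains().
card = root.find(".//span[.='Price']/..")

# Step back DOWN from that parent, filtering on an attribute value.
price = card.find("span[@class='price']").text

print(card.get("class"), price)
```

The up-then-down move (find a label by its text, climb to its container, then pick a neighbour inside it) is the pattern that plain CSS selectors have traditionally struggled with.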
Pro tip: Run $x("//tag[@class='value']") in the browser DevTools console to test XPath expressions against the live page in real time.
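Example: the price text inside each product card
//div[@class='product-card']//span[@class='price']/text()
Example: the second cell of each table row, skipping the first row under each parent (typically the header)
//table[@id='data-table']//tr[position() > 1]//td[2]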
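As a rough sketch of how those two expressions could be run from a scraper, the example below again uses xml.etree.ElementTree from the Python standard library. The PAGE markup is invented for the example and assumed to be well-formed XML, which real-world HTML often is not. Because ElementTree does not support text() or position() > 1, those two steps are done in Python instead. With a full XPath 1.0 engine the original expressions could be passed through unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be well-formed XML for this sketch.
PAGE = """
<html><body>
  <div class="product-card"><span class="price">19.99</span></div>
  <div class="product-card"><span class="price">4.50</span></div>
  <table id="data-table">
    <tr><th>Name</th><th>Qty</th></tr>
    <tr><td>Widget</td><td>3</td></tr>
    <tr><td>Gadget</td><td>7</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Approximates //div[@class='product-card']//span[@class='price']/text()
# (.text stands in for the unsupported text() step).
spans = root.findall(".//div[@class='product-card']//span[@class='price']")
prices = [span.text for span in spans]

# Approximates //table[@id='data-table']//tr[position() > 1]//td[2]
# (the slice stands in for position() > 1; this only matches the XPath
# semantics when all rows share one parent, as they do here).
rows = root.findall(".//table[@id='data-table']//tr")
second_cells = [row.find("td[2]").text for row in rows[1:]]

print(prices)
print(second_cells)
```

Note that indexes in XPath start at 1, so td[2] is the second cell, while the Python slice rows[1:] uses 0-based indexing to drop the first row.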
XPath 1.0 is what browsers and most scraping libraries support, while XPath 2.0 and later offer more functions but are not implemented natively in browsers.