Shadow DOM
What is Shadow DOM?
Shadow DOM is a web standard for building encapsulated, isolated components in HTML. An element (the shadow host) can carry a separate DOM tree, called a shadow root, that is kept apart from the main document's DOM tree. Regular DOM queries like document.querySelector() don't see inside it. Browsers use this mechanism internally to build the controls of built-in elements like <video> and <input type="range">, and custom elements and third-party widget components use it to keep their markup and styles isolated.
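This encapsulation is the tricky beast that breaks naive scrapers: selectors that work on ordinary markup return nothing for content inside a shadow root. When you inspect a YouTube video player and can't find the actual content in the DOM, that's Shadow DOM at work. Embedded payment fields such as Stripe's are a different case: they are typically rendered inside iframes, which hide their content from your selectors in a similar way.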
Shadow DOM in Action
```html
<!-- Regular DOM -->
<div id="host-element"></div>

<!-- What you see in DevTools -->
<div id="host-element">
  <!--shadow-root (open)-->
  <div class="actual-content">Hidden from regular selectors!</div>
  <!--/shadow-root-->
</div>
```
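In practice, a shadow root like the one in the markup above is usually created by JavaScript. Below is a minimal sketch, assuming a browser DOM; the function name mountShadowContent is hypothetical, and the class name and text simply mirror the example markup.

```javascript
// Hypothetical helper: attach an open shadow root to `host` and put content inside it.
// Assumes a browser DOM (host is an Element that supports attachShadow).
function mountShadowContent(host) {
  // With { mode: 'closed' }, host.shadowRoot would return null instead
  const shadow = host.attachShadow({ mode: 'open' });
  const inner = host.ownerDocument.createElement('div');
  inner.className = 'actual-content';
  inner.textContent = 'Hidden from regular selectors!';
  shadow.appendChild(inner);
  return shadow;
}

// Usage in a browser (e.g. the DevTools console):
//   const host = document.querySelector('#host-element');
//   mountShadowContent(host);
//   document.querySelector('.actual-content');        // null: the query does not enter the shadow root
//   host.shadowRoot.querySelector('.actual-content'); // the <div> inside the shadow root
```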
Scraping Shadow DOM
Method 1: JavaScript Execution
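```javascript
// Run in the page context, e.g. the DevTools console
const host = document.querySelector('#my-element');
// shadowRoot is null for closed shadow roots (attachShadow({ mode: 'closed' }))
const shadow = host?.shadowRoot;
// A shadow root supports querySelector just like a document
const content = shadow?.querySelector('.inner-element');
```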
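Method 1 reaches only one level deep, but components often nest: a shadow root can itself contain hosts with their own shadow roots. A minimal recursive sketch follows; the helper names deepQuerySelectorAll and deepQuerySelector are hypothetical, not a browser API. It walks every open shadow root and skips closed ones, because their shadowRoot property is null.

```javascript
// Hypothetical helper: search `root` (a Document, Element, or ShadowRoot)
// plus every open shadow root nested beneath it.
// Results are grouped by tree (outer matches first), not in strict document order.
function deepQuerySelectorAll(root, selector) {
  const results = [...root.querySelectorAll(selector)];
  for (const el of root.querySelectorAll('*')) {
    // Closed shadow roots report shadowRoot === null and cannot be entered
    if (el.shadowRoot) {
      results.push(...deepQuerySelectorAll(el.shadowRoot, selector));
    }
  }
  return results;
}

// First match in the order above, or null when nothing matches
function deepQuerySelector(root, selector) {
  return deepQuerySelectorAll(root, selector)[0] ?? null;
}

// Usage in the page context:
//   deepQuerySelector(document, '.inner-element');
```

With Puppeteer or Playwright, both functions would need to be defined inside the page.evaluate callback, since code in your Node script is not visible to the page.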
Method 2: Puppeteer/Playwright
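```javascript
// The callback runs inside the browser page; only a handle comes back to your script
const handle = await page.evaluateHandle(() => {
  const host = document.querySelector('#my-element');
  return host?.shadowRoot?.querySelector('.inner-element') ?? null;
});
const element = handle.asElement(); // ElementHandle, or null if nothing was found
```
Playwright's CSS locators also pierce open shadow roots automatically, so page.locator('.inner-element') often works without any manual traversal (XPath and closed shadow roots are the exceptions).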
Method 3: Custom Element Access
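```javascript
// Some custom elements expose public methods or properties.
// getData() is only an example name; check what the component actually provides.
const data = await page.evaluate(() => {
  const widget = document.querySelector('custom-widget');
  // The return value must be serializable to come back from the page
  return typeof widget?.getData === 'function' ? widget.getData() : null;
});
```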
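When you don't know which methods a component exposes, one way to look is to read the method names defined on the element's own class prototype. The sketch below uses a hypothetical helper, listOwnMethods, that you could paste into the DevTools console. Lifecycle callbacks such as connectedCallback will show up too, while inherited and instance-level functions will not, so treat the output as a starting point rather than a documented API.

```javascript
// Hypothetical helper: list method names defined directly on an object's class prototype
// (for a custom element, its component class). Getters/setters and inherited methods are skipped.
function listOwnMethods(el) {
  const proto = Object.getPrototypeOf(el);
  return Object.getOwnPropertyNames(proto).filter((name) => {
    if (name === 'constructor') return false;
    const desc = Object.getOwnPropertyDescriptor(proto, name);
    // Accessors have no `value`, so only plain methods pass this check
    return typeof desc.value === 'function';
  });
}

// Usage in the DevTools console:
//   listOwnMethods(document.querySelector('custom-widget'));
```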
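Warning: Simple HTTP libraries (Python's requests, curl) only download the raw HTML and never run JavaScript, so shadow roots that scripts create with attachShadow() don't exist in what they receive. To scrape those, you need a real browser, driven by a tool like Puppeteer or Playwright, to render the page first. Declarative shadow DOM, written directly into the HTML as <template shadowrootmode="open">, is the exception: its markup is present in the raw response.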
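Before reaching for a browser, you can get a rough idea from the raw HTML itself. The sketch below is a heuristic, and the function inspectRawHtml and its regular expressions are assumptions, not a standard tool. Custom element tags, whose names must contain a hyphen, may attach shadow roots from script at runtime, while a <template shadowrootmode> tag signals declarative shadow DOM that is already in the markup.

```javascript
// Heuristic sketch: scan raw HTML (as an HTTP client would receive it) for shadow DOM hints.
// Regexes are a simplification of real HTML parsing; treat the result as a hint only.
function inspectRawHtml(html) {
  // Declarative shadow DOM is written into the markup, so it survives a plain HTTP fetch
  const hasDeclarativeShadowDom = /<template\b[^>]*\bshadowrootmode\s*=/i.test(html);

  // Custom element names contain a hyphen (e.g. <custom-widget>); such elements
  // may attach shadow roots from JavaScript, which an HTTP-only client never runs
  const customTags = new Set();
  for (const match of html.matchAll(/<([a-z][a-z0-9]*(?:-[a-z0-9]+)+)/gi)) {
    customTags.add(match[1].toLowerCase());
  }

  return {
    hasDeclarativeShadowDom,
    customTags: [...customTags],
    likelyNeedsBrowser: customTags.size > 0,
  };
}
```

You would pass it the response body from fetch() or curl. An empty customTags list suggests, but does not prove, that a plain HTTP client may be enough; a few SVG element names (such as font-face) also contain hyphens and can produce false positives.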