
Help with DOMXPath and removeChild?

Posted: 27 Mar 2013, 17:47
by craywolf
I've written a Full Feed Newspapers plugin, which turns partial-content feeds into full-content feeds for a lot of newspaper sites. It works by fetching the link URL, and using an XPath query to fetch the contents of the DIV with class "entry-content".

Works great, generally. But I'd like to remove certain DIVs within the entry-content DIV. For example, one site has, in some articles, an "inline-sidebar" class DIV that I'd like to remove. I've tried nesting another XPath query for div[@class="inline-sidebar"] and calling $result->parentNode->removeChild($result) on each result, but the sidebar remains.

It also seems to strip out embedded images, which is not a big deal to me, but I wouldn't mind having them back either.

Now, I'm not a developer, either code or web. I'm just a guy who knows just enough PHP and HTML to get by, and this is the most I've done with XPath. I'd love to not only fix this, but learn the why and the how of it. If any of you could point me in the right direction, I would appreciate it greatly.

Here is the relevant section of code from the plugin, and a link to an example article with the inline-sidebar DIV. ... uring.html

Code:

$doc = new DOMDocument();

$basenode = false;

if ($doc) {
    $xpath = new DOMXPath($doc);
    $entries = $xpath->query('(//div[@class="entry-content"])');
    foreach ($entries as $entry) {
        // TODO: Remove child elements matching '(div[@class="inline-sidebar"])'
        $basenode = $entry;
    }

    if ($basenode) {
        $article["content"] = $doc->saveXML($basenode);
        $article["plugin_data"] = "newspapers-full,$owner_uid:" . $article["plugin_data"];
    }
}
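
For reference, here is a minimal sketch of the removal that the TODO describes. It is not tested against the real site, and it rests on some assumptions: the $html string is made-up sample markup standing in for the fetched page, and the sidebar DIV is assumed to possibly carry more than one class (in which case an exact @class="inline-sidebar" comparison would match nothing, which might explain a sidebar that "remains"). The inner query starts with "." and is given $basenode as its context node so it only searches inside the entry-content DIV, and the removal happens before saveXML() is called.

```php
<?php
// Hypothetical sample markup standing in for a fetched article page.
$html = '<html><body><div class="entry-content">'
      . '<p>First paragraph.</p>'
      . '<div class="inline-sidebar related">Sidebar text</div>'
      . '<p>Second paragraph.</p>'
      . '</div></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings quietly; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entries = $xpath->query('//div[@class="entry-content"]');
$basenode = $entries->length > 0 ? $entries->item(0) : null;

$content = '';
if ($basenode) {
    // Relative query (leading ".") evaluated with $basenode as context node.
    // The contains/concat test matches "inline-sidebar" even when the
    // class attribute lists other classes as well.
    $query = './/div[contains(concat(" ", normalize-space(@class), " "), " inline-sidebar ")]';

    // Copy the matches into an array first, then remove them one by one.
    $sidebars = iterator_to_array($xpath->query($query, $basenode));
    foreach ($sidebars as $sidebar) {
        $sidebar->parentNode->removeChild($sidebar);
    }

    // Serialize only after the removal, so the saved markup reflects it.
    $content = $doc->saveXML($basenode);
}

echo $content, "\n";
```

If the removal runs after $article["content"] has already been filled from saveXML(), the stored string will still contain the sidebar, so the order of those two steps matters.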