Appending DOMNode from DOMXPath back to DOMDocument

by kiawin

DOMXPath is a simple and effective way to traverse HTML and search for a particular node. Its query() method provides a straightforward way to retrieve each node's name and value.

[crayon lang="php"]$domdoc = new DOMDocument();
$domdoc->loadHTML($html);
$xpath = new DOMXPath($domdoc);

// query() returns a DOMNodeList; each $element in the loop is a DOMNode
$elements = $xpath->query("//div[@class='section']");
foreach ($elements as $element) {
    echo $element->nodeName, "\n";
    echo $element->nodeValue, "\n";
}[/crayon]

The loop above lets us read the name and text value of each matching node (element).
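To try the query on its own, here is a minimal sketch. The $html string is made-up sample markup, not taken from any real page, and the $names and $values arrays are only there to collect the results for printing.

[crayon lang="php"]<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body>'
      . '<div class="section"><p>First</p></div>'
      . '<div class="section"><p>Second</p></div>'
      . '</body></html>';

$domdoc = new DOMDocument();
$domdoc->loadHTML($html);
$xpath = new DOMXPath($domdoc);

$names = [];
$values = [];
foreach ($xpath->query("//div[@class='section']") as $element) {
    $names[] = $element->nodeName;   // tag name of the matched element
    $values[] = $element->nodeValue; // text content, without any tags
}

echo implode(', ', $names), "\n";  // div, div
echo implode(', ', $values), "\n"; // First, Second[/crayon]

Note that nodeValue gives only the text inside the element; the markup itself is lost, which is why the rest of this post copies the node instead.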

If you want to extract a node together with its HTML tags, a natural reading of the PHP manual suggests:

[crayon lang="php"]$newDOM = new DOMDocument();
$newNode = $elements->item(0); // first node matched by the query
$newDOM->appendChild($newNode);[/crayon]

In theory this should work: the node returned by the query is a DOMNode, and $newDOM->appendChild() accepts a DOMNode.

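In practice, you will get an error: the node still belongs to the original document, so PHP throws a DOMException ("Wrong Document Error"). The correct way is as follows: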

[crayon lang="php"]$newDOM = new DOMDocument();
// importNode() creates a copy of the node that belongs to $newDOM
$newNode = $newDOM->importNode($elements->item(0), true);
$newDOM->appendChild($newNode);[/crayon]

Passing true as the second parameter of $newDOM->importNode() copies all of the node's child nodes along with the parent node.
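The difference between the failing and the working approach, and between a shallow and a deep import, can be seen in a small sketch like the one below. The markup and the variable names ($source, $target, $shallow, $deep) are assumptions made up for this illustration.

[crayon lang="php"]<?php
// Hypothetical source document, used only for illustration.
$source = new DOMDocument();
$source->loadHTML('<html><body><div class="section"><p>Hello</p></div></body></html>');
$xpath = new DOMXPath($source);
$node = $xpath->query("//div[@class='section']")->item(0);

$target = new DOMDocument();

// Appending a node that belongs to another document is rejected.
$errorMessage = null;
try {
    $target->appendChild($node);
} catch (DOMException $e) {
    $errorMessage = $e->getMessage();
}
echo "Direct append failed: ", $errorMessage, "\n";

// importNode() returns a copy owned by $target.
$shallow = $target->importNode($node, false); // the element without its children
$deep    = $target->importNode($node, true);  // the element plus all descendants

var_dump($shallow->hasChildNodes()); // bool(false)
var_dump($deep->hasChildNodes());    // bool(true)

// The deep copy can now be appended without an exception.
$target->appendChild($deep);[/crayon]

The original node in $source is left untouched; importNode() copies rather than moves.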

Lastly, output the scraped HTML as a string by using $newDOM->saveHTML().

[crayon lang="php"]echo $newDOM->saveHTML();[/crayon]
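Putting the steps together, here is one possible way to copy every matched node, not just one. This is only a sketch: the sample $html and the wrapper div are my own assumptions. The wrapper is there because a well-formed document has a single root element, so the copies need a common parent.

[crayon lang="php"]<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body>'
      . '<div class="section"><p>First</p></div>'
      . '<div class="section"><p>Second</p></div>'
      . '</body></html>';

$domdoc = new DOMDocument();
$domdoc->loadHTML($html);
$xpath = new DOMXPath($domdoc);

$newDOM = new DOMDocument();

// Container element (an assumption of this sketch) that holds all copies.
$wrapper = $newDOM->createElement('div');
$newDOM->appendChild($wrapper);

foreach ($xpath->query("//div[@class='section']") as $element) {
    // Deep-copy each match into $newDOM, then attach it to the container.
    $wrapper->appendChild($newDOM->importNode($element, true));
}

$output = $newDOM->saveHTML();
echo $output;[/crayon]

If you only need the markup of a single node, saveHTML() also accepts a node as its argument, for example $newDOM->saveHTML($wrapper).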

Happy Scraping!