Sometimes we need to extract an inner XML fragment from a larger XML document while parsing it.
One simple approach is to use a String utility (such as Apache Commons Lang) and call its substringBetween function. However, that returns only the content between the tags, not the parent tag itself, so you then have to add the start and end tags back as a prefix and suffix.
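As a rough illustration of that string-based approach, here is a minimal sketch. It uses plain `String.indexOf` instead of the Apache Commons helper so that it runs without extra dependencies; the class and method names (`SubstringBetweenSketch`, `between`, `withParentTag`) are hypothetical and only for illustration.

```java
// Hypothetical helper class for illustration; not part of any library.
class SubstringBetweenSketch {

    // Returns the text between the first `open` marker and the next `close`
    // marker, or null if either marker is missing.
    static String between(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = text.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    // Re-attaches the parent tag that the plain substring lookup drops.
    static String withParentTag(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        String inner = between(xml, open, close);
        return inner == null ? null : open + inner + close;
    }

    public static void main(String[] args) {
        String xml = "<note><to>Mike</to><attachments><attachment><id>1</id>"
                + "</attachment></attachments></note>";
        // Only the content between the tags, without <attachments> itself.
        System.out.println(between(xml, "<attachments>", "</attachments>"));
        // The same content with the parent tag added back.
        System.out.println(withParentTag(xml, "attachments"));
    }
}
```

This kind of matching is purely textual, so it would likely break if the start tag carries attributes or if elements with the same name are nested.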
Alternatively, you can combine XPath (to locate the exact XML node) with LSSerializer to get the inner XML, or sub-XML, as a String.
In the note XML example below, let's say we have to extract the <attachments> XML. We first get the node and then use LSSerializer to turn it into a String.
The first piece of code is the input, the second is the output we are looking for, and the third is the actual code.
So, that's it. Happy chopping the xmls! :) ;)
<note>
  <to>Mike</to>
  <from>Dan</from>
  <heading>Webinar</heading>
  <body>Don't forget the webinar coming Sunday!</body>
  <attachments>
    <attachment>
      <id>1</id>
      <filename>agenda.pdf</filename>
      <downloadlink>http://somehtingsomething.com/agenda</downloadlink>
    </attachment>
    <attachment>
      <id>2</id>
      <filename>invite.pdf</filename>
      <downloadlink>http://somehtingsomething.com/invite</downloadlink>
    </attachment>
  </attachments>
</note>
<attachments>
  <attachment>
    <id>1</id>
    <filename>agenda.pdf</filename>
    <downloadlink>http://somehtingsomething.com/agenda</downloadlink>
  </attachment>
  <attachment>
    <id>2</id>
    <filename>invite.pdf</filename>
    <downloadlink>http://somehtingsomething.com/invite</downloadlink>
  </attachment>
</attachments>
// All the classes used here ship with the JDK, so no extra dependencies are needed.
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.SAXException;

/**
 * Example of LSSerializer
 */
public class XMLParser {
    public static String innerXmlAndParentNodeXmlTag() throws Exception {
        Node node = getNode();
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS lsImpl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Disable the XML declaration on the first line; it is enabled by default.
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(node);
    }

    // The demo reads from a file; the source could equally be a stream or a String.
    // Static so that the static method above can call it.
    private static Node getNode() throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
        File inputFile = new File("input.txt"); // Adjust the path to match your setup.
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(inputFile);
        doc.getDocumentElement().normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Select the first <attachments> element anywhere in the document.
        String expression = "//attachments";
        return (Node) xPath.compile(expression).evaluate(doc, XPathConstants.NODE);
    }
}
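
The demo reads its input from a file, but the same idea should work when the XML is already in memory as a String. The following is a minimal sketch under that assumption: the class name `InnerXmlFromString` and the method `extract` are hypothetical, and wrapping checked exceptions in an `IllegalStateException` is just one possible choice to keep the call site short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; uses only JDK classes.
class InnerXmlFromString {

    // Parses `xml`, selects the first node matching `xpathExpression`, and
    // serializes it (parent tag included) without the XML declaration.
    // Returns null when nothing matches.
    static String extract(String xml, String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpression, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }

            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS lsImpl =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSSerializer serializer = lsImpl.createLSSerializer();
            serializer.getDomConfig().setParameter("xml-declaration", false);
            return serializer.writeToString(node);
        } catch (Exception e) {
            // Assumption: callers prefer one unchecked exception over several checked ones.
            throw new IllegalStateException("Could not extract XML fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note><to>Mike</to><attachments><attachment><id>1</id>"
                + "</attachment></attachments></note>";
        // Prints the <attachments> element together with its children.
        System.out.println(extract(xml, "//attachments"));
    }
}
```

Because the XPath expression is passed in as a parameter, the same helper could be pointed at other nodes, for example `//attachment[1]`, without changing the serialization code.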