docs/faq-performance.xml - xerces2-j - Git at Google

 <?xml version='1.0' encoding='UTF-8'?>
 <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'>
 <faqs title='Performance FAQs'>
  <faq title='General Performance'>
   <q>General Performance</q>
   <a>
    <p>
     Don't use XML where it doesn't make sense. XML is not a panacea.
     You will not get good performance by transferring and parsing a
     lot of XML files.
    </p>
    <p>Using XML is memory, CPU, and network intensive.</p>
   </a>
  </faq>
  <faq title='Parser Performance'>
   <q>Parser Performance</q>
   <a>
    <p>
     Avoid creating a new parser each time you parse; reuse parser
     instances. A pool of reusable parser instances might be a good idea
     if you have multiple threads parsing at the same time.
    </p>
    <p>
     The parser configuration will affect the performance of the parser.
     If you are interested in evaluating the parser performance with DTDs use     StandardParserConfiguration (Note: this is the default parser configuration).
     For testing the performance for XML Schema evaluation turn on schema validation feature (this will insert XML Schema Validator in the pipeline).
    </p>
   </a>
  </faq>
  <faq title='Parsing Documents Performance'>
   <q>Parsing Documents Performance</q>
   <a>
    <p>
     There are a variety of things that you can do to improve the
     performance when parsing documents:
    </p>
    <ul>
     <li>
      Convert the document to US ASCII ("US-ASCII") or Unicode
      ("UTF-8" or "UTF-16") before parsing. Documents written using
      ASCII are the fastest to parse because each character is
      guaranteed to be a single byte and map directly to their
      equivalent Unicode value. For documents that contain Unicode
      characters beyond the ASCII range, multiple byte sequences
      must be read and converted for each character. There is a
      performance penalty for this conversion. The UTF-16 encoding
      alleviates some of this penalty because each character is
      specified using two bytes, assuming no surrogate characters.
      However, using UTF-16 can roughly double the size of the
      original document which takes longer to parse.
     </li>
     <li>
      Explicitly specify "US-ASCII" encoding if your document is in
      ASCII format. If no encoding is specified, the XML specification
      requires the parser to assume UTF-8 which is slower to process.
     </li>
     <li>
      Avoid external entities and external DTDs. The extra file
      opens and transcoding setup is expensive.
     </li>
     <li>
      Reduce character count; smaller documents are parsed quicker.
      Replace elements with attributes where it makes sense. Avoid
      gratuitous use of whitespace because the parser must scan past it.
     </li>
     <li>
      Avoid using too many default attributes. Defaulting attribute
      values slows down processing.
     </li>
    </ul>
   </a>
  </faq>
  <faq title='XML Application Performance'>
   <q>XML Application Performance</q>
   <a>
    <ul>
     <li>If you don't need validation (and infoset augmentation) of XML documents, don't include validators (DTD or XML Schema) in the pipeline. Including the validator components in the pipeline will result in a performance hit for your application: if a validator component is present in the pipeline, Xerces will try to augment the infoset even if the validation feature is set to false.
 If you are only interested in validating against DTDs don't include XML Schema validator in the pipeline.

     </li>
     <li> If you don't need validation, avoid using a DOCTYPE line in your XML document.
      The current version of the parser will always read the DTD if the DOCTYPE line
      is specified even when validation feature is set to false.
     </li>
     <li>
      For large documents, avoid using DOM which uses a lot of memory.
      Instead, use SAX if appropriate. The DOM parser requires that
      the entire document be read into memory before the application
      processes the document. The SAX parser uses very little memory
      and notifies the application as parts of the document are parsed.
     </li>
    </ul>
   </a>
  </faq>
 </faqs>
	<?xml version='1.0' encoding='UTF-8'?>
	<!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'>
	<faqs title='Performance FAQs'>
	<faq title='General Performance'>
	<q>General Performance</q>
	<a>
	<p>
	Don't use XML where it doesn't make sense. XML is not a panacea.
	You will not get good performance by transferring and parsing a
	lot of XML files.
	</p>
	<p>Using XML is memory, CPU, and network intensive.</p>
	</a>
	</faq>
	<faq title='Parser Performance'>
	<q>Parser Performance</q>
	<a>
	<p>
	Avoid creating a new parser each time you parse; reuse parser
	instances. A pool of reusable parser instances might be a good idea
	if you have multiple threads parsing at the same time.
	</p>
	<p>
	The parser configuration will affect the performance of the parser.
	If you are interested in evaluating the parser performance with DTDs use StandardParserConfiguration (Note: this is the default parser configuration).
	For testing the performance for XML Schema evaluation turn on schema validation feature (this will insert XML Schema Validator in the pipeline).
	</p>
	</a>
	</faq>
	<faq title='Parsing Documents Performance'>
	<q>Parsing Documents Performance</q>
	<a>
	<p>
	There are a variety of things that you can do to improve the
	performance when parsing documents:
	</p>
	<ul>
	<li>
	Convert the document to US ASCII ("US-ASCII") or Unicode
	("UTF-8" or "UTF-16") before parsing. Documents written using
	ASCII are the fastest to parse because each character is
	guaranteed to be a single byte and map directly to their
	equivalent Unicode value. For documents that contain Unicode
	characters beyond the ASCII range, multiple byte sequences
	must be read and converted for each character. There is a
	performance penalty for this conversion. The UTF-16 encoding
	alleviates some of this penalty because each character is
	specified using two bytes, assuming no surrogate characters.
	However, using UTF-16 can roughly double the size of the
	original document which takes longer to parse.
	</li>
	<li>
	Explicitly specify "US-ASCII" encoding if your document is in
	ASCII format. If no encoding is specified, the XML specification
	requires the parser to assume UTF-8 which is slower to process.
	</li>
	<li>
	Avoid external entities and external DTDs. The extra file
	opens and transcoding setup is expensive.
	</li>
	<li>
	Reduce character count; smaller documents are parsed quicker.
	Replace elements with attributes where it makes sense. Avoid
	gratuitous use of whitespace because the parser must scan past it.
	</li>
	<li>
	Avoid using too many default attributes. Defaulting attribute
	values slows down processing.
	</li>
	</ul>
	</a>
	</faq>
	<faq title='XML Application Performance'>
	<q>XML Application Performance</q>
	<a>
	<ul>
	<li>If you don't need validation (and infoset augmentation) of XML documents, don't include validators (DTD or XML Schema) in the pipeline. Including the validator components in the pipeline will result in a performance hit for your application: if a validator component is present in the pipeline, Xerces will try to augment the infoset even if the validation feature is set to false.
	If you are only interested in validating against DTDs don't include XML Schema validator in the pipeline.

	</li>
	<li> If you don't need validation, avoid using a DOCTYPE line in your XML document.
	The current version of the parser will always read the DTD if the DOCTYPE line
	is specified even when validation feature is set to false.
	</li>
	<li>
	For large documents, avoid using DOM which uses a lot of memory.
	Instead, use SAX if appropriate. The DOM parser requires that
	the entire document be read into memory before the application
	processes the document. The SAX parser uses very little memory
	and notifies the application as parts of the document are parsed.
	</li>
	</ul>
	</a>
	</faq>
	</faqs>