HTML versus XML: Similarities
- Both use tags (e.g. <h2> and </year>)
- Tags may be nested (tags within tags)
- Human users can read and interpret both HTML and XML representations quite easily
… But how about machines?
Problems with Automated Interpretation of HTML Documents
An intelligent agent trying to retrieve the names of the authors of the book faces several ambiguities:
- Authors’ names could appear immediately after the title
- or immediately after the word “by”
- Are there two authors?
- Or just one, called “V. Marek and M. Truszczynski”?
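The ambiguity can be illustrated with a minimal Python sketch. The HTML fragment below is an assumed rendering of the book example (the exact markup and the book title are illustrative), and `guess_authors` is a hypothetical helper, not part of any library. It applies the "text after the word by" heuristic and shows that the result is one undivided string.

```python
import re

# Assumed HTML rendering of the book example; the markup and title are illustrative.
HTML_DOC = (
    "<h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>\n"
    "<i>by <b>V. Marek</b> and <b>M. Truszczynski</b></i><br>\n"
    "Springer 1993<br>\n"
)


def guess_authors(html: str) -> str:
    """Heuristic: strip the tags, then take the rest of the line after the word 'by'."""
    text = re.sub(r"<[^>]+>", "", html)
    match = re.search(r"\bby\s+(.+)", text)
    return match.group(1).strip() if match else ""


authors_guess = guess_authors(HTML_DOC)
print(authors_guess)  # V. Marek and M. Truszczynski
```

The heuristic returns a single string. Nothing in the HTML tells the agent whether "and" separates two authors or is part of one name, so any further split is a guess.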
HTML vs XML: Structural Information
- HTML documents do not contain structural information, i.e. information about the pieces of the document and their relationships.
- XML is more easily accessible to machines because
- Every piece of information is described.
- Relations are also defined through the nesting structure.
- E.g., the <author> tags appear within the <book> tags, so they describe properties of the particular book.
- A machine processing the XML document would be able to deduce that
- the author element refers to the enclosing book element
- from the nesting structure, rather than from proximity considerations
- XML allows the definition of constraints on values
- E.g., a year must be a number of four digits
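The points above can be sketched in Python with the standard library's `xml.etree.ElementTree`. The XML document is an assumed encoding of the book example (element names other than `book`, `author` and `year`, and the title itself, are illustrative). `ElementTree` does not validate documents, so the four-digit constraint is imitated here by a hypothetical hand-written check, `is_valid_year`. In practice such a constraint would normally be declared in a schema language rather than in application code.

```python
import re
import xml.etree.ElementTree as ET

# Assumed XML encoding of the book example; title and publisher are illustrative.
XML_DOC = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
</book>
""".strip()

book = ET.fromstring(XML_DOC)

# Each <author> is a child of <book>, so the nesting itself states
# which book the authors belong to; no proximity heuristic is needed.
authors = [author.text for author in book.findall("author")]


def is_valid_year(value: str) -> bool:
    """Hand-written stand-in for a schema constraint: exactly four ASCII digits."""
    return re.fullmatch(r"[0-9]{4}", value) is not None


year = book.findtext("year")
print(authors)              # ['V. Marek', 'M. Truszczynski']
print(is_valid_year(year))  # True
```

Unlike the HTML case, the number of authors is unambiguous: there are two `author` elements, each nested inside the `book` element.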
HTML vs XML: Formatting
- The HTML representation provides more than the XML representation:
- The formatting of the document is also described
- The main use of an HTML document is to display information: it must define formatting
- XML: separation of content from display
- same information can be displayed in different ways
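As a rough illustration of separating content from display, the sketch below renders the same assumed XML record in two ways. The document, the title, and the helper names `book_fields`, `render_text` and `render_html` are all hypothetical and chosen for this example. Real systems would more commonly use stylesheets for this purpose.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed XML encoding of the book example; it carries content only, no formatting.
XML_DOC = (
    "<book>"
    "<title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>"
    "<author>V. Marek</author>"
    "<author>M. Truszczynski</author>"
    "<year>1993</year>"
    "</book>"
)


def book_fields(xml_text: str):
    """Extract title, list of authors, and year from the book record."""
    book = ET.fromstring(xml_text)
    authors = [author.text for author in book.findall("author")]
    return book.findtext("title"), authors, book.findtext("year")


def render_text(xml_text: str) -> str:
    """Display 1: a one-line plain-text citation."""
    title, authors, year = book_fields(xml_text)
    return f"{title} ({year}), by {', '.join(authors)}"


def render_html(xml_text: str) -> str:
    """Display 2: an HTML fragment with a heading and an author list."""
    title, authors, year = book_fields(xml_text)
    items = "".join(f"<li>{escape(name)}</li>" for name in authors)
    return f"<h2>{escape(title)}</h2><ul>{items}</ul><p>{escape(year)}</p>"


print(render_text(XML_DOC))
print(render_html(XML_DOC))
```

Both functions read the same data. Only the presentation logic differs, and it lives outside the XML document.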