EVEN though documents were being published over global networks before Hypertext Markup Language (HTML) existed, the ease that came with this language was bound to generate an overwhelming response. The concept of linking documents on the web, such as e-business pages, e-books and e-magazines, through a mechanism known as a hyperlink took the internet community by storm. However, HTML is now on the verge of being replaced by newer markup languages, and one of them is Extensible Markup Language (XML).
The success of HTML in transmitting documents for global viewing cannot be debated. However, the language itself is limited in scope: it is only intended for displaying documents in a web browser. There are two important implications of this:
(a) The language is primarily meant to display or present data and cannot maintain relationships between data items, which presents problems for businesses. How would company XYZ maintain an accurate relationship between its clients and salespersons using HTML?
(b) Due to the intense competition in the browser market (mostly between Microsoft and Netscape), the vendors further complicated the issue by diverging from the standards. This resulted in vendor-specific forms of HTML that were either unable to render on a different browser or, much worse, rendered in a totally unacceptable manner.
A far less frequently mentioned problem with HTML was the encoding scheme used in its files, which is platform dependent. Although the documents would render in a browser irrespective of the platform, the file itself could not be opened for modification on a platform other than the one it was created on. This posed a serious problem, especially for companies with limited budgets; there was no way to open an HTML file created on a Windows-based PC for modification on the company's other, rarely used Unix-based PC.
At present, most documents are encoded in ISO-8859-1, windows-1252 or EBCDIC.
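To make the encoding problem concrete, the following minimal Python sketch encodes one short piece of text with the encodings named above and with UTF-8. The sample text is an assumption chosen for illustration, and cp037 is used as just one of several EBCDIC code pages.

```python
# Illustrative sketch: the same text becomes different bytes under
# different encodings. The sample text is invented for this example.
text = "Café"

latin1_bytes = text.encode("iso-8859-1")
cp1252_bytes = text.encode("windows-1252")
ebcdic_bytes = text.encode("cp037")  # cp037 is one EBCDIC code page
utf8_bytes = text.encode("utf-8")    # a Unicode encoding

for name, raw in [
    ("ISO-8859-1", latin1_bytes),
    ("windows-1252", cp1252_bytes),
    ("EBCDIC cp037", ebcdic_bytes),
    ("UTF-8", utf8_bytes),
]:
    print(f"{name:<14}{raw.hex(' ')}")

# Reading EBCDIC bytes with the wrong table yields unreadable text,
# while reading them with the right table recovers the original.
misread = ebcdic_bytes.decode("iso-8859-1")
recovered = ebcdic_bytes.decode("cp037")
print(repr(misread), repr(recovered))
```

A program that assumes the wrong encoding does not fail loudly; it simply shows the wrong characters, which is why a file has to declare or agree on its encoding.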
HTML 4.01, which is the current version, also poses problems from a technical point of view. The language relies on a fixed tag set to present documents, which means that programmers cannot extend the language by creating their own tags. Besides, even though HTML is generic enough to be an all-purpose language for transmitting information over the web, it is not specific enough to address issues that are of significance to one person or group.
A new markup language, XHTML, was introduced to resolve the problems with HTML. Since XHTML is simply a reformulation of the HTML language based on XML, an introduction to XML is necessary to appreciate the benefits that XHTML has to offer.
XML is also based on SGML. XML files are pure text files that are not platform or software dependent. XML achieves this universality by making it mandatory to encode documents using the Unicode character set. It is truly extensible; programmers are able to create their own tags so that the content of the document may be described. Suppose a hypothetical company wants to document its net profit for the past five years. In XML, the data could be represented as shown in Table 1.
Such a document could also be created with HTML, but it would only be useful for display purposes. The meaning of the data would not be apparent, and no application other than the browser would be able to interpret it. Thus, it would be almost impossible to process that data using other applications, such as Microsoft Excel.
However, the data in XML format (Table 1) shows the relationships and the hierarchy and is easily re-used; the company can process it and get a graphical representation using Excel or store it in its database. To learn more about the XML syntax, please refer to the listing of websites given at the end of the article.
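Table 1 is not reproduced here, so the sketch below shows what a document of the kind described might look like. The element names, attribute names, company name and figures are all assumptions made up for illustration; they are not the article's original table. The document is held in a Python string and parsed with the standard library to show that it is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical five-year net profit document. Every name and number
# below is invented for illustration.
PROFIT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<company name="XYZ">
  <netProfit currency="USD">
    <year value="1999">120000</year>
    <year value="2000">135000</year>
    <year value="2001">128000</year>
    <year value="2002">150000</year>
    <year value="2003">162000</year>
  </netProfit>
</company>
"""

# Parse from bytes so the declared UTF-8 encoding is honoured.
root = ET.fromstring(PROFIT_XML.encode("utf-8"))

# Walk the hierarchy: company -> netProfit -> year.
for year in root.iter("year"):
    print(year.get("value"), year.text)
```

The nesting records that each figure belongs to a year, each year to the net profit series, and the series to the company, which is exactly the kind of relationship a purely presentational markup cannot express.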
The tags in such a document are descriptive: they describe the purpose of the data they enclose. With such descriptive markup, the same document can readily be processed in many different ways, using only those parts of it that are considered relevant. Thus, an application to calculate the total net profit for the past years would be able to sum up the values and display the results, while another application to display yearly reports could also make use of the same data and process it according to its requirements. To truly appreciate the benefits of such markup, we need to think of the current business environment.
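The two applications mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the element names, the `currency` attribute and the figures are assumptions, and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: tag names and figures are invented for illustration.
REPORT = """<netProfit currency="USD">
  <year value="2000">100</year>
  <year value="2001">150</year>
  <year value="2002">125</year>
</netProfit>"""


def total_profit(xml_text):
    """First application: sum the profit of every year."""
    root = ET.fromstring(xml_text)
    return sum(int(year.text) for year in root.iter("year"))


def yearly_report(xml_text):
    """Second application: one report line per year, from the same data."""
    root = ET.fromstring(xml_text)
    currency = root.get("currency")
    return [f"{year.get('value')}: {year.text} {currency}"
            for year in root.iter("year")]


print(total_profit(REPORT))
for line in yearly_report(REPORT):
    print(line)
```

Neither function needs to know how the other uses the document; each picks out only the parts it considers relevant.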
Today, the business trend of globalization has made it mandatory to find a cost-effective way to communicate information. Coupled with the amazing tools of information technology, it has also become possible to store ever-increasing amounts of data in an electronic format, and indeed this is exactly what is happening. However, there is a general tendency to equate data with what is stored in the company's relational databases, such as Oracle or SQL Server. An organization, however, has data in other formats as well.
A report stored in a spreadsheet, electronic documents in PDF format, email messages, text files and mailing lists created in MS Word could all be stored in an XML format and then re-used for whatever purpose the organization decides. Instead of wasting valuable secondary storage, the data only has to be stored once, thus saving the cost of storage and processing and the cost associated with having large amounts of redundant data. This is not to suggest that formats other than XML are irrelevant; it is just that if we store data in XML format, then various applications can make use of the same set of data.
Giants such as Microsoft and Oracle have already started rolling out products that are capable of providing access to XML documents using existing applications. Since XML documents are text documents, organizations can cut down on costs associated with high-powered back-end server machines. Organizations do not run the risk of license problems either, because XML is a non-proprietary format; it is not dependent on any application or operating system.
The benefits of XML make it worth exploring further to see whether it should be implemented to support the existing corporate information technology infrastructure. However, how does all this XML talk relate to HTML? Since XML is a data structuring language and documents on the WWW are composed of some sort of data, XML can be used to structure the contents of a web site, while HTML is used to display that data in the client's web browser. The combination of HTML and XML is what gave birth to XHTML.
XHTML provides the benefits of marking up data so that the hierarchical structure of the data and the relationships between the various data elements are preserved. By preserving the structure of the data, organizations can save both the time and the cost of training new programmers on how their web sites work, so that they can get to work straight away. Because XHTML adheres to the strict standards outlined by XML, there is little problem of documents being incompatible. Organizations can easily share data and form long-lasting, value-added relationships with both customers and suppliers.
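The division of labour described here, with XML holding the data and HTML presenting it, can be sketched as follows. This is an illustration under stated assumptions: the XML vocabulary and the `to_html_table` function are hypothetical, and a production site would more likely use a stylesheet or templating tool than string formatting.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structured data; names and figures are invented.
DATA = """<netProfit currency="USD">
  <year value="2001">150</year>
  <year value="2002">125</year>
</netProfit>"""


def to_html_table(xml_text):
    """Render the XML data as an HTML table for display in a browser."""
    root = ET.fromstring(xml_text)
    header = f"<tr><th>Year</th><th>Net profit ({escape(root.get('currency'))})</th></tr>"
    rows = "".join(
        f"<tr><td>{escape(year.get('value'))}</td><td>{escape(year.text)}</td></tr>"
        for year in root.iter("year")
    )
    return f"<table>{header}{rows}</table>"


print(to_html_table(DATA))
```

The XML source stays free of presentation details, so the same data can be shown with a different layout, or fed to a different program, without being rewritten.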
XML can have remarkable benefits not only for individual businesses, but also for the respective industries. It can serve to standardize the electronic communication of information of the various industries. The standardization can easily be achieved because XML relies on custom markup tags that define the actual data contained. Thus, the data is accompanied by meta-data, which can be used by an industry to exchange documents. Using the sample XML document, it would be very convenient for the financial sector to come up with custom tags that define how to exchange information such as balance sheet and income statement figures. Those documents can then be interpreted according to the requirements using any application that supports XML.
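As a rough sketch of what an agreed industry vocabulary buys, the Python fragment below checks an incoming document against a small set of required tags. The tag names, the `REQUIRED` set and the `missing_elements` function are all hypothetical; in practice such an agreement would normally be written down as a DTD or an XML Schema and checked with a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary agreed on for exchanging income statements.
REQUIRED = {"revenue", "expenses", "netIncome"}


def missing_elements(xml_text):
    """Return the agreed tags that the document fails to provide."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return REQUIRED - present


complete = ("<incomeStatement><revenue>500</revenue>"
            "<expenses>350</expenses><netIncome>150</netIncome>"
            "</incomeStatement>")
incomplete = "<incomeStatement><revenue>500</revenue></incomeStatement>"

print(sorted(missing_elements(complete)))
print(sorted(missing_elements(incomplete)))
```

Because the tags name the data they carry, any partner in the industry can run the same check with any XML-capable tool before accepting a document.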
Apart from the meta-data concept, XML presents a promising future for electronic business-to-business transactions, facilitating the entire supply chain and other business communication. Electronic business communication existed before XML, but it was largely accomplished using Electronic Data Interchange (EDI). The problem with EDI is that it requires 100 per cent compatibility between the connected systems; a slight deviation is enough to cause not-so-slight problems for the parties involved in the transaction.
Furthermore, EDI requires the implementation of expensive Value Added Networks (VANs) to transmit the information. XML, on the other hand, can be transmitted over the inexpensive internet via HTTP. Thus, companies that opt for XML do not have to devote a large percentage of their budgets to acquiring, setting up, operating and maintaining VANs. Data exchange can be carried out both effectively and efficiently using the existing corporate network.
E-commerce applications are probably the biggest beneficiaries of XML. E-commerce is a broad term used to describe transactions that take place electronically. Companies have discovered that business processes can be streamlined, costs can be cut and response times can be reduced if transactions are done via the Internet. As almost all transactions involve the exchange of data (in the form of invoices, reports, receipts, etc.), XML and e-commerce are a perfect match. Generally, there are two main types of business relationships: business to business (B2B) and business to consumer (B2C). Both types have potential uses for XML, because both make use of data.
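Sending XML over HTTP needs nothing more than an ordinary web request. The Python sketch below builds such a request with the standard library but does not send it; the partner URL, the order document and the `build_request` helper are hypothetical and serve only as an illustration.

```python
import urllib.request

# Hypothetical trading-partner endpoint and order document.
ENDPOINT = "http://partner.example.com/orders"
ORDER_XML = '<order id="1001"><item sku="A-17" quantity="3"/></order>'


def build_request(url, xml_text):
    """Wrap an XML document in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )


request = build_request(ENDPOINT, ORDER_XML)
print(request.get_method(), request.full_url)

# To transmit the document, the request would be passed to
# urllib.request.urlopen(request); that step is left out here
# because the endpoint does not exist.
```

The only infrastructure involved is the company's existing internet connection, which is the cost advantage over a dedicated network.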
The W3C has already released the XHTML specification, so any new HTML standards in the future can be expected to be XHTML standards. Even though HTML will become obsolete as newer versions of XHTML are released, it is not at immediate risk. It is bound to stay until all applications have been designed to conform to the XML standard. In the context of web publishing, HTML is bound to stay until browsers have been designed to conform to the latest specifications.
Apart from the WWW, XML has made quite an entry into other areas where data exchange is carried out electronically. It has already made a place for itself in B2B transactions, where the only choice before XML was costly EDI. Software vendors have started to release versions of their products that support XML as a native format (MS Office 2003 being one of them). Relational database vendors are also striving to incorporate XML support into their software (Microsoft's SQL Server 2000 offers XML support). Besides databases with XML support, there are fully XML databases as well: dbXML Group, Ipedo XML Database and Tamino. In fact, SQL itself is being revised to support queries on XML data.
XML can also be used to create meta-dictionaries or vocabularies for databases. Thus, industry-specific meta-dictionaries can also be developed that would facilitate complex B2B interactions that are normally found in industries such as aviation and pharmaceuticals.
With so much industry support for XML, XML seems more than just a hot buzzword. Many third-party companies are already working to develop electronic communication services around XML technology. These services aim to permanently remove the interoperability barriers among systems and companies alike.
The future may even hold the possibility of an inexpensive, XML-based EDI that does not mandate compatibility between the systems involved. XML is therefore worth a look by businesses seeking cheaper alternatives to expensive and often unreliable electronic data exchange mechanisms.
The writer is a student of computer science at the Hamdard Institute of Management Sciences, Karachi