
Wednesday, October 13, 2004

Understanding GRML.

The development of a markup language.

Introduction

HTML is the primary markup language used on the web. At its first release, it lacked many of the features taken for granted on the web today. It took many years for HTML to become what it is. In fact, almost four years passed between the first attempts at a markup language and HTML 2.0. Since 1995, HTML has continued to change. This demonstrates the commitment necessary to develop a markup language.

Before HTML could be developed, software was needed to test its features. That software was the precursor to the first HTML web browser. There is no way to test a markup language without first having software that reads it. This is a basic requirement of developing a markup language: the software drives its development.

The purpose of this article is to show how General Reuse Markup Language, or GRML, developed into its current format. Examples are given to show the differences between GRML 1.0 and GRML 2.0. The attributes of each markup language are described along with how they are used on the web.

Background

This is one in a series of articles on GRML. Before continuing, read the article, Introducing GRML. It provides an overview of existing file formats and markup languages, and explains why GRML was created.

If you are not interested in markup languages, potential alternative approaches, or web browser technology, this is not the article for you. This discussion is not suitable for anyone who feels HTML is the only way to browse the web.

The Beginning.

The process of creating GRML was indirect. It began with a desire to create a front-end to extract content from web pages. The idea was to submit a web page request and retrieve the content in a format usable by a variety of applications. HTML presents content in one way, for display in a web browser, so the content is not readily usable by a variety of applications. Since the target web pages used HTML, the retrieved content needed to be made available in another format.

The only solution to extracting content from an HTML web page is to use an adapter. Adapters read data in one format and write it in another. This was the perfect solution, except for one thing. Every HTML web page describes its content differently. There is no way to extract author information, or article text, or product descriptions without creating an adapter for each web page. There had to be a better way.
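
As a rough illustration of the problem, here is a minimal Python sketch of an adapter. It is not the adapter used by the front-end. The class name, the adapt function, and the assumption that headlines sit inside h2 tags are all hypothetical. It reads HTML and writes plain lines of text, and it only works for pages marked up the way it expects.

```python
from html.parser import HTMLParser


class HeadlineAdapter(HTMLParser):
    """Hypothetical adapter: reads HTML, writes plain lines of text.

    Assumes (for illustration only) that the target page puts each
    headline inside an <h2> tag.
    """

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a headline tag.
        if self._in_headline and data.strip():
            self.lines.append(data.strip())


def adapt(html):
    """Convert one HTML page into a list of text lines."""
    adapter = HeadlineAdapter()
    adapter.feed(html)
    adapter.close()
    return adapter.lines


page_a = "<html><h2>First story</h2><p>body</p><h2>Second story</h2></html>"
page_b = "<html><h3>First story</h3></html>"  # same content, different markup

print(adapt(page_a))
print(adapt(page_b))
```

The second page carries the same headline but marks it up differently, so this adapter finds nothing in it. A second adapter would be needed, which is exactly the one-adapter-per-page problem described above.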

Building a web front-end.

While trying to find a practical way to extract content from web pages, a front-end was being developed to display the content. A single adapter was developed to format HTML from a single web page into an informal format used by the front-end. This informal format was the initial step toward creating a markup language.

From June, 2002 until August, 2002, the front-end used an adapter to convert HTML web pages to text, for display. There was no format, other than reading single lines of text from the adapter. As development continued, more adapters were added, until 6 were available. Web page requests sent from the front-end had to use one of these 6 adapters. There was no feature for users to directly enter a web page request.

The first attempt.

As the front-end was developed, a form was needed for sending requests using input controls. This required a formal approach for handling requests to and responses from a web page. Using arbitrary lines of text was inefficient. This was the beginning of Personal Markup Language.

The new markup language had form support and provided a structure for formatting web page content. However, the form was limited. At first, the front-end created a form from the first web page request. There was no way to display another form. To allow the markup language to create a form for each web page request, the front-end was updated.

Upgrading the format.

With form support, the front-end now sent web page requests from input controls and created a form from web page responses, when necessary. The only feature missing was a way to organize web page content into groups, and display each group of content separately in the front-end. This required a new markup language. It was the beginning of the Simple Markup Language.

Content from a web page that the front-end displays is called a dimension. Splitting content into different groups creates a dimension for each group. The front-end needed to display different dimensions of content for forecasting, logistics, and data analysis. Once the front-end added this, the markup language supported multidimensional views.

As the markup language was being developed, there was one constant. The front-end did not allow the user to directly enter a web page request. A user had to choose from the 6 web page requests used by the front-end or submit a request using the form input controls of a web page.

Once direct web page requests were added, it was possible to "browse" web pages. The front-end became a web browser. Using a web browser required the markup language to be completely redesigned. This new markup language was the first version of GRML.

GRML 1.0.

Completed in January 2003, GRML 1.0 supported form input controls, columns, and results. It had multidimensional support and used the concept of "web applications". Each "web application" represented an activity that a user performs on the web. The first GRML web browser had "web applications" for using a search engine, getting news headlines, viewing auction listings, and doing a job search.

"Web applications" were a holdover from the days of the front-end, when neither directly submitting a web page request nor opening a file was supported. While the web browser allowed web page requests, a request could only be sent from a "web application" or a form.

The reason for "web applications" was to use content from HTML web pages in GRML web browsers. Since HTML web pages are abundant and GRML is new, it is advantageous to be able to adapt HTML to GRML.

An example of "web applications" in GRML 1.0 is below.

<GRML>
<a class=navi_13 name=AUCT type=title>Auctions</>
<a class=navi_13 name=JOBS type=title>Job Search</>
<a class=navi_13 name=SRCH type=title>Search Engine</>
<a class=navi_13 name=AUCT type=location>127.0.0.1/auc.asp?search2=</>
<a class=navi_13 name=JOBS type=location>127.0.0.1/jobs.asp?search2=</>
<a class=navi_13 name=SRCH type=location>127.0.0.1/parse.asp?search2=</>

<a class=hist_13 type=item>127.0.0.1/startup.asp</>
<a class=hist_13 type=item>127.0.0.1/over.asp</>
</GRML>

GRML was designed to be used by many different browsers. It was not possible to test this capability since only one GRML web browser existed. As other browsers were created and the markup language developed, GRML moved to version 1.1 in the first 4 months of 2003.

The next major upgrade to GRML occurred when resolving the problem of "web applications."

GRML 1.2.

One limitation of the "web application" approach was the need for a separate adapter for each HTML web page. Since there are billions of HTML web pages, it was impractical to create a "web application" for each one. Another problem was keeping a "web application" updated when a web page changed. If supporting a multitude of web pages was difficult, keeping them all updated was practically impossible. GRML needed modification.

During March, 2004, everything related to "web applications" was removed from GRML. This allowed the markup language to focus on form input controls, columns, and results. With the "web applications" removed, it was now possible to read any HTML web page using more generic and consistent web adapters.

An example of GRML 1.2 follows.

<GRML>
<a class=edit_13 name=url1 type=title>Enter URL:</>
<a class=edit_13 name=url1 type=location>http://127.0.0.1/links.asp</>

<a class=column_13 type=item>Title</>
<a class=column_13 type=item>Result</>

<a control=result_13 type=item>RIAA, MPAA Ask High Court To Review</>
<a control=result_13 type=item>The Hobo writes "It's official: Hollywood studios and record companies on Friday asked the United States Supreme Court to overturn a controversial series of recent court decisions that have kept file-swapping software legal."</>

<a control=result_13 type=link>http://127.0.0.1/article.pl?sid=04/10/11/1846208</>
</GRML>

GRML 1.2 was the last of the 1.x releases of GRML. During the next six months of use, it set the stage for another change in the syntax of the markup language.

GRML 2.0.

The initial versions of GRML worked well on the web and the local filesystem. They allowed the development of many different web browsers that use its form and column/result approach. Other than removing "web applications", the syntax for GRML did not change much from the 1.0 to 1.2 versions. Issues of speed, control, and reliability were not considered. However, this changed with GRML 2.0.

This version of GRML was designed to create small file sizes, handle file and web page content using fewer browser resources, and allow more options for arranging file and web page content. The old syntax was completely abandoned in favor of smaller tags and more specific tag keywords. The sample GRML from version 1.2 looks as follows in 2.0.

<GRML>
<edit url1>
<title>Enter URL:
<location>http://127.0.0.1/links.asp
</edit>

<column>
<Title>
<Description>
<Link>
</column>

<result>
<Title>RIAA, MPAA Ask High Court To Review

<Description>The Hobo writes "It's official: Hollywood studios and record companies on Friday asked the United States Supreme Court to overturn a controversial series of recent court decisions that have kept file-swapping software legal."

<Link>http://127.0.0.1/article.pl?sid=04/10/11/1846208
</result>
</GRML>
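
To make the structure of the 2.0 syntax concrete, here is a minimal Python sketch that reads a shortened version of the sample above. It is not the parser used by any GRML web browser. The function name parse_grml2 and the rules it applies are assumptions inferred only from the sample: a block opens with a tag on its own line, closes with a matching end tag, and each item inside it is a tag followed by its text on the same line.

```python
import re

# Assumed shapes, inferred from the sample only:
#   <edit url1>   opens a block with an optional name
#   <Title>text   is an item inside a block (text may be empty)
OPEN = re.compile(r"<(\w+)(?:\s+(\w+))?>$")
ITEM = re.compile(r"<(\w+)>(.*)")

SAMPLE = """<GRML>
<column>
<Title>
<Description>
</column>

<result>
<Title>RIAA, MPAA Ask High Court To Review

<Description>It's official
</result>
</GRML>"""


def parse_grml2(text):
    """Return a list of blocks, each with a kind, a name, and its items."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("<GRML>", "</GRML>"):
            continue  # skip blank lines and the document wrapper
        if line.startswith("</"):
            current = None  # an end tag closes the open block
        elif current is None:
            match = OPEN.match(line)
            if match:
                current = {"kind": match.group(1),
                           "name": match.group(2),
                           "items": []}
                blocks.append(current)
        else:
            match = ITEM.match(line)
            if match:
                current["items"].append((match.group(1), match.group(2)))
    return blocks


for block in parse_grml2(SAMPLE):
    print(block["kind"], block["items"])
```

This sketch ignores item text that continues onto a second line, which a real reader of GRML 2.0 would presumably need to handle.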

Using the GRML 2.0 syntax, tags drop to a fraction of their size from version 1.2. In addition, there are no problems with handling very large text strings (greater than 1024 characters). In version 1.2, the content sometimes was ignored because of its size. This often disrupted the display of all remaining content in the file or web page. This problem was solved in version 2.0, because each result item specifies a column.
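
The size difference can be checked with a little arithmetic on the tags from the two samples. The Python sketch below counts only the markup wrapped around a single result item, not the enclosing result block, and the helper name overhead is made up for this illustration.

```python
# Markup surrounding one result item, taken from the samples in this article.
old_tags = "<a control=result_13 type=item>" + "</>"  # GRML 1.2: open and close tag
new_tags = "<Title>"                                  # GRML 2.0: a single tag


def overhead(tags):
    """Number of bytes the markup adds to one item."""
    return len(tags.encode("utf-8"))


ratio = overhead(new_tags) / overhead(old_tags)
print(overhead(old_tags), overhead(new_tags), round(ratio, 2))
```

For this item, the 2.0 tag is roughly a fifth of the size of the 1.2 tags. The exact saving for a whole file depends on how many items each block holds.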

Version 2.0 makes it possible to organize columns and results in ways that were not possible with version 1.2. Results are ordered according to the column display order. This order is set by listing the top column first and each subsequent column in turn, with the bottom column last. If there are 5 columns and the 3rd should be displayed first, place it at the top of the column order.

A result item only displays if the column it specifies appears in the column order. To display only one column of results, list only that column in the column order. Alternatively, list any number of columns, and only the results for those columns are displayed. This was not possible with previous versions of GRML.
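
The ordering and filtering rules can be sketched in a few lines of Python. This is an illustration of the behavior described here, not code from a GRML web browser. The function name arrange and the use of (column, text) pairs are assumptions, and column names are compared exactly as written.

```python
def arrange(column_order, result_items):
    """Order result items by the column order and drop unlisted columns.

    column_order: column names, top column first.
    result_items: (column, text) pairs in the order they appear in the file.
    """
    by_column = {}
    for column, text in result_items:
        by_column.setdefault(column, []).append(text)

    arranged = []
    for column in column_order:
        # Items whose column is missing from column_order are never reached.
        for text in by_column.get(column, []):
            arranged.append((column, text))
    return arranged


result = [
    ("Title", "RIAA, MPAA Ask High Court To Review"),
    ("Description", "It's official"),
    ("Link", "http://127.0.0.1/article.pl?sid=04/10/11/1846208"),
]

# Show the link first, then the title, and hide the description.
print(arrange(["Link", "Title"], result))

# Show only one column of results.
print(arrange(["Title"], result))
```

Moving a column to the top of the list moves its results to the front of the display, and leaving a column out hides its results without touching the result data itself.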

Conclusion.

GRML has moved through many versions since its first release in January 2003. It has changed from a "web application" markup language to a web page markup language. With version 2.0, it has the smallest, fastest, and most flexible syntax of any version released.

With its support for form input controls, columns, and results, GRML is able to support many web browsers by organizing its content for use regardless of how the content is displayed.

Learn more about GRML and GRML web browsers.
