PaXQuAL: a silly language for analyzing and rewriting Web Pages

23. February 2007 12:42 by CarlosLoria in General

Let us gently start meeting PaXQuAL, a prototype language that we are shaping and adjusting for the emerging purpose of symbolically expressing simple analysis and transformation tasks on Web Pages, all of this in the context of refactoring, as we have been doing in previous posts.

As we have done before, sometimes we just want to digress a little from practical and realistic issues to present some theoretical ideas that we find somewhat interesting (and probably nobody else does). I can only promise that no Greek letters will be used at all (in part because that font is not allowed by the publishing tool, I confess).

Anybody (if any) still reading this post is allowed to voice right away the by now classical complaint: “Yet another language? How many do we have by now?” The complaint is justified, because everybody knows that every problem in Computer Science gets solved by proposing a (the) new one. Now it is my turn, why not. It’s a free world. For the interested reader, a technical paper with further details will hopefully be available at this site soon.

Actually PaXQuAL (Path based Transformation and Querying Attribute Language is its real name; it is pronounced Pascual) is not that new or different from many other languages developed by real researchers in academia and industry. We wanted to imagine a language for querying and transforming structured data (e.g. XML, HTML), and of that sort many are already available, as we know. What new material can someone like us propose in this field? Actually, what we really want is to operationally relate CSS to a special sort of weird theoretical artifact we explored some years ago, which we may dare to call Object-Oriented Rewrite Systems, or Term-Rewriting Systems (TRS) with extra variables and state (the result of work developed jointly with actual researchers some years ago). Considering TRS is natural in this case because CSS is indeed a kind of TRS, and that field has a rich offering of tools for useful automated reasoning. We guess they can be useful here.

The question that pushed us back to the old days is: given a language as interesting, simple and practical as CSS, what kind of object-oriented rewriting logic can be used to describe its operational semantics? You may not believe it, but this is a very important issue if we are interested in reasoning about CSS and HTML for refactoring purposes, among others. And we are, aren’t we?

CSS is rule-based, includes path-based pattern matching and is feature-equipped (semantically attributed), which all together yields a nice combination. CSS can be considered “destructive” in that it rewrites tags in place, but it only adds or changes (styling) attributes; the remaining “proper content” is not destructively rewritten. For that reason it is not generative (in contrast to XSLT and XQuery). And that leads to an interesting paradigm. For instance, the following is a typical simple CSS rule that sets some properties of every tag of kind body.

body {
     font-family: Arial, Helvetica, sans-serif;
     background-color: #423f43;
     text-align: center;
}

Of course, more explicit rules like this one can be declared. In addition, an inheritance (cascading) mechanism implicitly allows attributes to be pushed down or synthesized, as we know from attribute grammars.
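To make the "rule plus pushed-down attributes" reading concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the Node class, apply_rule and push_down are hypothetical names of ours, selectors are reduced to plain tag names, and the real CSS cascade (specificity, origin, order) is ignored. The only CSS fact it relies on is that font-family and text-align are inherited by default while background-color is not.

```python
from dataclasses import dataclass, field

# Properties that CSS inherits by default; background-color is not one of them.
INHERITED = {"font-family", "text-align"}

@dataclass
class Node:
    """Hypothetical toy tree node: a tag, its children and its style attributes."""
    tag: str
    children: list = field(default_factory=list)
    style: dict = field(default_factory=dict)

def apply_rule(node, selector, declarations):
    """Attribute-only rewrite: set the declarations on every node whose tag matches."""
    if node.tag == selector:
        node.style.update(declarations)
    for child in node.children:
        apply_rule(child, selector, declarations)

def push_down(node, inherited=None):
    """Copy inherited properties from parent to child unless the child sets its own."""
    for name, value in (inherited or {}).items():
        node.style.setdefault(name, value)
    passed = {k: v for k, v in node.style.items() if k in INHERITED}
    for child in node.children:
        push_down(child, passed)

tree = Node("html", [Node("body", [Node("p"), Node("table")])])
apply_rule(tree, "body", {
    "font-family": "Arial, Helvetica, sans-serif",
    "background-color": "#423f43",
    "text-align": "center",
})
push_down(tree)
body = tree.children[0]
paragraph = body.children[0]
```

After both passes the body node carries all three properties, while the paragraph only receives the two inherited ones. Note that the tree shape never changes; only attributes are rewritten, which is the non-generative behavior described above. Synthesized (bottom-up) attributes are not covered by this sketch.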

That is all nice, but we felt we had to be original and want to propose the crazy idea of using something similar to CSS for purposes beyond setting style attributes, for instance for expressing classification rules that recognize patterns like the ones we explained in previous posts: for example, that a table is actually a sort of layout object, a navigation bar or a menu, among others. Hence, we would have a human-readable querying and transformation language for Web Pages, a sort of CSS superset (keeping CSS as a metaphor, which we think might be a good idea).

For now, let us just show some examples (we warn that the concrete syntax of PaXQuAL is not yet definitive). For instance, we may want to eliminate the bgcolor attribute of any table having it, because it is considered deprecated in XHTML. We use the symbol “:-” to denote execution of the query/transformation, as in Prolog.

 :- table[bgcolor]{bgcolor:null;}
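For readers who prefer to see the intended effect in a conventional language, the following Python sketch does what we read this rule as meaning. It is not how PaXQuAL is implemented; drop_attribute and the sample PAGE are hypothetical, and only the standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

def drop_attribute(root, tag, attr):
    """Remove `attr` from every `tag` element that carries it; return how many were changed."""
    removed = 0
    for element in root.iter(tag):       # visits matching elements at any depth
        if attr in element.attrib:       # plays the role of the [bgcolor] selector
            del element.attrib[attr]     # plays the role of {bgcolor:null;}
            removed += 1
    return removed

# Hypothetical sample page: one table with bgcolor, one without.
PAGE = """<html><body>
<table bgcolor="#ffffff"><tr><td>layout</td></tr></table>
<div><table><tr><td>data</td></tr></table></div>
</body></html>"""

root = ET.fromstring(PAGE)
removed = drop_attribute(root, "table", "bgcolor")
remaining = [t for t in root.iter("table") if "bgcolor" in t.attrib]
```

Only the attribute is touched; both tables and their content stay where they were.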

We may want to add a special semantic attribute to every table hanging directly from a body, indicating that it may be a surface object for some later processing. We must first statically declare a kind of table, “sTable”, that accepts a surface attribute, because we are attempting to use static typing as much as possible (“Yes, I am still a typeoholic”).

@:- sTable::table[surface:boolean;]{surface:false}

The symbol “@:-” is like “:-” but operates at the terminological level. Then we have the rule that classifies any table instance hanging directly from the body tag:

:- body sTable{surface:true;}
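Again as a hedged illustration only, the declaration and the classification rule together can be mimicked in Python as follows. The function classify_surface_tables and the sample PAGE are hypothetical; we follow the prose ("directly" hanging from body) and therefore look at direct children only, and we store the boolean as the strings "true" and "false" because XML attributes are text.

```python
import xml.etree.ElementTree as ET

def classify_surface_tables(root):
    """Default every table to surface="false", then mark direct children of body as "true"."""
    for table in root.iter("table"):
        table.set("surface", "false")        # default taken from the sTable declaration
    for body in root.iter("body"):
        for table in body.findall("table"):  # a bare tag in findall matches direct children only
            table.set("surface", "true")
    return root

# Hypothetical sample page: table "a" hangs from body, table "b" is nested in a div.
PAGE = """<html><body>
<table id="a"><tr><td>page frame</td></tr></table>
<div><table id="b"><tr><td>inner data</td></tr></table></div>
</body></html>"""

root = classify_surface_tables(ET.fromstring(PAGE))
surface = {t.get("id"): t.get("surface") for t in root.iter("table")}
```

Here table "a" ends up with surface "true" and the nested table "b" keeps the default "false". Unlike this sketch, the typed declaration in PaXQuAL is meant to be checked statically.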

Many more "interesting" issues and features still need to be introduced; we will do that in forthcoming posts. Hence, stay tuned.