I'm not talking about the XML serialization (the <node/> business most people are familiar with) but the standardized infoset data structure, with the end goal of making every kind of data file on a Unix system accessible through a standard query language.
In other words, I want to be able to do something like this:
$ query --tsv login, uid, name[last='Smith'] < /etc/passwd > mysmiths.tsv

as just a simplistic example. Instead of needing to write different scripts to query the myriad different datafile formats, the same query tool and language would be able to pull data from all of them and output the information in any requested format: tab- or comma-separated values for flat-table results, a binary infoset to be piped to XML-enabled tools, or an XML serialization.
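To make the idea concrete, here's a minimal sketch of what such a tool's core might do for /etc/passwd: parse the colon-separated records into named fields, filter on a predicate, and emit the requested fields as TSV. The `query` command itself, its predicate syntax, and these function names are all hypothetical; this is just one plausible shape for the backend.

```python
# Sketch of the hypothetical query tool's core for /etc/passwd:
# parse colon-separated records into named fields, filter, emit TSV.
FIELDS = ["login", "password", "uid", "gid", "name", "home", "shell"]

def parse_passwd(lines):
    """Yield one dict per passwd entry, keyed by field name."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield dict(zip(FIELDS, line.split(":")))

def query(records, wanted, predicate=None):
    """Yield the wanted fields from each record matching the predicate."""
    for rec in records:
        if predicate is None or predicate(rec):
            yield [rec[f] for f in wanted]

# Usage with sample data, mimicking name[last='Smith']:
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "jsmith:x:1000:1000:John Smith:/home/jsmith:/bin/bash",
]
for row in query(parse_passwd(sample), ["login", "uid", "name"],
                 predicate=lambda r: r["name"].endswith("Smith")):
    print("\t".join(row))  # prints: jsmith	1000	John Smith
```

A real implementation would compile the query expression into the predicate instead of hard-coding a lambda, but the data flow is the same.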
This requires a fast backend that can index fields and cache query results on unmodified files, which may in turn require modifications to the file system and/or kernel. I'd prefer the system to keep both binary and text versions of the databases so you could update things the old-fashioned way; when you change one, the other updates. If you try to write to the text version of a data file and it's not well formed, the system should complain at you and tell you where the parse error is.
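That last check is easy to sketch with any XML parser that reports error positions. Here's a rough illustration using Python's stdlib parser (the function name and message format are my own invention, not part of any proposed system):

```python
# Sketch of the well-formedness gate the system might run before
# accepting a write to the text version of a data file.
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return (True, None) if text parses as XML,
    else (False, message) with the error location."""
    try:
        ET.fromstring(text)
        return (True, None)
    except ET.ParseError as e:
        line, col = e.position  # (1-based line, 0-based column)
        return (False, f"parse error at line {line}, column {col + 1}: {e}")

# A mismatched tag is rejected with a location, not silently written:
ok, err = check_well_formed("<passwd><user login='root'></passwd>")
# ok is False; err names the line and column of the mismatch
```

The kernel-level version would presumably validate against the file's schema too, not just well-formedness, but the user-visible behavior is the same: the write fails with a pointer at the offending spot.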
A further advantage of XMLifying data files is that a file's data structure schema can define all the possible settings for certain things. Instead of working on a blank file and having to dig through a manual of undefined quality to remember what keywords you need, the editor can present all the possible options for a setting and descriptions thereof.
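For instance, a schema could enumerate the legal values for a field, and a schema-aware editor could offer them as a pick-list. A hypothetical XML Schema fragment for the shell field of a passwd-like file might look like:

```xml
<!-- Hypothetical schema fragment: an editor can read this
     enumeration and offer the valid shells as completions. -->
<xs:element name="shell">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="/bin/sh"/>
      <xs:enumeration value="/bin/bash"/>
      <xs:enumeration value="/usr/sbin/nologin"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

Add an xs:annotation/xs:documentation element under each value and the editor gets the descriptions for free as well.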
Thanks to quercus for setting me off on this track.
Addendum: It turns out that I've mentioned something along these lines before. The query example I gave then was "cat /etc/passwd[@uid=0]@login" to get the usernames of all root users, but it's pretty much the same basic idea.