| Warrior Tang ( @ 2008-06-20 08:39:00 |
| Current music: | Goo Goo Dolls - Iris |
Sitemaps
The web grew a new feature while I wasn't looking. A sitemap is an XML directory of the files on a website, which search engine crawlers can use to find information on the site. Since sitemaps started out as a Google technology and the other search engines have hopped on the bandwagon, anyone interested in search engine optimization will want one.
I seem to recall that way back in the old days, there was a standard that did much the same thing as sitemaps. It did not require any special new keywords or data structures. It worked in plain HTML, and you could integrate it into your site. You would make a hyperlinked HTML index of the information on your site, or at least the public information that you wanted visitors (and crawlers) to see. You would then save this index in a file called index.html.
Sitemaps give you a way to specify when a web resource was last modified and how frequently it changes. The web server is already supposed to handle that with the HTTP Last-Modified and Expires headers. Not that any web server makes it easy to say when the pages on your site expire, but the standards are already in place. Sitemaps also let you attach metadata to each page, such as a "priority" number that ranks its importance relative to the other pages on the site. That sort of thing could be handled in an HTML meta tag.
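To make the fields above concrete, here is a minimal sketch in Python that writes a small sitemap using the element names from the sitemaps.org protocol (loc, lastmod, changefreq, priority). The function name build_sitemap, the example.com URLs, and the particular values are my own assumptions for illustration, not anything a real site uses.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional. (Hypothetical helper, for illustration only.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Emit the fields in the order the protocol lists them.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")


# Made-up pages: a front page that changes daily and a rarely touched archive.
example_pages = [
    {"loc": "http://www.example.com/", "lastmod": "2008-06-20",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/archive.html",
     "changefreq": "yearly", "priority": 0.3},
]

sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

Note that everything in each url element except loc is optional, which is why the archive page above gets away without a lastmod.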
All in all, it seems to me that sitemaps shouldn't be necessary. The main problem they solve is that people don't know how to set the HTTP Expires header in their web server, so most people never did anything to tell crawlers how often to revisit their site. On the upside, sitemaps provide a portable format that any future web server could easily load and then present correctly in HTTP.
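As a rough sketch of that last idea, the Python below reads a sitemap and derives Last-Modified and Expires header values for each URL. No web server I know of does this; the function name headers_from_sitemap and the mapping from changefreq to a lifetime are my own assumptions (the protocol treats changefreq as a hint only), and the sketch assumes lastmod uses the date-only YYYY-MM-DD form.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed mapping from changefreq to a cache lifetime. "never" is left out
# on purpose, so such pages simply get no Expires header here.
CHANGEFREQ_TO_LIFETIME = {
    "always": timedelta(0),
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def headers_from_sitemap(sitemap_xml, now):
    """Map each URL in the sitemap to a dict of HTTP header values.

    `now` must be a timezone-aware UTC datetime. (Hypothetical helper.)
    """
    headers = {}
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        changefreq = url.findtext("sm:changefreq", namespaces=NS)
        page_headers = {}
        if lastmod:
            # Assumes the date-only form; treats it as midnight UTC.
            modified = datetime.strptime(lastmod, "%Y-%m-%d")
            modified = modified.replace(tzinfo=timezone.utc)
            page_headers["Last-Modified"] = format_datetime(modified, usegmt=True)
        if changefreq in CHANGEFREQ_TO_LIFETIME:
            expires = now + CHANGEFREQ_TO_LIFETIME[changefreq]
            page_headers["Expires"] = format_datetime(expires, usegmt=True)
        headers[loc] = page_headers
    return headers


# Made-up sitemap with a single page.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-06-20</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>"""

sample_headers = headers_from_sitemap(SAMPLE_SITEMAP, datetime.now(timezone.utc))
for loc, page_headers in sample_headers.items():
    print(loc, page_headers)
```

A real server would also have to reconcile these values with the file's actual modification time, which the sketch ignores.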