GEEK to GEEK Microsoft Internet Developer June 1999



Robert Hess

GEEK Has Microsoft issued any GUI guidelines for Web-based apps similar to the Windows® Interface Guidelines for Software Design? Applications built with HTML and Active Server Pages (ASP) have certain design limitations that prevent them from totally embracing the Windows Interface Guidelines.
GEEK You are obviously correct in noticing that the traditional Windows Interface Guidelines don't translate very well when talking about a Web application.

    Unfortunately, Microsoft currently doesn't have any kind of document that attempts to provide the same level of design guidelines for Web applications. There are numerous reasons for this. Personally, I think one of the great things about the Web is that there aren't such strict style guidelines. It's an extremely flexible and creative environment in which every designer is free to determine what approach works best for them, their content, and their users.

    There is also a significant difference between Windows and the Web. Think about it—there weren't any user interface guidelines for MS-DOS®-based applications; such a document would represent an artificial structure that wasn't based on any existing set of visual elements. MS-DOS didn't contain any sort of toolbox of user interface components. On the other hand, one of the primary benefits of Windows is that it supplies menu bars, scrollbars, listboxes, radio buttons, dialog boxes, and so on. These components expose a rich environment immediately identifiable as Windows.

    The Web is much closer to MS-DOS. Sure, there are some built-in controls, mostly related to forms. But virtually everything else is exposed in such a creative and independent fashion that to try to impose any sort of visual limitations would be too confining for the types of creative and exciting applications that are being designed.

GEEK In the February 1999 geektogeek column, someone asked if there was an easier way to deploy WebClass apps since his ISP did not install Visual Studio® 6.0. The solution was to find one that has all the appropriate Microsoft® apps installed. Does Microsoft provide free Web sites that have all the apps, like Windows NT® Server, Microsoft Internet Information Server (IIS) enabled, Visual Studio server-to-server installed, and so on?
GEEK Alas, no. Microsoft—or perhaps more specifically in this situation, MSN™—does not offer free Web site storage at the current time. I'm not sure which, if any, of the existing free Web storage places are hosting themselves fully with the technologies you list. I'm sure when this letter is published, I'll hear from them.

GEEK The link to more information discussing the IIS DocSummary component in your November 1998 column is no longer accessible, and I cannot find this information anywhere on the Microsoft Web site. Any idea where this has gone?
GEEK What you are looking for is the Summary Info or ASPSumInfo component. This component allows an ASP page to view the DocSumInfo information that Word and other Microsoft Office applications will add to the document files they save. This provides access to such useful information as Author, Title, Description, and so on.

    For some reason the download for it (and several other components) was removed from the IIS portion of the Microsoft Web site, and as of yet I haven't been able to locate anybody who's responsible for getting this put back up somewhere. However, ASPSumInfo and many other handy components come on the CD provided with the book Internet Information Server Resource Kit (Microsoft Press, 1998).

GEEK When I close Internet Explorer in VBScript using window.close, it displays an informational message. How can I close Internet Explorer without this message?
GEEK The message you are seeing, "The Web page you are viewing is trying to close the window. Do you want to close this window?" is there for security. Microsoft was worried that a rogue site might use the ability to close the current window without warning to pull pranks (or worse) on unsuspecting users. A way to get around this message would defeat its purpose.

    Nevertheless, there is a way around it. If the window being closed was opened by the same domain, no security alert is posted. The following code illustrates this.

<html>
<body>
<script language="vbscript">
    ' Size of the pop-up window, in pixels
    width = 170
    height = 100

    ' Build the feature string so the new window opens centered on the screen
    dim sFeatures
    sFeatures = "left=" & window.screen.availWidth/2 - width/2 & ","
    sFeatures = sFeatures & "top=" & window.screen.availHeight/2 - height/2 & ","
    sFeatures = sFeatures & "height=" & height & ","
    sFeatures = sFeatures & "width=" & width & ","
    sFeatures = sFeatures & "status=no,toolbar=no,menubar=no,location=no"
</script>
<input type=button value="window.open"
    onclick='window.open "WindowCloser.html", "whacker", sFeatures'>
<input type=button value="window.close" onclick="window.close()">
</body>
</html>

Save this to the file WindowCloser.html. If you start out by clicking the window.close button, you will see the problem you mention. If you first click the window.open button, then click window.close in the window that comes up, you will notice that no warning is brought up.
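
    The centering arithmetic in that feature string can also be written in JavaScript. What follows is only a minimal sketch: the function name centeredFeatures is hypothetical, and it takes the screen dimensions as parameters so that it can be tried outside of a browser.

```javascript
// Hypothetical helper: builds the feature string for window.open so that
// the new window is centered within the available screen area.
function centeredFeatures(availWidth, availHeight, width, height) {
  const left = availWidth / 2 - width / 2;
  const top = availHeight / 2 - height / 2;
  return [
    "left=" + left,
    "top=" + top,
    "height=" + height,
    "width=" + width,
    "status=no,toolbar=no,menubar=no,location=no"
  ].join(",");
}

// In a browser, the call would look something like this:
// window.open("WindowCloser.html", "whacker",
//     centeredFeatures(screen.availWidth, screen.availHeight, 170, 100));
```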

GEEK I recently received some Microsoft Word-format files to be posted on my Web page. I saved them as HTML in Word 2000. The results look good, but I end up with gigantic files. I am not interested in round-trip capability and would like to edit the pages in FrontPage®. How do I get rid of the extras and obtain a simple HTML file?
GEEK Word is trying hard to format the document just as it was formatted as a .doc file. In the past, HTML didn't provide many fancy formatting capabilities that Word could take advantage of, but with the advent of Cascading Style Sheets, it's possible to get pretty close.

    Word 2000 is also taking advantage of XML so that as much of the original information of the Word document as possible is preserved in the HTML file. When I opened your Web page from within Word 2000, it restored what looked like all of the formatting and field codes from the original, and even the reviewer's notes in the document, including the one that states:

"screen resolution looks high to me ... might just be my machine though? Be careful on this - we can't afford a poor screenshot in ch1"

    But all of that formatting adds bulk to the page, not only in the definition of the appropriate styles at the top of the page, but also in the assignment of the styles throughout the document.

    For bulk conversion of lots of large complex Word documents, leaving the files as they are is probably not a bad choice. But if you only have a few pages, and you really want to tweak them down, the best way is to manually edit the HTML files directly. Since most (if not all) of the WYSIWYG Web page editors will add superfluous elements, attributes, and comments to the HTML that they export, I think it is a good idea to be familiar enough with raw HTML so that you can manually distill the output pages when appropriate.

    When I have Word documents that I want to convert to minimally formatted HTML files, I save the files as text only, then I manually add just enough HTML to the output text file for the formatting I want. Sometimes this will take a little work depending on how complex the original document was, and how close to the original I want or need the resulting HTML to be.

    In the first pass, I'll add the tags to put in the paragraph breaks, bullets, indents (normally using <blockquote>), and section headings. Then, comparing this to the original document, I'll go back and insert any bolding, italics, or other highlighting that's missing. I might at this time convert some of the document areas to tables for better control over formatting and layout. Once I have the document working well, I'll go back and work in any images in the document so that they are placed properly. Often, I'll even resort to generating thumbnails of the larger images and using the thumbnail in the document, then linking it to the full-size version of the image so the user can get to it if they need it.
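
    The paragraph-break part of that first pass is mechanical enough to script. As a rough sketch only, assuming the text-only file separates paragraphs with blank lines, something like the following could do it. The function names escapeHtml and textToParagraphs are hypothetical, and bullets, indents, and headings would still need to be added by hand.

```javascript
// Replace the characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps each blank-line-separated block of text in <p> tags.
function textToParagraphs(text) {
  return text
    .split(/\r?\n\s*\r?\n/)
    .map(function (block) { return block.trim(); })
    .filter(function (block) { return block.length > 0; })
    .map(function (block) { return "<p>" + escapeHtml(block) + "</p>"; })
    .join("\n");
}
```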

    The resulting file is usually not only far smaller than what was automatically generated, but it makes a better Web page.

GEEK I am having problems with Web-crawling spiders getting my email address off our site. Is there a way to hide the address from the spiders but still have it viewable on my Web site?
GEEK You'd need to figure out how the spiders extract the addresses from your site, and how sophisticated they are. It is relatively simple to write a text parser that will look for

<a href="mailto:...">

links in an HTML file, and then pull out the email address.
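
    To give a sense of how little work this takes, here is a minimal sketch of such a parser in JavaScript. It is not based on any actual spider. The function name extractMailtoAddresses and the regular expression are my own assumptions about how one might be written.

```javascript
// Hypothetical harvester: returns every address found in a mailto: link.
function extractMailtoAddresses(html) {
  // Matches href=mailto:..., with or without quotes around the value.
  const pattern = /href\s*=\s*["']?mailto:([^"'\s>?]+)/gi;
  const found = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.push(match[1]);
  }
  return found;
}
```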

    There are a couple of ways to defeat this. One, of course, is to not put email addresses on your site. I'll assume that this is not the sort of answer you are looking for.

    Another option would be to put an indirect address on your site. Instead of posting the "real" email address that you use for normal business/personal purposes, open up a Web-based email account on a service like HotMail™, and post that address on your site. When you see a legitimate piece of email come through, you then just manually transfer it to your real account and carry on from there. Not an elegant solution, but one that works.

    Another option, depending on exactly how and to whom you are exposing the email addresses, would be to use the trick that Usenet users have been forced to adopt for similar reasons. When they post messages in newsgroups, their email address can be included in the message headers. Unfortunately, folks were writing applications that sifted through the newsgroups and extracted all the email addresses to build their bulk mail lists. Usenet users soon began mangling the email addresses they put into their profiles, often listing them as something like:

MyName@nospam.domain.com

In their signatures, the users might alert folks that they need to remove the "nospam" to send them email. You might consider doing the same thing on your pages.

    One more option is to require the host browser to deal with script code of some fashion. You can try to outsmart the spider by adding code onto the page that the spider application would not recognize, but that a Web browser would, like this:

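<script>
// Write the link in pieces so the complete mailto: anchor never
// appears as a single string in the page source.
document.write("<a href=");
document.write("mailto:");
document.write("MyName@HotMail.com");
document.write(" >");
document.write("Send me email");
document.write("<\/a>");
</script>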

You might even want to break it up further than that, getting the @ into a line all of its own, just in case the spider is looking for the xxx@yyy.zzz format to identify an email address.
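
    One way to break it up further is sketched below. The helper name buildMailLink and the link text are hypothetical. The point is only that the user name, the @, and the host are joined at run time, so the complete address never appears as one string in the page source.

```javascript
// Hypothetical helper: assembles a mailto: link from separate pieces.
function buildMailLink(user, host, label) {
  let address = user;
  address += "@";   // the @ sits on a line of its own
  address += host;
  return '<a href="mailto:' + address + '">' + label + "<\/a>";
}

// In a page, the result would be handed to document.write:
// document.write(buildMailLink("MyName", "HotMail.com", "Send me email"));
```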

    Then there are multiple variations of this approach. Even trickier would be to enumerate through the document object, locate all anchors on the page, and manually reset the href to be a valid email address. You could perhaps use a slightly mangled email address in the anchor to begin with, and the code on the page would know, simply by looking at this mangled name, what the real name should be.
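
    Here is a minimal sketch of that last idea. It assumes the mangling scheme is the "nospam" one described earlier, with "nospam." inserted right after the @. The function names unmangle and fixMailLinks are hypothetical.

```javascript
// Assumed mangling scheme: "nospam." is inserted right after the "@".
function unmangle(href) {
  return href.replace("@nospam.", "@");
}

// Walks any array-like collection of anchors and repairs the mailto: links.
function fixMailLinks(anchors) {
  for (let i = 0; i < anchors.length; i++) {
    const href = anchors[i].href;
    if (href && href.toLowerCase().indexOf("mailto:") === 0) {
      anchors[i].href = unmangle(href);
    }
  }
}

// In a browser, run this once the page has loaded:
// fixMailLinks(document.getElementsByTagName("a"));
```

    A spider that reads only the raw HTML would see just the mangled address, while a user clicking the link in a script-enabled browser would get the real one.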

From the June 1999 issue of Microsoft Internet Developer.