Drew DeBruyne
Microsoft Corporation
April 27, 1998
Contents
Introduction
Constructing a Site Vocabulary
Integrating Knowledge Manager with Site Server Content Management
Building Content Sources and Search Catalogs
Integrating Knowledge Manager with Exchange Public Folders
Integrating Knowledge Manager with Site Server Content Deployment
Editor's Note: Many examples used in this article are contained in the Site Server 3.0 documentation. To get the most out of this collection of tutorials, download Site Server 3.0 and follow along as you read.
Site Server Knowledge Manager, a Web application built on top of Site Server's Push, Personalization, and Search features, provides a great knowledge management solution that you can tailor to your organization's needs. From the start, Knowledge Manager enables members of an organization to browse heterogeneous information sources through one interface, save frequently used searches in personal "knowledge briefs," share knowledge briefs with other members in the organization, and have knowledge brief updates delivered on a daily basis by e-mail or via a channel. Knowledge Manager is designed to be the one place people go for information.
As an administrator, you can maximize the effectiveness of Knowledge Manager for your organization in numerous ways. This document contains a collection of strategies, how-to instructions, and tips for effectively deploying Knowledge Manager on your organization's Web site. This article describes how to do the following:
Setting up a good site vocabulary, which is the set of categories through which your users can browse for information, is essential for Knowledge Manager to be effective. The first section will help you create a good site vocabulary. The second section will help you configure Knowledge Manager to view information published to your site through Site Server Content Management. The third section describes how you can divide the information that you present to your users through Knowledge Manager into multiple content sources for more effective searching. The fourth section describes some advanced techniques for integrating Knowledge Manager with Exchange Public Folders. Finally, the last section will describe how to replicate content from one location to another and display the information in Knowledge Manager.
Site Server 3.0 exposes a powerful tool for organizing content on your intranet site: the site vocabulary. The site vocabulary provides a centrally managed taxonomy of terms that can be applied to content as metadata. Because the site vocabulary is stored in the Site Server membership directory, the administrator must use the Membership Directory Manager to create and maintain the site vocabulary. Then, when users look for information on your site, they can simply browse through the site vocabulary terms using the Knowledge Manager Web application. Each vocabulary term becomes a "category" in Knowledge Manager in which your users can find information.
Along with content sources, a fundamental way to organize information in Knowledge Manager is by constructing a good site vocabulary. A good site vocabulary will have an intuitive organizational structure, will contain clues for your users about where information can be found, and will be structured so that you can easily add terms without disrupting your users. Indeed, putting some thought into your site vocabulary early on will ensure that your Knowledge Manager deployment is useful to your users.
If you have not done so already, it may be useful for you to look at the Knowledge Manager interface. After installing Site Server 3.0 and following the Knowledge Manager configuration steps listed in the Site Server documentation, use a Web browser to open http://your_machine/siteserver/knowledge, where "your_machine" is the name of the machine on which Site Server is installed. On the opening page, look at the category browser tree control. The hierarchy shown in the tree control is the site vocabulary. This section provides tips and tricks for building a good site vocabulary for use with Knowledge Manager.
Before moving forward, it is important to present and define some terms that will be used throughout this section:
The site vocabulary that you create for Knowledge Manager depends heavily on how you want to organize the content that your site presents. Dividing content into well-organized subjects is important because it makes browsing and searching the content easier for your users. For example, let's say you work at an automobile company and use Knowledge Manager in your workgroup to present information about internal projects. You may want to set up a site vocabulary like this:
Projects
    Cars
        Project Flame
        Project Zoom
        Project Screech
    Trucks
        Project Big
        Project Bigger
        Project Huge
In this site vocabulary, information is organized by project. On the other hand, you may want to organize the same information by job type, depending on the structure of your organization. For example, you might have the following vocabulary:
Projects
    Sales Force
        Datasheets
            Project Flame
            Project Zoom
            Project Screech
    Engineering
        Specifications
            Project Big
            Project Bigger
            Project Huge
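Either hierarchy can be thought of as a tree of terms in which every node is a browsable category. The following is a minimal sketch of that idea using the by-project vocabulary. It assumes the nesting Projects > Cars/Trucks > individual projects, and the names VOCABULARY and category_paths are illustrative only; Site Server stores the real vocabulary in the membership directory, not in Python.

```python
# Hypothetical in-memory model of the by-project example vocabulary.
# Each key is a vocabulary term; its value holds the child terms.
VOCABULARY = {
    "Projects": {
        "Cars": {
            "Project Flame": {},
            "Project Zoom": {},
            "Project Screech": {},
        },
        "Trucks": {
            "Project Big": {},
            "Project Bigger": {},
            "Project Huge": {},
        },
    }
}


def category_paths(tree, prefix=()):
    """Yield every category as a tuple of terms, from the root down."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield path
        yield from category_paths(children, path)


if __name__ == "__main__":
    # Print each category the way a user would reach it while browsing.
    for path in category_paths(VOCABULARY):
        print(" > ".join(path))
```

Listing the full paths this way is a quick check that each category name still makes sense when read together with its parents.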
Keep in mind that your users will find content in a category only if:
The key point here is that you should organize your site vocabulary -- and thus your content -- in a way that makes sense for your users, the people who will be browsing for information. Keep in mind that your users might want to save a particular category into their private knowledge briefs and then have category updates delivered to them. You want to create categories that would allow individual users to have the most appropriate content delivered to them if they choose to subscribe to a category.
The steps for maintaining and augmenting the site vocabulary are explained in the Site Server documentation, but it is useful to walk through the steps here. Here's how to create two simple site vocabulary terms:
In some cases, it will make sense to have a single site vocabulary for every content source you have defined. For example, suppose you are employing Knowledge Manager in your workgroup to keep track of emerging technologies in the computer industry. You have configured a separate content source for each of a number of technological think tanks, university engineering departments, and competitors' Web sites. Because all of the information across all the content sources relates to emerging technologies, using a single site vocabulary for all the content sources is appropriate.
In other cases, however, you will find that the information in each of your content sources is sufficiently different to justify having a separate vocabulary for each one. (Note: as will be described later, Knowledge Manager uses a single site vocabulary internally, but you can make it seem like there are multiple vocabularies in use.) For example, say you have a content source that contains classified ads posted on your intranet, and another that contains human resources policy documents. A vocabulary term such as "TVs and VCRs," which makes sense for classified ads, does not make sense for human resources information. For your users to see "TVs and VCRs" while they're browsing the human resources content source would be very confusing.
In cases such as this, where the information in one content source is very different from the information in others, you will want to consider maintaining a separate vocabulary for each. This gives you the ability to present disparate information on your Knowledge Manager site while still giving your users a focused searching and browsing experience. In the example given above, the human resources content source would have a different vocabulary (with terms like "guidelines" and "jobs available") than the classified ads content source (which would have terms like "TVs and VCRs" and "Rocking Chairs").
Internally, Knowledge Manager can use only one site vocabulary. However, by anchoring your content sources to specific terms in your site vocabulary, you can give the appearance of multiple vocabularies to your users. For example, suppose you have separate content sources for HR documents and classified ads, as above. For the HR documents, you want your users to see this vocabulary:
Guidelines
    Work Regulations
    Performance Reviews
Jobs Available
    Engineering
    Management
    Sales
And for the classified ads, you want your users to see this vocabulary:
Electronics
    TVs and VCRs
    Computers
    Stereos
Furniture
    Couches
    Beds
    Chairs
Rentals
    Vacation
    Home
    Apartments
You would still need to create one central site vocabulary by using the standard Site Server Web-based vocabulary editor. You would lay out the vocabulary by combining the vocabularies for each content source as follows:
Human Resources
    Guidelines
    ...
Classified Ads
    Electronics
    Furniture
    ...
(The ellipses indicate additional categories that are not shown here.)
In this example, two top-level nodes have been added: one for "Human Resources" and another for "Classified Ads."
The next step would be to anchor each content source to the relevant term in the single site vocabulary -- the classifieds content source would be anchored to "Classified Ads," and the HR content source to "Human Resources." To anchor a content source, you must perform the following steps:
Now, to see that your content source was anchored correctly, go to Knowledge Manager using a Web browser and select the content source that you just anchored from the "Search for documents in" drop-down list. You should see that the vocabulary associated with your content source now starts at the vocabulary term anchor, and not at the root node of the entire site vocabulary.
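The effect of anchoring can be pictured as selecting a subtree of the single site vocabulary. The sketch below is only a conceptual model, not how Knowledge Manager is implemented: the ContentSource class, the SITE_VOCABULARY dictionary, and the visible_vocabulary function are hypothetical names, and anchors are assumed to be top-level terms for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the combined site vocabulary from the example.
SITE_VOCABULARY = {
    "Human Resources": {
        "Guidelines": {"Work Regulations": {}, "Performance Reviews": {}},
        "Jobs Available": {"Engineering": {}, "Management": {}, "Sales": {}},
    },
    "Classified Ads": {
        "Electronics": {"TVs and VCRs": {}, "Computers": {}, "Stereos": {}},
        "Furniture": {"Couches": {}, "Beds": {}, "Chairs": {}},
    },
}


@dataclass
class ContentSource:
    """Illustrative stand-in for a content source and its vocabulary anchor."""
    name: str
    anchor: Optional[str] = None  # top-level term, or None for the root


def visible_vocabulary(source, vocabulary=SITE_VOCABULARY):
    """Return the part of the vocabulary a user sees for this source."""
    if source.anchor is None:
        return vocabulary          # unanchored: browsing starts at the root
    return vocabulary[source.anchor]  # anchored: browsing starts at the term


if __name__ == "__main__":
    hr = ContentSource("HR documents", anchor="Human Resources")
    print(list(visible_vocabulary(hr)))
```

In this model, a user browsing the HR content source never sees "TVs and VCRs," because that term lives under a different anchor.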
With Site Server Content Management, you can enable your users to post content directly to a site. They can simply go to a Web page and complete a simple document submission process. You can make it easy to navigate through published documents by cataloging them with Site Server Search and then creating a suitable site vocabulary for use in Knowledge Manager. This section points out some effective ways to create a site vocabulary for documents published through Site Server Content Management.
In Content Management, all published documents are kept in content stores. Thus, a content store is simply a container for published documents. Each content store can contain multiple content types. A content type is just a list of information that the user must fill out when publishing a document. For example, the Content Management sample site, CMSample, contains eight different content types by default. Each content type contains a list of content attributes -- or content metadata -- that the user must fill out when submitting content. The default "White papers" content type, for instance, contains such attributes as Author, Editor, Topic, and Abstract. When Site Server Search catalogs published documents, it picks up all the metadata that the user filled out and makes it searchable. This makes it possible to, for example, find all documents whose Author field contains "Joe." Overall, then, a content store contains content types that contain content attributes. More information on these concepts can be found in the Content Management section of the Site Server documentation.
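The containment relationship (a content store holds content types, and each content type lists the attributes a submitter must fill out) can be sketched as a small data model. This is an illustration only: the class and function names are hypothetical and do not correspond to any Site Server API. The attribute names come from the "White papers" example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentType:
    """A named list of attributes the submitter must fill out."""
    name: str
    attributes: List[str]


@dataclass
class ContentStore:
    """A container for published documents, organized by content type."""
    name: str
    content_types: Dict[str, ContentType] = field(default_factory=dict)

    def add_type(self, content_type):
        self.content_types[content_type.name] = content_type


def missing_attributes(content_type, submission):
    """Return the required attributes that the submission left empty."""
    return [a for a in content_type.attributes if not submission.get(a)]


if __name__ == "__main__":
    store = ContentStore("CMSample")
    white_papers = ContentType(
        "White papers", ["Author", "Editor", "Topic", "Abstract"]
    )
    store.add_type(white_papers)
    print(missing_attributes(white_papers, {"Author": "Joe", "Topic": "Fusion"}))
```

The values a user supplies for these attributes are the metadata that Site Server Search later makes searchable.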
After setting up your own content store and content types per the instructions in the Site Server documentation, you may wonder how you can make it easy for your users to find documents once they are published. One simple way is to create a site vocabulary that groups published content into categories that are meaningful to your users. Furthermore, by presenting it through Knowledge Manager, you also enable your users to directly search the published content. So, to make published content available in Knowledge Manager, you need to do two things: you must catalog the content using Site Server Search and you must create a site vocabulary that organizes the content for your users. These two steps are described in detail in this section.
[Note that the Publishing features of Site Server must be installed for the following steps to work. To install Site Server Publishing, use the setup program on the Site Server CD. By default, Site Server Publishing is included in the "Typical" installation. You can check to see that Publishing is installed by going to http://your_machine/siteserver/admin. If the link to "Publishing" is live, Publishing is installed.]
Because Knowledge Manager displays information kept in Site Server Search catalogs, you must create a Search catalog definition for the documents that your users will submit for publication. After creating the catalog, it will be necessary to create a Content Source, which provides meta-information about the catalog to Knowledge Manager.
As a spot check, you may want to verify that the content source shows up in the Knowledge Manager interface. To do this, launch your Web browser, go to Knowledge Manager (http://your_machine/siteserver/knowledge), and click the "Search for all content in" drop-down list box. You should see your new content source listed there.
One of the key benefits of using Site Server Content Management is that it enables you to collect arbitrary metadata for each document that a user publishes to your site. Because this metadata is cataloged by Site Server Search, you can then create a site vocabulary that helps your users browse through published documents by way of the document metadata. Two examples help illustrate this important point:
Content by Type
White papers
Specifications
Test Plans
Content by Project
Fusion Car
Fission Car
Rocking Chair
The great part about a structure like this is that content will show up in multiple categories. For example, a white paper about the Fusion Car project would show up in both "White papers" and "Fusion Car." Because there are multiple paths to the same information, your users will find it easier to locate documents; each user may have her own mental model for where a particular document will be found.
Now that you have created a Search catalog and content source for your published content, and thought a little about how to structure a site vocabulary that is appropriate to the content, the next step is to actually create the site vocabulary. To do this, you should use the Web-based Membership Directory Manager as you did in the previous section on creating site vocabulary. Each category in your vocabulary needs to be associated somehow with documents published to your content store, so that documents show up in the appropriate category. While there is not any implicit link between a category in the site vocabulary and published documents, an explicit link can be made through the mechanism of the associated category query. This is simply a search query that executes whenever someone browses to that category. Because it can be expressed with all the power of the standard Site Server Search query syntax, the associated query can zero in on particular metadata fields such as "content type" or "author."
So, suppose you wanted to enable the first example given above, where your users can browse through published content by content type. Each category would have an associated query that searches for documents tagged with the appropriate content type. More specifically, this would require that the associated query for a given category would be "@Meta_ContentType ContentType", where ContentType is the name you have given to the content type for that category. The "@Meta_ContentType" part of the query means you are limiting your search to the "Meta_ContentType" metadata field. (Note that Site Server Content Management automatically tags published documents with their content type in the ContentType meta field.)
Enabling the second example would be just as easy. Rather than limiting the associated category queries to the "ContentType" field, though, you would use the "project" field. For instance, the "Fusion Car" category would have an associated category query like the following: "@meta_project Fusion Car". Note again the use of standard Site Server Search query syntax. Here we are just searching for documents that have "Fusion Car" in the project field. All metadata fields in documents published through Content Management are cataloged by Site Server Search, which means that you can use those fields in Knowledge Manager to create different "slices" through your published content. The metadata fields are searchable using the "meta_" syntax. For example, if you have a metadata field in your content type called "foo," you can limit searches to that field by using "@meta_foo string", where string is the text for which you are searching. To see which attributes are defined for your content type, you can use Site Server Web-based administration as follows:
For more information on content type metadata fields and the way they are searched, consult your Site Server Search and Content Management documentation.
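The associated category queries described above all follow the same "@meta_field value" pattern, so they are easy to generate consistently. The following sketch builds the query strings and uses a deliberately naive matcher to show why one document can appear in several categories. The function names and the CATEGORIES table are hypothetical, and the matcher is only a stand-in for Site Server Search, which does the real matching.

```python
def category_query(field, value):
    """Build an associated category query of the form '@meta_<field> <value>'."""
    return f"@meta_{field} {value}"


# Hypothetical categories: name -> (metadata field, text to look for).
CATEGORIES = {
    "White papers": ("ContentType", "White papers"),
    "Specifications": ("ContentType", "Specifications"),
    "Fusion Car": ("project", "Fusion Car"),
    "Fission Car": ("project", "Fission Car"),
}


def matching_categories(document_meta):
    """Naive stand-in for Search: list categories whose text occurs in the field."""
    return [
        name
        for name, (field, value) in CATEGORIES.items()
        if value.lower() in document_meta.get(field, "").lower()
    ]


if __name__ == "__main__":
    for name, (field, value) in CATEGORIES.items():
        print(f"{name}: {category_query(field, value)}")
    doc = {"ContentType": "White papers", "project": "Fusion Car"}
    print(matching_categories(doc))
```

A white paper tagged with the Fusion Car project matches one content-type category and one project category, which is the "multiple paths to the same information" effect.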
For a published document to show up in a category in Knowledge Manager, it must have already been cataloged by Site Server Search. This highlights an important point: because content will not show up in Knowledge Manager until it has been cataloged, you need to configure your Search catalog to run on a schedule. For example, if you configure your catalog to be rebuilt every hour, content will show up in a category a maximum of one hour after it is submitted. If your site contains a lot of content, it may be more appropriate to use a longer catalog refresh interval because cataloging is a computer resource-intensive process. If you notice that your Search machine is constantly cataloging information -- that is, by the time it finishes one cataloging run it is already time to start another -- you should increase the catalog rebuild interval.
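The "constantly cataloging" symptom amounts to a simple comparison between the rebuild interval and how long builds actually take. A minimal sketch, assuming you have noted some build durations yourself (for example, from the catalog status display); the function name and sample figures are illustrative.

```python
from datetime import timedelta


def needs_longer_interval(interval, observed_build_times):
    """True if any observed build took at least as long as the rebuild interval."""
    return any(build >= interval for build in observed_build_times)


if __name__ == "__main__":
    hourly = timedelta(hours=1)
    # Hypothetical measurements: one build overran the hourly schedule.
    builds = [timedelta(minutes=20), timedelta(minutes=65)]
    print(needs_longer_interval(hourly, builds))
```

If the check comes back true, the next scheduled build is due before the current one has finished, and a longer interval is in order.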
There is one last issue of which you should be aware. Content Management tags different file formats in different ways. For HTML documents, Content Management simply writes the metadata into the file as HTML Meta tags. For OLE documents such as Microsoft Word or Microsoft Excel documents, Content Management writes the metadata into the OLE property stream. Because the metadata is written into the HTMLMetaPropertySet of the OLE document, it is still searchable using the "@meta_" syntax.
For non-OLE, non-HTML documents such as executables, bitmaps, JPEGs, and so on, Content Management actually creates a "stub" file that contains the metadata. The stub file has a .stub file extension, is formatted in HTML, and contains a Web browser redirect to the actual published document. Thus, when a metadata search matches a non-OLE, non-HTML document published through Content Management, the stub file is what appears in the results list. When the user clicks that result, the browser opens the stub file and is immediately redirected to the actual published file.
Now, here's the rub: For Site Server Search to catalog the stub files created by Content Management, the .stub file extension must be added to the catalog definition's crawled file types. To add .stub files to the crawled file types for your catalogs, do the following:
After your next crawl, your Search catalog will contain the stub files that Content Management produces.
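To make the stub-file idea concrete, the sketch below generates an HTML page that carries metadata as Meta tags and redirects the browser to the real document. This is not the actual format that Content Management writes, which this article does not show; the make_stub function, the use of a meta refresh for the redirect, and the sample path are all assumptions chosen only to illustrate the mechanism.

```python
from html import escape


def make_stub(target_url, metadata):
    """Return illustrative stub HTML: metadata as Meta tags plus a redirect."""
    meta_tags = "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in metadata.items()
    )
    return (
        "<html>\n<head>\n"
        f"{meta_tags}\n"
        # Assumed redirect mechanism: an immediate meta refresh.
        f'<meta http-equiv="refresh" content="0; url={escape(target_url)}">\n'
        "</head>\n<body></body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical published file and metadata.
    print(make_stub("/docs/fusion_logo.jpg", {"project": "Fusion Car"}))
```

Because the stub is ordinary HTML, Search can read its Meta tags like those of any other page once the .stub extension is among the crawled file types.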
To summarize, you can make your published documents browsable in Knowledge Manager through a carefully constructed site vocabulary. Furthermore, through the site vocabulary, you can make the same document available in multiple categories, which may make it easier for your users to find information. There are two main steps to integrating Content Management documents with Knowledge Manager:
In Site Server 3.0, sources of information are represented by what's called a content source. A content source describes the underlying information source, which can be open database connectivity (ODBC) databases, Index Server indices, or Site Server Search catalogs. It adds meta-information about the content that is not contained in the underlying information source. For example, the content sources that Knowledge Manager uses to describe Site Server Search catalogs contain additional information about site vocabulary anchors.
Searchable information can be divided into multiple content sources in Knowledge Manager. For example, you may want to configure one content source so that it contains information from sources external to your company, like Internet news sites. You may want to configure a separate content source that contains internal information. Then, when users come to your Knowledge Manager site, they will be able to narrow their searches to the type of information they want by simply choosing the right content source. This section contains suggestions for building effective content sources.
All searchable information available in Knowledge Manager comes from Site Server Search catalogs. That is, Knowledge Manager can only use content sources based on Search catalogs. As such, the catalogs that you build with Search define the entire space of information that Knowledge Manager users can access. For this reason, it is important that you carefully plan the Search catalogs so that people can easily find the information they need.
If your site is small and you do not plan on offering a multitude of heterogeneous information to your users, you may consider having just one Search catalog. This simplifies the search experience for your users because they do not have to decide which catalog to search; they always search the single catalog that you have defined. In this scenario, you would add all crawl start addresses to the same catalog definition in Site Server Search, and then create one content source that makes the Search catalog available in Knowledge Manager.
If you are running a larger Knowledge Manager site, you will probably want to segment the information your site offers into multiple Search catalogs. There are numerous ways to divide information into multiple Search catalogs:
In Site Server Search, the process of building a catalog is separate from the process of searching a catalog. These processes, in fact, can occur on different machines altogether. If you are running a large Knowledge Manager site in which you have multiple Search catalogs and periodic re-crawls, you may want to consider the following to optimize the performance of your site:
Site Server Search can index Microsoft Exchange Public Folders. Therefore, because Knowledge Manager is, in part, built on top of Search, you can give your users the ability to search Public Folders (PFs) from Knowledge Manager. This makes the following fictitious scenarios possible (The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred):
There is a common thread to these scenarios: the administrator has configured Site Server Search to catalog Exchange Public Folders, and Knowledge Manager to use the resulting catalogs. To enable similar scenarios on your own intranet, you will need to ensure that some key pieces are in place:
Once you have worked through the issues above, you should be able to set up a new Search catalog definition for the public folder messages as usual. Your Site Server Search documentation describes the full process for setting up a catalog for public folders.
After creating the catalog, just define a content source using the new catalog as usual, and the public folder that you cataloged will then be searchable in Knowledge Manager.
Companies will often make arrangements with electronic content providers to have content delivered periodically. Many newsfeeds work in this manner. For example, a company might arrange for a business news source to upload a compressed archive of all financial stories each business day. Typically, the electronic content company will upload the content via FTP. Alternatively, the content company may just make the content available via FTP on their own site without actually uploading it to the customer. In either case, it is the customer's responsibility to retrieve the information. Once it is retrieved and uncompressed, companies will typically want to make the information searchable by employees in Web applications such as Knowledge Manager. This entire process can be automated by coupling Knowledge Manager with Site Server Content Deployment (CD). Though the newsfeed scenario is used as the driving example in this section, the information contained herein applies to any situation where you are replicating content from location A to location B, and want to provide browsing and searching of the content in location B.
Information typically gets from the content provider to your users in four main phases:
In this cycle, the key phases to automate are phases 2 and 3; the content provider is responsible for phase 1, and phase 4 happens "automatically" when the Search catalog based on the new content has been created. The following sections describe exactly how to automate phases 2 and 3. Note that the Publishing features of Site Server must be installed for the following steps to work. To install Site Server Publishing, use the setup program on the Site Server CD.
This section describes how to automatically copy content from your FTP site to a location where it can be cataloged by Site Server Search.
Rather than supplying an entire directory hierarchy of files, your provider may supply the content in a one-file archive designed to be uncompressed after retrieval. In this case, special care must be taken to correctly configure Content Deployment and Search to work together. Later in this example, you will configure Search so that it updates its catalog every time a new file is replicated. If you are only replicating one file -- that is, the content archive -- only one notification will be sent to Search, when in reality Search needs to get notifications for all the new files after the archive has been decompressed. To set up Content Deployment in a situation like this, do the following:
cd c:\newsfeed
pkunzip -o -d content.zip
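If PKUNZIP is not available on the receiving machine, the same decompression step can be approximated with a short script. The following is a rough equivalent in Python, assuming that the -o switch means "overwrite existing files" and the -d switch means "recreate the directories stored in the archive"; the unpack_archive name and the paths in the usage lines are illustrative.

```python
import zipfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract all files, overwriting old copies and recreating directories.

    Returns the list of member names that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical locations matching the batch example.
    for name in unpack_archive(r"c:\newsfeed\content.zip", r"c:\newsfeed"):
        print("extracted", name)
```

As with the batch file, the point is that the individual content files exist on disk before Search is asked to catalog them.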
Once the incoming content has been received (and decompressed, if necessary), it can be cataloged by Site Server Search. Creating a Search catalog (and a content source to wrap it) for the incoming content makes it searchable in Knowledge Manager. Additionally, you may want to configure the Search catalog to be rebuilt on a schedule. Say, for example, you retrieve and decompress the content every night at 11 p.m. using Content Deployment. You know that the retrieval process typically takes 15 minutes or so, and that it takes just a few extra minutes to decompress the content. Based on this knowledge, you could set up your Search catalog to crawl your content every night at midnight, after the new content has been received. In this section, we explain how to set up the Search catalog and corresponding content source correctly.
- In the ensuing wizard, specify the kind of crawl you would like. Note that if you select a Web link crawl, the content that you have replicated should have a page that links to all the remaining content; without this page, there will be no way to crawl all the content. Click Next.
- Specify the Start address for the crawl as well as any crawl depth limits. Click Finish.
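The midnight schedule from the example earlier in this section can be checked with simple arithmetic: the crawl should not start until retrieval and decompression have both finished. A minimal sketch, in which the 5-minute decompression time is an assumed stand-in for "a few extra minutes" and the function name is illustrative:

```python
from datetime import datetime, timedelta


def earliest_safe_crawl(retrieval_start, retrieval_time, decompress_time,
                        margin=timedelta(0)):
    """Return the earliest time at which the new content is fully in place."""
    return retrieval_start + retrieval_time + decompress_time + margin


if __name__ == "__main__":
    start = datetime(1998, 4, 27, 23, 0)          # retrieval begins at 11 p.m.
    ready = earliest_safe_crawl(
        start,
        retrieval_time=timedelta(minutes=15),      # from the example
        decompress_time=timedelta(minutes=5),      # assumed value
    )
    midnight = datetime(1998, 4, 28, 0, 0)
    print("content ready at", ready.time(), "- crawl at midnight is safe:",
          ready <= midnight)
```

If retrieval times grow as the feed gets larger, rerunning the same arithmetic shows when the crawl schedule needs to move later.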
You have now configured your system to automatically retrieve new content, catalog it, and then present it to your users in Knowledge Manager. You may now want to test your system.
A good way to ensure correct configuration is to manually start your Content Deployment project. To do this, start the Site Server Administration console as you did above and locate your project in the Publishing subtree. Right-click your project and select Start. In the next dialog, click Start Replication.
After doing this, two indicators will be particularly helpful in monitoring the ongoing process: the status field in the Projects container under Publishing, and the Status field corresponding to your Search catalog under Catalog Build Server. After the process is complete, you can use the Replication Reports in Publishing and the Gatherer Logs in Search to check for any errors. If everything went well, you can manually start the catalog build process to ensure that the replicated content gets cataloged correctly. You can use Gatherer Logs in MMC- or Web-based administration to confirm that the catalog process went smoothly. If the catalog process proceeded without a hitch, and you have set up a content source for your new Search catalog per the instructions above, you should now be able to search and browse through the content in Knowledge Manager.