Working with HTML Files

Hypertext Markup Language (HTML) is the standard language for presenting information over the Internet. HTML files are text files that include the information that users will see, along with tags that specify formatting information about how the information will be presented. For example, you can specify tags for bold, italic, and underlined text, indentation, and many other types of formatting. Internet browsers interpret information within HTML files, format them as pages, and display them to users. You can use HTML to create and view pages on the World Wide Web or an internal Web (intranet).

In addition to supporting text and graphics, HTML also supports tabular information. Using HTML table data tags (<TD> and </TD>), you can embed one or more tables within an HTML file. You can access the tabular data in HTML files from your Microsoft Access database by using the Text installable ISAM to either link the table in the HTML file or open it directly. The procedures for doing so are very similar to the procedures for linking and opening tables from other external data sources.

Linking HTML Files

To link data from an HTML file to your Microsoft Access database, use the CurrentDb function to return an object variable that represents the current Microsoft Access database. Then create a TableDef object and set the connection string and source table name: set the TableDef object’s Connect property to specify the source database type and the full path to the HTML file, and set its SourceTableName property to the name of the table in the HTML file. The source database type is HTML Import. The path is determined by the Uniform Resource Locator (URL) address of the HTML file on the Internet or an intranet.

The following code links a table from an HTML table to a Microsoft Access database:

Sub LinkHTML()
	Dim dbs As Database
	Dim tdfHTML As TableDef
	Dim rstSales As Recordset

	' Open the Microsoft Access database.
	Set dbs = CurrentDb

	' Create a TableDef object.
	Set tdfHTML = dbs.CreateTableDef ("Linked HTML Table")

	' Set the connection string to specify the source database type and
	' the full path to the file that contains the table you want to link.
	tdfHTML.Connect = "HTML Import;" _
		& "DATABASE=http://www.usa.myserver.com/files/mypage.html"

	' Set the SourceTableName property to the name of the table you want to access.
	tdfHTML.SourceTableName = "Q1SalesData"

	' Append the TableDef object to the TableDefs collection to create a link.
	dbs.TableDefs.Append tdfHTML

	' Create a Recordset object from the linked HTML table.
	Set rstSales = dbs.OpenRecordset("Linked HTML Table")
End Sub

Tip   Instead of specifying the connection string and source table name by setting properties of the TableDef object, you can specify them by using the connect and source arguments of the CreateTableDef method. You can also use the Microsoft Access user interface to link external data. For more information on using the CreateTableDef method or the user interface to link external data, see “Linking External Tables” earlier in this chapter.

Opening HTML Files

To open a table in an HTML file directly from your Microsoft Access database, create a Database object and set the arguments of the OpenDatabase method to specify the full path to the HTML file, whether to open the HTML file exclusively, whether to open it with read/write or read-only permissions, and the source database type. The path is the Uniform Resource Locator (URL) address of the HTML file on the Internet or an intranet. The source database type is HTML Import. Specify the name of the HTML table as the source argument of the OpenRecordset method.

The following example opens an HTML directly from a Microsoft Access database and then creates a Recordset object from the HTML file.

Sub OpenHTML()
	Dim dbs As Database
	Dim rstSales As Recordset

	' Open the HTML file without exclusive access and with read/write permissions. 
	Set dbs = OpenDatabase("http://www.usa.myserver.com/files/mypage.html", _
		False, False, "HTML Import;")

	' Create a Recordset object from the Q1SalesData table.
	Set rstSales = dbs.OpenRecordset("Q1SalesData")
End Sub

Important When you open an HTML file, the Text installable ISAM uses File Transfer Protocol (FTP) to copy the source file from the Internet server to the workstation requesting the data. The Text installable ISAM then accesses this local copy of the file, not the file on the server. This means that the records you access through code are only a snapshot of the actual records in the file on the server. Changes made to the server copy of the file aren’t available to your application until you reopen or relink the HTML file. The copy of the file on the Internet server is never modified by changes that you make to the local copy.

The local copy of the HTML file is read-only. You cannot modify tabular data in an HTML file. However, you can open tabular HTML information as a recordset, save the contents of the recordset in a new table in your local Microsoft Access database, and then export the table as an HTML file. When you export, you create an HTML file that includes the minimum necessary HTML header information, plus the contents of the table.

Table Naming Conventions

As the previous example shows, you must reference a table inside an HTML file with a name; for example, Q1SalesData. If a table in an HTML file has a caption, the caption is the table name. If a table doesn’t have a caption, and it’s the only table in the file, use the title of the HTML document to reference the table. If more than one table exists in the HTML file, and none of these tables have captions, you must reference them sequentially in code as Table1, Table2, and so on. The Text installable ISAM interprets these as “the first unnamed table in the file,” “the second unnamed table in the file,” and so on.

Field Names and Data Types

As the Text installable ISAM reads the data in the HTML file, it chooses the data type for each field in the table by interpreting the contents of the cells. For example, if most of the values in a field are numeric, the Text installable ISAM chooses the Long or Double data type, depending on whether the majority of the numbers are integer or floating-point values. Similarly, if the majority of the values in a field are text, the Text installable ISAM chooses the Text data type (with a maximum field size of 255 characters).

Important In fields that contain a combination of data types, the Text installable ISAM assigns the Null value to cells of the minority data type. For example, if a field of data in an HTML table contains 90 integers and 10 strings, the Text installable ISAM chooses the Long data type for the field and assigns the strings the Null value. If you then use the OpenRecordset method on the HTML table, your code will encounter 90 integers and 10 Null values in that field.

If you don’t want the Text installable ISAM to choose data types, you can specify data types for fields by creating a schema information file, which contains information about each field in the table. Store your Schema.ini file in the same folder as the local HTML file you are linking or opening.

See Also   For more information on schema information files, see “Creating a Schema Information File” earlier in this chapter.

The Text installable ISAM assumes that the first row of the table doesn’t contain field names. You can override this by using the HDR keyword in the connection string. To indicate that the first row of the table contains field names, include "HDR=Yes" in the connection string.

Although the HTML specification supports table header tags (<TH> and </TH>), the installable ISAM doesn’t assume that the data within these tags specifies field names. Therefore, the installable ISAM treats all data within the <TH> and </TH> tags as normal table data. You can embed <TH> tags anywhere in the table, not just in the first row.

Note   The Text installable ISAM doesn’t support formatting tags that can be placed within the table data tags (<TD> and </TD>). These include tags such as ID, LANG, BORDER, WIDTH, FLOW, ALIGN, and so on. If the installable ISAM encounters these tags, it ignores them.

Handling Embedded Images and Hyperlinks

With HTML, you can embed graphics within table cells. References to graphics are identified through the HTML <IMG SRC ... > tag. If a cell contains just this tag and no other text, the installable ISAM interprets the cell as empty, because the Text installable ISAM can’t read graphics. If the cell contains a graphic and some other text, the installable ISAM will read the text, but not the graphic.

When the reference to the graphic is a hyperlink tag of the form <A HREF=".......">TEXT</A>, the text between the <A> tag and the </A> tag is displayed, and the Text installable ISAM ignores the embedded hyperlink information. If the text embedded in the hyperlink has fewer than 255 characters, the installable ISAM creates a Text field. If the text has more than 255 characters, the installable ISAM creates a Memo field.

If the cell contains only a hyperlink and no other embedded text, the Text installable ISAM reads the caption and the URL associated with the hyperlink.

Handling Embedded Lists and Tables

Some HTML tables may have lists embedded within one or more table cells. In this case, the Text installable ISAM inserts a carriage return and line feed after each list item. It doesn’t create separate cells for the list items. If the list of items separated by carriage returns has fewer than 255 characters, the installable ISAM creates a Text field; if the list has more than 255 characters, the installable ISAM creates a Memo field.

The installable ISAM interprets a list that isn’t embedded in a table as a single-field table.

If a table is embedded within another table, the tables are treated as separate tables. The cell that contains the embedded table is set to the Null value, and the embedded table is treated as a separate table from which you can create a separate recordset.