Creating a VBA String-Parsing Utility

by Patrick L. Steiner

If you've ever had to extract data from foreign data sources like comma-delimited text files, you've probably needed a way to parse that data into useable form. VBA includes built-in search functions, such as INSTR(), LIKE, and FIND, but these functions need a specific string or pattern to look for. And even if you find the string you want, you're limited as to what you can do next.

To work around these limitations, you can build your own string-parsing utility. In this article, we'll show you such a utility, called BreakString2Parts. (Trust me—the name will make sense as you read on.)

What string parsers do

String-parsing utilities allow you to break strings into various parts and extract keywords or phrases within the string. Typically, the string will contain one or more special characters called delimiters; these delimiters signify the start or end of the portion of the string you want to extract. Although the comma is used most often, any character can serve as a delimiter—as long as it doesn't appear in the body of the text to be extracted.

One of the most common uses of string parsing is for reading external data files. Many spreadsheets and databases allow you to save data in comma-delimited (or comma separated value) formats. String parsing makes it easy for you to extract the data fields from such a file.

Another way I use string parsing is to extract data or keywords from text or memo fields. As developers, we're often required to interface with programs over which we have no control. In such cases, since I can't add a new field, I make use of one already there—such as a text or memo field. The users are instructed to add delimiters, usually angle brackets (<>) or square brackets ([]), around the data or keywords. By using the string-parsing utility, I can extract the data or keywords I need without modifying the structure of the original system.

Syntax of the string-parsing utility

Our string-parsing utility, BreakString2Parts, takes the following four arguments:

sText—the string to analyze (input)

sDelimiter—the delimiter character (input)

sPart1—the string portion preceding the delimiter (output)

sPart2—the string portion following the delimiter (output)

The function returns a value of 0 or –1 depending upon what it encounters as it analyzes the input string using the specified delimiter. Table A shows the various return values of the function based on different conditions.

Table A: Return values of the string-parser utility

Condition Value
Input string null   0
Delimiter character null   0
Delimiter more than one character long   0
No delimiter found in input string   0
Normal parameter inputs -1

The return parameters [sPart1] and [sPart2] will always have a value—even if it's the empty string ("").

Because there are two output arguments (sPart1 and sPart2), the function was designed to be called from within another VBA function, such as:

iRetVal=BreakString2Parts(sText, sDelimiter, _ sPart1, sPart2)

From here, the calling function can use the values of iRetVal, sPart1, and sPart2 as needed (as we'll see in the next section).

Parsing based on a single delimiter

The string-parsing utility in this article takes an input string and a delimiter character, then breaks the string into two parts—the part that precedes the first occurrence of the delimiter (sPart1) and everything after the first occurrence of the delimiter (sPart2). Thus, if the input string looks like

alpha,beta,gamma

the string parser will return two strings:

alpha

which is sPart1, and

beta,gamma

which is sPart2. Note that the first delimiter is not returned (which is our intention).

The second string still contains a comma, so we call the parser again, this time passing the second string, beta,gamma. When we do, the parser returns

beta

which is sPart1, and

gamma

which is sPart2. Running through this process one more time, passing the second string, gamma, to the string parser returns

gamma

and

""

or the empty string. Thus, for any given string that contains a list of items (separated by commas, for example), we can repetitively call the string parser until the second part (sPart 2) is the empty string. Each value in sPart1 can be used or stored by the calling routine as a valid list item.

Parsing based on a pair of delimiters

You can also use our string-parsing utility to extract keywords that are enclosed within a pair of delimiters. Suppose you have the input string

I like <ice cream> and cake

By making a first call to the parser with the starting delimiter character, <, the parser will return

I like 

and

ice cream> and cake

A second call to the parser with sPart2 (ice cream> and cake) as the input string and > as the delimiter returns

ice cream

and

 and cake

After the second call, the keywords we're interested in, ice cream, are located in sPart1.

How the string-parsing utility works

The complete listing of function BreakString2Parts is shown in Listing A. We won't describe how each line of code works, but instead we'll outline the logic behind the function. (You can download this month's sample files as part of the file jan98.zip from our Web site, www.zdjournals.com/adj. Just click the Source Code hyperlink.)

First, the function value BreakString2Parts is initialized to zero, and the two output parameters (sPart1 and sPart2) are initialized to the empty string. Next, the function checks to make sure the input arguments sText and sDelimiter are valid. Neither argument can be null, and sDelimiter must be only one character in length.

The smallest “normal” string the routine would expect to process would be three characters in length (e.g., a,b or <a>). Therefore, the next section of code handles special cases where the length of the input string is fewer than three characters. In this section, sPart1 and sPart2 could return the empty string depending on whether (and where) the delimiter character is found.

If the input string has three or more characters, the routine searches for the first occurrence of the delimiter. It sets sPart1 equal to the string that precedes the delimiter. It sets sPart2 equal to the string that follows the delimiter. The first occurrence of the delimiter isn't returned in either part. (By design, delimiters aren't part of the desired data.) Again, sPart1 and sPart2 could return the empty string, depending on whether (and where) the delimiter character is found.

In any case, whenever a delimiter character is encountered, the value of the function (BreakString2Parts) is set to –1. As we'll see in our example that searches for keywords, you can use this value to determine whether a valid set of delimiters (e.g., < >) was found in the string.

Listing A: Code for BreakString2Parts

Public Function BreakString2Parts(sText As String, sDelimiter As String, sPart1 As String, sPart2 As String) As Integer

'------------------------------------------------------------
' Purpose:  Break the input string [sText] into 2 parts
'          ([sPart1] and [sPart2]) looking for the special 
'          delimiter character [sDelimiter].
'
' Note:     If no delimiter character is found in the input 
'          string, the function will still return [sPart1] and 
'             [sPart2], but the return value of the function will 
'         be zero (0) instead of minus one (-1).
'------------------------------------------------------------
' By:       Pat Steiner
' Date:     10/1/97
' Version:  1.0
'------------------------------------------------------------
Dim sMessage As String
Dim sMsgTitle As String
Dim sFunction As String
Dim iMessage As Integer
Dim sCRLF As String * 2
Dim iError As Integer
Dim iLen As Integer
Dim iStart As Integer
Dim iEnd As Integer
Dim iDelimiter As Integer
Dim i As Integer
Dim iChars As Integer
                 
'------------------------------------------------------------
' Initialize "local" variables
'------------------------------------------------------------
BreakString2Parts = 0                 
'Initial function value = 0
sMessage = "Invalid calling (input) arguments:"           
'Setup pessimistic error message string
iMessage = 16                                             
'STOP icon
sFunction = "BreakString2Parts"
sMsgTitle = "FN: " & sFunction & " ( )"
sCRLF = Chr$(13) & Chr$(10)
iError = 0
sPart1 = ""
sPart2 = ""

On Error GoTo Err_BreakString2Parts 
'Setup error handler

'------------------------------------------------------------
' Look for function "input" errors (calling argument errors)
'------------------------------------------------------------
If IsNull(sText) Then
  sMessage = sMessage & sCRLF & sCRLF & "Input text [sText] cannot be NULL!"
  iError = iError + 1
End If

If IsNull(sDelimiter) Then
  sMessage = sMessage & sCRLF & sCRLF & "Delimiter text [sDelimiter] cannot be NULL!"
  iError = iError + 1
End If

If Len(sDelimiter) <> 1 Then
  sMessage = sMessage & sCRLF & sCRLF & "Delimiter text [sDelimiter] must be 1-character in length!"
  iError = iError + 1
End If

If iError > 0 Then
  Beep
  MsgBox sMessage, iMessage, sMsgTitle
  GoTo Exit_BreakString2Parts
Else
  sMessage = ""
End If

'------------------------------------------------------------
' Parse the string and separate into 2 parts ... 
' check "small" inputs strings first.
'------------------------------------------------------------
iLen = Len(sText)

If iLen < 1 Then                                
'String not worth parsing!!!
  GoTo Exit_BreakString2Parts
End If

If iLen = 1 Then
  If sDelimiter <> sText Then
    sPart1 = sText                              
'1 character string (no delimiter) ... set it = [sPart1]
  End If
  GoTo Exit_BreakString2Parts
End If

If iLen = 2 Then
  If Mid$(sText, 1, 1) = sDelimiter Then        
'1st char is delimiter ... 2nd char to [sPart2]
    sPart2 = Mid$(sText, 2, 1)
    BreakString2Parts = -1
  Else
    If Mid$(sText, 2, 1) = sDelimiter Then      
'2nd char is delimiter ... 1st char to [sPart1]
      sPart1 = Mid$(sText, 1, 1)
      BreakString2Parts = -1
    Else
      sPart1 = sText                            
'2 character string (no delimiter) ... set it = [sPart1]
    End If
  End If
  GoTo Exit_BreakString2Parts
End If


'------------------------------------------------------------
' Parse the string and separate into 2 parts 
'... length >= 3 characters!
'------------------------------------------------------------
iDelimiter = 0
For i = 1 To iLen
  If Mid$(sText, i, 1) = sDelimiter Then
    iDelimiter = i                              
'Set the position of the delimiter = 1st occurrence of it!!!
    i = iLen + 1                                
'Force early exit from loop
  End If
Next i

If iDelimiter = 0 Then
  sPart1 = sText                                
'3+ character string (no delimiter) ... set it = [sPart1]
  GoTo Exit_BreakString2Parts
End If

If iDelimiter = 1 Then
  sPart2 = Mid$(sText, 2, iLen - 1)             
'3+ char string (delimiter is 1st char) ... set it = [sPart2]
  BreakString2Parts = -1
  GoTo Exit_BreakString2Parts
End If

iStart = 1                                      
'String is delimited as anticipated ... set [sPart1]
iEnd = iDelimiter - 1
iChars = iEnd - iStart + 1
sPart1 = Mid$(sText, iStart, iChars)

iStart = iDelimiter + 1                         
'String is delimited as anticipated ... set [sPart1]
iEnd = iLen
iChars = iEnd - iStart + 1
sPart2 = Mid$(sText, iStart, iChars)
    
BreakString2Parts = -1

'======================
Exit_BreakString2Parts:
'======================
Exit Function

'=====================
Err_BreakString2Parts:
'=====================
sMessage = "Error #" & Err & ": " & Error(Err)
MsgBox sMessage, iMessage, sMsgTitle
Exit Function

End Function
A test function to look for keywords

Now let's test our function. In this example, we'll prompt the user to enter a text string. We want to extract any text that's contained between the pair of matching delimiters < >. The code for the test function, TestKeywords(), is shown in Listing B.

Listing B: Code for test function TestKeyWords ()

Public Function TestKeyWords()

'------------------------------------------------------------
' Purpose:  Test function BreakString2Parts(text, delimiter, part1, part2) 'looking for keywords' in a text string.

'
' Note:     The keywords are enclosed within angle brackets < > by definition ' of [sStartDelimiter] and [sEndDelimiter].

'------------------------------------------------------------
' By:       Pat Steiner
' Date:     10/1/97
' Version:  1.0
'------------------------------------------------------------
Dim sMessage As String
Dim sMsgTitle As String
Dim sFunction As String
Dim iMessage As Integer

Dim sPart1 As String
Dim sPart2 As String
Dim sInputText As String
Dim sStartDelimiter As String * 1
Dim sEndDelimiter As String * 1

Dim iRetVal As Integer

sFunction = "TestKeyWords"
sMsgTitle = "FN: " & sFunction & " ( )"
sStartDelimiter = "<"
sEndDelimiter = ">"

'-------------------------------------------------
' Prompt user for input string to parse ...
'-------------------------------------------------
sInputText = InputBox$("Enter text string to parse: ", sMsgTitle)

If sInputText <> "" Then
  Beep
  iRetVal = BreakString2Parts(sInputText, sStartDelimiter, sPart1, sPart2)
  
  If iRetVal = -1 Then      
'[sPart2] contains everything after the 1st delimiter
    sInputText = sPart2
    iRetVal = BreakStringReverse2Parts(sInputText, sEndDelimiter, sPart1, sPart2)
    
    If iRetVal = -1 Then    
'[sPart1] contains everything up to the 2nd delimiter
      iMessage = 64         
'INFO icon w/OK button
      sMessage = "Keyword: " & sPart1 & Chr$(13) & Chr$(13)
      MsgBox sMessage, iMessage, sMsgTitle
    Else
      iMessage = 16         
'STOP icon w/OK button
      MsgBox "BreakString2Parts could not find END delimiter!", iMessage, sMsgTitle
    End If
    
  Else
    iMessage = 16           
'STOP icon w/OK button
    MsgBox "BreakString2Parts could not find START delimiter!", iMessage, sMsgTitle
  End If
Else
  Beep
  iMessage = 16             
'STOP icon w/OK button
  MsgBox "Test function cancelled by user!", iMessage, sMsgTitle
End If

End Function

The function prompts the user for the input string by using the InputBox$() function. If the user presses the Cancel button, we'll display a message indicating that the user cancelled the process. If the user types anything else and presses OK, we pass the input string to BreakString2Parts using the first delimiter, <. If the return value from BreakString2Parts is –1, the first delimiter was found, and the remaining string to analyze is now contained in the return parameter sPart2.

We issue another call to BreakString2Parts, but this time with the ending delimiter, >. If the return value is –1, the ending delimiter was found and a message box displays the keyword(s) found inside the angle brackets. If neither delimiter was found, a message box displays this result instead.

Figure A shows the TestKeywords() function's InputBox$() display with a text string entered. After pressing the OK button, Figure B displays the MsgBox() with the resulting keywords extracted.

Figure A: We've entered a string to parse in the TestKeywords() function's input box.

Figure B: This message box displays the result of the string-parsing function.

Notes

In the example database on our FTP site, you'll find an additional string-parsing utility that begins analyzing at the end of the string and works to the beginning of the string looking for the delimiter character. We've also included several other test functions to demonstrate the utilities syntax and usage.

A string-parsing utility is essential for any developer who needs to extract data from foreign data sources, such as comma-delimited text files, or who must add extra functionality to existing systems without modifying their structures. In this article, we've presented such a utility.

Copyright © 1998, ZD Inc. All rights reserved. ZD Journals and the ZD Journals logo are trademarks of ZD Inc. Reproduction in whole or in part in any form or medium without express written permission of ZD Inc. is prohibited. All other product names and logos are trademarks or registered trademarks of their respective owners.