by Patrick L. Steiner
If you've ever had to extract data from foreign data sources like comma-delimited text files, you've probably needed a way to parse that data into useable form. VBA includes built-in search functions, such as INSTR(), LIKE, and FIND, but these functions need a specific string or pattern to look for. And even if you find the string you want, you're limited as to what you can do next.
To work around these limitations, you can build your own string-parsing utility. In this article, we'll show you such a utility, called BreakString2Parts. (Trust me—the name will make sense as you read on.)
What string parsers doString-parsing utilities allow you to break strings into various parts and extract keywords or phrases within the string. Typically, the string will contain one or more special characters called delimiters; these delimiters signify the start or end of the portion of the string you want to extract. Although the comma is used most often, any character can serve as a delimiter—as long as it doesn't appear in the body of the text to be extracted.
One of the most common uses of string parsing is for reading external data files. Many spreadsheets and databases allow you to save data in comma-delimited (or comma separated value) formats. String parsing makes it easy for you to extract the data fields from such a file.
Another way I use string parsing is to extract data or keywords from text or memo fields. As developers, we're often required to interface with programs over which we have no control. In such cases, since I can't add a new field, I make use of one already there—such as a text or memo field. The users are instructed to add delimiters, usually angle brackets (<>) or square brackets ([]), around the data or keywords. By using the string-parsing utility, I can extract the data or keywords I need without modifying the structure of the original system.
Syntax of the string-parsing utilityOur string-parsing utility, BreakString2Parts, takes the following four arguments:
sText—the string to analyze (input)
sDelimiter—the delimiter character (input)
sPart1—the string portion preceding the delimiter (output)
sPart2—the string portion following the delimiter (output)
The function returns a value of 0 or –1 depending upon what it encounters as it analyzes the input string using the specified delimiter. Table A shows the various return values of the function based on different conditions.
Table A: Return values of the string-parser utility
Condition | Value |
Input string null | 0 |
Delimiter character null | 0 |
Delimiter more than one character long | 0 |
No delimiter found in input string | 0 |
Normal parameter inputs | -1 |
The return parameters [sPart1] and [sPart2] will always have a value—even if it's the empty string ("").
Because there are two output arguments (sPart1 and sPart2), the function was designed to be called from within another VBA function, such as:
iRetVal=BreakString2Parts(sText, sDelimiter, _ sPart1, sPart2)
From here, the calling function can use the values of iRetVal, sPart1, and sPart2 as needed (as we'll see in the next section).
Parsing based on a single delimiterThe string-parsing utility in this article takes an input string and a delimiter character, then breaks the string into two parts—the part that precedes the first occurrence of the delimiter (sPart1) and everything after the first occurrence of the delimiter (sPart2). Thus, if the input string looks like
alpha,beta,gamma
the string parser will return two strings:
alpha
which is sPart1, and
beta,gamma
which is sPart2.
Note that the first delimiter is not returned (which is our intention).
The second string still contains a comma, so we call the parser again, this time passing the second string, beta,gamma.
When we do, the parser returns
beta
which is sPart1, and
gamma
which is sPart2.
Running through this process one more time, passing the second string, gamma
, to the string parser returns
gamma
and
""
or the empty string. Thus, for any given string that contains a list of items (separated by commas, for example), we can repetitively call the string parser until the second part (sPart 2
) is the empty string. Each value in sPart1
can be used or stored by the calling routine as a valid list item.
You can also use our string-parsing utility to extract keywords that are enclosed within a pair of delimiters. Suppose you have the input string
I like <ice cream> and cake
By making a first call to the parser with the starting delimiter character, <
, the parser will return
I like
and
ice cream> and cake
A second call to the parser with sPart2
(ice cream> and cake
) as the input string and > as the delimiter returns
ice cream
and
and cake
After the second call, the keywords we're interested in, ice cream
, are located in sPart1
.
The complete listing of function BreakString2Parts
is shown in Listing A. We won't describe how each line of code works, but instead we'll outline the logic behind the function. (You can download this month's sample files as part of the file jan98.zip from our Web site, www.zdjournals.com/adj. Just click the Source Code hyperlink.)
First, the function value BreakString2Parts
is initialized to zero, and the two output parameters (sPart1
and sPart2
) are initialized to the empty string. Next, the function checks to make sure the input arguments sText
and sDelimiter
are valid. Neither argument can be null, and sDelimiter
must be only one character in length.
The smallest “normal” string the routine would expect to process would be three characters in length (e.g., a,b
or <a>
). Therefore, the next section of code handles special cases where the length of the input string is fewer than three characters. In this section, sPart1
and sPart2
could return the empty string depending on whether (and where) the delimiter character is found.
If the input string has three or more characters, the routine searches for the first occurrence of the delimiter. It sets sPart1
equal to the string that precedes the delimiter. It sets sPart2
equal to the string that follows the delimiter. The first occurrence of the delimiter isn't returned in either part. (By design, delimiters aren't part of the desired data.) Again, sPart1
and sPart2
could return the empty string, depending on whether (and where) the delimiter character is found.
In any case, whenever a delimiter character is encountered, the value of the function (BreakString2Parts
) is set to –1. As we'll see in our example that searches for keywords, you can use this value to determine whether a valid set of delimiters (e.g., < >) was found in the string.
Listing A: Code for BreakString2Parts
Public Function BreakString2Parts(sText As String, sDelimiter As String, sPart1 As String, sPart2 As String) As Integer
'------------------------------------------------------------
' Purpose: Break the input string [sText] into 2 parts
' ([sPart1] and [sPart2]) looking for the special
' delimiter character [sDelimiter].
'
' Note: If no delimiter character is found in the input
' string, the function will still return [sPart1] and
' [sPart2], but the return value of the function will
' be zero (0) instead of minus one (-1).
'------------------------------------------------------------
' By: Pat Steiner
' Date: 10/1/97
' Version: 1.0
'------------------------------------------------------------
Dim sMessage As String
Dim sMsgTitle As String
Dim sFunction As String
Dim iMessage As Integer
Dim sCRLF As String * 2
Dim iError As Integer
Dim iLen As Integer
Dim iStart As Integer
Dim iEnd As Integer
Dim iDelimiter As Integer
Dim i As Integer
Dim iChars As Integer
'------------------------------------------------------------
' Initialize "local" variables
'------------------------------------------------------------
BreakString2Parts = 0
'Initial function value = 0
sMessage = "Invalid calling (input) arguments:"
'Setup pessimistic error message string
iMessage = 16
'STOP icon
sFunction = "BreakString2Parts"
sMsgTitle = "FN: " & sFunction & " ( )"
sCRLF = Chr$(13) & Chr$(10)
iError = 0
sPart1 = ""
sPart2 = ""
On Error GoTo Err_BreakString2Parts
'Setup error handler
'------------------------------------------------------------
' Look for function "input" errors (calling argument errors)
'------------------------------------------------------------
If IsNull(sText) Then
sMessage = sMessage & sCRLF & sCRLF & "Input text [sText] cannot be NULL!"
iError = iError + 1
End If
If IsNull(sDelimiter) Then
sMessage = sMessage & sCRLF & sCRLF & "Delimiter text [sDelimiter] cannot be NULL!"
iError = iError + 1
End If
If Len(sDelimiter) <> 1 Then
sMessage = sMessage & sCRLF & sCRLF & "Delimiter text [sDelimiter] must be 1-character in length!"
iError = iError + 1
End If
If iError > 0 Then
Beep
MsgBox sMessage, iMessage, sMsgTitle
GoTo Exit_BreakString2Parts
Else
sMessage = ""
End If
'------------------------------------------------------------
' Parse the string and separate into 2 parts ...
' check "small" inputs strings first.
'------------------------------------------------------------
iLen = Len(sText)
If iLen < 1 Then
'String not worth parsing!!!
GoTo Exit_BreakString2Parts
End If
If iLen = 1 Then
If sDelimiter <> sText Then
sPart1 = sText
'1 character string (no delimiter) ... set it = [sPart1]
End If
GoTo Exit_BreakString2Parts
End If
If iLen = 2 Then
If Mid$(sText, 1, 1) = sDelimiter Then
'1st char is delimiter ... 2nd char to [sPart2]
sPart2 = Mid$(sText, 2, 1)
BreakString2Parts = -1
Else
If Mid$(sText, 2, 1) = sDelimiter Then
'2nd char is delimiter ... 1st char to [sPart1]
sPart1 = Mid$(sText, 1, 1)
BreakString2Parts = -1
Else
sPart1 = sText
'2 character string (no delimiter) ... set it = [sPart1]
End If
End If
GoTo Exit_BreakString2Parts
End If
'------------------------------------------------------------
' Parse the string and separate into 2 parts
'... length >= 3 characters!
'------------------------------------------------------------
iDelimiter = 0
For i = 1 To iLen
If Mid$(sText, i, 1) = sDelimiter Then
iDelimiter = i
'Set the position of the delimiter = 1st occurrence of it!!!
i = iLen + 1
'Force early exit from loop
End If
Next i
If iDelimiter = 0 Then
sPart1 = sText
'3+ character string (no delimiter) ... set it = [sPart1]
GoTo Exit_BreakString2Parts
End If
If iDelimiter = 1 Then
sPart2 = Mid$(sText, 2, iLen - 1)
'3+ char string (delimiter is 1st char) ... set it = [sPart2]
BreakString2Parts = -1
GoTo Exit_BreakString2Parts
End If
iStart = 1
'String is delimited as anticipated ... set [sPart1]
iEnd = iDelimiter - 1
iChars = iEnd - iStart + 1
sPart1 = Mid$(sText, iStart, iChars)
iStart = iDelimiter + 1
'String is delimited as anticipated ... set [sPart1]
iEnd = iLen
iChars = iEnd - iStart + 1
sPart2 = Mid$(sText, iStart, iChars)
BreakString2Parts = -1
'======================
Exit_BreakString2Parts:
'======================
Exit Function
'=====================
Err_BreakString2Parts:
'=====================
sMessage = "Error #" & Err & ": " & Error(Err)
MsgBox sMessage, iMessage, sMsgTitle
Exit Function
End Function
A test function to look for keywords
Now let's test our function. In this example, we'll prompt the user to enter a text string. We want to extract any text that's contained between the pair of matching delimiters < >. The code for the test function, TestKeywords(), is shown in Listing B.
Listing B: Code for test function TestKeyWords ()
Public Function TestKeyWords()
'------------------------------------------------------------
' Purpose: Test function BreakString2Parts(text, delimiter, part1, part2) 'looking for keywords' in a text string.
'
' Note: The keywords are enclosed within angle brackets < > by definition ' of [sStartDelimiter] and [sEndDelimiter].
'------------------------------------------------------------
' By: Pat Steiner
' Date: 10/1/97
' Version: 1.0
'------------------------------------------------------------
Dim sMessage As String
Dim sMsgTitle As String
Dim sFunction As String
Dim iMessage As Integer
Dim sPart1 As String
Dim sPart2 As String
Dim sInputText As String
Dim sStartDelimiter As String * 1
Dim sEndDelimiter As String * 1
Dim iRetVal As Integer
sFunction = "TestKeyWords"
sMsgTitle = "FN: " & sFunction & " ( )"
sStartDelimiter = "<"
sEndDelimiter = ">"
'-------------------------------------------------
' Prompt user for input string to parse ...
'-------------------------------------------------
sInputText = InputBox$("Enter text string to parse: ", sMsgTitle)
If sInputText <> "" Then
Beep
iRetVal = BreakString2Parts(sInputText, sStartDelimiter, sPart1, sPart2)
If iRetVal = -1 Then
'[sPart2] contains everything after the 1st delimiter
sInputText = sPart2
iRetVal = BreakStringReverse2Parts(sInputText, sEndDelimiter, sPart1, sPart2)
If iRetVal = -1 Then
'[sPart1] contains everything up to the 2nd delimiter
iMessage = 64
'INFO icon w/OK button
sMessage = "Keyword: " & sPart1 & Chr$(13) & Chr$(13)
MsgBox sMessage, iMessage, sMsgTitle
Else
iMessage = 16
'STOP icon w/OK button
MsgBox "BreakString2Parts could not find END delimiter!", iMessage, sMsgTitle
End If
Else
iMessage = 16
'STOP icon w/OK button
MsgBox "BreakString2Parts could not find START delimiter!", iMessage, sMsgTitle
End If
Else
Beep
iMessage = 16
'STOP icon w/OK button
MsgBox "Test function cancelled by user!", iMessage, sMsgTitle
End If
End Function
The function prompts the user for the input string by using the InputBox$() function. If the user presses the Cancel button, we'll display a message indicating that the user cancelled the process. If the user types anything else and presses OK, we pass the input string to BreakString2Parts using the first delimiter, <. If the return value from BreakString2Parts is –1, the first delimiter was found, and the remaining string to analyze is now contained in the return parameter sPart2.
We issue another call to BreakString2Parts, but this time with the ending delimiter, >. If the return value is –1, the ending delimiter was found and a message box displays the keyword(s) found inside the angle brackets. If neither delimiter was found, a message box displays this result instead.
Figure A shows the TestKeywords() function's InputBox$() display with a text string entered. After pressing the OK button, Figure B displays the MsgBox() with the resulting keywords extracted.
Figure A: We've entered a string to parse in the TestKeywords() function's input box.
Figure B: This message box displays the result of the string-parsing function.
NotesIn the example database on our FTP site, you'll find an additional string-parsing utility that begins analyzing at the end of the string and works to the beginning of the string looking for the delimiter character. We've also included several other test functions to demonstrate the utilities syntax and usage.
A string-parsing utility is essential for any developer who needs to extract data from foreign data sources, such as comma-delimited text files, or who must add extra functionality to existing systems without modifying their structures. In this article, we've presented such a utility.
Copyright © 1998, ZD
Inc. All rights reserved. ZD Journals and the ZD Journals logo are trademarks of ZD
Inc. Reproduction in whole or in part in any form or medium without
express written permission of ZD Inc. is prohibited. All other product
names and logos are trademarks or registered trademarks of their
respective owners.