StrSpan and StrBreak


Bruce: In C, the basic building blocks of parsing are strspn and strcspn. My Basic names for them are StrSpan and StrBreak. To use StrSpan, you pass it the string you want to parse and a list of separator characters (space, tab, and comma, plus line-break characters if the string can span multiple lines). StrSpan returns the position of the first character that is not a separator. You pass StrBreak the same arguments, and it returns the position of the first character that is a separator.


You can probably guess how to use these functions. Find the start of a token with StrSpan, find the end of the token with StrBreak, cut out the token, find the start of a token with StrSpan, find the end of the token with StrBreak…and so on to the end of the string. That’s pretty much what GetToken does, but first let’s take a look at StrSpan:

Function StrSpan1(sTarget As String, sSeps As String) As Integer

    Dim cTarget As Integer, iStart As Integer
    cTarget = Len(sTarget)
    iStart = 1
    ' Look for start of token (character that isn't a separator)
    Do While InStr(sSeps, Mid$(sTarget, iStart, 1))
        If iStart > cTarget Then
            ' Ran off the end: the string holds nothing but separators
            StrSpan1 = 0
            Exit Function
        Else
            iStart = iStart + 1
        End If
    Loop
    StrSpan1 = iStart

End Function

StrBreak is identical except that the loop test is reversed:

    ' Look for end of token (first character that is a separator)
    Do While InStr(sSeps, Mid$(sTarget, iStart, 1)) = 0
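Written out in full by applying that one-line change to StrSpan1, StrBreak might read as follows. This is a sketch: the name StrBreak1 simply follows the numbering used for StrSpan1 and is an assumption, not quoted from the original source. One consequence of the reversed test is worth knowing. Past the end of the string, Mid$ returns an empty string, and InStr reports an empty search string as found at position 1, so (given a non-empty separator list) this version exits the loop and returns Len(sTarget) + 1, not 0, when the target contains no separator.

```vb
' Sketch: StrSpan1 with the loop test reversed (the name StrBreak1 is assumed)
Function StrBreak1(sTarget As String, sSeps As String) As Integer

    Dim cTarget As Integer, iStart As Integer
    cTarget = Len(sTarget)
    iStart = 1
    ' Look for end of token (first character that is a separator)
    Do While InStr(sSeps, Mid$(sTarget, iStart, 1)) = 0
        If iStart > cTarget Then
            ' Not reached when sSeps is non-empty: past the end, Mid$ returns ""
            ' and InStr(sSeps, "") is 1, so the loop has already exited
            StrBreak1 = 0
            Exit Function
        Else
            iStart = iStart + 1
        End If
    Loop
    ' Position of the first separator, or Len(sTarget) + 1 if there is none
    ' (for a non-empty sSeps)
    StrBreak1 = iStart

End Function
```

For example, StrBreak1("ab cd", " ") returns 3, and StrBreak1("abc", " ") returns 4.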

Archaeologist’s Note: Names of functions and variables have been changed to protect the guilty. Joe Hacker’s diatribes against the stupid naming conventions of the original have been edited out of this text, along with other rude remarks deemed irrelevant. The code has been updated to reflect the Visual Basic language, ignoring QuickBasic syntax differences. Different versions of the procedures are numbered. Interested historians can find the original code in most versions of QuickBasic, the Basic Professional Development System, MS-DOS 5, and Windows NT.


Joe: If they’re identical except for one line, why have two functions? Why not have one function—say, StrScan—with a flag argument that can be either Span or Break? Put the loop test in a conditional. That should save some code.


Jane: Yeah, but at what cost? You might call these functions hundreds of times if you’re parsing a big file. Is the code you save by not duplicating a tiny function worth an extra test inside a loop that is itself called from a loop? Besides, the interface feels better with separate functions.


Joe: I don’t care what “feels” better, but I guess I’ll buy your performance argument. Let’s stick with two functions.
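For comparison, here is one way Joe's rejected StrScan might have been written. This is a hypothetical sketch: the flag constants SCAN_SPAN and SCAN_BREAK and the iMode parameter are invented for illustration. It returns the same results as StrSpan1 and its reversed-test twin, but it makes the Span-or-Break decision once per character, inside the loop, which is exactly the extra test Jane objects to.

```vb
' Hypothetical flag values for the single-function design
Const SCAN_SPAN As Integer = 0
Const SCAN_BREAK As Integer = 1

Function StrScan(sTarget As String, sSeps As String, ByVal iMode As Integer) As Integer

    Dim cTarget As Integer, iStart As Integer, fIsSep As Boolean
    cTarget = Len(sTarget)
    iStart = 1
    Do
        fIsSep = (InStr(sSeps, Mid$(sTarget, iStart, 1)) <> 0)
        ' The per-character mode test that two separate functions avoid
        If iMode = SCAN_SPAN Then
            If Not fIsSep Then Exit Do      ' found start of token
        Else
            If fIsSep Then Exit Do          ' found end of token
        End If
        If iStart > cTarget Then
            StrScan = 0
            Exit Function
        End If
        iStart = iStart + 1
    Loop
    StrScan = iStart

End Function
```

The trade-off is the one the reviewers weigh: one copy of the loop instead of two, at the price of a branch on iMode for every character scanned.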


Mary: Any other comments?


Jane: The length cTarget is calculated just once, outside the loop. That’s good. The body of the loop looks pretty clean. The loop test with Mid$ called inside InStr looks messy.


Joe: It’s taking one character at a time from the target string and searching for it in the separator list. Kind of an unusual use of InStr. You don’t care where you find the character, only whether you find it. I can’t think of a better way to do it, short of rewriting it in a real language, like C.
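Joe's point about InStr can be isolated in a small helper; IsSeparator is a hypothetical name, not part of the original code. InStr returns the 1-based position at which sChar occurs in sSeps, or 0 if it does not occur, so only the zero/nonzero distinction matters. One quirk matters too: InStr treats an empty search string as found at position 1, so an empty sChar counts as a separator when sSeps is non-empty. StrSpan1 depends on this. When iStart runs past the end, Mid$ returns an empty string, the loop test stays true, and the iStart > cTarget check gets its chance to return 0.

```vb
' Hypothetical helper: is sChar one of the characters in sSeps?
Function IsSeparator(sChar As String, sSeps As String) As Boolean
    ' InStr gives the position of sChar within sSeps, or 0 if absent;
    ' only zero versus nonzero matters for a membership test.
    IsSeparator = (InStr(sSeps, sChar) <> 0)
End Function
```

For example, InStr(" ,", ",") is 2 and InStr(" ,", "x") is 0; IsSeparator turns these into True and False.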


Bruce: Well, if StrSpan is OK, StrBreak is also OK because it’s the same except backward. Let’s move on to GetToken.
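The span, break, cut, repeat loop described at the start of this section can be sketched as follows. This is an illustration under stated assumptions, not GetToken itself: SpanPos, BreakPos, and Tokenize are hypothetical names, the two helpers are compact For-loop versions written so the example stands alone, and BreakPos is assumed to return Len + 1 when no separator remains. Because both helpers always scan from position 1, the loop chops the consumed text off the front of the string after each token.

```vb
' Hypothetical span helper: position of first non-separator, or 0 if none
Function SpanPos(sTarget As String, sSeps As String) As Integer
    Dim i As Integer
    For i = 1 To Len(sTarget)
        If InStr(sSeps, Mid$(sTarget, i, 1)) = 0 Then
            SpanPos = i
            Exit Function
        End If
    Next
    SpanPos = 0
End Function

' Hypothetical break helper: position of first separator, or Len + 1 if none
Function BreakPos(sTarget As String, sSeps As String) As Integer
    Dim i As Integer
    For i = 1 To Len(sTarget)
        If InStr(sSeps, Mid$(sTarget, i, 1)) <> 0 Then
            BreakPos = i
            Exit Function
        End If
    Next
    BreakPos = Len(sTarget) + 1
End Function

' Split sText into tokens: span, break, cut, repeat
Function Tokenize(ByVal sText As String, sSeps As String) As Collection
    Dim colTokens As New Collection
    Dim iStart As Integer, iEnd As Integer
    Do
        iStart = SpanPos(sText, sSeps)          ' find start of token
        If iStart = 0 Then Exit Do              ' only separators (or nothing) left
        sText = Mid$(sText, iStart)             ' drop leading separators
        iEnd = BreakPos(sText, sSeps)           ' find end of token
        colTokens.Add Left$(sText, iEnd - 1)    ' cut out the token
        sText = Mid$(sText, iEnd)               ' continue after it
    Loop
    Set Tokenize = colTokens
End Function
```

For example, Tokenize("  alpha, beta  gamma", " ," & vbTab) yields a collection holding "alpha", "beta", and "gamma". Chopping the string with Mid$ copies the remaining text on every token; a variant that passed a start position to the helpers would avoid that copying.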