Whatever type of application you are creating with Delphi, you will be writing some code to handle strings. Delphi provides a healthy assortment of string operators, functions and procedures in the RTL.
Let's say you need to determine whether the beginning of a string matches a specified string. Delphi provides at least 4 ways to check whether a string starts with another (sub)string.
Here are the functions you can use (note that LeftStr and AnsiStartsStr are defined in the StrUtils unit):
const
  text = 'application';
  starter = 'ap';
begin
  // each line is a Boolean expression, listed side by side for comparison
  AnsiStartsStr(starter, text);
  Pos(starter, text) = 1;
  LeftStr(text, Length(starter)) = starter;
  Copy(text, 1, Length(starter)) = starter;
end;
Note: All 4 functions provide case-sensitive matching.
The question that you could be asking yourself (at least I did) is: what's the best / fastest way?

Why ask? Benchmark it.
@runner: The question was “how do you check”. I guess what “you” picked is what you think is the fastest way.
I would try something like:
type
  Proc = Procedure;

function Benchmark(p: Proc; Times: Cardinal): Cardinal;
var
  t, i: Cardinal;
begin
  t := GetTickCount;
  for i := 1 to Times do
    p();
  Result := GetTickCount - t;
end;
I think you need to use timeGetTime with timeBeginPeriod and timeEndPeriod. Also, I would loop it 10,000 times to get a larger time sample, so the timer's limited precision matters less. Here are my results in Delphi 7:
AnsiStartsStr(starter, text): 15
if Pos(starter, text) = 1 then: 0
LeftStr(text, Length(starter)): 9
LeftStr(text, 2): 9
Copy(text, 1, Length(starter)): 2
Copy(text, 1, 2): 2
Calling Length adds nothing measurable to the times. Here are the results for 100,000 iterations:
AnsiStartsStr(starter, text): 136
if Pos(starter, text) = 1 then: 5
LeftStr(text, Length(starter)): 85
LeftStr(text, 2): 85
Copy(text, 1, Length(starter)): 19
Copy(text, 1, 2): 18
Surely the time for
Pos(starter,text)
must depend on (a) whether starter occurs in text and (b), if it does not, the length of text. If text were a million characters long, Pos would have to check all million characters (less a few) for an occurrence of starter. So I would suggest that Pos is potentially by far the slowest option. But I haven't done any benchmarks.
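The concern above can be tried out with a rough sketch like the following. It assumes a million-character text that never contains the starter, an arbitrary repeat count, and the coarse timing of Now with MilliSecondsBetween (DateUtils). The names LongText, Starter, Runs and Hits are made up for this illustration, and the numbers it prints will vary by compiler and machine.

```pascal
program PosCostDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils, DateUtils;

const
  Runs = 200; // arbitrary repeat count for this sketch

var
  LongText, Starter: string;
  Started: TDateTime;
  i, Hits: Integer;
begin
  LongText := StringOfChar('x', 1000000); // never contains Starter
  Starter := 'ab';
  Hits := 0;

  // Pos must scan all of LongText before it can report "not found".
  Started := Now;
  for i := 1 to Runs do
    if Pos(Starter, LongText) = 1 then
      Inc(Hits);
  Writeln('Pos:  ', MilliSecondsBetween(Now, Started), ' ms');

  // Copy only touches the first Length(Starter) characters,
  // at the price of allocating a temporary string on each call.
  Started := Now;
  for i := 1 to Runs do
    if Copy(LongText, 1, Length(Starter)) = Starter then
      Inc(Hits);
  Writeln('Copy: ', MilliSecondsBetween(Now, Started), ' ms');

  Writeln('Matches: ', Hits);
end.
```

Neither check finds a match here, so only the elapsed times differ. When the starter does occur at the beginning of the text, Pos returns right away and the picture can look quite different.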
There is no “fastest” way, but there is a recommended one:
“Pos”: This is the worst one; its speed depends on the length of the “text” variable.
“Copy”: Its cost is bounded by the length of the “starter” variable in the worst case, but it generates a new copy of the string as its result, which involves memory allocation, contention in the string memory manager, and so on.
“LeftStr”: This is more or less the same as “Copy”; in fact a “LeftStr” call ends up calling “Copy”, so it will be slower than “Copy”.
“AnsiStartsStr”: This is designed for exactly this job, its name reflects its purpose, and it will on average be the fastest one.
Jose, could you tell me how you concluded that AnsiStartsStr is better than LeftStr, let alone Copy? A function name means nothing; it cannot guarantee execution speed.
In my opinion, Copy is the best; the next two are just built on it.
The question that you could be asking yourself (at least I did) is: what’s the best / fastest way?
So, if you’ve been asking yourself, why didn’t you benchmark it?
@Ollo: I did. The idea behind this post is “what are you using to check for string.startswith(string)”.
AnsiStartsStr is using the copy() internally as well (at least with Delphi6). So this can’t be faster than calling copy() directly.
Pos() might be fast for short strings but I wouldn’t use it for long strings or if they are unknown in length.
LeftStr is also using copy().
conclusion: either copy() or pos() for short strings
@Amarbat:
You are right, AnsiStartsStr also uses “Copy”, so it involves more or less the same work as plain “Copy” or “LeftStr”.
Anyway, on a closer look the 4 functions are not equivalent. They will not produce the same results in all cases, because the first one, “AnsiStartsStr”, is locale dependent, and a comparison like:
AnsiStartsStr('aero', 'æro')
will return True in some locales. So AnsiStartsStr does not belong in the same category; its purpose is different, and its result is controlled by the operating system and the user locale.
What about :
for i := 1 to length(starter) do
  if starter[i] <> text[i] then begin
    result := FALSE;
    exit;
  end;
result := TRUE;
It returns as soon as possible.
Watch out for string overruns: before going into the FOR loop, you should check whether length(Text) < length(starter) and, if so, set Result := False and exit.
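Putting the loop and the length guard together could look like the sketch below. StartsWithLoop is a made-up name for this illustration, and the parameter is called Source rather than Text only to avoid shadowing the predeclared Text type.

```pascal
program GuardedLoopDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// StartsWithLoop is a hypothetical name: the character loop plus the length guard.
function StartsWithLoop(const Starter, Source: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  // Guard: a starter longer than the source cannot match,
  // and indexing Source[i] past its end must be avoided.
  if Length(Source) < Length(Starter) then
    Exit;
  for i := 1 to Length(Starter) do
    if Starter[i] <> Source[i] then
      Exit; // first mismatch: leave with Result = False
  Result := True;
end;

begin
  Writeln(StartsWithLoop('ap', 'application'));
  Writeln(StartsWithLoop('application', 'ap'));
end.
```

With this shape an empty starter counts as a match, because the loop body never runs. Whether that is the desired answer is a design choice for the caller.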
Maybe this function will help (passing -1 as posi checks teststring against the end of sourcestring).
function SameSubString(const sourcestring, teststring: string; posi: integer = 1): boolean;
var
  ls, lt: integer;
  ps, pt: pchar;
begin
  ls := length(sourcestring);
  if posi > ls then begin
    result := false; exit;
  end;
  lt := length(teststring);
  if lt > ls then begin
    result := false; exit;
  end;
  if posi = -1 then posi := ls - lt + 1;
  if posi + lt - 1 > ls then begin // can't be true, as posi + lt is bigger than ls
    result := false;
    exit;
  end;
  ps := pchar(sourcestring); if posi > 1 then inc(ps, posi - 1);
  pt := pchar(teststring);
  while pt^ <> #0 do begin
    if pt^ <> ps^ then begin
      result := false;
      exit;
    end;
    inc(pt); inc(ps);
  end;
  result := true;
end;
My experience is that string compares can be sped up considerably by first comparing just the first character in the string. I believe that most compilers overlook this trick.
if test[1] = newrecord[1] then
  if test = newrecord then etc. etc.
I believe I did benchmarks with this way back when and it certainly made a difference in Delphi 3 or 4 or whatever I was using then.
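As GSA says, specialized functions are almost always faster.
I tried the simplest one:
function begins(const aString: string; const aPattern: string): Boolean;
var
  i : Word;
begin
  i := Length(aPattern);
  // a pattern longer than the string cannot match
  if (Length(aString) < i) then begin
    Result := False;
    Exit;
  end;
  // compare from the end of the pattern back to its first character
  while (i > 0) do begin
    if (aString[i] <> aPattern[i]) then begin
      Result := False;
      Exit;
    end else begin
      Dec(i);
    end;
  end;
  Result := True;
end;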
and results for 3 corner cases
– short pattern matches
– long pattern doesn’t match
– long pattern matches
are:
Is “ap” in “application”?
1000000 Pos() takes: 109 ms
1000000 AnsiStartsStr() takes: 578 ms
1000000 begins() takes: 31 ms
Is “applications” in “application is”?
1000000 Pos() takes: 156 ms
1000000 AnsiStartsStr() takes: 797 ms
1000000 begins() takes: 47 ms
Is “applications” in “applications”?
1000000 Pos() takes: 125 ms
1000000 AnsiStartsStr() takes: 703 ms
1000000 begins() takes: 78 ms
SysUtils contains several useful functions, such as CompareStr and CompareMem, that are written in assembler.
// Left Match
// True if start of s=startstring
// Warning: Assumes single-byte characters.
//
function LeftMatch(const s, startstring: string): Boolean;
begin
  if (startstring = '') or (length(s) < length(startstring)) then
    Result := False
  else
    Result := CompareMem(pointer(s), pointer(startstring), length(startstring));
end; // Left Match
I voted Pos, because I guess it’s the fastest.
I would still prefer “AnsiStartsStr” in actual code, unless speed is extremely important. “AnsiStartsStr” is a clear function: it's easy to understand what it does, and no additional comments are needed.
They all perform poorly; I'm using my own implementation.
Pos(), for instance, can behave well if the string is at the beginning or if the tested string is short, but can perform horribly otherwise.
All three other variants involve a Copy(), i.e. memory allocation, copy, release and an exception frame.
If it's that important, writing your own function is going to be faster.
@Guy Gordon: very good, but case-sensitive searches only?
also change
if (startstring=) or (length(s)=length(startstring)
optimizes out the (startstring=)
Your poll has 2 different questions and one set of answers to choose from.
We use Pos the most but that doesn’t mean it’s the fastest one!!
Oops, I lost a line there. I meant to say
@Guy Gordon: change
if (startstring=) or (length(s)=length(startstring)
to optimize out the (startstring=)
function StatsWith(const SubStr, Str: string): Boolean;
begin
  if Length(SubStr) <= Length(Str) then
    Result := CompareMem(Pointer(SubStr), Pointer(Str), Length(SubStr))
  else
    Result := False;
end;
In Indy, under Windows we use the Win32 API CompareString() function in our own TextStartsWith() and TextEndsWith() implementations. This way, we can compare the original string input values directly, case insensitively, without having to make any copies in memory. On other platforms, we resort to Copy() and AnsiCompareText() instead.
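For readers who want a case-insensitive check without copies but without calling the Win32 API directly, one possible sketch uses StrLIComp from SysUtils. This is not Indy's implementation. StartsWithText is a made-up name, and how far the case folding goes beyond plain ASCII letters depends on the compiler and RTL version, so treat that part as an assumption to verify.

```pascal
program TextStartsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

// StartsWithText is a hypothetical name; it is not Indy's TextStartsWith.
// StrLIComp compares at most Length(SubStr) characters, ignoring case,
// and works on the original buffers, so no temporary string is created.
function StartsWithText(const SubStr, Str: string): Boolean;
begin
  Result := (Length(SubStr) <= Length(Str)) and
    (StrLIComp(PChar(Str), PChar(SubStr), Length(SubStr)) = 0);
end;

begin
  Writeln(StartsWithText('AP', 'application'));
  Writeln(StartsWithText('apple', 'application'));
end.
```

The StrUtils unit also ships AnsiStartsText, the case-insensitive sibling of AnsiStartsStr, which is the simpler choice when raw speed is not the main concern.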
Arthur's function is certainly by far the quickest, but it will not work with newer Delphi versions: Unicode string characters use 2 bytes of memory, so the memory area to be compared is 2*length bytes long.
I think there are even character sets with fixed character size, in that case the problem cannot be solved that way.
The disadvantage of such low level programming is that it is very dependent on the implementation of the language. I think it is a good idea to omit such tricks, unless maximal speed is of crucial importance.
sorry, I omitted an important word – I should have written:
character sets with no fixed character size
@idefix2 you’re right, Length() should be replaced by ByteLength() in the code
function StatsWith(const SubStr, Str: string): Boolean;
begin
  if Length(SubStr) <= Length(Str) then
    Result := CompareMem(Pointer(SubStr), Pointer(Str), ByteLength(SubStr))
  else
    Result := False;
end;