1. Computing

How Do You Test If a Delphi String Starts With a (Sub)String? What RTL Function is Faster?

By January 9, 2012

Follow me on:

in Delphi Strings :: Whatever type of applications you are creating using Delphi, you must be writing some code to handle strings. Delphi provides a healthy assortment of string operators, functions and procedures in RTL.

Let's say that you need to determines whether the beginning of a string matches a specified string? Delphi provides at least 4 ways to check if a string starts with another (sub)string.

Here's what functions you can use (note that LeftStr and AnsiStartsStr are defined in the strutils.pas unit):

const
  text = 'application';
  starter = 'ap'
begin
  AnsiStartsStr(starter, text);

  Pos(starter, text) =  1;

  LeftStr(text, Length(starter)) = starter;

  Copy(text, 1, Length(starter)) = starter;
end;

Note: All 4 functions provide case-sensitive matching.

The question that you could be asking yourself (at least I did) is: what's the best / fastest way?

Related:

Comments
March 30, 2010 at 7:38 am
(1) runner says:

Why ask? Benchmark it.

March 30, 2010 at 8:24 am
(2) Zarko Gajic says:

@runner: The questions was “how do you check”. I guess what “you” picked is what you think is the fastest way.

March 30, 2010 at 9:59 am
(3) Osama ALASSIRY says:

I would try something like:

type
Proc=Procedure;

function Benchmark(p:Proc; Times:Cardinal):Cardinal;
var t,i:Cardinal;
begin
t:=GetTickCount;
for i := 1 to Times do
p();
Result:=GetTickCount-t;
end;

March 30, 2010 at 11:04 am
(4) Cameron says:

I think you need to use timegettime with timebeginperiod and timeendperiod. Also, I would loop it 10,000 times to get a larger time sampling so there would be less precision error. Here are my results in Delphi 7

AnsiStartsStr(starter, text): 15
if Pos(starter, text) = 1 then: 0
LeftStr(text, Length(starter)): 9
LeftStr(text, 2): 9
Copy(text, 1, Length(starter)): 2
Copy(text, 1, 2): 2

Length adds zero to the times. Here is 100,000 iterations

AnsiStartsStr(starter, text): 136
if Pos(starter, text) = 1 then: 5
LeftStr(text, Length(starter)): 85
LeftStr(text, 2): 85
Copy(text, 1, Length(starter)): 19
Copy(text, 1, 2): 18

March 30, 2010 at 4:47 pm
(5) Gonzales says:

Surely the time for
Pos(starter,text)
must depend on (a) if the starter occurs in text and (b) if not the length of text. If text was a million characters in length, the Pos function has to check the entire million characters (less a few) for the occurence of starter. So I would suggest that using Pos is potentially by far the slowest option. But I haven’t done any benchmarks.

March 30, 2010 at 5:12 pm
(6) Jose Mejuto says:

There is no “faster” mode, there is a recommended mode:

“Pos”: Is the worst one, its speed is in relation with “text” variable.
“Copy”: Its speed is fixed to the length of “starter” variable at worst case, but it generates a new copy of the string as result, which involves memory allocation, contention in the memory string manager, and so on.
“LeftStr”: It is more or less the same as “Copy” in fact a “LeftStr” call will end in a call to “Copy”, so it would be slower than “Copy”.
“AnsiStartsStr”: Is designed just for that, its name reflects its design and will be in average the faster one.

March 30, 2010 at 11:00 pm
(7) Amarbat says:

Jose, could you tell me how do you guess AnsiStartsStr is better than LeftStr and furthermore Copy? Function name is nothing, it can not guarantee the execution speed.

In my opinion, Copy is best, next two are just based on it.

March 31, 2010 at 2:17 am
(8) Ollo says:

The question that you could be asking yourself (at least I did) is: what’s the best / fastest way?

So, if you’ve been asking yourself, why didn’t you benchmark it?

March 31, 2010 at 3:18 am
(9) Zarko Gajic says:

@Ollo: I did. The idea behind this post is “what are you using to check for string.startswith(string)”.

March 31, 2010 at 4:52 am
(10) GSA says:

AnsiStartsStr is using the copy() internally as well (at least with Delphi6). So this can’t be faster than calling copy() directly.

Pos() might be fast for short strings but I wouldn’t use it for long strings or if they are unknown in length.

LeftStr is also using copy().

conclusion: either copy() or pos() for short strings

March 31, 2010 at 5:34 am
(11) Jose Mejuto says:

@Amarbat:

You are right AnsiStartsStr also uses “Copy” which involves more or less the same as plain “Copy” or “LeftStr”.

Anyway taking a closer look the 4 functions are not equivalent they will not produce the same results in all cases as the first one “AnsiStartsStr” is locale dependent and a compare like:

AnsiStartsStr(‘aero’,'ζro’)

Will result true in some locales. So AnsiStartsStr does not fit the same category, its purpose is different and it is controlled by the operative system and user locale.

March 31, 2010 at 6:34 am
(12) dpap says:

What about :
for i = 1 to length(starter) do
if starter[i] text[i] then begin
result := FALSE;
exit;
end;
result := TRUE;
it returs as soon as posible

March 31, 2010 at 7:39 am
(13) Louis Kleiman says:

watch out for string overruns: before going into the FOR loop, you should probably do if length(Text) < length(starter) then Result := False;

April 1, 2010 at 5:10 am
(14) GSA says:

maybe this function will help (posi with -1 will check the teststring at the end of sourcestring).

function SameSubString(const sourcestring, teststring:string; posi:integer=1):boolean;
var ls,lt:integer;
ps,pt:pchar;
begin
ls:=length(sourcestring);
if posi>ls then begin
result:=false; exit;
end;

lt:=length(teststring);
if lt>ls then begin
result:=false; exit;
end;

if posi=-1 then posi:=ls-lt+1;
if posi+lt-1>ls then begin //cant be true as posi + lt is bigger then ls
result:=false;
exit;
end;

ps:=pchar(sourcestring); if posi>1 then inc(ps,posi-1);
pt:=pchar(teststring);
while pt^#0 do begin
if pt^ps^ then begin
result:=false;
exit;
end;
inc(pt); inc(ps);
end;
result:=true;
end;

April 1, 2010 at 2:39 pm
(15) Lucky says:

My experience is that string compares can
be sped up considerably by first comparing just
the first character in the string. I believe that
most compilers overlook this trick.

If test[1] = newrecord[1] then if
test = newrecord then etc. etc.

I believe I did benchmarks with this way back when and it certainly made a difference in Delphi 3 or 4 or whatever I was using then.

April 5, 2010 at 3:34 am
(16) otzi says:

As GSA says specialized functions are almost always faster.
I tried simplest one:

function begins(const aString: string; const aPattern: string): Boolean;
var
i : Word;
begin
i := Length(aPattern);
if (Length(aString) 0) do begin
if (aString[i] aPattern[i]) then begin
Result := False;
Exit;
end else begin
Dec(i);
end;
end;

Result := True;
end;

and results for 3 corner cases
– short pattern matches
– long pattern doesn’t match
– long pattern matches
are:

Is “ap” in “application”?
1000000 Pos() takes: 109 ms
1000000 AnsiStartsStr() takes: 578 ms
1000000 begins() takes: 31 ms
Is “applications” in “application is”?
1000000 Pos() takes: 156 ms
1000000 AnsiStartsStr() takes: 797 ms
1000000 begins() takes: 47 ms
Is “applications” in “applications”?
1000000 Pos() takes: 125 ms
1000000 AnsiStartsStr() takes: 703 ms
1000000 begins() takes: 78 ms

April 7, 2010 at 10:25 am
(17) Guy Gordon says:

SysUtils contains several useful functions such as CompareStr and CompareMem written in assembler.

// Left Match
// True if start of s=startstring
// Warning: Assumes single-byte characters.
//
function LeftMatch(const s,startstring:string):Boolean;
begin
if (startstring=”) or (length(s)<length(startstring)) then
Result := False
else
Result := CompareMem(pointer(s),pointer(startstring),length(startstring));
end; //Left Match

January 9, 2012 at 8:50 am
(18) mart says:

I voted Pos, because I guess it’s the fastest.
I would still prefer “AnsiStartsStr” in an actual code, unless the speed is extremly important. “AnsiStarsStr” is a clear function, it’s easy to understand what it does and no additional comments are needed.

January 9, 2012 at 9:05 am
(19) Eric says:

They all perform poorly, I’m using my own implementation.

Pos() f.i. can behave wel if the string is at the beginning or if the tested string is short, but can perform horribly otherwise.

All three other variants involve a Copy(), ie. memory allocation, copy, release & exception frame.

January 9, 2012 at 10:58 am
(20) Aim says:

if it’s that important, writing your own function is going to be faster.

@Guy Gordon: very good, but case senstive searches only?

also change
if (startstring=”) or (length(s)=length(startstring)

optimizes out the (startstring=”)

January 9, 2012 at 11:11 am
(21) roberto says:

Your pool has 2 different questions and one answer to cheese from.
We use Pos the most but that doesn’t mean it’s the fastest one!!

January 9, 2012 at 1:53 pm
(22) Aim says:

Oops, i lost a line there. I meant to say

@Guy Gordon: change
if (startstring=”) or (length(s)=length(startstring)

to optimize out the (startstring=”)

January 9, 2012 at 3:37 pm
(23) Arthur says:

function StatsWith(const SubStr, Str: string): Boolean;
begin
if Length(SubStr) <= Length(Str) then
Result := CompareMem(Pointer(SubStr), Pointer(Str), Length(SubStr))
else
Result := False;
end;

January 9, 2012 at 11:53 pm
(24) Remy Lebeau says:

In Indy, under Windows we use the Win32 API CompareString() function in our own TextStartsWith() and TextEndsWith() implementations. This way, we can compare the original string input values directly, case insensitively, without having to make any copies in memory. On other platforms, we resort to Copy() and AnsiCompareText() instead.

January 11, 2012 at 6:22 pm
(25) idefix2 says:

Arthurs function is certainly by far the quickest, but it will not work with newer Delphi versions – unicode string characters use 2 bytes of memory, so the memory area to be compared is 2*length bytes long.

I think there are even character sets with fixed character size, in that case the problem cannot be solved that way.

The disadvantage of such low level programming is that it is very dependent on the implementation of the language. I think it is a good idea to omit such tricks, unless maximal speed is of crucial importance.

January 11, 2012 at 6:27 pm
(26) idefix2 says:

sorry, I omitted an important word – I should have written:
character sets with no fixed character size

January 18, 2012 at 12:27 am
(27) Arthur says:

@idefix2 you’re right, Length() should be replaced by ByteLength() in the code

function StatsWith(const SubStr, Str: string): Boolean;
begin
if Length(SubStr) <= Length(Str) then
Result := CompareMem(Pointer(SubStr), Pointer(Str), ByteLength(SubStr))
else
Result := False;
end;

Leave a Comment

Line and paragraph breaks are automatic. Some HTML allowed: <a href="" title="">, <b>, <i>, <strike>

©2014 About.com. All rights reserved.