1. Computing

How Many Times A String Appears Inside Another String?

By June 5, 2012

Follow me on:

in Delphi TIPS ::

While the SysUtils unit and the StrUtils unit provide hundreds of string manipulation routines there are always some that are "missing". A function I need daily is the one that return the number of how many times a string appears inside another string.

Read the full article to learn how to Get The Number Of Occurrences Of a SubString in a String


June 5, 2012 at 6:17 am
(1) Marcel Aubin says:

I like this variant of the function which avoid repeating the PosEx function

function SubStringOccurences(const subString, sourceString : string) : integer;
pEx, pChar: integer;
result := 0; pChar := 1;
pEx := PosEx(subString, sourceString, pEx);
if pEx > 0 then Inc(result);
until pEx = 0;

June 5, 2012 at 11:31 am
(2) stzdzyhs says:

on the other side, I bet that the author’s code is more quick because:
it just has 1 if test in a single loop (while condition test),
in your code, it becomes two if test in a single loop
(1 is if pEx>0, the other is until.. )

the repeating the PosEx is not in the loop, it’s out of loop.

June 5, 2012 at 2:36 pm
(3) majlumbo says:

Not that it is better, but here’s another way to get the number of occurrences

function numOccurrences(substring, sourcestring: string): integer;
Result := (length(sourcestring)-length(stringreplace(sourcestring, substring, ”,[rfreplaceall]))) div length(substring);

June 5, 2012 at 4:06 pm
(4) majlumbo says:

Seeing that you also included an Ignore case version, figured I’d add that also, plus add a check to ensure there’s no divide by zero error

function numOccurrences(substring, sourcestring: string; ignorecase: boolean): integer;
if (length(substring) = 0) or (length(sourcestring) = 0) then
Result := 0
//no sense checking if source or sub is zero length
if ignorecase then
Result := (length(sourcestring) –
length(stringreplace(sourcestring, substring, ,
[rfreplaceall, rfIgnoreCase]))) div length(substring)
Result := (length(sourcestring) –
length(stringreplace(sourcestring, substring, ”,
[rfreplaceall]))) div length(substring);

June 6, 2012 at 10:08 am
(5) tuncay says:

I tend to do things in a more basic form, avoiding stock functions like stringreplace as much as possible, you never know when borcadero decides to change it. So, I instead only using LowerCase and Copy, and do this:

function SubStringCount(needle, haystack:String; ignorecase:boolean=false):integer;
var i, lneedle, lhaystack : integer; c : char; comp : String;
lneedle := length(needle);
lhaystack := length(haystack);
result := 0;

if (lneedle>0) and (lhaystack>0) and (lhaystack>=lneedle) then
if ignorecase then needle := lowercase(needle);
// haystack will be converted to lower case as needed along the loop
// for performance reasons

// store first character of needle to reuse later.
c := needle[1];

result := 0;
i := 1;
while i <= (lhaystack – lneedle + 1) do
// only do the comparison if first char matches, otherwise, no sense in
// comparing whole substring
if (haystack[i] = c) or (ignorecase and (lowercase(haystack[i]) = c)) then
comp := copy(haystack, i, lneedle);
if ignorecase then comp:= lowercase(comp);
if comp = needle then
result := result + 1;
// skip cursor over substring that we already found
i := i + lneedle – 1;

Dont let the length scare you, half of it is comments.

June 6, 2012 at 10:53 am
(6) tuncay says:

Oh, the point of this is (despite longer) that it aims to go over the haystack data only once, and then to compare as little data as possible, including case conversions. Dont forget that PosEx internally also goes over the data. Such a difference would be more obvious when you have a 10 byte needle but a 100 MB haystack.

June 11, 2012 at 6:59 am
(7) picky says:

I like that you’ve thought about speed and user experience when strings aren’t just a few characters long.
One thing though, your needle search string may get altered by this routine as it is.
I’d be tempted to change it to
function SubStringCount(const needle, haystack:String; ignorecase:boolean=false):integer;
and add
myneedle: string;

Leave a Comment

Line and paragraph breaks are automatic. Some HTML allowed: <a href="" title="">, <b>, <i>, <strike>

©2014 About.com. All rights reserved.