This sequential approach is, as you might expect, rather slow when there are lots of files to be processed - i.e. the next file is being opened and scanned when the previous file is finished. The result of this scanning is a collection of scanning results stored in a TStringList.
Having in mind that today many machines have more than one core and that scanning more files at the same time should speed up the entire process I've decided to go for a threaded approach - where several files are being scanned at the same time.
To make it more easy to explain what I have done, my scanning can be simplified into the following task: split a string into several pieces and have characters for each piece be stored in the TStringList.
String Parsing Multiple Threads
Having the simplified task described above, here's the problem and the expected solution:- For a string 'delphi' I would call my "master" thread.
- The input string will be sliced into several pieces where each piece would be processed by a "child" thread.
- If I would have 2 child threads (the number depends on the length of the input string), each thread would take half of the input string.
- The result of all this work (by master thread) is a TStringList containing characters for the inputted string : 'd','e','l','p','h','i'.
- The first child thread would receive "del" and output 'd','e','l'.
- The second child thread would receive "phi" and output 'p','h','i'.
- Child threads output their results to the master thread.
- Master thread updates the main application GUI.
There are several Threading Libraries For Delphi available to Delphi developers which will hide all the "how to thread" complexity from you and expose only what is just-enough for you not to runaway from synchronization and critical sections for data protection.
For this task I've decided to first take what Delphi has to offer and that is the TThread class.
To have a piece of code you would like to be executed out of the main thread of the application, you would create a class that extends TThread. I will not go into basics around extending TThread, there's more info here: Threading in Delphi.
Here's my attempt to do a threaded string parser (by directly extending the TThread class):
TStringParserThread and TStringChildThread
Note that I have a master thread "TStringParserThread" which will receive the input string and output the results. The main work is being done by child threads ("TStringChildThread") which are private classes defined inside the TStringParserThread.Here's the interface part:
The constructor of the master thread receives the input string.
type
//master
TStringParserThread = class(TThread)
private
//child
type TStringChildThread = class(TThread)
private
StringIn : string;
Results : TStringList;
Tag : integer;
protected
procedure Execute; override;
public
constructor Create(const inPartialString : string);
destructor Destroy(); override;
end;
private
Results : TStringList;
StringIn: string;
Children : array of TStringChildThread;
ChildrenResults : array of TStringList;
procedure ChildOnTerminate(sender : TObject);
protected
procedure Execute; override;
procedure DoTerminate; override;
public
destructor Destroy(); override;
constructor Create(const inFullString : string);
end;
I have an array of child threads (Children) and also an array of the child results (ChildrenResults)
The main work of a thread happens in its overridden Execute method.
Here's how it looks for the master thread:
The Execute method of a child is rather simple:
procedure TStringParserThread.Execute;
const
numChildren = 2;
var
i,
charsPerChild,
startIndex : integer;
cFinished : boolean;
partial : string;
begin
inherited;
charsPerChild := Length(StringIn) DIV numChildren;
SetLength(Children, numChildren);
SetLength(ChildrenResults, numChildren);
for i := 0 to -1 + numChildren do
ChildrenResults[i] := TStringList.Create;
//create and assign slices per children
for i := 0 to -1 + numChildren do
begin
startIndex := i * charsPerChild + 1;
if i = -1 + numChildren then charsPerChild := MaxInt;
partial := Copy(StringIn, startIndex, charsPerChild);
Children[i] := TStringChildThread.Create(partial);
end;
//run children
for i := Low(Children) to High(Children) do
begin
Children[i].OnTerminate := ChildOnTerminate;
Children[i].Tag := i;
Children[i].Start;
end;
//wait until all children finished
cFinished := false;
while NOT (Terminated OR cFinished) do
begin
cFinished := true;
for i := Low(Children) to High(Children) do
cFinished := cFinished AND Children[i].Finished;
end;
//collect results from children
for i := Low(Children) to High(Children) do
begin
Results.AddStrings(ChildrenResults[i]);
end;
//after here the OnTerminate for the master thread
//would be raised and handled in the app's StringParserThreadTerminated method
end;
When each child thread is finished extracting characters from the given string, the OnTerminated event for a child is used to grab the result and store it in the ChildrenResults of the master thread:
procedure TStringParserThread.TStringChildThread.Execute;
var
c : char;
begin
inherited;
for c in StringIn do
begin
Results.Add(c);
//add some "workload"
Sleep(Random(100));
end;
end;
The rest of the thread implementation code is related to cleanup, initialization and alike (you can download the full code below).
procedure TStringParserThread.ChildOnTerminate(sender: TObject);
begin
ChildrenResults[TStringChildThread(Sender).Tag].AddStrings(TStringChildThread(Sender).Results);
end;
Now, lets see how all this is run, by a click of a button:
The stringParserThread is a private filed of a form. The inputted text comes from Edit1. When the thread is terminated its results are collected in the StringParserThreadTerminated procedure which handles the OnTerminate event. This is a safe moment to update your main application GUI:
begin
stringParserThread := TStringParserThread.Create(Edit1.Text);
stringParserThread.OnTerminate := StringParserThreadTerminated;
stringParserThread.Start;
end;
Just storing the extracted characters contained in the Results filed (TStringList) to the Edit2 control. Am using "," as a delimiter and the DelimitedText property to have a nicer output.
procedure TForm.StringParserThreadTerminated(sender: TObject);
begin
TStringParserThread(Sender).Results.Delimiter := ',';
Edit2.Text := TStringParserThread(Sender).Results.DelimitedText;
end;
Basically, that's it. I have a threaded string parser :)
There's more code to the full implementation as you can see from the interface section. The Create constructor receives the input string and also sets FreeOnTerminate to true (so the thread is freed when terminated), there's also some cleanup in the overridden destructors.
Grab the full source code from here: Threaded string parser Source Code and explore / upgrade if you like.
The Omni Thread Library - The Final Rescue
I've noted that my original task (the one with files) will be hard to code and maintain and I also need a thread pool and lots more things happening, so I decide to give Omni Thread Library (OTL) a try. OTL is a threading library developed by Primož Gabrijelčić and I must say is a MUST if you ever start playing with threads.Gabriel was kind enough to also solve the above problem using OTL: OmniThreadLibrary in Practice [2]–Background Worker and List Partitioning.
My next task was to have many of those string parsers (master threads) work at the same time. For this I would need a pool to be able to control how many threads (that is: files being processed) are actually running at one moment, but still use the fire-and-forget approach. More on that very soon.

