1. Technology

Threaded Delphi String Parser

Breaking Down String To Characters In Threads

By

Threaded String Parser in Delphi
Threaded String Parser in Delphi
I have an application which has to process files on the file system in a way that each file is accessed / opened and some work is being done on a file (let's call this work "scanning").

This sequential approach is, as you might expect, rather slow when there are lots of files to be processed - i.e. the next file is being opened and scanned when the previous file is finished. The result of this scanning is a collection of scanning results stored in a TStringList.

Having in mind that today many machines have more than one core and that scanning more files at the same time should speed up the entire process I've decided to go for a threaded approach - where several files are being scanned at the same time.

To make it more easy to explain what I have done, my scanning can be simplified into the following task: split a string into several pieces and have characters for each piece be stored in the TStringList.

String Parsing Multiple Threads

Having the simplified task described above, here's the problem and the expected solution:
  1. For a string 'delphi' I would call my "master" thread.
  2. The input string will be sliced into several pieces where each piece would be processed by a "child" thread.
  3. If I would have 2 child threads (the number depends on the length of the input string), each thread would take half of the input string.
  4. The result of all this work (by master thread) is a TStringList containing characters for the inputted string : 'd','e','l','p','h','i'.
  5. The first child thread would receive "del" and output 'd','e','l'.
  6. The second child thread would receive "phi" and output 'p','h','i'.
  7. Child threads output their results to the master thread.
  8. Master thread updates the main application GUI.
There are some undecided requisites, for the moment, like should the order of the characters match their order in the input string. What happenes when I want to terminate all processing, and alike. But, more on that later...

There are several Threading Libraries For Delphi available to Delphi developers which will hide all the "how to thread" complexity from you and expose only what is just-enough for you not to runaway from synchronization and critical sections for data protection.

For this task I've decided to first take what Delphi has to offer and that is the TThread class.

To have a piece of code you would like to be executed out of the main thread of the application, you would create a class that extends TThread. I will not go into basics around extending TThread, there's more info here: Threading in Delphi.

Here's my attempt to do a threaded string parser (by directly extending the TThread class):

TStringParserThread and TStringChildThread

Note that I have a master thread "TStringParserThread" which will receive the input string and output the results. The main work is being done by child threads ("TStringChildThread") which are private classes defined inside the TStringParserThread.

Here's the interface part:

type
  //master
  TStringParserThread = class(TThread)
  private
  //child
    type TStringChildThread = class(TThread)
    private
      StringIn : string;
      Results : TStringList;
      Tag : integer;
    protected
      procedure Execute; override;
    public
      constructor Create(const inPartialString : string);
      destructor Destroy(); override;
    end;

  private
    Results : TStringList;
    StringIn: string;
    Children : array of TStringChildThread;
    ChildrenResults : array of TStringList;

    procedure ChildOnTerminate(sender : TObject);
  protected
    procedure Execute; override;
    procedure DoTerminate; override;
  public
    destructor Destroy(); override;
    constructor Create(const inFullString : string);
  end;
The constructor of the master thread receives the input string.
I have an array of child threads (Children) and also an array of the child results (ChildrenResults)
The main work of a thread happens in its overridden Execute method.

Here's how it looks for the master thread:

procedure TStringParserThread.Execute;
const
  numChildren = 2;
var
  i,
  charsPerChild,
  startIndex : integer;
  cFinished : boolean;
  partial : string;
begin
  inherited;

  charsPerChild := Length(StringIn) DIV numChildren;

  SetLength(Children, numChildren);

  SetLength(ChildrenResults, numChildren);
  for i := 0 to -1 + numChildren do 
	  ChildrenResults[i] := TStringList.Create;

  //create and assign slices per children
  for i := 0 to -1 + numChildren do
  begin
    startIndex := i * charsPerChild + 1;
    if i = -1 + numChildren then charsPerChild := MaxInt;

    partial := Copy(StringIn, startIndex, charsPerChild);

    Children[i] := TStringChildThread.Create(partial);
  end;

  //run children
  for i := Low(Children) to High(Children) do
  begin
    Children[i].OnTerminate := ChildOnTerminate;
    Children[i].Tag := i;
    Children[i].Start;
  end;

  //wait until all children finished
  cFinished := false;
  while NOT (Terminated OR cFinished) do
  begin
    cFinished := true;
    for i := Low(Children) to High(Children) do
      cFinished := cFinished AND Children[i].Finished;
  end;

  //collect results from children
  for i := Low(Children) to High(Children) do
  begin
    Results.AddStrings(ChildrenResults[i]);
  end;

  //after here the OnTerminate for the master thread 
  //would be raised and handled in the app's StringParserThreadTerminated method
end;
The Execute method of a child is rather simple:
procedure TStringParserThread.TStringChildThread.Execute;
var
  c : char;
begin
  inherited;

  for c in StringIn do
  begin
    Results.Add(c);
    //add some "workload"
    Sleep(Random(100));
  end;
end;
When each child thread is finished extracting characters from the given string, the OnTerminated event for a child is used to grab the result and store it in the ChildrenResults of the master thread:
procedure TStringParserThread.ChildOnTerminate(sender: TObject);
begin
  ChildrenResults[TStringChildThread(Sender).Tag].AddStrings(TStringChildThread(Sender).Results);
end;
The rest of the thread implementation code is related to cleanup, initialization and alike (you can download the full code below).

Now, lets see how all this is run, by a click of a button:

begin
  stringParserThread := TStringParserThread.Create(Edit1.Text);
  stringParserThread.OnTerminate := StringParserThreadTerminated;
  stringParserThread.Start;
end;
The stringParserThread is a private filed of a form. The inputted text comes from Edit1. When the thread is terminated its results are collected in the StringParserThreadTerminated procedure which handles the OnTerminate event. This is a safe moment to update your main application GUI:
procedure TForm.StringParserThreadTerminated(sender: TObject);
begin
  TStringParserThread(Sender).Results.Delimiter := ',';
  Edit2.Text := TStringParserThread(Sender).Results.DelimitedText;
end;
Just storing the extracted characters contained in the Results filed (TStringList) to the Edit2 control. Am using "," as a delimiter and the DelimitedText property to have a nicer output.

Basically, that's it. I have a threaded string parser :)

There's more code to the full implementation as you can see from the interface section. The Create constructor receives the input string and also sets FreeOnTerminate to true (so the thread is freed when terminated), there's also some cleanup in the overridden destructors.

Grab the full source code from here: Threaded string parser Source Code and explore / upgrade if you like.

The Omni Thread Library - The Final Rescue

I've noted that my original task (the one with files) will be hard to code and maintain and I also need a thread pool and lots more things happening, so I decide to give Omni Thread Library (OTL) a try. OTL is a threading library developed by Primož Gabrijelčić and I must say is a MUST if you ever start playing with threads.

Gabriel was kind enough to also solve the above problem using OTL: OmniThreadLibrary in Practice [2]–Background Worker and List Partitioning.

My next task was to have many of those string parsers (master threads) work at the same time. For this I would need a pool to be able to control how many threads (that is: files being processed) are actually running at one moment, but still use the fire-and-forget approach. More on that very soon.

©2014 About.com. All rights reserved.