Parsing Sentences


This demo for Delphi programmers is an extension of a previously posted Parsing Strings program.

It's a text parser which helps identify paragraphs, sentences, words and delimiters in plain text files. Paragraphs are defined by blank lines. Sentences in this demo are identified by a trailing period (.), exclamation point (!), or question mark (?) as the ending delimiter.

It was written to help a teacher in Indonesia who teaches English as a foreign language and is working on an automated translator to provide sample material for some Computer Based Training he is developing.  I wished him luck but warned him that identifying sentences and paragraphs is likely to be the easiest part of the job. 

The "GetNextWord" function takes a string as input and returns a Boolean result: "True" if a word was found and "False" when there are no more words in the string. This is a non-destructive version which returns more information than the GetWord function of the Parsing Strings program. In addition to the word found, this version returns the trailing delimiters of the word, the index of the next location to check, and Boolean flags indicating whether the current word is at End-of-Sentence (EOS) or End-of-Paragraph (EOP). Single Carriage Return (CR) and Linefeed (LF) pairs, which mark the end of a line, are ignored. Double CR-LF pairs indicate the EOP condition.
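To make the description above concrete, here is a minimal sketch of a function with that general shape. It is not the code from the download: the parameter list, the IsWordChar helper and the rule that "anything which is not a letter, digit or apostrophe is a delimiter" are all assumptions made for illustration. It counts linefeeds to detect a blank line, so two CR-LF pairs between words set the EOP flag.

```pascal
program GetNextWordSketch;
{$APPTYPE CONSOLE}

const
  CR = #13;
  LF = #10;

{Assumed rule: letters, digits and the apostrophe belong to words}
function IsWordChar(c: Char): Boolean;
begin
  Result := ((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z')) or
    ((c >= '0') and (c <= '9')) or (c = '''');
end;

{Hypothetical signature. Scans s starting at idx (1-based) without modifying s.
 Returns the word, its trailing delimiters (CR and LF stripped), and the flags.
 On return idx points at the next location to check.}
function GetNextWord(const s: string; var idx: Integer;
  out aWord, delims: string; out EOS, EOP: Boolean): Boolean;
var
  start, i, lfCount: Integer;
begin
  aWord := '';
  delims := '';
  EOS := False;
  EOP := False;
  {skip any delimiters in front of the word}
  while (idx <= Length(s)) and not IsWordChar(s[idx]) do
    Inc(idx);
  Result := idx <= Length(s);
  if not Result then
    Exit; {no more words}
  {collect the word itself}
  start := idx;
  while (idx <= Length(s)) and IsWordChar(s[idx]) do
    Inc(idx);
  aWord := Copy(s, start, idx - start);
  {collect the trailing delimiters up to the next word}
  start := idx;
  while (idx <= Length(s)) and not IsWordChar(s[idx]) do
    Inc(idx);
  lfCount := 0;
  for i := start to idx - 1 do
  begin
    if s[i] = LF then
      Inc(lfCount) {each CR-LF pair contains one LF}
    else if s[i] <> CR then
      delims := delims + s[i];
    if (s[i] = '.') or (s[i] = '!') or (s[i] = '?') then
      EOS := True;
  end;
  EOP := lfCount >= 2; {two line ends in a row mean a blank line}
end;

var
  sample, w, d: string;
  p: Integer;
  endsSentence, endsParagraph: Boolean;
begin
  sample := 'Hello, world! Is this' + CR + LF + 'one sentence?' +
    CR + LF + CR + LF + 'New paragraph.';
  p := 1;
  while GetNextWord(sample, p, w, d, endsSentence, endsParagraph) do
    WriteLn(w, ' [', d, '] EOS=', endsSentence, ' EOP=', endsParagraph);
end.
```

Because the input string is never altered and the caller owns the index, the same text can be rescanned from any position, which is the point of a non-destructive version.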

Buttons allow loading a text file to process, loading a file of abbreviations, and starting the parsing operation, which displays the parsed results in a separate display area.

Version 2 adds abbreviation checking. One of the problems with treating periods as sentence delimiters is that abbreviations containing periods will be treated as EOS. Pass a sorted TStringList containing abbreviations, along with the results of GetNextWord, to the "IsAbbreviation" function and it will reconstruct matching entries with the proper parameters. With the provided sample abbreviation list, such entries as "Mr.", "Dr.", "e.g.", "vs." etc. will be detected correctly. Ambiguous cases, where a sentence ends with an abbreviation that is not also the end of a paragraph, may not be handled properly.
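The idea can be sketched roughly as follows. This is an illustration only, not the downloadable code: the parameter list is an assumption, and the sketch only handles single-part abbreviations such as "Dr." (an entry like "e.g." would need to look ahead across two words). It relies on TStringList.Find, which performs a binary search and therefore needs a sorted list.

```pascal
program AbbrevSketch;
{$APPTYPE CONSOLE}

uses
  Classes; {System.Classes when unit scope names are not configured}

{Hypothetical signature. If the word plus a directly following period is in the
 sorted list, the period is moved from delims onto the word and EOS is
 recomputed from whatever delimiters remain.}
function IsAbbreviation(abbrevs: TStringList; var aWord, delims: string;
  var EOS: Boolean): Boolean;
var
  idx: Integer;
begin
  Result := (delims <> '') and (delims[1] = '.') and
    abbrevs.Find(aWord + '.', idx);
  if Result then
  begin
    aWord := aWord + '.';
    Delete(delims, 1, 1);
    {a remaining '.', '!' or '?' would still end the sentence}
    EOS := (Pos('.', delims) > 0) or (Pos('!', delims) > 0) or
      (Pos('?', delims) > 0);
  end;
end;

var
  abbrevs: TStringList;
  w, d: string;
  endsSentence: Boolean;
begin
  abbrevs := TStringList.Create;
  try
    abbrevs.Add('Mr.');
    abbrevs.Add('Dr.');
    abbrevs.Add('vs.');
    abbrevs.Sorted := True; {Find requires a sorted list}
    {values as a GetNextWord-style scan might return them for "Dr. Smith"}
    w := 'Dr';
    d := '. ';
    endsSentence := True;
    if IsAbbreviation(abbrevs, w, d, endsSentence) then
      WriteLn('Abbreviation: ', w, ' EOS=', endsSentence);
  finally
    abbrevs.Free;
  end;
end.
```

Note that TStringList comparisons are case-insensitive by default, so "dr." would also match the "Dr." entry unless CaseSensitive is set.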

The results of the parsing operation are displayed in color-coded form in a TRichEdit control. The technique required to use colored text on part of a line (e.g. the delimiters displayed in red) is not obvious. It can be accomplished by setting the SelStart property to the desired text insertion point, setting SelAttributes to the desired attributes, and then setting SelText to the text to be inserted, like this:

{The unqualified names below assume an enclosing "with" block on the TRichEdit control}
if Trim(delims) <> '' then
begin
  {non-blank delimiters exist}
  SelStart := GetTextLen - 2;  {position after the word but before the trailing CRLF}
  SelAttributes.Color := clRed;
  SelAttributes.Style := [fsBold];
  SelText := delims;  {setting SelStart resets SelLength to 0, so SelText is inserted}
end;

Also, the TMemo trick of setting SelStart and SelLength to 0 to force the first line of the display to scroll into view does not work with TRichEdit. Instead we need to send an EM_SCROLLCARET Windows message.
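One way to send that message is through the control's Perform method, as in the small unit below. The unit name and the ScrollRichEditToTop helper are made up for this illustration, and calling SendMessage with the control's Handle should work equally well.

```pascal
unit RichEditScroll; {hypothetical unit name}

interface

uses
  Messages, ComCtrls; {Winapi.Messages and Vcl.ComCtrls in newer Delphi versions}

procedure ScrollRichEditToTop(re: TRichEdit);

implementation

{Hypothetical helper: bring the first line of a TRichEdit back into view}
procedure ScrollRichEditToTop(re: TRichEdit);
begin
  re.SelStart := 0;                 {move the caret to the start of the text}
  re.Perform(EM_SCROLLCARET, 0, 0); {ask the control to scroll the caret into view}
end;

end.
```

A call such as ScrollRichEditToTop(RichEdit1) after the parsed results have been written would then show the top of the output (RichEdit1 being whatever the control is named on the form).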


  

Download and Explore Programs

Click to download source code

Click here to download executable program

Copyright © 2000-2018, Gary Darby    All rights reserved.