WordStuff #1

Problem Description

This is the first installment of a series of programs about words. I've included two programs here. The first, DicMaint, introduces the dictionary structure and provides code to maintain dictionaries. The second, CrosswordHelper, is a word completion program that displays a list of dictionary words matching a mask of known letters.

Background & Techniques

The first requirement for many word manipulation problems is a dictionary. Not the kind with definitions, just the kind with a list of valid words. The TDic object compresses a word list to about half of its uncompressed size. The word list is maintained as a TStringList object. The initial letters of each word that match the preceding word are replaced by a single byte holding the count of matching letters. Unused bits of this byte also flag foreign words and abbreviations.
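The prefix-count idea can be sketched in a few lines. This is a Python illustration, not the Delphi TDic code: the function names are made up for the example, entries are kept as (count, suffix) pairs rather than packed bytes, and the flag bits for foreign words and abbreviations are left out.

```python
# Sketch of prefix compression for a sorted word list.
# Assumption: each entry keeps a count of letters shared with the
# previous word plus the remaining letters. The real TDic byte layout
# (including its flag bits) is not shown here.

def shared_prefix_len(a, b):
    """Number of leading letters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def compress(words):
    """Turn a sorted word list into (shared_count, suffix) entries."""
    entries = []
    prev = ""
    for w in words:
        k = shared_prefix_len(prev, w)
        entries.append((k, w[k:]))
        prev = w
    return entries

def expand(entries):
    """Rebuild the original word list from (shared_count, suffix) entries."""
    words = []
    prev = ""
    for k, suffix in entries:
        w = prev[:k] + suffix
        words.append(w)
        prev = w
    return words

words = ["ant", "ante", "anthem", "antic", "bat", "bath"]
packed = compress(words)
print(packed)
# [(0, 'ant'), (3, 'e'), (3, 'hem'), (3, 'ic'), (0, 'bat'), (3, 'h')]
print(expand(packed) == words)  # True
```

Because neighboring words in a sorted list tend to share long prefixes, most entries shrink to a count plus a short suffix, which is where the savings come from.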

To speed processing, a letter index is maintained pointing to the first word in the list for each initial letter. The SetRange method defines the beginning and ending initial letters and the shortest and longest word lengths to be retrieved. GetNextWord retrieves words within this range and returns false when no more words are available. Other methods load and save dictionaries (in compressed or uncompressed form), add and remove words, look up words, etc. A request to load a dictionary with an extension of .txt will scan a text file and extract all unique words as a dictionary. A request to save a dictionary with an extension of .txt will build an expanded word list with one word per line.
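Here is a rough Python sketch of the letter index and range retrieval described above. The names build_letter_index and words_in_range are hypothetical stand-ins for the roles of the index, SetRange, and GetNextWord; the actual TDic method signatures are not shown on this page.

```python
# Sketch of a first-letter index over a sorted word list, plus a range
# scan limited by initial letter and word length. Assumes lowercase,
# non-empty words that are already sorted.

def build_letter_index(words):
    """Map each initial letter to the position of its first word."""
    index = {}
    for pos, w in enumerate(words):
        index.setdefault(w[0], pos)
    return index

def words_in_range(words, index, first, last, min_len, max_len):
    """Yield words whose initial letter is in first..last and whose
    length is in min_len..max_len."""
    starts = [pos for letter, pos in index.items() if letter >= first]
    if not starts:
        return
    for w in words[min(starts):]:
        if w[0] > last:
            break  # sorted list: nothing further can qualify
        if min_len <= len(w) <= max_len:
            yield w

words = sorted(["ante", "apple", "bat", "bath", "cat", "knee", "once", "zoo"])
index = build_letter_index(words)
print(list(words_in_range(words, index, "b", "k", 3, 4)))
# ['bat', 'bath', 'cat', 'knee']
```

The index lets the scan jump straight to the first candidate instead of walking the list from the top, and the sorted order lets it stop as soon as the initial letter passes the end of the range.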

Just to get us started, I've also included CrosswordHelper, a simple program using the TDic class to find all words matching a given mask. Unknown letters are entered as _ characters. For example, using Full.dic, the mask "_n_e" returns "ante", "knee", and "once".
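A fixed-length mask test like this one takes only a few lines. The sketch below is Python for illustration, with a small made-up word list standing in for Full.dic; it is not the CrosswordHelper source.

```python
# Sketch of fixed-length mask matching: "_" stands for one unknown letter.

def matches_fixed_mask(word, mask):
    """True if word has the mask's length and agrees on every known letter."""
    if len(word) != len(mask):
        return False
    return all(m == "_" or m == c for m, c in zip(mask.lower(), word.lower()))

sample = ["ante", "knee", "once", "onto", "knees"]
print([w for w in sample if matches_fixed_mask(w, "_n_e")])
# ['ante', 'knee', 'once']
```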

CrosswordHelper addendum, Jan 20, 2001: I added the mask character "?" as a synonym for "_", and "*" to represent any number of unknown characters. Works great to find rhyming words for you poets out there! Implementation was simplified when I ran across the MatchesMask function included in Delphi's Masks unit.
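For readers without Delphi at hand, Python's standard fnmatch module gives a comparable wildcard matcher. This sketch swaps in fnmatchcase as a rough analog of MatchesMask; the helper name find_matches and the sample words are made up for the example. Note that fnmatch also treats square brackets specially, which does not matter for plain letter masks.

```python
# Sketch of wildcard mask matching: "_" and "?" match one letter,
# "*" matches any run of letters (including none).
from fnmatch import fnmatchcase

def find_matches(words, mask):
    """Return the words matching the mask, ignoring case."""
    pattern = mask.lower().replace("_", "?")
    return [w for w in words if fnmatchcase(w.lower(), pattern)]

print(find_matches(["ante", "knee", "once", "onto"], "_n?e"))
# ['ante', 'knee', 'once']
print(find_matches(["night", "light", "lit", "sight", "nigh"], "*ight"))
# ['night', 'light', 'sight']
```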

I've put three dictionaries in a separate download file. Full.dic contains about 60,000 words, General.dic about 16,000, and Small.dic about 1,500. All should be considered works in progress. Any error reports or suggestions for improvements will be appreciated. Small.dic is duplicated with each of the source and object downloads, so that any download should be runnable, even though you'll want to use one of the larger dictionaries for most purposes. In general, I'd say that for checking words, you'll want to use the largest dictionary, and for programs that generate words, you would be better served by one of the smaller dictionaries.

Running/Exploring the Program 

Suggestions for Further Explorations

My granddaughter's electronic Hangman game claims to have an 8,000 word dictionary. It also has "categories". I'll have to borrow it from her to check this out, but categories sound like a good idea for that application. Perhaps a descendant of TDic, or a special header word at the start of the dictionary, could specify that each word has an added category byte. Category names would also be included in the dictionary, and an index of category counts built at load time (to allow random selection of words within a category).
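One way the category counts could support random selection is sketched below in Python. Everything here is hypothetical: the (word, category) pairs, the function names, and the sample data are invented to illustrate the idea, and nothing like this exists in TDic yet.

```python
# Sketch: count words per category at load time, then pick the k-th
# word of a category for a random k below that category's count.
import random
from collections import Counter

def count_categories(entries):
    """entries is a list of (word, category) pairs."""
    return Counter(category for _, category in entries)

def random_word_in_category(entries, counts, category, rng=random):
    """Pick one word of the given category with equal probability."""
    k = rng.randrange(counts[category])  # raises ValueError if the category is empty
    for word, cat in entries:
        if cat == category:
            if k == 0:
                return word
            k -= 1

entries = [("apple", "food"), ("bear", "animal"), ("bread", "food"), ("cat", "animal")]
counts = count_categories(entries)
print(counts["food"])  # 2
print(random_word_in_category(entries, counts, "animal"))
```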
Normal Readln text file code is used to read text files that are not compressed dictionaries. I have encountered a problem in one case where the entire text file is a single record. The maximum record (line) size for Readln is 255 characters. The solution is to convert to BlockRead logic, but I decided I didn't want to read that file anyway. I'll just put this on the back burner along with all the other stuff I'll probably never get around to.
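The block-reading approach could look something like the following Python sketch, which reads fixed-size chunks and carries a partial word across chunk boundaries. It only illustrates the logic: the function name and block size are assumptions, and a Delphi version would use BlockRead on an untyped file instead of a text stream.

```python
# Sketch: extract unique words from a stream without relying on line
# breaks. Assumes words are runs of the letters A-Z and a-z.
import io
import re

def unique_words_from_stream(stream, block_size=4096):
    """Read the stream in blocks and return its unique words, sorted."""
    words = set()
    carry = ""  # tail of the previous block that may be an unfinished word
    while True:
        block = stream.read(block_size)
        if not block:
            break
        parts = re.split(r"[^A-Za-z]+", carry + block)
        carry = parts.pop()  # last piece may continue in the next block
        words.update(p.lower() for p in parts if p)
    if carry:
        words.add(carry.lower())
    return sorted(words)

text = io.StringIO("The quick brown fox jumps over the lazy dog")
print(unique_words_from_stream(text, block_size=5))
# ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
```

The tiny block size in the usage example is deliberate: it forces words to straddle block boundaries so the carry logic gets exercised.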

 


  
Copyright 2000-2016, Gary Darby    All rights reserved.