Problem Description
PhraseFinder searches for set of valid words
which can be formed from a given set of letters. The user specifies
the smallest and largest word lengths and the number of words in the
output phrase. Words are validated against one of several available
dictionaries, again specified by the user.
Background & Techniques
This program was motivated by a recent puzzle in the Mensa Puzzle-A-Day
calendar. The twenty-four letters ABCDEEFHIKLLLLOOOORRTWWY
are an analog of a "famous movie direction". What
is it phrase? The answer, which I didn't get, is "Follow
the yellow brick road". But that led me to wonder if I could
have found it with a program. The answer turns out to be,
"probably not" - it is the 29,275th solution and takes about 45
minutes to find on my older laptop, even though "solutions" are
found at the rate of about 10 per second. And this
is only up to the "B's"!
But it is much faster and more capable than the older
"Unscramble" program previously published as part of our
Wordstuff program. It does work great for finding anagram
phrases from grandkids full names.
A dictionary file, "general.dic", is included with the
downloadable zip file and contains about 17,000 words. Additional larger
and smaller dictionaries may also be downloaded. You may
provide your own word lists as text files, one word per line.
Non-programmers are welcome to read on, but may
want to skip to the bottom of this page to
download executable version of the program.
Notes for Programmers
The original unscramble program found phrases by
permuting subsets of letters and checking them against a dictionary - a
slow process. This version treats letter groups as
"multisets'., sets of elements which may have repeated
elements. The common data structure used to represent multisets is an array of integers, each integer representing the count of elements
corresponding to the index of that set element. In our case, working with
letters, our TMSetRec type is an array of 26 integers
representing the number of occurrences of the letters 'A' through
'Z'.
Three primary functions handle the multisets: MakeMSet
takes a string of letters and returns a TMSetrec. IsSubset
takes two multisets and returns true if the second is a subset of the
first.. And SetDiff subtracts one multiset from
another.
Rather than pass all possible permutations of
word candidates against the dictionary, in this version, we make a sorted
string list of all dictionary words which are within the given length
range and whose multiset representation is a subset of the master set of
letters. The recursive Findwords procedure passes
through this list looking for words which are subsets of the remaining
unused letters. When one is found, it is subtracted from the
remaining letters and a recursive call is made to find the next
word. After the final word is found, if the remaining letters set is
empty, we have found a solution.
Running/Exploring the Program