Here's a program that explores several probability distributions. This is definitely a Math Topic, although there will be few if any equations in this write-up. Any good probability or statistics book will provide enough equations and formulas to confuse any normal human. The program, on the other hand, has considerable embedded math, as discussed below.
Four of the most common and useful distributions are demonstrated here.
Each of the four operates the same way: the program generates a specified number of random samples drawn from a population with the distribution being demonstrated. For each sample, four charts are available: Frequency, Cumulative Frequency, Probability Density, and Probability Distribution. For discrete variables, each value is represented by a bar. If the distribution is continuous, the user specifies the number of "buckets" (bars) to create.
Here are sample charts from the Normal Distribution page:
Each chart shows the distribution of the sample as a bar chart overlaid with a line chart showing the theoretical distribution for the entire population being sampled.
There is also a page illustrating the amazing Central Limit Theorem. The essence of the theorem is that the distribution of the sum of a large number of independent random variables, with almost any distribution (the variance must be finite), approximates a normal distribution! This probably accounts for the usefulness of the normal distribution in tracking errors, for example, since manufacturing errors often have many independent causes which sum to the total measured error in a finished product. The theorem also applies to sums of samples drawn from the same distribution. That is the case demonstrated in this program, where we sum samples of a uniformly distributed random variable (each sample value has the same probability of occurring) to form new samples which are approximately normally distributed.
Non-programmers are welcome to read on, but may want to skip to the bottom of the page to download the executable in zipped format.
Notes for Programmers
The charts in this program are generated using Delphi's T-Chart component. If you have the "Standard" or "Personal" editions of Delphi it will likely not include T-Chart and you will not be able to recompile. "Professional" and higher editions, at least for D5, D6, and D7, include T-Chart.
The four pages for the four distributions are very similar. Each page has some Tedit controls to obtain the parameters for that distribution, a "Create a set" button to make a set of random data points drawn from the distribution under study, and a TRadioGroup box to specify which of the four plot types is to be displayed.
The Tedits which collect integer values are associated with TUpDown controls just to shift the responsibility for editing input numbers back to Delphi. For real valued inputs, I used the Val procedure to detect invalid decimal number inputs.
The "Create a set" click procedure generates the data. Conceptually, we choose a random number between 0 and 1 and then apply the inverse probability distribution function to find the data value for this probability. For discrete distributions, one bar is assigned for each value. Continuous distributions have a user-assigned number of "buckets", intervals to which data values are assigned. Each value or bucket is used to accumulate the number of samples which have that value (discrete) or which fall within that interval (continuous). These FreqCount arrays provide the data for the plots. The Frequency chart is a straightforward plot of the FreqCount data. The Cumulative Frequency chart "integrates" the frequency data by adding the areas of the rectangles represented by the bars. Once we have these, the Probability Density and Probability Distribution charts are simply copies of the first two with the bar frequency counts normalized so that they sum to 1, by dividing each value by the number of samples taken.
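The steps above can be sketched in a few lines. This is an illustration in Python rather than the program's Delphi source, and it assumes an exponential distribution because its inverse is a one-line formula. The function names, the max_value cutoff, and the choice to fold tail values into the last bucket are assumptions made for the sketch, not details taken from the program.

```python
import math
import random


def exponential_inverse_cdf(p, mean):
    """Invert F(x) = 1 - exp(-x / mean) to turn a probability into a data value."""
    return -mean * math.log(1.0 - p)


def build_freq_counts(n_samples, n_buckets, max_value, mean, rng):
    """Draw samples by inverse transform and count them into equal-width buckets."""
    width = max_value / n_buckets
    freq_count = [0] * n_buckets
    for _ in range(n_samples):
        p = rng.random()                      # uniform in [0, 1)
        x = exponential_inverse_cdf(p, mean)  # data value for this probability
        # Assumption: values beyond max_value are folded into the last bucket.
        index = min(int(x / width), n_buckets - 1)
        freq_count[index] += 1
    return freq_count


def cumulative(counts):
    """Running total of the bars, i.e. the Cumulative Frequency data."""
    total = 0
    result = []
    for count in counts:
        total += count
        result.append(total)
    return result


def normalize(values, n_samples):
    """Divide by the number of samples so the frequency bars sum to 1."""
    return [value / n_samples for value in values]


if __name__ == "__main__":
    rng = random.Random(1)
    counts = build_freq_counts(1000, 10, 10.0, 2.0, rng)
    print("Frequency:            ", counts)
    print("Cumulative Frequency: ", cumulative(counts))
    print("Probability Density:  ", normalize(counts, 1000))
```

Normalizing the cumulative counts the same way gives the data for the Probability Distribution chart, whose last bar is always 1.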
Once the actual data has been handled, there remains the problem of overlaying a theoretical line chart corresponding to this distribution. Since the data bars are centered on the value, for continuous distributions we add half of the interval to the data point being evaluated. For the Normal distribution, there is no explicit formula for the cumulative distribution, so we resort to summing the areas of the current rectangle plus the previous rectangles to get the Cumulative Frequency and Cumulative Probability charts. Exponential has an explicit formula, which we use for those charts.
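Here is one way the theoretical overlay might be computed, again as a Python sketch rather than the program's own code. The density is evaluated at each bucket's midpoint (the left edge plus half the interval), and the normal cumulative curve is approximated by a running sum of rectangle areas. The function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, mean, std_dev):
    """Density of the normal distribution at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2.0 * math.pi))


def normal_overlay(low, high, n_buckets, mean, std_dev):
    """Return (density, cumulative) lists, one entry per bucket.

    The cumulative values are a running sum of rectangle areas
    (midpoint density times bucket width), not a closed-form CDF.
    """
    width = (high - low) / n_buckets
    density = []
    cumulative = []
    running_area = 0.0
    for i in range(n_buckets):
        midpoint = low + i * width + width / 2.0  # add half of the interval
        d = normal_pdf(midpoint, mean, std_dev)
        running_area += d * width                 # this rectangle plus all previous ones
        density.append(d)
        cumulative.append(running_area)
    return density, cumulative


def exponential_cdf(x, mean):
    """Explicit cumulative distribution for the exponential case."""
    return 1.0 - math.exp(-x / mean)


if __name__ == "__main__":
    density, cumulative = normal_overlay(-4.0, 4.0, 80, 0.0, 1.0)
    print("area under the sampled curve:", cumulative[-1])
    print("exponential CDF at the mean: ", exponential_cdf(2.0, 2.0))
```

To overlay these curves on the Frequency and Cumulative Frequency charts instead of the probability charts, the density would be multiplied by the bucket width times the number of samples, and the cumulative values by the number of samples.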
The Central Limit page illustrates the theorem by summing fixed-size sets of uniformly distributed random numbers and plotting the resulting frequency chart to show the characteristic bell-shaped curve that emerges as the number of samples and the subset size increase.
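The idea is easy to try outside the program. The Python sketch below (not the program's Delphi code; all names are hypothetical) sums subsets of uniform random numbers on [0, 1) and counts the sums into buckets. A sum of n such numbers has mean n/2 and variance n/12, so a subset size of 12 gives sums centered on 6 with variance close to 1.

```python
import random


def sums_of_uniforms(n_samples, subset_size, rng):
    """Each new sample is the sum of subset_size uniform random numbers in [0, 1)."""
    return [sum(rng.random() for _ in range(subset_size))
            for _ in range(n_samples)]


def bucket_counts(values, low, high, n_buckets):
    """Count values into equal-width buckets covering [low, high]."""
    width = (high - low) / n_buckets
    counts = [0] * n_buckets
    for value in values:
        index = min(int((value - low) / width), n_buckets - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(7)
    subset_size = 12
    sums = sums_of_uniforms(5000, subset_size, rng)
    counts = bucket_counts(sums, 0.0, float(subset_size), 24)
    # Crude text version of the frequency chart: one row of stars per bucket.
    for i, count in enumerate(counts):
        print(f"{i * 0.5:5.1f} {'*' * (count // 20)}")
```

With a subset size of 1 the chart is flat, with 2 it is triangular, and by 12 it is already hard to tell from a bell curve by eye.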
I'm sure that there are a number of bugs left in this program. If you happen to find one, use the feedback link to let me know.
Copyright © 2000-2013, Gary Darby
All rights reserved.