The basic approach statistical methods adopt to deal with uncertainty is via the axioms of probability:
- Probabilities are (real) numbers in the range 0 to 1.
- A probability of P(A) = 0 indicates total certainty that A is false, P(A) = 1 total certainty that A is true, and values in between some degree of uncertainty.
- Probabilities can be calculated in a number of ways.
Very simply:

Probability = (number of desired outcomes) / (total number of outcomes)
So, given a full normal deck of playing cards, the probability of being dealt an ace is 4 (the number of aces) / 52 (the number of cards in the deck), which is 1/13. Similarly, the probability of being dealt a spade is 13 / 52 = 1/4.
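The ratio above can be checked directly; this is a minimal sketch, and the function name `probability` is just an illustrative choice:

```python
from fractions import Fraction

def probability(desired, total):
    """Probability = (number of desired outcomes) / (total number of outcomes)."""
    return Fraction(desired, total)

p_ace = probability(4, 52)     # four aces in a 52-card deck
p_spade = probability(13, 52)  # thirteen spades

print(p_ace)    # 1/13
print(p_spade)  # 1/4
```

Using `Fraction` keeps the results exact rather than as rounded decimals.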
If you have a choice of k items from a set of n items, then the following formula gives the number of ways of making this choice (! = factorial):

C(n, k) = n! / (k! (n - k)!)
So the chance of winning the national lottery (choosing 6 from 49) is C(49, 6) = 13,983,816 to 1.
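The lottery figure can be verified with Python's built-in combination function (a small sketch, not part of the notes):

```python
from math import comb      # comb(n, k) = n! / (k! * (n - k)!)
from fractions import Fraction

ways = comb(49, 6)         # number of distinct 6-from-49 tickets
p_win = Fraction(1, ways)  # probability that a single ticket wins

print(ways)   # 13983816
```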
- Conditional probability, P(A|B), indicates the probability of event A given that we know event B has occurred.
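A small worked example, assuming the standard definition P(A|B) = P(A and B) / P(B) and reusing the card deck from above (the helper name `conditional` is an illustrative assumption):

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# P(ace | spade): only the ace of spades is both an ace and a spade,
# and 13 of the 52 cards are spades
p_ace_given_spade = conditional(Fraction(1, 52), Fraction(13, 52))
print(p_ace_given_spade)  # 1/13
```

Here knowing the card is a spade does not change the chance of an ace, i.e. the two events happen to be independent.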
Bayes Theorem
- This states:

  P(Hi | E) = P(E | Hi) P(Hi) / Σk [ P(E | Hk) P(Hk) ]

- This reads: given some evidence E, the probability that hypothesis Hi is true is equal to the probability that E will be observed given Hi, times the a priori probability of Hi, divided by the sum, over the set of all hypotheses Hk, of the probability of E given each Hk times the probability of that hypothesis.
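The rule above translates directly into code. This is a minimal sketch; the function name `bayes` and the example numbers are illustrative assumptions:

```python
def bayes(priors, likelihoods):
    """Posterior P(Hi|E) for each hypothesis, from priors P(Hi) and
    likelihoods P(E|Hi). The hypotheses must be mutually exclusive
    and exhaustive (priors sum to 1)."""
    # denominator: P(E) = sum over all hypotheses of P(E|Hk) * P(Hk)
    p_e = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_e for p, l in zip(priors, likelihoods)]

# two hypotheses with made-up priors and likelihoods
posteriors = bayes([0.3, 0.7], [0.8, 0.1])
print(posteriors)
```

Because the hypotheses are exhaustive, the posteriors always sum to 1.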
- The set of all hypotheses must be mutually exclusive and exhaustive.
- Thus, if we examine medical evidence to diagnose an illness, we must know the prior probability of each illness and also the probability of observing each symptom given each illness.
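A toy version of this diagnosis setting, with entirely made-up illnesses, priors, and symptom likelihoods (the names and numbers are illustrative assumptions, not medical data):

```python
priors = {"measles": 0.1, "flu": 0.3, "healthy": 0.6}     # P(illness)
p_spots = {"measles": 0.9, "flu": 0.05, "healthy": 0.01}  # P(spots | illness)

# P(spots) = sum over all illnesses of P(spots | illness) * P(illness)
p_evidence = sum(priors[h] * p_spots[h] for h in priors)

# Bayes' theorem: P(illness | spots)
posterior = {h: priors[h] * p_spots[h] / p_evidence for h in priors}
print(posterior)
```

With these numbers, observing spots shifts most of the probability mass onto measles despite its low prior.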
Bayesian statistics lie at the heart of most statistical reasoning systems.
How is Bayes theorem exploited?
- The key is to formulate problem correctly:
P(A|B) states the probability of A given only B's evidence. If there is other relevant evidence, then it must also be considered.
Herein lies a problem:
- All events must be mutually exclusive. However, in real-world problems events are not generally independent of one another. For example, in diagnosing measles, the symptoms of spots and a fever are related. This means that computing the conditional probabilities gets complex.
In general, given prior evidence p and some new observation N, the cost of computing P(H | N, p) grows exponentially for large sets of prior evidence p.
- All events must be exhaustive. This means that in order to compute all probabilities the set of possible events must be closed. Thus if new information arises, the set must be created afresh and all probabilities recalculated.
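The exponential growth mentioned above can be made concrete: with n pieces of binary evidence, a full joint probability table needs 2^n entries. A minimal sketch (the function name `joint_table_size` is an illustrative assumption):

```python
def joint_table_size(n_evidence_vars, values_per_var=2):
    """Entries needed for a full joint distribution over n evidence
    variables, each taking values_per_var possible values."""
    return values_per_var ** n_evidence_vars

# each extra binary evidence variable doubles the table
for n in (10, 20, 30):
    print(n, joint_table_size(n))
```

Thirty binary symptoms already require over a billion joint entries, which is why the enhanced schemes below avoid storing the full joint distribution.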
Thus simple Bayes rule-based systems are not suitable for uncertain reasoning:
- Knowledge acquisition is very hard.
- Too many probabilities needed — too large a storage space.
- Computation time is too large.
- Updating new information is difficult and time consuming.
- Exceptions like “none of the above” cannot be represented.
- Humans are not very good probability estimators.
However, Bayesian statistics still provide the core to reasoning in many uncertain reasoning systems with suitable enhancement to overcome the above problems.
We will look at three broad categories:
- Certainty factors,
- Dempster-Shafer models,
- Bayesian networks.