4.4 Expectation
One of the most important concepts in probability theory is that of the expectation of a random variable.
If X is a discrete random variable taking on the possible values x1, x2,..., then the expectation or expected value of X, denoted by E[X], is defined by
E[X] = Σi xi p(xi)
where p is the probability mass function of X.
In words, the expected value of X is a weighted average of the possible values that X can take on, each value being weighted by the probability that X assumes it.
For instance, if the probability mass function of X is given by
p(0) = 1/2 = p(1)
then
E[X] = 0(1/2) + 1(1/2) = 1/2
is just the ordinary average of the two possible values 0 and 1 that X can assume.
On the other hand, if
p(0) = 1/3, p(1) = 2/3
then
E[X] = 0(1/3) + 1(2/3) = 2/3
is a weighted average of the two possible values 0 and 1 where the value 1 is given twice as much weight as the value 0 since p(1) = 2p(0).
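As a quick illustration (a minimal Python sketch, not part of the original text), this weighted average can be computed directly from the probability mass function; the pmf below is the one just given, p(0) = 1/3, p(1) = 2/3.

    # Expected value of a discrete random variable: E[X] = sum of x * p(x) over all x.
    pmf = {0: 1/3, 1: 2/3}   # probability mass function from the example above

    expectation = sum(x * p for x, p in pmf.items())
    print(expectation)       # 0.666..., i.e. 2/3, matching the weighted average in the text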
Another motivation of the definition of expectation is provided by the frequency interpretation of probabilities.
This interpretation assumes that if an infinite sequence of independent replications of an experiment is performed, then for any event E, the proportion of time that E occurs will be P(E).
Now, consider a random variable X that must take on one of the values x1, x2,..., xn with respective probabilities p(x1), p(x2),..., p(xn), and think of X as representing our winnings in a single game of chance.
That is, with probability p(xi) we shall win xi units, i = 1, 2,..., n.
Now by the frequency interpretation, it follows that if we continually play this game, then the proportion of time that we win xi will be p(xi).
Since this is true for all i = 1, 2,..., n, it follows that our average winnings per game will be
Σi xi p(xi) = E[X]
To see this argument more clearly, suppose that we play N games, where N is very large.
Then in approximately Np(xi) of these games we shall win xi, and thus our total winnings in the N games will be approximately
Σi xi Np(xi)
implying that our average winnings per game are
(Σi xi Np(xi)) / N = Σi xi p(xi) = E[X]
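To make the frequency argument concrete, here is a small simulation sketch (Python; the 0/1 game above, with N = 100,000 games chosen arbitrarily): play the game N times and compare the average winnings per game with E[X] = 2/3.

    import random

    # Frequency interpretation: the average winnings over many independent games
    # should be close to E[X].
    values = [0, 1]
    probs = [1/3, 2/3]            # p(0) = 1/3, p(1) = 2/3 as in the example above
    N = 100_000                   # number of games; larger N gives a closer approximation

    winnings = random.choices(values, weights=probs, k=N)
    print(sum(winnings) / N)      # typically close to E[X] = 2/3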
Example 4.4a Find E[X] where X is the outcome when we roll a fair die.
SOLUTION: Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6, we obtain
E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2
The reader should note that, for this example, the expected value of X is not a value that X could possibly assume.
(That is, rolling a die cannot possibly lead to an outcome of 7/2.)
Thus, even though we call E[X] the expectation of X, it should not be interpreted as the value that we expect X to have but rather as the average value of X in a large number of repetitions of the experiment.
That is, if we continually roll a fair die, then after a large number of rolls the average of all the outcomes will be approximately 7/2.
(The interested reader should try this as an experiment.)
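Taking up that suggestion, the following is a small simulation sketch (Python; N = 100,000 rolls chosen arbitrarily) rather than a physical experiment: roll a fair die many times and compare the average outcome with 7/2.

    import random

    # Roll a fair die N times; the average outcome should be approximately E[X] = 7/2 = 3.5.
    N = 100_000
    rolls = [random.randint(1, 6) for _ in range(N)]
    print(sum(rolls) / N)    # typically within a few hundredths of 3.5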
Example 4.4b If I is an indicator random variable for the event A, that is, if
I = 1 if A occurs, and I = 0 if A does not occur
then
E[I] = 1P(A) + 0(1 - P(A)) = P(A)
Hence, the expectation of the indicator random variable for the event A is just the probability that A occurs.
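As a quick numerical check (a sketch in which P(A) = 0.3 is chosen arbitrarily), the two-point distribution of an indicator gives E[I] = P(A) directly.

    # Indicator of A: I = 1 with probability P(A), I = 0 with probability 1 - P(A).
    p_A = 0.3                          # hypothetical probability of the event A
    E_I = 1 * p_A + 0 * (1 - p_A)      # E[I] = P(A)
    print(E_I)                         # 0.3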
Example 4.4c Entropy For a given random variable X, how much information is conveyed in the message that X=x?
Let us begin our attempts at quantifying this by agreeing that the amount of information in the message that X = x should depend on how likely it was that X would equal x: the less likely it was that X would equal x, the more informative the message.
For instance, if X represents the sum of two fair dice, then there seems to be more information in the message that X equals 12 than there would be in the message that X equals 7, since the former event has probability 1/36 and the latter 1/6.
Let us denote by I(p) the amount of information contained in the message that an event, whose probability is p, has occurred.
Clearly I(p) should be a nonnegative, decreasing function of p.
To determine its form, let X and Y be independent random variables, and suppose that P{X=x} = p and P{Y=y} = q.
How much information is contained in the message that X equals x and Y equals y?
To answer this, note first that the amount of information in the statement that X equals x is I(p).
Also, since knowledge of the fact that X is equal to x does not affect the probability that Y will equal y (since X and Y are independent), it seems reasonable that the additional amount of information contained in the statement that Y = y should equal I(q).
Thus, it seems that the amount of information in the message that X equals x and Y equals y is I(p) + I(q).
On the other hand, however, we have that
P{X = x, Y = y} = P{X = x}P{Y = y} = pq
which implies that the amount of information in the message that X equals x and Y equals y is I(pq).
Therefore, it seems that the function I should satisfy the identity
I(pq) = I(p) + I(q)
However, if we define the function G by
G(p) = I(2^(-p))
then it follows from the above identity that
G(p + q) = I(2^(-(p+q))) = I(2^(-p) 2^(-q)) = I(2^(-p)) + I(2^(-q)) = G(p) + G(q)
However, it can be shown that the only (monotone) functions G that satisfy the foregoing functional relationship are those of the form
G(p) = cp
for some constant c. Therefore, we must have that
I(2^(-p)) = cp
or, letting q = 2^(-p),
I(q) = -c log2(q)
for some positive constant c. It is traditional to let c=1 and to say that the information is measured in units of bits (short for binary digits).
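The additivity I(pq) = I(p) + I(q) that drove this derivation is easy to check numerically; the sketch below takes c = 1 and uses the two-dice probabilities 1/36 and 1/6 mentioned earlier.

    from math import log2

    # With c = 1, the information (in bits) in an event of probability p is I(p) = -log2(p).
    def info(p):
        return -log2(p)

    print(info(1/36))           # about 5.17 bits (the message that X = 12 for two fair dice)
    print(info(1/6))            # about 2.58 bits (the message that X = 7)
    print(info((1/6) * (1/6)))  # equals info(1/6) + info(1/6), illustrating I(pq) = I(p) + I(q)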