PROBABILITY AN INTRODUCTION
SAMUEL GOLDBERG


PROBABILITY
An Introduction
PRENTICE-HALL MATHEMATICS SERIES
ALBERT A. BENNETT, EDITOR
PROBABILITY
An Introduction
SAMUEL GOLDBERG
DEPARTMENT OF MATHEMATICS, OBERLIN COLLEGE
PRENTICE-HALL, INC.
ENGLEWOOD CLIFFS, N. J.
© 1960, by PRENTICE-HALL, INC., ENGLEWOOD CLIFFS, N. J.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY MIMEOGRAPH OR ANY OTHER MEANS, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHERS.
Library of Congress Catalog Card Number: 60-10871
Current printing (last digit): 16 15 14 13 12 11 10
PRINTED IN THE UNITED STATES OF AMERICA
71158—C
PREFACE
THIS BOOK is intended for all who require a mathematically sound, but elementary, introduction to the theory of probability.
Probability concepts are now of great importance in a wide variety of fields. The theory of probability, as the foundation upon which the methods of statistics are based, should command the attention of those who want to understand as well as apply statistical techniques. Probabilistic theories, making explicit reference to the nature and effects of chance phenomena, are the rule rather than the exception in the physical and biological sciences. Less well known is the fact that probability concepts are finding increased use in the social sciences and business: psychologists develop stochastic models for learning; economists use the techniques of game theory to discuss competition and markets; expected values, variances, and other matters related to random variables turn out to be important in the problem of finding combinations of securities that best meet the needs of the investor; business managers, because their decisions must be made in the face of uncertainty, invoke the theory of probability as an aid in planning inventory, establishing quality control, designing market surveys, etc. We need not go on — it is clear that probability concepts and methods are now widely used and will see even more extensive use in the future.
One noteworthy indication of the importance of our subject is the recent decision of the Commission on Mathematics of the College Entrance Examination Board to recommend that a course in probability and statistical inference be offered in the twelfth grade of the secondary school. Thus, secondary school teachers of mathematics, at some point in their college or in-service training, or in summer
Institutes (such as those sponsored by the National Science Foundation), should achieve some mastery of the elements of probability theory. Parts of this book were used in courses offered in NSF Institutes held at Oberlin College in 1958 and 1959, and the final manuscript has benefited from the helpful comments of the many teachers who studied preliminary versions.
Although there are a number of excellent textbooks on probability, they are all written for readers who have the mathematical sophistication that comes with a working knowledge of the differential and integral calculus. It seemed to me worthwhile to bring the theory of probability to the attention of those who do not have the calculus prerequisite. It was with this aim in mind that I limited myself to those topics that are accessible to readers with only a good background in high school algebra and a little ability in the reading and manipulation of mathematical symbols. The consequent limitation to finite sample spaces, although severe, facilitates a careful logical treatment of the essentials needed by all who use probability concepts. Furthermore, I have found that an understanding of the basic definitions, theorems, and methods in the finite case makes it much easier for students with the necessary preparation to master the corresponding ideas in the infinite case. I am therefore hopeful that this volume, although written as a basic textbook for courses in probability and statistics for students without calculus, will also prove useful in courses for those who have previous training in calculus.
One further possible use of this book is worthy of mention here. There are many college students who, for one reason or another, can take at most one year of mathematics. These students are often offered a smorgasbord survey course in which they sample one topic after another and learn very little about lots of things. Many teachers, however, prefer to offer a course centering on a few main topics, going into each systematically and deeply enough to give the student a reasonable depth of knowledge in the chosen subjects. Although many topics vie for inclusion in such a program, I believe a strong case can be made for a course that concentrates on sets and probability in the finite case at first, proceeds to an introduction to the calculus, and then applies this calculus to the elements of probability in the infinite case. (In my own course, I also include applications
of the differential calculus to simple problems in economics.) Such a course, if properly executed, can give the student a keen sense of the nature and achievements of mathematical thinking, while laying a firm foundation for further study in economics, statistics, operations research, or allied fields. Such a program would therefore be especially valuable for social science and business students, assuming they can devote only a year to mathematics at the college level. I have used this volume in preliminary form in roughly the first third of such a year course at Oberlin College, with students who present less than three years of high school mathematics for entrance. Teachers who share my point of view may also find this book useful in their own introductory mathematics courses.
Since the theory of probability is best formulated using the language and notation of sets, we devote the first chapter to the elementary mathematics of sets. Proofs of laws in the algebra of sets are simplified by the use of so-called membership tables, a device analogous to truth tables in logic. Here we also introduce Cartesian product sets, which are needed at many points throughout the book.
Chapter 2 develops the basic calculus of probability for experiments with only a finite number of possible outcomes (finite sample spaces). A probability measure is first introduced over the events of a sample space and then conditional probability, independent events, and independent trials are carefully defined. Illustrative and problem material is here limited to the simplest experimental situations, and more sophisticated combinatorial techniques are first treated in Chapter 3. The usual order of topics has been reversed because beginning students seem always to have difficulty with the use of permutation and combination formulas, and this difficulty often impairs the learning of the basic probability ideas when both are presented simultaneously. We present the basic ideas first and then, in Chapter 3, offer a set of exercises in which the previously mastered probability theory is applied to a wide variety of situations requiring the use of sophisticated counting techniques. It has been our experience that this procedure makes it considerably easier for the student to learn this basic material.
Chapter 4 is an introduction to the analytic theory of probability in the finite case. Random variables are defined as functions on
sample spaces, and probability distributions, means, standard deviations, joint probability functions, covariance, and correlation are discussed. Independence of random variables is defined and, with these ideas extended to the multivariate case, applications to random sampling theory can be included. The sampling distribution of the sample mean is discussed and formulas for its mean and variance are derived for both sampling with and without replacement.
The most important probability function defined on a finite sample space, the binomial distribution, forms the subject matter of the final chapter. The basic properties of a Bernoulli process and a binomially distributed random variable are derived, and the use of tables of cumulative binomial probabilities is discussed. Applications to the testing of statistical hypotheses (significance tests), as well as to a more complex problem of decision-making under uncertainty, serve to illustrate how probability methods are applied in statistical investigations.
For some classes, teachers may find it necessary to offer supplementary lessons on the method of mathematical induction and the use of summation signs, as these topics arise in the text. I have also found that it is wise to constantly remind the beginning student of the substitution principle, for example, that from Var(X) ≥ 0 for all X it follows that Var(2X − 3Y) ≥ 0. Much of the difficulty beginners have with mathematics stems from a lack of understanding of this principle, and it is well worth emphasis.
In all other respects, I have made every effort to have this book self-contained, clear, and readable. Throughout, stress is laid on the explanation of fundamental concepts and patterns of mathematical reasoning, as well as on techniques of problem-solving. Problems at the end of each section are designed to supplement the many worked-out illustrative examples in the text and to enable the reader to check his understanding of new definitions, theorems, and methods. From time to time, problems are included to challenge the better student—the sample variance, maximum likelihood estimation, the hypergeometric distribution, regression functions, and OC-curves for sampling inspection are introduced in problems that are written so as to guide the student toward an understanding of these important topics. Answers (often complete solutions) to half of the 360 problems are collected in a 21-page section at the end of
the book. To facilitate computations, tables of ordinary logarithms, logarithms of factorials, and cumulative binomial probabilities are included in the text. A list of books suitable for supplementary reading appears at the end of each chapter. I trust that these features will serve to make the hard job of learning a little less hard. Comments from readers are always welcome.
SAMUEL GOLDBERG Cambridge, Mass.
ACKNOWLEDGMENTS
I TAKE this opportunity to gratefully acknowledge my debt to Professor William Feller who, as my teacher, first showed me the beauty and importance of the mathematical theory of probability.
Part of the material on sets and probability in Chapters 1 and 2 was prepared in preliminary form and tested in the classroom under a grant by the Carnegie Corporation of New York to Oberlin College for experimentation in freshman mathematics.
Assistance of various kinds was rendered at Oberlin College during the summers of 1958 and 1959, when I offered a probability course in National Science Foundation Institutes for secondary school teachers of mathematics, by Bruce T. Marcus, David Webster, and especially by Edward T. Wong.
The manuscript was completely rewritten while I held a visiting appointment at the Harvard University Graduate School of Business Administration to teach at the Institute of Basic Mathematics for Application to Business. The Institute, which was sponsored by the Ford Foundation, arranged for the final typing of the manuscript. W. Allen Spivey read part of the manuscript and offered helpful comments. Howard Raiffa made numerous valuable suggestions and the book is much the better for his counsel. Robert Schlaifer kindly gave permission to use the material in the final section of Chapter 5. William A. Ericson read the manuscript and prepared the solutions to problems.
I am grateful to all these friends for their help and to each goes my sincere thanks.
CONTENTS
chapter 1 • SETS 1
1. Examples of sets; basic notation, 1
2. Subsets, 8
3. Operations on sets, 16
4. The algebra of sets, 28
5. Cartesian product sets, 39
chapter 2 • PROBABILITY IN FINITE SAMPLE SPACES 45
1. Sample spaces, 45
2. Events, 51
3. The probability of an event, 54
4. Some probability theorems, 64
5. Conditional probability and compound experiments, 74
6. Bayes' formula, 91
7. Independent events, 102
8. Independence of several events, 107
9. Independent trials, 113
10. A probability model in genetics, 123
chapter 3 • SOPHISTICATED COUNTING 132
1. Counting techniques and probability problems, 132
2. Binomial coefficients, 149
chapter 4 • RANDOM VARIABLES 153
1. Random variables and probability functions, 153
2. The mean of a random variable, 172
99117
991
PROBAILITY AN INTRODUCTION
SAMUEL GOLDBERG


3. The variance and standard deviation of a random variable, 185
4. Joint probability functions; independent random variables, 197
5. Mean and variance of sums of random variables; the sample mean, 212
6. Covariance and correlation; sample mean (cont.), 232
chapter 5 • BINOMIAL DISTRIBUTION AND SOME APPLICATIONS 252
1. Bernoulli trials and the binomial distribution, 252
2. Testing a statistical hypothesis, 272
3. An example of decision-making under uncertainty, 286
ANSWERS TO ODD-NUMBERED PROBLEMS 295
INDEX 317
Chapter 1 SETS
1. Examples of sets; basic notation
The concept of a set, whose fundamental role in mathematics was first pointed out in the work of the mathematician Georg Cantor (1845-1918), has significantly affected the structure and language of modern mathematics. In particular, the mathematical theory of probability is now most effectively formulated by using the terminology and notation of sets. For this reason, we devote Chapter 1 to the elementary mathematics of sets. Additional topics in set theory are included throughout the text, as the need for this material becomes apparent.
The notion of a set is sufficiently deep in the foundation of mathematics to defy being defined (at the level of this book) in terms of still more basic concepts. Hence, we can only aim here, by taking advantage of the reader's knowledge of the English language and his experience with the real and conceptual world, to make clear the denotation of the word "set."
A set is merely an aggregate or collection of objects: people, numbers, books, outcomes of experiments, etc. Thus, we can speak of the set of all integers, or the set of all oceans, or the set of all possible sums when two dice are rolled and the number of dots on the uppermost faces are added, or the set consisting of the cities of Cambridge and Oberlin and all their residents, or the set of all straight lines (in a given plane) which pass through a given point.
The collection of objects must be well-defined, by which we mean that, for any object whatsoever, the question "Does this object belong to the collection?" has an unequivocal "yes" or "no" answer. It is not necessary that we personally have the knowledge required to decide which answer is correct. We require only that, of the two answers "yes" and "no," exactly one is correct.
Let us also agree that no object in a set is counted twice; i.e., the objects are distinct. It follows that, when listing the objects in a set, we do not repeat an object after it is once recorded. For example, according to this convention, the set of letters in the word "banana" is a set containing not six letters, but rather the three distinct letters b, a, and n.
The following definition summarizes our discussion to this point and introduces some additional terminology and notation.
Definition 1.1. A set is a well-defined collection of distinct objects. The individual objects that collectively make up a given set are called its elements, and each element belongs to or is a member of or is contained in the set. If a is an object and A a set, then we write a ∈ A as an abbreviation for "a is an element of A" and a ∉ A for "a is not an element of A." If a set has a finite number of elements, then it is called a finite set; otherwise it is called an infinite set.
We are relying on the reader's knowledge of the positive integers 1, 2, 3, ⋯, the so-called counting or natural numbers. This is an infinite set of numbers. To say that a set is finite means that one can enumerate the elements of the set in some order, then count these elements one by one until a last element is reached. Let us note that it is possible for a set, like the set of grains of sand on the Coney Island beach, to have a fantastically large number of elements and nevertheless be a finite set.
A set is ordinarily specified either by (i) listing all its elements and enclosing them in braces (the so-called roster method of defining the set), or by (ii) enclosing in braces a defining property, and agreeing that those objects that have the property, and only those objects, are members of the set. We discuss these important ideas further and introduce additional notation in the following examples.
Example 1.1. The set whose elements are the integers 0, 5, and 12 is a finite set with three elements. If we denote this set by A, then it
is conveniently written using the roster method: A = {0, 5, 12}. The statements "5 ∈ A" and "6 ∉ A" are both true.
Example 1.2. If we write V = {a, e, i, o, u}, then we have defined the set V of vowels in the English alphabet by listing its five elements. To specify V by a defining property we write

V = {x | x is a vowel in the English alphabet},

which is read "V is the set of those elements x such that x is a vowel in the English alphabet." Braces are always used when specifying a set; the vertical bar | is read "such that" or "for which." The symbol x is of course merely a placeholder; any other symbol will do just as well. For example, we can also write

V = {* | * is a vowel in the English alphabet}.

A slight modification of this notation is often used. Let us first introduce the set A to stand for the set of all letters of the English alphabet. Then we write

V = {* ∈ A | * is a vowel},

which is read "V is the set of those elements * of A such that * is a vowel."
Example 1.3. The set B = {−2, 2} is the same set as {x ∈ R | x² = 4}, where R is the set of all real numbers. The set {x ∈ R | x² = −1} has no elements, since the square of any real number is nonnegative. But if C is the set of all complex numbers, then {x ∈ C | x² = −1} contains the elements i = √−1 and −i.
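The role played by a defining property in Example 1.3 can be imitated in Python, whose set comprehensions select, from a collection of candidates, exactly those objects satisfying a stated condition. The sketch below is our own illustration, not part of the text; since a program can only test a finite collection of candidates, a small range of integers stands in for the set R of all reals.

```python
# The set {x | x^2 = 4}: the comprehension keeps exactly those
# candidates whose square is 4.
B = {x for x in range(-10, 11) if x * x == 4}
# B contains the two real solutions, -2 and 2.

# The set {x | x^2 = -1} has no real elements: the square of any
# real number is nonnegative, so the comprehension selects nothing,
# leaving an empty set.
empty = {x for x in range(-10, 11) if x * x == -1}
print(B, empty == set())
```

Note that two syntactically different comprehensions that select the same elements produce equal sets, mirroring Definition 1.3.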
Example 1.4. A prime number is a positive integer greater than 1 but divisible only by 1 and itself. A proof of the fact that the set {p | p is a prime number} is an infinite set was given by Euclid (c. 330-275 B.C.) in the ninth book of his Elements. Strictly speaking, the roster method is unavailable for infinite sets, since it is not possible to list all the members and have explicitly before one a totality of elements making up an infinite set. The notation
{2, 3, 5, 7, 11, 13, 17, 19, ⋯},

in which some of the elements of the set are listed followed by three dots which take the place of et cetera and stand for obviously understood omissions of one or more elements, is an often used but logically unsatisfactory way out of this difficulty. (See Problem 1.3.) To specify an infinite set correctly, one must (as we did when we introduced the set of prime numbers) cite a defining property of the set.
Example 1.5. If a rectangular coordinate system (with x-axis and y-axis) is introduced in a plane, then each point of the plane has an x-coordinate and a y-coordinate, and can be represented, as in Figure 1(a), by an ordered pair of real numbers. In analytic geometry, one is interested in sets of points whose coordinates meet certain requirements.

Figure 1 [panel (a): plotted points such as (0, 0), (5, 5), and (4, 3), with the line L; panel (b): the shaded first quadrant]

For example, the set {(x, y) | y = x} is the set of all points (in a plane) with equal x- and y-coordinates. This infinite set of points makes up the straight line L, a portion of which is sketched in Figure 1(a), passing through the origin 0 and bisecting the first and third quadrants. We say that the line L is the graph of the set {(x, y) | y = x}. Similarly, the entire x-axis is the graph of the set {(x, y) | y = 0}, and the positive x-axis is the graph of the set {(x, y) | x > 0 and y = 0}. The set {(x, y) | x > 0 and y > 0} is the set of points whose x- and y-coordinates are both positive. Thus, the graph of this set is the entire first quadrant (axes excluded), as indicated in Figure 1(b).
We see that a relation (in the form of equalities or inequalities between x and y) can be considered a point-selector, and the graph pictures the set of those points (from among all in the plane) selected by the requirement that their coordinates satisfy the given relation.
Although it may seem strange at first, it turns out to be convenient to talk about sets that have no members.
Definition 1.2. A set with no elements is called an empty set.
The set {x ∈ R | x² = −1} in Example 1.3 is an empty set. Another example is obtained by considering the set of all paths by which the line drawing of a house in Figure 2 can be traced without lifting one's pencil or retracing any line segment. Whether this set is empty or not is of some interest, since to assert that it is empty is to say that the figure cannot be traced under the prescribed conditions. (Let the reader convince himself that this set is indeed empty.) As our work develops, we shall see many other less frivolous reasons for introducing the notion of an empty set.
We conclude this groundbreaking section with one more definition.
Definition 1.3. Two sets A and B are said to be equal and we write A = B if and only if they have exactly the same elements. If one of the sets has an element not in the other, they are unequal and we write A ≠ B.
Thus A = B means that every element of A is also an element of B and every element of B is an element of A. Equal sets are identical sets, and this identity is symbolized by the equality sign.
This definition has some interesting consequences. First, it is clear that the order in which we list the elements of a set is immaterial. For example, the set {a, b, c} is equal to the set {c, a, b}, since they do indeed have exactly the same three elements.
Also, when sets are specified by defining properties, they can be equal even though the defining properties themselves are outwardly different. Thus, the set of all even prime numbers and the set of real numbers x such that x + 3 = 5 have different defining properties, yet they are equal sets, for each contains the number 2 as its only element.
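This observation, that distinct defining properties can select identical sets, is easy to check mechanically. The Python sketch below is our own illustration; the finite candidate ranges and the helper name is_prime are arbitrary stand-ins, since a program cannot search all reals.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

# Defining property 1: the even prime numbers.
even_primes = {n for n in range(2, 100) if is_prime(n) and n % 2 == 0}

# Defining property 2: the real numbers x with x + 3 = 5.
solutions = {x for x in range(-100, 100) if x + 3 == 5}

# Outwardly different properties, yet equal sets: each contains only 2.
print(even_primes == solutions)
```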
Up to now, we have been careful to speak of a set having no members as an empty set. But it is clear from Definition 1.3 that any two empty sets are equal. For to be unequal it is necessary for one of the sets to contain an element not in the other, and this is impossible
since neither set contains any elements. Therefore we are justified in referring to the empty set or the null set.* We denote the null set by the special symbol ∅.
PROBLEMS
1.1. We list eight sets. For each set, state whether it is finite or infinite. If finite, count the number of elements in the set. Where feasible, write the set using the roster method.
(a) The set of footnotes in Section 1.
(b) The set of letters in the word "probability."
(c) The set of odd positive integers.
(d) The set of prime numbers less than one million.
(e) The set of paths by which the following figure can be traced without lifting one's pencil or retracing any line segment:
Figure 3
(f) The set of those points (in a given plane) that are exactly five units from the origin 0.
(g) The set of real numbers satisfying the equation x² − 3x + 2 = 0.
(h) The set of possible outcomes of the experiment in which one card is selected from a standard deck of 52 cards.
1.2. The following paragraph was written by a student impressed with the technical vocabulary of set theory. Rewrite in more usual English prose.

Let C be the set of Mr. and Mrs. Smith's children. C was equal to ∅ until March 1, 1958. C contained exactly one element from that date until March 15, 1959, when it increased its membership by two!
* The following true story concerns the attempt of a well-known professor of mathematics to teach his five-year-old son the subtle distinction between "a" (or "an") and "the." One day the son answered the telephone, listened a moment and then said, "I'm sorry, but you have the wrong number." (Isn't this what most of us say when someone dials incorrectly?) The father, having overheard, immediately called the boy to him and gently instructed, "What you said would be correct if there were exactly one wrong number. But since there are many possible wrong numbers, it would be more accurate to say, 'I'm sorry, but you have a wrong number.'"
1.3. To illustrate the inadequacy of displaying a few elements of a set and indicating the other elements by three dots, consider the set A of all numbers of the form

n² + (n − 1)(n − 2)(n − 3),

where n is any positive integer. Show that the first three elements (i.e., those obtained when n = 1, 2, 3) are 1, 4, and 9, so that one is tempted to write A = {1, 4, 9, ⋯}. If A is written this way on an I.Q. test, we do not hesitate to write the next element as 16. But show that the next element (obtained when n = 4) is actually 22 and not 16! Indeed, it is possible to write a defining property for a set so that its fourth element (in order of magnitude) is any number, say 94, although its first three elements are 1, 4, 9. Formulate such a defining property.
1.4. Let A = {0, 1, 2, 3, 4}. List the elements, if any, of each of the following sets:
(a) {x ∈ A | 2x − 4 = 0}    (b) {x ∈ A | x² − 4 = 0}
(c) {x ∈ A | x³ − 4x² + 3x = 0}    (d) {x ∈ A | x² = 0}
(e) {x ∈ A | x + 1 > 0}    (f) {x ∈ A | 2x + 1 < 0}
(g) {x ∈ A | x² − 5x + 4 > 0}    (h) {x ∈ A | x² − x < 0}
1.5. Let x and y be the coordinates of a point in the plane. Identify the following sets and give a geometric interpretation of your results:
(a) {(x, y) | x + y = 5 and 3x − y = 3}
(b) {(x, y) | x + y = 5 and 2x + 2y = 3}
(c) {(x, y) | x + y = 5 and 2x + 2y = 10}
1.6. Show that set equality has the following properties:
(i) Set equality is a reflexive relation; i.e., A = A for any set A.
(ii) Set equality is a symmetric relation; i.e., for any sets A and B, if A = B, then B = A.
(iii) Set equality is a transitive relation; i.e., for any sets A, B, and C, if A = B and B = C, then A = C.
(Note: A relation that is reflexive, symmetric, and transitive is called an equivalence relation.)
1.7. Determine whether A = B or A ≠ B.
(a) A = {2, 4, 6}, B = {4, 6, 2}.
(b) A = {1, 2, 3}, B = {Mars, Venus, Jupiter}.
(c) A = {* | * is a plane equilateral triangle}, B = {* | * is a plane equiangular triangle}.
(d) A = {x | x² − 2x + 1 = 0}, B = {x | x − 1 = 0}.
(e) A = {x | 2x² − 5x + 2 = 0}, B = {x | 2x³ − 5x² + 2x = 0}.
1.8. Which of the following are true? Explain.
(a) 2 = {2}, (b) 2 ∈ {2}, (c) ∅ = 0, (d) 0 ∈ ∅.
2. Subsets
Each element of the set of vowels in the alphabet is, of course, an element of the set of all letters. Similarly, each number in {2, 4, 6} is an element of the set of all even integers, and each real number in {x | x > 3} is also in {x | x > 0}. In this section, we discuss the simple but important relation between sets illustrated by these examples.
Definition 2.1. A set A is a subset of set B, denoted by A ⊂ B, if each element of A is also an element of B. We agree to call the null set ∅ a subset of every set.
For example, we write {1, 3} ⊂ {1, 2, 3}, since each of the two elements in {1, 3} belongs to {1, 2, 3}. Also, {1, 3} ⊂ {x | x ≥ 1} and {1, 3} ⊂ {1, 3}. The definition of subset implies that a set is a subset of itself; i.e., A ⊂ A is always true. We can express this fact using the language introduced in Problem 1.6 by saying that set inclusion (i.e., one set being a subset of another set) is a reflexive relation. It is also transitive, for if A ⊂ B and B ⊂ C, it follows that A ⊂ C. But set inclusion is not symmetric. As a counterexample, let A = {a} and B = {a, b}. Then A ⊂ B is true, but B ⊂ A is false.
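These three properties of set inclusion can be verified on concrete sets with Python's built-in set type. This sketch is our own, not the text's; note that Python's operator <= corresponds to the text's ⊂, since both count a set as a subset of itself.

```python
A = {"a"}
B = {"a", "b"}
C = {"a", "b", "c"}

# Reflexive: every set is a subset of itself.
print(A <= A)                       # reflexivity holds

# Transitive: A a subset of B, and B of C, forces A a subset of C.
print((A <= B) and (B <= C) and (A <= C))

# Not symmetric: the text's counterexample A = {a}, B = {a, b}.
print(A <= B, B <= A)               # True in one direction only

# The null set is a subset of every set.
print(set() <= A)
```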
It is noteworthy that the definition of set equality in the preceding section was formulated in terms of the subset relation. In fact, it is merely a restatement of Definition 1.3 to say that A = B if and only if A ⊂ B and B ⊂ A.

TABLE 1

Set A          n(A)   Subsets of A                                             Number of Subsets of A
∅              0      ∅                                                        1 (= 2⁰)
{a}            1      ∅, {a}                                                   2 (= 2¹)
{a, b}         2      ∅, {a}, {b}, {a, b}                                      4 (= 2²)
{a, b, c}      3      ∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}      8 (= 2³)
{a, b, c, d}   4      ∅, {a}, {b}, {c}, {d}, {a, b}, {a, c}, {a, d},           16 (= 2⁴)
                      {b, c}, {b, d}, {c, d}, {a, b, c}, {a, b, d},
                      {a, c, d}, {b, c, d}, {a, b, c, d}
Table 1 illustrates the notion of subset, and also directs our attention to a formula relating the number of subsets of a set to the number of elements in the set. We denote the number of elements in A by n(A).
From the numbers in the last column of this table, we are led to conjecture that if n is any nonnegative integer, then a set with n elements has 2ⁿ subsets. Before proving this result is true, we need to enunciate a principle that is at the heart of most counting procedures, and that is used time and again in computing probabilities.
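Before the proof, the conjecture can be tested by brute force: list every subset of a small set and count. In the Python sketch below (our own construction, built on the standard library's itertools.combinations), the helper all_subsets reproduces the last column of Table 1.

```python
from itertools import combinations

def all_subsets(s):
    """Return all subsets of the set s, each as a frozenset."""
    elems = list(s)
    return [frozenset(c)
            for r in range(len(elems) + 1)      # subset sizes 0, 1, ..., n
            for c in combinations(elems, r)]    # all subsets of that size

# A set with n elements has 2^n subsets, as Table 1 suggests.
for n in range(5):
    count = len(all_subsets(set(range(n))))
    assert count == 2 ** n
    print(n, count)
```

The same enumeration by subset size also previews the binomial coefficients of Chapter 3, since there are C(n, r) subsets of size r.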
Fundamental Principle of Counting:
(a) If one task can be completed in N₁ different ways and, following this, another task can be completed in N₂ different ways, then both tasks can be completed in the given order in N₁N₂ different ways.
(b) More generally, suppose a certain job can be done by completing, in some specified order, n smaller units (which we shall call tasks), where n is any positive integer. The first task can be completed in N₁ different ways. Having finished the first task, the second can be completed in N₂ different ways. Having finished the first two tasks, the third can be completed in N₃ different ways. And so on until, having finished all but the last task, this nth task can be completed in Nₙ different ways. Then the entire job can be done in N₁N₂N₃ ⋯ Nₙ different ways, it being understood that two ways of doing the job are considered different if and only if there is at least one task that is completed differently in the two ways.
The tree diagram in Figure 4 illustrates (a) for the special case N₁ = 3 and N₂ = 2. Starting from some point, we draw N₁ = 3 lines. From each of these lines, we draw N₂ = 2 lines. The total number of ways of completing task 1 and then task 2 is the same as the total number of branches in the tree.
When there are only two tasks, as in (a), the fundamental principle follows immediately from the definition of multiplication. For each of the N₂ ways of doing task 2, we have N₁ ways of first doing task 1. Hence both tasks can be done in a number of ways equal to N₁ + N₁ + ⋯ + N₁, where there are N₂ summands. But this number is precisely the product N₁N₂.
The general principle in (b) can be proved by mathematical induction. We leave this for Problem 2.10 and proceed to illustrate how one uses the fundamental principle of counting.
Example 2.1. We roll a green die and then a red die. How many ways can these dice come up? Our job can be thought of as recording the results of the two rolls. This can be done by first recording the number on the green die (task 1), and then recording the number on the red die (task 2). Task 1 can be done in six ways, and then task 2 can also be done in six ways. Hence, there are 6 • 6 = 36 possible ways that the two dice can come up.
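As a check on Example 2.1, the outcomes can be enumerated by brute force. This short Python sketch is ours, not the text's; the names outcomes and n_ways are ours as well:

```python
from itertools import product

# Task 1: record the green die (six ways); task 2: record the red die (six ways).
outcomes = list(product(range(1, 7), range(1, 7)))
n_ways = len(outcomes)  # the fundamental principle predicts 6 * 6
```

Listing the pairs directly confirms the count 6 · 6 = 36 given by the fundamental principle.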
Example 2.2. How many distinct three-letter "words" can be made, using the letters chosen from among those of "number," but with no letter used more than once in a "word"? Our job is to construct a three-letter "word" under the prescribed conditions. This job can be done by selecting the first letter (task 1), then the second letter (task 2), and finally the third letter (task 3). Task 1 can be done in any of six ways, since there are six letters available in "number." Having chosen one letter, there are only five remaining letters, and hence only five ways of completing task 2. Similarly, there are four ways of completing task 3 after the first two letters are chosen. Hence there are altogether 6 · 5 · 4 = 120 different three-letter "words" that can be formed.
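Example 2.2 can also be verified by enumeration. A sketch of ours (not from the text), using itertools.permutations, which generates exactly the ordered selections without repetition that the three tasks describe:

```python
from itertools import permutations

# Each "word" is an ordered selection of 3 distinct letters from the
# six distinct letters of "number".
words = {"".join(p) for p in permutations("number", 3)}
```

The size of this set agrees with the product 6 · 5 · 4 from the fundamental principle.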
Example 2.3. How many different four-of-a-kind poker hands are there? Our job is to select a hand (subset) of five cards from the ordinary deck (set) of 52 cards in such a way that the hand contains four cards with the same face value. This job can be done by completing the following tasks in the stated order: (i) Choose one face value from among the 13 possible face values; (ii) Select four cards from among those with the face value chosen in (i), paying no regard to their order; (iii) Choose one card from among the remaining 48 cards. Each time we complete the job this way, we obtain exactly one four-of-a-kind poker hand. Moreover, different ways of completing the job result in different four-of-a-kind poker hands. (It was to make this last assertion true that we described task (ii) as choosing a set of four cards, and were not concerned with the order in which the cards were selected.) Hence, there are as many different four-of-a-kind poker hands as there are different ways of completing the job. Now, task (i) can be done in 13 ways, task (ii) can be done in only one way (since there is only one set of four cards that can be formed from four given cards), and task (iii) can be done in 48 ways. Hence, there are 13 · 1 · 48 = 624 different four-of-a-kind poker hands.
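The three tasks of Example 2.3 translate directly into nested loops. In this sketch of ours (not from the text) faces and suits are encoded as small integers, and each hand is stored as a frozenset so that two hands differing only in the order of selection count once:

```python
# Build a 52-card deck as (face, suit) pairs: faces 0-12, suits 0-3.
deck = [(face, suit) for face in range(13) for suit in range(4)]

hands = set()
for face in range(13):                    # task (i): choose a face value (13 ways)
    quad = [(face, s) for s in range(4)]  # task (ii): the one 4-card set (1 way)
    for fifth in deck:                    # task (iii): one of the 48 other cards
        if fifth[0] != face:
            hands.add(frozenset(quad + [fifth]))
```

Because each hand is a set of cards, the enumeration confirms that distinct ways of completing the job give distinct hands.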
We shall return to the fundamental principle of counting in Chapter 3, since it is the basic result from which the formulas of combinatorial analysis are derived. Our main interest here is to use the principle to establish the following theorem.
Theorem 2.1. If a set A has n elements, then A has 2^n subsets.
Proof. If n = 0, then A = ∅ and the only subset of ∅ is ∅ itself. Since 2^0 = 1, the theorem is true in this special case. If n ≥ 1, then let the n elements of A be enumerated in some order. The job of constructing a subset of A can be viewed as made up of the following n tasks. As task 1 we decide whether the first element of A should or should not be an element of the subset. If we decide it should, then let us write down the symbol ∈; if we decide it should not, then we write ∉. Then, as task 2, we write ∈ or ∉ depending upon whether we decide that the second element of A should or should not belong to the subset. Now we move to the third element of A and complete the third task in a similar manner. Since A has n elements, we complete n tasks, and thus obtain a sequence of n decisions, each symbolized by ∈ or ∉. For example, if A = {a, b, c, d}, then the sequence ∈∈∉∈ determines the subset {a, b, d}, the sequence ∈∈∈∈ determines the subset A itself, and the sequence ∉∉∉∉ determines the empty subset ∅. In general, there are as many subsets of A as there are different ways of making the n decisions. Since each decision can be made in two ways (∈ or ∉), we conclude by the fundamental principle of counting that there are 2 · 2 · 2 ··· 2 = 2^n ways of making all n decisions, and hence 2^n subsets of A.
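The proof's decision sequences can be generated mechanically. A sketch of ours (not from the text): each subset of A corresponds to one sequence of in/out decisions, one decision per element, so itertools.product over two choices repeated n times yields all 2^n subsets:

```python
from itertools import product

def all_subsets(elements):
    """One subset per sequence of n in/out decisions, as in the proof of Theorem 2.1."""
    subsets = []
    for decisions in product((True, False), repeat=len(elements)):
        subsets.append({x for x, keep in zip(elements, decisions) if keep})
    return subsets

subs = all_subsets(["a", "b", "c", "d"])
```

For the four-element set of the proof's example this produces 2^4 = 16 subsets, among them {a, b, d}, A itself, and the empty set.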
Forming subsets of a given set is a method that generates a large number of new sets. In fact, a set with 20 elements has 2^20, or more than a million, different subsets. Ordinarily, however, as the following examples point out, one is interested in studying a small number of subsets from among the many available ones.
Example 2.4. Let a green and a red die be rolled, and let S denote the set of possible outcomes. S has 36 elements which we can enumerate as follows, using the abbreviation (x, y) to stand for "green die showed the number x and red die showed the number y":
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
By Theorem 2.1, there are 2^36 subsets of S. But relatively few of these subsets have any special interest, even to players of "craps." Some of these are:
(i) S₁ = the subset made up of those outcomes for which the sum of the numbers on the two dice is 7; i.e.,
S₁ = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}.
(ii) S₂ = the subset containing the outcomes for which the sum of the numbers on the two dice is 11; i.e.,
S₂ = {(5, 6), (6, 5)}.
(iii) S₃ = the subset containing the outcomes for which the sum of the numbers on the two dice is either 7 or 11; i.e.,
S₃ = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 6), (6, 5)}.
Whenever any experiment is performed (as in this example), we can think of the set of all possible outcomes of the experiment. We shall see that such sets and their subsets are of great importance in the mathematical theory of probability.
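The subsets of Example 2.4 are defined by properties of the outcomes, so they can be computed by filtering the 36 pairs. A sketch of ours (not from the text); the names S1, S2, S3 mirror the example's S₁, S₂, S₃:

```python
# The 36 outcomes of rolling a green die and then a red die.
S = [(x, y) for x in range(1, 7) for y in range(1, 7)]

S1 = [p for p in S if sum(p) == 7]         # sum of the two numbers is 7
S2 = [p for p in S if sum(p) == 11]        # sum is 11
S3 = [p for p in S if sum(p) in (7, 11)]   # sum is either 7 or 11
```

Note that S₃ is just the combined listing of S₁ and S₂, a fact we shall soon express with the union operation.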
Example 2.5. The annual directory of college X lists the name, hometown, college residence, and telephone number of each of the college's 2000 students. Let A be the set of these 2000 entries, each entry containing the four pieces of information described above. The total number of subsets of A is astronomical, being 2^2000. But the housemother in a certain dormitory is mainly concerned with the subset of those entries containing the names of women who are residents of her dormitory; the mathematics department must estimate ahead of time the approximate number in the subset of entries naming students who will elect mathematics courses; a student may be especially interested in the subset of entries naming all freshmen who come from his hometown; etc. If the information for each student is entered by punching holes on certain specially designed cards, then the cards corresponding to these subsets of A, as well as many others, can be sorted out of the whole set of cards by a machine. In fact, such sorting machines are designed for the purpose of speedily selecting certain subsets from a given set.
We conclude with an example designed to test the reader's grasp of the difference between the notions of set membership (symbolized by ∈) and set inclusion (symbolized by ⊆).
Example 2.6. Consider the set M of majorities in a committee of four individuals, each having one vote. Let us label the individuals a, b, c, d, and note that a majority is itself a set of three or more of these committeemen. Thus, the set M has sets as elements, and we write M using braces within braces:

M = {{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, {a, b, c, d}}.

Thus {a, b, c} is an element of M, but although a set, it is not a subset of M. (Why?) But {{a, b, c}} is a subset of M, since its only element, {a, b, c}, is indeed also an element of M.
PROBLEMS
2.1. Let S be the set of 36 outcomes of the experiment in which a green and a red die are rolled. (See Example 2.4.) We define certain subsets of S by listing their elements. State a defining property for each subset.
(a) {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
(b) {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)}
(c) {(1, 3), (2, 2), (3, 1)}
(d) {(1, 3), (1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 5), (3, 6), (4, 6)}
*2.2. Let A = {1, 2, 3}. Identify the sets B such that {1} ⊆ B, B ⊆ A, and B ≠ A.
2.3. You are told that there is only one set A such that A ⊆ B. Identify the set B.
2.4. Let A be any set. The set {X | X ⊆ A} of all subsets of A is called the power set of A and is denoted by 2^A. (If A is a set with n elements, then Theorem 2.1 says that the power set 2^A has 2^n elements. This fact accounts for the name "power set" and the symbol 2^A used to denote this set.) Explain the following true statements, assuming A = {x, y, z}:
(a) ∅ ∉ A, but ∅ ∈ 2^A.
(b) x ∈ A, but x ∉ 2^A.
(c) {x, y} ∉ A, but {x, y} ∈ 2^A.
(d) A is an element, but is not a subset, of 2^A.
(e) {A} is not an element, but is a subset, of 2^A.
2.5. Which of the following are correct and why?
(a) {1} ∈ {{1}}    (b) {1} ⊆ {{1}}
(c) {1} ∈ {1, {1}}    (d) {1} ⊆ {1, {1}}
2.6. Give an example of two sets A and B such that both A ∈ B and A ⊆ B are true.
2.7. The graph of the set of points C = {(x, y) | x² + y² = 4}, where x and y are real numbers, is the circumference of the circle with center at (0, 0) and radius 2 units. Determine the graphs of the following subsets of C:
(a) {(x, y) ∈ C | x = 0}    (b) {(x, y) ∈ C | x = 2}
(c) {(x, y) ∈ C | x = 3}    (d) {(x, y) ∈ C | x > 0}
(e) {(x, y) ∈ C | y > 0}    (f) {(x, y) ∈ C | y = √(4 − x²)}
2.8. (a) If z ∈ A and A ⊆ B, is it necessarily the case that z ∈ B?
(b) If z ∈ A and A ∈ B, is it necessarily the case that z ∈ B?
2.9. Draw a tree diagram to illustrate the fundamental principle of counting, assuming n = 3 and N₁ = 4, N₂ = 3, N₃ = 2.
2.10. Assume the truth of the fundamental principle of counting for n = 2, i.e., for a job made up of only two tasks. Prove the principle for any positive integer n by mathematical induction.
In each of the following problems, state explicitly how the fundamental principle of counting is used in obtaining your answer. Draw a tree diagram where feasible.
2.11. A man has five coins in his pocket. He agrees to give one coin to his son and one to his daughter. In how many ways can this be done?
2.12. In how many different orders can one call out the numbers 1, 2, 3, 4, 5?
2.13. In dialing a telephone number, one has to select seven slots, the first two for the letters of the exchange, and then five digits to identify the telephone in that exchange. The telephone dial contains ten slots, one for each of the digits 0, 1, 2, ···, 9, but letters appear in only eight of these slots. If the first number cannot be a zero, how many different telephone numbers, distinguishable as dialed, are possible?
2.14. How many three-digit even integers can be formed from the digits 1, 5, 6, and 8, with no digit repeated?
2.15. How many different ways are there of selecting two letters from the set {a, b, c}? Let the reader realize that the question as stated is vague and needs to be made precise before it can be answered. We must know how the letters are selected, and we must decide when results of the selection process will be considered different. We list four possibilities. Answer the question in each case.
(a) The first letter is chosen and the second is selected from the remaining two letters; i.e., repetitions are not allowed. We count two ways of making the selections different if they result in different ordered pairs of letters; i.e., we record not only which two letters were selected, but also the order in which they were selected.
(b) The first letter is chosen and the second is selected from the entire set of three letters; i.e., repetitions are allowed. We count ordered pairs of letters as in (a).
(c) Repetitions are not allowed, as in (a), and we count two ways of making the selections different only if they result in different sets of two letters; i.e., we disregard the order in which the letters were selected.
(d) Repetitions are allowed, as in (b), and we disregard order as in (c).
2.16. How many different ways are there of selecting two cards from a standard deck of 52 cards? Consider various interpretations of this question, as in the preceding problem.
2.17. How many ways can three coins fall? four coins? n coins, where n is any positive integer?
2.18. Two cards are drawn one after the other from a standard deck of 52 cards. In how many ways can one draw
(a) first a spade and then a heart?
(b) first a spade and then a heart or a diamond?
(c) first a spade and then another spade?
2.19. Repeat the preceding problem, assuming the first card is put back in the deck before the second is drawn.
2.20. Let A = {1, 2, 3, ···, 365}. (a) Two numbers are selected in order, each from the full set A. The result is an ordered 2-tuple, or ordered pair of numbers. How many are there? (b) Three numbers are selected in order, each from the full set A. The result is an ordered 3-tuple, or ordered triple of numbers. How many are there? (c) r numbers are selected in order, each from the full set A, where r is some positive integer. The result is an ordered r-tuple. How many are there? (Note: A general definition of an ordered r-tuple is given in Section 5.)
2.21. (a) How many ways are there of placing three distinguishable balls into two numbered cells? Into three numbered cells? Into n numbered cells?
(b) How many ways are there of placing r distinguishable balls into two numbered cells? Into three numbered cells? Into n numbered cells?
3. Operations on sets
In any particular discussion of sets, it is necessary to define some fixed set of elements (called the universal set) to which we limit the discussion. This point has been eloquently made by Langer:
In ordinary conversation, we assume the limitations of such a universe, as when we say: "Everybody knows that another war is coming," and assume that "everybody" will be properly understood to refer only to adults of normal intelligence and European culture, not to babies in their cribs, or the inhabitants of remote wildernesses. For conversational purposes, the tacit understanding will do; but if the statement is to be challenged, i.e., if someone volunteers to produce a person to whom it is not true, then it becomes important to know just what the limits of its applicability really are. Arguments of this sort have their own technique, by which the opposition marshals contradictory cases—in this example, persons who have no such knowledge—and the asseverator rules them out as "not meant" by his statement. The universe of ordinary discourse is vague enough so that this process can go on as long as the bellicosity of the two adversaries lasts. Logicians and scientists, however, take no pleasure in casuistry. Their universe of discourse must be definite enough to allow no dispute whatever about what does or does not belong to it.*
The fixed universal set we shall denote by U. Once having decided on the universal set U for a particular discussion, all other sets in that same discussion must be subsets of U. But different universal sets can be used for different discussions.
We now define the three basic operations on sets.
Definition 3.1. Let A and B be any subsets of a universal set U. Then

I. The complement of A (with respect to U) is the set of elements of U that do not belong to A. The complement of A is denoted by A', the particular universal set being understood from the context. In symbols,

A' = {x | x ∈ U and x ∉ A}.

II. The intersection of A and B is the set of elements that belong to both A and B. The intersection of A and B is denoted by A ∩ B, which is read "A cap B" or "A intersection B." In symbols,

A ∩ B = {x | x ∈ A and x ∈ B}.

III. The union of A and B is the set of elements that belong to at least one of the sets A and B, i.e., to A or B. The union of A and B is denoted by A ∪ B, which is read "A cup B" or "A union B." In symbols,

A ∪ B = {x | x ∈ A or x ∈ B}.
A comment about the meaning of the word "or" in mathematics is in order here. This logical connective is ambiguous in everyday language, sometimes being used in the inclusive sense (in which "p or q" is taken to mean "p or q, or both p and q") and other times being used in the exclusive sense (in which "p or q" means "p or q, but not both"). As we have explicitly indicated by our wording, the "or" in the definition of union of two sets is to be taken in the inclusive sense, in which it is synonymous with the legal use of "and/or." We adhere to the accepted mathematical usage and shall henceforth always so interpret the word "or." The words "not," "and," and "or" are italicized in Definition 3.1, for they are the key words to remember in the definitions of complement, intersection, and union of sets.
The following examples illustrate how one obtains new sets by applying the operations in Definition 3.1 to given sets.
Example 3.1. Let the universal set U be the set of letters in the alphabet, and let A be the subset of vowels, and B the subset containing the first three letters, i.e.,

A = {a, e, i, o, u}, B = {a, b, c}.
Then, by Definition 3.1,
A' = the set of consonants, B' = {d, e, f, ···, x, y, z},
A ∪ B = {a, b, c, e, i, o, u}, A ∩ B = {a}.
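Example 3.1 maps directly onto Python's built-in set operators. A sketch of ours (not from the text): - is complement relative to U, | is union, and & is intersection:

```python
import string

U = set(string.ascii_lowercase)   # the universal set: all 26 letters
A = set("aeiou")                  # the vowels
B = set("abc")                    # the first three letters

consonants = U - A                # A' (complement with respect to U)
union = A | B                     # A ∪ B
intersection = A & B              # A ∩ B
```

Note that the complement is always taken relative to the universal set U, which must be stated explicitly in code just as in Definition 3.1.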
Example 3.2. Let the universal set U be the set of all residents of New York City. Let A denote the set of male New Yorkers, B the set of New Yorkers who live in the borough of Brooklyn, and C the set of baseball fans in New York who are rooting for the Dodgers to win the National League pennant. Then A' = set of female New Yorkers, B' = set of New Yorkers who do not live in Brooklyn, C' = set of New Yorkers who are not baseball fans rooting for the Dodgers, i.e., the set of New Yorkers who either are not baseball fans at all or, if they are baseball fans, are not rooting for the Dodgers to win the pennant, A ∩ B = set of male residents of Brooklyn, A ∪ B = set of New Yorkers who are male or Brooklynites, and B ∩ C = set of Brooklynites who are also baseball fans rooting for the Dodgers to win the National League pennant. (It was erroneously asserted by some bitter elements of set B that, when the Dodgers moved to Los Angeles, it would be true that B ∩ C = ∅.)
Suppose subsets A, B, C of a universal set U are given. Since A' and B ∩ C are themselves sets, we can form their intersection A' ∩ (B ∩ C), the set of all elements in U that do not belong to A but do belong to both B and C. Similarly, we can take the complement of the intersection B ∩ C, symbolized by (B ∩ C)', and thus obtain the set of objects in U that are not in both B and C, i.e., that are not in B or not in C. In general, the three operations we have defined, when applied to sets, produce still other sets to which the operations can again be applied. We illustrate this important point in an example.
Example 3.3. Let U = {1, 2, 3, 4, 5, 6, 7} be the universal set, and consider the subsets of U given by

A = {1, 2, 3}, B = {2, 4, 6}, C = {1, 3, 5, 7}.

By applying Definition 3.1, we find

A' = {4, 5, 6, 7}, B' = {1, 3, 5, 7} = C, C' = {2, 4, 6} = B,
A ∪ B = {1, 2, 3, 4, 6}, A ∪ C = {1, 2, 3, 5, 7}, B ∪ C = U,
A ∩ B = {2}, A ∩ C = {1, 3}, B ∩ C = ∅.
We can continue forming complements, unions, and intersections of these sets. For example,
(A')' = {4, 5, 6, 7}' = {1, 2, 3} = A,
(A ∪ B)' = {1, 2, 3, 4, 6}' = {5, 7},
(B ∩ C)' = ∅' = U,
(A ∩ B) ∪ C = {2} ∪ {1, 3, 5, 7} = {1, 2, 3, 5, 7},
(A ∪ C) ∩ (A ∩ C) = {1, 2, 3, 5, 7} ∩ {1, 3} = {1, 3}, etc.
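Every compound expression in Example 3.3 can be checked mechanically. A sketch of ours (not from the text); the helper comp is our name for complementation with respect to U:

```python
U = {1, 2, 3, 4, 5, 6, 7}
A, B, C = {1, 2, 3}, {2, 4, 6}, {1, 3, 5, 7}

def comp(s):
    """Complement with respect to the universal set U."""
    return U - s
```

Repeated application of |, &, and comp reproduces each line of the example, including the double-complement identity (A')' = A.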
When considering sets and operations on sets, it is helpful to represent the sets pictorially. A rectangle is drawn to represent the universal set U. A subset A of U is represented by the region within a circle drawn inside the rectangle. Then A', the complement of A, will be represented by the part of the rectangle outside the circle, as in Figure 5.
Figure 5                Figure 6
Such diagrams, called Venn diagrams (after the English logician John Venn, 1834-1883), can be drawn for a problem involving two subsets, say A and B, of some universal set U. In Figure 6, we have labeled the four nonoverlapping regions of the rectangle corresponding to the following four possibilities for any element x ∈ U:
(1) x ∈ A and x ∈ B, i.e., x ∈ A ∩ B (Region R₁)
(2) x ∈ A and x ∉ B, i.e., x ∈ A ∩ B' (Region R₂)
(3) x ∉ A and x ∈ B, i.e., x ∈ A' ∩ B (Region R₃)
(4) x ∉ A and x ∉ B, i.e., x ∈ A' ∩ B' (Region R₄)
It is important to observe that complements, unions, and intersections of the sets A and B can be represented by combinations of one or more of the regions in Figure 6, as in Table 2. Furthermore, given any set written in terms of operations on A and B, we can easily determine the particular combination of regions representing this set. For example, the set (A ∩ B)' contains those elements that
TABLE 2
Set        Region in Figure 6
A          R₁ & R₂
B          R₁ & R₃
A ∩ B      R₁
A'         R₃ & R₄
B'         R₂ & R₄
are not in A ∩ B, i.e., not in region R₁. Hence, (A ∩ B)' is represented by region R₂ & R₃ & R₄.
Suppose we are told that A ⊆ B. Although the Venn diagram is often drawn in this case with the circle representing A entirely within the circle representing B, we prefer to use Figure 6. But since we are told that every element in A is also in B, we conclude that region R₂ represents ∅, i.e., A ∩ B' = ∅. Furthermore, the regions R₁ & R₂ and R₁ must now represent the same set of points, i.e., A = A ∩ B. Similarly, R₁ & R₃ and R₁ & R₂ & R₃ also represent equal sets, so that B = A ∪ B. In this way, we see that the following are all equivalent assertions, each giving the information that every element of A is also in B:

(1) A ⊆ B,  (2) A ∩ B' = ∅,  (3) A = A ∩ B,  (4) B = A ∪ B.
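The equivalence of the four assertions can be tested exhaustively on a small universal set. In this sketch of ours (not from the text) we check every pair of subsets of a four-element universe; for each pair, either all four statements hold or all four fail:

```python
from itertools import combinations

U = {1, 2, 3, 4}
subsets = [set(c) for r in range(5) for c in combinations(sorted(U), r)]

# For every pair (A, B), collect the truth values of the four assertions.
# They are equivalent exactly when each such collection has a single value.
all_equivalent = all(
    len({A <= B,                  # (1) A is a subset of B
         A & (U - B) == set(),    # (2) A ∩ B' = ∅
         A == A & B,              # (3) A = A ∩ B
         B == A | B}) == 1        # (4) B = A ∪ B
    for A in subsets
    for B in subsets
)
```

Exhaustive checking over one small universe is not a proof, of course, but it is a useful way to catch a false equivalence.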
In order to consider another application of Venn diagrams, we need to make an important definition.
Definition 3.2. Two sets A and B are said to be disjoint or mutually exclusive if they have no elements in common, i.e., if A ∩ B = ∅.
When A and B are disjoint, one customarily draws a Venn diagram with nonoverlapping circles representing A and B. But we can equally well use the diagram in Figure 6, provided we note that region R₁ represents the empty set.
Example 3.4. If S is any set, let us denote by n(S) the number of elements in S. If A and B are disjoint sets, then the number of elements in A or in B is the sum of the number of elements in A and the number of elements in B; i.e.,
(3.1) n(A ∪ B) = n(A) + n(B) if A ∩ B = ∅.
To find a formula for n(A ∪ B) when A and B are not necessarily disjoint, we proceed as follows. A ∩ B' and A ∩ B are disjoint sets (why?) whose union, as is easily seen from Figure 6, is the set A. Hence by (3.1),

n(A ∩ B') + n(A ∩ B) = n(A).
Also, since A ∩ B and A' ∩ B are disjoint sets whose union is B,

n(A ∩ B) + n(A' ∩ B) = n(B).
If we add these equations and subtract n(A ∩ B) from both sides of the result, we obtain

n(A ∩ B') + n(A ∩ B) + n(A' ∩ B) = n(A) + n(B) - n(A ∩ B).
But, referring to Figure 6, we recognize the left-hand side of this equation as the number of elements in the set represented by region R₁ & R₂ & R₃. But this region represents the union A ∪ B. Hence, we obtain the formula
(3.2) n(A ∪ B) = n(A) + n(B) - n(A ∩ B),

which is valid for any sets A and B. Note that (3.2) reduces to (3.1) when A and B are disjoint, for then n(A ∩ B) = n(∅) = 0.
Suppose we pick one card from a standard deck of 52 cards. Let U be the set of 52 possible choices, and let A and B denote the set of aces and the set of spades, respectively. Obviously n(A) = 4 and n(B) = 13. But n(A ∪ B) ≠ 17, since A and B are not disjoint. Indeed, n(A ∩ B) = 1, since only the ace of spades is common to A and B. By (3.2) we correctly find n(A ∪ B) = 16. As expected, we obtain an ace or a spade with 16 different cards.
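Formula (3.2) can be checked against a direct count of A ∪ B. A sketch of ours (not from the text), with cards modeled as (rank, suit) pairs of our own naming:

```python
suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

U = {(r, s) for r in ranks for s in suits}   # the 52 possible choices
A = {c for c in U if c[0] == "A"}            # the four aces
B = {c for c in U if c[1] == "spades"}       # the thirteen spades

# Formula (3.2): n(A ∪ B) = n(A) + n(B) - n(A ∩ B)
n_union = len(A) + len(B) - len(A & B)
```

The formula and the direct count of the union both give 16, not 17, because the ace of spades would otherwise be counted twice.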
The general Venn diagram in the case of three subsets A, B, and C is found in Figure 7, where we have labeled the eight nonoverlapping regions corresponding to the following eight possibilities for any element x ∈ U:
(1) x ∈ A and x ∈ B and x ∈ C (Region R₁)
(2) x ∈ A and x ∈ B and x ∉ C (Region R₂)
(3) x ∈ A and x ∉ B and x ∈ C (Region R₃)
(4) x ∈ A and x ∉ B and x ∉ C (Region R₄)
(5) x ∉ A and x ∈ B and x ∈ C (Region R₅)
(6) x ∉ A and x ∈ B and x ∉ C (Region R₆)
(7) x ∉ A and x ∉ B and x ∈ C (Region R₇)
(8) x ∉ A and x ∉ B and x ∉ C (Region R₈)
It will be helpful in our later work to have clearly in mind the correspondence between various subsets of U and regions of the Venn diagram in Figure 7. The reader should check the examples in Table 3.
TABLE 3

Set                Region in Figure 7
A                  R₁ & R₂ & R₃ & R₄
B                  R₁ & R₂ & R₅ & R₆
C                  R₁ & R₃ & R₅ & R₇
A ∩ B              R₁ & R₂
A ∪ B              R₁ & R₂ & R₃ & R₄ & R₅ & R₆
(A ∪ B) ∩ C        R₁ & R₃ & R₅
A'                 R₅ & R₆ & R₇ & R₈
A' ∩ (A ∩ B)       None (the set is empty)
B ∩ C              R₁ & R₅
(A ∩ B') ∩ C'      R₄
We see that by starting with two or more subsets of some universal set and forming their complements, unions, and intersections, many other subsets are obtained. In the next section, we explore certain interesting and important relationships among these subsets. We conclude here with another example in which a Venn diagram proves helpful.
Example 3.5. Persons are classified according to blood type and Rh quality by testing a blood sample for the presence of three antigens: A, B, and Rh. Blood is of type AB if it contains both antigens A and B, of type A if it contains A but not B, of type B if it contains B but not A, and of type O if it contains neither A nor B. In addition, blood is classified as Rh positive (+) if the Rh antigen is present, as Rh negative (-) otherwise. If we let A, B, and Rh denote the sets of people whose blood contains the A, B, and Rh antigens respectively, then all people are classified into one of the eight categories indicated in the Venn diagram of Figure 8.
Suppose a laboratory technician reports the following statistics after testing blood samples of 100 people:
50 contain antigen A
52 contain antigen B
40 contain antigen Rh
20 contain both A and B
13 contain both A and Rh
15 contain both B and Rh
5 contain all three antigens
How many persons of type A- did the technician find? To answer this sort of question, we use the data to fill in the number of people in each of the eight subsets in Figure 8. The trick here is to use the data in reverse order, i.e., work from the bottom of the list to the top. Thus, the last item reported tells us there are five people of type AB+. The 15 people reported to have both B and Rh must be of type AB+ or B+. Since five people are already identified as type AB+, we infer that ten people are of type B+. In this way we complete the enumeration, and thus obtain the number of people in each of the eight categories. We find there were 22 people of type A-.
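The region-by-region count for type A- can also be reached in one step. A sketch of ours (not from the text), using the example's reported counts; the variable names are ours:

```python
# Counts reported by the technician in Example 3.5.
n_A, n_B, n_Rh = 50, 52, 40
n_A_and_B, n_A_and_Rh, n_B_and_Rh = 20, 13, 15
n_all_three = 5

# Type A- means: antigen A present, B absent, Rh absent.  Start with
# everyone having A, remove those also having B and those also having Rh,
# then add back the people with all three antigens, who were removed twice.
type_A_negative = n_A - n_A_and_B - n_A_and_Rh + n_all_three
```

This is the same reasoning as filling in the Venn diagram from the bottom of the list upward, compressed into a single inclusion-and-exclusion computation.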
PROBLEMS
3.1. Let U = {a, b, c}, A = {a}, B = {b}. List the elements of the following sets: A', B', A ∪ B, A ∩ B, A' ∩ B', A ∩ (A ∪ B).
3.2. Refer to the Venn diagram in Figure 6. Determine the region or combination of regions representing each of the sets (a) (A ∪ B)', (b) A' ∪ B', (c) A ∪ B', (d) (A')', (e) (A' ∩ B)' ∪ B.
3.3. A universal set U has eight elements corresponding to the eight possible outcomes of the experiment in which a penny, a nickel, and a dime are tossed.
(a) List the elements of U.
(b) Suppose subset A contains those elements corresponding to outcomes for which the penny falls heads, subset B those for which all three coins match, and subset C those for which the number of heads exceeds the number of tails. List the elements of the following sets: A', B', A ∪ B, A ∪ C, B ∪ C, A ∩ B, A ∩ C, B ∩ C, A' ∩ C, (A ∩ B) ∩ C, and (A ∩ B') ∩ C.
3.4. The universal set U contains the 52 cards in a standard deck. Let S denote the subset of spades, D the subset of diamonds, and H the subset of honor cards (i.e., ten, jack, queen, king, or ace).
(a) Identify the following sets and count the number of elements in each: S ∩ H, S', D ∩ S, D ∩ S', D ∪ S, (S ∪ D) ∩ H.
(b) Write the following sets in symbolic form as in (a): the set of cards that are not honor cards, the set of cards that are neither spades nor honor cards, the set of clubs or hearts that are not honor cards. (Note: It is instructive to try to write this last set in at least three different ways.)
3.5. Table 4 classifies 321 union men with respect to two characteristics: (1) the number of years each has been in the union, and (2) his answer to the question, "Are you willing to picket to help some other shop get organized or get a raise in pay?".
TABLE 4

Response to        Number of Years in the Union
Question       Less than 1    1-3    4-10    Over 10    Total
Yes                 27         54     137       28       246
No                  14         18      34        3        69
Don't know           3          2       1        0         6
Total:              44         74     172       31       321
[Source: Arnold M. Rose, Union Solidarity, University of Minnesota Press, 1952, p. 77.]
Let the 321 men in this survey be the elements of our universal set U, and define the following subsets of U:
Y = set of men who answer "yes,"
N = set of men who answer "no,"
A = set of men who are in the union less than one year,
B = set of men who are in the union 1-3 years,
C = set of men who are in the union 4-10 years.
(a) Find the number of men in each of the following sets: (i) Y ∩ B, (ii) Y ∪ B, (iii) (Y ∪ N)' ∩ A, (iv) (N ∩ C)'.
(b) Write each of the following sets, using only the symbols A, B, C, Y, N, ', ∪, and ∩. [Example: the set of men who answer "yes" and are in the union less than four years is the set Y ∩ (A ∪ B).]
(1) the set of men who answer "yes" and are in the union 4-10 years.
(2) the set of men who answer "yes" and are in the union at least four years.
(3) the set of men who answer "don't know."
(4) the set of men who answer "don't know" and who are in the union over ten years. (What does the survey tell you about this set?)
3.6. Let U be the set of all points in the plane. Relative to some fixed rectangular coordinate system, we can write

U = {(x, y) | x ∈ R and y ∈ R},

where R is the set of all real numbers. Let subsets of U be defined as follows:
A = {(x, y) | x > 0}, B = {(x, y) | y > 0},
C = {(x, y) | x + 2y ≤ 0}.
Sketch graphs of the sets (a) A, (b) B, (c) C, (d) D, (e) A ∩ B,
(f) (A ∩ B) ∩ C, (g) [(A ∩ B) ∩ C] ∩ D.
3.7. (a) Express n(U), the number of elements in the universal set U, in terms of n(A), the number of elements in subset A, and n(A'), the number of elements in the complement of A.
(b) The formula you wrote in (a) can be deduced from Formula (3.2) of the text. Show how to do this.
3.8. A psychologist ran 50 mice through a maze experiment and reported the following data: 25 mice were male, 25 were previously trained, 20 turned left (at the first choice-point), 10 were previously trained males, 4 males turned left, 15 previously trained mice turned left, and 3 previously trained males turned left. Draw an appropriate Venn diagram and determine the number of female mice who were not previously trained and who did not turn left.
3.9. Of 63 member colleges of the Council For The Advancement of Small Colleges, Inc., 24 were founded before 1931, were coed, and reported annual student costs of less than $1,000; 41 were founded before 1931 and were coed; 27 were founded before 1931 and had student costs of less than $1,000; 45 were founded before 1931; 52 were coed; 34 had student costs of less than $1,000; 4 were not founded before 1931, were not coed, and had student costs of at least $1,000. (Data reported in Supplement Section 11, The New York Times, October 11, 1959.)
A high school senior wants to attend a coed college that is relatively new, say founded 1931 or later. His annual student costs must be less than $1,000. How many of the 63 small colleges meet his requirements?
3.10. Show that each of the following pairs of sets are represented by the same region in the Venn diagram of Figure 7:
(a) (A ∪ B)′ and A′ ∩ B′,
(b) (A ∩ B)′ and A′ ∪ B′,
(c) (A ∪ B) ∪ C and A ∪ (B ∪ C),
(d) A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C).
3.11. A, B, and C are subsets of a universal set 𝒰. Arrange the following sets in sequential order so that each set in the sequence is a subset of the next set: A ∪ B, 𝒰, A ∩ B, ∅, B, A ∪ (B ∪ C), (A ∩ B) ∩ C, (A ∪ B) ∪ C, ∅′, B ∩ A.
3.12. (a) Show that 𝒰′ = ∅ and ∅′ = 𝒰.
(b) Show that if A ⊆ B, then B′ ⊆ A′.
(c) Suppose A ∪ B = ∅. What conclusion can you draw about the sets A and B?
(d) Suppose A ∩ B = ∅. Does it follow that A = ∅ or B = ∅?
3.13. Let 𝒰 be the set of all people and

M = the set of all males, C = the set of all college students, I = the set of all intelligent people, S = the set of sorority members, B = the set of beer drinkers, P = the set of professors, W = the set of well-dressed people.

Translate each of the following sentences into an equation or an inequality using only the letters standing for sets and the symbols =, ≠, ∅, ′, ∩, ∪. (For example, the sentence "All college students are intelligent" means that the set of college students is a subset of the set of intelligent people, i.e., C ⊆ I. But we are not permitted use of the set inclusion symbol. Hence we refer to the discussion on p. 20 and rewrite the sentence in any of the equivalent forms C ∩ I = C, C ∪ I = I, or C ∩ I′ = ∅. Similarly, the sentence "Some college students are intelligent" means there is at least one member of the intersection C ∩ I. Hence this sentence is translated into C ∩ I ≠ ∅.)
(a) All professors are beer drinkers.
(b) No males are sorority members.
(c) No male college student is well dressed.
(d) Sorority members are neither intelligent nor male.
(e) Some professors are beer drinking males.
(f) Some professors who drink beer are not males.
(g) Some professors who drink beer are neither intelligent nor well dressed.
(h) College students and professors are beer drinkers.
(i) If a person is a beer drinker, then that person is intelligent.
(j) If a person is intelligent, then that person is a beer drinker.
(k) A person is a beer drinker if and only if he is intelligent.
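The translation scheme of Problem 3.13 can be tried out on small finite sets: "all X are Y" becomes X ∩ Y = X, "some X are Y" becomes X ∩ Y ≠ ∅, and "no X are Y" becomes X ∩ Y = ∅. Here is a Python sketch with an invented miniature universe (all names and memberships are hypothetical, chosen only to make the statements true):

```python
# Invented data: a tiny universe of people, encoded as Python sets.
P = {"ann", "bob"}             # professors
B = {"ann", "bob", "cal"}      # beer drinkers
M = {"bob", "cal", "dan"}      # males
S = {"eve"}                    # sorority members
U = P | B | M | S              # the universal set for complements

# "All professors are beer drinkers"  <->  P ∩ B = P
assert P & B == P
# "No males are sorority members"     <->  M ∩ S = ∅
assert M & S == set()
# "Some professors who drink beer are not males"
#                                     <->  (P ∩ B) ∩ M′ ≠ ∅
assert (P & B) & (U - M) != set()
print("all three translations hold in this universe")
```

Changing the memberships and rerunning the assertions is a quick way to see which sentences each formula really captures.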
3.14. Read the following discussion carefully. Then to test your understanding, do the exercises.
A finite set 𝒰 of n people is given. 𝒰 will be called a decision-making body. Let 𝒱 = 2^𝒰 be the power set of 𝒰, as defined in Problem 2.4. Each element of 𝒱 is a subset of 𝒰 and will be called a coalition. (In particular, the empty set ∅ and the set 𝒰 itself are coalitions. There are 2^n coalitions altogether.)
We select a subset W of 𝒱 and write 𝒱 = W ∪ W′, where W′ is the complement of W with respect to 𝒱. Since W and W′ are disjoint, each coalition is in exactly one of W (the set of winning coalitions) or W′ (the set of nonwinning coalitions).
Now consider the set W′. An element of W′ will be called a losing coalition if its complement (with respect to 𝒰) is a winning coalition. Thus L, the set of losing coalitions, is defined by

L = {A | A ∈ W′ and A′ ∈ W}.
[Note that W′ means the complement of W with respect to 𝒱, whereas A′ means the complement of A with respect to 𝒰. This confusion arises from the fact that 𝒰 serves as universal set for all sets whose elements are people, whereas 𝒱 serves as universal set for all sets whose elements are coalitions (sets of people).]
Finally, a coalition that is nonwinning itself and whose complement is also nonwinning is called a blocking coalition. Thus B, the set of blocking coalitions, is defined by
B = {A | A ∈ W′ and A′ ∈ W′}.
Some decision-making bodies contain important persons who get special names: A person x ∈ 𝒰 is said to be a dictator if {x} ∈ W, i.e., if x is the sole member of a winning coalition. A person y ∈ 𝒰 is said to have veto power if {y} ∈ B, i.e., if y is the sole member of a blocking coalition.
In each of the following exercises, a particular decision-making body is described and its voting rules specified. Interpret a winning coalition to mean a set of persons who control enough votes to carry a proposal. Find all winning coalitions, losing coalitions, and blocking coalitions. Determine if any members are dictators or have veto power.
Exercise 1. A committee consists of four people, each with one vote. Majority rule applies; i.e., three votes are needed to carry a proposal.
Exercise 2. A small corporation with 100 shares of stock outstanding has three shareholders. Individual a owns 50 shares, b owns 30 shares, and c owns 20 shares. Each share has one vote, and simple majority rule applies.
Exercise 3. Same as Exercise 2, except that b has sold one of his shares to a.
Exercise 4. A student-faculty committee consists of five students and four faculty members. For a proposal to be passed on to the entire faculty for its consideration, at least three students and three faculty members must vote for the proposal. Each member has one vote.
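Since a body of n people has only 2^n coalitions, the definitions above can be applied mechanically by enumerating all of them. The Python sketch below works Exercise 2, assuming "simple majority" means strictly more than half of the 100 votes:

```python
# Enumerate all coalitions of the three shareholders of Exercise 2.
from itertools import combinations

shares = {"a": 50, "b": 30, "c": 20}
people = frozenset(shares)

coalitions = [frozenset(c) for r in range(len(people) + 1)
              for c in combinations(sorted(people), r)]

def votes(c):
    return sum(shares[p] for p in c)

winning  = {c for c in coalitions if votes(c) > 50}
# A losing coalition is nonwinning with a winning complement;
# a blocking coalition is nonwinning with a nonwinning complement.
losing   = {c for c in coalitions if c not in winning and people - c in winning}
blocking = {c for c in coalitions if c not in winning and people - c not in winning}

dictators = {p for p in people if frozenset({p}) in winning}
veto      = {p for p in people if frozenset({p}) in blocking}
print(sorted(sorted(c) for c in winning))  # → [['a', 'b'], ['a', 'b', 'c'], ['a', 'c']]
print(dictators, veto)                     # no dictator; a has veto power
```

Note that {a} and its complement {b, c} each command exactly 50 votes, so both are blocking coalitions; that is why a has veto power without being a dictator.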
4. The algebra of sets
We have studied a number of ways of obtaining other sets, once a universal set 𝒰 is given. There are the many sets that can be constructed by performing the operations of complement, union, and intersection on subsets of 𝒰. The reader must suspect by now (especially in view of the results in Problems 3.10-3.12) that there are many relationships among the sets obtained in this way. These relationships form the subject matter of the present section.
We begin by listing a number of important laws obeyed by sets. All follow from our definitions of the empty set ∅, the universal set 𝒰, the operations denoted by ′, ∩, and ∪, together with the definition of set equality.
Theorem 4.1. Let A, B, and C be any subsets of a universal set 𝒰. Then the following laws hold:
Identity laws:
1a. A ∪ ∅ = A    1b. A ∩ 𝒰 = A
2a. A ∪ 𝒰 = 𝒰    2b. A ∩ ∅ = ∅
Idempotent laws:
3a. A ∪ A = A    3b. A ∩ A = A
Complement laws:
4a. A ∪ A′ = 𝒰    4b. A ∩ A′ = ∅
5a. (A′)′ = A
Commutative laws:
6a. A ∪ B = B ∪ A    6b. A ∩ B = B ∩ A
De Morgan's laws:
7a. (A ∪ B)′ = A′ ∩ B′    7b. (A ∩ B)′ = A′ ∪ B′
Associative laws:
8a. A ∪ (B ∪ C) = (A ∪ B) ∪ C    8b. A ∩ (B ∩ C) = (A ∩ B) ∩ C
Distributive laws:
9a. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)    9b. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
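A reader with access to a computer can check these laws exhaustively on a small universal set, since a three-element universe already has only 8 subsets. The following Python sketch verifies the De Morgan and distributive laws for every choice of A, B, C (the universe {1, 2, 3} is an arbitrary choice):

```python
# Exhaustive check of laws 7a, 7b, 9a, 9b over all subsets of a small universe.
from itertools import chain, combinations

U = frozenset({1, 2, 3})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(U), r) for r in range(len(U) + 1))]
comp = lambda X: U - X  # complement with respect to U

for A in subsets:
    for B in subsets:
        assert comp(A | B) == comp(A) & comp(B)          # 7a
        assert comp(A & B) == comp(A) | comp(B)          # 7b
        for C in subsets:
            assert A | (B & C) == (A | B) & (A | C)      # 9a
            assert A & (B | C) == (A & B) | (A & C)      # 9b
print("laws 7a, 7b, 9a, 9b verified for all", len(subsets), "subsets")
```

Such a check is not a proof for arbitrary sets, of course; the membership-table method developed below supplies that.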
Before proving these laws, let us note that we are familiar with many of their names from the ordinary algebra of numbers. Thus, addition and multiplication of numbers are commutative, i.e.,

a + b = b + a and a × b = b × a,

for any numbers a and b. Analogously, Laws 6a and 6b assert that the order in which two sets are written does not affect their union or intersection. For any numbers a, b, and c, we recall the associative laws

a + (b + c) = (a + b) + c and a × (b × c) = (a × b) × c.
The analogy with 8a and 8b is clear. The associative law 8a asserts that the same set is obtained if we form the union of A with the union of B and C or if we form instead the union of the union of A and B with C. In ordinary algebra, we have only one distributive law, namely, a × (b + c) = a × b + a × c. This is analogous to 9b, one of the two distributive laws for sets.
Since adding zero to any number yields that same number as sum, 0 is called an identity number with respect to addition. Similarly, since a × 1 = a for any number a, we say that 1 is an identity number with respect to multiplication. As Laws 1a and 1b show, the empty set ∅ is an identity set with respect to union and the universal set 𝒰 is an identity set with respect to intersection.
Because of these analogies, A ∪ B is sometimes called the logical sum and A ∩ B the logical product of the sets A and B. But the analogy with ordinary algebra is not perfect, as a glance at the idempotent laws shows. If a is a number, then a + a = 2a; if A is a set, then A ∪ A = A.
It is instructive to try to translate these laws into prose form. For example, 7a asserts that the complement of the union of any two sets is equal to the intersection of their complements, and 7b asserts that the complement of the intersection of any two sets is equal to the union of their complements.
Finally, let us note that (except for 5a) the laws in Theorem 4.1 are listed in pairs, 1a and 1b, 2a and 2b, etc. We shall comment on the significance of this fact after we discuss the proof of these laws.
Our method of proof involves the use of membership tables. The basic membership tables for complement, intersection, and union appear in Tables 5-7. In the first column of Table 5, we symbolize the two possibilities for any element x of the universal set 𝒰: either
TABLE 5        TABLE 6             TABLE 7

A  A′          A  B  A ∩ B        A  B  A ∪ B
∈  ∉           ∈  ∈    ∈          ∈  ∈    ∈
∉  ∈           ∈  ∉    ∉          ∈  ∉    ∈
               ∉  ∈    ∉          ∉  ∈    ∈
               ∉  ∉    ∉          ∉  ∉    ∉
x ∈ A or x ∉ A. If x ∈ A, then x ∉ A′; and if x ∉ A, then x ∈ A′. These facts follow from the definition of complement and are summarized in the two rows of Table 5, the membership table for A′.
With respect to the sets A and B, each element x of the universal set 𝒰 falls into exactly one of the following categories: (1) x ∈ A and x ∈ B, (2) x ∈ A and x ∉ B, (3) x ∉ A and x ∈ B, and (4) x ∉ A and x ∉ B. These are the four possibilities symbolized in the four rows to the left of the double vertical line in Table 6, the membership table for A ∩ B. To the right, in the column headed A ∩ B, is summarized the membership status of x with respect to the intersection of A and B. That is, by the definition of intersection, x ∈ A ∩ B in the case (row 1) when x ∈ A and x ∈ B; x ∉ A ∩ B in all other cases (rows 2-4).
Table 7, the membership table for A ∪ B, is similarly interpreted. We know that x ∈ A ∪ B if and only if x belongs to at least one of the sets A and B. Hence, an ∈ appears under A ∪ B in rows 1-3 of Table 7, but an ∉ appears in row 4.
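The rows of Tables 5-7 can be generated mechanically if we encode ∈ as True and ∉ as False; complement, intersection, and union then become the logical operations not, and, or. A Python sketch:

```python
# Membership tables with booleans: True stands for ∈, False for ∉.
# The four rows of Tables 6-7 are the four (a, b) truth combinations.
rows = [(a, b) for a in (True, False) for b in (True, False)]

def table(f):
    """Evaluate a two-place membership rule down one column of rows."""
    return [f(a, b) for a, b in rows]

intersection = table(lambda a, b: a and b)  # Table 6: ∈ only in row 1
union        = table(lambda a, b: a or b)   # Table 7: ∉ only in row 4
print(intersection)  # → [True, False, False, False]
print(union)         # → [True, True, True, False]
```

This correspondence between membership tables and truth tables is exactly the similarity with logic noted later in the section.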
Using these basic tables, we can construct membership tables for
other sets. The details of this construction, as well as the rationale behind the use of membership tables for proving equality between sets, are best explained in the context of some examples.
Example 4.1. To prove De Morgan's law, (A ∪ B)′ = A′ ∩ B′,
we proceed as follows. Since this law involves two arbitrary sets A and B, we start by listing in columns (1) and (2) of Table 8 the four
TABLE 8

(1)  (2)  (3)      (4)         (5)  (6)  (7)
A    B    A ∪ B    (A ∪ B)′    A′   B′   A′ ∩ B′
∈    ∈    ∈        ∉           ∉    ∉    ∉
∈    ∉    ∈        ∉           ∉    ∈    ∉
∉    ∈    ∈        ∉           ∈    ∉    ∉
∉    ∉    ∉        ∈           ∈    ∈    ∈
possibilities for an element x ∈ 𝒰. Since the set (A ∪ B)′ is obtained by first forming A ∪ B and then taking its complement, we include a column for A ∪ B and another for (A ∪ B)′. The entries in column (3) are obtained from (1) and (2) by use of the basic membership table for A ∪ B. The entry in each row of column (4) is obtained from the entry in the corresponding row of column (3) by using Table 5.
The set A′ ∩ B′ appearing in the law we are trying to prove is obtained by forming A′ and B′, and then taking their intersection. Hence we have columns (5)-(7) in Table 8. The entry in each row of column (5) is obtained from the entry in the corresponding row of column (1) by use of the basic membership table for A′. Column (6) is similarly obtained from column (2). Finally, from (5) and (6) we get column (7) by using the basic membership table for intersection.
The crucial observation is that columns (4) and (7) are identical: whenever a row contains an ∈ in column (4), it also contains an ∈ in column (7), and likewise for the occurrences of ∉. We conclude that whenever an element of 𝒰 belongs to (A ∪ B)′, it also belongs to A′ ∩ B′, i.e.,

(4.1) (A ∪ B)′ ⊆ A′ ∩ B′.
Moreover, whenever an element does not belong to (A ∪ B)′, then it does not belong to A′ ∩ B′. It follows (why?) that every element that does belong to A′ ∩ B′ must also belong to (A ∪ B)′, i.e.,

(4.2) A′ ∩ B′ ⊆ (A ∪ B)′.
From (4.1) and (4.2) we conclude that

(A ∪ B)′ = A′ ∩ B′,

and this completes the proof of De Morgan's law 7a.
Before considering another example, we note the striking similarity between the method of proof just illustrated and the method of verifying relations between sets by means of Venn diagrams. In Figure 9, we apply the latter method to the De Morgan law we have
[Figure 9: Venn diagram for two sets A and B, with regions numbered R1-R4. Data: 𝒰 : R1 & R2 & R3 & R4; A : R1 & R2; B : R1 & R3. Steps at the left of the rectangle lead to (A ∪ B)′ : R4; steps at the right lead to A′ ∩ B′ : R4.]

Figure 9
just proved. In the space above the usual rectangle, we list our data: the universal set 𝒰 is represented by the entire rectangle, the set A by the region R1 & R2, the set B by the region R1 & R3. (We use the colon as shorthand for "is represented by" when it separates a set and a region in the Venn diagram.) To the left of the rectangle, we list the steps required to find the region represented by the left-hand side of De Morgan's law; to the right of the rectangle, we find the region represented by the right-hand side of the law. We observe that (A ∪ B)′ and A′ ∩ B′ are both represented by the same region R4. This fact constitutes the verification of De Morgan's law 7a by means of a Venn diagram.
Since the four regions in Figure 9 are numbered to correspond to the four rows of Table 8, we can follow in the Venn diagram each step in the construction of Table 8. Thus, the fact that column (1) contains ∈'s in rows 1 and 2 is expressed in Figure 9 by the fact that
A is represented by R1 & R2. That columns (4) and (7) are identical and contain ∈'s only in row 4 is expressed in Figure 9 by the fact that both (A ∪ B)′ and A′ ∩ B′ are represented by region R4. Although the method of membership tables suffices for proving all the laws we shall encounter, the method of Venn diagrams is often a helpful aid in understanding these laws. We give one more example in which both methods are used.
Example 4.2. To prove the distributive law 9b in Theorem 4.1, we construct the membership table with eight rows (since three arbitrary sets are involved) in Table 9. The law is proved by noting that the columns headed A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C) are identical.
TABLE 9

A  B  C    B ∪ C    A ∩ (B ∪ C)    A ∩ B    A ∩ C    (A ∩ B) ∪ (A ∩ C)
∈  ∈  ∈    ∈        ∈              ∈        ∈        ∈
∈  ∈  ∉    ∈        ∈              ∈        ∉        ∈
∈  ∉  ∈    ∈        ∈              ∉        ∈        ∈
∈  ∉  ∉    ∉        ∉              ∉        ∉        ∉
∉  ∈  ∈    ∈        ∉              ∉        ∉        ∉
∉  ∈  ∉    ∈        ∉              ∉        ∉        ∉
∉  ∉  ∈    ∈        ∉              ∉        ∉        ∉
∉  ∉  ∉    ∉        ∉              ∉        ∉        ∉
In Figure 10, this same distributive law is verified by using an appropriate Venn diagram. We have numbered the eight regions in
[Figure 10: Venn diagram for three sets, with the eight regions numbered R1-R8 to correspond to the rows of Table 9. Data: A : R1 & R2 & R3 & R4; B : R1 & R2 & R5 & R6; C : R1 & R3 & R5 & R7. Both A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C) are represented by the region R1 & R2 & R3.]

Figure 10
Figure 10 to correspond to the eight rows of Table 9 in order to bring out here, as in Example 4.1, the similarities in the membership table and Venn diagram methods.
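The eight rows of Table 9 can also be generated and compared mechanically, again encoding ∈ as True and ∉ as False. A short Python sketch:

```python
# The two columns of Table 9 being compared, computed over all eight rows.
from itertools import product

for a, b, c in product((True, False), repeat=3):
    left  = a and (b or c)           # membership in A ∩ (B ∪ C)
    right = (a and b) or (a and c)   # membership in (A ∩ B) ∪ (A ∩ C)
    assert left == right             # the columns agree row by row
print("columns agree in all eight rows")
```

Since the assertion holds in every row, the two columns are identical, which is precisely the content of the distributive law 9b.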
We leave the proofs of the other laws in Theorem 4.1 as problems for the reader. In the next examples, we illustrate how to prove still other laws directly from those already known to be true.
Example 4.3. If A and B are any subsets of a universal set 𝒰, then

(4.3) A = (A ∩ B) ∪ (A ∩ B′).

Proof.

(A ∩ B) ∪ (A ∩ B′) = A ∩ (B ∪ B′)   [by 9b]
                    = A ∩ 𝒰          [by 4a]
                    = A              [by 1b]
Example 4.4. If A and B are any subsets of a universal set 𝒰, then

(4.4) A = (A ∪ B) ∩ (A ∪ B′).

Proof.

(A ∪ B) ∩ (A ∪ B′) = A ∪ (B ∩ B′)   [by 9a]
                    = A ∪ ∅          [by 4b]
                    = A              [by 1a]
These examples enable us to make a point concerning the pairing of the laws in Theorem 4.1. This was done in order that we may note the so-called duality principle: If in any law we replace ∅ by 𝒰, 𝒰 by ∅, ∪ by ∩, and ∩ by ∪ wherever they occur, then the result is again a law. The new law is said to be the dual of the original law. Thus in Theorem 4.1, law 1b is the dual of law 1a, 1a is the dual of 1b, and so on for all a and b laws in our list. The dual of 5a is itself; law 5a is therefore said to be self-dual.
Note that (4.3) and (4.4) are dual laws, and that the proof of (4.4) can be obtained from the proof of (4.3) by replacing each statement by its dual. Since Theorem 4.1 contains the dual of every one of its laws, we can justify each step in proving (4.4) by appealing to the dual of the law justifying the corresponding step in the proof of (4.3). In this way, we could prove the dual of any law whose proof followed from Theorem 4.1. Indeed, this is the essence of the duality principle.
The importance of the duality principle cannot be fully appreciated until the algebra of sets we have been discussing is treated formally as a mathematical system. In this more abstract study, known as Boolean algebra (after the English logician George Boole, 1815-1864), the algebra of sets becomes just one concrete interpretation of an
abstract system which also has other important interpretations.* The interested student can consult the references listed at the end of this chapter for readings on Boolean algebra.
In our later work, we shall need to consider the union and intersection of more than two sets. Because of the associative laws 8a and 8b in Theorem 4.1, it is not necessary to use parentheses to show how more than two sets separated by ∪ or by ∩ are paired. For example, A ∩ B ∩ C ∩ D can be interpreted as (A ∩ B) ∩ (C ∩ D) or as (A ∩ (B ∩ C)) ∩ D or as ((A ∩ B) ∩ C) ∩ D, since all of these sets are equal. (See Problem 4.9.) Similar considerations apply to the union of more than two sets, so we make the following general agreement.
Definition 4.1. Let n be any positive integer and suppose B1, B2, …, Bn are given sets. Then the set of elements belonging to all the given sets is denoted by

B1 ∩ B2 ∩ ⋯ ∩ Bn,

and the set of elements belonging to at least one of the given sets is denoted by

B1 ∪ B2 ∪ ⋯ ∪ Bn.
Many of the laws in Theorem 4.1 can now be generalized to hold for unions and intersections of more than two sets. We collect some of these formulas in the following theorem.
Theorem 4.2. Let n be any positive integer and suppose A, B1, B2, …, Bn are subsets of a universal set 𝒰. Then

(4.5) (B1 ∪ B2 ∪ ⋯ ∪ Bn)′ = B1′ ∩ B2′ ∩ ⋯ ∩ Bn′.

(4.6) (B1 ∩ B2 ∩ ⋯ ∩ Bn)′ = B1′ ∪ B2′ ∪ ⋯ ∪ Bn′.

(4.7) A ∪ (B1 ∩ B2 ∩ ⋯ ∩ Bn) = (A ∪ B1) ∩ (A ∪ B2) ∩ ⋯ ∩ (A ∪ Bn).

(4.8) A ∩ (B1 ∪ B2 ∪ ⋯ ∪ Bn) = (A ∩ B1) ∪ (A ∩ B2) ∪ ⋯ ∪ (A ∩ Bn).
Proof. We prove (4.5) and leave the others as problems. Our
* The so-called statement calculus in logic is another interpretation of Boolean algebra, and we can therefore expect that the logical analysis of statements and the study of sets will have many common features. The similarities between the use of truth tables in logic and our use of membership tables are but one of many examples.
method of proof is by mathematical induction. When n = 1, both sides of (4.5) reduce to B1′, and so (4.5) is certainly true. If n = 2, then (4.5) reduces to De Morgan's law 7a in Theorem 4.1 (applied to the sets B1 and B2), and hence is again true.
Now suppose (4.5) is true when n = k, where k is any positive integer. We complete our proof by mathematical induction if we show that (4.5) must also be true when n = k + 1. By grouping the sets as indicated, we obtain

(B1 ∪ B2 ∪ ⋯ ∪ Bk+1)′ = [(B1 ∪ B2 ∪ ⋯ ∪ Bk) ∪ Bk+1]′.
Applying De Morgan's law 7a to the sets (B1 ∪ B2 ∪ ⋯ ∪ Bk) and Bk+1, we get

(B1 ∪ B2 ∪ ⋯ ∪ Bk+1)′ = (B1 ∪ B2 ∪ ⋯ ∪ Bk)′ ∩ Bk+1′
                      = (B1′ ∩ B2′ ∩ ⋯ ∩ Bk′) ∩ Bk+1′,

the last equality following by our induction hypothesis that (4.5) is true when n = k. But the parentheses are not necessary in this last expression, so we can write

(B1 ∪ B2 ∪ ⋯ ∪ Bk+1)′ = B1′ ∩ B2′ ∩ ⋯ ∩ Bk+1′,
which is precisely (4.5) when n = k + 1.
We have now shown (4.5) is true when n = 1 and n = 2 and, furthermore, that its truth when n = k + 1 follows from its truth when n = k for any positive integer k. We conclude by the principle of mathematical induction that (4.5) is true for all positive integers n. This completes the proof.
Note that (4.5) and (4.6) are generalizations of De Morgan's laws, whereas (4.7) and (4.8) are generalized distributive laws.
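The generalized De Morgan law (4.5) is easy to test on concrete data, since Python's set type takes unions and intersections of whole families of sets at once. A sketch with an invented universe and family:

```python
# Checking (4.5): (B1 ∪ ... ∪ Bn)′ = B1′ ∩ ... ∩ Bn′ on invented data.
U = set(range(10))                 # an assumed universal set
B = [{1, 2, 3}, {3, 4}, {5, 6, 7}] # an arbitrary family B1, B2, B3

lhs = U - set.union(*B)                       # complement of the union
rhs = set.intersection(*[U - b for b in B])   # intersection of complements
assert lhs == rhs
print(sorted(lhs))  # → [0, 8, 9]
```

Replacing `union` by `intersection` (and vice versa) in the two lines checks the dual law (4.6) in the same way.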
PROBLEMS
4.1. Only laws 7a and 9b of Theorem 4.1 are proved in the text. Construct membership tables for the other laws, and thus complete the proof of Theorem 4.1. Also verify each law by means of an appropriate Venn diagram whose regions are numbered to correspond to the rows of the membership table for that law.
4.2. How many rows are required in a membership table for a law involving four arbitrary sets? five sets? n sets, where n is any positive integer?
4.3. Construct membership tables and thus show that the following laws hold for any subsets A, B, C of a universal set 𝒰.
(a) (A′ ∩ B)′ = A ∪ B′
(b) [A′ ∩ (A ∪ B)]′ = A ∪ B′
(c) (A ∩ B) ∩ (A ∩ B′) = ∅
(d) A ∩ (A ∪ B) = A ∪ (A ∩ B) = A
(e) (A′ ∩ (B ∩ C))′ = A ∪ B′ ∪ C′
4.4. Prove each of the laws in the preceding problem by using only the laws in Theorem 4.1. Indicate the law in Theorem 4.1 which justifies each step in your proof.
4.5. Verify each of the laws in Problem 4.3 by the method of Venn diagrams.
4.6. (a) It is clear from a Venn diagram that if A and B are disjoint sets, then A ∩ C and B ∩ C are also disjoint. But prove this result by showing that if A ∩ B = ∅, then (A ∩ C) ∩ (B ∩ C) = ∅. Justify each step in your proof by appealing either to the hypothesis or to one of the laws in Theorem 4.1.
(b) Show by examples that A ∩ B ∩ C = ∅ does not imply A ∩ B = ∅.
4.7. (a) Consider the following valid argument.
Hypotheses. (1) All college students are beer drinkers.
(2) All beer drinkers are well dressed.
Conclusion. Therefore all college students are well dressed.
Write the hypotheses and the conclusion using symbols of set theory (see Problem 3.13), and then prove the argument is valid; i.e., the conclusion is true whenever the hypotheses are both true. Each step in your proof should be justified by appealing to one of the hypotheses or to one of the laws in Theorem 4.1.
(b) Following the procedure outlined in part (a), prove that the following argument is valid.
Hypotheses. (1) All college students are beer drinkers.
(2) No beer drinkers are well dressed.
Conclusion. Therefore no college students are well dressed.
4.8. If A and B are subsets of a universal set 𝒰, the symmetric difference of A and B, denoted by A Δ B, is defined as the set

A Δ B = (A ∩ B′) ∪ (A′ ∩ B).
(a) Construct a membership table for A Δ B.
(b) In an appropriate Venn diagram, identify the region representing the set A Δ B.
(c) By means of membership tables, prove each of the following laws:
(i) A Δ ∅ = A
(ii) A Δ 𝒰 = A′
(iii) A Δ A = ∅
(iv) A Δ A′ = 𝒰
(v) A Δ B = B Δ A
(vi) A Δ (B Δ C) = (A Δ B) Δ C
(vii) A ∩ (B Δ C) = (A ∩ B) Δ (A ∩ C)
(d) Prove each of the laws in part (c) by showing that they follow from the laws in Theorem 4.1.
(e) Use Venn diagrams to verify each of the laws in part (c).
(f) Show that

A Δ (B ∩ C) = (A Δ B) ∩ (A Δ C)

is not a law; i.e., there are sets A, B, and C for which the equality does not hold. By means of a membership table, or otherwise, determine what additional information about the sets A, B, and C suffices to guarantee that the equality does hold.
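The symmetric difference of Problem 4.8 is built into Python's set type as the `^` operator, so the definition and several of the laws in part (c) can be spot-checked directly (the particular sets below are invented):

```python
# Symmetric difference: definition (A ∩ B′) ∪ (A′ ∩ B) versus Python's ^.
U = set(range(8))        # an assumed universal set
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

defn = (A & (U - B)) | ((U - A) & B)
assert defn == A ^ B     # the built-in operator matches the definition

assert A ^ set() == A    # law (i):   A Δ ∅ = A
assert A ^ U == U - A    # law (ii):  A Δ 𝒰 = A′
assert A ^ A == set()    # law (iii): A Δ A = ∅
assert A ^ (U - A) == U  # law (iv):  A Δ A′ = 𝒰
print(sorted(A ^ B))     # → [0, 1, 4, 5]
```

A single well-chosen triple A, B, C run through `^` and `&` will likewise produce the counterexample asked for in part (f).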
4.9. Prove that (A ∩ B) ∩ (C ∩ D) and (A ∩ (B ∩ C)) ∩ D are equal sets, supporting each step in your proof by citing a law in Theorem 4.1.
4.10. Prove Formulas (4.6)-(4.8) by mathematical induction.
4.11. You are given at least one subset of a specified universal set 𝒰 and are instructed to form all possible sets from the given sets by using the operations denoted by ∩, ∪, and ′. Any new sets you obtain this way are also to be used (with other new sets or with the given sets) to form still other sets by using these same operations. Identify all the different sets you end up with by continuing this process indefinitely if you are originally given the following set(s): (a) 𝒰, (b) ∅, (c) A (= a subset of 𝒰), (d) A and A′.
4.12. Let 𝒜 be a set of subsets of some fixed universal set 𝒰, i.e., the elements of 𝒜 are subsets of 𝒰. The set 𝒜 is called an algebra of sets if it satisfies the following conditions:
(1) 𝒜 is not empty.
(2) If A ∈ 𝒜, then A′ ∈ 𝒜.
(3) If A ∈ 𝒜 and B ∈ 𝒜, then A ∪ B ∈ 𝒜.
Prove the following theorems, assuming 𝒜 is an algebra of sets.
(a) 𝒰 ∈ 𝒜.
(b) ∅ ∈ 𝒜.
(c) If A ∈ 𝒜 and B ∈ 𝒜, then A ∩ B ∈ 𝒜.
4.13. Show that each of the following is an algebra of sets. (Cf. Problems 4.11 and 4.12.)
(a) 𝒜 = {𝒰, ∅}.
(b) 𝒜 = {𝒰, ∅, A, A′}, where A is a subset of 𝒰.
(c) 𝒜 = 2^𝒰, the set of all subsets of 𝒰.
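For a finite universal set, conditions (1)-(3) of Problem 4.12 can be checked by direct enumeration. The sketch below tests the four-element collection of Problem 4.13(b); the function name and the particular universe are our own choices:

```python
# Check conditions (1)-(3) of Problem 4.12 for a finite candidate collection.
U = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})
alg = {U, frozenset(), A, U - A}   # the collection of Problem 4.13(b)

def is_algebra(alg, U):
    return (len(alg) > 0                                      # (1) nonempty
            and all(U - X in alg for X in alg)                # (2) closed under ′
            and all(X | Y in alg for X in alg for Y in alg))  # (3) closed under ∪

assert is_algebra(alg, U)
# Theorem (c) of Problem 4.12 predicts closure under ∩ as well:
assert all(X & Y in alg for X in alg for Y in alg)
print("algebra verified")
```

Frozensets are used because ordinary Python sets are unhashable and so cannot themselves be elements of a set.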
5. Cartesian product sets
A pair of objects in which we distinguish one of the objects as the first and the other (which need not be different) as the second is called an ordered pair. If the first object is called a and the second b, then the ordered pair is written (a, b). This ordered pair is quite different from the set {a, b} containing the two objects a and b. There is no first element in {a, b}, since order is immaterial when listing elements of a set. Although {a, b} = {b, a}, we want to distinguish between (a, b) and (b, a). We shall define two ordered pairs to be equal if and only if their first objects are the same and their second objects are the same, i.e.,
(5.1) (a, b) = (c, d) if and only if a = c and b = d.
We have already used ordered pairs, and they are needed even more in our later work. In Example 1.5 and Problems 1.5 and 2.7, we considered ordered pairs of real numbers. Relative to some rectangular coordinate system, we interpreted (x, y) as representing a point in the plane determined by the coordinate axes. According to (5.1), two such ordered pairs of real numbers are equal if and only if they represent the same point.
The objects in an ordered pair need not be numbers. For example, when we toss a coin twice, we can represent the outcome as one of the ordered pairs (H, H), (H, T), (T, H), (T, T), where we write H for "heads" and T for "tails." We agree that the first object in the ordered pair denotes the result of the first toss and the second object denotes the result of the second toss. Since we want to distinguish between the outcomes (H, T) and (T, H), the use of ordered pairs is essential.
It is of some interest that the concept of an ordered pair can be defined in terms of sets and that (5.1) can then be proved. To characterize an ordered pair, it is sufficient to state what two objects make up the pair and also which is to be considered as the first object. Thus the ordered pair (a, b) is determined if we know the set {a, b} of objects in the ordered pair and the set {a} identifying the first object. We are thereby led to the following definition.
Definition 5.1. Let a and b be any objects. The ordered pair (a, b) is defined by
(5.2) (a, b) = {{a, b}, {a}}.
Theorem 5.1. Two ordered pairs (a, b) and (c, d) are equal if and only if a = c and b = d.
Proof. From the definition, we have
(c, d) = {{c, d}, {c}},
and this set is clearly identical to the set defining (a, b) if a = c and b = d. This observation constitutes the proof of the "if" part of the theorem. To prove the "only if" part, we assume (a, b) = (c, d), i.e.,
(5.3) {{a, b}, {a}} = {{c, d}, {c}},
and proceed to prove that a = c and b = d. Now two sets are equal only when they have the same elements. Therefore it follows from (5.3) that either (1) {a, b} = {c, d} and {a} = {c}, or (2) {a, b} = {c} and {a} = {c, d}.
If case (1) holds, then from {a} = {c} we conclude a = c, and {a, b} = {c, d} then implies b = d. Thus the theorem is true in case (1).
If case (2) holds, we start from {a, b} = {c} and, recalling that {a, a} = {a}, conclude that a = b = c. Then {a} = {c, d} becomes {a} = {a, d}, from which d = a follows. Hence in case (2) we have a = b = c = d, and the theorem is certainly true. This completes the proof.
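Definition 5.1 can be made concrete in Python with frozensets, which (unlike ordinary sets) may be nested. The helper name below is our own:

```python
# Definition 5.1: the ordered pair (a, b) encoded as the set {{a, b}, {a}}.
def opair(a, b):
    return frozenset({frozenset({a, b}), frozenset({a})})

# Theorem 5.1: equality of the encodings tracks componentwise equality.
assert opair(1, 2) == opair(1, 2)
assert opair(1, 2) != opair(2, 1)                  # order matters
assert opair(1, 1) == frozenset({frozenset({1})})  # {{a, a}, {a}} collapses
print("ordered-pair encoding behaves as Theorem 5.1 predicts")
```

The last assertion illustrates case (2) of the proof: when a = b, the two inner sets coincide and the encoding collapses to a single element, yet equality of pairs still works out correctly.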
Whenever we have two sets, we can always form ordered pairs by taking the first object of the pair from one of the sets and the second object from the second set. This simple observation turns out to be quite important and, as usual, involves a special notation and terminology.
Definition 5.2. If A and B are sets, then the set of all ordered pairs (a, b) such that a belongs to A and b belongs to B is called the Cartesian product of A and B, and is denoted by A × B. In symbols,

A × B = {(a, b) | a ∈ A and b ∈ B}.
We now have still another way of obtaining new sets from given sets: form Cartesian product sets.
Example 5.1. If A = {H, T} and B = {1, 2, 3}, then

A × B = {(H, 1), (H, 2), (H, 3), (T, 1), (T, 2), (T, 3)},
B × A = {(1, H), (1, T), (2, H), (2, T), (3, H), (3, T)},
A × A = {(H, H), (H, T), (T, H), (T, T)},
B × B = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}.
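Python's standard library enumerates Cartesian products directly: `itertools.product` yields exactly the ordered pairs of Definition 5.2. A sketch reproducing Example 5.1:

```python
# Example 5.1 with itertools.product (tuples play the role of ordered pairs).
from itertools import product

A = ["H", "T"]
B = [1, 2, 3]

AxB = list(product(A, B))
BxA = list(product(B, A))
assert AxB == [("H", 1), ("H", 2), ("H", 3), ("T", 1), ("T", 2), ("T", 3)]
assert set(AxB) != set(BxA)            # A × B and B × A are different sets
assert len(list(product(B, B))) == 9   # B × B has 3 · 3 elements
```

The second assertion makes the point of the next paragraph: the pair ("H", 1) belongs to A × B, while the distinct pair (1, "H") belongs to B × A.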
This example enables us to indicate the reason for the importance of Cartesian product sets in probability. Consider the following experiments: (1) toss a coin, and (2) choose a number from among the first three positive integers. Each element of A represents an outcome of the coin-tossing experiment, each element of B an outcome of the second experiment. Now think of the composite experiment in which we first toss a coin and then choose a number. Outcomes can be represented by ordered pairs, like (H, 2), indicating the result of each part of the composite experiment. Thus the outcomes of the composite experiment are given by the Cartesian product set A × B.
Note that B × A is not the same set as A × B. The set B × A yields outcomes of the different composite experiment in which we first choose a number and then toss a coin. If we toss the coin twice, we obtain an outcome represented by an element of A × A. Finally, if we choose a number and then choose another number, each from the set B, then outcomes of this composite experiment correspond to elements of B × B. We hasten to add that not all composite experiments lead to Cartesian product sets. For example, if we choose a number from the set B and then choose another number from the remaining numbers, the set of possible outcomes is
{(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)},
which is not a Cartesian product set, although its elements are ordered pairs. These matters are taken up more fully in the next chapter.
Example 5.2. We have previously mentioned our interpretation of ordered pairs of real numbers as points in a plane. If R is the set of all real numbers, then there is one and only one point corresponding to each ordered pair in R × R, and one and only one ordered pair in R × R corresponding to each point. Indeed, this one-to-one correspondence between points and ordered pairs of real numbers is the fundamental idea of plane analytic geometry. A plane with axes is called a Cartesian plane, after René Descartes (1596-1650), one of the inventors of analytic geometry. The graph of any subset of R × R is defined as the set of points corresponding to ordered pairs of the subset. For example, the set

{(x, y) | x² + y² = 9}

is a subset of R × R whose graph is the circle with center at the
origin and radius 3 units. The graph of R × R itself is the entire plane.
We can define ordered triples, and in general, ordered r-tuples, in terms of ordered pairs. An ordered triple (or 3-tuple), for example, is an ordered pair whose first member is an ordered pair, i.e.,

(5.4) (a, b, c) = ((a, b), c).

Similarly, we define an ordered quadruple (4-tuple) as an ordered pair whose first member is an ordered triple, i.e.,

(a, b, c, d) = ((a, b, c), d).

In general, an ordered r-tuple is defined as follows:

(5.5) (a1, a2, …, ar) = ((a1, a2, …, ar−1), ar).
From these definitions it can be proved that two ordered r-tuples are equal if and only if their corresponding objects are equal, i.e.,

(a_1, a_2, ..., a_r) = (b_1, b_2, ..., b_r)

if and only if

a_1 = b_1, a_2 = b_2, ..., a_r = b_r.
We leave the proof for the problems.
Since the Cartesian product of two sets A and B is a set of ordered pairs (2-tuples), it comes as no surprise that we can define the Cartesian product of r sets as a set of r-tuples.
Definition 5.3. We suppose that r is a positive integer greater than 1 and that A_1, A_2, ..., A_r are sets. The set of all ordered r-tuples (a_1, a_2, ..., a_r) such that a_1 belongs to A_1, a_2 belongs to A_2, ..., a_r belongs to A_r is called the Cartesian product of the sets A_1, A_2, ..., A_r, and is denoted by A_1 X A_2 X ... X A_r. In symbols,

A_1 X A_2 X ... X A_r = {(a_1, a_2, ..., a_r) | a_1 ∈ A_1, a_2 ∈ A_2, ..., a_r ∈ A_r}.
Sec. 5 / CARTESIAN PRODUCT SETS 43
If A_1, A_2, ..., A_r are finite sets, then we can count the number of r-tuples in the Cartesian product of these sets. We denote by n(A) the number of elements in the set A.
Theorem 5.2. If r is any positive integer and A_1, A_2, ..., A_r any sets, then

(5.6) n(A_1 X A_2 X ... X A_r) = n(A_1)n(A_2) ... n(A_r).
Proof. There are as many elements in A_1 X A_2 X ... X A_r as there are r-tuples (a_1, a_2, ..., a_r) where a_1 ∈ A_1, a_2 ∈ A_2, ..., a_r ∈ A_r. The object a_1 can be chosen in n(A_1) different ways. We can then choose the object a_2 in n(A_2) different ways, and we continue in this way until we come to the last object a_r, which can be chosen in n(A_r) different ways. Hence, by the fundamental principle of counting (p. 9), the number of r-tuples is given by the product n(A_1)n(A_2) ... n(A_r), and the theorem is proved.
Example 5.4. Let
A = {H, T} and B = {1, 2, 3, 4, 5, 6}. Then the Cartesian product set A X A X B has
n(A)n(A)n(B) = 2 · 2 · 6 = 24 elements.
Each element is a 3-tuple which can be interpreted as representing one of the 24 possible outcomes of the experiment in which we throw a coin twice and then roll a die. For example, (H, H, 6) would denote that both tosses resulted in heads and the die showed the number 6 on its uppermost face.
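Readers with a computer at hand can check such counts mechanically. The following Python sketch (the variable names are ours, not the text's) enumerates the Cartesian product of Example 5.4:

```python
from itertools import product

# The two factor sets of Example 5.4: coin faces and die faces.
A = ["H", "T"]
B = [1, 2, 3, 4, 5, 6]

# A X A X B: every 3-tuple (first toss, second toss, die).
outcomes = list(product(A, A, B))

# Theorem 5.2: n(A X A X B) = n(A) * n(A) * n(B) = 2 * 2 * 6 = 24.
assert len(outcomes) == len(A) * len(A) * len(B) == 24
assert ("H", "H", 6) in outcomes
```

Here `itertools.product` builds exactly the set of r-tuples of Definition 5.3, so the length check is a direct verification of Formula (5.6) in this case.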
PROBLEMS
5.1. Let A = {1, 2}, B = {2, 3}, and C = {3} be subsets of the universal set U = {1, 2, 3}. List the elements of the following sets:
(a) A X A (b) C X C (c) A X B (d) B X A
(e) (A X B) ∩ (B X C) (f) (A X B) ∪ (B X C)
(g) (A X U) ∩ (U X B) (h) A X B X C
5.2. Sketch the graph of the sets in (a)-(d) of the preceding problem. How would you sketch the graph of the set in part (h)?
5.3. (a) Show that A X B = B X A if A = B. Is the converse true?
(b) Show that A X B = ∅ if and only if A = ∅ or B = ∅.
(c) Show that A X B ⊂ C X D if A ⊂ C and B ⊂ D. Is the converse true?
44 SETS / Chap. 1
5.4. (a) Prove that A X (B ∩ C) = (A X B) ∩ (A X C).
(b) Prove that A X (B ∪ C) = (A X B) ∪ (A X C).
5.5. Let A and B be subsets of some universal set U. Prove that

(A X U) ∩ (U X B) = A X B.
5.6. The ordered pair (a, b) is defined as a certain set by Formula (5.2). How many different elements does the set (a, b) contain? (Do not fail to consider the case when a = b.)
5.7. (a) With ordered 3-tuples defined by Formula (5.4), show that

(a, b, c) = (d, e, f)

if and only if a = d, b = e, and c = f.
(b) More generally, if r is any positive integer greater than 1, prove by mathematical induction that two ordered r-tuples are equal if and only if their corresponding objects are equal.
5.8. Let A be a set with n elements. How many elements are in each of the following sets?
(a) A X A
(b) {(x, y) | x ∈ A, y ∈ A, and x ≠ y}
(c) A X A X A
(d) {(x, y, z) | x ∈ A, y ∈ A, z ∈ A, x ≠ y, x ≠ z, y ≠ z}
(e) For each set in (a)-(d), describe an experiment whose outcomes can be represented by elements of the set.
SUPPLEMENTARY READING
1. Breuer, J., Introduction to the Theory of Sets, translated by H. F. Fehr, Prentice-Hall, Inc., 1958.
2. Kemeny, J. G., J. L. Snell, and G. L. Thompson, Introduction to Finite Mathematics, Prentice-Hall, Inc., 1957.
3. Mathematical Association of America, Committee on the Undergraduate Program, Elementary Mathematics of Sets, 1958.
4. May, K. O., Elements of Modern Mathematics, Addison-Wesley Publishing Company, Inc., 1959.
5. Suppes, P., Introduction to Logic, D. Van Nostrand Company, Inc., 1957.
Chapter 2
PROBABILITY IN FINITE SAMPLE SPACES
1. Sample spaces
Probability questions arise when we think of real or conceptual experiments and their outcomes. Therefore, our first task in the precise formulation of probability theory must be to discover a suitable mathematical way by which an experiment can be specified.
Think of tossing a coin. We ordinarily agree to regard "head" and "tail" as the only possible outcomes. If we denote these outcomes by H and T respectively, then each outcome of the experiment would correspond to exactly one of the elements of the set {H, T}. This set is called a sample space for the experiment.
Now let us toss a penny and a nickel. How shall we record the outcome of this experiment? Each time we toss the coins, we can write down the number of heads obtained. Accordingly, each outcome of the coin-tossing experiment corresponds to exactly one of the elements of the set S_1 = {0, 1, 2}. S_1 is a sample space for the experiment. We say a rather than the sample space, since we can think of other ways of describing the outcomes of this same experiment. Indeed, were we to toss the coins and record, let us say, only that we obtained one head, we are then embarrassed by the question, "Did the penny fall heads?" Our method of classifying outcomes was
46 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
too coarse; we lost information by merely recording the number of heads obtained.
We get a finer classification by recording whether both coins fall heads (HH), the penny falls heads and the nickel tails (HT), the penny falls tails and the nickel heads (TH), or both coins fall tails (TT). Now each outcome of the experiment corresponds to exactly one element in the set
S_2 = {HH, HT, TH, TT}.
S_2 is another sample space for this experiment. We recognize S_2 as the Cartesian product A X A, where A = {H, T} and where we have introduced simplified notation for ordered pairs, writing HH for (H, H), HT for (H, T), etc. When, as in this example, there is no possibility of misinterpretation, we shall often use this less cumbersome notation for ordered r-tuples.
This situation is typical of most examples. Whether to classify outcomes one way or another is not a question our theory answers. Let us therefore agree at the outset that there is no one correct sample space for a given experiment. Different people or even the same person at different times may describe the outcomes differently. We insist only that any sample space meet the requirements in the following definition.
Definition 1.1. A sample space S associated with a real or conceptual experiment is a set such that (1) each element of S denotes an outcome of the experiment, and (2) any performance of the experiment results in an outcome that corresponds to one and only one element of S.
Although many sample spaces may meet these requirements, and hence serve to describe the same experiment, we have seen that one may be more suitable than another. In general, it is a safe guide to include as much detail as possible in the description of the outcomes of the experiment. Imagine that you are recording the outcome in a notebook and insist that what you write enables you to answer all pertinent questions concerning the result of the experiment.
Example 1.1. Let a green die and a red die be rolled. The set

S_1 = {0 sixes, exactly 1 six, 2 sixes}
is a set that meets the requirements of Definition 1.1, and hence can serve as a sample space for this experiment. So can the set
Sec. 1 / SAMPLE SPACES 47
S_2 = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12},

if we understand that an element of S_2 stands for the sum of the numbers on the dice. But neither S_1 nor S_2 involves a fine enough classification of outcomes to answer the question, "Is the number on the red die greater than the number on the green die?" To take care of all relevant questions, we should record the numbers on each of the dice. We are thus led to take as sample space the set S (defined on p. 12) containing 36 ordered pairs, it being understood that (x, y) denotes the outcome in which the green die shows the number x and the red die the number y. Since x and y are themselves integers in the set
D = {1, 2, 3, 4, 5, 6},
the sample space S can be written as a Cartesian product : (1.1) S=£XD  {(x,y) \x€
Note that D itself can serve as a sample space for the experiment in which one die is rolled. Finally, let us observe that
S_3 = {0 sixes, 2 sixes}

and

S_4 = {0 sixes, (1, 6), exactly 1 six, (6, 6)}

are examples of sets that cannot serve as sample spaces for the two-dice experiment. Both sets violate condition (2) in Definition 1.1: the outcome (1, 6), for example, corresponds to no element of S_3 and to two elements of S_4.
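The point about fineness of classification can be made concrete with a short Python sketch (the variable names are our illustrative choices): from the fine sample space S = D X D one can answer questions that the coarser spaces cannot.

```python
from itertools import product

D = range(1, 7)
S = list(product(D, D))          # (green, red) pairs; the fine sample space
assert len(S) == 36

# "Is the number on the red die greater than the number on the green die?"
# This is answerable in S, but not in the coarser spaces S_1 or S_2.
red_greater = [(g, r) for (g, r) in S if r > g]
assert len(red_greater) == 15

# The coarse description "sum of the numbers" is recoverable from S ...
sums = {g + r for (g, r) in S}
assert sums == set(range(2, 13))   # the eleven elements of S_2
# ... but S cannot be recovered from the sums alone.
```

Passing from S to a coarser space loses information irreversibly, which is exactly the difficulty described in the text.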
A sample space can be an infinite set. For example, toss a coin until it falls heads for the first time. It is logically conceivable that we get an unending sequence of tails and that a head is never obtained. Call this outcome ω. If a head is obtained, we specify the outcome by recording the number of the toss that produced the first head. Our sample space is
S = {ω, 1, 2, 3, ...},
which is clearly an infinite set. As another example, let an experiment consist of selecting one point from among the points on some line of unit length. (This conceptual experiment can be carried out, at least with our mind's eye, by imagining an exceptionally pointed dart thrown at a line segment.) Since we can associate a unique real num-
48 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
ber with each point on the line, we can take as sample space the infinite set
S = {x ∈ R | 0 ≤ x ≤ 1}.

Example 1.2. Suppose r people are selected and the birthday of each is recorded. If B denotes the set of the 365 possible birthdays (we disregard leap years), then each outcome of this experiment can be represented by an ordered r-tuple whose components are elements of B. We therefore take as sample space the Cartesian product set S = B X B X ... X B (r factors), which by Theorem I.5.2* contains n(S) = 365^r elements. This number is very large, even for moderate values of r; if r ≥ 4, then S contains more than a billion r-tuples. Nevertheless, S is a finite sample space for all values of r. Therefore, probability questions about this
* To refer in any chapter to a theorem, definition, example, problem, or formula in that same chapter, we use only the number by which it is identified in the text. But to refer to one of these items appearing in another chapter, we prefix its identifying number with a roman numeral identifying the chapter. For example, we write Problem I.2.20 to denote the twentieth problem in Section 2 of Chapter 1. Were we to write Problem 2.20 here, we would mean to refer to the twentieth problem in Section 2 of the present chapter, namely Chapter 2.
Sec. 1 / SAMPLE SPACES 49
experiment will be answered by our theory. (We continue this problem in Example 2.1.)
Example 1.3. What sample space should we use for the experiment in which a bridge hand is dealt from an ordinary deck of cards? Since we care only about which 13 cards make up the hand, and not about the order in which they are dealt, we can consider a bridge hand as a subset of 13 cards from the set of 52 cards in the deck. Let us write A_s, K_s, ..., 2_s to denote the ace, king, ..., deuce of spades, reserving subscripts h, d, and c to indicate cards that are hearts, diamonds, and clubs, respectively. Then
(1.3) D = {A_s, ..., 2_s, A_h, ..., 2_h, A_d, ..., 2_d, A_c, ..., 2_c}
is a set of 52 elements representing the full deck. For our experiment, we take as sample space the set S of all 13-element subsets of D. In symbols,

(1.4) S = {B | B ⊂ D and n(B) = 13}.
Probability questions concerning bridge hands will be answered by our theory, since S is a finite set. The problem of determining n(S), i.e., of counting the number of possible bridge hands, is taken up in Chapter 3.
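Although the counting methods come only in Chapter 3, the size of this sample space can be previewed by machine; the Python sketch below uses the standard-library binomial-coefficient function `math.comb` (the variable names are ours):

```python
from itertools import combinations
from math import comb

# Number of 13-element subsets of a 52-element deck.
n_hands = comb(52, 13)
assert n_hands == 635013559600

# Cross-check the counting rule on a small analogue:
# 3-card "hands" from a 5-card "deck".
small_hands = list(combinations(range(5), 3))
assert len(small_hands) == comb(5, 3) == 10
```

The small case can be enumerated outright; the full case cannot be listed in practice, which is why Chapter 3 develops counting formulas instead.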
PROBLEMS
1.1. We describe certain experiments. In each case specify an appropriate sample space for the experiment.
(a) A card is selected from a standard deck of cards.
(b) Three coins are tossed.
(c) A boy has a penny, a nickel, a dime, and a quarter in his pocket. He takes two coins out of his pocket, one after the other.
(d) Two distinguishable objects are distributed in two numbered cells.
(e) Two indistinguishable objects are distributed in two numbered cells.
(f) A survey of families with two children is made and the sexes of the children (the older child first) are recorded.
(g) A survey of families with three children is made and the sexes of the children (in order of age, oldest child first) are recorded.
(h) A survey of families with r children is made and the sexes of the children (in order of age, oldest child first) are recorded.
(i) r coins are tossed.
(j) A poker hand (five cards) is dealt from an ordinary deck of cards.
50 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
1.2. Six boys in a club select a committee of three. The boys are A, B, C, D, E, and F.
(a) List the 20 elements of the appropriate sample space S for this experiment.
(b) Find the subset of S containing those outcomes in which A is selected. How many elements does this subset contain?
(c) Find the subset of S containing those elements in which both A and B are selected. How many elements does this subset contain?
(d) Find the subset of S containing those outcomes in which A or B is selected. How many elements in this subset?
(e) Find the subset of S containing those outcomes in which A is not selected. How many elements in this subset?
1.3. Refer to part (c) of Problem 1.1. For how many outcomes of the sample space is it the case that the boy takes less than 15 cents out of his pocket?
1.4. Refer to the sample space S of 36 elements in Example 1.1 of the text. Let E denote the subset of S whose elements denote outcomes for which the sum of the numbers on the dice is greater than 9, and F the subset whose elements denote outcomes for which the numbers on the dice are equal. Determine the elements in the following sets:
(a) E ∩ F (d) E ∩ F′
(b) E ∪ F (e) E′
(c) E′ ∩ F (f) F′
1.5. An experiment consists of selecting one chip from a hat containing six chips numbered 1, 2, 3, 4, 5, and 6. Of the following sets, state which are suitable sample spaces for this experiment and which are unsuitable.
(a) S = {1, 2, 3, 4, 5, 6}
(b) S = {1, 2, 3, 4, 5}
(c) S = {odd number, even number}
(d) S = {1, 3, 5, even number}
(e) S = {1, 2, number less than 6, 6}
(f) S = {number less than 3, 3, number greater than 3}
1.6. An experiment consists of selecting r light bulbs from the lot produced by a machine and testing them. A bulb can be good (G) or bad (B). Define a sample space for this experiment and compare your sample space with those in Problems 1.1(h) and 1.1(i). What observation do you make and what lesson is learned thereby?
1.7. Urn 1 contains one black and two white balls. Urn 2 contains two black and one white ball. An experiment consists of first selecting an urn and then drawing a ball from this urn. Define a suitable sample space for this experiment.
Sec. 2 / EVENTS 51
2. Events
The theory of probability begins when a sample space S, the mathematical counterpart of an experiment, is specified. The sample space serves as the universal set for all questions concerned with the experiment.
We may be interested in the occurrence of a variety of events when an experiment is under consideration. For example, think of the experiment of tossing a coin three successive times and let
(2.1) S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
be the associated sample space. We may be interested in the event "the number of heads exceeds the number of tails." For any outcome of the experiment we can determine whether this event does or does not occur. We find that HHH, HHT, HTH, and THH are the only elements of S corresponding to outcomes for which this event does occur; if the experimental outcome corresponds to one of the other elements of S, then the event in question does not occur. Thus, to say that the event "the number of heads exceeds the number of tails" occurs is the same as saying the experiment results in an outcome corresponding to an element of the set
A = {HHH, HHT, HTH, THH}.
We recognize A as a subset of the sample space S. The subset A can be taken as the mathematical counterpart of the event "the number of heads exceeds the number of tails." Similarly, we find the follow* ing correspondence between various events and subsets of S:
Verbal Description of Event Corresponding Subset of S
Number of heads exceeds
number of tails A = {HHH, HHT, HTH, THH}
Number of heads is exactly 2 B = {HHT, HTH, THH}
Number of heads is at least 2 C = {HHH, HHT, HTH, THH} = A
Second toss is heads D = {HHH, HHT, THH, THT}
All tosses show the same face E = {HHH, TTT}
Number of heads is less than 2 C' = {HTT, THT, TTH, TTT}
Second toss is not heads D′ = {HTH, HTT, TTH, TTT}
Second toss is heads and the
number of heads is exactly 2 D ∩ B = {HHT, THH}
52 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
Second toss is heads or the
number of heads is exactly 2 D ∪ B = {HHH, HHT, HTH, THH, THT}
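This correspondence between events and subsets is easy to mimic in code. In the Python sketch below (the set names follow the table above), each event is literally a subset of S, and "and," "or," "not" become intersection, union, and complement:

```python
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}   # the 8 outcomes of (2.1)

A = {s for s in S if s.count("H") > s.count("T")}   # more heads than tails
B = {s for s in S if s.count("H") == 2}             # exactly 2 heads
C = {s for s in S if s.count("H") >= 2}             # at least 2 heads
D = {s for s in S if s[1] == "H"}                   # second toss is heads

assert A == C == {"HHH", "HHT", "HTH", "THH"}
assert D & B == {"HHT", "THH"}                       # "and" is intersection
assert D | B == {"HHH", "HHT", "HTH", "THH", "THT"}  # "or" is union
assert S - C == {"HTT", "THT", "TTH", "TTT"}         # the complement C'
```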
In the light of this example, we introduce the following general terminology.
Definition 2.1. Let a sample space S be given. An event is a subset of S. We say the event E occurs if the outcome of the experiment corresponds to an element of the subset E.
Because of this definition, the language and notation of set theory can be expected to find extensive use in the theory of probability. To illustrate this point, suppose an experiment specified by the sample space S results in an outcome denoted by the element o ∈ S. The reader must be certain that he understands the correspondence between the everyday language on the left and the set language and symbolism on the right in the following glossary:
Event E Subset E of the sample space S
Event F Subset F of the sample space S
Event E occurs o ∈ E
Complementary event of E (not E) E′ (the complement of set E)
Event E does not occur o ∈ E′
Event E or event F E ∪ F (the union of sets E and F)
Either event E or event F occurs
(at least one of E and F occurs) o ∈ (E ∪ F)
Event E and event F E ∩ F (the intersection of sets E and F)
Both event E and event F occur o ∈ (E ∩ F)
Event E is impossible E = ∅
Event E is certain E = S
E and F are mutually exclusive events E ∩ F = ∅
If event E occurs, then event F
occurs (E implies F) E ⊂ F (E is a subset of F)
Because of its intuitive appeal, we shall continue to use the everyday language listed in the left column. But let us recognize that each such phrase is defined by the corresponding settheoretic equivalent in the right column and is thereby given precise meaning within our mathematical theory.
Example 2.1. We return to the experiment in which r people are selected and their birthdays recorded. In Example 1.2, we defined a
Sec. 2 / EVENTS 53
sample space S for this experiment and found that it had 365^r elements. Now let E be the event that at least two among the r people selected have the same birthday. We want to determine n(E), the number of elements in the subset E. It turns out to be easier to calculate n(E′), the number of elements in the complementary event of E. We then use the formula (cf. Problem I.3.7)
n(E) + n(E′) = n(S) = 365^r
to determine n(E).
Now E′ is the event that no two among the r people selected have the same birthday. Hence, n(E′) is equal to the number of ways of selecting r different numbers (birthdays), each being chosen from the full set of 365 possible birthdays. The first man's birthday can be chosen in 365 ways, the second man's in 364 ways, the third man's in 363 ways, and so on until we select the rth man. His birthday, in order for it to be different from all the others, can be chosen in only 365 - (r - 1), or 365 - r + 1, ways. Invoking the fundamental principle of counting, we conclude that
n(E′) = 365 · 364 · 363 ··· (365 - r + 1).

Finally, we find

(2.2) n(E) = 365^r - 365 · 364 ··· (365 - r + 1)

for the number of different ways that the r selected people include at least two having the same birthday. (Continued in Example 3.6.)
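The counting argument can be cross-checked by brute force on a miniature "year"; in the Python sketch below, the helper `n_distinct` and the 5-day year are our illustrative choices, small enough that every r-tuple can be enumerated:

```python
from itertools import product

def n_distinct(days, r):
    # days * (days - 1) * ... * (days - r + 1), as in the text's argument
    count = 1
    for k in range(r):
        count *= days - k
    return count

# Miniature year of 5 days, r = 3 people.
days, r = 5, 3
all_tuples = list(product(range(days), repeat=r))
no_repeat = [t for t in all_tuples if len(set(t)) == r]

assert len(all_tuples) == days ** r == 125          # n(S)
assert len(no_repeat) == n_distinct(days, r) == 60  # n(E')
assert len(all_tuples) - len(no_repeat) == 65       # n(E)
```

The enumeration agrees with the product formula, as the fundamental principle of counting guarantees.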
PROBLEMS
2.1. Refer to the sample space of Problem 1.1, part (a), and determine the subsets defining the following events.
(a) The card selected is a spade.
(b) The card selected is a jack, queen, or king.
(c) The card selected is the ace of spades.
2.2. A green and a red die are thrown. (Cf. Example 1.1.) Let A be the event that the sum of the numbers on the faces is even, and B the event that the number on the green die is odd.
(a) List the elements of subsets A and B.
(b) Give a concise verbal description of the event A ∩ B.
(c) How many elements of the sample space S are in the event A ∪ B?
54 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
2.3. Refer to the sample space S in Example 1.2. Let E be the event that the first person selected is born on January 3 and F that the second person selected is born on January 28. (a) Write E, F, and E ∩ F as Cartesian product sets. (b) Count the number of elements in E, F, E ∩ F, and E ∪ F.
2.4. Let A, B, and C be any events of a sample space S. Using only the symbols ∩, ∪, ′, A, B, C, write expressions for the events that, of A, B, and C:
(a) At least one occurs. (b) Only A occurs,
(c) A and B occur, but not C. (d) All three occur,
(e) None occurs. (f) Exactly one occurs,
(g) Exactly two occur. (h) At most two occur.
(Hint: Refer to Figure 7 on p. 21 and determine the region representing each event.)
2.5. Refer to the sample space of Problem 1.1 (e) and let E be the event that the first cell is empty, F the event that the second cell is empty, and G the event that the second cell contains both objects. Show that the following relations among these events are true.
(a) E ∩ F = ∅ (c) F ⊂ G′
(b) E ⊂ G (d) S = E′ ∪ F′
2.6. Let S be the sample space defined in Problem 1.7. Suppose E is the event that the first urn is selected, and F that a white ball is drawn. Describe the following events in words, and list their elements: E ∩ F, E′, E′ ∩ F, F′, E ∪ F.
3. The probability of an event
If a (real or conceptual) experiment is under consideration, there are many events in whose occurrence we may have some interest. Using our mathematical language, this amounts to saying that if a sample space S is given, then we can form many subsets of S. In fact, if

(3.1) S = {o_1, o_2, ..., o_n}

is a finite sample space containing the n elements o_1, o_2, ..., o_n, then there are 2^n different subsets of S, and since each subset is an event, there are 2^n different events. In this section, we are finally in a position to define what is meant by "the probability of an event." Our first step is to distinguish certain special events that form building blocks from which other events can be constructed.
Sec. 3 / THE PROBABILITY OF AN EVENT 55
Definition 3.1. Let a sample space S be given. We shall mean by a simple event a unit subset of S, i.e., a subset containing only one element of S.
With S defined in (3.1), there are exactly n simple events, viz.,
(3.2) {o_1}, {o_2}, ..., {o_n}.

Definition 3.2. To each simple event {o_j} of the sample space S we assign a real number, denoted by P({o_j}) and called the probability of the simple event {o_j}, in such a way that the following two conditions are met:

(i) Each probability is nonnegative. In symbols,

(3.3) P({o_j}) ≥ 0 (j = 1, 2, ..., n).

(ii) The sum of the probabilities assigned to all simple events of the sample space is 1. In symbols,

(3.4) Σ_{j=1}^{n} P({o_j}) = P({o_1}) + P({o_2}) + ... + P({o_n}) = 1.
We shall say that an assignment of probabilities to the simple events of S is acceptable if it satisfies (i) and (ii).
In view of condition (ii), it is clear that the probability of each simple event is not only at least 0, as required by condition (i), but also at most 1; i.e.,

(3.5) 0 ≤ P({o_j}) ≤ 1 (j = 1, 2, ..., n).
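Conditions (i) and (ii) are mechanical to verify. A Python sketch follows; the helper name `is_acceptable` is ours, and a small tolerance is used because machine arithmetic is inexact:

```python
def is_acceptable(p, tol=1e-12):
    """Check conditions (i) and (ii) for an assignment of probabilities to
    simple events, given as a dict mapping each element of S to P({o})."""
    nonnegative = all(v >= 0 for v in p.values())   # condition (i)
    sums_to_one = abs(sum(p.values()) - 1) <= tol   # condition (ii)
    return nonnegative and sums_to_one

assert is_acceptable({"H": 0.5, "T": 0.5})
assert is_acceptable({"H": 1.0, "T": 0.0})
assert not is_acceptable({"H": 0.7, "T": 0.7})    # sums to 1.4
assert not is_acceptable({"H": 1.5, "T": -0.5})   # negative probability
```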
56 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
It is most important to recognize that in spite of these restricting conditions, there are many possible assignments of probabilities to the simple events. We give some examples.
Example 3.1. Let a coin be tossed. We define the sample space S = {H, T} and so have two simple events: {H} and {T}. Each of the following assignments of probabilities to these simple events is acceptable:

(1) P({H}) = 1/2 and P({T}) = 1/2,

(2) P({H}) = 1/3 and P({T}) = 2/3,

(3) P({H}) = 1 and P({T}) = 0.
In fact, if p is any real number between 0 and 1 inclusive, then

P({H}) = p and P({T}) = 1 - p

is an acceptable assignment of probabilities to the two simple events {H} and {T}.
Therefore, we see that there are infinitely many possible acceptable assignments, one for each choice of the number p. Most people would find the choice P({H}) = P({T}) = 1/2 the "natural" choice. We do not go into the psychological reasons behind this feeling, but merely make three points:
(1) This choice is neither more nor less acceptable than any other acceptable choice. An assignment of probabilities to simple events is either acceptable or not; there are no degrees of acceptability. Definition 3.2 requires only that we meet the two conditions for assigning probabilities to simple events.
(2) This "natural" choice is not dictated by experience with real coins. Experience with real coins shows that they usually fall short of being "ideal" coins: they are not perfectly circular and symmetrically weighted; heads and tails are not equally likely.
(3) Nevertheless, we often do choose to develop the theory for such "ideal" or "fair" coins since, as we shall see, the theory then is both logically and psychologically appealing. But more important is the fact that the theory for "ideal" coins supplies a basis for testing real coins and deciding, depending upon the extent to which the predictions of the theory are borne out by the empirical results with the real coin, whether it is reasonable to assume that the real coin is "fair." This last point leads us to expect correctly that probability theory will prove useful in problems of statistical inference.
Sec 3 / THE PROBABILITY OF AN EVENT 57
Example 3.2. A green and a red die are rolled, and we use as sample space the set S defined in Example 1.1. There are 36 simple events. The "natural" assignment of probabilities to these simple events is the one in which each simple event is assigned probability 1/36. This is an acceptable assignment, since both required conditions are fulfilled: the probability of each simple event is nonnegative and the sum of the probabilities of all simple events is 1. Of course, here too there are infinitely many other acceptable assignments of probabilities to the 36 simple events.
Example 3.3. A person is selected from the population of a certain country and asked the question, "Do you think there will be another world war?" We classify each answer into one of the three categories "Yes/' "No," "Don't Know." Our sample space S contains three elements, one for each possible answer:
S = {Y, N, DK}.
There are three simple events. This example illustrates the fact that we often do not have any basis for the assignment of probabilities to the simple events of an experiment. In the absence of information about the opinions of people in the country, we can do no more than guess appropriate values for these probabilities. However, we can make the assignment
P({Y}) = p, P({N}) = q, P({DK}) = r,
where we know only that p, q, and r are nonnegative real numbers whose sum is 1,

p ≥ 0, q ≥ 0, r ≥ 0, p + q + r = 1.
If we are told that 60 percent of the population of the country expect another world war, 30 percent do not expect another world war, and 10 percent are uncertain, then it seems natural to choose p = 0.6, q = 0.3, and r = 0.1.
It is now an easy step to define the probability of any event. Let a finite sample space S be given, and suppose an acceptable assignment of probabilities has been made for the simple events of S. Let E be any event. Then E is either (i) the empty set 0, (ii) a simple event, or (iii) the union of two or more different simple events. In case (ii) the probability of E has already been assigned. The following definitions take care of cases (i) and (iii).
58 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
Definition 3.3. The probability of the empty set ∅, denoted by P(∅), is defined to be zero; i.e., P(∅) = 0.
Definition 3.4. If E is the union of two or more different simple events, then the probability of E, denoted by P(E), is the sum of the probabilities of those simple events whose union is E. (It is understood that each simple event is counted exactly once.)
Example 3.4. A green and a red die are rolled, and we choose the sample space S containing the by now familiar 36 ordered pairs. Let us assign the probability 1/36 to each of the 36 simple events of S. If A is the event "sum of numbers on dice is 7," then

A = {(1, 6)} ∪ {(2, 5)} ∪ {(3, 4)} ∪ {(4, 3)} ∪ {(5, 2)} ∪ {(6, 1)}.

Hence, by Definition 3.4, we find

P(A) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 6/36 = 1/6.
Similarly, if B is the event "sum of numbers on dice is 11," then

B = {(5, 6)} ∪ {(6, 5)}

and, again by Definition 3.4,

P(B) = 1/36 + 1/36 = 2/36 = 1/18.
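Definition 3.4 translates directly into a sum over simple events. A Python sketch using exact rational arithmetic (the variable names are ours):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))        # the 36 ordered pairs
p = {s: Fraction(1, 36) for s in S}             # the "natural" assignment

def prob(event):
    # Definition 3.4: sum the probabilities of the simple events in the event.
    return sum(p[s] for s in event)

A = [s for s in S if s[0] + s[1] == 7]          # sum of numbers is 7
B = [s for s in S if s[0] + s[1] == 11]         # sum of numbers is 11
assert prob(A) == Fraction(1, 6)
assert prob(B) == Fraction(1, 18)
```

Using `Fraction` rather than floating-point numbers keeps the answers exactly 1/6 and 1/18, matching the hand computation above.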
Example 3.5. A card is selected from a standard deck of 52 cards. We take the set D in (1.3) as sample space, and seek the probability that the card selected is a spade. Let us assign equal probabilities to each of the 52 simple events of D. Then each simple event has probability 1/52. The event that the card is a spade is the union of the following 13 simple events: {A_s}, {K_s}, ..., {2_s}.
Hence

P(card selected is a spade) = 1/52 + 1/52 + ··· + 1/52 (13 times) = 13/52 = 1/4.
Similarly, the event that the card selected is an ace or a spade is the union of the following 16 simple events: {A_s}, {A_h}, {A_d}, {A_c}, {K_s}, {Q_s}, ..., {2_s}. Hence

P(card selected is an ace or a spade) = 1/52 + 1/52 + ··· + 1/52 (16 times) = 16/52 = 4/13.
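The same computation can be done for Example 3.5 with the deck built explicitly; in the Python sketch below, the rank and suit labels are our encoding of the set D:

```python
from fractions import Fraction

ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = ["s", "h", "d", "c"]
D = [r + s for r in ranks for s in suits]        # the 52 cards of (1.3)
assert len(D) == 52

p = Fraction(1, 52)                              # equal probabilities

spades = [c for c in D if c.endswith("s")]
aces_or_spades = [c for c in D if c.endswith("s") or c.startswith("A")]

assert len(spades) * p == Fraction(1, 4)         # 13/52
assert len(aces_or_spades) * p == Fraction(4, 13)  # 16/52
```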
Sec. 3 / THE PROBABILITY OF AN EVENT 59
Example 3.6. We continue the birthday problem of Example 2.1, and now compute the probability of the event E that at least two among the r people selected have the same birthday. We assume that each ordered r-tuple of the sample space S is as likely as any other to represent the outcome of the experiment. (This assumption would of course be false if the r people were selected at a convention of twins. But even if the selection is made from the entire population of the United States, the fact that the proportion of all births occurring in a given month varies from month to month still makes our mathematical model only a first approximation to the actual state of affairs.) In any case, we assign to each of the 365^r simple events of S the same probability 1/(365)^r. Since the number of simple events in E is equal to n(E), the number of elements in E (why?), it follows from Definition 3.4 that P(E) is the sum of n(E) probabilities, each equal to 1/(365)^r. Hence
P(E) = n(E) / (365)^r,

and if we substitute the expression for n(E) found in Formula (2.2), we obtain

(3.6) P(E) = 1 - [365 · 364 ··· (365 - r + 1)] / (365)^r.
The probability P(E) is given (to two-decimal-place accuracy) for various values of r in Table 10. Note the rather surprising fact that
TABLE 10

r     10    20    22    23    24    30    40    50
P(E)  .12   .41   .48   .51   .54   .71   .89   .97
in as small a group as 23 people, the probability of finding at least two people with the same birthday is greater than 1/2.
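Formula (3.6) is easily tabulated by machine; the following Python sketch (the function name is our choice) reproduces the entries of Table 10 and the surprising threshold at r = 23:

```python
def p_shared_birthday(r):
    # Formula (3.6): P(E) = 1 - 365 * 364 * ... * (365 - r + 1) / 365^r
    prod = 1.0
    for k in range(r):
        prod *= (365 - k) / 365
    return 1.0 - prod

# Entries of Table 10, to two decimal places.
assert round(p_shared_birthday(10), 2) == 0.12
assert round(p_shared_birthday(23), 2) == 0.51
assert round(p_shared_birthday(50), 2) == 0.97

# The probability first exceeds 1/2 at r = 23.
assert p_shared_birthday(22) < 0.5 < p_shared_birthday(23)
```

Multiplying the ratios (365 - k)/365 one at a time, rather than forming the huge numerator and denominator separately, keeps the computation within floating-point range for any r.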
Example 3.7. Two coins are tossed. We define the sample space S given by
S = {HH, HT, TH, TT}
and ask for the probability of the event E that at least one head occurs.
60 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
Solution 1. We assign probability 1/4 to each of the four simple events of S. The event E is the union of three simple events,

E = {HH} ∪ {HT} ∪ {TH}.

Hence P(E) = 1/4 + 1/4 + 1/4 = 3/4.
Solution 2. We assign probabilities to the simple events as follows:

P({HH}) = P({HT}) = 1/2, P({TH}) = P({TT}) = 0.

The event E is still the union of the same three simple events, but now

P(E) = 1/2 + 1/2 + 0 = 1.
These two solutions yield different values for the probability of the same event E. This should not be disturbing since, according to Definition 3.4, the probability of an event depends on the previous assignment of probabilities to the simple events. With different acceptable assignments of probabilities to the simple events, as in Solutions 1 and 2 of Example 3.7, we have no reason to be surprised if an event E turns out to have different probabilities. Which assignment of probabilities to the simple events should be made is not a mathematical question, but one that depends upon our assessment of the real-world situation to which the theory is to be applied. The assignment in Solution 1 is the natural one for unbiased, "fair" coins, but the assignment in Solution 2 is more sensible if one is sure that the first coin is loaded so as to turn up heads all the time.
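The dependence of P(E) on the assignment can be made concrete in a short Python sketch (a modern illustration; the variable names are our own) that carries out both solutions of Example 3.7:

```python
S = ["HH", "HT", "TH", "TT"]
E = ["HH", "HT", "TH"]          # the event "at least one head"

# Solution 1: fair coins -- each simple event gets probability 1/4.
p1 = {s: 1/4 for s in S}
# Solution 2: the first coin always lands heads.
p2 = {"HH": 1/2, "HT": 1/2, "TH": 0.0, "TT": 0.0}

def prob(event, assignment):
    """Definition 3.4: P(E) is the sum of the probabilities
    assigned to the simple events whose union is E."""
    return sum(assignment[s] for s in event)

print(prob(E, p1))   # 0.75 under the fair-coin assignment
print(prob(E, p2))   # 1.0 under the loaded-coin assignment
```

The same event E and the same function `prob` yield different numbers because the assignment, not the event, changed.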
We conclude with three remarks.
(1) Statements in the theory of probability, as in all of mathematics, are of the conditional form; i.e., "If such and such is assumed, then such and such follows." Ordinarily, when we select a card from a full deck and inquire, "What is the probability of selecting a spade?" we answer, as in Example 3.5, "The probability of selecting a spade is 1/4." But a complete answer would read as follows: "If we choose as sample space the set D containing 52 elements, one for each card in the deck, and if we assign equal probabilities of 1/52 to each of the 52 simple events of D, then the probability that a spade is selected is 1/4." The antecedents of this conditional assertion are usually omitted and only the consequent is stated, when it is clear from the context which sample space and which assignment of probabilities to simple events have been chosen. But if two people do the
Sec. 3 / THE PROBABILITY OF AN EVENT 61
same problem and get different numbers for the probability of the same event, it becomes important to spell out the hypotheses each used. Each answer may logically follow from the hypothesis employed in its derivation, and the answers may differ because different sample spaces or different assignments of probabilities to simple events were made. However, if the same sample space is used and the simple events are assigned the same probabilities by both people, then two different values for the probability of the same event can only mean that at least one of the people has committed a logical error.
The situation here is analogous to that in plane geometry. The assertions "The sum of the angles of a triangle is 180 degrees" and "The sum of the angles of a triangle is less than 180 degrees" certainly differ, but we cannot say whether they are true or false, since neither is a complete mathematical assertion. The hypotheses must be explicitly stated or implicitly understood. Thus, the first statement is true if it is intended to read, "If the axioms of Euclidean geometry are accepted, then the sum of the angles of a triangle is 180 degrees." And the second statement is true if we expand it to read, "If the axioms of Lobachewskian geometry are accepted, then the sum of the angles of a triangle is less than 180 degrees." Both Euclidean and non-Euclidean geometries are fruitful mathematical theories, and since their premises differ, nobody is now disturbed when their conclusions also differ. Which geometry should be used in a particular context is not a mathematical question, but one that is of great interest to those (like physicists) who apply geometry to the real world.
(2) Our definitions are so framed that events and only events have probabilities. Some authors prefer to formulate the theory so that statements are assigned probabilities. Except for linguistic differences, the theories are equivalent.
(3) Often, a particular assignment of probabilities to the simple events of a sample space S is implied by the use of certain adjectives. For example, if we say a "fair" coin is tossed, we shall mean that the two simple events {H} and {T} are to receive equal probabilities (each 1/2). When we say a pair of "fair" or "unbiased" dice are thrown, we mean to insist that each of the 36 simple events of the familiar sample space be assigned probability 1/36. When we say that a number is selected "at random" from n different numbers, then we agree to assign probability 1/n to each of the n simple events of the
appropriate sample space. Further examples will be discussed later, but for the present we adopt this convention in the problems that follow.
PROBLEMS
3.1. Consider the dice experiment of Example 3.4 and make the "natural" assignment of probabilities to the 36 simple events, so that each has probability 1/36. Find the probability of the following events.
(a) The sum of the numbers on the dice is less than 4.
(b) One die gives a 3 and the other die a number less than 3.
(c) The sum of the numbers on the dice is 2 or 12.
3.2. A letter of the alphabet is chosen at random. Find the probability of the event that the letter selected
(a) is a vowel.
(b) is a consonant.
(c) precedes u (in alphabetical order) and is a vowel.
(d) follows t and is a vowel.
(e) follows v and is a vowel.
3.3. A committee of three is selected from six people A, B, C, D, E, and F. (Cf. Problem 1.2.)
(a) Specify a suitable sample space S and make an acceptable assignment of probabilities to the simple events of S.
(b) Find the probability that A is selected.
(c) Find the probability that A and B are selected.
(d) Find the probability that A or B is selected.
(e) Find the probability that A is not selected.
(f) Find the probability that neither A nor B is selected.
3.4. Let the sample space S = {o1, o2, o3, o4} be given. Probabilities are assigned to the simple events so that
P({o2}), P({o3}) = KndP(Ko3}).
3.5. In each of the following, specify an appropriate sample space S, assign probabilities to the simple events of S, and then find the required probability.
(a) Find the probability of obtaining exactly two tails if three coins are tossed.
(b) Find the probability that one cell is empty when two distinguishable objects are distributed in two cells. [Cf. Problem 1.1(d).]
(c) Find the probability that one cell is empty when two indistinguishable objects are distributed in two cells. [Cf. Problem 1.1(e).]
(d) Find the probability of finding a family with no boys among families with two children; with three children; with r children. [Cf. Problem 1.1(f)-(h).]
(e) Find the probability of all coins falling heads when r coins are tossed. [Cf. Problem 1.1(i).]
(f) A die is loaded in such a way that the probability of the face marked j turning up is proportional to j for j = 1, 2, ..., 6. Find the probability that an odd number turns up when the die is rolled.
(g) A month of the year is randomly selected, and we note the day of the week on which the 13th day of the month falls. Find the probability that this 13th day falls on a Sunday.
3.6. Refer to Table 4 (p. 24) reporting the result of a survey of 321 union men. Let one man be selected at random from this group of 321 men. Decide on a suitable sample space and assignment of probabilities to its simple events, and then find the probability that the man selected
(a) answers "yes."
(b) answers "yes" and is in the union four or more years.
(c) answers "don't know" and is in the union less than four years.
(d) answers "don't know" and is in the union over 10 years.
3.7. Refer to Problem 2.3 and find the probability of the events E ∩ F and E ∪ F. Make the same assignment of probabilities to simple events of S as was made in Example 3.6.
3.8. Find the probability that among r people, there will be at least one whose birthday is the same as yours. Use logarithms, or otherwise, to determine the smallest value of r for which this probability is at least 1/2.
3.9. Three squares numbered 1, 2, and 3 are marked on a table. A deck of three cards numbered 1, 2, and 3 is shuffled and then dealt so that one card appears in each numbered square. This experiment can be thought of as resulting in a permutation of the numbers 1, 2, and 3.
(a) Enumerate the six permutations that make up the sample space.
(b) Assign equal probabilities to each simple event.
(c) If card number j is dealt so as to fall in square number j, we say that a match occurs in square j. Let Ej be the event that a match occurs in square j. (Of course, j can be equal to 1, 2, or 3.) Compute the probability of each of the following events: E1, E2, E3, E1 ∪ E2, E1 ∩ E2, E1 ∩ E2 ∩ E3, E1 ∪ E2 ∪ E3.
(d) Is P(E1 ∪ E2) equal to P(E1) + P(E2)? Why not? Can you write a correct formula for P(E1 ∪ E2)?
3.10. Find the probability that the player wins in each of the following lotteries. For each lottery, first define a sample space and assign probabilities to its simple events.
(a) Two white and four black balls are placed in an urn and thoroughly stirred. The player draws one ball and wins if the ball he draws is black.
(b) Same as (a), except that the player tosses a coin with one face painted white and the other face painted black just before he draws the ball. He wins if the ball drawn is of the color he tossed with the coin.
4. Some probability theorems
We now derive some consequences of the definitions given in the preceding section. We assume throughout that a finite sample space S is given,
and that some acceptable assignment of probabilities to the simple events of S has been made.
Because it will be helpful in visualizing the results to be proved, we pause to reformulate the definitions of the preceding section in a more picturesque language. We use the basic idea of a Venn diagram to represent the sample space S and events (subsets) of S. But now we use dots to indicate elements of S. Each dot determines one simple event, namely, the simple event containing as its only member the element represented by the dot. We imagine a flag erected at each dot and on this flag we write the probability assigned to the simple event determined by this dot. The flag erected at the dot representing outcome o1 flies the number P({o1}), the flag erected at the dot representing outcome o2 flies the number P({o2}), etc. (See Figure 11.) Because of Definition 3.2, the number on each flag is nonnegative and the sum
of the numbers on all the flags is 1. We now show that we can paraphrase our definitions as follows: The probability of E is the sum of the numbers on the flags erected at elements of E.
If E = ∅, then no dots, and therefore no flags, appear in the region representing E. In order to have P(∅) = 0, as required by Definition 3.3, we shall agree to say that the sum of the numbers on the flags is 0 when there are in fact no flags. If E is a simple event, then exactly one dot appears in the region representing E and the flag erected at that dot carries precisely the number P(E). If E is the union of two or more, say x, different simple events, then the region representing E contains exactly x dots and, when we add the numbers on the flags erected at these x dots, we are adding the probabilities of the simple events whose union is E. By Definition 3.4, we obtain P(E). Thus, we have shown that the italicized phrase concluding the preceding paragraph is correct for all events E. We shall therefore feel free to use this picturesque "numbers on flags" language whenever it seems helpful.
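The "numbers on flags" picture translates directly into a data structure: a dictionary mapping each outcome (dot) to the number on its flag. The following Python sketch (a modern aside; the outcome names and probabilities are hypothetical) checks whether an assignment is acceptable in the sense of Definition 3.2:

```python
def is_acceptable(assignment, tol=1e-9):
    """Definition 3.2: every flag carries a nonnegative number,
    and the numbers on all the flags sum to 1."""
    nonnegative = all(p >= 0 for p in assignment.values())
    sums_to_one = abs(sum(assignment.values()) - 1) < tol
    return nonnegative and sums_to_one

flags = {"o1": 0.2, "o2": 0.5, "o3": 0.3}    # hypothetical assignment
print(is_acceptable(flags))                   # True
print(is_acceptable({"o1": 0.9, "o2": 0.2}))  # False: flags sum to 1.1
```

The tolerance `tol` allows for floating-point rounding; mathematically the sum must be exactly 1.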
Theorem 4.1. P(S) = 1; i.e., the probability of a certain event is 1.
Proof. The sample space S is the union of all n simple events,

S = {o1} ∪ {o2} ∪ ··· ∪ {on}.

Hence, by Definition 3.4,

P(S) = P({o1}) + P({o2}) + ··· + P({on}),

and this sum is 1 by our agreement (in Definition 3.2) that the sum of the probabilities assigned to all simple events must be 1.
Using our picturesque language, the proof of Theorem 4.1 is equally easy. For to find P(S) we must add the numbers on the flags erected at elements of S. But this means adding the numbers on all flags, and we know this sum is 1.
Theorem 4.2. If E and F are events such that E ⊆ F, then P(E) ≤ P(F); i.e., if E implies F, then the probability of E cannot exceed the probability of F.
Proof. By hypothesis, each element of E is also an element of F. Hence, each simple event among those whose union is E is also a simple event among those whose union is F. Since the probability of
each simple event is nonnegative, it follows that the sum of the probabilities of the simple events of F is at least as large as the sum of the probabilities of the simple events of E. But this is precisely the required conclusion.
In a Venn diagram, all the points representing elements of E would also be in the region corresponding to F. To find P(E) we add the numbers on the flags erected at points of E. All of these numbers, as well as those, if any, that are erected at points in F but outside of E, are summed to get P(F). Hence P(E) ≤ P(F), as before.
Theorem 4.2 says that if event F occurs whenever event E occurs, then the probability of F is at least as large as the probability of E.
Theorem 4.3. If E is any event, then 0 ≤ P(E) ≤ 1.
Proof. We have E ⊆ S, since an event is by definition a subset of the sample space S. Hence by Theorems 4.1 and 4.2, we conclude that P(E) ≤ P(S) = 1. Also ∅ ⊆ E, so that Theorem 4.2 yields P(∅) ≤ P(E). Since P(∅) = 0, our proof is complete.
The extreme values 0 and 1 are worthy of special attention. We know that P(∅) = 0 and P(S) = 1. Recalling the definition of impossible and certain events given in the glossary on p. 52, we can say that if an event is impossible, then it has probability 0, and if an event is certain, then it has probability 1. But the converse of each of these implications is false; i.e., if P(E) = 0 we cannot conclude that E is impossible, and if P(E) = 1 we cannot conclude that E is certain. For example, in Solution 2 of Example 3.7, the event that the first coin falls tails is {TH, TT}, certainly not the empty set. Yet, by our assignment of probabilities to the simple events, this event has probability 0. In that same example, the event {HH, HT} has probability 1, but is not certain since it is not the entire sample space.
The reason for this state of affairs is that we have allowed simple events to be assigned probability 0. If we insisted, as some authors do, that the probability of each simple event must be positive, then only the empty event would have probability 0, and only the whole sample space would have probability 1. However, it turns out to be the case in problems involving infinite sample spaces that there must exist events that are not impossible but yet have probability 0. Although we cannot pursue this matter here, our definitions are formulated in such a way that the reader need not be surprised by this fact when he goes on to study probabilities in infinite sample spaces.
Theorem 4.4. Let E and F be two events. Then
(4.2)    P(E ∪ F) = P(E) + P(F) - P(E ∩ F).
In words, the probability that at least one of the events E and F occurs is obtained by adding the probability that E occurs and the probability that F occurs, and then subtracting the probability that both E and F occur.
Proof. First add the probabilities of all simple events containing elements of E. Their sum is P(E). Then add the probabilities of all simple events containing elements of F. Their sum is P(F). In the sum P(E) + P(F) we have included P({oj}) if and only if oj ∈ E ∪ F. But we have added P({oj}) twice for every oj ∈ E ∩ F: once in the sum P(E), and again in the sum P(F). The sum of the probabilities of simple events that are counted twice is P(E ∩ F). We conclude that P(E) + P(F) - P(E ∩ F) is precisely the sum of the probabilities of all simple events in E ∪ F, each counted once. Since this sum is P(E ∪ F), the theorem is proved.
The reader should draw a Venn diagram and test his understanding of this proof by formulating each step in the "numbers on flags" language. Before illustrating how Theorem 4.4 is used in a particular example, we deduce two more results.
Theorem 4.5. If E and F are mutually exclusive events, then
(4.3)    P(E ∪ F) = P(E) + P(F).
Proof. In the result of Theorem 4.4, we have only to note that now P(E ∩ F) = P(∅) = 0.
Theorem 4.5 says that the probability of the occurrence of at least one of two mutually exclusive events is the sum of their individual probabilities. Let us not forget the italicized hypothesis that must be true before using Formula (4.3). The use of Formula (4.2) requires no such caution, since it holds for any two events.
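Formulas (4.2) and (4.3) can be checked numerically. The following Python sketch (a modern aside; the equally likely 20-outcome sample space is our own choice) verifies Theorem 4.4 on randomly generated pairs of events, and Theorem 4.5 whenever the pair happens to be mutually exclusive:

```python
import random

random.seed(1)
S = range(20)
p = {s: 1/20 for s in S}   # an equally-likely assignment (illustrative)

def P(event):
    """Add the numbers on the flags erected at elements of the event."""
    return sum(p[s] for s in event)

for _ in range(100):
    E = set(random.sample(list(S), 6))
    F = set(random.sample(list(S), 8))
    # Theorem 4.4 holds for any two events E and F ...
    assert abs(P(E | F) - (P(E) + P(F) - P(E & F))) < 1e-12
    # ... and reduces to Theorem 4.5 when E and F are mutually exclusive.
    if not (E & F):
        assert abs(P(E | F) - (P(E) + P(F))) < 1e-12
print("Formula (4.2) checked on 100 random pairs of events")
```

Python's set operators `|` and `&` play the roles of ∪ and ∩, so the code mirrors the formulas line for line.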
Theorem 4.6. Let E and E′ be any complementary events. Then

(4.4)    P(E′) = 1 - P(E).
In words, the probability that E does not occur is obtained by subtracting from 1 the probability that E does occur.
Proof. E and E′ are mutually exclusive events, since E ∩ E′ = ∅. Hence by (4.3),

P(E ∪ E′) = P(E) + P(E′).
But E ∪ E′ = S, the entire sample space, and P(S) = 1 by Theorem 4.1. Hence

1 = P(E) + P(E′),

which is equivalent to (4.4).
In our less formal language, (4.4) is merely the result of noting that we obtain the sum of the numbers on all flags in S by adding the sum of the numbers on flags in E to the sum of the numbers on flags not in E.
The following examples illustrate how our formulas can be used to compute probabilities.
Example 4.1. Three coins are tossed. Find the probability of getting at least one head. We assign equal probabilities to the eight simple events of the sample space S defined in (2.1), p. 51. If E is the event "at least one head," then the complementary event E′ is "no heads." By Theorem 4.6,

P(E) = 1 - P(E′) = 1 - P({TTT}) = 1 - 1/8 = 7/8.
Note that we could have computed P(E) directly by recognizing that E is itself the union of seven simple events. This example is so simple that either method is easy. But often in more complicated problems, the most efficient way to find the probability of an event is first to compute the probability of its complementary event and then use Formula (4.4). Recall that we followed this procedure in solving the birthday problem. Although interested in the event E (at least two people have the same birthday) we found it convenient (see Example 2.1) to first study the event E' (no two people have the same birthday).
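Both routes to the answer in Example 4.1 can be carried out by machine. The sketch below (a modern illustration; function names are our own) computes P(at least one head) by the complement shortcut of Theorem 4.6 and checks it against direct enumeration of all equally likely outcomes:

```python
from itertools import product

def p_at_least_one_head(r):
    """Theorem 4.6 shortcut: 1 minus the probability of the single
    outcome with no heads when r fair coins are tossed."""
    return 1 - (1/2) ** r

def p_direct(r):
    """Direct route: enumerate all 2**r equally likely outcomes and
    count those containing at least one head."""
    outcomes = list(product("HT", repeat=r))
    favorable = [o for o in outcomes if "H" in o]
    return len(favorable) / len(outcomes)

print(p_at_least_one_head(3))   # 0.875, i.e. 7/8 as in Example 4.1
```

For r = 3 enumeration is trivial, but as r grows the direct list doubles in size while the complement formula stays a one-liner, which is the efficiency point made above.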
Example 4.2. An integer is chosen at random from the first 200 positive integers. What is the probability that the integer chosen is divisible by 6 or by 8?
Let E be the event "integer selected is divisible by 6" and F the event "integer selected is divisible by 8." We are required to find P(E ∪ F). We define S = {1, 2, 3, ..., 200} and assign probability
1/200 to each simple event of S. Now E contains [200/6]* = 33 integers, and is therefore the union of 33 simple events, each with probability 1/200. Hence P(E) = 33/200. Similarly, F is the union of [200/8] = 25 simple events, so that P(F) = 25/200. Since there are integers among the first 200 (like 24 and 48) that are divisible by both 6 and 8, the events E and F are not mutually exclusive. Hence we must compute P(E ∩ F). An integer is divisible by both 6 and 8 if and only if it is divisible by 24, the least common multiple of 6 and 8. There are [200/24] = 8 integers among the first 200 that are divisible by 24. Hence P(E ∩ F) = 8/200. By applying Formula (4.2), we find the required probability,

P(E ∪ F) = 33/200 + 25/200 - 8/200 = 50/200 = 1/4.
(For a generalization of this result, see Problem 4.11.)
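The calculation of Example 4.2 mechanizes naturally, since each count is a greatest-integer division. The following Python sketch (a modern aside; the function name is our own) packages Formula (4.2) for any n and any pair of divisors:

```python
import math

def p_divisible(n, a, b):
    """P(an integer chosen at random from 1..n is divisible by a or b),
    via Formula (4.2). Python's n // k is the greatest integer [n/k]."""
    # Divisible by both a and b  <=>  divisible by lcm(a, b).
    lcm = a * b // math.gcd(a, b)
    return (n // a + n // b - n // lcm) / n

print(p_divisible(200, 6, 8))   # (33 + 25 - 8)/200 = 0.25
```

Trying other values of n here (20, 2000, ...) is a quick way to form the conjecture requested in Problem 4.11(c).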
We have seen that in many examples it is reasonable to assign the same probability to each simple event of the sample space. In this circumstance, there is a simple formula for the probability of an event.
Theorem 4.7. Suppose each of the n simple events of the sample space S in (4.1) is assigned the same probability. (This probability must then be 1/n.) If E is an event containing f elements, then

(4.5)    P(E) = f/n.
In other words, the probability of an event is the ratio of the number of elements in the event to the number of elements in the entire sample space.
Proof. Since E contains f elements, E is the union of f simple events of the sample space S. Hence, directly from the definition, P(E) is the sum of f probabilities, each equal to 1/n. But this sum is precisely f/n, so that our proof is complete.
If we call an outcome of the experiment "favorable" to E whenever E occurs, then Theorem 4.7 can be paraphrased as follows: If an experiment can result in n equally likely outcomes, then the probability of E is the ratio of the number of outcomes favorable to E to the total number of outcomes. This is the classic "definition" of probability
* The symbol [x] stands for the greatest integer less than or equal to the number x. Thus, [3.6] = 3, [1/2] = 0, [5] = 5, etc.
given by Laplace (1749-1827), one of the first and most important contributors to the mathematical theory of probability. Let us not forget that this rule for computing probabilities is applicable only when all simple events have been assigned the same probability. Thus, Formula (4.5) does not apply to the wide variety of important problems where it is not reasonable to make this special assignment of probabilities to simple events.
Ordinarily, to compute P(E) we must first determine which elements of the sample space are in E, and then we add the probabilities of the corresponding simple events. But when Theorem 4.7 applies, we need only know how many elements are in E. It is therefore extremely useful to have effective techniques for counting the elements in sets specified by defining properties. In Example 3.6, for instance, the probability of the event that at least two people have the same birthday was easy to find because we had been able (in Example 2.1) to count the elements in this event by using the fundamental principle of counting. We discuss some other techniques for counting in the next chapter. Until then, our examples will be chosen so as to lead to events whose elements can be counted by explicit enumeration or by use of the fundamental principle.
We conclude this section with a brief discussion of the relation between the probability of an event and "odds" for the event.
Definition 4.1. Let E be any event. We say that odds for E are a to b if and only if

P(E) = a/(a + b).

If odds for E are a to b, then odds against E are b to a. Table 11 gives some common odds and corresponding probabilities.
TABLE 11

Odds for E    P(E)
1 to 1        1/2
2 to 1        2/3
3 to 1        3/4
3 to 5        3/8
1 to 2        1/3
12 to 5       12/17
Given the odds for E we have only to apply Definition 4.1 to find the probability of E. On the other hand, if P(E) is given, we write it in the fractional form a/(a + b) and then know that the odds for E are a to b. For example, if P(E) = 0.7, we first write

0.7 = 7/10 = 7/(7 + 3).

Hence, odds for E are 7 to 3. Since odds are often used to express probabilities of events, it is useful to be able to translate odds to probabilities and vice versa.
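The translation in both directions is exact rational arithmetic, so Python's `fractions` module handles it cleanly. A short sketch (a modern aside; function names are our own) implementing Definition 4.1:

```python
from fractions import Fraction

def prob_from_odds(a, b):
    """Definition 4.1: odds of a to b for E give P(E) = a/(a + b)."""
    return Fraction(a, a + b)

def odds_from_prob(p):
    """Write P(E) in lowest terms as a/(a + b); odds for E are a to b."""
    p = Fraction(p).limit_denominator()
    return p.numerator, p.denominator - p.numerator

print(prob_from_odds(7, 3))    # 7/10, matching P(E) = 0.7 above
print(odds_from_prob(0.7))     # (7, 3)
```

`limit_denominator()` converts the binary floating-point value 0.7 to the intended fraction 7/10 before reading off a and b.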
PROBLEMS
4.1. Two fair dice are rolled. Find the probability of the event E that the dots on the two uppermost faces do not add to 4. What are odds for E?
4.2. A card is drawn at random from a standard deck of playing cards. Let E be the event "card selected is an ace" and F the event "card selected is a spade."
(a) Are E and F mutually exclusive events?
(b) Find the probability that at least one of the events E and F occurs.
(c) What are odds for the event E ∪ F?
4.3. A fair die is rolled twice. What are odds for the event that at least one roll yields a number less than 3?
4.4. Odds a to b and c to d are said to be equal if a:b = c:d, i.e., if their ratios are equal. For example, odds of 10 to 5, 4 to 2, and 2 to 1 are equal.
(a) Show that if odds for two events are equal, then the events have equal probabilities.
(b) Show that odds against an event E are equal to odds for the complementary event E'.
4.5. Odds for event E are 2 to 1. Odds for E ∪ F are 3 to 1. Consistent with this information, what are the smallest and largest possible values for the probability of event F?
4.6. A card is drawn at random from an ordinary deck of 52 cards. This card is replaced, and then another card is selected at random from the full deck.
(a) Define a suitable sample space for this experiment and assign probabilities to its simple events.
(b) Find the probability that at least one of the cards selected is the ace of spades.
(c) What are the odds for the event that neither card is the ace of spades?
4.7. Repeat the preceding problem, but now assume that the first card is not replaced before the second is drawn.
4.8. The output of a machine producing nails is known to contain 2% defectives, the other 98% meeting specifications. From the very large lot of nails produced by the machine, two nails are drawn at random and inspected.
(a) Define a suitable sample space for this experiment and make a reasonable assignment of probabilities to its simple events.
(b) Find the probability that at least one of the nails is defective.
4.9. A high school senior applies for admission to college A and college B. He estimates that the probability of being admitted to A is 0.7, that his application will be rejected at B with probability 0.5, and that the probability of at least one of his applications being rejected is 0.6. What is the probability that he will be admitted to at least one of the colleges?
4.10. If in Theorem 4.2 we make the hypothesis that E is a proper subset of F, i.e., that E ⊆ F but E ≠ F, does it then follow that P(E) < P(F)?
4.11. (a) An integer is chosen at random from the first 20 positive integers.
What is the probability that the integer chosen is divisible by 6 or 8?
(b) An integer is chosen at random from the first 2000 positive integers. What is the probability that the integer chosen is divisible by 6 or 8?
(c) The result of Example 4.2 in the text together with the results of parts (a) and (b) should lead you to conjecture a general theorem of which these results are special cases. State such a theorem and try to prove it.
4.12. Prove that if E and F are any events, then
P(E ∩ F) ≤ P(E) ≤ P(E ∪ F) ≤ P(E) + P(F).
4.13. Let E and F be any two events. Suppose the numbers P(E), P(F), and P(E C\ F) are known. Find formulas in terms of these numbers for the following probabilities. In each case give a verbal description of the event whose probability you are finding.
(a) P(E′ ∪ F′)    (b) P(E′ ∩ F′)
(c) P(E′ ∪ F)     (d) P(E′ ∩ F)
(e) P(E ∩ F′)     (f) P((E ∩ F)′)
4.14. Generalize Theorem 4.4 by showing that the probability of the occurrence of at least one among three events E1, E2, and E3 is given by

(4.6)    P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3) - P(E1 ∩ E2) - P(E1 ∩ E3) - P(E2 ∩ E3) + P(E1 ∩ E2 ∩ E3).

[Note: You will find a Venn diagram like the one in Figure 7 helpful in checking that the probability of each simple event making up E1 ∪ E2 ∪ E3 is counted once and only once in the expression on the right in (4.6).]
4.15. From a standard deck we select one card at random. Use Formula
(4.6) to find the probability that the card is a spade, an honor card, or a deuce.
4.16. Use Formula (4.6) to find the probability that a number selected at random from the first 200 positive integers is divisible by 6 or 8 or 10.
4.17. We make a definition and then state a theorem. Use Formula (4.6) to prove the theorem.
Definition 4.2. Let k be any integer greater than 1. Events E1, E2, ..., Ek are said to be mutually exclusive in pairs if and only if all possible pairs of events from E1, E2, ..., Ek are mutually exclusive, i.e., Ei ∩ Ej = ∅ for all i ≠ j, where i and j can assume the values 1, 2, ..., k.
Theorem 4.8. If E1, E2, and E3 are mutually exclusive in pairs, then

(4.7)    P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3).
4.18. Suppose we assume only that E1 ∩ E2 ∩ E3 = ∅. Show by example that (4.7) does not necessarily hold. (Cf. Problem 1.4.6b.)
4.19. Prove the following generalization of Theorem 4.8 by mathematical induction.
Theorem 4.9. Let k be any integer greater than 1 and suppose the events E1, E2, ..., Ek are mutually exclusive in pairs. Then

(4.8)    P(E1 ∪ E2 ∪ ··· ∪ Ek) = P(E1) + P(E2) + ··· + P(Ek).
4.20. Modify the hypothesis in Theorem 4.9 so that E1, E2, ..., Ek are any events, not necessarily mutually exclusive in pairs. With this weaker hypothesis, prove the following weaker result:

P(E1 ∪ E2 ∪ ··· ∪ Ek) ≤ P(E1) + P(E2) + ··· + P(Ek).
4.21. (a) Find the probability of at least one match when using a deck of
three cards. (Cf. Problem 3.9.)
(b) Find the probability of at least one match using four numbered squares and four cards. (First define a sample space and make an acceptable assignment of probabilities to its simple events.)
(c) We want to find a formula for the probability of at least one match when using N squares and a deck of N cards (numbered 1, 2, ..., N), where N is any positive integer. Define a suitable sample space for this experiment, determine the number of elements in this sample space, and then make an acceptable assignment of probabilities to its simple events. Note that we could find the probability of at least one match if we had a formula for P(E1 ∪ E2 ∪ ··· ∪ EN), where Ej denotes the event that a match occurs at card number j. Can you guess this formula by detecting a pattern in Formulas (4.2) and (4.6)? If not, then first use (4.2) and (4.6) to derive a formula for the special case N = 4, and then try guessing again. The proof of the correct general formula and its use to find the probability of at least one match require counting techniques that we have not yet discussed.* But even when we can't complete a problem, it is useful to think about it and try to see what we need to learn in order to be able to complete it. This problem is the famous problem of rencontre in probability theory and was originally discussed by the French mathematician Montmort (1678-1719).
5. Conditional probability and compound experiments
Suppose an experiment is performed and we are interested in the probability of some event E. But now assume that we are given additional information, namely, that another event F has occurred. In this section, we discuss how the computation of the probability of E is affected by the information that F is known to have occurred.
It is helpful first to take a close look at an example in which we can find reasonable answers on intuitive grounds. The methods we employ in this simple example will lead us to formulate precise definitions that will become part of our mathematical theory.
Example 5.1. A club with five male and five female charter members elects two women and three men to membership. From the total of 15 members, one person is selected at random. We are interested in two events:
E = person selected is a male,
F = person selected is a charter member.
* See W. Feller, An Introduction to Probability Theory and Its Applications, 2nd edition, John Wiley and Sons, Inc., 1957, pp. 88-91. For another solution and interesting historical comments, see I. Todhunter, A History of the Mathematical Theory of Probability, Chelsea Publishing Co., 1949, pp. 91-93.
Sec. 5 / CONDITIONAL PROBABILITY 75
As sample space we take a set S of 15 elements, one for each club member. Since the selection is "at random," we assign probability 1/15 to each simple event of S. Observing that E is the union of eight simple events, F the union of ten simple events, and E ∩ F the union of five simple events, we calculate

P(E) = 8/15, P(F) = 10/15 = 2/3, P(E ∩ F) = 5/15 = 1/3.

So far we have nothing new. But now suppose we are informed that the person selected is a charter member. What is the probability of E, now that this fact about the outcome of the experiment has been made known to us? Most people quickly answer that the revised probability of E should be 5/10. They reason as follows: Since F is known to have occurred, we know that one of the ten charter members was selected. The event E occurs if one of the five male charter members is selected. Because the selection is at random, the probability of selecting one of the five males from the ten charter members is 5/10. If we introduce the symbol P(E|F) to denote this revised or conditional probability of E given F, then

P(E) = 8/15 and P(E|F) = 5/10 = 1/2.
Thus, in this example the probability of E decreases due to the added information that event F has occurred.
This informal and intuitive reasoning can be described in another way. Ordinarily, given a sample space S and an acceptable assignment of probabilities to the simple events of S, we compute the probability of an event E by adding the probabilities of the simple events whose union is E. Since P(S) = 1 and E ∩ S = E, we can write the identity

(5.1) P(E) = P(E ∩ S)/P(S),

which shows that P(E) is the ratio of the probability of that part of E included in S (which happens to be all of E) to the probability of S itself (which happens to be 1).
But if we are told that event F has occurred, then the outcomes corresponding to elements of F′, the complement of F, are no longer possible. Hence, in the light of our added information about the outcome of the experiment, the event F replaces the sample space S as the set whose elements correspond to all possible outcomes of the
76 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
experiment. With this in mind, observe how reasonable it appears to write, in analogy with (5.1),
(5.2) P(E|F) = P(E ∩ F)/P(F),
which says that P(E|F), the conditional probability of E given F, is the ratio of the probability of that part of E included in F (which is E ∩ F) to the probability of F itself.
Applied to the problem in Example 5.1, this ratio is
P(E|F) = (1/3)/(2/3) = 1/2,
as before.
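The counting behind this example is easy to mechanize. Here is a minimal Python sketch (the roster encoding and the names `prob` and `cond` are mine, not the text's) that recovers both numbers:

```python
from fractions import Fraction

# Club roster of Example 5.1, encoded as (sex, charter-member?) pairs
# (this encoding is mine): 5 male and 5 female charter members,
# then 3 newly elected men and 2 newly elected women.
members = ([("M", True)] * 5 + [("F", True)] * 5
           + [("M", False)] * 3 + [("F", False)] * 2)

def prob(event):
    # Equally likely simple events: P = (favorable elements)/(all elements).
    return Fraction(sum(1 for m in members if event(m)), len(members))

E = lambda m: m[0] == "M"   # person selected is a male
F = lambda m: m[1]          # person selected is a charter member

def cond(A, B):
    # Formula (5.2): P(A|B) = P(A and B)/P(B).
    return prob(lambda m: A(m) and B(m)) / prob(B)

print(prob(E))     # 8/15
print(cond(E, F))  # 1/2
```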
Formula (5.2) is the basis of our formal definition of conditional probability.
Definition 5.1. Let E and F be two events of a sample space S. Suppose an acceptable assignment of probabilities has been made to the simple events of S in such a way that P(F) > 0. Then the conditional probability of E given F, denoted by P(E|F), is defined by Equation (5.2). The conditional probability of E given F is undefined if P(F) = 0.
Formulas (5.1) and (5.2) show that the role of F in computing P(E|F) is analogous to the role of S in computing P(E). It is helpful to carry this analogy further. When we are told that F has occurred, then F can be considered as a new sample space, since all possible outcomes of the experiment must now correspond to elements of F. Then we must be sure to have the probabilities of the simple events of F add to 1, as they must for any sample space. But they actually add to P(F). If P(F) = 1, then no changes are required. However, if P(F) < 1, we imagine the probabilities of all simple events of F increased proportionately by dividing each by the same number P(F). We thus obtain new probabilities for the simple events of F. In view of the relation between original and new probabilities of simple events of F, Formula (5.2) can be paraphrased as follows: The conditional probability of E given F is the sum of the new probabilities of those simple events whose union is the event E ∩ F, i.e., whose union is the part of E included in the new sample space F.
Thus P(E\F) is simply a probability calculated for events considered as subsets of the new sample space F. It follows that the formulas we proved in Section 4 for probabilities relative to the
sample space S apply without modification to conditional probabilities relative to the information that a fixed event F has occurred. (See Problem 5.16.)
Example 5.2. Three fair coins are tossed, one after the other. Let E be the event "at least two heads" and F the event "first coin falls heads." We define the usual sample space containing the eight outcomes HHH, HHT, ···, TTT and assign each simple event the probability 1/8. Then E is the union of four simple events, F the union of four simple events, and E ∩ F the union of three simple events. Hence P(E) = P(F) = 4/8 = 1/2 and P(E ∩ F) = 3/8. Thus, the conditional probability of E given F is

P(E|F) = (3/8)/(1/2) = 3/4.

As expected, the added knowledge that the first coin falls heads increases the probability of getting at least two heads. Before this additional information is revealed, P(E) = 1/2. Afterwards, P(E|F) = 3/4.
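Computations like these can be checked by enumerating the sample space directly; the Python sketch below (variable names mine) does so for Example 5.2:

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses, e.g. ('H', 'H', 'T').
outcomes = list(product("HT", repeat=3))

E  = [w for w in outcomes if w.count("H") >= 2]  # at least two heads
F  = [w for w in outcomes if w[0] == "H"]        # first coin falls heads
EF = [w for w in E if w[0] == "H"]               # outcomes in both E and F

P_E  = Fraction(len(E), len(outcomes))  # 4/8 = 1/2
P_EF = Fraction(len(EF), len(F))        # P(E|F): count within F only
print(P_E, P_EF)  # 1/2 3/4
```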
Example 5.3. A person is selected at random from among the 321 union men whose opinions were reported in Table 4 on p. 24. Let E denote "man answers yes" and F "man is in union less than one year." Reading the entries of the table (as in Problem 3.6), we compute P(E), P(F), and P(E ∩ F); therefore, by (5.2),

P(E|F) = P(E ∩ F)/P(F).

Note that P(E|F) < P(E); i.e., the knowledge that the man is in the union less than one year decreases the probability that he answers "yes."
Example 5.4. A card is selected at random from a standard deck. Let E denote "card is a spade" and F "card is an ace." Then

P(E) = 13/52 = 1/4, P(F) = 4/52 = 1/13, P(E ∩ F) = 1/52,

and

P(E|F) = (1/52)/(1/13) = 1/4.
Here we have a case in which P(E) and P(E|F) are equal: the knowledge that the card is an ace does not change the probability that it is a spade.
The important but special case when P(E) and P(E|F) are equal, as in Example 5.4, will be discussed in Section 7, where we introduce the concept of independent events. In the remainder of this section, we consider some consequences of Definition 5.1, as well as an application of conditional probabilities to so-called compound or composite experiments.
If P(E) > 0, the roles of E and F in (5.2) can be interchanged. Then the conditional probability of F given E is
(5.3) P(F|E) = P(F ∩ E)/P(E) = P(E ∩ F)/P(E),

the last equality following from the commutative law for the intersection of two sets.
By solving (5.2) and (5.3) for P(E ∩ F), we obtain the following result, sometimes referred to as the theorem on compound probabilities:

(5.4) P(E ∩ F) = P(E)P(F|E) = P(F)P(E|F).
Formula (5.4) finds extensive use when we compute probabilities for events defined in terms of a compound experiment. For example, the experiment in which we toss a coin, toss it again, and then toss it a third time is an example of a compound experiment with three trials. If we have two urns containing colored balls and we choose an urn and then a ball from that urn, we have performed a compound experiment with two trials. Many experiments are most conveniently described as a compounding of two or more trials: first, something is done (trial number 1); then, after the first trial is completed, something else is done (trial number 2); etc. An example will best serve to illustrate the use of conditional probabilities in such compound experiments.
Example 5.5. Urn I contains three green and five red balls. Urn II contains two green, one red, and two yellow balls. We select an urn at random and then draw one ball
at random from that urn. What is the probability that we obtain a green ball?
The data of the problem are conveniently summarized in the tree diagram of Figure 12. Since the urn is selected at random, we write
probability 1/2 on each branch leading from the starting point to an outcome of the first trial (number of the urn). We are also given conditional probabilities of drawing a ball of a specified color, given the urn selected. These conditional probabilities appear on each branch leading from an urn to an outcome of the second trial (color of ball). We give two solutions to the problem posed in this example.

[Figure 12: tree diagram for Example 5.5. Two branches (probability 1/2 each) lead to urn I and urn II; from urn I, branches green (3/8) and red (5/8); from urn II, branches green (2/5), red (1/5), and yellow (2/5).]
Solution 1. The event "green ball selected" can occur in one of these two mutually exclusive ways: (1) select urn I and draw a green ball, or (2) select urn II and draw a green ball. Hence the event "green ball selected" is the union of the mutually exclusive events described in (1) and (2). By Formula (4.3), we obtain (with obvious shorthand notation for events),
P(green) = P(urn I and green) + P(urn II and green).
Each of the terms on the right is the probability of an intersection of two events. Applying Formula (5.4),
P(green) = P(urn I)P(green|urn I) + P(urn II)P(green|urn II), and using the data summarized in Figure 12,

(5.5) P(green) = (1/2)(3/8) + (1/2)(2/5) = 31/80.
Solution 2. We go back to first principles. Let us define as sample space for this compound experiment the set
S = {Ig, Ir, IIg, IIr, IIy}

whose elements are ordered pairs denoting the outcomes of the two trials making up the experiment. Thus, Ig denotes the outcome for which urn I is selected and then a green ball drawn, etc.
Each of the five simple events of S corresponds to one path from left to right through the tree in Figure 12. We assign to each simple event the probability given by the product of the numbers appearing
TABLE 12

Simple Event of S        Probability
{Ig}                     (1/2)(3/8) = 3/16
{Ir}                     (1/2)(5/8) = 5/16
{IIg}                    (1/2)(2/5) = 1/5
{IIr}                    (1/2)(1/5) = 1/10
{IIy}                    (1/2)(2/5) = 1/5
in the tree along the path to which that simple event corresponds. Using this rule, we obtain the probabilities listed in Table 12 and appearing at the end of each path through the tree in Figure 12. Let us note that this is an acceptable assignment of probabilities to the simple events of S: each probability is nonnegative and their sum is 1.
Now the event "green ball selected" is the union of the two simple events {Ig} and {IIg}. Hence

(5.6) P(green) = 3/16 + 1/5 = 31/80,
as in Solution 1.
The reader may object that in Solution 1 we have violated our rule requiring the designation of a sample space and an assignment of probabilities to its simple events before probabilities can be computed. Strictly speaking, this claim is correct. But by comparing (5.5) and (5.6) we observe that the sample space and assignment of probabilities in Solution 2 were implicit in Solution 1. Indeed, let us agree that a compound experiment of n trials will always have as sample space the set S of ordered n-tuples denoting possible outcomes of the experiment. If we are given enough data (in the form of certain probabilities and conditional probabilities) to fill in a tree diagram like the one in Figure 12, then an acceptable assignment of probabilities to simple events of S is made as in Solution 2: Each simple event corresponds to one path through the tree, and the product of the numbers appearing on the branches of a path is the probability assigned to its corresponding simple event. It can be shown (see Problem 5.17) that the resulting assignment of probabilities to the simple events of S is not only acceptable, but is the only assignment consistent with the data of the problem.
Solution 1 is shorter, more direct, and easier than Solution 2. It is typical of many problems involving compound experiments that we choose to compute unknown probabilities by using the data of the problem directly, and thus bypass the explicit construction of a sample space and assignment of probabilities to its simple events. We shall adopt the shorter direct solution from now on, but with the knowledge that we could, if called upon to do so, go back to first principles and complete the longer, less direct solution that underlies the shorter procedure.
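The path rule of Solution 2 is easy to carry out mechanically. Here is a Python sketch (the dictionary encoding of Figure 12 is mine) that builds the five path probabilities of Table 12 and recovers P(green):

```python
from fractions import Fraction as F

# Branch data of Figure 12 (dictionary layout is mine).
P_urn = {"I": F(1, 2), "II": F(1, 2)}
P_color = {"I":  {"g": F(3, 8), "r": F(5, 8)},
           "II": {"g": F(2, 5), "r": F(1, 5), "y": F(2, 5)}}

# Each path through the tree gets the product of its branch numbers.
P_path = {urn + c: P_urn[urn] * p
          for urn, branches in P_color.items()
          for c, p in branches.items()}

assert sum(P_path.values()) == 1       # an acceptable assignment
P_green = P_path["Ig"] + P_path["IIg"]
print(P_green)  # 31/80
```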
The theorem on compound probabilities, as expressed in (5.4), is a
special case of an extremely useful formula which we now prove.
Theorem 5.1. If n is any integer (n ≥ 2) and E1, E2, ···, En are any n events for which P(E1 ∩ E2 ∩ ··· ∩ En−1) ≠ 0, then

(5.7) P(E1 ∩ E2 ∩ ··· ∩ En) = P(E1)P(E2|E1)P(E3|E1 ∩ E2) ··· P(En|E1 ∩ E2 ∩ ··· ∩ En−1).
Proof. Denote by Sn the statement expressed by Equation (5.7) and let N denote the set of those integers n for which Sn is true. We use the method of mathematical induction to prove that N is the set of all integers greater than 1.

(i) 2 ∈ N. For S2 is the statement

P(E1 ∩ E2) = P(E1)P(E2|E1).

That S2 is true follows from the definition of P(E2|E1). Note that our hypothesis reduces to P(E1) ≠ 0 when n = 2, so that the conditional probability is defined.
(ii) Now assume k ∈ N, where k is any integer greater than 1. We want to prove that also (k + 1) ∈ N. But Sk is the statement

(5.8) P(E1 ∩ E2 ∩ ··· ∩ Ek) = P(E1)P(E2|E1) ··· P(Ek|E1 ∩ E2 ∩ ··· ∩ Ek−1).

We verify that by the definition of conditional probability (and using properties of set intersection),

(5.9) P(E1 ∩ ··· ∩ Ek+1)/P(E1 ∩ ··· ∩ Ek) = P(Ek+1|E1 ∩ E2 ∩ ··· ∩ Ek).

Multiplying corresponding sides of Equations (5.8) and (5.9) yields

P(E1 ∩ E2 ∩ ··· ∩ Ek+1) = P(E1)P(E2|E1) ··· P(Ek+1|E1 ∩ E2 ∩ ··· ∩ Ek),

which is precisely the statement Sk+1. Hence we have shown that if k ∈ N, then (k + 1) ∈ N for every k ≥ 2.
We conclude from (i) and (ii) that N is the set of all integers greater than or equal to 2, and thus Theorem 5.1 is proved.
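As a concrete check on Theorem 5.1, consider drawing three cards in succession from a standard deck without replacement, and let Ej be the event that the j-th card is a spade; this example is mine, not the text's. Each factor of (5.7) can be read off from the composition of the deck just before the draw:

```python
from fractions import Fraction

# Chain rule (5.7) for "first three cards are all spades."
p = Fraction(1)
spades, deck = 13, 52
for _ in range(3):
    p *= Fraction(spades, deck)  # P(Ej | E1, ..., E(j-1))
    spades -= 1                  # one fewer spade remains
    deck -= 1                    # one fewer card remains
print(p)  # 11/850
```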
Example 5.6. You are told that an urn contains x red and 5 — x green balls, but the value of x (which can be 0, 1, 2, 3, 4, or 5) is not disclosed to you. Mr. Y is to draw a ball at random from the five balls in the urn, and you must guess the color of the ball he draws. We shall say that you win if you guess correctly and lose otherwise.
Let us consider each of the following strategies for making your guess.
Strategy 1. Guess that Y will draw a red ball.
Strategy 2. Guess that Y will draw a green ball.
Strategy 3. First draw a ball from the urn. If it is red, then guess Y will draw a red ball. If it is green, then guess Y will draw a green ball.
Strategy 4. Draw a ball from the urn and replace it. Then draw another ball and replace it. If both balls are red, then guess Y will draw a red ball. If both balls are green, then guess Y will draw a green ball. If you draw one red and one green ball, then draw one more ball from the urn. If this ball is red, then guess Y will draw a red ball. If it is green, then guess Y will draw a green ball.
Strategy 5. Same as Strategy 4, except that the first ball is not replaced before the second is drawn. Also, if a third draw is required, it is done without replacing the first two balls.
We are interested in calculating the probability that you win, i.e., your guess is correct. This probability will depend on the unknown value of x (which determines the composition of the urn) and on the strategy you decide to use. For example, if you choose Strategy 1, then you guess red. Y draws a red ball from the urn with probability x/5. Putting x = 0, 1, 2, 3, 4, 5 in turn, we get the probabilities of winning listed in Table 13 in the column headed Strategy 1. In that
TABLE 13

Number of Red        Probability of Winning with Strategy
Balls in Urn x         1      2      3      4      5
0                      0      1      1      1      1
1                     .20    .80    .68    .74    .80
2                     .40    .60    .52    .53    .54
3                     .60    .40    .52    .53    .54
4                     .80    .20    .68    .74    .80
5                      1      0      1      1      1
table, we list the probability of winning for each possible composition of the urn and for each of the five available strategies. To see how these probabilities are calculated, consider a few examples.
(a) Suppose x = 2 and you adopt Strategy 3. Then you win whenever you and Y both draw red balls or both draw green balls. In order to simplify the notation, let us write R1 to denote the event that the first ball you draw is red, RY the event that Y draws a red ball, G2 the event that the second ball you draw is green, etc. Since the events R1 ∩ RY and G1 ∩ GY are mutually exclusive, and your ball is back in the urn when Y draws, we find

P(win) = P(R1)P(RY|R1) + P(G1)P(GY|G1) = (2/5)(2/5) + (3/5)(3/5) = .52,

as recorded in Table 13.
The tree diagram for Strategy 5 is much more complicated, and is drawn in Figure 14. Here too, we have written next to each branch the appropriate probabilities for the case x = 2. We note that there are now six mutually exclusive ways of winning. For example, you win if the experiment
[Figure 14: tree diagram for Strategy 5 in the case x = 2. The columns show your first ball, your second ball, your third ball (when one is drawn), your guess, Y's ball, the outcome, and its probability. The six winning paths have probabilities .04, .04, .12, .04, .12, .18, and the losing paths have probabilities .06, .06, .08, .06, .08, .12.]
results in the event R1 ∩ G2 ∩ R3 ∩ RY in which you first draw a red, then a green, then another red (and thus guess red) and then Y draws a red ball. This event corresponds to the third path through the tree in Figure 14. We find the probability of this event by applying Formula (5.7) with n = 4.
(5.10) P(R1 ∩ G2 ∩ R3 ∩ RY) = P(R1)P(G2|R1)P(R3|R1 ∩ G2)P(RY|R1 ∩ G2 ∩ R3).

But again, since Y draws from the full urn of five balls, P(RY|R1 ∩ G2 ∩ R3) = P(RY) = 2/5 = .4, and likewise P(R1) = 2/5 = .4.
Given that you drew a red ball first, since you do not replace this ball, there remain four balls in the urn, of which three are green. Hence
P(G2|R1) = 3/4 = .75.
If you get a red and a green, then the third ball is selected from an urn containing one red and two green balls. Hence
P(R3|R1 ∩ G2) = 1/3 = .33, approximately. Putting these values in (5.10), we find

P(R1 ∩ G2 ∩ R3 ∩ RY) = (.4)(.75)(.33)(.4) = .04, approximately.
This probability appears at the end of the third path through the tree in Figure 14 and is just the product of the probabilities for the branches of that path. Adding the probabilities of all events (paths) for which you win, we find that when x = 2 and you use Strategy 5 your probability of winning is .54. This number appears in the appropriate row and column in Table 13. In this way all the entries in Table 13 are computed.
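Entries of Table 13 can also be checked by enumeration instead of tree diagrams. The sketch below (my encoding of Strategy 5; it assumes, as in the text, that your balls are back in the urn before Y draws) averages over all equally likely ordered triples of draws:

```python
from fractions import Fraction as F
from itertools import permutations

def strategy5_win_prob(x):
    # Urn of Example 5.6 with x red and 5 - x green balls; your three
    # draws are made without replacement, and all five balls are back
    # in the urn when Y draws.  Encoding is mine.
    balls = ["R"] * x + ["G"] * (5 - x)
    total, win = 0, F(0)
    for seq in permutations(range(5), 3):   # equally likely ordered draws
        c1, c2, c3 = (balls[i] for i in seq)
        guess = c1 if c1 == c2 else c3      # third ball decides a mixed pair
        win += F(balls.count(guess), 5)     # P(Y's ball matches the guess)
        total += 1
    return win / total

print(float(strategy5_win_prob(2)))  # 0.54, agreeing with Table 13
```

Enumerating a third draw even when the first two agree does not change the answer, since the guess ignores it in that case.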
All other things being equal, you prefer the strategy which gives you the highest probability of winning. Thus, referring to Table 13, since the probabilities in Column 4 are at least as large as those in Column 3 for all possible compositions of the urn, you prefer Strategy 4 to Strategy 3. For the same reason, Strategy 5 is preferred to Strategy 4. However, if for some reason you are sure that the urn contains more red than green balls (i.e., x = 3, 4, or 5), then you might reasonably prefer Strategy 1 over any of the other strategies. A complete analysis is not possible here, but it is clear that the strategy you prefer will depend upon a number of factors that we have omitted from our discussion. For example, there may be a prize for winning and a penalty for losing. You may have to pay for the information you get by drawing one or more balls from the urn before making your guess. Thus, Strategy 3 may cost you more than Strategy 1 or 2, and Strategies 4 and 5 may cost still more than Strategy 3. The strategy you prefer will depend upon all of these factors, as well as on your belief about the composition of the urn. References 2 and 10 in the supplementary reading list at the end of this chapter may be consulted for discussions of how to evaluate strategies and why this is of great importance in statistical decision theory.
PROBLEMS
5.1. A green die and a red die are rolled.
(a) Find the conditional probability of obtaining a sum greater than 10, given that the red die resulted in a 5.
(b) Find the conditional probability of obtaining a sum less than 6, given that the red die resulted in a 2.
(c) Find the conditional probability of obtaining sum 7, given that the red die resulted in a number less than 4.
(d) In parts (a)-(c), how does the given information affect the probability of the event in question?
5.2. A fair coin is tossed three successive times. Find the odds for obtaining three heads. How do the odds change if it is given that the second toss resulted in a head?
5.3. Three indistinguishable objects are distributed in three cells. Find the conditional probability that all three occupy the same cell, given that at least two of them are in the same cell.
5.4. A committee of three is selected from six people A, B, C, D, E, and F. [Cf. Problem 3.3.] Find the conditional probability of A and B being selected, given that neither C nor D were selected.
5.5. Two people are selected (one after the other) at random from the 321 union men whose opinions are recorded in Table 4 on page 24. Find the probability that both men answered "yes."
5.6. Students in a summer school program took two courses: Chemistry and History. The registrar reports that 4 percent failed Chemistry, 3 percent failed History, and 1 percent failed both Chemistry and History.
(a) What percentage passed Chemistry and failed History?
(b) Among those who failed Chemistry, what percentage also failed History?
(c) Among those who failed History, what percentage also failed Chemistry?
5.7. (a) A fair coin is tossed three successive times.
(i) Find the probability that the third toss results in heads. (ii) Find the conditional probability that the third toss results in heads, given that the first two tosses result in heads.
(b) A fair coin is tossed N successive times, where N is a positive integer.
(i) Define a suitable sample space and make the appropriate assignment of probabilities to its simple events.
(ii) Find the probability that the Nth toss results in heads.
(iii) Find the conditional probability that the Nth toss results in heads, given that all preceding tosses result in heads. (Question: Does the coin have a memory?)
5.8. The manager of a retail grocery store advertises the following promotional scheme in the newspaper. During a specified week, each customer purchasing at least $10 worth of groceries at one time will receive a numbered ticket. At the same time, the cashier places a duplicate ticket in a large bowl. Tickets are numbered serially, starting with number 1. At the end of a week, the tickets in the bowl are thoroughly stirred, one ticket is chosen, and its number determines the winner of a previously announced cash prize. Let us suppose that 200 tickets are distributed during the week.
(a) What is the probability that the first digit of the winning number is 1?
(b) What is the conditional probability that the first digit of the winning number is 1, given that the winning number is greater than 100?
(c) What is the probability that the first digit of the winning number is 9?
(d) What is the probability that the first digit of the winning number is 9, if it is known that the winning number is greater than 100?
(e) Suppose the number of tickets distributed is a positive integer, say N. We want each of the nine digits (1, 2, 3, ···, 9) to have the same probability of being the first digit of the winning number. What are all the possible values of N?
5.9. Two defective radio tubes get mixed up with two good ones. You start testing the tubes, one by one, until you have discovered both defectives.
(a) Construct a tree diagram for this experiment.
(b) What is the probability that the second defective tube will be the second tested? the third tested? the fourth tested? What is the sum of the three probabilities you computed? Is this a sum that could have been expected before doing the computations?
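For a quick check of Problem 5.9, one can enumerate all orderings of the four tubes; the sketch below (labels D1, D2, G1, G2 are mine) takes the reading that you keep testing until the second defective actually appears:

```python
from fractions import Fraction
from itertools import permutations

# Two defective tubes among four; enumerate all equally likely test orders.
tubes = ["D1", "D2", "G1", "G2"]
counts = {2: 0, 3: 0, 4: 0}
orderings = list(permutations(tubes))
for order in orderings:
    # 1-based position at which the second defective turns up
    pos = max(order.index("D1"), order.index("D2")) + 1
    counts[pos] += 1

probs = {k: Fraction(v, len(orderings)) for k, v in counts.items()}
print(probs)  # positions 2, 3, 4 occur with probabilities 1/6, 1/3, 1/2
```

The three probabilities sum to 1, as part (b) anticipates: the second defective must appear by the fourth test.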
5.10. An urn contains g green and r red balls. One ball is drawn at random. It is replaced and c more balls of the same color are added to the urn, where c is some positive integer. Another ball is drawn at random from the urn and this ball, together with c more of the same color, are again added to the urn. This procedure can be repeated any number of times and supplies a model (first studied by G. Polya) in which the drawing of a ball of either color increases the probability of the same color in the next drawing. (Polya drew the analogy with contagious diseases, where each case of a disease increases the probability of further cases.)
(a) Find the conditional probability that the second ball is red, given that the first ball is red.
(b) Find the probability that the first three drawings all result in red balls.
(c) Find the conditional probability that the first ball is red, given that the second ball is red.
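A sketch for part (b) of this Polya urn scheme, using the chain rule (5.7) (the function name and sample values are mine):

```python
from fractions import Fraction

# Probability that the first n draws are all red in the Polya urn:
# after each draw the ball is returned together with c more of its color.
def p_first_n_red(g, r, c, n):
    p = Fraction(1)
    for k in range(n):
        # just before draw k+1 there are r + kc red among g + r + kc balls
        p *= Fraction(r + k * c, g + r + k * c)
    return p

print(p_first_n_red(2, 3, 1, 3))  # g=2, r=3, c=1: (3/5)(4/6)(5/7) = 2/7
```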
5.11. A drawer contains four black, six brown, and two blue socks. Two socks are taken at random from the drawer, one after the other. What is the probability that both socks will be of the same color?
5.12. Use Formula (5.7) to find the probability that r people selected at random will all have different birthdays. Then find the probability that at least two people among the r will have the same birthday, and compare with the answer in (3.6).
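One way to carry out this computation, sketched under the usual assumption of 365 equally likely birthdays:

```python
from fractions import Fraction

# Problem 5.12 via the chain rule (5.7): the j-th person must avoid the
# j - 1 birthdays already taken.
def p_all_different(r):
    p = Fraction(1)
    for j in range(r):
        p *= Fraction(365 - j, 365)
    return p

# Probability of at least one shared birthday among 23 people:
print(float(1 - p_all_different(23)))  # about 0.507
```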
5.13. Refer to Table 14, which is a fragment from the American Men Mortality Table published in 1918 by the Actuarial Society of America.
TABLE 14

Rate of Mortality per 1000

Age at Issue              Duration of Policy in Years
of Policy           0       1       2       3       4       5
20                2.73    3.59    3.80    3.96    4.13    4.31
21                2.78    3.66    3.86    4.01    4.18    4.35
22                2.83    3.72    3.91    4.06    4.21    4.39
[For example, the entry 3.96 at age 20, duration of policy 3 years, means that 0.00396 is the probability that a person now aged 23 who was issued insurance at age 20 will die before attaining age 24. Similarly, the entry 3.91 at age 22, duration 2 years, means that 0.00391 is the probability that a person now aged 24 who was issued insurance at age 22 will die before attaining age 25.] Calculate the probability that a man now aged 21 who was issued insurance a year ago will die
(a) between ages 21 and 22, (b) between ages 22 and 23, (c) between ages 23 and 24.
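Under one consistent reading of Table 14 (the man was issued insurance at age 20, so the age-20 row at durations 1, 2, 3 applies), the computations can be sketched as follows; the arithmetic is mine, not the text's:

```python
# Age-20 row of Table 14 at policy durations 1, 2, 3, as probabilities.
q1, q2, q3 = 0.00359, 0.00380, 0.00396

p_a = q1                         # dies between ages 21 and 22
p_b = (1 - q1) * q2              # survives to 22, then dies before 23
p_c = (1 - q1) * (1 - q2) * q3   # survives to 23, then dies before 24
print(p_a, round(p_b, 5), round(p_c, 5))
```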
5.14. (a) Let n be a positive integer and define npx as the probability that a person aged x years will survive n years. Put px = 1px and show that

npx = px · px+1 ··· px+n−1.

(b) Let l0 be an arbitrary positive integer (an observed number of newborn babies) and define for all x ≥ 0,

lx = l0 · (xp0), dx = lx − lx+1.
Give interpretations for lx and dx and show that

npx = lx+n/lx.

(c) Let qx = 1 − px. Show that

qx = dx/lx.
The probability qx is known in actuarial mathematics as the rate of mortality at age x.
5.15. Let E and F be events and suppose P(F) is neither 0 nor 1. Show that if P(E|F) > P(E), then P(E|F′) < P(E). State why this result is intuitively reasonable.
5.16. Prove the following laws, in each case assuming the conditional probabilities are defined.
(a) P(F|F) = 1.

(b) P(∅|F) = 0.

(c) If E1 ⊂ E2, then P(E1|F) ≤ P(E2|F).

(d) P(E′|F) = 1 − P(E|F).

(e) P(E1 ∪ E2|F) = P(E1|F) + P(E2|F) − P(E1 ∩ E2|F).

(f)

(g) If P(F) = 1, then P(E|F) = P(E).

(h) If P(F) > 0 and E and F are mutually exclusive events, then P(E|F) = 0.
5.17. In Table 12, let the probabilities of the five simple events be a, b, c, d, and e. We know that the sum of these numbers must be 1. Also, these numbers must be consistent with the probabilities associated with the branches of the tree in Figure 12, since these probabilities are given in the statement of Example 5.5. Show that a, b, c, d, e must have the values given in Table 12.
5.18. Refer to Example 5.6 of the text.
(a) By drawing tree diagrams or otherwise, verify the probabilities given in Table 13.
(b) The tree diagram for Strategy 5 is drawn in Figure 14. Note that the tree diagram for Strategy 4 is the same diagram. But the probabilities associated with the branches of the tree are not the same. For the case x = 2, what are the branch probabilities for Strategy 4?
(c) Suppose you adopt the following strategy: You draw two balls from the urn, not replacing the first before drawing the second. If you get two red, then you guess Y will draw a red ball. Otherwise you guess Y will draw a green ball. Draw a tree diagram for this strategy and calculate the probability that you win for each possible composition of the urn.
5.19. A seller of rebuilt television tubes and a buyer get together to draw up a contract. The seller will supply tubes in lots of 100 tubes. The buyer, when a lot is offered to him, wants to protect himself against the possibility that the lot contains too many defective tubes. The contract therefore provides that out of each lot, two tubes will be selected at random and tested. The buyer considers the following alternative plans as guides for making his decision in the light of the experimental evidence.
Plan 1. If both of the tubes tested are satisfactory, then accept the whole lot. Otherwise, reject the lot.

Plan 2. If both of the tubes tested are defective, then reject the whole lot. Otherwise, accept the lot.

Plan 3. If both of the tubes tested are satisfactory, then accept the lot. If both are defective, then reject the lot. If one of the tubes is satisfactory and the other defective, then select a third tube at random from the remaining 98 tubes in the lot and accept or reject the lot according as this tube is satisfactory or defective.
Denote by E the event that the buyer accepts the lot and let x be the (unknown) number of defective tubes in a lot offered to the buyer. Clearly, P(E) depends upon both the plan adopted by the buyer and the quality of the lot as determined by the value of x.
(a) Obtain a formula for P(E) in terms of x for each of the three plans.
(b) Substitute the values x = 0, 5, 10, 20, 30, 40, 50 in the formulas obtained in (a) and make three graphs plotting (for each plan) the value of x along the horizontal axis and the value of P(E) along the vertical axis. A graph of this kind is called an operating characteristic curve (OC-curve) for a plan.
(c) Which rule is most favorable to the buyer? to the seller? (Note: these questions become more interesting and more difficult when such things as the utilities resulting from the desirable actions of accepting good lots and rejecting poor lots, the disutilities from the undesirable actions of accepting poor lots and rejecting good lots, and the costs of testing tubes are brought into the analysis.)
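The three acceptance probabilities of part (a) can be sketched as follows; the formulas are my own derivation under the sampling scheme described, not the book's answers:

```python
from math import comb

# Lot of 100 tubes, x defective; two tested at random without replacement.
def plan1(x):
    # accept iff both tested tubes are satisfactory
    return comb(100 - x, 2) / comb(100, 2)

def plan2(x):
    # reject iff both tested tubes are defective; otherwise accept
    return 1 - comb(x, 2) / comb(100, 2)

def plan3(x):
    # on a mixed result, a third tube from the remaining 98 decides;
    # 99 - x satisfactory tubes remain after removing one of each kind
    both_good = comb(100 - x, 2) / comb(100, 2)
    mixed = x * (100 - x) / comb(100, 2)
    return both_good + mixed * (99 - x) / 98

for x in (0, 5, 10, 20, 30, 40, 50):
    print(x, round(plan1(x), 3), round(plan2(x), 3), round(plan3(x), 3))
```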
Sec. 6 / BAYES' FORMULA 91

6. Bayes' formula
In this section, as an application of conditional probabilities, we derive a famous formula first used by Thomas Bayes in a paper published posthumously in 1763. To prepare the way, we make a definition and prove a preliminary result.
Definition 6.1. A partition of a set E is a set {E1, E2, ···, En} with the following properties:

(i) Ej ⊆ E (j = 1, 2, ···, n)

(ii) Ej ∩ Ek = ∅ (j = 1, 2, ···, n; k = 1, 2, ···, n; j ≠ k)

(iii) E1 ∪ E2 ∪ ··· ∪ En = E
In words, a partition of a set E is a set of subsets of E [property (i)] that are disjoint [property (ii)] and exhaustive [property (iii)]. Every element of E is a member of one and only one of the subsets in the partition.
We are already acquainted with partitions of sets. Two complementary events F and F′ form the partition {F, F′} of the sample space S. For F and F′ are certainly subsets of S, and we have F ∩ F′ = ∅ and F ∪ F′ = S, as required by Definition 6.1.
From a Venn diagram, we see immediately that {E ∩ F′, E ∩ F} is a partition of the set E, {E ∩ F′, E ∩ F, E′ ∩ F} is a partition of the set E ∪ F, and {E ∩ F′, E ∩ F, E′ ∩ F, E′ ∩ F′} is a partition of the entire sample space S.
Two more examples of partitions should suffice to make the notion clear. In the sample space S of 52 elements, each denoting one outcome of the experiment in which a card is selected from a standard deck, let Es, Eh, Ed, and Ec denote the events that the card selected is a spade, heart, diamond, and club respectively. Then {Es, Eh, Ed, Ec} is a partition of S, since the four subsets are clearly mutually exclusive in pairs and exhaustive. Another partition of S is the set of all 52 simple events of the sample space S.
Theorem 6.1. Let {E1, E2, ···, En} be a partition of the sample space S, and suppose each of the events E1, E2, ···, En has nonzero probability. Let E be any event. Then
P(E) = P(E1)P(E|E1) + P(E2)P(E|E2) + ··· + P(En)P(E|En),
or, using the summation symbol,
(6.1) P(E) = Σ_{j=1}^{n} P(Ej)P(E|Ej).
92 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
Proof. From the hypothesis that {E1, E2, ···, En} is a partition of S, it can be shown (see Problem 6.13) that {E ∩ E1, E ∩ E2, ···, E ∩ En} is a partition of the event E. Hence
E = (E ∩ E1) ∪ (E ∩ E2) ∪ ··· ∪ (E ∩ En)
expresses E as the union of n mutually exclusive events. Applying Formula (4.8) in Problem 4.19 yields the equation
(6.2) P(E) = P(E ∩ E1) + P(E ∩ E2) + ··· + P(E ∩ En).
But, directly from the definition of conditional probability, we have for j = 1, 2, ···, n,
P(E ∩ Ej) = P(Ej)P(E|Ej).
Making this substitution in (6.2) proves the theorem. Note that we have guaranteed the existence of the conditional probabilities in (6.1) by our assumption that the events E1, E2, ···, En do not have zero probability.
As the following examples show, Formula (6.1) is useful because an evaluation of the probabilities P(Ej) and conditional probabilities P(E|Ej) is often easier than a direct calculation of P(E).
Example 6.1. Freshmen account for 30%, sophomores 25%, juniors 25%, and seniors 20% of the members of a college fraternity. Fifty percent of the freshmen, 30% of the sophomores, 10% of the juniors, and 2% of the seniors are enrolled in a mathematics course. A member of the fraternity being chosen at random, what is the probability that he is enrolled in a mathematics course?
We let E denote "member selected is enrolled in a mathematics course," and E1, E2, E3, and E4 denote the events that the member selected is a freshman, sophomore, junior, and senior respectively. Then {E1, E2, E3, E4} is a partition of the sample space S consisting of all ordered pairs the first object of which identifies the class and the second object the presence or absence of the student in a mathematics course. In fact, the data of the problem specify all the probabilities needed to apply Formula (6.1) with n = 4. We find
P(E) = (.30) (.50) + (.25) (.30) + (.25) (.10) + (.20) (.02) = .254,
or slightly more than 25 percent of the fraternity are enrolled in a mathematics course.
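The arithmetic in Example 6.1 is easy to verify by machine. The sketch below (plain Python; the function name is our own, not the book's) evaluates Formula (6.1) for the fraternity data.

```python
# Total probability, Formula (6.1): P(E) = sum over j of P(Ej) * P(E|Ej).
def total_probability(priors, conditionals):
    """priors[j] = P(Ej), conditionals[j] = P(E|Ej) for a partition E1,...,En."""
    return sum(p * c for p, c in zip(priors, conditionals))

# Example 6.1: class proportions and mathematics-enrollment rates.
priors = [0.30, 0.25, 0.25, 0.20]        # freshman, sophomore, junior, senior
conditionals = [0.50, 0.30, 0.10, 0.02]  # P(math course | class)
print(round(total_probability(priors, conditionals), 3))  # 0.254
```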
Example 6.2. Find the probability that in a well-shuffled deck of cards, the ace of spades is next to the king of spades. Here, as in the preceding example, it seems sensible to break the problem up into cases, i.e., to consider first the event in which the ace of spades is the top card of the deck, then the event that it is the bottom card of the deck, and finally the remaining event in which the ace of spades is somewhere within the deck. Let E1, E2, and E3 denote these events in the order stated. We choose as sample space S the set of ordered 52-tuples denoting all possible orderings of the 52-card deck. Then {E1, E2, E3} is a partition of S. If E denotes the event that the ace and king of spades are neighboring cards, then noting that only one card is next to the top or bottom cards but two cards are next to a card within the deck, we find
P(E1) = P(E2) = 1/52,  P(E3) = 50/52 = 25/26,
P(E|E1) = P(E|E2) = 1/51,  P(E|E3) = 2/51.
Hence, by Formula (6.1) with n = 3,
P(E) = (1/52)(1/51) + (1/52)(1/51) + (25/26)(2/51) = 1/26.
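Exact rational arithmetic confirms the value 1/26; here is a short check of our own using Python's fractions module.

```python
from fractions import Fraction

# Example 6.2: partition by the position of the ace of spades.
P_top = P_bottom = Fraction(1, 52)   # ace on top / on bottom of the deck
P_inside = Fraction(50, 52)          # ace somewhere within the deck

# Conditional probability that the king of spades lands next to the ace:
adj_edge = Fraction(1, 51)           # one neighboring position at top or bottom
adj_inside = Fraction(2, 51)         # two neighboring positions inside

P_E = P_top * adj_edge + P_bottom * adj_edge + P_inside * adj_inside
print(P_E)  # 1/26
```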
From Theorem 6.1 it is only a short step to Bayes' formula.
Theorem 6.2. Let {E1, E2, ···, En} be a partition of the sample space S, and suppose each of the events E1, E2, ···, En has nonzero probability. Let E be any event for which P(E) > 0. Then for each integer k (1 ≤ k ≤ n), we have Bayes' formula:
(6.3) P(Ek|E) = P(Ek)P(E|Ek) / Σ_{j=1}^{n} P(Ej)P(E|Ej).
Proof. Applying the definition of conditional probability twice, we find
P(Ek|E) = P(E ∩ Ek)/P(E) = P(Ek)P(E|Ek)/P(E).
The theorem is proved by rewriting P(E) according to Formula (6.1). The following example illustrates the use of Bayes' formula.
Example 6.3. Suppose that the reliability of a chest X-ray test for the detection of tuberculosis is specified as follows: of people with tuberculosis, 90% of the X-ray examinations detect the disease but 10% go undetected. Of people free of tuberculosis, 99% of the X rays
are judged free of the disease, but 1% are diagnosed as showing tuberculosis. From a large population of which only 0.1% have tuberculosis, one person is selected at random, given a chest X ray, and the radiologist reports the presence of tuberculosis. What is the probability that the person has tuberculosis?
We let E1 denote the event that the person selected has tuberculosis, and E the event that the person's X ray is diagnosed as positive, i.e., as showing tuberculosis. We seek P(E1|E). Now {E1, E1'} is a partition of the sample space of all people in the population. We are given the following probabilities:
P(E1) = .001, P(E1') = .999, P(E|E1) = .9, P(E|E1') = .01.
From Bayes' formula, we find
(6.4) P(E1|E) = P(E1)P(E|E1) / [P(E1)P(E|E1) + P(E1')P(E|E1')] = (.001)(.9) / [(.001)(.9) + (.999)(.01)] = .0009/.01089 ≈ .083.
Note that although the Xray test is fairly reliable, we have found that only slightly more than 8% of those with positive X rays turn out to have tuberculosis. The results of such calculations must be taken into account when largescale medical diagnostic tests are planned.
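A small function makes the reversal of conditional probabilities explicit; this sketch of Formula (6.3) (names ours) reproduces the value .083 found above.

```python
# Bayes' formula (6.3): P(Ek|E) = P(Ek)P(E|Ek) / sum_j P(Ej)P(E|Ej).
def posterior(priors, likelihoods, k):
    numer = priors[k] * likelihoods[k]
    denom = sum(p * l for p, l in zip(priors, likelihoods))
    return numer / denom

# Example 6.3: hypotheses "has TB" (index 0) and "no TB" (index 1).
p = posterior([0.001, 0.999], [0.9, 0.01], 0)
print(round(p, 3))  # 0.083
```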
We note here the terminology often used when Bayes' formula is applied. The events E1, E2, ···, En are called hypotheses, and they are assumed to be disjoint and exhaustive. The probability P(Ek) is called the a priori probability of hypothesis Ek. The conditional probability P(Ek|E) is called the a posteriori probability of the hypothesis Ek, given the observed event E. Thus, in Example 6.3, the events E1 (person has tuberculosis) and E1' (person does not have tuberculosis) are the hypotheses. The a priori probability of a person having tuberculosis is P(E1) = .001. But the a posteriori probability of a person having tuberculosis, given that his X ray is positive, is P(E1|E) = .083.

[Figure 15: tree diagram. First branching: E1 (has TB, .001) or E1' (no TB, .999); second branching: E (positive X ray) or E' (negative X ray). Path probabilities: .00090, .00010, .00999, .98901.]
Figure 15
In Figure 15, we have the tree diagram for Example 6.3 in which a person is first classified according to whether or not he has tuberculosis and then according to whether his X ray is positive or negative. Probabilities given in the data of the example are written on the appropriate branches of the tree. To the right of each of the four possible paths from left to right through the tree we have written the probability associated with that path. For example,
P(has TB and X ray positive) = P(has TB)P(X ray positive | has TB) = (.001)(.9) = .0009
is the probability for the topmost path through the tree in Figure 15. These path probabilities can also be recorded in tabular form, as in Table 15, where the entry in each cell is the probability of the intersection of the events given by the row and column in which the cell appears. If we add the entries in any row or column, then by (6.2), we obtain the probability of the event defining that row or column. These probabilities appear in the margins of the table.
TABLE 15

               E (X ray positive)       E' (X ray negative)
E1 (has TB)    P(E1 ∩ E) = .00090       P(E1 ∩ E') = .00010      P(E1) = .001
E1' (no TB)    P(E1' ∩ E) = .00999      P(E1' ∩ E') = .98901     P(E1') = .999
               P(E) = .01089            P(E') = .98911           Total 1
From the entries in Table 15, we can derive all possible conditional probabilities. In particular,
P(E1|E) = P(E1 ∩ E)/P(E) = .00090/.01089 = .083, approximately,
in agreement with (6.4).
Computing probabilities from the Table or by Bayes' formula, we can construct the tree diagram in Figure 16 which differs from the
diagram in Figure 15 because the order of events has been reversed. In Figure 16, we think of a person first being classified according to whether his X ray is positive or negative and then according to whether or not he has tuberculosis. (Probabilities in Figure 16 are rounded to four-decimal-place accuracy.)

[Figure 16: tree diagram with the order of classification reversed. First branching: E (positive X ray, .0109) or E' (negative X ray, .9891); second branching: has TB or no TB. Path probabilities: .0009, .0100, .0001, .9890.]
Figure 16

Using the language associated with Bayes' formula, we have in Figure 15 the conditional probabilities of the possible observed events given the various hypotheses, whereas in Figure 16 we have conditional probabilities of the possible hypotheses given the various observed events.
We conclude with two more examples in which Bayes' formula proves useful.
Example 6.4. Three urns contain colored balls as specified in Table 16. One urn is chosen at random, and a ball is withdrawn. It happens to be red. What is the probability that it came from urn 2?

TABLE 16

Urn    Red    White    Blue
1      3      4        1
2      1      2        3
3      4      3        2
We let E denote the event "ball selected is red." To account for the occurrence of E we have three hypotheses: E1 (urn 1 selected), E2 (urn 2 selected), E3 (urn 3 selected). Since the urn is chosen at random,
P(E1) = P(E2) = P(E3) = 1/3.
We also are given the conditional probabilities,
P(E|E1) = 3/8,  P(E|E2) = 1/6,  P(E|E3) = 4/9.
Since {E1, E2, E3} is a partition of the sample space for this compound experiment, Bayes' formula is applicable. Putting k = 2, n = 3 in (6.3) we find
P(E2|E) = (1/3)(1/6) / [(1/3)(3/8) + (1/3)(1/6) + (1/3)(4/9)] = 12/71.
The reader may find it helpful to construct tree diagrams like those in Figures 15 and 16 for this example.
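The urn computation can likewise be checked exactly; the sketch below (our own) applies Formula (6.3) with Fractions and recovers 12/71.

```python
from fractions import Fraction

# Example 6.4: equal priors for the three urns; likelihood of red from each.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(3, 8), Fraction(1, 6), Fraction(4, 9)]  # red/total per urn

denom = sum(p * l for p, l in zip(priors, likelihoods))
posterior_urn2 = priors[1] * likelihoods[1] / denom
print(posterior_urn2)  # 12/71
```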
Example 6.5. After a severe flood, a warehouse finds itself stocked with boxes of flashbulbs from which identification labels have been washed off. There are three kinds of bulbs, each packed in units of 100 in identical boxes: low quality, medium quality, and high quality. It is known that in the entire warehouse, the proportions of boxes with low, medium, and high quality bulbs are .25, .25, and .50, respectively.
Since testing a flashbulb means destroying it, exhaustive testing of the bulbs is impractical. Instead, the distributor orders that two bulbs from each box be tested. The manufacturer, on the basis of past experience, estimates the conditional probabilities given in Table 17.
TABLE 17
Conditional Probabilities of Finding x Defectives, Given That the Two Bulbs Tested Were from a Box of Known Quality

Number of         Quality of Box
Defectives x      Low    Medium    High
0                 .49    .64       .81
1                 .42    .32       .18
2                 .09    .04       .01
Suppose two bulbs are selected from a box, tested, and both are found to fire satisfactorily. What is the probability that the box contains high quality bulbs? Our hypotheses are the three events L, M, H that the box contains low, medium, and high quality bulbs, respectively. If we let E denote the observed event that neither of the two bulbs tested was defective, then by Bayes' formula we find
P(H|E) = P(H)P(E|H) / [P(L)P(E|L) + P(M)P(E|M) + P(H)P(E|H)]
= (.50)(.81) / [(.25)(.49) + (.25)(.64) + (.50)(.81)] = .405/.6875 ≈ .59.
Proceeding in this way, we compute the a posteriori probabilities of the three hypotheses, and so obtain Table 18.
TABLE 18

Quality    A Priori       A Posteriori Probability Given That We Observe
of Box     Probability    0 Defectives    1 Defective    2 Defectives
Low        .25            .18             .38            .60
Medium     .25            .23             .29            .33
High       .50            .59             .33            .13
We see that the most probable hypothesis in the event neither of the two bulbs tested is defective is that the bulbs come from a high quality box. But if one or both of the two tested bulbs are defective, then the most probable hypothesis is that the bulbs come from a low quality box.
Calculations of this sort using Bayes' formula are quite common in statistical decision theory. The reader interested in further details can consult References 2 or 10 listed at the end of this chapter.
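Table 18 can be regenerated in a few lines. The sketch below (names ours) applies Bayes' formula once for each possible test outcome, using the priors and the Table 17 likelihoods; the printed rows agree with Table 18.

```python
# Example 6.5: a posteriori probabilities for 0, 1, or 2 observed defectives.
priors = {"low": 0.25, "medium": 0.25, "high": 0.50}
# Rows of Table 17: P(x defectives | quality) for x = 0, 1, 2.
likelihood = {"low": [0.49, 0.42, 0.09],
              "medium": [0.64, 0.32, 0.04],
              "high": [0.81, 0.18, 0.01]}

posteriors = {}
for x in range(3):
    denom = sum(priors[q] * likelihood[q][x] for q in priors)
    posteriors[x] = {q: round(priors[q] * likelihood[q][x] / denom, 2)
                     for q in priors}
    print(x, posteriors[x])
```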
PROBLEMS
6.1. From a group of four boys and two girls, first one child is selected at random and then, from the remaining five children, another child is selected at random. Find the probability that the second child selected will be a girl (a) from first principles, i.e., by defining a suitable sample space, assigning probabilities to its simple events, etc. (b) by use of Theorem 6.1.
6.2. Refer to Example 5.5 and construct a tree diagram in which the selected ball is identified first by its color and then by the urn from which it was drawn. Find probabilities associated with each branch of the tree, as well as for each path through the tree. Compare with Figure 12.
6.3. Refer to Example 6.1 of the text. Construct a tree diagram in which a fraternity member is first classified according to whether he is enrolled in a mathematics course and then according to his class. Find probabilities associated with each branch of the tree as well as for each path through the tree.
6.4. In the Polya urn model of Problem 5.10, find
(a) The probability that the first ball is green.
(b) The probability that the second ball is green.
(c) The probability that the third ball is green. [In view of the answers in (a), (b), and (c), are you willing to make a conjecture? Try proving it!]
6.5. This problem should be done from first principles and also by using Bayes' formula. Three identical boxes each contain two coins. In one box both are pennies, in one both are nickels, and in the third there is one penny and one nickel. A man chooses a box at random and takes out a coin. If the coin is a penny, what is the probability that the other coin in the box is also a penny?
6.6. (a) Bolts are made by two machines A and B, but A produces twice as many bolts as B in a given time. A is known to produce two percent defectives and B one percent defectives. A bolt is examined and found to be defective. What are the probabilities a priori and a posteriori that the bolt was produced by A?
(b) Suppose n1 to n2 is the ratio of the number of bolts produced by A to the number produced by B. Let p1 and p2 denote the proportion of defectives produced by A and B, respectively. Suppose a bolt is tested and found to be defective. Show that if n1p1 > n2p2, then the a posteriori probability that the bolt was produced by A is greater than the a posteriori probability that the bolt was produced by B.
6.7. Mr. Smith, having lived in his city many years, estimates the a priori probability that today's weather will be inclement is .2. (He thinks today will be fair with probability .8.) Mr. Smith listens to an early morning weather forecast to get some information on the day's weather. The forecaster makes one of three predictions: fair weather, inclement weather, uncertain weather. Mr. Smith has made estimates of conditional probabilities of the different predictions given the day's weather, as shown in Table 19. For example, he believes that of the fair days
TABLE 19

                  Forecast
Day's Weather     Fair    Inclement    Uncertain
Fair              .7      .2           .1
Inclement         .3      .6           .1
70% are correctly forecast, 20% are forecast as inclement and 10% as uncertain.
Suppose Mr. Smith hears the forecaster predict fair weather. What is the a posteriori probability of fair weather?
6.8. In a Tmaze, a laboratory animal is given a choice of going to the left and getting food or going to the right and receiving a mild electric shock. Before any conditioning (in trial number 1) animals are equally likely to go to the left or right in the maze. Having received food on any trial, the probabilities of going to the left and right become .6 and .4, respectively, on the following trial. Having received the electric shock on any trial, the probabilities of going to the left and right on the next trial are .8 and .2, respectively.
(a) What is the probability that the animal will turn left on trial number 2? On trial number 3?
(b) Let us denote by pn the probability that the animal will turn left on trial number n. Derive an equation relating pn and pn−1, and use this equation to find a general formula for pn in terms of p1 and n.
6.9. Refer to Problem 5.11 and suppose the selected socks are of the same color. What is the probability that they are black?
6.10. A multiple-choice test question lists five alternative answers, of which just one is correct. If a student has done his homework, then he is certain to identify the correct answer; otherwise, he chooses an answer at random. Let p denote the probability of the event E that a student does his homework, and let F be the event that he answers the question correctly.
(a) Find a formula for P(E|F) in terms of p.
(b) Show that P(E|F) ≥ P(E) for all values of p. When does the equality hold?
(c) Suppose the test lists n alternative answers of which only one is correct. Now find P(E|F) in terms of n and p, and show that if p is fixed but unequal to 0 or 1, then P(E|F) increases as n increases. Is this result reasonable?
6.11. Of the freshmen in a certain college, it is known that 40% attended private secondary schools and 60% attended public schools. The registrar reports that 30% of all students who attended private secondary schools but only 20% of those who attended public schools attain A averages in their freshman year. At the end of the year, one student is chosen at random from the freshman class and he has an A average. What is the conditional probability that the student attended public schools?
6.12. You know that urn A contains two green and one red ball and urn B contains three green and two red balls. One of these urns is selected at random, but you don't know which one is selected. You may perform one of the following experiments before guessing which urn was selected.
(i) Take one ball out of the selected urn and observe its color.
(ii) Take two balls out of the selected urn, replacing the first before drawing the second, and observe their colors.
(iii) Same as (ii), except that you do not replace the first ball before drawing the second.
Whichever experiment you choose, as soon as its outcome is known you compute the a posteriori probabilities of urns A and B being selected, given the observed outcome. You then guess the urn whose a posteriori probability is larger.
(a) For each of the three experiments, determine which urn you guess for each possible experimental outcome.
(b) For each of the three experiments, calculate the probability that you actually guess correctly which urn was selected. Which experiment leads to the highest probability of guessing correctly? [It is interesting to observe that most people, when offered a choice of one of the three experiments, prefer experiment (iii).]
6.13. Let {E1, E2, ···, En} be a partition of a sample space S and let E be any subset of S. Show that
{E ∩ E1, E ∩ E2, ···, E ∩ En}
is a partition of E.
6.14. Let a universal set U of people be given. Let M, F, C, A, H, and S denote the subsets of male, female, child, adult, healthy, and sick people respectively. Then we can form the following partitions of U:
P1 = {M, F}, P2 = {C, A}, P3 = {H, S}.
From P1 and P2 we can form a new partition, namely
P4 = {M ∩ C, M ∩ A, F ∩ C, F ∩ A},
in which people are classified both according to sex and age. P4 is called the cross-partition of P1 and P2. Analogously we can classify people according to their health in addition to sex and age. We thus are led to the partition
P5 = {M ∩ C ∩ H, M ∩ A ∩ H, F ∩ C ∩ H, F ∩ A ∩ H, M ∩ C ∩ S, M ∩ A ∩ S, F ∩ C ∩ S, F ∩ A ∩ S},
which is called the cross-partition of P3 and P4.
From these illustrative examples, formulate a reasonable definition for the cross-partition of any two partitions of an arbitrary set E. Can you prove that a cross-partition of E is a partition of E?
7. Independent events
As we have seen, the probability of E and the conditional probability of E given F are generally unequal, although they can be equal. The case of equality,
(7.1) P(E|F) = P(E),
is especially important, for (7.1) expresses the fact that knowing F has occurred does not change the probability of E having occurred. If (7.1) holds, we shall say that E is independent of F. Let us note that this relation between the events E and F is defined only if F has positive probability, i.e., only if P(E|F) is meaningful. Assuming P(E) > 0 and P(F) > 0, we rewrite (5.4) here,
(7.2) P(E ∩ F) = P(E)P(F|E) = P(F)P(E|F),
and from the equality on the right deduce that if (7.1) is true, then
(7.3) P(F|E) = P(F)
is also true. We have thereby proved the following result.
Theorem 7.1. Let E and F be events with positive probability. If E is independent of F, then F is independent of E.
In words, if knowledge that F occurs does not change the probability of E, then knowledge that E occurs does not change the probability of F. Thus, if P(E) > 0 and P(F) > 0 so that the conditional probabilities in (7.1) and (7.3) are defined, then these equations must both be true or both be false. When either is true, we find from (7.2) that
(7.4) P(E ∩ F) = P(E)P(F).
An important definition is based on equation (7.4).
Definition 7.1. Two events E and F are said to be independent events if and only if Equation (7.4) holds; i.e., the probability that both E and F occur is the product of the probability that E occurs and the probability that F occurs. Two events that are not independent are said to be dependent events. We shall refer to Equation (7.4) as the multiplication rule for the events E and F.
In the literature, one often finds two independent events referred to as "mutually independent," "stochastically independent," "independent in the sense of probability," or "statistically independent." We shall use the simpler language of Definition 7.1.
There is a reason for using Equation (7.4), rather than (7.1) or (7.3), as a means of defining independent events. With the latter equations, events E and F with zero probability would be excluded from our definition. No such restriction is involved when Equation (7.4) is used. In fact, it is easy to see from Definition 7.1 that if P(E) = 0 and F is any event, then E and F are independent. For E ∩ F is a subset of E, and so P(E ∩ F) ≤ P(E) by Theorem 4.2. Since we are assuming P(E) = 0, it follows that P(E ∩ F) = 0. Thus Equation (7.4) holds, and this proves that E and F are independent events, as claimed.
Whether or not two events E and F are independent is a question that we can answer in our present state of knowledge only by showing that Equation (7.4) does or does not hold. Although we will often have the intuitive feeling that two specified events E and F are independent, our intuition must be checked by computing P(E), P(F), and P(E ∩ F), and then verifying that the multiplication rule (7.4) is true. We give three examples.
Example 7.1. A green and a red die are rolled. Let E be the event "six on green die" and F the event "five on red die." We choose the familiar sample space containing 36 outcomes and assign probability 1/36 to each simple event. Then
P(E) = 6/36 = 1/6,  P(F) = 6/36 = 1/6,  P(E ∩ F) = 1/36.
Equation (7.4) holds and, therefore, E and F are independent events.
Example 7.2. Two fair coins are tossed. Let E be the event "not more than one head" and F the event "at least one of each face." Define the sample space S = {HH, HT, TH, TT} and assign probability 1/4 to each simple event. Then
P(E) = 3/4,  P(F) = 1/2,  and P(E ∩ F) = 1/2. Hence
P(E ∩ F) ≠ P(E)P(F),
and so E and F are dependent events.
Example 7.3. Three fair coins are tossed. Let E and F be events as described in Example 7.2. Now the sample space S contains the familiar eight elements HHH, HHT, ···, TTT. We assign probability 1/8 to each simple event of S. Then
P(E) = 1/2,  P(F) = 3/4,  P(E ∩ F) = P({HTT, THT, TTH}) = 3/8.
Now the multiplication rule (7.4) holds, and E and F are independent events.
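The coin examples (and Problem 7.3) can be settled by brute-force enumeration. This sketch of ours checks the multiplication rule for n tossed coins, with E and F as in Example 7.2; exact fractions avoid any rounding issues.

```python
from fractions import Fraction
from itertools import product

def events_independent(n):
    """Check (7.4) for E = "not more than one head" and
    F = "at least one of each face" when n fair coins are tossed."""
    space = list(product("HT", repeat=n))               # equally likely outcomes
    E = {s for s in space if s.count("H") <= 1}
    F = {s for s in space if "H" in s and "T" in s}
    N = len(space)
    return Fraction(len(E & F), N) == Fraction(len(E), N) * Fraction(len(F), N)

print([(n, events_independent(n)) for n in (2, 3, 4)])
# [(2, False), (3, True), (4, False)]
```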
It is not unusual for students to feel that the events E and F should either be independent in both Examples 7.2 and 7.3, or dependent in both. But our intuition is not to be trusted; Equation (7.4) must be relied upon to determine if two events are independent or dependent. [See Problem 7.3.]
If the probability of E is unchanged by the knowledge that F occurred, then it seems reasonable that it should also be unchanged by the knowledge that F did not occur. This and related observations are made precise in the following result.
Theorem 7.2. Let E and F be independent events. Then the following pairs of events are also independent: (i) E and F', (ii) E' and F, (iii) E' and F'.
Proof. We prove (i) and leave (ii) and (iii) for the reader. (See Problem 7.4.) In view of Definition 7.1, to prove E and F' are independent events, we must prove that the multiplication rule (7.4) holds for E and F'; i.e.,
(7.5) P(E ∩ F') = P(E)P(F').
Now, by the result of Problem 4.13(e),
P(E ∩ F') = P(E) − P(E ∩ F) = P(E) − P(E)P(F),
since the multiplication rule holds for E and F by hypothesis. Hence,
P(E ∩ F') = P(E)[1 − P(F)] = P(E)P(F'),
by Theorem 4.6. Thus, Equation (7.5) holds and the proof of (i) is complete.
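Theorem 7.2 can be illustrated numerically with the dice events of Example 7.1; the check below (our own) verifies the multiplication rule first for E and F, then for E and F'.

```python
from fractions import Fraction

# Sample space: 36 equally likely (green, red) outcomes.
space = [(g, r) for g in range(1, 7) for r in range(1, 7)]

def prob(event):
    return Fraction(len(event), len(space))

E = {s for s in space if s[0] == 6}        # six on green die
F = {s for s in space if s[1] == 5}        # five on red die
F_prime = {s for s in space if s[1] != 5}  # complement of F

assert prob(E & F) == prob(E) * prob(F)              # E, F independent
assert prob(E & F_prime) == prob(E) * prob(F_prime)  # hence E, F' independent
print("multiplication rule holds for both pairs")
```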
Example 7.4. Suppose A has probability pA of surviving one year and B has probability pB of surviving one year. If we assume that event E (A survives one year) and event F (B survives one year) are independent, then we have the following possibilities and their probabilities:

Event     Verbal Description                          Probability
E ∩ F     Both A and B survive 1 year                 pA·pB
E ∩ F'    A survives 1 year but B does not            pA(1 − pB)
E' ∩ F    A does not survive 1 year but B does        (1 − pA)pB
E' ∩ F'   Neither A nor B survives 1 year             (1 − pA)(1 − pB)
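With concrete numbers, the four rows of the table can be tabulated directly. The sketch below uses illustrative values pA = 0.8 and pB = 0.9 (our own choices, not from the text) and confirms that the four probabilities sum to 1.

```python
# Example 7.4 with illustrative (assumed) survival probabilities.
pA, pB = 0.8, 0.9

rows = {
    "both A and B survive":   pA * pB,
    "A survives, B does not": pA * (1 - pB),
    "B survives, A does not": (1 - pA) * pB,
    "neither survives":       (1 - pA) * (1 - pB),
}
for desc, p in rows.items():
    print(f"{desc}: {p:.2f}")

# The four events partition the sample space, so the probabilities sum to 1.
assert abs(sum(rows.values()) - 1.0) < 1e-9
```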
PROBLEMS
7.1. A card is drawn at random from a standard deck of 52 cards. Let E be the event that the card is a spade, F the event that the card is a deuce, and G the event that the card is a deuce or a trey. Determine which of the following pairs of events are independent.
(a) E and F (b) E and G (c) F and G.
7.2. Refer to the mortality table in Problem 5.13. Mr. Smith is now aged 21 and Mr. Jones is now aged 23. Each man was issued insurance one year ago. Assuming that the events "Smith survives to age 22" and "Jones survives to age 24" are independent, calculate the probability that at least one of the men dies within one year.
7.3. (a) Four coins are tossed. Let E and F be the events described in Example 7.2. Show that E and F are dependent.
(b) Let n coins be tossed, where n is any positive integer greater than 1. Let E and F be the events described in Example 7.2. Show that E and F are independent events if and only if n = 3.
7.4. Complete the proof of Theorem 7.2 by proving (ii) and (iii).
7.5. Consider the data in Table 20 on the smoking habits of a sample of females in the United States.
TABLE 20
Distribution of Females 18–24 Years of Age, by Current Amount of Smoking and by Income, February 1955

Percentage distribution: nonsmokers (never smoked, discontinued), occasional smokers, and regular smokers by average daily number of cigarettes (1–9, 10–20, 21–40).

              Number of    Never    Discon-    Occa-     1–9     10–20    21–40    Total
Income        Persons               tinued     sional
None          3335         64.1%    2.5%       4.1%      13.4%   15.1%    .8%      100%
Under $1000   1677         65.1     2.9        6.2       14.1    11.2     .5       100
$1000–1999    1117         64.5     3.0        4.0       14.5    11.0     3.0      100
$2000–2999    956          59.3     0.8        11.6      12.2    15.5     .6       100
$3000–        375          40.5     6.5        13.1      10.2    27.6     2.1      100
Total         7460         62.6     2.6        6.0       13.4    14.3     1.1      100
Source: Tobacco Smoking in the United States in Relation to Income, Marketing Research Report No. 189, U.S. Dept. of Agriculture, Washington D. C., July 1957, page 110.
Suppose that a single female is selected at random from the 7460 females making up this sample.
(a) Define the sample space S for this experiment. What probability is to be assigned to each simple event of S?
(b) Find the probability that the female selected has an income of at least $3000.
(c) Find the probability that the female selected smokes between 10 and 20 cigarettes per day on the average.
(d) Find the conditional probability that the female selected smokes between 10 and 20 cigarettes per day on the average, given that she has an income of at least $3000.
(e) Find the probability that the female selected has an income of at least $3000 and also smokes between 10 and 20 cigarettes per day on the average.
(f) Are the events "female selected has an income of at least $3000" and "female selected smokes between 10 and 20 cigarettes daily on the average" dependent or independent events?
7.6. One student is selected at random from the summer school students of Problem 5.6. Are the events "student failed Chemistry" and "student failed History" independent or dependent?
7.7. From a pack of playing cards, two cards are drawn successively, the first being replaced before the second is drawn. Let E be the event "first card is a spade," F the event "second card is not a king," and G the event "first card is an ace or a king." Determine which (if any) of the three pairs of events E and F, F and G, E and G are independent.
7.8. Repeat Problem 7.7, but now assume that the first card is not replaced before the second is drawn.
7.9. Show that if E is any event and P(F) = 1, then E and F are independent.
7.10. Of what events E can it be said that the events E and E are independent?
7.11. Let E, F, and G be three events. We are told that E and F are independent events, and that F and G are independent events. Does it follow that E and G are independent events? Defend your answer.
7.12. Of the three events E, F, and G, we know that E and F are independent and G C E. Does it follow that G and F are independent? Defend your answer.
7.13. The 19571958 Combined Membership List of the American Mathematical Society (S), the Mathematical Association of America (A), and
the Society for Industrial and Applied Mathematics (I) gives the following information for the 46 members listed on page 1.
Memberships Number
S only 16
A only 15
I only 7
S and A 6
A and I 1
S and I 0
S and A and I 1
One person is selected at random from this group of 46 people.
(a) Show that the events "person belongs to the American Mathematical Society" and "person belongs to the Mathematical Association of America" are dependent.
(b) Assuming everyone else maintains their memberships, how many of the 16 members of only the American Mathematical Society must also become members of the Mathematical Association of America in order that the events in (a) be independent?
7.14. Two partitions of S, say {E1, E2, ···, En} and {F1, F2, ···, Fm}, are defined to be independent if
P(Ei ∩ Fj) = P(Ei)P(Fj)
for i = 1, 2, ···, n and j = 1, 2, ···, m, i.e., if the multiplication rule (7.4) holds for every pair of events formed by taking one event from each partition. Show that the events E and F are independent if and only if the partitions {E, E'} and {F, F'} are independent.
8. Independence of several events
In this section, we generalize the notion of independence to an arbitrary (but finite) number of events. Let us first consider the special case of three events E1, E2, and E3.
Definition 8.1 . The events EI, EI, and Es are pairwise independent (or independent in pairs) if all of the possible pairs of events (i.e., EI and EI, EI and 25*, E% and Ez) are independent.
Thus, if E1, E2, and E3 are pairwise independent, then the multiplication rule (7.4) holds for each pair of events:
(8.1)  P(E1 ∩ E2) = P(E1)P(E2),  P(E1 ∩ E3) = P(E1)P(E3),  P(E2 ∩ E3) = P(E2)P(E3).
Let the reader note that we have defined what we mean by the pairwise independence of three events, but we have not yet said what is meant by the phrase, "E1, E2, and E3 are independent events." Nevertheless, we do think of reasonable consequences that such a definition should entail. For example, we would like to be able to show that if E1, E2, and E3 are independent events, then the two events (E1 ∩ E2) and E3 are also independent. Does this result follow if we assume only that E1, E2, and E3 are pairwise independent? This question amounts to asking if the equations in (8.1) imply that the multiplication rule holds for (E1 ∩ E2) and E3, i.e., whether or not
(8.2)  P((E1 ∩ E2) ∩ E3) = P(E1 ∩ E2)P(E3)
follows from (8.1).
But (E1 ∩ E2) ∩ E3 = E1 ∩ E2 ∩ E3, and if we use (8.1) to simplify P(E1 ∩ E2), then (8.2) becomes
(8.3)  P(E1 ∩ E2 ∩ E3) = P(E1)P(E2)P(E3).
We are thus led to inquire whether Equations (8.1) imply Equation (8.3). That the assumption of pairwise independence of E1, E2, and E3 does not imply the desirable consequence that (E1 ∩ E2) and E3 are independent is shown by the following example, in which three events are defined for which (8.1) is true but (8.3) is false.
Example 8.1. To control the quality of a manufacturing process, each unit produced passes through three inspections. Of four units A, B, C, and D it is known that A passed only inspection 1, B passed only inspection 2, C passed only inspection 3, and D passed all 3 inspections. One of the four units is selected at random. Let E1 = "unit passed inspection 1," E2 = "unit passed inspection 2," and E3 = "unit passed inspection 3." Then
P(E1) = P(E2) = P(E3) = 1/2,
P(E1 ∩ E2) = P(E1 ∩ E3) = P(E2 ∩ E3) = 1/4,
so that all three equations in (8.1) hold. Thus the events E1, E2, and E3 are pairwise independent. But (8.3) does not hold, since
P(E1 ∩ E2 ∩ E3) = 1/4, whereas P(E1)P(E2)P(E3) = 1/8.
We conclude that the pairwise independence of E1, E2, and E3 does not imply the independence of (E1 ∩ E2) and E3.
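The counterexample of Example 8.1 is easy to check by direct enumeration. The following sketch (in Python, our choice and of course not part of the text) uses exact fractions to confirm that the three equations in (8.1) hold while (8.3) fails:

```python
from fractions import Fraction
from itertools import combinations

# Sample space: the four units, each selected with probability 1/4.
units = ["A", "B", "C", "D"]
p = {u: Fraction(1, 4) for u in units}

# E[i] = "unit passed inspection i" (A passed only 1, ..., D passed all 3).
E = {1: {"A", "D"}, 2: {"B", "D"}, 3: {"C", "D"}}

def prob(event):
    return sum(p[u] for u in event)

# Equations (8.1): the multiplication rule holds for every pair ...
pairwise = all(prob(E[i] & E[j]) == prob(E[i]) * prob(E[j])
               for i, j in combinations([1, 2, 3], 2))

# ... but Equation (8.3) fails for all three events together.
triple = prob(E[1] & E[2] & E[3]) == prob(E[1]) * prob(E[2]) * prob(E[3])

print(pairwise, triple)  # True False
```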
From this example (see also Problems 8.1-8.3), it is clear that the definition of independence for more than two events requires care. For three events, a suitable definition is obtained by demanding that Equation (8.3) hold in addition to the three equations in (8.1). We shall find it convenient to refer to Equation (8.3) as the multiplication rule for the events E1, E2, and E3.
Definition 8.2. Three events E1, E2, and E3 are said to be independent if and only if the multiplication rule holds for all combinations of two or more of the events.
The three equations in (8.1) express the multiplication rule for the three pairs of events obtainable from E1, E2, and E3. Equation (8.3) is the multiplication rule for all three events. To say that E1, E2, and E3 are independent is to say that all four of these equations are true.
It is now possible to prove that certain expected consequences do indeed follow from Definition 8.2.
Theorem 8.1. Let E1, E2, and E3 be independent events. Then the following events are also independent:
(a) E1 and (E2 ∩ E3)    (b) E1 and (E2 ∪ E3)
(c) E1' and (E2 ∩ E3')    (d) E1', E2', and E3'
More generally, E1 and any event expressible in terms of E2 and E3 are independent, E2 and any event expressible in terms of E1 and E3 are independent, etc.
Proof. We prove (c) here, and leave (a), (b), and (d) for the reader. The general result can be proved by considering these and all other similar combinations of the three events E1, E2, and E3.
To prove (c) requires (by Definition 7.1) that we prove
(8.4)  P(E1' ∩ (E2 ∩ E3')) = P(E1')P(E2 ∩ E3').
We are given that E1, E2, and E3 are independent, so that Equations (8.1) and (8.3) are true by hypothesis.
By drawing an appropriate Venn diagram and resorting to the fundamental definition of the probability of an event, let the reader verify that
(8.5)  P(E1' ∩ (E2 ∩ E3')) = P(E2 ∩ E3') − P(E1 ∩ E2) + P(E1 ∩ E2 ∩ E3).

(Using the "numbers of flags" language introduced in Section 4, the proof of (8.5) amounts to noting that one obtains the sum of the numbers on all flags in E1' ∩ E2 ∩ E3', each counted once, by writing the sum of the numbers on all flags in E2 ∩ E3' and in E1 ∩ E2 ∩ E3, and then subtracting the numbers that have been added twice, their sum being P(E1 ∩ E2).) By using Equations (8.1) and (8.3), we find
(8.6)  P(E1' ∩ (E2 ∩ E3'))
= P(E2 ∩ E3') − P(E1)P(E2) + P(E1)P(E2)P(E3)
= P(E2 ∩ E3') − P(E1)P(E2)[1 − P(E3)]
= P(E2 ∩ E3') − P(E1)P(E2)P(E3').
But since E2 and E3 are independent by hypothesis, it follows from Theorem 7.2 that E2 and E3' are also independent, and so the multiplication rule holds for E2 and E3'. Hence, continuing from (8.6),
P(E1' ∩ (E2 ∩ E3')) = P(E2 ∩ E3') − P(E1)P(E2 ∩ E3')
= [1 − P(E1)]P(E2 ∩ E3')
= P(E1')P(E2 ∩ E3').
This completes the proof of part (c) of Theorem 8.1. The following example shows how this theorem is applied.
Example 8.2. One shot is fired from each of three guns. Let E1, E2, E3 denote the events that the target is hit by the first, second, and third gun, respectively. Suppose
P(E1) = 0.5,  P(E2) = 0.6,  P(E3) = 0.8.
Assuming E1, E2, E3 are independent events, what is the probability that exactly one hit is registered?
Since the one hit can be made by any gun, the required probability is given by
P(E1 ∩ E2' ∩ E3') + P(E1' ∩ E2 ∩ E3') + P(E1' ∩ E2' ∩ E3).
By the independence assumption and Theorem 8.1, each of these probabilities can easily be evaluated. For example,
P(E1 ∩ E2' ∩ E3') = P(E1)P(E2')P(E3')
= (0.5)(0.4)(0.2) = 0.04.
In this way, we find the probability of exactly one hit is 0.26. (See Problem 8.5.)
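The computation in Example 8.2 can be organized by enumerating all hit-or-miss patterns. The following Python sketch (our illustration, not the book's) sums the probabilities of the patterns with exactly one hit:

```python
import math
from itertools import product

# Hit probabilities for the three guns, as in Example 8.2.
p = [0.5, 0.6, 0.8]

# By independence (Theorem 8.1), each pattern's probability is a product
# of P(Ei) for the guns that hit and P(Ei') = 1 - P(Ei) for those that miss.
prob_exactly_one = sum(
    math.prod(pi if hit else 1 - pi for pi, hit in zip(p, hits))
    for hits in product([True, False], repeat=3)
    if sum(hits) == 1
)
print(round(prob_exactly_one, 2))  # 0.26
```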
Definition 8.2 has been so formulated that it can be used as a definition of independence for any finite number of events.
Definition 8.3. The n events E1, E2, · · ·, En (n ≥ 2) are said to be independent if and only if the multiplication rule holds for all combinations of two or more of the events, i.e., if and only if we have
(8.7)  P(Ei ∩ Ej) = P(Ei)P(Ej)    (1 ≤ i < j ≤ n)
       P(Ei ∩ Ej ∩ Ek) = P(Ei)P(Ej)P(Ek)    (1 ≤ i < j < k ≤ n)
       P(Ei ∩ Ej ∩ Ek ∩ El) = P(Ei)P(Ej)P(Ek)P(El)    (1 ≤ i < j < k < l ≤ n)
       · · · · · · · · ·
       P(E1 ∩ E2 ∩ · · · ∩ En) = P(E1)P(E2) · · · P(En).
How many defining conditions must be checked if n events are to be proved independent? Let us think of the set U containing as elements the n events E1, E2, · · ·, En. The set U has exactly 2^n subsets. The multiplication rule is required to hold for all subsets of U containing at least two events. There is one null subset and there are n unit subsets for which multiplication rules are not required. Hence, there are 2^n − n − 1 equations summarized in (8.7).
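The count 2^n − n − 1 can be checked by brute force. A small Python sketch (our own check, not from the text) counts the subsets of size two or more directly:

```python
from itertools import combinations

# Count the multiplication-rule equations in (8.7): one for every
# subset of {E1, ..., En} containing at least two events.
def num_conditions(n):
    return sum(1 for size in range(2, n + 1)
               for _ in combinations(range(n), size))

for n in range(2, 8):
    assert num_conditions(n) == 2**n - n - 1

print(num_conditions(3))  # 4 equations: the three in (8.1) plus (8.3)
```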
Let us observe that Definition 8.3 implies that if n events are independent, then any smaller number of events taken from these n are also independent.
In our later work, we shall find many applications of the important idea of independence of events. Most of these involve "independent trials of an experiment" or "experiments repeated independently under identical conditions." In the next section, we present a mathematical formulation of these important concepts.
PROBLEMS
8.1. A green and a red die are rolled. Let E1 = "6 on red die," E2 = "6 on green die," and E3 = "sum of numbers on two dice is odd." Show that E1, E2, and E3 are pairwise independent but are not independent.
8.2. A card is selected at random from a standard deck. Let E1 = "card is a spade or a club," E2 = "card is a spade," and E3 = "card is ace, king, · · ·, 8 of diamonds or the ace of spades." Show that Equation (8.3) holds, but that none of the three equations in (8.1) is true.
8.3. Suppose of three events E1, E2, and E3 it is known that E1 and E2 are independent and that the multiplication rule given in Equation (8.3) applies to all three events. Prove that E1 ∩ E2 and E3 are independent, but show by example that E1 and E3 need not be independent.
8.4. A coin is tossed three times in succession. We define the customary sample space
S = {HHH, HHT, · · ·, TTT},
and let E1, E2, and E3 denote the events that a head is tossed on the first, second, and third toss, respectively. Suppose we require
P(E1) = P(E2) = P(E3) = p,
where 0 < p < 1, and also that E1, E2, and E3 are independent. Show that there is one and only one acceptable assignment of probabilities to the simple events of S that is consistent with these assumptions.
8.5. Refer to Example 8.2 and find the most probable number of times that the target is hit.
8.6. Let all pairs of events from E1, E2, and E3 be mutually exclusive.
(a) Are E1, E2, and E3 pairwise independent?
(b) What additional hypotheses are required to make "yes" the correct answer in (a)?
(c) With the hypotheses added in (b), are the events E1, E2, and E3 independent?
8.7. Let E1 and E2 be independent events and suppose E3 has probability zero or one. Show that E1, E2, and E3 are independent events.
8.8. Complete the proof of Theorem 8.1 by proving (a), (b), and (d).
8.9. Suppose the events E1, E2, · · ·, En are independent and that P(Ek) = 1/(k + 1) for 1 ≤ k ≤ n. Find the probability that none of the n events occurs, justifying each step in your calculation.
8.10. Let p be the probability that a man aged x will die in a year. There are four men (A, B, C, and D) each aged x. We assume that events E1, E2, E3, E4 are independent whenever E1 is defined in terms of only A's life, E2 in terms of only B's life, etc. Find the probability that
(a) A will die within the year.
(b) A and B will die but C and D will not die in the year.
(c) Only A will die within the year.
8.11. The president of a company must decide which of two actions to take, say whether to rent or buy expensive machinery. His vice-president is likely to make a faulty analysis and thus recommend the wrong decision with probability .05. The president hires two consultants who separately study the problem and make their recommendations. After watching them at work, the president estimates that one consultant is likely to recommend the wrong decision with probability .05, the other with probability .10. He decides to take the action recommended by a majority of the three reports he receives. What is the probability that he will make a wrong decision? Does the assumption of independence you have made seem reasonable for this problem?
9. Independent trials
The notion of "experiments repeated independently under identical conditions" is central to empirical science and, as such, is worthy of precise formulation. This we do in the present section. Since we shall make extensive use of Cartesian product sets, the reader may find it helpful to review Section 5 of Chapter 1 at this time.
Suppose an experiment is under consideration. As we know, we think instead of its mathematical counterpart, the sample space S, where
(9.1)  S = {o1, o2, · · ·, on}.
We assume that an acceptable assignment of probabilities has been made to the simple events of S; i.e., to each {oj} there is assigned a nonnegative number P({oj}) in such a way that
(9.2)  Σ_{j=1}^{n} P({oj}) = 1.
Now let us think of performing this experiment and then performing it again. The succession of two experiments is a new experiment that we want to describe mathematically. In order to avoid confusing references to original experiments and this new experiment, it is convenient to refer to the original experiments as trials and to describe the new experiment as made up of two trials, each represented by (or corresponding to) the sample space S. This new experiment is mathematically defined, as are all experiments, by a sample space. The elements (outcomes) of this new sample space are all the ordered pairs (oj, ok) denoting the occurrence of outcome oj at the first trial and outcome ok at the second trial. Thus the sample space for the
experiment is the Cartesian product set S × S. Since the sample space S for each of the two trials making up the experiment has n elements, there are n² ordered pairs in S × S.
Before probability questions can be answered for the experiment, we must make some acceptable assignment of probabilities to the n² simple events of S × S; i.e., we must assign a nonnegative number to {(oj, ok)} for each j and k in such a way that the sum of all n² numbers is 1. As we know, there are infinitely many ways of doing this. But if we say that the two trials are independent, then by definition there is one and only one way that we must use: the assignment must be made so that
(9.3)  P({(oj, ok)}) = P({oj})P({ok})
for j = 1, 2, · · ·, n and k = 1, 2, · · ·, n.
Formula (9.3) expresses the probability of the simple event {(oj, ok)} of S × S as the product of the probabilities of the simple events {oj} and {ok} of S. Before discussing the significance of this rule, we first demonstrate that (9.3) provides an acceptable assignment of probabilities to the simple events of S × S.
The number P({(oj, ok)}) is certainly nonnegative, since it is the product of two nonnegative numbers. Now to find the sum of the probabilities of all simple events of S × S, we first write them in rows and columns as follows:
P({(o1, o1)})  P({(o1, o2)})  · · ·  P({(o1, on)})
P({(o2, o1)})  P({(o2, o2)})  · · ·  P({(o2, on)})
· · · · · · · · ·
P({(on, o1)})  P({(on, o2)})  · · ·  P({(on, on)})
The sum of the probabilities in the first column is
Σ_{j=1}^{n} P({(oj, o1)}) = Σ_{j=1}^{n} P({oj})P({o1}), by (9.3),
= P({o1}), by (9.2).
Similarly, the sum of the probabilities in the kth column is P({ok}) for k = 1, 2, · · ·, n. The sum of the probabilities of all n² simple events is the sum of the column totals,
Σ_{k=1}^{n} P({ok}) = 1,
and the assignment specified by (9.3) is acceptable, as claimed. We summarize in the following formal definition.
Definition 9.1. Let S be a sample space with elements o1, o2, · · ·, on and let P({oj}) be the probability of the simple event {oj} for j = 1, 2, · · ·, n. By the experiment consisting of two independent trials corresponding to S, we mean the sample space S × S (the Cartesian product set of S with itself) whose elements are the n² ordered pairs (oj, ok) and whose simple events {(oj, ok)} are assigned probabilities in accordance with the product rule in Equation (9.3).
Example 9.1. A fair coin is tossed once and then tossed again. Each toss is a trial represented by the sample space S = {H, T} whose two simple events are each assigned probability 1/2. The experiment made up of the two trials is defined by the sample space S × S, where
S × S = {HH, HT, TH, TT}.
There are infinitely many acceptable assignments of probabilities to the four simple events of S × S (see the discussion in Example 3.7), but if the two tosses are said to be independent, then each simple event must be assigned probability 1/4, in accordance with Equation (9.3). We remind the reader of our agreement to write HH rather than (H, H), HT rather than (H, T), etc. It is customary to write ordered pairs using parentheses with the two objects of the pair separated by a comma, but when no confusion can arise we shall continue to use the less cumbersome notation.
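The product assignment of Equation (9.3) is easy to carry out mechanically. A brief Python sketch (our illustration, not the book's) builds S × S for the two coin tosses and verifies that the resulting assignment is acceptable:

```python
from fractions import Fraction
from itertools import product

# One trial: the sample space S = {H, T} for a fair coin.
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Two independent trials: assign each ordered pair the product of the
# single-trial probabilities, as Equation (9.3) requires.
P2 = {(a, b): P[a] * P[b] for a, b in product(P, P)}

print(P2[("H", "T")])    # 1/4
print(sum(P2.values()))  # 1, so the assignment is acceptable
```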
Example 9.2. From a population of n people, one person is selected at random. Another person is then selected at random from the full group; i.e., we allow the same person to be selected at both trials. Each selection (trial) is defined by the sample space S = {1, 2, · · ·, n}, where each person is identified by a positive integer. Each of the n simple events of S is assigned probability 1/n; i.e., P({j}) = 1/n for j = 1, 2, · · ·, n. The experiment made up of these two trials is called selecting a sample of two with replacement from the population and is represented by the Cartesian product set S × S given by
S × S = {(j, k) : j, k = 1, 2, · · ·, n}.
To say that the two trials (i.e., the selection of the first person and the selection of the second person) are independent is to require the assignment of probabilities to the simple events of S × S in accordance with Equation (9.3). Since
P({j})P({k}) = (1/n)(1/n) = 1/n²,
the independence of the trials means that each simple event of S × S is assigned the same probability 1/n². Thus we have the formal mathematical counterpart of our intuitive feeling that selecting a random sample of size two with replacement can be considered as a succession of two independent selections. (See Problem 9.9.)
Example 9.3. Consider the experiment consisting of two independent rolls of a fair die. Since each roll corresponds to the sample space S = {1, 2, · · ·, 6}, each simple event of which is assigned probability 1/6, Definition 9.1 demands that the experiment be defined by the familiar sample space
S × S = {(1, 1), (1, 2), · · ·, (1, 6), · · ·, (6, 1), (6, 2), · · ·, (6, 6)}
for which each simple event has probability 1/36. Let E1 = "first roll results in a 6" and E2 = "second roll results in an even number." We intuitively expect that the independence of the trials will have as a consequence that E1 and E2 are independent events. That this is the case is easy to verify, since
P(E1) = 1/6, P(E2) = 1/2, and P(E1 ∩ E2) = 3/36 = 1/12, so that the multiplication rule
P(E1 ∩ E2) = P(E1)P(E2)
does hold, and the events E1 and E2 are independent, as expected.
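This verification can also be done by enumerating the 36 ordered pairs. The following Python sketch (ours, not the book's) checks the multiplication rule for E1 and E2 with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Two independent rolls of a fair die: 36 equally likely ordered pairs.
S = range(1, 7)
space = list(product(S, S))
p = Fraction(1, 36)

E1 = {w for w in space if w[0] == 6}      # first roll results in a 6
E2 = {w for w in space if w[1] % 2 == 0}  # second roll is even

def P(E):
    return len(E) * p

# The multiplication rule holds, so E1 and E2 are independent.
assert P(E1 & E2) == P(E1) * P(E2)
print(P(E1), P(E2), P(E1 & E2))  # 1/6 1/2 1/12
```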
We feel this result is reasonable because of the special nature of the events E1 and E2. Of the two independent rolls of the die, the first roll determines whether or not E1 occurs, and the second roll determines whether or not E2 occurs. More generally, given any two independent trials, it seems reasonable to expect that, if the first trial determines whether or not an event E1 occurs and the second trial determines whether or not an event E2 occurs, then E1 and E2 will be independent events. We want to prove this result is generally true.
But first we must define precisely what we mean when we say that
the first trial (or the second trial) determines whether an event E occurs. Let us refer to Example 9.3 and note that, in the experiment consisting of two independent rolls of a fair die, we have
E1 = "first roll results in 6"
= {(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
= {6} × {1, 2, 3, 4, 5, 6}
= {6} × S.
Similarly,
E2 = "second roll results in an even number"
= {(1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 2), (1, 4), (2, 4), (3, 4),
(4, 4), (5, 4), (6, 4), (1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6)}
= {1, 2, 3, 4, 5, 6} × {2, 4, 6}
= S × {2, 4, 6}.
Generally, our event E is of course a subset of S X S and, as such, is a set of ordered pairs. To say that E is determined by the first trial means that the first member of the ordered pair is restricted by the requirement that E occurs, but the second member is unrestricted. Similarly, to say that E is determined by the second trial means that the first member of the ordered pair is unrestricted, but the second member is restricted by the condition that E occurs. We make the following formal definition.
Definition 9.2. Consider an experiment consisting of two independent trials, each trial defined by the sample space S. (The two-trial experiment therefore has as sample space the Cartesian product set S × S.) An event E (subset of S × S) is said to be determined by the first trial if and only if there is some subset C1 of S such that
E = C1 × S.
Similarly, an event F is said to be determined by the second trial if and only if there is some subset C2 of S such that
F = S × C2.
We are now able to state and prove our main result.
Theorem 9.1. Consider the experiment consisting of two independent trials corresponding to the sample space
S = {o1, o2, · · ·, on}.
Let E1 and E2 be any two events of S × S such that E1 is determined
by the first trial and E2 is determined by the second trial. Then E1 and E2 are independent events.
The following lemma is the essential result needed to prove Theorem 9.1.
Lemma. Let S × S be the sample space defining an experiment consisting of two independent trials, each trial corresponding to the sample space S. Let C1 ⊂ S and C2 ⊂ S. Then
(9.4)  P(C1 × C2) = P(C1)P(C2).
Proof of Lemma. Let the reader first note that (9.4) reduces to (9.3) in the special case when C1 and C2 are simple events of S. And, of course, (9.4) is also true if C1 or C2 is the null event. Now let us suppose that the subsets C1 and C2 are given as follows:
C1 = {o_{j1}, o_{j2}, · · ·, o_{jr}},  C2 = {o_{k1}, o_{k2}, · · ·, o_{ks}}.
The event C1 × C2 is the union of all those simple events {(oj, ok)} of S × S for which oj ∈ C1 and ok ∈ C2. We arrange these events in r rows and s columns and write
C1 × C2 = {(o_{j1}, o_{k1})} ∪ {(o_{j1}, o_{k2})} ∪ · · · ∪ {(o_{j1}, o_{ks})} ∪
· · ·
{(o_{jr}, o_{k1})} ∪ {(o_{jr}, o_{k2})} ∪ · · · ∪ {(o_{jr}, o_{ks})}.
Now to compute P(C1 × C2) we must add the probabilities of all these simple events. If we add the probabilities of the simple events in the first row, we find from the assumed independence of the trials,
Σ_{v=1}^{s} P({(o_{j1}, o_{kv})}) = Σ_{v=1}^{s} P({o_{j1}})P({o_{kv}}) = P({o_{j1}})P(C2),
the last equality following from the definition of P(C2) as the sum of the probabilities of the simple events whose union is C2. The sum of the probabilities in any other row is obtained in the same way, the sum for the uth row (u = 1, 2, · · ·, r) being P({o_{ju}})P(C2). We obtain the sum of the probabilities of all simple events of C1 × C2 by adding these row sums. Thus,
P(C1 × C2) = Σ_{u=1}^{r} P({o_{ju}})P(C2) = P(C1)P(C2),
and the proof of the lemma is complete.
Proof of Theorem 9.1. Our hypothesis concerning E1 and E2, in view of Definition 9.2, implies the existence of sets C1 and C2, each a subset of S, such that
E1 = C1 × S,  E2 = S × C2.
To prove the theorem, it suffices to prove that the multiplication rule holds for E1 and E2, i.e.,
(9.5)  P(E1 ∩ E2) = P(E1)P(E2).
We first note (cf. Problem 1.5.5) that
(9.6)  E1 ∩ E2 = (C1 × S) ∩ (S × C2) = C1 × C2.
Now apply the lemma three times to obtain
(9.7)  P(E1 ∩ E2) = P(C1 × C2) = P(C1)P(C2),
(9.8)  P(E1) = P(C1 × S) = P(C1)P(S) = P(C1),
(9.9)  P(E2) = P(S × C2) = P(S)P(C2) = P(C2),
where we have used the fact that P(S) = 1. Thus we see that the multiplication rule (9.5) holds, and so E1 and E2 are independent. The proof of Theorem 9.1 is now complete.
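The lemma can be illustrated numerically. The sketch below (in Python, with an illustrative non-uniform distribution that is not taken from the text) computes P(C1 × C2) by summing simple-event probabilities of S × S and compares the result with P(C1)P(C2):

```python
from fractions import Fraction
from itertools import product

# A non-uniform single-trial distribution on S = {o1, o2, o3}
# (the values are illustrative only).
P = {"o1": Fraction(1, 2), "o2": Fraction(1, 3), "o3": Fraction(1, 6)}

# Two independent trials: the product rule (9.3) on S x S.
P2 = {(a, b): P[a] * P[b] for a, b in product(P, P)}

def prob_product(C1, C2):
    # P(C1 x C2), computed by adding simple-event probabilities in S x S
    return sum(P2[(a, b)] for a, b in product(C1, C2))

C1, C2 = {"o1", "o2"}, {"o2", "o3"}
lhs = prob_product(C1, C2)
rhs = sum(P[a] for a in C1) * sum(P[b] for b in C2)
assert lhs == rhs  # the lemma: P(C1 x C2) = P(C1)P(C2)
print(lhs)  # 5/12
```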
It is important to note the significance of our results. Referring back to the dice-rolling experiment in Example 9.3, we recall that the phrase "first roll results in 6" was interpreted as the description of the event E1 = {6} × S, a subset of S × S. But this phrase also describes an event, namely {6}, which is a subset of S. In general, the event E1 = C1 × S and the event C1, although events of different sample spaces, are both determined by the same first trial of the experiment. We expect that these events should therefore have the same probability. Formulas (9.8) and (9.9) guarantee that such expectations are realized.
Note also that when E1 and E2 satisfy the hypotheses of Theorem 9.1, then, as shown by (9.7), we can calculate P(E1 ∩ E2) as a product of probabilities of events (subsets) of the sample space S, and we do
not have to do any computations relative to the sample space S × S of the two-trial experiment.
Our definitions and results can be generalized to any finite number of repetitions of the same experiment or, still more generally, to any number of successive experiments whether like or unlike. We omit proofs.
Definition 9.3. Suppose N is a positive integer and let Sj (for j = 1, 2, · · ·, N) be a sample space with outcomes o1^(j), o2^(j), · · ·, o_{nj}^(j). By the experiment consisting of the succession of N trials, the first corresponding to S1, the second to S2, etc., we mean the sample space S1 × S2 × · · · × SN (the Cartesian product set of S1, S2, · · ·, SN) whose elements are all the n1n2 · · · nN ordered N-tuples (o^(1), o^(2), · · ·, o^(N)) where o^(1) ∈ S1, o^(2) ∈ S2, · · ·, o^(N) ∈ SN. For each sample space Sj let there be an acceptable assignment of probabilities to its simple events. To say that the N trials are independent is to define the probabilities of all simple events in S1 × S2 × · · · × SN by the product rule
P({(o^(1), o^(2), · · ·, o^(N))}) = P({o^(1)})P({o^(2)}) · · · P({o^(N)}).
Theorem 9.2. Consider the experiment consisting of N independent successive trials corresponding to the sample spaces S1, S2, · · ·, SN, in that order. Let events E1, E2, · · ·, EN be such that Ej is determined by the jth trial for j = 1, 2, · · ·, N. [To say, for example, that E1 is determined by the first trial means that there is a subset C1 of S1 such that
E1 = C1 × S2 × S3 × · · · × SN;
to say that E2 is determined by the second trial means that there is a subset C2 of S2 such that
E2 = S1 × C2 × S3 × · · · × SN; and so on.] Then the events E1, E2, · · ·, EN are independent.
If N = 2 and S1 = S2 = S, then Definition 9.3 and Theorem 9.2 reduce to Definition 9.1 and Theorem 9.1, respectively.
Example 9.4. A fair coin is tossed, then a symmetric die is rolled, and finally a card is selected at random from a standard deck. We assume that these three trials are independent. Introducing obvious notation, we note that the event "head" is determined by the first trial, "even number" is determined by the second trial, and "spade"
is determined by the third trial. Applying Theorem 9.2, we conclude that "head," "even number," and "spade" are independent events. Hence
P(head, even number, spade) = P(head)P(even number)P(spade)
= (1/2)(1/2)(1/4) = 1/16.
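The arithmetic of Example 9.4 can be checked with exact fractions; the following Python sketch (our own, not the book's) does so:

```python
from fractions import Fraction

# Example 9.4 as arithmetic: three independent unlike trials.
p_head = Fraction(1, 2)     # fair coin
p_even = Fraction(3, 6)     # symmetric die: even faces are {2, 4, 6}
p_spade = Fraction(13, 52)  # standard deck: 13 spades among 52 cards

answer = p_head * p_even * p_spade
print(answer)  # 1/16
```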
Example 9.5. A quiz has four questions of multiple-choice type. There are three possible answers for each question, but only one answer is right. Assuming a student guesses at random for his answer to each question and that his successive guesses are independent, what is the probability that he gets more right than wrong answers?
The sample space for each trial (answering a question) is S = {R, W}, where R denotes a right answer, W a wrong answer. We are given that
P({R}) = 1/3,  P({W}) = 2/3.
For the four-question test, the sample space is S × S × S × S. The event "3 or 4 right" is the subset
{RRRR, RRRW, RRWR, RWRR, WRRR}.
Because the trials are independent,
P({RRRR}) = (1/3)^4,  P({RRRW}) = (1/3)^3(2/3),
and the probability of each of the other simple events for which exactly three answers are right is also (1/3)^3(2/3). Hence
P(3 or 4 right) = (1/3)^4 + 4(1/3)^3(2/3) = 9/81 = 1/9.
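The answer 1/9 can be confirmed by enumerating all 16 outcomes of S × S × S × S. The following Python sketch (our illustration) does so with exact fractions:

```python
from fractions import Fraction
from itertools import product
from math import prod

# Example 9.5: four independent guesses, each right with probability 1/3.
pR, pW = Fraction(1, 3), Fraction(2, 3)

# Enumerate S x S x S x S and add up the outcomes with 3 or 4 rights.
total = sum(
    prod(pR if c == "R" else pW for c in outcome)
    for outcome in product("RW", repeat=4)
    if outcome.count("R") >= 3
)
print(total)  # 1/9
```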
PROBLEMS
9.1. (a) A fair coin is tossed three independent times. Determine a suitable sample space for this three-trial experiment and make the required assignment of probabilities to its simple events.
(b) Repeat part (a), but now assume that the coin is constructed so that the probability of head on any toss is p (0 < p < 1) and the probability of tail is q = 1 — p. (Cf. Problem 8.4.)
9.2. A random sample of five is selected with replacement from a population of which 40 percent are female and 60 percent are male. Define the sample space for this experiment and assign probabilities to its simple events. Find the probability that the sample contains
(a) no males (b) at least one male
(c) exactly one male (d) all males
9.3. A test has ten questions of multiple-choice type. There are six choices for each answer, but only one is correct. Suppose a student guesses his answer to each question. (For example, he can toss a fair die and let his answer be determined by the number that turns up.) Assuming his guesses are independent, define a sample space for this experiment and assign probabilities to its simple events. Find the probability that he gets nine or ten correct answers.
9.4. At a busy street intersection, it is estimated that a jaywalker will be hit by a car with probability .01. Assuming individual trips form independent trials, find the probability of a jaywalker remaining unhit if he crosses the street twice per day for 30 days.
9.5. A football team wins its weekly game with probability .7, loses with probability .2, and ties with probability .1. Consider the games played on three consecutive weekends as a threetrial experiment in which the trials are independent. Find the probability that the number of wins exceeds the sum of the number of losses and ties.
9.6. A baseball player approximates his chances at bat as follows: probability .3 of getting a hit, .1 of getting a base on balls, and .6 of being out. Consider the four times the player is at bat in a game as four independent trials and compute the probability that he gets (a) one walk and three hits, (b) one walk, one hit, and is put out twice.
9.7. Suppose a missile has probability 2/3 of destroying its target and probability 1/3 of missing it. Assuming the missile firings form independent trials, determine the number of missiles that should be fired at a target in order to make the probability of destroying the target at least .99.
9.8. (a) Each of two urns contains three identical balls numbered from 1 to 3. One ball is drawn from each urn, and we assume these drawings are independent. What is the probability that 2 is the greatest number drawn?
(b) Each of k urns contains n identical balls numbered from 1 to n. One ball is drawn from each urn, and we assume these drawings are independent. What is the probability that m is the greatest number drawn?
9.9. From a population of n people, one person is selected at random. A second person is then selected at random from the remaining (n − 1) people. Imagine the n people lined up in order in positions numbered 1, 2, · · ·, n. The first trial (selecting the first person) amounts to selecting one of the n positions, and so can be represented by the sample space Sn = {1, 2, · · ·, n}, in which each simple event is assigned probability 1/n. After this first person is selected, imagine the remaining (n − 1) people keeping their relative positions but closing ranks so that they are lined up in positions numbered 1, 2, · · ·, n − 1. The second trial (selecting the second person) amounts to selecting one of these (n − 1) positions, and so can be represented by the sample space S_{n−1} = {1, 2, · · ·, n − 1}, in which each simple event is given probability 1/(n − 1). If these two trials are assumed to be independent, then the experiment consisting of the two independent trials is called selecting a random sample of two without replacement from the population.
(a) What sample space are we thus led to for the experiment of selecting a sample of two without replacement from the population? What probability is assigned to each simple event of this sample space?
(b) Suppose n = 26 and these 26 people are named A, B, C, · · ·, X, Y, Z. You select a random sample of two without replacement from this population and report the outcome (2, 3). Which two people were selected?
(c) Generalize our discussion and show that the selection of a random sample of N without replacement from a population of n people can be considered as a succession of N independent selections. (Of course, N ≤ n.)
(d) With n = 26 as in (b), suppose N = 4 people were selected without replacement and the outcome reported as the 4-tuple (2, 3, 1, 22). Which four people were selected?
9.10. We have an n-trial experiment, each trial of which corresponds to the sample space S. Show that the null event ∅ and the entire sample space for the n-trial experiment are determined by every trial of the experiment. Does this seem reasonable?
10. A probability model in genetics
Probability concepts have come to play an increasingly important role, not only as the foundation of mathematical statistics, but also in formulating mathematical models for phenomena in all the sciences, biological, physical, and social. In one brief section, we can hardly hope to do more than illustrate the latter kind of application. We shall use the theory developed to this point, especially the ideas of conditional probability and independent trials, to consider (in greatly oversimplified form) some important questions arising in population genetics and involving the factors influencing evolution.
124 PROBABILITY IN FINITE SAMPLE SPACES / Chap. 2
Incidentally, we are also able to illustrate the use in probability of the method of difference equations.*
We restrict our attention to a single gene which has only two forms: recessive (r) and dominant (D). We assume that each individual in the population under consideration has two such genes in his chromosomes and therefore can be classified as one of the following types:
(1) pure dominant, DD, in which both genes are of dominant form;
(2) hybrid, rD, in which one gene is recessive and the other dominant;
(3) pure recessive, rr, in which both genes are recessive. (Biologists refer to these classes as genotypes; DD individuals are called homozygous dominant, rr individuals are homozygous recessive, and rD individuals are heterozygous.)
The genetic makeup (with respect to this particular gene) of each generation is described by the proportions of individuals of this generation in the three genotypes. If we think of drawing a sample of one individual at random from this generation, then the proportion of any genotype will be the probability of the individual being of that genotype. Thus we introduce symbols as follows:
n = generation number (0, 1, 2, …),
u_n = probability that an individual selected from the nth generation is DD,
2v_n = probability that an individual selected from the nth generation is rD,
w_n = probability that an individual selected from the nth generation is rr.
It is clear that

(10.1) u_n + 2v_n + w_n = 1 (n = 0, 1, 2, …).
The general problem of population genetics can be formulated in the following manner: Given the initial (n = 0) probabilities u_0, 2v_0, w_0 and a set of assumptions describing the dependence of future generations on this initial one (i.e., assumptions concerning the mating system, gene mutations, forces of natural selection, etc.), find the genotype probabilities for n ≥ 1.
* Additional material on probability methods in genetics can be found in Chapter 3 of the book by Neyman listed in the references at the end of this chapter. For a systematic exposition of the construction and application of a probability model in psychology, see R. R. Bush and F. Mosteller, Stochastic Models for Learning, John Wiley and Sons, Inc., 1955. A number of articles containing probability models appear in P. F. Lazarsfeld (Ed.), Mathematical Thinking in the Social Sciences, The Free Press, 1954. The method of difference equations used in this section is expounded in the author's Introduction to Difference Equations, John Wiley and Sons, Inc., 1958.
We shall consider a problem of this sort in which the system is one of random (Mendelian) mating, sometimes called panmixia, modified by both selection and mutation forces. The salient features of this model are described in the context of the following rules for obtaining an individual of any generation (say the (n + 1)st) if the genotype probabilities u_n, 2v_n, w_n are known for the preceding generation (the nth).
(i) A male parent is selected at random from this population; i.e., the probabilities are u_n, 2v_n, w_n for such a parent to be DD, rD, rr respectively. (We assume that genotypes occur among males and females with the same probabilities as in the whole population.) A single gene is then selected at random from the two genes that the male parent carries. For example, if the male parent is DD, then the gene D is transmitted with probability 1; if the parent is rD, then genes r and D are each selected with probability ½, etc.
(ii) A female parent is selected at random from the population, as in (i). A single gene is then selected at random from the two genes carried by the female parent.
The genotype of the new individual of the (n + l)st generation is determined by the union of the male and the female genes selected in (i) and (ii). We shall speak of steps (i) and (ii) as trials of the experiment in which an individual of one generation is formed from individuals of the preceding generation.
Random Mendelian mating (panmixia) with respect to the single gene under study is characterized by the assumption that the selections involved in trials (i) and (ii) are carried out at random and that these trials are independent.
As an example, let us calculate the probability of genotype rr in the (n + 1)st generation of a population undergoing random Mendelian mating. An rr individual can arise only when the genes selected from both the male and female parents are recessive. Now the event E_1 (male gene is r) can occur in two mutually exclusive ways: (a) the male parent is rr and then an r gene is transmitted, or (b) the male parent is rD and then an r gene is transmitted. We thus find that

(10.2) P(E_1) = (w_n)(1) + (2v_n)(½) = v_n + w_n.
Similarly, we observe that if E_2 is the event "female gene is r," then

P(E_2) = v_n + w_n.
But E_1 is determined by trial (i) and E_2 is determined by trial (ii). Since the trials are assumed independent, we conclude by Theorem 9.1 that E_1 and E_2 are independent events. Hence

P(E_1 ∩ E_2) = P(E_1)P(E_2).

Since E_1 ∩ E_2 is the event "individual of (n + 1)st generation is rr," we have

P(E_1 ∩ E_2) = w_{n+1}, and therefore

(10.3) w_{n+1} = (v_n + w_n)².
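Equation (10.3) and its companions for the other genotypes are easy to check numerically. A brief sketch (our own illustration, not from the text) updates all three genotype probabilities one generation under random mating:

```python
def next_generation(u, two_v, w):
    """One generation of random Mendelian mating (no mutation or selection)."""
    f = two_v / 2 + w          # P(E1) = P(E2) = v_n + w_n, as in (10.2)
    # new (u, 2v, w): both genes dominant, one of each, both genes recessive
    return (1 - f) ** 2, 2 * f * (1 - f), f ** 2

print(next_generation(0.25, 0.5, 0.25))
```

The third component is (v_n + w_n)², in agreement with (10.3), and the three components always sum to 1.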
Although we can continue and similarly derive the other genotype probabilities in the (n + 1)st generation in terms of those in the preceding generation, we turn instead to a more general model in which the assumptions of random mating are modified by mutation and selection forces.
We first assume that the dominant gene D mutates to the recessive form r with probability α (0 ≤ α ≤ 1), this mutation being independent of the source (male or female) of the gene. Let us think of the mutation occurring, if at all, after the male and female genes are selected, but before their union. To illustrate, we recalculate the probability of the event E_1 as follows. E_1 can now occur in four mutually exclusive ways: (a) male parent is DD, gene selected is D, this gene mutates to r; (b) male parent is rD, gene selected is r; (c) male parent is rD, gene selected is D, this gene mutates to r; (d) male parent is rr, gene selected is r. Thus we find
(10.4) P(E_1) = (u_n)(1)(α) + (2v_n)(½) + (2v_n)(½)(α) + (w_n)(1)
= (v_n + w_n) + α(u_n + v_n).
Note that (10.4) reduces to (10.2) in case there is no mutation (α = 0).
We also add a selection force which affects the participation of pure recessive individuals in the mating process. Up to this point, we have assumed that all genes are viable. Now let us suppose that the fertility of rr individuals is impaired, so that a proportion β of these individuals (whether male or female) do not have viable genes to transmit. To illustrate the impact of this assumption, we again calculate the probability of E_1 (male gene is r), but now on the condition F that the gene is viable; i.e., we calculate P(E_1|F). For the moment, we neglect the mutation effect. Until now P(F) has been 1, but with
the addition of fertility differences, we note that the male gene is viable if and only if the male parent is not one of those whose fertility is impaired. Hence
(10.5) P(F) = 1 − βw_n.

Also we calculate

P(E_1 ∩ F) = (2v_n)(½) + (w_n − βw_n)(1)
= v_n + w_n − βw_n.

Hence

(10.6) P(E_1|F) = (v_n + w_n − βw_n)/(1 − βw_n).
Note that (10.6) reduces to (10.2) when β = 0 (no selection force; all genes viable).
We should observe that the mutation and selection forces are opposite in effect: the mutation force is directed toward an increase of the recessive gene in the population, whereas the selection force tends to decrease the relative frequency of this gene.
Our problem can now be summarized as follows: Suppose that our system of mating is panmixia modified by the above mutation and selection forces. Let f_n equal the probability (proportion) of the recessive gene r among the genes of parents in the nth generation. The genotype probabilities u_0, 2v_0, w_0 are given for the initial (n = 0) generation of parents. How does f_n depend upon n? Is there an equilibrium value of f_n that is approached as n gets larger and larger and, if such an equilibrium proportion exists, how quickly is it reached and what is its dependence on the initial genotypic composition of the population?
We have in (10.6) calculated the probability P(E_1|F) that a viable gene produced by a parent of the nth generation is recessive, but we neglected the mutation force. Similarly we find that the probability, say p_n, that a viable gene produced by a parent of the nth generation is dominant (before the mutation process occurs) is given by

(10.7) p_n = (u_n + v_n)/(1 − βw_n).

It follows that the probability that this viable gene is dominant after mutation is p_n(1 − α), and so we find

(10.8) f_n = 1 − (1 − α)p_n.
Knowing f_n, we obtain the genotype probabilities in generation (n + 1):

(10.9) u_{n+1} = (1 − f_n)²,
2v_{n+1} = 2f_n(1 − f_n),
w_{n+1} = f_n²,
and using these together with (10.7) and (10.8) we derive (cf. Problem 10.4) a difference equation for the probabilities f_n:

(10.10) f_{n+1} = 1 − (1 − α)(1 − f_n)/(1 − βf_n²).
Our problem can now be formulated analytically: Given numbers f_0, α, β, each between 0 and 1 inclusive, find the dependence of f_n (the proportion of recessive genes among parents of generation number n) on the generation number n and the prescribed parameters f_0 (the proportion of recessive genes among parents of the initial generation n = 0), α (the probability with which a dominant gene mutates to the recessive form), and β (the proportion of rr parents in each generation who do not have viable genes).
This problem is quite difficult to solve in all generality, but there are some important special cases that are fairly easy.
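Numerical experiments with the recurrence are nevertheless straightforward; a minimal sketch (our own, assuming the form f_{n+1} = 1 − (1 − α)(1 − f_n)/(1 − βf_n²) of (10.10)):

```python
def iterate_f(f0, alpha, beta, n):
    """Iterate the recurrence (10.10) for n generations, starting from f0."""
    f = f0
    for _ in range(n):
        f = 1 - (1 - alpha) * (1 - f) / (1 - beta * f * f)
    return f

print(iterate_f(0.5, 0.0, 0.0, 10))   # panmixia: f_n never moves from f_0
```

Setting α = β = 0 reproduces the panmixia behavior of Case 1 below; other parameter values can be explored the same way.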
Case 1: Panmixia (α = 0, β = 0; neither mutation nor selection). In this case, the difference equation (10.10) reduces to

(10.11) f_{n+1} = f_n (n = 0, 1, 2, …)
and we immediately conclude that f_n = f_0 for all n. In view of Equations (10.9), it follows that

u_1 = (1 − f_0)² = u_2 = u_3 = ⋯,
2v_1 = 2f_0(1 − f_0) = 2v_2 = 2v_3 = ⋯,
w_1 = f_0² = w_2 = w_3 = ⋯.

We have in this way demonstrated the so-called Hardy-Weinberg law: With repeated mating under panmixia, the distribution of the three genotypes (with respect to a single gene) is fixed after one generation. Thus we see that the Mendelian laws are conservative in effect, and one may regard evolution as the study of those forces (mutation, selection, assortative mating, etc.) which tend to disturb this unchanging equilibrium of genotype proportions.
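The Hardy-Weinberg law can be confirmed numerically: however the initial proportions are chosen, the genotype distribution produced by one round of panmixia reproduces itself ever after. A short sketch (our own illustration):

```python
def mate(u, two_v, w):
    """One generation of panmixia: f = v + w, then (10.9) with alpha = beta = 0."""
    f = two_v / 2 + w
    return (1 - f) ** 2, 2 * f * (1 - f), f ** 2

gen1 = mate(0.2, 0.3, 0.5)   # an arbitrary starting distribution
gen2 = mate(*gen1)
print(gen1)
print(gen2)                  # agrees with gen1: the distribution is fixed
```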
Case 2: α = 0 and β = 1; i.e., no mutation and all pure recessives completely sterile.
Now Equation (10.10) becomes
(10.12) f_{n+1} = f_n/(1 + f_n) (n = 0, 1, 2, …).

If we make the substitution

(10.13) f_n = 1/g_n,

then (10.12) takes on the simpler form

(10.14) g_{n+1} = g_n + 1 (n = 0, 1, 2, …).

From (10.14), with n = 0, we find g_1 = g_0 + 1. Then putting n = 1 in (10.14) we see that g_2 = g_1 + 1 = g_0 + 2. Writing (10.14) with n = 2 and then using our newly found expression for g_2, we find g_3 = g_2 + 1 = g_0 + 3. By mathematical induction, we can prove

(10.15) g_n = g_0 + n (n = 0, 1, 2, …).

In view of (10.13) we thus find that

(10.16) f_n = f_0/(1 + nf_0).
Equation (10.16) describes the elimination of a single gene under complete sterilization of pure recessives. Although f_n decreases and approaches zero as n gets larger, this decrease is quite slow. For example, let us compute the number of generations required in order that, with complete sterility of all rr individuals, the proportion of the r gene decreases to half its initial value. We put f_n = (½)f_0 in (10.16) and solve to find n = 1/f_0. Thus if f_0 = .001, then 1000 generations are required to reduce the proportion of the recessive gene to .0005. Quantitative considerations of this kind are clearly of importance in eugenics.
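Both the closed form (10.16) and the halving time n = 1/f_0 are easy to verify by direct iteration of f_{n+1} = f_n/(1 + f_n); a sketch (our own illustration):

```python
def f_closed(f0, n):
    return f0 / (1 + n * f0)        # Equation (10.16)

def f_iterated(f0, n):
    f = f0
    for _ in range(n):
        f = f / (1 + f)             # Equation (10.12): alpha = 0, beta = 1
    return f

f0 = 0.001
print(f_iterated(f0, 1000))         # about 0.0005 after 1/f0 = 1000 generations
```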
Other conclusions that follow from Equation (10.10) are included in the problems.
PROBLEMS
10.1. Find the proportions of the three genotypes in the first (n = 1) and second (n = 2) generations of random Mendelian mating if
(a) u_0 = 0, 2v_0 = ½, w_0 = ½
(b) u_0 = w_0 = 0, 2v_0 = 1
(c) u_0 = 1, 2v_0 = w_0 = 0
(d) u_0 = 2v_0 = 0, w_0 = 1
10.2. Redo the preceding problem with a mutation force added. Assume α = 0.1 is the probability that the dominant gene D mutates to the recessive form r.
10.3. Redo Problem 10.1 assuming both mutation and selection forces present. Use Equations (10.7)-(10.9) and assume α = 0.1, β = 0.2. In each case write the first three terms of the sequence f_0, f_1, f_2, ….
10.4. To derive the difference equation (10.10), proceed as follows. (a) Use (10.8) to write f_{n+1} in terms of p_{n+1}. (b) In the equation found in (a), replace p_{n+1} by an equivalent expression obtained by using (10.7) and involving u_{n+1}, v_{n+1}, w_{n+1}. (c) In the equation obtained in (b), substitute for u_{n+1}, v_{n+1}, w_{n+1} the expressions given in (10.9) and simplify.
10.5. Write out the details of the derivation of Equation (10.14) in the text.
10.6. Show by mathematical induction that (10.15) is the solution of the difference equation (10.14) for all n = 0, 1, 2, ….
10.7. Consider the case in which all pure recessives are completely sterile (i.e., β = 1), but the supply of the recessive gene r is replenished by the mutation D → r; i.e., α > 0.
(a) Show that Equation (10.10) becomes

(10.17) f_{n+1} = (f_n + α)/(1 + f_n) (n = 0, 1, 2, …).

(b) Suppose f_0 = √α. Show that then f_n = √α for n = 1, 2, ….
(c) Suppose α = 1. Find the terms of the sequence f_0, f_1, f_2, ….
(d) Suppose f_0 ≠ √α and α ≠ 1. Let f_n = √α + 1/g_n and show that g_n satisfies the difference equation

(10.18) g_{n+1} = Ag_n + B (n = 0, 1, 2, …), where

A = (1 + √α)/(1 − √α), B = 1/(1 − √α).
(e) Show that Equation (10.18) is satisfied for all n = 0, 1, 2, … by

g_n = A^n g_0 + B(A^n − 1)/(A − 1) = A^n (g_0 + 1/(2√α)) − 1/(2√α).
(f) Conclude that if f_0 ≠ √α and α ≠ 1, then Equation (10.17) implies

(10.19) f_n = √α + 2√α / [((f_0 + √α)/(f_0 − √α)) ((1 + √α)/(1 − √α))^n − 1] (n = 0, 1, 2, …).
(g) From (10.19) note that since A > 1, A^n gets larger and larger as n increases. Conclude that f_n approaches √α as n increases without bound. Thus the population approaches a balanced (equilibrium) state in which the recessive gene occurs with proportion √α.
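The approach to equilibrium claimed in (g) can be watched numerically; a sketch (our own check, not part of the problem set) iterates f_{n+1} = (f_n + α)/(1 + f_n):

```python
def iterate(f0, alpha, n):
    """Iterate Equation (10.17) for n generations."""
    f = f0
    for _ in range(n):
        f = (f + alpha) / (1 + f)
    return f

alpha = 0.04                       # equilibrium proportion sqrt(alpha) = 0.2
print(iterate(0.9, alpha, 200))    # approaches 0.2 from above
print(iterate(0.01, alpha, 200))   # approaches 0.2 from below
```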
SUPPLEMENTARY READING
1. Bizley, M. T. L., Probability: An Intermediate Textbook, Cambridge Univ. Press, 1957.
2. Chernoff, H., and L. E. Moses, Elementary Decision Theory, John Wiley and Sons, Inc., 1959.
3. Commission on Mathematics, College Entrance Examination Board, Introductory Probability and Statistical Inference, An Experimental Course, Revised Preliminary Edition, 1959.
4. Cramér, H., The Elements of Probability Theory and Some of Its Applications, John Wiley and Sons, Inc., 1955.
5. Feller, W., An Introduction to Probability Theory and Its Applications, 2nd edition, John Wiley and Sons, Inc., 1957.
6. Kemeny, J. G., H. Mirkil, J. L. Snell, and G. L. Thompson, Finite Mathematical Structures, Prentice-Hall, Inc., 1959.
7. Munroe, M. E., Theory of Probability, McGraw-Hill Book Company, Inc., 1951.
8. Neyman, J., First Course in Probability and Statistics, Henry Holt and Company, Inc., 1950.
9. Parzen, E., Modern Probability Theory and Its Applications, John Wiley and Sons, Inc., 1960.
10. Schlaifer, R., Probability and Statistics for Business Decisions, McGraw-Hill Book Company, Inc., 1959.
11. Todhunter, I., A History of the Mathematical Theory of Probability, Chelsea Publishing Company, 1949.
12. Uspensky, J. V., Introduction to Mathematical Probability, McGraw-Hill Book Company, Inc., 1937.
Chapter 3 SOPHISTICATED COUNTING
1. Counting techniques and probability problems
Up to this point, we have illustrated our theory with examples that require only very direct and elementary counting procedures. We accomplished this by restricting ourselves either to experiments leading to sample spaces with a small number of elements (where we were able to count by direct enumeration) or, if the experiment had a large number of possible outcomes, to events whose elements could be counted by a direct application of the fundamental principle of counting. These restrictions were intentional, since our aim was to present the basic theory of probability unencumbered by difficulties due to incidental counting problems. But now we must face the fact that many interesting and important probability problems require more sophisticated counting techniques. We develop a few of these techniques in this section. Since we shall be using the fundamental principle of counting time and again, the reader may find it helpful to review the discussion on pp. 9-11.
The following counting problems are solved in this section. We suppose that a nonempty set A with n (distinct) elements is given.
Problem 1. For any positive integer r ≤ n, find the number of ordered r-tuples the objects of which are different elements of A. Each such ordered r-tuple specifies an ordered arrangement or permutation of r objects taken from the n elements of A. Because of this fact, the required number is denoted by P(n, r).
Problem 2. For any nonnegative integer r ≤ n, find the number of subsets of A that have exactly r elements. (Such a subset will be called an r-subset of A.) The number of r-subsets of a set with n elements is denoted by \binom{n}{r}. Each r-subset specifies a selection, without regard to order, of r elements from the n elements in A.
Problem 3. We are given k numbered cells and k nonnegative integers n_1, n_2, …, n_k whose sum is n; i.e., for k ≥ 1,

n_1 + n_2 + ⋯ + n_k = n.

Find the number of ways of putting the n elements of A into these k cells so that n_1 elements are in the first cell, n_2 elements are in the second cell, …, n_k elements are in the kth cell. This number is denoted by

\binom{n}{n_1, n_2, …, n_k}.
A simple example will help to clarify these problems and the special notation we have introduced.
Example 1.1. Let A = {a, b, c, d, e} be a set of n = 5 elements. We list the following ordered pairs (2-tuples) formed by selecting two elements of A and paying attention to the order in which they are selected:

(1.1) ab ac ad ae bc bd be cd ce de
ba ca da ea cb db eb dc ec ed.

These are the 20 ordered pairs or 20 permutations of two elements from the five elements in A. In symbols, P(5, 2) = 20. If a chairman and a secretary must be elected from among five men on a committee, there are P(5, 2) = 20 different possible results of the election. Note that we correctly count as different the results leading to chairman = a, secretary = b on the one hand and chairman = b, secretary = a on the other.
Now, however, suppose we want to elect a subcommittee of two men from the five committee members. The order in which the choices are made is now irrelevant; we care only which two men are elected. Thus we want the number of 2subsets of the set A. The
pairs of elements in the ten possible 2-subsets are enumerated in the first row of (1.1). Although ab and ba are different permutations, they determine the same 2-subset of A since {a, b} = {b, a}. We thus find that there are ten 2-subsets of A. In symbols, \binom{5}{2} = 10.
Finally, suppose four members of the committee are arranging rides to the funeral of the fifth member, say e. Three cars are available and can take 2, 1, and 1 passengers respectively. We list the possible assignments of men to cars:

car 1: ab ab ac ac ad ad bc bc bd bd cd cd
car 2: c d b d b c a d a c a b
car 3: d c d b c b d a c a b a

Thus we find that there are 12 ways of placing the four objects (committeemen a, b, c, d) into three numbered cells (the three cars) so that two objects are in the first cell, one object in the second cell, and one object in the third cell. In symbols, we have computed \binom{4}{2, 1, 1} = 12. (We killed off committeeman e only to create a counting problem with an answer small enough for us to easily list all the possible assignments to cars. Let the reader show that if all five committee members are being driven to a happy occasion, then there are 30 ways of assigning the men to three cars so that two ride in the first car, two in the second car, and one rides in the third car. In symbols, \binom{5}{2, 2, 1} = 30.)
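All three counts in Example 1.1 can be reproduced with the standard library; a sketch using itertools (our own illustration):

```python
from itertools import combinations, permutations

A = "abcde"
pairs = list(permutations(A, 2))      # ordered pairs: P(5, 2)
subsets = list(combinations(A, 2))    # 2-subsets

# rides to the funeral: choose 2 of a, b, c, d for car 1, then 1 of the
# remaining 2 for car 2; the last man is forced into car 3
rides = [(c1, x) for c1 in combinations("abcd", 2) for x in "abcd" if x not in c1]

print(len(pairs), len(subsets), len(rides))  # 20 10 12
```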
We turn now to derivations of general formulas that solve the three problems we have stated. But first, a definition to simplify our notation.

Definition 1.1. If n is a positive integer, then the product of the integers from 1 to n is called "n factorial" and is denoted by n!. By special convention, we agree to put 0! = 1.
For example:
1! = 1              5! = 5·4·3·2·1 = 120
2! = 2·1 = 2        6! = 6·5! = 720
3! = 3·2·1 = 6      7! = 7·6! = 5040
4! = 4·3·2·1 = 24   8! = 8·7! = 40,320
Note that in computing 6!, 7!, and 8!, we used the fact that

(1.2) (n + 1)! = (n + 1)(n!) for n = 0, 1, 2, ….
Observe also that when n = 0, Formula (1.2) reads 1! = (1)(0!) and hence is correct by virtue of our convention that 0! = 1. Although it may seem artificial now, we assure the reader that defining 0! in this way will turn out to be very convenient in the formulas that follow.
Theorem 1.1. With the notation introduced in the statements of Problems 1-3, we have

(1.3) P(n, r) = n!/(n − r)!   [the number of ordered r-tuples or permutations of r objects from a set of n objects],

(1.4) \binom{n}{r} = n!/(r!(n − r)!)   [the number of r-subsets (subsets with exactly r elements) of a set of n elements],

(1.5) \binom{n}{n_1, n_2, …, n_k} = n!/(n_1! n_2! ⋯ n_k!)   [the number of ways of placing n distinct objects into k cells so that n_i objects are in cell i for i = 1, 2, …, k].

Proof. The fundamental principle of counting is the key tool in proving these formulas. To prove (1.3), note that to form an r-tuple (a_1, a_2, …, a_r) from the n given objects we must choose a_1 (task 1) from the n objects, then choose a_2 (task 2) from the remaining n − 1 objects, and so on until we choose a_r (task r) from the remaining n − r + 1 objects. Hence P(n, r), the number of ordered r-tuples from a set of n objects, is equal to the number of ways of completing these r tasks in the stated order. By the fundamental principle we find

(1.6) P(n, r) = n(n − 1) ⋯ (n − r + 1).

Multiplying and dividing by (n − r)! we obtain the alternative form

P(n, r) = n(n − 1) ⋯ (n − r + 1)(n − r)!/(n − r)!,

from which (1.3) follows by observing that the numerator is indeed the product of all the positive integers from 1 to n. Note that when r = n, (1.6) becomes

(1.7) P(n, n) = n(n − 1) ⋯ 1 = n!.
Formula (1.3) also gives this result, due to our convention that 0! = 1. Hence (1.3) is true for all positive integers r ≤ n, as claimed.
To prove (1.4) we observe that there are as many ways of writing an ordered r-tuple as there are ways of completing the following tasks in the stated order: (1) choose an r-subset from the given set of n elements and thus determine which r objects will be used to form the r-tuple, and then (2) arrange these r objects in some order so that there is a first, a second, …, an rth object specified. It follows that P(n, r), the number of ordered r-tuples, must be precisely the number of ways of completing these two tasks. The first task can be done in \binom{n}{r} ways by definition of this symbol. The second task can be done in P(r, r) = r! ways by virtue of the meaning of P(r, r) together with Formula (1.7). Hence

(1.8) P(n, r) = \binom{n}{r} · r!,

or, using (1.3),

\binom{n}{r} = P(n, r)/r! = n!/(r!(n − r)!),

as claimed in (1.4). Note that this argument fails when r = 0, since P(n, 0) has not been defined. But we can easily check that (1.4) is correct when r = 0. For a set with n elements has exactly one 0-subset (the null set ∅), and Formula (1.4) yields this answer, since when r = 0, (1.4) becomes

\binom{n}{0} = n!/(0! n!) = 1.
(By now the reader should be convinced that putting 0! = 1 is indeed sensible.)
Finally, to prove (1.5) we use (1.4) in conjunction with the fundamental principle of counting. To determine the n_1 objects that go into the first cell, we choose an n_1-subset from the available n objects. We can therefore allocate n_1 objects to the first cell in \binom{n}{n_1} ways. To determine the n_2 objects that are put in the second cell, we choose an n_2-subset from the remaining n − n_1 objects. Hence we can put n_2 objects into this second cell in \binom{n − n_1}{n_2} ways. Continuing in this way, we see that to determine the n_k objects that
go into the last cell we must choose an n_k-subset from the remaining n − (n_1 + n_2 + ⋯ + n_{k−1}) objects, and this can be done in

\binom{n − n_1 − ⋯ − n_{k−1}}{n_k}

ways. By the fundamental principle of counting, we conclude that

\binom{n}{n_1, n_2, …, n_k} = \binom{n}{n_1} \binom{n − n_1}{n_2} ⋯ \binom{n − n_1 − ⋯ − n_{k−1}}{n_k}.

Now we use (1.4) to simplify the product on the right. The product of the first two factors is

\binom{n}{n_1} \binom{n − n_1}{n_2} = [n!/(n_1!(n − n_1)!)] · [(n − n_1)!/(n_2!(n − n_1 − n_2)!)] = n!/(n_1! n_2! (n − n_1 − n_2)!).

The product of the first three factors is

n!/(n_1! n_2! n_3! (n − n_1 − n_2 − n_3)!).

When all k factors are multiplied, we similarly find that

\binom{n}{n_1, n_2, …, n_k} = n!/(n_1! n_2! ⋯ n_k! (n − n_1 − n_2 − ⋯ − n_k)!).

But (n − n_1 − n_2 − ⋯ − n_k)! = 0! = 1, and so the proof of (1.5) is complete.
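Formulas (1.3)-(1.5) translate directly into code; a sketch (the function names are ours) using math.factorial:

```python
from math import factorial

def P(n, r):
    """Formula (1.3): permutations, n!/(n - r)!."""
    return factorial(n) // factorial(n - r)

def C(n, r):
    """Formula (1.4): r-subsets, n!/(r!(n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def multinomial(n, parts):
    """Formula (1.5): n!/(n_1! n_2! ... n_k!); the cell sizes must sum to n."""
    assert sum(parts) == n
    result = factorial(n)
    for p in parts:
        result //= factorial(p)
    return result

print(P(5, 2), C(5, 2), multinomial(5, (2, 2, 1)))  # 20 10 30
```

Integer division is exact here because each formula's quotient is a whole number.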
Two special cases and an important alternative interpretation of Formula (1.5) are worth noting here. If we have n objects and n cells, then there are clearly just as many ways of putting one object in each cell as there are ordered n-tuples of these n objects. Hence we expect that

\binom{n}{1, 1, …, 1} = n!,

and indeed, Formula (1.5) yields this expected result if we put k = n and n_1 = n_2 = ⋯ = n_n = 1.
Also, if we put k = 2 and n_1 = r (and therefore n_2 = n − r), then (1.5) becomes

\binom{n}{r, n − r} = \binom{n}{r}.

This equation merely expresses the fact that there are as many ways of placing n objects into two cells with r objects in one cell and (n − r) in the other cell as there are different r-subsets of a set with n elements. For in determining the r elements in the r-subset (cell 1), we automatically place the remaining n − r elements in the complement of the r-subset (cell 2).
A second interpretation of Formula (1.5) is introduced in the following examples.
Example 1.2. We know that there are P(5, 5) = 5! = 120 permutations of five distinct letters. But now suppose the five letters are two a's and three b's. How many different permutations are there now? We find that there are only ten:
aabbb ababb abbab abbba baabb babab babba bbaab bbaba bbbaa.
To see how to obtain the number 10 without explicitly enumerating the permutations, think not of the letters themselves but rather of the positions they occupy in the permutation. Counting from left to right, a permutation contains five positions and is uniquely determined as soon as we specify the two positions for the a's and the three positions for the b's. Thus there are just as many permutations as there are ways of putting the five positions into two cells, the first containing the two positions for the a's and the second containing the three positions for the b's. But this number is what we denoted by \binom{5}{2, 3}, and by (1.5) it is seen to be 10, as expected.
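The count in Example 1.2 can also be confirmed by brute force, generating all 5! arrangements of the letters and collapsing the indistinguishable ones; a sketch:

```python
from itertools import permutations

distinct = set(permutations("aabbb"))   # 120 arrangements collapse to a set
print(len(distinct))                    # 10 = 5!/(2! 3!)
```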
Example 1.3. To determine the number of distinguishable arrangements on one shelf of four different books, for each of which there are two copies, we note that each arrangement contains eight distinct positions and is determined as soon as we specify the two positions for the first book, the two positions for the second book, etc. The eight positions can be placed in four cells, each containing two positions, in
\binom{8}{2, 2, 2, 2} = 8!/(2!)⁴ = 2520

different ways by (1.5). Thus there are 2520 distinguishable arrangements of the eight books.
The arguments used in the preceding examples are readily generalized to prove the following theorem.
Theorem 1.2. If we have n objects, n_1 of which are of one kind, n_2 of a second kind, …, n_k of a kth kind (where n_1 + n_2 + ⋯ + n_k = n), then the number of distinguishable permutations of the n objects is given by

\binom{n}{n_1, n_2, …, n_k} = n!/(n_1! n_2! ⋯ n_k!).

Proof. We have only to observe that each permutation contains n positions and is uniquely determined as soon as we specify the n_1 positions for the objects of the first kind, the n_2 positions for the objects of the second kind, …, the n_k positions for the objects of the kth kind. Thus there are as many permutations as there are ways of putting the n positions into k cells, the ith cell containing the n_i positions for the objects of the ith kind for i = 1, 2, …, k. But this number of ways is given in (1.5), and so the proof is complete.
We turn now to some examples illustrating the use of these counting techniques in computing probabilities.
Example 1.4. Find the probability when a bridge game is dealt that each player has exactly one ace.
There are as many ways of dealing four bridge hands as there are ways of placing 52 objects (the cards of the full deck) into four cells (the North, East, South, and West hands) so that each cell contains 13 cards. Thus, by (1.5), there are

N = \binom{52}{13, 13, 13, 13} = 52!/(13!)⁴

different deals in bridge. Let us choose as our sample space S a set with N elements, each denoting a different deal, and let us assign to each simple event of S the same probability 1/N. (N is actually equal to a number larger than 53 billion billion billion, but we do not need to know the precise value of N to complete this problem.) Now the event "each player has one ace" is the union of as many
simple events of S as there are different deals for which North, East, South, and West each have one ace. If we call this number x, then the required probability is x/N by Theorem II.4.7. To determine x, note that there are as many deals for which each player has one ace as there are ways of completing the following tasks in the stated order: (1) deal the four aces, one to each player, (2) deal the remaining 48 cards, 12 to each player. By (1.5), task 1 can be done in

\binom{4}{1, 1, 1, 1} = 4!/(1! 1! 1! 1!) = 4! ways

and task 2 in

\binom{48}{12, 12, 12, 12} = 48!/(12!)⁴ ways.

By the fundamental principle of counting, the number of deals for which each player has one ace is

x = 4!·48!/(12!)⁴,

and the required probability, say p, is given by

p = x/N = 4!·48!·(13!)⁴/((12!)⁴·52!).
To compute p, we first use Table 2-1 to find the logarithm of p:

log p = log 4! + log 48! + 4(log 13!) − 4(log 12!) − log 52!
= 1.3802 + 61.0939 + 4(9.7943) − 4(8.6803) − 67.9067
= −.9766 = 9.0234 − 10.

Now we use Table 2-2 to estimate the value of p. Since the mantissa .0234 is between .0000 and .0414, we know that p is between .10 and .11. Thus, the odds against the event "each player has one ace" are approximately 9 to 1.
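With exact rational arithmetic, the probability in Example 1.4 can be computed directly, with no need for the logarithm tables; a sketch:

```python
from fractions import Fraction
from math import factorial

# p = 4! 48! (13!)^4 / ((12!)^4 52!), computed exactly
p = Fraction(factorial(4) * factorial(48) * factorial(13) ** 4,
             factorial(12) ** 4 * factorial(52))
print(float(p))   # between .10 and .11, as the logarithm tables indicate
```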
Example 1.5. From five married couples, four people are selected. What is the probability that two men and two women are chosen?
There are \binom{10}{4} ways of selecting a subset of four people from all ten people. Since

\binom{10}{4} = 10!/(4! 6!) = (10·9·8·7)/(4·3·2·1) = 210,
TABLE 2-1. COMMON LOGARITHMS OF FACTORIALS

n  log n!    n   log n!    n   log n!
1   0.0000   26  26.6056   51   66.1906
2   0.3010   27  28.0370   52   67.9067
3   0.7782   28  29.4841   53   69.6309
4   1.3802   29  30.9465   54   71.3633
5   2.0792   30  32.4237   55   73.1037
6   2.8573   31  33.9150   56   74.8519
7   3.7024   32  35.4202   57   76.6077
8   4.6055   33  36.9387   58   78.3712
9   5.5598   34  38.4702   59   80.1420
10  6.5598   35  40.0142   60   81.9202
11  7.6012   36  41.5705   61   83.7055
12  8.6803   37  43.1387   62   85.4979
13  9.7943   38  44.7185   63   87.2972
14  10.9404  39  46.3096   64   89.1034
15  12.1165  40  47.9117   65   90.9163
16  13.3206  41  49.5244   66   92.7359
17  14.5511  42  51.1477   67   94.5620
18  15.8063  43  52.7812   68   96.3945
19  17.0851  44  54.4246   69   98.2333
20  18.3861  45  56.0778   70  100.0784
21  19.7083  46  57.7406   71  101.9297
22  21.0508  47  59.4127   72  103.7870
23  22.4125  48  61.0939   73  105.6503
24  23.7927  49  62.7841   74  107.5196
25  25.1907  50  64.4831   75  109.3946
TABLE 22. COMMON LOGARITHMS*
6
0000 0414 0792 1139 1461 1761 2041 2304 2553 2788
3010 3222 3424 3617 3802 3979 4150 4314 4472 4624
4771 4914 5051 5185 5315 5441 5563 5682 5798 5911
6021 6128 6232 6335 6435 6532 6628 6721 6812 6902
6990 7076 7160 7243 7324 7404 7482 7559 7634 7709
7782 7853 7924 7993 8062 8129 8195 8261 8325 8388
8451 8513 8573 8633 8692 8751 8808 8865 8921 8976
9031 9085 9138 9191 9243 9294 9345 9395 9445 9494
9542 9590 9638 9685 9731 9777 9823 9868 9912 9956
* For example, log 1.2 = .0792, log .12 log .76 = 9.8808  10 = .1192.
: 9.0792  10  .9208,
we take as sample space a set S with 210 elements and assign each simple event of S the same probability 1/210. Now we can select two men from the five available men in $\binom{5}{2} = 10$ ways. Similarly, there are ten ways of selecting two women. By the fundamental principle of counting, there are 10 · 10 = 100 ways of selecting two men and two women. Hence the required probability is 100/210, or 10/21.
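The same count can be reproduced with Python's `math.comb` (a modern aside, not in the original):

```python
from fractions import Fraction
from math import comb

# Two men and two women from five couples: C(5,2) * C(5,2) favorable
# choices out of C(10,4) equally likely 4-subsets of the ten people.
favorable = comb(5, 2) * comb(5, 2)   # 10 * 10 = 100
total = comb(10, 4)                   # 210
print(Fraction(favorable, total))     # -> 10/21
```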
Example 1.6. A coin is tossed five independent times. What is the probability of the event E that we get exactly three heads?
We define $S_1 = \{H, T\}$ as sample space for a single toss and, since we have no information about the coin, let us put
$$P(\{H\}) = p, \qquad P(\{T\}) = q = 1 - p,$$
where p is some number between 0 and 1 inclusive. For the five-toss experiment, we must define as sample space S the Cartesian product
$$S = S_1 \times S_1 \times S_1 \times S_1 \times S_1.$$
Because of the assumed independence of the tosses, the probability of each simple event of S is determined by the product rule given in Section II.9. For example,
$$P(\{HHHTT\}) = P(\{H\})P(\{H\})P(\{H\})P(\{T\})P(\{T\}) = p^3q^2$$
and similarly
$$P(\{HHTHT\}) = p \cdot p \cdot q \cdot p \cdot q = p^3q^2.$$
In fact, any simple event whose sole element corresponds to an outcome resulting in three heads and two tails will have probability $p^3q^2$. The number of such simple events is the same as the number of 5-tuples containing exactly three H's and two T's. Such a 5-tuple is uniquely determined as soon as we select the positions (i.e., numbers identifying which are the tosses) that resulted in heads. We can select three positions from the available five in $\binom{5}{3} = 10$ ways. Since the event E is the union of these ten simple events, each with probability $p^3q^2$, we have
$$P(E) = 10p^3q^2.$$
If the coin is fair, then
$$p = q = \tfrac{1}{2} \quad \text{and} \quad P(E) = \tfrac{10}{32} = .31, \text{ approximately}.$$
But if the coin is biased so that, let us say, p = 1/3 and q = 2/3, then
$$P(E) = \tfrac{40}{243} = .16, \text{ approximately}.$$
This coin-tossing example is typical of an important class of problems that we will study in Chapter 5. Note especially that we have here an example in which (assuming p ≠ q) the simple events of S are not assigned the same probability.
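Since the sample space here has only $2^5 = 32$ elements, the result $P(E) = 10p^3q^2$ can also be verified by brute-force enumeration (a modern sketch in Python, not part of the original):

```python
from itertools import product

# Sum the probabilities of all 5-tuples of tosses with exactly three heads.
def prob_three_heads(p):
    q = 1 - p
    return sum(
        p ** seq.count('H') * q ** seq.count('T')
        for seq in product('HT', repeat=5)
        if seq.count('H') == 3
    )

print(round(prob_three_heads(1 / 2), 4))   # -> 0.3125  (fair coin)
print(round(prob_three_heads(1 / 3), 4))   # -> 0.1646  (biased coin)
```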
Example 1.7. What is the probability that a poker hand will have one pair?
A poker hand is a 5-subset of the set of 52 cards in the full deck and so there are $\binom{52}{5}$ different poker hands. If, for the moment, we call this number N, then our sample space S has N elements and each simple event of S is assigned probability 1/N. Using Formula (1.4) and some arithmetic, we find that N = 2,598,960.
Now a poker hand with one pair has two cards of the same face value (i.e., two aces, two kings, etc.) and three cards whose face values are all different and different from that of the pair. We obtain a unique poker hand with one pair by completing the following tasks in order: (1) Choose the face value for the pair from the 13 available face values. This can be done in $\binom{13}{1} = 13$ ways. (2) Choose two cards with the face value selected in (1). This can be done in $\binom{4}{2} = 6$ ways. (3) Choose the three face values for the other three cards in the hand. Since there are 12 face values from which to choose, this can be done in $\binom{12}{3} = 220$ ways. (4) Choose one card (from the four available) of each face value chosen in (3). This can be done in $4^3 = 64$ ways. By the fundamental principle of counting, there are
$$13 \cdot 6 \cdot 220 \cdot 64 = 1{,}098{,}240$$
poker hands with one pair. Hence the required probability is
$$\frac{1{,}098{,}240}{2{,}598{,}960} = .42, \text{ approximately}.$$
As poker players know, and as this answer shows, it is not at all unusual to have a one-pair hand. Only hands with no pair at all have a higher probability of occurring. (See Problem 1.25.)
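The four-task count can be reproduced in a few lines (a modern aside in Python, not part of the original text):

```python
from math import comb

# One-pair poker hands, following tasks (1)-(4) of Example 1.7.
pair_choices = 13 * comb(4, 2)         # face value of the pair, then its two suits
other_choices = comb(12, 3) * 4 ** 3   # three other face values, one suit each
hands = comb(52, 5)                    # 2,598,960 possible poker hands

print(pair_choices * other_choices)                    # -> 1098240
print(round(pair_choices * other_choices / hands, 2))  # -> 0.42
```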
Let us conclude by outlining the procedure we followed in answering the probability questions posed in the preceding problems.
(1) Define a sample space S for the experiment and assign probabilities to its simple events. This may involve counting the elements of S (as in Examples 1.4, 1.5, and 1.7) or it may require other noncounting considerations (as in Example 1.6).
(2) Determine P(E) by calculating the sum of the probabilities of the simple events whose union is the event E. In the problems under discussion here, this requires counting the number of elements of E. To do this, it is often helpful first to construct a sequence of tasks having the following property: each way of completing the tasks in the specified sequential order produces an experimental outcome corresponding to exactly one element of E and conversely, each element of E corresponds to an outcome produced by completing the tasks in exactly one way. (We did this in Example 1.4 where a bridge deal for which each player has one ace was produced by completing two tasks, and also in Example 1.7 where each poker hand with one pair was thought of as produced by completing four tasks in order.) The problem of counting the number of elements in E is thereby reduced to that of counting the number of ways of completing these tasks.
(3) Now use one or more of the formulas discussed in this section to count the number of ways of completing each task. Then invoke the fundamental principle of counting to determine the number of ways of completing all the tasks in the stated order. This number is the number of elements in E, and P(E) can thus be evaluated.
The problems that follow will help the reader develop his ability to count by means of the formulas presented in this section. He thus becomes able to find probabilities in a wide class of more complicated but more interesting experiments than heretofore considered.
PROBLEMS
1.1. Evaluate:
<« I «> (S)
1.2. How large must n be before nl exceeds (a) a thousand? (b) a million? (c) a billion? (d) a trillion? [Hint: Use Table 21.]
1.3, Compute to two decimal place accuracy (using logarithms) the value of 60\
12V1V/6V
(70) (70)
1.4. Let r be a positive integer. For any number x, let
$$(x)_r = x(x - 1)(x - 2) \cdots (x - r + 1).$$
Show that
(a) if x is an integer and x ≥ r, then $(x)_r = P(x, r)$.
(b) $(-1)_r = (-1)^r\, r!$
(d) (J)P = (DM2*'
1.5. (a) One straight line is determined by two points in a plane. Three lines are determined by three noncollinear points. Six lines are determined by four points, no three of which are collinear. How many lines are determined by n points, no three of which are collinear?
(b) A triangle has no diagonals; a quadrilateral has two diagonals; a pentagon has five diagonals. How many diagonals does a polygon of n sides have?
1.6. The 11 digits 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4 are permuted in all distinguishable ways. How many permutations (a) begin with 22? (b) begin with 343?
1.7. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a sixdigit number. If the numbers corresponding to all possible permutations are listed in order of increasing magnitude, which is the 417th?
1.8. The six digits 1, 1, 1, 2, 3, 3 are permuted and, as in the preceding problem, we list the corresponding sixdigit numbers in order of increasing magnitude.
(a) How many numbers start with the digit 2?
(b) How far down in the list is the number 321,311?
1.9. We have two each of n different objects, 2n objects altogether. How many distinguishable selections of four objects are there for which (a) all four objects are different? (b) two are alike and two different?
(c) two are alike and the other two are also alike? (d) The total number of distinguishable selections of four objects from these 2n objects is equal to six times the number of 4-subsets of the set of n different objects. Find the value of n.
1.10. A group of ten boys and ten girls is divided into two groups of ten each. Find the probability that each group contains as many boys as girls.
1.11. A bookstore clerk has ten books, five each of two titles, to place on a bookshelf. If he places them at random so that all distinguishable arrangements are equally likely, what is the probability that (a) five copies of one title follow five copies of the other title on the shelf? (b) the two titles alternate on the shelf?
1.12. You need four eggs to make omelets for breakfast. You find a dozen eggs in the refrigerator but do not realize that two of these eggs are rotten. What is the probability that of the four eggs you choose (a) none are rotten? (b) exactly one is rotten? (c) exactly two are rotten?
1.13. In the preceding problem, suppose you break the four eggs into a saucer and discover that you have chosen at least one rotten egg.
(a) What is the conditional probability that both rotten eggs are in the saucer?
(b) If you choose four other eggs from the remaining eight eggs, what is the conditional probability that they will all be good?
1.14. Refer to Example 1.6 of the text and find the probability that the coin falls heads exactly k times, where k = 0, 1, 2, 3, 4, 5.
1.15. Baseball team A plays team B ten times in a given month. Assume that team A is better than team B and has probability 2/3 of winning and probability 1/3 of losing each game. If the games are considered as ten independent trials, find the probability team A wins (a) exactly six games, (b) exactly seven games, (c) a majority of the games.
1.16. A pack of ten cards consists of three aces, two kings, two queens, and three jacks. We shuffle the deck and pick one card. Let this trial be performed eight independent times. What is the probability that an ace is selected twice, a king three times, and a jack three times?
1.17. Find the probability that in eight independent rolls of a fair die, the numbers 1, 3, and 5 turn up two, three, and three times, respectively.
1.18. From a panel of 20 seniors, 15 juniors, ten sophomores, and five freshmen, a committee of five is selected at random. What is the probability that the committee consists of two seniors and one from each of the other classes?
1.19. In the preceding problem, suppose you know that the committee contains exactly one freshman. What is the conditional probability that there are also two seniors, one sophomore, and one junior on the committee?
1.20. There are ten defective and 60 good transistors in a lot from which you select a sample (without replacement) of 12. Calculate with two decimal place accuracy the probability that the sample contains (a) no defectives, (b) exactly one defective, (c) exactly two defectives, (d) exactly three defectives, (e) exactly four defectives, (f) exactly five defectives.
1.21. In the preceding problem, suppose the sample of 12 was selected with replacement. Find the probability that the sample contains three defectives and compare with the corresponding answer for sampling without replacement. Do the same for 0, 1, 2, 4, and 5 defectives.
1.22. We scramble the letters of the word "Muhammadan" and then arrange them in some order.
(a) What is the probability that the three a's will be consecutive letters?
(b) What is the probability that the three a's will be consecutive and the three m's will also be consecutive?
(c) What is the probability that no three consecutive letters are alike?
1.23. The 11 letters of the word "Mississippi" are scrambled and then arranged in some order.
(a) What is the probability that the four i's are consecutive letters in the resulting arrangement?
(b) What is the conditional probability that the four i's are consecutive, given that the arrangement starts with "M" and ends with "s"?
(c) What is the conditional probability that the four i's are consecutive, given that the arrangement ends with four consecutive esses?
1.24. A poker player holds a pair of aces and a king, queen, and jack. He discards three cards, holding his pair, and draws three more cards from the deck of 47 cards. What is the probability that his hand contains (a) three aces after the draw? (b) two pairs, aces high, after the draw? (Note: When a hand is spoken of as containing a pair of aces, three aces, etc., we will mean that it contains no higher count. Thus, to say a hand contains three aces means that it contains exactly three aces and two different cards. To say a hand contains two pairs, aces high, means that it contains a pair of aces, a different pair, and a fifth card of a still different face value.)
1.25. In Example 1.7 we found the probability of a onepair poker hand. Now find the probability of the following poker hands:
(a) no pair (five different face values, not in sequence, not same suit.)
(b) two pairs (one pair of each of two different face values plus a card of a third face value.)
(c) three of a kind (exactly three cards of one face value plus two different cards.)
(d) straight (five cards in sequence, but not all of the same suit.)
(e) flush (five cards of the same suit but not in sequence.)
(f) full house (three cards of one face value, and two cards of another face value.)
(g) four of a kind (four cards of one face value). See Example 1.2.3. (h) straight flush (five cards in sequence and of the same suit.)
1.26. In seven-card stud poker, your first three cards are of the same suit. What is the probability that you will find at least two more cards of the same suit among the other four cards in your hand?
1.27. Let North be dealt 13 cards from a bridge deck and suppose Si is the sample space corresponding to this deal (trial).
(a) Show that dealing hands to all four players in a bridge game can be considered as a four-trial experiment in which the trials are identical but not independent. What is the sample space S for the four-trial experiment?
(b) Calculate the probability of the event E that North has exactly one ace, considering E as a subset of S.
(c) Calculate P(E), but now considering E as a subset of Si.
(d) Give a precisely worded and complete explanation of why your answers in (b) and (c) are the same. (Hint: Refer to Section II.9.)
1.28. In a bridge game, what is the probability that you and your partner together have exactly k aces, where k = 0, 1, 2, 3, 4?
1.29. When a bridge hand is dealt, what is the probability of (a) a 5-4-3-1 distribution, i.e., of a hand containing five cards of one suit, four of another, etc.? (b) a 4-4-3-2 distribution? (c) a 4-3-3-3 distribution?
1.30. You and your partner in bridge are declarers and hold nine spades, including the ace and king. The defenders hold four spades, including the queen. What is the probability that the distribution of the four spades in the opposing hands is (a) four in one hand, none in the other? (b) three in one hand, one in the other? (c) two in one hand, two in the other?
1.31. Continuing the preceding problem, you know that the queen will fall when you lead the ace and king if the four spades are divided equally between the opposing hands or if the queen is the only spade in one of them. Show that odds for the queen's falling on the lead of the ace and king are approximately 1.13 to 1.
2. Binomial coefficients
The numbers $\binom{n}{r}$ introduced in the preceding section have many interesting and important properties that we will need to know for our later work. Although it is a slight digression at this point, we pause here to develop some of these properties.
Our first task is to explain the title of this section. The reader is familiar with the formulas
(2.1) $(x + y)^2 = x^2 + 2xy + y^2$
(2.2) $(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3$
(2.3) $(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$.
For any positive integer n, (x + y)n is the product of n equal factors:
(2.4) $(x + y)^n = (x + y)(x + y) \cdots (x + y)$.
In seeking a general formula for (x + y)n, it is helpful first to identify the coefficients in the special cases (2.1)(2.3). Indeed, the reader should check that these can be written as follows, thus suggesting our first theorem:
$$(x + y)^2 = \binom{2}{0}x^2 + \binom{2}{1}xy + \binom{2}{2}y^2,$$
$$(x + y)^3 = \binom{3}{0}x^3 + \binom{3}{1}x^2y + \binom{3}{2}xy^2 + \binom{3}{3}y^3,$$
$$(x + y)^4 = \binom{4}{0}x^4 + \binom{4}{1}x^3y + \binom{4}{2}x^2y^2 + \binom{4}{3}xy^3 + \binom{4}{4}y^4 = \sum_{r=0}^{4}\binom{4}{r}x^{4-r}y^r.$$
Theorem 2.1. (The binomial theorem.) If n is any positive integer and x and y are any numbers, then
$$(x + y)^n = \binom{n}{0}x^n + \binom{n}{1}x^{n-1}y + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n}y^n$$
or, more concisely,
(2.5) $(x + y)^n = \sum_{r=0}^{n}\binom{n}{r}x^{n-r}y^r.$
Proof. To compute $(x + y)^n$, we choose either the letter x or the letter y from each of the n factors in (2.4) and multiply these n choices. If we do this for all possible choices of x's and y's and add the results, we obtain $(x + y)^n$. For example, we get the product $x^n$ by choosing x from each factor, we get the product $x^{n-1}y$ whenever we choose x from all but one factor, we get $x^{n-2}y^2$ whenever we choose x from all but two factors, etc.
For a given integer r (0 ≤ r ≤ n), the product $x^{n-r}y^r$ is obtained whenever we choose exactly r y's (and therefore n − r x's). To determine our choice uniquely, we have only to decide from which r of the n factors we select y's. Hence there are $\binom{n}{r}$ choices, each leading to the product $x^{n-r}y^r$, and so the term $\binom{n}{r}x^{n-r}y^r$ appears in the expansion of $(x + y)^n$. Since r is any integer from 0 to n inclusive, the theorem is proved.
Because the numbers $\binom{n}{r}$ appear as coefficients in the expansion of a power of a binomial, they are called binomial coefficients.
Example 2.1. To expand $(1 + t)^n$ we put x = 1, y = t in (2.5) and find
(2.6) $(1 + t)^n = \sum_{r=0}^{n}\binom{n}{r}t^r.$
If we now put t = 1, there follows the interesting identity
(2.7) $2^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$
Since $\binom{n}{r}$ is the number of r-subsets of a set with n elements, we see that the right-hand side of (2.7) is the number of 0-subsets (which is 1, there being only one null set) plus the number of 1-subsets plus the number of 2-subsets, etc. This sum is therefore the total number of subsets and so (2.7) supplies another proof of Theorem 1.2.1, which says that a set with n elements has $2^n$ subsets.
Example 2.2. The binomial theorem can be used to compute an approximate value of $(.99)^6$. For with x = 1 and y = −.01 in (2.5) or, equivalently, with t = −.01 in (2.6), we obtain
$$(.99)^6 = (1 - .01)^6$$
TABLE 23
0 1 2 3 4 5 6
0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
$$= 1 - .06 + .0015 - .00002 + \cdots + .000000000001 = .941, \text{ to three decimal places}.$$
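The same approximation can be checked directly (a modern aside in Python, not part of the original):

```python
from math import comb

# Binomial series for (1 - .01)^6, as in Example 2.2.
series = sum(comb(6, r) * (-0.01) ** r for r in range(7))
print(round(series, 3))      # -> 0.941
print(round(0.99 ** 6, 3))   # -> 0.941, direct computation agrees
```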
A convenient device for calculating and displaying the binomial coefficients is known as Pascal's triangle.* The first few rows are illustrated in Table 23. The row for n = 0 lists the one coefficient in the expansion of $(x + y)^0$, the row for n = 1 lists the two coefficients in the expansion of $(x + y)^1$, the row for n = 2 lists the three coefficients in the expansion of $(x + y)^2$, and so on. Since every set with n elements has exactly one null subset (∅) and exactly one n-subset (namely, itself), we find 1's under the column headed r = 0 and also along the hypotenuse of the triangular array.
We observe that there is a simple relation among numbers in the triangle. For if we start at any number not on the hypotenuse and move to the right one number and then drop down to the row below we note that the sum of the first two numbers is precisely the number in the row below. For example, starting at the left in the row for n = 4 we obtain the numbers in the row for n — 5 as follows:
1 + 4 = 5, 4 + 6 = 10, 6 + 4 = 10, 4 + 1 = 5.
The following result proves that our observation is generally true and can thus be used to extend the table one row at a time and
thereby to compute binomial coefficients $\binom{n}{r}$ for larger and larger values of n.
Theorem 2.2. For any positive integers r and n with r < n,
(2.8) $\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}.$
* Pascal (1623-1662) was far from the first to form and study this triangular array of numbers, but somehow his name has become attached to it. For interesting historical notes, see C. B. Boyer, "Cardan and the Pascal Triangle," American Mathematical Monthly, vol. 57 (1950), 387-390.
Proof. Although it is possible to give a direct algebraic proof by writing the binomial coefficients in terms of factorials (see Problem 2.5e), we prefer the following argument. The $\binom{n}{r}$ r-subsets of a set with n elements can be divided into those that include a given element, and those that do not. The number of r-subsets including the given element is $\binom{n-1}{r-1}$, since fixing one element leaves us free to select r − 1 others from the remaining n − 1. The number of r-subsets that do not include the given element is $\binom{n-1}{r}$, since we are now choosing r from n − 1 elements. Since we have now accounted for all r-subsets, formula (2.8) is proved.
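Recurrence (2.8) is exactly how one extends the triangle row by row; a short Python sketch (not in the original) builds rows 0 through 10 this way and checks them against the direct formula:

```python
from math import comb

# Build Pascal's triangle using only (2.8): each interior entry is
# the sum of the two entries above it; the ends of each row are 1.
rows = [[1]]
for n in range(1, 11):
    prev = rows[-1]
    rows.append([1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1])

print(rows[6])   # -> [1, 6, 15, 20, 15, 6, 1], the n = 6 row of Table 23
assert all(rows[n][r] == comb(n, r)
           for n in range(11) for r in range(n + 1))
```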
Two other properties of the binomial coefficients are apparent from the Pascal triangle. The symmetry in each row of the triangle is due to the identity
(2.9) $\binom{n}{r} = \binom{n}{n-r},$
which expresses the fact that every selection of an r-subset is automatically a selection of its complementary (n − r)-subset, and vice versa. It follows that
$$\binom{n}{0} = \binom{n}{n}, \quad \binom{n}{1} = \binom{n}{n-1}, \quad \text{etc.},$$
and so in each row of the triangle, the first and last numbers are equal, as are the second and next to last, and so on.
We also observe that the binomial coefficients in any row increase as we move to the right and then eventually start to decrease. We leave for the problems the proof that this is generally true, and instead derive another important identity involving binomial coefficients.
As we shall see in the example that follows, it is convenient to extend the definition of the binomial coefficient $\binom{n}{r}$ to values of n and r for which the symbol no longer has any combinatorial meaning. Indeed, it is possible and useful to have $\binom{n}{r}$ defined even when n is
not an integer. (See Problem 2.9.) But here we make only the definition that for any positive integer n and r also an integer,
(2.10) $\binom{n}{r} = 0$ if either r > n or r < 0.
Since a set with n elements has no subsets with more than n elements and also does not have any subsets with a negative number of elements, Definition (2.10) is quite reasonable.
Example 2.3. We are given n defective and m acceptable items, n + m items altogether, from a production line. They are mixed up and we are to draw a subset of r items from the lot. We count the number of possible rsubsets in two different ways, and thus derive
a useful identity. First, there are clearly $\binom{n+m}{r}$ r-subsets that can
be drawn from the n + m items. To count again, note that the rsubsets can be classified according to the number of defective items they contain. We can select k defectives (and therefore r — k acceptable items) in
$$\binom{n}{k}\binom{m}{r-k} \text{ ways}.$$
If we let k vary over all possible values and add these numbers, then we obtain all the r-subsets. Hence
(2.11) $\binom{n+m}{r} = \sum_{k}\binom{n}{k}\binom{m}{r-k}.$
Note that some of the terms in the sum may be zero, since we cannot have more defective and acceptable items in the r-subset than there are defective and acceptable items in the whole lot. For example, if the number of acceptable items m happens to be equal to r − 2, then $\binom{m}{r}$ and $\binom{m}{r-1}$ both are zero according to (2.10). The value of the definition made in (2.10) lies in the fact that it allows us to write the sum in (2.11) without worrying about terms that should be omitted; instead of omitting them we made sure they would be zero.
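Identity (2.11) can be tested by machine; conveniently, Python's `math.comb(n, k)` already returns 0 when k > n, matching convention (2.10) for that case (a modern aside, not in the original):

```python
from math import comb

# Identity (2.11): classify the r-subsets of n + m items by the
# number k of "defective" items they contain.
def both_sides_agree(n, m, r):
    left = comb(n + m, r)
    right = sum(comb(n, k) * comb(m, r - k) for k in range(r + 1))
    return left == right

assert both_sides_agree(4, 6, 5)   # C(10, 5) = 252
assert both_sides_agree(3, 2, 4)   # here some terms vanish, as in the text
print(comb(10, 5))                 # -> 252
```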
Identity (2.11) will be used in Chapter 5 in connection with the so-called hypergeometric distribution in probability.
The method we used to prove Theorem 2.1 can be extended to prove the following result.
Theorem 2.3. (The multinomial theorem.) Let n be any positive integer and $x_1, x_2, \ldots, x_k$ any k numbers. Then
(2.12) $(x_1 + x_2 + \cdots + x_k)^n = \sum \binom{n}{n_1, n_2, \ldots, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k},$
where the sum is taken over all nonnegative integers $n_1, n_2, \ldots, n_k$ such that $n_1 + n_2 + \cdots + n_k = n$.
Proof. We again have n factors to multiply, but now each is the multinomial $(x_1 + x_2 + \cdots + x_k)$ instead of the binomial (x + y) in (2.4). From each factor we must choose $x_1$, or $x_2$, ..., or $x_k$, multiply our n choices and add these products for all possible choices. For example, we get the product $x_1^n$ by choosing $x_1$ from each factor; we get the product $x_1^{n-2}x_2x_k$ by choosing $x_1$ from n − 2 of the factors, $x_2$ from one factor and $x_k$ from another factor; etc.
For given nonnegative integers $n_1, n_2, \ldots, n_k$ (whose sum is n) we get the product $x_1^{n_1}x_2^{n_2} \cdots x_k^{n_k}$ whenever we choose exactly
$$n_1 \ x_1\text{'s}, \quad n_2 \ x_2\text{'s}, \quad \ldots, \quad n_k \ x_k\text{'s}$$
from the n available factors. There are as many ways of making such a choice as there are ways of placing the n factors into k cells, the first cell containing the $n_1$ factors from which we choose $x_1$, the second cell containing the $n_2$ factors from which we choose $x_2$, etc. By (1.5) there are therefore
(2.13) $\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}$
choices, each leading to the product $x_1^{n_1}x_2^{n_2} \cdots x_k^{n_k}$. Thus we have the multinomial theorem.
The numbers (2.13) are called multinomial coefficients. Since, as we noted in (1.10), a binomial coefficient is a special case of a multinomial coefficient, we see that the multinomial theorem becomes the binomial theorem if we put k = 2.
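Setting every $x_i = 1$ in (2.12) shows that the multinomial coefficients for fixed n sum to $k^n$, which gives a quick machine check (a modern aside in Python, not part of the original):

```python
from itertools import product
from math import factorial

# Multinomial coefficient (2.13): n! / (n_1! n_2! ... n_k!).
def multinomial(parts):
    out = factorial(sum(parts))
    for p in parts:
        out //= factorial(p)
    return out

# Sum over all (n_1, n_2, n_3) with n_1 + n_2 + n_3 = 3: should be 3^3.
total = sum(multinomial(c)
            for c in product(range(4), repeat=3) if sum(c) == 3)
print(total)   # -> 27
```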
Example 2.4. To expand $(p + q + r)^3$ we write
$$(p + q + r)^3 = \sum \binom{3}{n_1, n_2, n_3} p^{n_1} q^{n_2} r^{n_3},$$
where the sum is taken over all nonnegative integers $n_1, n_2, n_3$ such that $n_1 + n_2 + n_3 = 3$. Hence
$$(p + q + r)^3 = p^3 + q^3 + r^3 + 3p^2q + 3p^2r + 3pq^2 + 3q^2r + 3pr^2 + 3qr^2 + 6pqr.$$
The terms of this expansion have a probability interpretation. Suppose a trial has three possible outcomes, say the answers "yes," "no," and "don't know" to some question, occurring with probabilities p, q, and r respectively, where p ≥ 0, q ≥ 0, r ≥ 0, and p + q + r = 1. Now perform this trial three independent times, for instance, by choosing a random sample of three people with replacement from the population and asking each the question. Then the terms in the expansion of $(p + q + r)^3$ give the probabilities of all the possible combinations of answers. For example, the probability that all three people answer "yes" is $p^3$, the probability that exactly one person answers "yes" and two people answer "don't know" is $3pr^2$, etc.
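The probability interpretation can be verified by enumerating the 27 possible answer triples (Python; the particular values of p, q, r below are illustrative assumptions, any nonnegative values summing to 1 would do):

```python
from itertools import product

# Three independent respondents; answers and their probabilities.
p, q, r = 0.5, 0.3, 0.2                    # illustrative, p + q + r = 1
prob = {'yes': p, 'no': q, 'dont know': r}

def event_prob(pred):
    return sum(prob[a] * prob[b] * prob[c]
               for a, b, c in product(prob, repeat=3)
               if pred((a, b, c)))

all_yes = event_prob(lambda t: t.count('yes') == 3)
one_yes_two_dk = event_prob(lambda t: t.count('yes') == 1
                            and t.count('dont know') == 2)
print(round(all_yes, 3))          # -> 0.125, which is p^3
print(round(one_yes_two_dk, 3))   # -> 0.06, which is 3 p r^2
```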
PROBLEMS
2.1. Expand by the binomial theorem.
(a) $(p + q)^4$ (b) $(1 - z)^4$ (c) $(a - 3b)^4$
2.2. What is the coefficient of $a^5b^3$ in the expansion of $(2a - b)^8$?
2.3. By means of the binomial theorem, evaluate to three decimal places. (a) $(1.01)^7$ (b) $(1.02)^{10}$ (c) $(.98)^4$
2.4. Identify each expansion and thus evaluate each sum without computing individual terms in the sum.
(a) 2 (4}vr& (b) 2
2.5. Write the binomial coefficients in terms of factorials and thus prove the following identities.
2.6. Use the law of formation (2.8) to extend the Pascal triangle in Table 23 to n = 10.
2.7. By writing the binomial coefficients in terms of factorials, derive the recursion formula
$$\binom{n}{r+1} = \frac{n-r}{r+1}\binom{n}{r}.$$
This formula enables us to compute the numbers in any row of the Pascal triangle one by one, starting from $\binom{n}{0} = 1$. Compute the binomial coefficients for n = 10 this way.
2.8. Consider the binomial coefficients $\binom{n}{r}$ with n fixed and r = 0, 1, 2, ..., n. With this order, show that $\binom{n}{r}$ is greater than its predecessor if $r < \frac{1}{2}(n + 1)$ and is smaller if $r > \frac{1}{2}(n + 1)$. Show also that if n is an even integer, then there is one largest binomial coefficient, but if n is odd, then there are two equal binomial coefficients that are larger than all the others. (Hint: Consider the ratio of $\binom{n}{r}$ to $\binom{n}{r-1}$ and determine when this ratio is greater than, less than, or equal to 1.)
SUPPLEMENTARY READING
2.9. Using the notation introduced in Problem 1.4, define the generalized binomial coefficient $\binom{x}{r}$ for any number x and any positive integer r by the equation
$$\binom{x}{r} = \frac{(x)_r}{r!}.$$
(a) Show that this reduces to the familiar definition if x is a positive integer. (Consider the case x < r as well as x ≥ r.)
(b) Show that $\binom{-1/2}{n} = (-1)^n\,2^{-2n}\binom{2n}{n}.$
2.10. Prove the following identities:
2.11. In the expansion of $(p + q + r + s)^{10}$, compute the coefficient of (a) $p^{10}$ (b) $p^4q^6$ (c) $p^3q^2rs^4$.
2.12. Expand $(p + q + r)^4$ by the multinomial theorem and give a probability interpretation to each term in the expansion, assuming p, q, and r are nonnegative numbers with sum 1.
SUPPLEMENTARY READING
In addition to the references listed at the end of Chapter 2 the following may also be consulted.
1. Borel, É., and A. Chéron, Théorie Mathématique du Bridge à la Portée de Tous, Gauthier-Villars, Paris, 1940.
2. Jacoby, O., How to Figure the Odds, Doubleday and Company, Inc., 1950.
3. Levinson, H. C., The Science of Chance, Rinehart and Company, 1950.
4. Riordan, J., An Introduction to Combinatorial Analysis, John Wiley and Sons, Inc., 1958.
5. Whitworth, W. A., Choice and Chance, Hafner Publishing Co., 1959 (reprint of 5th edition, 1901).
6. Whitworth, W. A., DCC Exercises, G. E. Stechert and Co., 1945.
Chapter 4 RANDOM VARIABLES
1. Random variables and probability functions
When we perform an experiment, we are often interested not in the particular outcome that occurs, but rather in some number associated with that outcome. For example, in the game of "craps" a player is interested not in the particular numbers on the two dice, but in their sum; in bridge, one often concentrates on the number of honor points in the hand rather than on the hand itself; in selecting a random sample of students from a certain college, we may want to compute the proportion of freshmen in the sample; in tossing a coin 50 times, we may be interested only in the number of heads obtained, and not in the particular sequence of heads and tails that constitutes the result of the 50 tosses; etc.
In all these examples, we have a rule which assigns to each outcome of the experiment a single real number. The mathematician says that a function is thereby defined. Indeed, a function is specified whenever we are given a set of elements (the domain of the function), together with a rule by which one and only one number is associated with each element of the domain. The number which the rule assigns to an element of the domain is called the value of the function for (or at) that element. The set of all values of a function is called the range of the function.
The reader is already familiar with the function concept. For ex
158
ample, the equation y = x² determines a function whose domain is the set of all real numbers and whose range is the set of nonnegative real numbers; i.e., to each real number x is associated the (necessarily nonnegative) real number y = x². As another familiar example, think of the function which assigns to each circle the number which is the circle's circumference. The domain of this function is the set of all circles. For any element (circle) in the domain, the value of this function is the number 2πr, where r is the radius of the circle. For yet another example, consider the function whose domain is the set of all people to whom the federal tax laws apply, and which assigns to each such person the number which is his taxable income for a given year.
In probability theory, certain functions of special interest are given special names.
Definition 1.1. A function whose domain is a sample space and whose range is some set of real numbers is called a random variable. If the random variable is denoted by X and has the sample space S = {o1, o2, ..., on} as domain, then we write X(ok) for the value of X at the element ok. Thus X(ok) is the real number that the function rule assigns to the element ok of S.
The reader may find it helpful to think of a function in terms of a machine. In Figure 17, we picture a machine for the random variable X. [Figure 17: a function machine taking input ok ∈ S and producing output X(ok).] The possible inputs of the function-machine are the elements of the sample space S. Each such element ok is "processed" by the machine, and what emerges is the output number X(ok). The set of all possible input elements is the sample space S, the domain of the function. The set of different output numbers is the range of the random variable X.
Let us now look at some examples of random variables.
Example 1.1. Let S = {1, 2, 3, 4, 5, 6} and define X as follows:
X(1) = X(2) = X(3) = +1, X(4) = X(5) = X(6) = −1.
Then X is a random variable whose domain is the sample space S and whose range is the set {+1, −1}. X can be interpreted as the gain of a
player in a game in which a die is rolled, the player winning $1 if the outcome is 1, 2, or 3 and losing $1 if the outcome is 4, 5, or 6.
Example 1.2. Two dice are rolled and we define the familiar sample space
S = {(1, 1), (1, 2), ..., (6, 6)}
containing 36 elements. Let X denote the random variable whose value for any element of S is the sum of the numbers on the two dice. Then the range of X is the set containing the 11 values of X:
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
Each ordered pair of S has associated with it exactly one element of the range, as required by Definition 1.1. But, in general, the same value of X arises from many different outcomes. For example, X(ok) = 5 if ok is any one of the four elements of the event
{(1, 4), (2, 3), (3, 2), (4, 1)}.
A given input element in Figure 17 always leads to exactly one output number, but the same output number may be obtained from more than one input element.
Example 1.3. A coin is tossed, and then tossed again. We define the sample space
S = {HH, HT, TH, TT}.
If X is the random variable whose value for any element of S is the number of heads obtained, then
X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0.
More than one random variable can be defined on the same sample space. For example, let Y denote the random variable whose value for any element of S is the number of heads minus the number of tails. Then
Y(HH) = 2, Y(HT) = Y(TH) = 0, Y(TT) = −2.
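The point that several random variables can live on one sample space is easy to make concrete; here is a minimal Python sketch of the two coin-toss random variables above (the function and variable names are ours, not the text's):

```python
# Sample space for two tosses of a coin (Example 1.3).
S = ["HH", "HT", "TH", "TT"]

def X(o):
    """Number of heads in the outcome o."""
    return o.count("H")

def Y(o):
    """Number of heads minus number of tails in the outcome o."""
    return o.count("H") - o.count("T")

# Tabulate each random variable over the whole sample space.
values_X = {o: X(o) for o in S}
values_Y = {o: Y(o) for o in S}
```

Both X and Y have the same domain S, but their ranges differ: {0, 1, 2} for X and {−2, 0, 2} for Y.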
Naming a numerical-valued function defined on a sample space a random variable is singularly inappropriate. Like the alligator pear that is neither an alligator nor a pear and the biologist's white ant that is neither white nor an ant, the probabilist's random variable is neither random nor a variable. But this terminology has become standard by now, and we shall continue to use it, trusting that the
reader will constantly keep in mind its true meaning, as given in Definition 1.1.

Suppose now that a sample space
S = {o1, o2, ..., on}
is given, and that some acceptable assignment of probabilities has been made to the simple events of S. Then if X is a random variable defined on S, we can ask for the probability that the value of X is some number, say x. The event that X has the value x is the subset of S containing those elements ok for which X(ok) = x. If we denote by f(x) the probability of this event, then
(1.1) f(x) = P({ok ∈ S | X(ok) = x}).
Because this notation is cumbersome, we shall write
(1.2) f(x) = P(X = x),
adopting the shorthand "X = x" to denote the event written out in (1.1).
Definition 1.2. The function f whose value for each real number x is given by (1.2), or equivalently by (1.1), is called the probability function of the random variable X.
In other words, the probability function of X has the set of all real numbers as its domain, and the function assigns to each real number x the probability that X has the value x.
Example 1.4. Continuing Example 1.1, if the die is fair, then
f(1) = P(X = 1) = 1/2, f(−1) = P(X = −1) = 1/2,
and f(x) = 0 if x is different from 1 or −1.
Example 1.5. If both dice of Example 1.2 are fair and the rolls are independent, so that each simple event of S has probability 1/36, then we compute the value of the probability function at x = 5 as follows:
f(5) = P(X = 5) = P({(1, 4), (2, 3), (3, 2), (4, 1)}) = 4/36.
This is the probability that the sum of the numbers on the dice is 5. We can compute the probabilities f(2), f(3), ..., f(12) in an analogous manner. These values are summarized in the following probability table:
x      2     3     4     5     6     7     8     9     10    11    12
f(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Let us agree, as here, to include in such probability tables only those numbers x for which f(x) > 0. Since we include all such numbers, the probabilities f(x) in the table add to 1. From the probability table of a random variable X, we can tell at a glance not only the various values of X, but also the probability with which each value occurs. This information can also be presented graphically, as in Figure 18, where the probability chart of the random variable X is drawn. [Figure 18: probability chart of X; vertical lines of heights 1/36 through 6/36 stand over the points x = 2, 3, ..., 12 on the x-axis.] In the probability chart, the various values of X are indicated on the horizontal x-axis, and the length of the vertical line drawn from the x-axis to the point with coordinates (x, f(x)) is the probability of the event that X has the value x.
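The probability table above can also be generated mechanically: push every simple event of S through the function machine and accumulate the probabilities of those outcomes sharing each value. A hedged Python sketch (names are ours):

```python
from fractions import Fraction
from collections import defaultdict

# The 36 equally likely simple events for two fair dice (Example 1.5).
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = Fraction(1, 36)            # probability of each simple event

f = defaultdict(Fraction)      # probability function of X = sum of the dice
for outcome in S:
    f[sum(outcome)] += p       # each outcome contributes P({outcome}) to f(x)

# f[5] accumulates the event {(1,4), (2,3), (3,2), (4,1)}, giving 4/36.
```

Exact fractions are used so the entries match the table 1/36, 2/36, ..., 1/36 without rounding.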
We are often interested not in the probability that the value of a random variable X is a particular number, but rather in the probability that X has a value less than or equal to some number. For example, if you have ten items in inventory, you may want to know the probability that your inventory stocks will be sufficient to fill incoming orders. If X is the total number of items ordered, you are therefore interested in the probability of the event X ≤ 10. As another example, if X denotes the percentage of votes cast in opposition
to a school bond levy, then the school board is interested in the probability that the levy is approved, i.e., that X will be less than or equal to some critical number that separates victory from defeat. In general, if X is defined on the sample space S, then the event that X is less than or equal to some number, say x, is the subset of S containing those elements ok for which X(ok) ≤ x. If we denote by F(x) the probability of this event (assuming an acceptable assignment of probabilities has been made to the simple events of S), then
(1.3) F(x) = P({ok ∈ S | X(ok) ≤ x}).
In analogy with our agreement in (1.2), we adopt the shorthand "X ≤ x" to denote the event written out in (1.3), and we then can write
(1.4) F(x) = P(X ≤ x).
Definition 1.3. The function F whose value for each real number x is given by (1.4), or equivalently by (1.3), is called the distribution function of the random variable X.
In other words, the distribution function of X has the set of all real numbers as its domain, and the function assigns to each real number x the probability that X has a value less than or equal to (i.e., at most) the number x.
As our next example illustrates, it is an easy matter to calculate the values of F, the distribution function of a random variable X, when one knows f, the probability function of X. The distribution function can be presented in graphical or tabular form, as we also show.
Example 1.6. Let us continue with the dice experiment of Example 1.5. The event symbolized by X ≤ 1 is the null event of the sample space S, since the sum of the numbers on the dice cannot be at most 1. Hence
F(1) = P(X ≤ 1) = 0.
The event X ≤ 2 is the subset {(1, 1)}, which is the same as the event X = 2. Thus,
F(2) = P(X ≤ 2) = 1/36.
The event X ≤ 3 is the subset {(1, 1), (1, 2), (2, 1)}, which is seen to be the union of the events X = 2 and X = 3. Hence,
F(3) = P(X ≤ 3) = P(X = 2) + P(X = 3) = 1/36 + 2/36 = 3/36.
Similarly, the event X ≤ 4 is the union of the events X = 2, X = 3, and X = 4, so that
F(4) = P(X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4) = 1/36 + 2/36 + 3/36 = 6/36.
Continuing in this way, we obtain the entries in the following distribution table for the random variable X:
(1.5)
x      2     3     4     5      6      7      8      9      10     11     12
F(x)   1/36  3/36  6/36  10/36  15/36  21/36  26/36  30/36  33/36  35/36  36/36
But remember that the domain of the distribution function F is the set of all real numbers. Hence, we must find the value F(x) for all numbers x, not just those in the distribution table. For example, to find F(2.6) we note that the event X ≤ 2.6 is the subset {(1, 1)}, since the sum of the numbers on the dice is less than or equal to 2.6 if and only if the sum is exactly 2. Therefore,
F(2.6) = P(X ≤ 2.6) = 1/36.
In fact, F(x) = 1/36 for all x in the interval 2 ≤ x < 3, since for any such x the event X ≤ x is the same subset, namely {(1, 1)}. Note that this interval contains x = 2, since F(2) = 1/36, but does not contain x = 3, since F(3) = 3/36. Thus, F(x) = 1/36 for x = 2.999..., no matter how many nines we write down, but at x = 3, the value of F jumps to F(3) = 3/36. Similarly, we find F(x) = 3/36 for all x in the interval 3 ≤ x < 4, but a jump occurs at x = 4, since F(4) = 6/36; then F(x) = 6/36 for all x in the interval 4 ≤ x < 5, but a jump occurs at x = 5, since F(5) = 10/36; etc.
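The rule "sum f over all values at most x" defines F at every real number, not just at the tabulated jumps; a small Python sketch makes the step-function behavior visible (the function names are ours):

```python
from fractions import Fraction

# Probability table of the two-dice sum (Example 1.5).
f = {x: Fraction(w, 36)
     for x, w in zip(range(2, 13), [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])}

def F(x):
    """Distribution function F(x) = P(X <= x): sum f over table values <= x."""
    return sum(p for xk, p in f.items() if xk <= x)

# F is constant between jumps (e.g. F(2.6) = F(2) = 1/36) and jumps
# by f(xk) at each tabulated value xk.
```

Evaluating F at non-tabulated points such as 2.6 or 2.999 returns the same value as at the last jump, exactly as the text describes.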
These facts are shown on the graph of the distribution function in Figure 19. The graph consists entirely of horizontal line segments. (A function having such a graph is appropriately called a step function.) We use a heavy dot in Figure 19 to indicate which of the two horizontal segments should be read at each jump (step) in the graph. [Figure 19: graph of the step function F, rising from 0 to 36/36 in jumps at x = 2, 3, ..., 12.] Note that the magnitude of the jump at x = 2 is f(2) = 1/36, the jump at x = 3 is f(3) = 2/36, the jump at x = 4 is f(4) = 3/36, etc. Finally, since the sum of the numbers on the dice is never less than 2 and always at most 12, we have F(x) = 0 if x < 2 and F(x) = 1 if x ≥ 12.
If one knows the height of the graph of F at all points where jumps occur, then the entire graph of F is easily drawn. It is for this reason that we shall, as in (1.5), always list in the distribution table only those x-values at which jumps of F occur.
If we are given the graph of the distribution function F of a random variable X, then reading its height at any number x, we find F(x), the probability that the value of X is less than or equal to x. Also, we can determine the places where jumps in the graph occur, as well as the magnitude of each jump, and so we can construct the probability function of X. Thus, we can obtain the probability function from the distribution function, or vice versa.
We have made our observations up to this point on the basis of some special examples, especially the two-dice example. We now turn to some general statements that apply to all probability and distribution functions of random variables defined on finite sample spaces.
Theorem 1.1. If a finite sample space S is given, if an acceptable assignment of probabilities is made to its simple events, and if a random variable X is defined with domain S, then f, the probability function of X, has the following properties:
(i) f(x) ≥ 0 for all x, but there are at most a finite number, say N, of x-values for which f(x) > 0.
(ii) If x1, x2, ..., xN are all the x-values for which f(x) is positive, i.e.,
(1.6) f(xk) > 0 for k = 1, 2, ..., N,
then
(1.7) f(x1) + f(x2) + ... + f(xN) = 1.
We leave the proof of this theorem for the problems. The probability table of the random variable X thus has the following form:
(1.8)
x      x1     x2     ...   xN
f(x)   f(x1)  f(x2)  ...   f(xN)
Under these circumstances, it is customary to say that X is a random variable whose possible values are x1, x2, ..., xN, and that the value xk occurs with probability f(xk) for k = 1, 2, ..., N. This language, which we shall use from now on, should bring to the reader's mind the probability table in (1.8), whose entries satisfy (1.6) and (1.7).
Theorem 1.2. With the hypotheses of Theorem 1.1, the distribution function F of the random variable X has the following properties:
(i) There are numbers, say m and M, such that F(x) = 0 if x < m and F(x) = 1 if x ≥ M.
(ii) F is a nondecreasing function; i.e., F(x) ≥ F(y) if x > y.
(iii) F is a step function with a finite number of jumps or steps; i.e., the graph of F is made up of a finite number of horizontal line segments. The value of F at each step is given by the height of the higher of the two line segments forming that step. These steps occur at x1, x2, ..., xN, and the magnitude of the jump at xk is f(xk) for k = 1, 2, ..., N.
Proof. To prove (i), we let m be the smallest and M the largest of the possible values x1, x2, ..., xN of the random variable X. Then if x < m, the event X ≤ x is the null set ∅, and so
F(x) = P(X ≤ x) = P(∅) = 0.
On the other hand, if x ≥ M, then the event X ≤ x is the entire sample space S, and so
F(x) = P(X ≤ x) = P(S) = 1.
To prove (ii), note that if x > y, then the event X ≤ y is a subset of the event X ≤ x. Hence, by Theorem II.4.2, we have
P(X ≤ y) ≤ P(X ≤ x),
which was to be proved.
To prove (iii), suppose xj and xk are neighboring x-values, with xj < xk. Then the event X ≤ x is the same for all x in the interval xj ≤ x < xk. Hence, F(x) is the same number for all such x, and the graph of F is a horizontal line segment in this interval. At the right-hand endpoint of the interval, we find
P(X ≤ xk) = P(X < xk) + P(X = xk),
so that the jump at x = xk is
P(X ≤ xk) − P(X < xk) = P(X = xk) = f(xk),
as claimed.
In Theorems 1.1 and 1.2, we stated our hypotheses very carefully in order to make clear that one must have a sample space, an acceptable assignment of probabilities to its simple events, and a random variable defined on the sample space, before talking about the probability function or the distribution function of the random variable. Nevertheless, in the probability literature (and starting in the next section in this book) one often sees definitions and theorems that begin with the words, "Let X be a random variable with probability function f," or "Let X be a random variable whose possible values x1, x2, ..., xN occur with probabilities f(x1), f(x2), ..., f(xN), respectively," no mention being made of a sample space or assignment of probabilities to simple events. To understand why this is an acceptable state of affairs, we must first realize that the converse of Theorem 1.1 is true.
Theorem 1.3. Let a function f with properties (i) and (ii) in Theorem 1.1 be given. Then there is a finite sample space S, an acceptable assignment of probabilities to the simple events of S, and a random variable X whose domain is S, such that f is the probability function of X.
Proof. Define S = {x1, x2, ..., xN} and let P({xk}) = f(xk) for
k = 1, 2, ..., N. This is an acceptable assignment of probabilities, because we have assumed that (1.6) and (1.7) hold. Now define the random variable X as the identity function on S; i.e., to each element xk ∈ S we assign the number X(xk) = xk. Then
P(X = xk) = P({xk}) = f(xk) for k = 1, 2, ..., N
and
P(X = x) = P(∅) = 0
if x is not one of the numbers x1, x2, ..., xN. Hence f is the probability function of X, and the theorem is proved.
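The construction in this proof is concrete enough to transcribe directly; a sketch using the probability function of Example 1.4 (the variable names are ours):

```python
from fractions import Fraction

# A probability function f satisfying (1.6) and (1.7):
# positive values that sum to 1 (the table of Example 1.4).
f = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

# Canonical construction of Theorem 1.3: take S to be the set of possible
# values, give the simple event {xk} probability f(xk), and let X be the
# identity function on S.
S = list(f)
P = dict(f)                  # P({xk}) = f(xk)
X = lambda xk: xk            # the identity random variable

# The probability function of X is then f itself.
prob_fn = {X(xk): P[xk] for xk in S}
```

This prototype random variable is the one the text later uses to stand in for every random variable with the given probability function.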
We also must realize that infinitely many different random variables can have the same probability function. (See Problem 1.13.) This possibility leads to the following oft-used terminology.
Definition 1.4. Two or more random variables are said to be identically distributed if and only if they have equal (i.e., identical) probability functions (and hence identical distribution functions).
Thus, whenever we want to make definitions or prove theorems that depend only on the probability function of a random variable, we are actually making definitions and proving theorems for any one of an infinite set of identically distributed random variables. The random variables in this set are all different, but this doesn't concern us if we are interested only in their probability functions. Hence, we do not mention the different sample spaces on which they are defined, knowing by virtue of Theorem 1.3 that their common probability function does indeed determine a sample space and an acceptable assignment of probabilities to its simple events. Moreover, the random variable defined on this sample space, as determined in the proof of Theorem 1.3, can serve as the prototype of all identically distributed random variables with the given probability function. This rather lengthy argument serves to explain why, in the following sections, there is so little mention of the underlying sample spaces and probability assignments to simple events, when we talk of random variables and their probability and distribution functions.
PROBLEMS
1.1. An experiment consists of three independent tosses of a fair coin. Let X be the random variable whose value for any outcome is the number of heads obtained.
(a) Find the probability function of X, and construct a probability table and a probability chart.
(b) Find the distribution function of X and draw its graph.
1.2. Repeat the preceding problem, but now (a) let X be the random variable whose value for any outcome is the number of heads minus the number of tails; (b) let X denote the gain of a player who wins $2 if the first head occurs at the first toss, wins $1 if the first head occurs at the second toss, loses $1 if the first head occurs at the third toss, and loses $2 if all three tosses are tails.
1.3. There are two defectives in a lot of eight articles. A sample of four articles is drawn at random (without replacement) from the lot. Let X denote the number of defectives in the sample.
(a) Determine the probability function of X and construct a probability table.
(b) Determine the distribution function of X and draw its graph.
1.4. The annual income of six people A, B, C, D, E, F is given in the following table.

Person                 A  B  C  D  E  F
Income (in $1000's)    3  3  4  5  6  6

A committee of k people is selected from these six people, where k = 1, 2, ..., 6, and the random variable Xk is defined as the average income of the k committee members. For each value of k: (a) determine the probability function of Xk and construct the corresponding probability chart; (b) determine the distribution function of Xk and draw its graph; (c) compare the six probability functions, noting especially how they change as k increases.
1.5. The random variable X has a probability function f of the following form, where k is some number:
f(x) =
(a) Determine the value of k.
(b) Find P(X < 2), P(X ≤ 2), P(0 < X < 2).
(c) What is the smallest value of x for which P(X ≤ x) > .5?
(d) Determine the distribution function of X.
1.6. Let X denote the number of hours you study during a randomly selected school day. Suppose the probability function of X has the following form, where k is some number:
f(x) = .1 if x = 0
       kx if x = 1 or 2
       k(5 − x) if x = 3 or 4
       0 otherwise.
(a) Find the value of k.
(b) Draw the probability chart.
(c) What is the probability that you study at least two hours? Exactly two hours? At most two hours?
(d) What number of hours is such that you study at least this number of hours with probability at least .70?
(e) Determine the distribution function of X.
(f) What is the conditional probability that you study three hours, given that you do study?
1.7. The distribution function of a random variable X is given as follows:
F(x) = 0 if x < 1
       ...
       1 if x ≥ 3.
(a) Draw the graph of F.
(b) Find P(X < 1), P(X = 1), P(1 < X < 2), P(1 ≤ X < 2), P(1 ≤ X ≤ 2), P(X ≤ 3), P(2 < X ≤ 3.5), P(1.5 ≤ X ≤ 2.7).
(c) Determine the probability function of X, and construct a probability table.
1.8. An urn contains three green and two red balls. Find the probability function and construct the probability table for each of the following random variables.
(a) The number of red balls in a random sample of three balls drawn with replacement.
(b) The number of red balls in a random sample of three balls drawn without replacement.
(c) The number of balls that are drawn (one by one, with replacement) in order to get a red ball.
(d) The number of balls that are drawn (one by one, without replacement) in order to get a red ball.
1.9. A bridge hand is dealt from a full deck. Let X denote the number of spades in the hand. Determine the probability function of the random variable X.
1.10. Let X be a random variable whose possible values x1, x2, ..., xN occur with probabilities f(x1), f(x2), ..., f(xN), respectively. If F is the distribution function of X, show that
F(x) = Σ_{xk ≤ x} f(xk),
where the sum is taken over all x-values for which xk ≤ x. (Because values of F are obtained by successive additions of f-values, F is often called the cumulative distribution function of X.)
1.11. Let f and F be the probability and distribution function, respectively, of a random variable X. Show that for any numbers a and b (a < b),
(a) P(a < X ≤ b) = F(b) − F(a)
(b) P(a ≤ X ≤ b) = F(b) − F(a) + f(a)
(c) P(a ≤ X < b) = F(b) − F(a) + f(a) − f(b)
(d) P(a < X < b) = F(b) − F(a) − f(b)
1.12. (a) Let x1, x2, ..., xN be the possible values of a random variable X defined on a sample space S. Show that if no simple event of S is assigned zero probability, then
{X = x1, X = x2, ..., X = xN}
is a partition of S.
(b) Prove Theorem 1.1.
1.13. A fair coin is tossed and you win $2 if it falls heads, win $1 if it falls tails. Call your gain X1. A fair die is rolled and you win $2 if a 1, 2, or 3 shows, win $1 if a 4, 5, or 6 shows. Call your gain X2.
(a) Show that X1 and X2 are different random variables, but have the same probability function. (Note: Two functions are equal if and only if they have the same domain and the same value for each element in their common domain.)
(b) Show that there are infinitely many different random variables that have the same probability function as X1.
1.14. Let S = {o1, o2, ..., on} and let E be any event of S. Define XE, the characteristic random variable of event E, as follows:
XE(ok) = 1 if ok ∈ E,
XE(ok) = 0 otherwise.
In other words, XE is equal to 1 if E occurs, and XE is equal to 0 if E does not occur. Prove the following properties of characteristic random variables:
(a) X∅ is identically 0; i.e., X∅(ok) = 0 for k = 1, 2, ..., n.
(b) XS is identically 1; i.e., XS(ok) = 1 for k = 1, 2, ..., n.
(c) If E = F, then XE = XF, and conversely. (To say XE = XF means XE(ok) = XF(ok) for k = 1, 2, ..., n.)
(d) If E ⊂ F, then XE ≤ XF, and conversely. (To say XE ≤ XF means XE(ok) ≤ XF(ok) for k = 1, 2, ..., n.)
(e) XE + XE′ is identically 1. (The value of XE + XE′ at ok is defined to be XE(ok) + XE′(ok).)
(f) XE∩F = XE XF. (The value of XE XF at ok is defined to be XE(ok) XF(ok).)
(g) XE∪F = XE + XF − XE∩F.
2. The mean of a random variable
In many problems, the random variable under study has a rather complicated probability function. It is therefore desirable to be able to describe some features of the random variable by means of a few numbers that can be computed from its probability function. For some purposes, these numbers, rather than the entire function, are all that is needed. In this section, we concentrate on a number, called the mean of the random variable, that is a measure of location in the sense that it roughly locates a "middle" or "average" value of the random variable.
There are other often-used measures of location, in particular, the median and the mode of a random variable. But these are of lesser importance than the mean, and so we ask the interested reader to learn about them in the problems. (See Problems 2.21–2.22.)
Definition 2.1. Let X be a random variable whose possible values x1, x2, ..., xN occur with probabilities f(x1), f(x2), ..., f(xN), respectively. The mean of X, denoted by E(X), is the number
(2.1) E(X) = Σ_{k=1}^{N} xk f(xk);
i.e., the mean of X is the weighted average of the possible values of X, each value being weighted by the probability with which it occurs.
Let us note that the concept of weighted average is a familiar one. When a student computes his average grade in a course in which his six grades are 75, 90, 75, 87, 75, and 90, he divides the sum of all his grades by the total number of grades:
(75 + 90 + 75 + 87 + 75 + 90)/6 = 492/6 = 82.
But he can also write
(75 + 90 + 75 + 87 + 75 + 90)/6 = 75(3/6) + 87(1/6) + 90(2/6),
from which we see that the average grade is the weighted average of the student's grades, each grade having as weight the proportion (or relative frequency) with which it occurs among all the grades.
The choice of the letter E to denote a mean is due to the fact that the concept of mean was first introduced with reference to games of chance, where the mean of the gain of a player is called his mathematical expectation. The mean is also called the expected value of X, but it is important to realize that this term is misleading, since the mean is not a value that we expect the random variable to assume. In fact, E(X) can be different from all the possible values of the random variable X, as the following example shows. We should not be surprised by this, since it occurs in the illustration concerning grades, where the average grade was different from all the actual grades.
Example 2.1. Let X denote the number of points obtained in a throw of a fair die. Then the possible values of X are 1, 2, ..., 6, and each occurs with probability 1/6. Hence, applying (2.1),
E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2.
Example 2.2. Let X denote the sum of the numbers on two fair dice. In Example 1.5, we computed the probability function of the random variable X. Now we find
E(X) = 2(1/36) + 3(2/36) + ... + 11(2/36) + 12(1/36),
or
E(X) = 7.
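Both means can be checked directly from their probability tables by applying the weighted-average formula (2.1); a Python sketch (names are ours):

```python
from fractions import Fraction

def mean(f):
    """E(X) = sum of xk * f(xk) over a probability table f, as in (2.1)."""
    return sum(xk * p for xk, p in f.items())

# Example 2.1: a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Example 2.2: the sum of two fair dice (table from Example 1.5).
dice_sum = {x: Fraction(w, 36)
            for x, w in zip(range(2, 13), [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])}
```

Note that mean(die) = 7/2 is not itself a possible value of the die, illustrating why "expected value" is a misleading name.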
Example 2.3. A florist stocks a perishable flower which costs him 50 cents and which he prices at $1.50 on the first day it is in his shop. Any flowers not sold that first day are worthless and are thrown away. Let X be the random variable denoting the number of flowers that customers order on a randomly selected day. The florist has
found that the probability function of X is given by the following probability table:
x      0   1   2   3
f(x)   .1  .4  .3  .2
How many flowers should the florist stock in order to maximize the mean (or expected value) of his net profit?
For k = 0,1,2,3, let Yk be the random variable denoting the florist's net profit when he stocks k flowers. We determine the probability function of each of these random variables, compute E(Yk) for each, and thus determine the value of k for which E(Yk) is largest.
If he stocks no flowers, then his profit Y0 is equal to zero with probability 1, and so E(Y0) = 0. If he stocks one flower, then he loses 50 cents if no flowers are ordered and makes a net profit of 150 − 50 = 100 cents if at least one customer orders a flower. Hence the probability function of Y1 is given by the table
y1            −50   100
P(Y1 = y1)    .1    .9

and so

E(Y1) = −50(.1) + 100(.9) = 85 cents.
Let the reader check that the probability tables of Y2 and Y3 are given by

y2            −100   50   200
P(Y2 = y2)    .1     .4   .5

y3            −150   0    150   300
P(Y3 = y3)    .1     .4   .3    .2

from which we compute E(Y2) = 110 cents and E(Y3) = 90 cents. Thus the florist maximizes his mean net profit by stocking two flowers. (See Problem 2.7.)
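The four stock levels can be compared in one pass: for each k, the profit at demand x is the revenue on min(x, k) flowers sold minus the cost of the k stocked. A sketch of Example 2.3 in Python (the names and the helper are ours):

```python
COST, PRICE = 50, 150                        # cents per flower
demand = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}    # probability table of X

def expected_profit(k):
    """E(Yk): mean net profit when k flowers are stocked and unsold ones are discarded."""
    return sum(p * (PRICE * min(x, k) - COST * k) for x, p in demand.items())

profits = {k: expected_profit(k) for k in range(4)}
best = max(profits, key=profits.get)         # stock level with the largest mean profit
```

The computation reproduces E(Y0) = 0, E(Y1) = 85, E(Y2) = 110, and E(Y3) = 90 cents, so the best stock level is two flowers.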
If X is a random variable defined on a sample space S and xk is a possible value of X, then it often happens that we are interested less
in xk itself than in some number determined by xk, like 5xk, 3xk − 2, xk², etc. (As a simple example, if X is the number of units demanded of a product that sells for $7 and costs $2 per unit, then the demand xk determines the profit 5xk.) In such cases, we must first understand that a new random variable is determined. Then we can turn to methods of calculating the mean of this new random variable.
Definition 2.2. Let X be a random variable defined on the sample space S, and suppose g is a numerical-valued function whose domain includes the range of X. Then the composite function of g with X, denoted by g(X), is defined as the function whose value for any element ok ∈ S is the real number g(X(ok)).
Let us review this definition in terms of the function machines in Figure 20. [Figure 20: the X-machine takes input ok ∈ S and produces output X(ok), which in turn becomes the input of the g-machine.] We start with any element ok ∈ S, and first obtain the value X(ok). Now we have assumed in Definition 2.2 that any output number of the X-machine can serve as an input number of the g-machine. In particular, we can use X(ok) as input for the g-machine, and thereby obtain the value of g at X(ok). This final output number is therefore g(X(ok)). The two machines taken together in the given order can be thought of as one composite machine that takes the input element ok ∈ S and produces the output number g(X(ok)). The composite function g(X) defined by this composite machine is therefore a random variable whose domain is S. If we let Y = g(X), then the random variable Y assigns to each element ok ∈ S the number Y(ok) = g(X(ok)).
These ideas are illustrated in the next example.
Example 2.4. Suppose we toss two fair coins and take S = {HH, HT, TH, TT} as sample space. Let X denote the number of
heads obtained and let us take g(x) = (x − 1)², so that Y = g(X) = (X − 1)². In words, Y is the square of the deviation of the number of heads from 1. For each element ok ∈ S, we obtain the value of the random variable X and then find the corresponding value of the random variable Y:
ok    P({ok})   X(ok)   Y(ok)
HH    1/4       2       (2 − 1)² = 1
HT    1/4       1       (1 − 1)² = 0
TH    1/4       1       (1 − 1)² = 0
TT    1/4       0       (0 − 1)² = 1
We thus obtain the probability functions of X and Y:

x          0     1     2
P(X = x)   1/4   1/2   1/4

y          0     1
P(Y = y)   1/2   1/2
And now we can compute the mean of Y by applying Definition 2.1. We find
(2.2) E(Y) = E[(X − 1)²] = 0 · (1/2) + 1 · (1/2) = 1/2.
Our next result shows that we can compute the mean of Y = g(X) directly from the probability function of X without first finding the probability function of Y.
Theorem 2.1. Let X be a random variable whose possible values x1, x2, ..., xN occur with probabilities f(x1), f(x2), ..., f(xN), respectively. If Y = g(X), then the mean of the random variable Y is given by
(2.3) E(Y) = E[g(X)] = Σ_{k=1}^{N} g(xk) f(xk).
Proof. Let the possible values of Y be y1, y2, ..., yM. (We know that M ≤ N, since it can happen that Y has the same value for two different values of X.) Then by the definition of E(Y) we have
E(Y) = Σ_{j=1}^{M} yj P(Y = yj)
     = Σ_{j=1}^{M} yj P(g(X) = yj)
     = Σ_{j=1}^{M} yj P(X = x, where g(x) = yj).
Now the probability of the event in this last expression is the sum of the probabilities of one or more (disjoint) events of the form X = xk, where xk is a possible value of X. And for each of these events we know that yj = g(xk). As j varies from 1 to M, we include terms of the form g(xk)P(X = xk) for each possible value xk. Hence
E(Y) = Σ_{k=1}^{N} g(xk) P(X = xk),
which is precisely what we set out to prove.
Example 2.5. Let us illustrate the use of (2.3) by computing E(Y) for the random variable Y = (X − 1)² in Example 2.4. We find

E(Y) = E[(X − 1)²] = Σ_k (x_k − 1)² f(x_k) = (0 − 1)²·(1/4) + (1 − 1)²·(1/2) + (2 − 1)²·(1/4) = 1/2,

as in (2.2).
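Formula (2.3) translates directly into a one-line computation. The sketch below (an illustrative Python check, not from the text) evaluates E[g(X)] from the probability function of X alone, without ever constructing the distribution of Y:

```python
from fractions import Fraction

# Probability function of X (number of heads in two tosses of a fair coin).
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def g(x):
    return (x - 1) ** 2

# Formula (2.3): E[g(X)] = sum over k of g(x_k) f(x_k);
# the probability function of Y = g(X) is never built.
E_gX = sum(g(x) * p for x, p in f.items())
```

The value E_gX equals 1/2, in agreement with Example 2.5.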
Computing E[g(X)] by means of Formula (2.3) is generally much easier than first determining the probability function of the random variable g(X) and then computing its mean by using Definition 2.1. In our later work, for example, we shall use the formulas
(2.4)    E(X²) = Σ_{k=1}^{N} x_k² f(x_k),

(2.5)    E[X − E(X)] = Σ_{k=1}^{N} [x_k − E(X)] f(x_k),

(2.6)    E([X − E(X)]²) = Σ_{k=1}^{N} [x_k − E(X)]² f(x_k),

which are obtained from (2.3) by putting g(x) in turn equal to x², x − E(X), and [x − E(X)]².
Example 2.6. If X denotes the number of points obtained in a roll of a fair die (see Example 2.1), then from (2.4) we find
E(X²) = 1²·(1/6) + 2²·(1/6) + 3²·(1/6) + 4²·(1/6) + 5²·(1/6) + 6²·(1/6) = 91/6.
Theorem 2.1 leads to a number of important results to which we now turn.
Theorem 2.2. If a and b are any numbers, then

(2.7)    E(aX + b) = aE(X) + b.
Proof. We let g(x) = ax + b in (2.3) and find

E[g(X)] = E(aX + b) = Σ_{k=1}^{N} (ax_k + b) f(x_k)
        = Σ_{k=1}^{N} ax_k f(x_k) + Σ_{k=1}^{N} b f(x_k)
        = a Σ_{k=1}^{N} x_k f(x_k) + b Σ_{k=1}^{N} f(x_k)
        = aE(X) + b,

the last equality following from the definition of E(X), as given in (2.1), together with the fact, expressed in (1.7), that the sum of all the probabilities f(x_k) is 1.
As special cases of (2.7), we have
E(X + b) = E(X) + b    and    E(aX) = aE(X).
In words, adding a fixed amount to every value of a random variable changes the mean of the random variable by this same amount, and multiplying every value of a random variable by the same factor multiplies the mean by that factor. It is comforting that our formulas yield these very appealing and reasonable results. If we put a = 1 and b = −E(X) in (2.7), we obtain
(2.8)    E[X − E(X)] = E(X) − E(X) = 0.
Now X − E(X) denotes the algebraic or signed deviation of X from its mean. This deviation is positive, zero, or negative depending upon whether the value of X is greater than, equal to, or less than the number E(X). Thus, Formula (2.8) asserts that the mean deviation of X from its mean is zero.
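Both (2.7) and (2.8) can be verified numerically. The following Python sketch (an illustrative check, not part of the original text) does so for the fair die of Example 2.1, with the arbitrary choice a = 3, b = −2:

```python
from fractions import Fraction

# X = number of points obtained in a roll of a fair die.
f = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g):
    """E[g(X)] computed by Formula (2.3)."""
    return sum(g(x) * p for x, p in f.items())

mu = expect(lambda x: x)                 # E(X) = 7/2
a, b = Fraction(3), Fraction(-2)

check_27 = expect(lambda x: a * x + b)   # left side of (2.7): E(aX + b)
check_28 = expect(lambda x: x - mu)      # mean signed deviation, as in (2.8)
```

Here check_27 equals aE(X) + b and check_28 equals 0, as Theorem 2.2 and (2.8) require.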
It is possible to give a mechanical interpretation of some of our formulas. We think of N particles distributed along the x-axis at the points x_1, x_2, …, x_N. The particle at point x_k has mass f(x_k) for k = 1, 2, …, N. Then (1.6) and (1.7) express the facts that each particle has positive mass and the total mass of all N particles is 1.
With this interpretation, the sum in (2.1) defines what the physicist calls the center of gravity of the system of N particles. Thus, the center of gravity is the weighted average of the x-values, each having weight equal to the mass concentrated at that x-value.
The number x_k − E(X) is the signed distance of the particle at x_k from the center of gravity. If we imagine the x-axis as a lever suspended on a fulcrum placed at the center of gravity, then x_k − E(X) is positive if x_k is to the right of the fulcrum and negative if x_k is to the left of the fulcrum. (See Figure 2-1.)

[Figure 2-1: particles of mass f(x_1), f(x_2), …, f(x_N) at positions x_1, x_2, …, x_N on a lever balanced on a fulcrum at E(X).]

The moment about this fulcrum of the particle of mass f(x_k) at x_k is the product of its mass and its signed distance from the fulcrum. (This signed distance is called the moment arm in mechanics.) The total moment (about the fulcrum) of the entire system of N particles is therefore precisely the sum in (2.5). When this total moment is zero, the lever is in equilibrium; i.e., it balances and does not turn about the fulcrum. Formula (2.8) is therefore merely the expression of the following property of the center of gravity: a distribution of mass particles is in equilibrium with respect to motion about a fulcrum placed at the center of gravity of the system. It is possible to show further that the center of gravity is the only location for a fulcrum if the lever is to be in equilibrium and not turn. (See Problem 2.14.)
In our concluding example, we find it convenient first to determine the distribution function of X, then the probability function, and finally E(X). This example therefore serves as a quick review of some of the material presented up to now in this chapter.
Example 2.7. In a certain city, there are 25 officials with city-owned limousines carrying license plates numbered 1, 2, 3, …, 25. In a 5-minute period in front of city hall we observe two official cars. Let us interpret this as a random sample of two cars drawn with replacement from the population of 25 cars. Let X denote the larger license plate number observed. (If the two numbers we observe are
the same, then X is just the number observed.) We want to find the mean of the random variable X.
The possible values of X are clearly the integers 1, 2, …, 25. The event X ≤ k occurs if and only if both license plate numbers are less than or equal to k. Hence for k = 1, 2, …, 25,

P(X ≤ k) = (k/25)(k/25) = k²/625.

But

f(k) = P(X = k) = P(X ≤ k) − P(X ≤ k − 1).

Therefore for k = 1, 2, …, 25,

f(k) = k²/625 − (k − 1)²/625 = (2k − 1)/625.
Having found the probability function of X, we can use (2.1) to compute E(X). The values x_1, x_2, …, x_N in (2.1) are now just the integers 1, 2, …, 25. We thus find that

E(X) = Σ_{k=1}^{25} k f(k) = Σ_{k=1}^{25} k (2k − 1)/625 = (1/625) [2 Σ_{k=1}^{25} k² − Σ_{k=1}^{25} k].

But the sum of the first N positive integers and the sum of the squares of the first N positive integers are given by the formulas

(2.9)    Σ_{k=1}^{N} k = N(N + 1)/2,    Σ_{k=1}^{N} k² = N(N + 1)(2N + 1)/6.

It follows that

E(X) = (1/625) [ 2·(25)(26)(51)/6 − (25)(26)/2 ] = 429/25 = 17.16.
Thus the mean of the larger of the two observed license plate numbers is 17.16.
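The whole computation of Example 2.7 can be reproduced exactly by machine. This Python sketch (an illustrative check, not from the text) builds the distribution function, differences it to get the probability function, and computes the mean:

```python
from fractions import Fraction

N = 25  # license plates numbered 1 through N

# Distribution function first: P(X <= k) = (k/N)^2 for two draws with replacement.
F = {k: Fraction(k, N) ** 2 for k in range(0, N + 1)}

# Probability function: f(k) = P(X <= k) - P(X <= k-1) = (2k - 1)/N^2.
f = {k: F[k] - F[k - 1] for k in range(1, N + 1)}

E_X = sum(k * f[k] for k in f)   # mean of the larger observed number
```

E_X comes out to 429/25 = 17.16, confirming the hand calculation.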
A problem related to this one, but considerably more difficult, is the following. Suppose you do not know how many people are attending a convention, but you do know that as each person entered he was given an identification tag with a number on it. The tags are numbered serially from 1 to N, where N is the unknown number in attendance. You select a random sample of ten people, let us say, and observe that the largest number on their badges is 261. What estimate do you then make of the total attendance at the convention?
This is a problem in statistical estimation in which some characteristic of a population (the total number of people) is to be estimated on the basis of information (the largest of the ten selected badge numbers) obtained from a sample drawn from the population. In Example 2.7, we did a very simple problem of sampling theory, in which we answered a probability question about a sample on the assumption that we knew everything about the population from which the sample is drawn.
PROBLEMS
2.1. Let X denote the number of heads obtained in three independent tosses of a fair coin. (See Problem 1.1.)
(a) Find E(X).
(b) Determine the probability function of the random variable Y = X − E(X) and then verify (2.8) by computing E(Y).
(c) Determine the probability function of the random variable Z = [X − E(X)]² and then calculate E(Z). Check your result by also using Theorem 2.1 to compute E(Z).
2.2. A coin (perhaps biased) is tossed. Let X denote the number of heads obtained. Determine the probability function of the random variable Y = X(1 − X).
2.3. A thousand tickets are sold in a lottery in which there is one top prize of $500, four prizes of $100 each, and five prizes of $10 each. A ticket costs $1. If X is your net gain when you buy one ticket, find E(X).
2.4. In roulette, the wheel has the 37 numbers 0, 1, 2, …, 36 marked on equally spaced slots. A player bets $1 on a given number. He receives $36 from the croupier if the ball comes to rest in this slot; otherwise, he gets nothing. If X is the player's net gain, find E(X).
2.5. Refer to Problem 1.3 and find the mean number of defectives in the sample.
2.6. Two defective tubes get mixed up with two good ones. You select and test one tube at a time until you have discovered both defectives. Let X be the number of tubes selected when the second defective is discovered. Determine the probability function of X and compute the mean of X. (Cf. Problem II.5.9.)
2.7. In Example 2.3, we assumed that there is no loss caused by inability to fill a customer's order. In actual practice, the florist might consider that turning a customer away for lack of stock is equivalent to sustaining a monetary loss, because customers may give their future business
to another florist, good will is lost, etc. Suppose the florist counts each customer turned away for lack of stock as equivalent to a 50-cent loss. (a) Show that he should still stock two flowers. (b) How large must be the equivalent monetary loss of each customer turned away for lack of stock before the florist maximizes his mean net profit by stocking three flowers?
2.8. Player A pays B $1 and two fair dice are rolled. A receives $2 from B if one six appears, $4 if two sixes appear, and he gets nothing if no six appears. Let X denote player A's net gain. (a) Find E(X). (b) What must A pay B as entrance fee (instead of $1) in order to have E(X) = 0?
2.9. Player A bets $1 against B's $b that if two cards are dealt from a standard deck, both cards will be of the same color. If X is player A's net gain, what value of b is required to make E(X) = 0? With the value of b so determined, what is E(Y) if Y is player B's net gain?
2.10. Compute the means of the random variables defined in Problem 1.8.
2.11. Suppose you have convinced a friend to play the following game with you. A fair coin is to be tossed until the first head appears, but the game is over if no head appears after 20 tosses. Your friend agrees to pay you $2 if a head turns up on the first toss, $2² (= $4) if the first head comes up on the second toss, …, $2²⁰ (= $1,048,576) if the first head comes up on the twentieth toss. You receive nothing if the 20 tosses yield no head. What entrance fee should you pay your friend before the game to make your net gain have mean zero?
2.12. A store sells an item which yields a profit of $3 per item. If the item is out of stock, customers buy elsewhere. At the end of one day, the store manager notices that there are only five items left in stock. The number of items demanded by customers each day is a random variable D such that E(D) = 12. Assuming additional stock is unavailable, let X denote the profit lost due to the manager's failure to reorder. Find E(X). What theorem have you used?
2.13. Let X denote the sum of the numbers obtained when two fair dice are rolled. Find E(X²). Is E(X²) = [E(X)]²? (Refer to Example 2.2.)
2.14. Show that if Σ_{k=1}^{N} (x_k − c) f(x_k) = 0, then c = E(X).
2.15. In the carnival game known as chuck-a-luck, a player pays an amount e, his entrance fee for playing the game. He selects one number from the six numbers 1, 2, …, 6 and then rolls three dice. If all three dice show the number the player selected, the player is paid four times his entrance fee; if two of the dice show the number, the player is paid
three times his entrance fee; and if only one of the dice shows the number, the player receives an amount equal to twice his entrance fee. If his number does not show up, then he receives nothing. Let X denote the player's net gain in a single play of this game. Assuming the dice are fair: (a) determine the probability function of the random variable X; (b) compute E(X) and thus show, in particular, that if the entrance fee is $1, then the player sustains a mean loss per game of about 8 cents.
2.16. After working together on many jobs, four people A, B, C, and D are each asked to write on a slip of paper the name of that person (from among his three coworkers) who is most cooperative. Let X denote the number of people who are considered most cooperative by none of their coworkers. Assuming that each person selects one of his coworkers at random and writes his name on the slip of paper, find E(X).
2.17. A drunk reaches home and wants to open his front door. He has five keys on his key chain and tries them one at a time and at random. He is alert enough to eliminate unsuccessful keys from subsequent selections. Let X denote the number of keys he tries in order to find the one that opens his door. Find E(X).
2.18. Refer to Problem 1.4 and show that E(Xk) has the same value for
2.19. An urn has ten balls, numbered from 1 to 10. You are offered the following options:
(1) Pay $1, draw a ball from the urn, and be paid a number of dollars equal to the number on the ball.
(2) Pay $1, draw a ball from the urn. If the number on the ball is greater than 5, then be paid a number of dollars equal to the number on the ball. If the number on the ball is 5 or less, then put the ball back in the urn, pay $3, draw another ball from the urn and be paid a number of dollars equal to the number on the ball.
Let X_1 and X_2 be your net profit when you accept options 1 and 2, respectively. (a) Determine the probability functions of X_1 and X_2. (b) If you want to maximize your mean net profit, which option do you accept?
2.20. (a) One number is selected at random from the first ten positive integers. Let X denote the number obtained. Find E(X).
(b) Two numbers are selected at random (with replacement) from the first ten positive integers. Let X denote the larger of the two numbers obtained. Find E(X).
(c) Three numbers are selected at random (with replacement) from the first ten positive integers. Let X denote the largest of the three numbers obtained. Find E(X).
(d) Redo Parts (b) and (c), assuming the numbers are selected without replacement.
2.21. Let X be a random variable with distribution function F. As we know, the graph of F is a step function. Imagine vertical line segments drawn connecting the lower and upper pieces of the graph at x_1, x_2, …, x_N (where jumps occur) and call the new graph the extended graph of F. Select any probability p on the vertical axis and consider the horizontal line at this height. The x-coordinate of any point where this horizontal line intersects the extended graph of F is called a 100p-th percentile of the random variable X. A 25th percentile is called a lower quartile; a 75th percentile is an upper quartile; a 50th percentile is a median of X.
(a) In terms of the construction just described, state when there is a unique 100p-th percentile and when there are more than one.
(b) Show that a median of X can equivalently be defined as any number m such that P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2. Formulate a corresponding definition for a 100p-th percentile of X.
(c) Let X denote the sum of the numbers on two fair dice. Show that the median of X is 7, the lower quartile is 5, and the upper quartile is 9. (Cf. Example 2.2.)
(d) Consider the random variable X defined in Example 2.3. Show that E(X) = 1.6. Also show that any number between 1 and 2 inclusive is a median of X.
2.22. A possible value of X that occurs with a probability at least as large as the probability of any other value of X is said to be a mode (or modal value) of the random variable X.
(a) Let X be the sum of the numbers on two fair dice. Show that the mode of X is 7 so that this random variable happens to have its mean, median, and mode all equal.
(b) A and B match pennies four times. On each match A wins one penny with probability 1/2 and loses one penny with probability 1/2. Let X denote the number of times during the course of the game that A is ahead. Find the mean, median, and mode of the random variable X.
2.23. Suppose the probability function f is symmetrical about the line x = a, i.e., f(a + x) = f(a − x) for all x. Show that E(X) = a.
3. The variance and standard deviation of a random variable
The mean of a random variable X is an "average" value of X; it gives us no information about the variability of the values of X. For many purposes, we also require a measure of this variability, of the "spread" or "dispersion" of the values of the random variable. This requirement is especially apparent as soon as one realizes that random variables with different probability functions can have equal means. For example, we have tabulated below the probability functions of four different random variables.
(3.1)
x         1     2     3     4
f_1(x)    1/8   2/8   3/8   2/8        μ_1 = 2.75

(3.2)
x         −1    0     4     5     6
f_2(x)    1/8   2/8   3/8   1/8   1/8  μ_2 = 2.75

(3.3)
x         4     5     6     7
f_3(x)    1/8   2/8   3/8   2/8        μ_3 = 5.75

(3.4)
x         2     4     6     8
f_4(x)    1/8   2/8   3/8   2/8        μ_4 = 5.50
The corresponding probability charts are drawn in Figure 2-2.
The reader can check that X_1 and X_2 have equal means. To distinguish X_1 from X_2 requires that we have a measure of the extent to which the values of the random variables spread out along the horizontal axis. We would certainly expect of such a measure that it would be larger for X_2 than for X_1, reflecting the fact that graph (b) is more spread out than graph (a) in Figure 2-2.
The random variable X_3 is obtained by adding 3 to each value of X_1; i.e., X_3 = X_1 + 3. As we showed in the preceding section, the mean is thereby increased by 3. A glance at graphs (a) and (c) in Figure 2-2 shows they are identical, except that graph (c) is 3 units
further along the x-axis. But the graphs show the same variability in the values of X_1 and X_3, and we therefore expect our measure of dispersion to be equal for X_1 and X_3.
[Figure 2-2: probability charts (a), (b), (c), (d) of the random variables X_1, X_2, X_3, X_4, each drawn over the axis from −1 to 8, with probabilities 1/8, 2/8, and 3/8 marked on the vertical scale.]
On the other hand, X_4 is obtained by multiplying each value of X_1 by 2; i.e., X_4 = 2X_1. The mean is thereby also doubled, but now graph (d) is more spread out on the axis than graph (a) in Figure 2-2. We therefore expect our measure of dispersion to be larger for X_4 than for X_1. Graphs (b) and (d) are harder to compare by eye, and it is clear that we must now leave these special examples and somehow obtain a numerical measure of dispersion that will apply to any random variable.
A first attempt to formulate a precise definition of the spread of the values of a random variable might proceed as follows. Choose some central or average value of X, say E(X). For each possible value x_k of the random variable X, the number x_k − E(X) measures the deviation of x_k from E(X). Compute this deviation for k = 1, 2, …, N. Finally, form the weighted average of these deviations, using as weight for the k-th deviation the probability f(x_k) with which the
value x_k (and hence the deviation x_k − E(X)) occurs. We are thus led to the number

(3.5)    Σ_{k=1}^{N} [x_k − E(X)] f(x_k) = E[X − E(X)],
which, disappointingly, is useless as a measure of spread, since we showed in (2.8) that it is zero for all random variables.
A second attempt would follow the realization that the sum in (3.5) is the weighted average of the algebraic or signed deviations x_k − E(X) and that, after being properly weighted, these deviations, some positive and some negative, add to zero. When measuring the spread of the values of a random variable, we should be concerned with the magnitude of x_k − E(X), but not with its sign. In other words, we work instead with the squared deviations [x_k − E(X)]². Weighting each squared deviation by its probability gives the number

(3.7)    Var(X) = Σ_{k=1}^{N} [x_k − μ_X]² f(x_k),

called the variance of X (here μ_X = E(X)); its positive square root, σ_X = √Var(X), is called the standard deviation of X. For the random variable X_1 of (3.1), the calculation of Var(X_1) is carried out in Table 2-4.
TABLE 2-4

x_k     f_1(x_k)   x_k f_1(x_k)   x_k − μ_1   (x_k − μ_1)²   (x_k − μ_1)² f_1(x_k)
1       1/8        1/8            −7/4        49/16          49/128
2       2/8        4/8            −3/4        9/16           18/128
3       3/8        9/8            1/4         1/16           3/128
4       2/8        8/8            5/4         25/16          50/128
Sums:   1          22/8                                      120/128

μ_1 = Σ_{k=1}^{4} x_k f_1(x_k) = 22/8 = 2.75

σ_1² = Var(X_1) = Σ_{k=1}^{4} (x_k − μ_1)² f_1(x_k) = 120/128 = .9375

σ_1 = √Var(X_1) = .97, approx.
Proceeding in this way, we compute the following values:

Var(X_1) = .9375,    Var(X_2) = 6.1875,    Var(X_3) = .9375,    Var(X_4) = 3.75.

These numbers meet our earlier expectations: the measure of dispersion is larger for X_2 than for X_1, is equal for X_1 and X_3, and is larger for X_4 than for X_1.
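These four variances are quick to verify by machine. The following Python sketch (an illustrative check, not part of the original text) computes the mean and variance of each of the four probability functions (3.1)–(3.4) directly from the weighted squared deviations, as in Table 2-4:

```python
from fractions import Fraction

e = Fraction(1, 8)
# Probability functions (3.1)-(3.4) of the random variables X1, ..., X4.
dists = {
    "X1": {1: e, 2: 2*e, 3: 3*e, 4: 2*e},
    "X2": {-1: e, 0: 2*e, 4: 3*e, 5: e, 6: e},
    "X3": {4: e, 5: 2*e, 6: 3*e, 7: 2*e},
    "X4": {2: e, 4: 2*e, 6: 3*e, 8: 2*e},
}

def mean(f):
    return sum(x * p for x, p in f.items())

def var(f):
    """Variance as the weighted average of squared deviations from the mean."""
    m = mean(f)
    return sum((x - m) ** 2 * p for x, p in f.items())

means = {name: mean(f) for name, f in dists.items()}
variances = {name: var(f) for name, f in dists.items()}
```

The output confirms that X_1 and X_2 share the mean 2.75 while their variances differ (.9375 versus 6.1875), that adding 3 leaves the variance unchanged, and that doubling each value multiplies the variance by 4.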
Note that if the values of X are measured in certain units, then the variance is measured in those units squared. It is in order to have a measure of dispersion in the same units as the values of X that we define the standard deviation as the square root of the variance.
We turn now to some general results concerning the variance of a random variable. Each term in the sum (3.7) that defines Var(X) is nonnegative, and so the entire sum is either zero or a positive number. The sum is zero if and only if each term in the sum is zero. Since f(x_k) > 0, this means x_k = μ_X for all k. We have therefore proved the following result.
Theorem 3.1. For any random variable X, we have

(3.9)    Var(X) ≥ 0,

the equality holding if and only if there is only one possible value of X, this value therefore occurring with probability 1.
Although the calculations summarized in Table 2-4 are not difficult, it would be gratifying to have a simpler way of computing the variance of a random variable than by the use of the defining equation (3.7). Our next result gives us the required formula.
Theorem 3.2. The variance of X is obtained by subtracting the square of the mean of X from the mean of X². In symbols,

(3.10)    Var(X) = E(X²) − μ_X².

Proof. We expand the summand in (3.7) and obtain

Var(X) = Σ_{k=1}^{N} (x_k − μ_X)² f(x_k)
       = Σ_{k=1}^{N} x_k² f(x_k) − 2μ_X Σ_{k=1}^{N} x_k f(x_k) + μ_X² Σ_{k=1}^{N} f(x_k).

The first sum is E(X²) by (2.4), the second sum defines μ_X, and the probabilities in the third sum add to 1. Hence

Var(X) = E(X²) − 2μ_X² + μ_X²,

from which (3.10) follows immediately.
It is important to distinguish clearly between the "mean square" E(X²) and the "square mean" μ_X² = [E(X)]² in Formula (3.10). This formula is usually used to compute Var(X). In Table 2-5, we summarize the calculations involved in finding the variance of the random variable X_1 whose probability function is given in (3.1). A comparison with Table 2-4 will show how much simpler it is to use (3.10) rather than (3.7) to compute the variance.
TABLE 2-5

x_k     f_1(x_k)   x_k f_1(x_k)   x_k² f_1(x_k)
1       1/8        1/8            1/8
2       2/8        4/8            8/8
3       3/8        9/8            27/8
4       2/8        8/8            32/8
Sums:   1          22/8           68/8

μ_1 = Σ_{k=1}^{4} x_k f_1(x_k) = 22/8 = 2.75

E(X_1²) = Σ_{k=1}^{4} x_k² f_1(x_k) = 68/8 = 8.5

σ_1² = Var(X_1) = E(X_1²) − μ_1² = 8.5 − (2.75)² = .9375

σ_1 = √Var(X_1) = .97, approx.
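The agreement between the defining sum (3.7) and the shortcut (3.10) is easy to confirm numerically. This Python sketch (an illustrative check, not part of the original text) computes the variance of X_1 both ways:

```python
from fractions import Fraction

# Probability function (3.1) of X1.
f1 = {1: Fraction(1, 8), 2: Fraction(2, 8), 3: Fraction(3, 8), 4: Fraction(2, 8)}

mu = sum(x * p for x, p in f1.items())                      # E(X1) = 22/8
EX2 = sum(x ** 2 * p for x, p in f1.items())                # E(X1^2) = 68/8

var_by_310 = EX2 - mu ** 2                                  # shortcut formula (3.10)
var_by_37 = sum((x - mu) ** 2 * p for x, p in f1.items())   # defining sum (3.7)
```

Both routes give 15/16 = .9375, matching Tables 2-4 and 2-5.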
Example 3.1. If X denotes the number of points obtained in a roll of a fair die, then we computed E(X) = 7/2 in Example 2.1 and E(X²) = 91/6 in Example 2.6. Applying (3.10) we find

Var(X) = 91/6 − (7/2)² = 35/12,

and

σ_X = √(35/12) = 1.7, approx.
In Theorem 2.2, we studied the effect on the mean of changing each value of a random variable by (1) adding or subtracting a fixed number, and (2) multiplying or dividing by a fixed number. Changes of the first kind are known as changes in the location of the origin on the horizontal axis of the probability chart of the random variable; changes of the second kind are known as changes in the scale on this axis. For example, in Figure 2-2 the graph of X_3 in (c) can be obtained from the graph of X_1 in (a) by a change in location of the origin: if we shift the number 0 (and all other numbers) three units to the left, then with this relabeling of the axis graph (a) becomes graph (c). But the graph of X_4 in (d) is obtained from the graph of X_1 in (a) by a change in scale: if we make each unit on the axis in (a) two units, then with this relabeling graph (a) becomes graph (d). It is as if the axis in (a) measured the values of X_1 in units of quarts, let us say, whereas the axis in (d) measured the same variable in units of pints. In the following theorem, we study the effect on the variance and the
standard deviation of changes in location of the origin and changes in scale.
Theorem 3.3. If a and b are any numbers, then

(3.11)    Var(aX + b) = a² Var(X),

(3.12)    σ_{aX+b} = |a| σ_X.

(Note that √a² = a if a ≥ 0 and √a² = −a if a < 0; i.e., √a² = |a|.)

From (3.11) and (3.12) we conclude that

Var(X + b) = Var(X),    σ_{X+b} = σ_X,

and

Var(aX) = a² Var(X),    σ_{aX} = |a| σ_X.

In words, adding a fixed amount to every value of a random variable has no effect on the variance and the standard deviation of the random variable, but multiplying each value of a random variable by the same factor a multiplies the variance by a² and the standard deviation by |a|.
Because of its importance in our later work, we state the following special case of Theorem 3.3. The proof is left for the problems.
Theorem 3.4. Let X be any random variable with mean μ_X and standard deviation σ_X > 0. Let the random variable X* be defined as follows:
(3.13)    X* = (X − μ_X)/σ_X.

(X* is called the standardized random variable corresponding to X.) Then

(3.14)    E(X*) = 0 and Var(X*) = 1;

i.e., the standardized random variable has mean 0 and standard deviation 1.
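Standardization is a mechanical relabeling of values, which the following Python sketch (an illustrative check, not part of the original text) carries out for the random variable X_1 of (3.1):

```python
import math

# Probability function (3.1) of X1, with mu = 2.75 and sigma^2 = .9375.
f = {1: 0.125, 2: 0.25, 3: 0.375, 4: 0.25}
mu = sum(x * p for x, p in f.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in f.items()))

# X* = (X - mu)/sigma relabels the values; the probabilities are unchanged.
f_star = {(x - mu) / sigma: p for x, p in f.items()}

mu_star = sum(z * p for z, p in f_star.items())
var_star = sum((z - mu_star) ** 2 * p for z, p in f_star.items())
# (3.14): mean 0 and variance 1, up to floating-point rounding
```

As Theorem 3.4 asserts, mu_star is 0 and var_star is 1 (to within floating-point rounding).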
Example 3.2. In a manufactured lot, there is a proportion p of defective items. An item is chosen at random from the lot. Let X have the value 1 if the selected item is defective, and 0 otherwise. Thus the possible values of X are 1 and 0, and these occur with probability p and q = 1 — p, respectively. Hence
μ_X = 1·p + 0·q = p,    E(X²) = 1²·p + 0²·q = p,

σ_X² = E(X²) − μ_X² = p − p² = p(1 − p) = pq,

and the standardized random variable corresponding to X is

X* = (X − p)/√(pq).
Example 3.3. To each value of a random variable X there corresponds a value of the corresponding standardized variable X*, and vice versa. If the value of X is a test score, then X* is the corresponding standard score. To interpret the standard score, we solve (3.13) for X and obtain

X = μ_X + X* σ_X.

Thus, if we are told that the value of the standard score X* is some number, say z, then the corresponding value of the actual score X is z standard deviations removed from the mean, being above the mean if z > 0 and below if z < 0. A standard score of +2 means an actual score of 2 standard deviations above the mean score, etc.
The following theorem, due to the Russian mathematician P. L. Chebyshev (1821–1894), gives us further insight into the significance of the standard deviation as a measure of the dispersion of the values of a random variable about the mean.
Theorem 3.5. Let X be a random variable with mean μ_X and standard deviation σ_X > 0. Let c be any positive number. Then the
probability that a value of X occurs that differs from μ_X by more than c is less than σ_X²/c². In symbols,

(3.15)    P(|X − μ_X| > c) < σ_X²/c².
Proof. We start with Formula (3.7) for the variance of X:

σ_X² = Σ_{k=1}^{N} (x_k − μ_X)² f(x_k).

Since each term of this sum is nonnegative, omitting some terms cannot increase the value of the sum. Therefore, if we delete all terms (if any) for which |x_k − μ_X| ≤ c, we obtain

σ_X² ≥ Σ*_k (x_k − μ_X)² f(x_k),

where the asterisk indicates that the summation extends only over those k for which |x_k − μ_X| > c. It follows that we further decrease this sum if we replace each |x_k − μ_X| by c; i.e.,

σ_X² > Σ*_k c² f(x_k) = c² Σ*_k f(x_k).

But

Σ*_k f(x_k) = P(|X − μ_X| > c).

Hence

σ_X² > c² P(|X − μ_X| > c),

and the result follows by dividing both sides by c².
From Formula (3.15) we see that with c fixed, the smaller the variance of X, the lower the probability that a value of X occurs that deviates from μ_X by more than c. Thus the variance in this sense controls the spread or dispersion of the values of the random variable X about the mean. To be somewhat more precise, it is convenient to obtain an alternate form of (3.15) by substituting zσ_X for c. One thus obtains

(3.16)    P(|X − μ_X| > zσ_X) < 1/z²,

or, passing to the complementary event,

(3.17)    P(|X − μ_X| ≤ zσ_X) > 1 − 1/z².

Formula (3.17) can be written more succinctly if we introduce the standardized random variable X* defined in Theorem 3.4. We obtain
(3.18)    P(|X*| ≤ z) > 1 − 1/z².

Formulas (3.15)–(3.18) are alternate forms of Chebyshev's Inequality.
If 0 < z < 1, then the inequality does not yield any useful information. For then 1/z² > 1, and (3.16) merely asserts the obvious fact that a probability is less than a number greater than 1.
But if z > 1, then Chebyshev's inequality gives us some information about the probability function of X. For example, if we put z = 2 in (3.17), then P(|X − μ_X| ≤ 2σ_X) > 3/4. In words, the event that a random variable assumes a value that is within two standard deviations of its mean has probability greater than 3/4. Put differently, a total probability of more than 3/4 is accounted for by values of X in the interval [μ_X − 2σ_X, μ_X + 2σ_X]. Or, using (3.18), P(−2 ≤ X* ≤ 2) > 3/4, P(−3 ≤ X* ≤ 3) > 8/9, etc.
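The bound is easy to compare against an exact probability. The following Python sketch (an illustrative check, not part of the original text) does so for the sum of two fair dice, for which μ_X = 7 and σ_X² = 35/6:

```python
import math
from fractions import Fraction
from itertools import product

# X = sum of the numbers on two fair dice (cf. Example 2.2 and Problem 3.3).
f = {}
for a, b in product(range(1, 7), repeat=2):
    f[a + b] = f.get(a + b, Fraction(0)) + Fraction(1, 36)

mu = sum(x * p for x, p in f.items())                  # 7
var = sum((x - mu) ** 2 * p for x, p in f.items())     # 35/6
sigma = math.sqrt(var)

z = 2
# Exact probability of a value within z standard deviations of the mean.
p_within = sum(p for x, p in f.items() if abs(x - mu) <= z * sigma)

chebyshev_bound = 1 - Fraction(1, z ** 2)              # (3.17) guarantees > 3/4
```

Here p_within is 34/36, comfortably above the guaranteed 3/4, which illustrates how conservative the inequality can be for a specific random variable.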
Either way, we see how the spread or dispersion of the values of the random variable X about the mean μ_X is controlled by the standard deviation σ_X.
Theorem 3.5 is extraordinarily general; the probability statements given by Chebyshev's inequality apply to any random variable. One pays a price for such generality, since one cannot expect an inequality that applies to all random variables to be especially sharp and definitive when applied to some specific random variable. (See Problem 3.17.) Nevertheless, Chebyshev's theorem is an important analytic tool in the theory of probability. We shall have occasion to use it in a later section when we prove the so-called law of large numbers.
PROBLEMS
3.1. A questionnaire sent to four families yields the following information.

Family   Number of Children   "Own TV set?"   Total Income
A        2                    yes             $10,000
B        3                    yes             5,000
C        0                    yes             8,000
D        2                    no              5,000
One of these families is chosen at random. Let X have the value 1 if the family owns a television set and the value 0 otherwise, let Y be the income of the family, and let Z be the number of children in the family. Find the mean, variance, and standard deviation of each of these random variables.
3.2. Suppose 70 percent of the voters favor a certain proposal, 30 percent being opposed. A voter is selected at random and we let X = 0 if he is opposed, X = 1 if he is in favor. Find E(X) and Var(X).
3.3. Let X denote the sum of the numbers obtained when two fair dice are rolled. Find the variance and standard deviation of X. (Cf. Problem 2.13.)
3.4. (a) Consider the random variable X defined in Example 3.2. Show that Var(X) ≤ 1/4. For what value of p is Var(X) = 1/4? (b) Generalize (a) by showing that if X is any random variable such that E(X²) = E(X), then Var(X) ≤ 1/4.
3.5. (a) Let X_k be the number of heads obtained when a fair coin is tossed k independent times. For k = 1, 2, 3, 4, calculate the variance and standard deviation of X_k.
(b) Redo part (a), but now assume the coin is biased so that the probability is p (0 < p < 1) of obtaining a head.
mathematical formulation of the following parallel axis theorem: The moment of inertia of a mass system about any given axis is the moment of inertia of the system about a parallel axis through the center of gravity, plus the moment of inertia about the given axis if all the mass were concentrated at the center of gravity.)
3.9. The random variable X is given and we define a new random variable Y = g(X), as in Definition 2.2. If g(x) = a + bx + cx², show that E(Y) = a + bE(X) + c[E(X)]² + c Var(X).
3.10. An urn contains six balls. Three have 1's on them, one has a 2, and two have 3's. One ball is drawn from the urn and then, without replacing the first, another is drawn. Let X_1 be the number on the first ball and X_2 the number on the second ball. Find the standard deviations of X_1 and X_2.
3.11. A subject is shown a deck of three cards numbered 1, 2, and 3. The cards are shuffled and placed face down on the table. The subject is asked to call the order of the cards. Let X denote the number of correct calls made by the subject. Consider the following possible ways that a subject might guess:
(1) He chooses one card and calls it three straight times.
(2) He makes three independent guesses. For example, he can roll a fair die and guess the first card is 1 if a 1 or a 2 comes up, guess the first card is 2 if a 3 or 4 comes up, and guess the first card is 3 if a 5 or a 6 comes up. The die is then thrown twice more to determine the subject's second and third calls.
(3) He chooses at random one permutation from among all the permutations of the numbers 1, 2, 3 and calls his guesses in the order specified by the selected permutation. (Cf. Problem II.3.9.)
For each of these methods of guessing, find (a) the probability function of the random variable X, (b) the mean of X, and (c) the standard deviation of X.
3.12. Let X denote the number obtained when one number is selected at random from the numbers 1, 2, 3, ..., N. Show that E(X) = (N + 1)/2 and Var(X) = (N² − 1)/12.
3.13. Another measure of the spread* of the values of a random variable is the mean absolute deviation, defined as the number E(|X − μ_X|).
*For an interesting discussion of measures of variability and their use to measure risk in a portfolio of securities, see H. Markowitz, Portfolio Selection, John Wiley and Sons, Inc., 1959, pp. 286–297.
Sec. 4 / JOINT PROBABILITY FUNCTIONS 197
Compute the mean absolute deviation of the random variables whose probability functions are given in (3.1)–(3.4) and compare with their standard deviations.
3.14. Prove Theorem 3.4.
3.15. A random variable X has mean 100 and standard deviation 10. X* is the standardized random variable corresponding to X. (a) What value of X* corresponds to each of the following values of X: 85, 100, 103? (b) What value of X corresponds to each of the following values of X*: 2, 1, 0.4, 1.3?
3.16. In the proof of Theorem 3.5, where was the hypothesis σ_X > 0 used? Is (3.15) true if σ_X = 0?
3.17. For each of the following random variables, calculate

P(|X − μ| < zσ)

for z = 1.5 and z = 2, and compare these probabilities with the corresponding estimates given by Chebyshev's inequality.
(a) X, the number of points obtained in a roll of a fair die.
(b) X, the sum of the number of points on two fair dice.
(c) X, the number of heads obtained when four fair coins are tossed.
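Calculations of this kind are easy to check by direct enumeration. The following sketch (in modern Python, with exact rational arithmetic from the standard fractions module; the helper names are ours, not the text's) treats part (a), one roll of a fair die, and confirms that the exact probability P(|X − μ| < zσ) is never smaller than the Chebyshev estimate 1 − 1/z²:

```python
from fractions import Fraction

# Probability function of X = the number of points on one roll of a fair die.
pf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pf):
    return sum(x * p for x, p in pf.items())

def std(pf):
    mu = mean(pf)
    return float(sum((x - mu) ** 2 * p for x, p in pf.items())) ** 0.5

def prob_within(pf, z):
    # P(|X - mu| < z*sigma), computed exactly from the probability function.
    mu, sigma = mean(pf), std(pf)
    return float(sum(p for x, p in pf.items() if abs(x - mu) < z * sigma))

for z in (1.5, 2.0):
    exact = prob_within(pf, z)   # exact probability
    bound = 1 - 1 / z**2         # Chebyshev's lower estimate
    assert exact >= bound
```

Here μ = 7/2 and σ ≈ 1.708, so for z = 1.5 and z = 2 every face of the die lies within zσ of the mean; the exact probability is 1, compared with the Chebyshev estimates 5/9 and 3/4.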
3.18. You are told that no possible value of a random variable X is more than one standard deviation from the mean; i.e., all possible values are in the interval [μ_X − σ_X, μ_X + σ_X]. Show that X either has only one possible value, this value therefore occurring with probability 1, or X has two possible values, each occurring with probability 1/2.
3.19. Let X be any random variable and consider the statement

P(−z < X* < z) ≥ p.

For each of the following values of p, find the smallest value of z (according to Chebyshev's inequality) that makes the statement true: p = 0.5, p = 0.9, p = 0.95, p = 0.99.
3.20. The random variable X is given and the new random variable Y = g(X) is defined. Suppose that the possible values of Y are all nonnegative and that not all are zero. For any positive number, say c², prove that

P(Y ≥ c²) ≤ E(Y)/c².

[Note. This formula generalizes Chebyshev's theorem, for we obtain (3.15) as a special case if we put g(x) = (x − μ_X)².]
4. Joint probability functions; independent random variables
When an experiment is performed, we are often interested in more than one characteristic of the resulting outcome. If 13 cards are
dealt from a full deck, we might be interested in the number of spades in the hand and the number of aces; if a person is selected from a certain population, we might want to record his height and weight, his IQ test score and the average number of hours he watches television, etc. In such cases, we are interested not only in studying each characteristic separately, but also in determining interrelationships that exist among the characteristics.
In mathematical terms, we are given a sample space S and n random variables defined on S, where n is an integer greater than or equal to 2. In this section, we study the bivariate case (n = 2), concluding with some remarks on the more general multivariate case (n > 2). We begin with an example that serves to prepare the way for the formal development that follows.
Example 4.1. A fair coin is tossed three independent times. We choose the familiar set

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

as sample space and assign probability 1/8 to each simple event. We define the following random variables:

X = 0 if the first toss is a tail, 1 if the first toss is a head,
Y = the total number of heads,
Z = the absolute value of the difference between the number of heads and tails.
(Note that when we define random variables in this way, the equality sign is used as shorthand for "is the random variable whose value for any outcome (element of S) is". The distinction between the random variable and the value of the random variable should be kept clearly in mind even when, as here, the customary notation is somewhat misleading.)
We list in Table 26 the values of these three random variables for each element of the sample space S. Consider first the pair X, Y. We want to determine not only the possible pairs of values of X and Y, but also the probability with which each such pair occurs. To say, for example, that X has the value 0 and Y the value 1 is to say that the event {THT, TTH} occurs. The probability of this event is therefore 2/8 or 1/4. We write

P(X = 0, Y = 1) = 1/4,
TABLE 26
Element of S Value of X Value of Y Value of Z
HHH 1 3 3
HHT 1 2 1
HTH 1 2 1
THH 0 2 1
HTT 1 1 1
THT 0 1 1
TTH 0 1 1
TTT 0 0 3
adopting the usual convention in which a comma is used in place of ∩ to denote the intersection of the two events X = 0 and Y = 1. We similarly find
P(X = 0, Y = 0) = P({TTT}) = 1/8,
P(X = 1, Y = 0) = P(∅) = 0, etc.
In this way we obtain the probabilities of all possible pairs of values of X and Y. These probabilities are conveniently arranged in Table 27, the so-called joint probability table of X and Y.
TABLE 27
x \ y      0      1      2      3      P(X = x)
0         1/8    1/4    1/8    0       1/2
1         0      1/8    1/4    1/8     1/2
P(Y = y)  1/8    3/8    3/8    1/8      1
We can also represent these results graphically as in Figure 23. In Figure 23(a) a heavy dot is located at each point (x, y) for which P(X = x, Y = y) is positive, and this probability appears next to the dot. In Figure 23(b) we draw a three-dimensional chart in which P(X = x, Y = y) is the height of a vertical line drawn above the point (x, y) in the horizontal xy plane.
Figure 23
The event Y = 0 is the union of the mutually exclusive events (X = 0, Y = 0) and (X = 1, Y = 0). Hence

P(Y = 0) = P(X = 0, Y = 0) + P(X = 1, Y = 0) = 1/8 + 0 = 1/8.
In Table 27, this probability is obtained as the sum of the entries in the column headed y = 0. By adding the entries in the other columns, we similarly find

P(Y = 1) = 3/8, P(Y = 2) = 3/8, P(Y = 3) = 1/8.

In this way we obtain the probability function of the random variable Y from the joint probability table of X and Y. Since values of this probability function are written in the lower margin of the joint table, the function is commonly called the marginal probability function of Y, in spite of the fact that the adjective "marginal" is redundant. By adding across the rows in the joint table, one similarly obtains the (marginal) probability function of X.
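The whole passage from sample space to joint table to marginal probabilities can be retraced by machine. Here is a sketch in modern Python (exact arithmetic via the standard fractions module; the names outcomes, h, fX, gY are ours, not the text's) that rebuilds Table 27 and its margins:

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Sample space of three independent tosses of a fair coin; each of the
# eight outcomes has probability 1/8.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def X(o):  # 1 if the first toss is a head, 0 otherwise
    return 1 if o[0] == "H" else 0

def Y(o):  # total number of heads
    return o.count("H")

# Joint probability function h(x, y) = P(X = x, Y = y).
h = defaultdict(Fraction)
for o in outcomes:
    h[(X(o), Y(o))] += Fraction(1, 8)

# Marginals recovered by adding rows and columns, as in Table 27.
fX, gY = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in h.items():
    fX[x] += p
    gY[y] += p

assert h[(0, 1)] == Fraction(1, 4)        # P(X = 0, Y = 1)
assert fX[0] == fX[1] == Fraction(1, 2)
assert dict(gY) == {0: Fraction(1, 8), 1: Fraction(3, 8),
                    2: Fraction(3, 8), 3: Fraction(1, 8)}
```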
We add one final observation about the random variables X and Y. It is clear from the meaning of X and Y that knowing the value of X changes the probability that a given value of Y occurs. For example, P(Y = 2) = 3/8. But if we are told that the value of X is 1, then the conditional probability of the event Y = 2 becomes 1/2. For, by the definition of conditional probability,

P(Y = 2 | X = 1) = P(X = 1, Y = 2)/P(X = 1) = (1/4)/(1/2) = 1/2.
As we expect, the events X = 1 and Y = 2 are not independent: knowing that the first toss results in a head increases the probability of obtaining exactly two heads in the three tosses.
What we have done for the pair X, Y can also be done for X, Z. We give the results only, asking the reader to check our calculations. The joint probability table of X and Z is given in Table 28. We have, as before, written in the margins the row-sums and the column
TABLE 28
x \ z      1      3      P(X = x)
0         3/8    1/8     1/2
1         3/8    1/8     1/2
P(Z = z)  3/4    1/4      1
sums, which determine the (marginal) probability functions of X and Z, respectively. In Figure 24, we graph these results as we did for X, Y in Figure 23.
Figure 24
Finally, let us observe that the events X = 0 and Z = 1 are independent, since we find P(X = 0, Z = 1) = 3/8, and since this is equal to the product of P(X = 0) = 1/2 and P(Z = 1) = 3/4. This is reflected in Table 28 by the fact that the probability appearing in
the cell determined by the row labeled x = 0 and the column labeled z = 1 is the product of the marginal totals for that row and column. Indeed, this multiplication property holds for each of the four entries in the joint probability table of X and Z. A comparison with Table 27 will show that the entries in the cells of the joint probability table of X and Y are not products of the corresponding marginal probabilities. Thus the random variables X and Y have a relationship to each other that is different from that shown by X and Z. According to the definitions to be given below, we say that the random variables X and Y are dependent, but that X and Z are independent.
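The multiplication test just described is easy to apply cell by cell. A sketch in modern Python (exact rational arithmetic; the helper names joint, marginals, independent are ours, not the text's) that checks both pairs of random variables from Example 4.1:

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# The eight equally likely outcomes of Example 4.1.
outcomes = ["".join(t) for t in product("HT", repeat=3)]
p = Fraction(1, 8)

X = lambda o: 1 if o[0] == "H" else 0           # first toss a head?
Y = lambda o: o.count("H")                      # total number of heads
Z = lambda o: abs(o.count("H") - o.count("T"))  # |heads - tails|

def joint(U, V):
    h = defaultdict(Fraction)
    for o in outcomes:
        h[(U(o), V(o))] += p
    return h

def marginals(h):
    f, g = defaultdict(Fraction), defaultdict(Fraction)
    for (u, v), q in h.items():
        f[u] += q
        g[v] += q
    return f, g

def independent(U, V):
    # The multiplication test: every cell must equal the product of its margins.
    h = joint(U, V)
    f, g = marginals(h)
    return all(h[(u, v)] == f[u] * g[v] for u in list(f) for v in list(g))

assert not independent(X, Y)   # X and Y are dependent
assert independent(X, Z)       # X and Z are independent
```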
With this particular example understood, we can now proceed to discuss the general case of any two random variables defined on the same sample space.
Definition 4.1. Let a sample space S = {o_1, o_2, ..., o_n} be given together with an acceptable assignment of probabilities to its simple events. Let X and Y be random variables defined on S. Then the function h whose value at the point (x, y) is given by

(4.1) h(x, y) = P({o_i ∈ S | X(o_i) = x and Y(o_i) = y})

is called the joint probability function of the random variables X and Y. (The domain of the function h is the set of all ordered pairs of real numbers, although h has nonzero values for only a finite number of such pairs.)
Let us suppose that X has possible values x_1, x_2, ..., x_M and probability function f. Then

(4.2) f(x_j) = P(X = x_j) > 0,  Σ_{j=1}^{M} f(x_j) = 1.

Similarly, if Y has possible values y_1, y_2, ..., y_N and probability function g, then

(4.3) g(y_k) = P(Y = y_k) > 0,  Σ_{k=1}^{N} g(y_k) = 1.
With this notation, the joint probability table of X and Y is defined as the double-entry array in Table 29. The probabilities listed in this table have the following properties:

(4.4) h(x_j, y_k) ≥ 0 for j = 1, 2, ..., M; k = 1, 2, ..., N.
(4.5) Σ_{all j,k} h(x_j, y_k) = 1.

(4.6) Σ_{k=1}^{N} h(x_j, y_k) = f(x_j) for j = 1, 2, ..., M.

(4.7) Σ_{j=1}^{M} h(x_j, y_k) = g(y_k) for k = 1, 2, ..., N.
The inequality in (4.4) expresses the obvious fact that the probability of the joint occurrence of the events X = x_j and Y = y_k is nonnegative. We note, however, that although we have assumed in (4.2) and (4.3) that the events X = x_j and Y = y_k occur with positive probability, we must allow in (4.4) for the possibility that the intersection of these events is the empty set, and thus has probability zero.
TABLE 29
x \ y      y_1          y_2          ...    y_N          P(X = x)
x_1       h(x_1, y_1)  h(x_1, y_2)   ...   h(x_1, y_N)   f(x_1)
...       ...          ...           ...   ...           ...
x_M       h(x_M, y_1)  h(x_M, y_2)   ...   h(x_M, y_N)   f(x_M)
P(Y = y)  g(y_1)       g(y_2)        ...   g(y_N)         1
In (4.5), we merely observe that the sum of all MN probabilities in the joint probability table (not including the entries listed in the margins) is 1. This sum can be calculated in any number of ways, but two methods are especially noteworthy. We can sum each row first, then add the row-sums; i.e.,

(4.8) Σ_{all j,k} h(x_j, y_k) = Σ_{j=1}^{M} Σ_{k=1}^{N} h(x_j, y_k) = Σ_{j=1}^{M} f(x_j) = 1;

or we can sum each column first, then add the column-sums; i.e.,
(4.9) Σ_{all j,k} h(x_j, y_k) = Σ_{k=1}^{N} Σ_{j=1}^{M} h(x_j, y_k) = Σ_{k=1}^{N} g(y_k) = 1.
In the first method, the row-sums are the probabilities with which the possible values of X occur. This fact is recorded in (4.6) and follows from the observation that the event X = x_j occurs whenever one of the joint events (X = x_j, Y = y) occurs for some value y of the random variable Y. For different values of y, these joint events are clearly mutually exclusive, and so

f(x_j) = P(X = x_j) = Σ_y P(X = x_j, Y = y).

This equality is equivalent to (4.6), since only possible values of Y can contribute to the sum. We similarly can show that the sum of the entries in any column of the joint table is the probability with which the value of Y determining that column occurs. This fact is recorded in (4.7).
Thus we see that from the joint probability table we can recover the probability functions of the random variables X and Y by adding rows and columns. Since the resulting probabilities f(x_j) for j = 1, 2, ..., M and g(y_k) for k = 1, 2, ..., N are written in the margins of the table, these probabilities are known as marginal probabilities, and f and g are referred to as the marginal probability functions of X and Y, respectively. In both cases we note that the adjective "marginal" is technically redundant.
Let us now turn to the task of defining the important concept of independent random variables, to which we alluded at the end of our discussion in Example 4.1. (We know the meaning of independent events and independent trials from Chapter 2.) The following definition seems reasonable, in view of our observations in Example 4.1.
Definition 4.2. Two random variables X and Y defined on the same sample space S are said to be independent if and only if

(4.10) P(X = x_j, Y = y_k) = P(X = x_j)P(Y = y_k)

for j = 1, 2, ..., M and k = 1, 2, ..., N. In other words, "X and Y are independent random variables" means that the events X = x_j and Y = y_k are independent events for all pairs of possible values x_j and y_k. Random variables that are not independent are said to be dependent.
Equivalently, we see that X and Y are independent random variables if and only if
(4.11) h(x_j, y_k) = f(x_j)g(y_k),

i.e., if and only if the joint probability table assumes the form of a multiplication table in which h(x_j, y_k), the entry in any row and column, is the product of f(x_j), the probability in the row margin, and g(y_k), the probability in the column margin.
With this definition before us, a quick glance at Tables 27 and 28 shows that in Example 4.1, as we anticipated, X and Y are dependent random variables, but X and Z are independent.
Example 4.2. An urn contains three red and two green balls. A random sample of two balls is drawn (a) with replacement, and (b) without replacement. In either case, we define

X = 0 if the first ball is green, 1 if the first ball is red;
Y = 0 if the second ball is green, 1 if the second ball is red.
We find the two joint probability tables given in Table 30.

TABLE 30

(a) With replacement:

x \ y      0            1            P(X = x)
0         (2/5)(2/5)   (2/5)(3/5)    2/5
1         (3/5)(2/5)   (3/5)(3/5)    3/5
P(Y = y)   2/5          3/5           1

(b) Without replacement:

x \ y      0            1            P(X = x)
0         (2/5)(1/4)   (2/5)(3/4)    2/5
1         (3/5)(2/4)   (3/5)(2/4)    3/5
P(Y = y)   2/5          3/5           1

Note that,
in (a), X and Y are identically distributed and are independent random variables. In (b), X and Y are also identically distributed, but now they are dependent random variables. Although it is always possible to derive the probability functions of X and Y from their joint probability function, as this example demonstrates it is generally impossible to reconstruct the joint probability table if only the marginal probabilities of X and Y are known.
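Both halves of Table 30 can be generated by enumerating ordered pairs of draws. A sketch in modern Python (exact rational arithmetic; the function name joint is ours, not the text's) under the two sampling schemes:

```python
from fractions import Fraction
from collections import defaultdict

# Urn with three red (R) and two green (G) balls; two balls are drawn.
balls = ["R", "R", "R", "G", "G"]
n = len(balls)

def joint(replace):
    # Joint table of X (first draw) and Y (second draw): 1 for red, 0 for green.
    h = defaultdict(Fraction)
    for i in range(n):
        for j in range(n):
            if not replace and i == j:
                continue  # cannot draw the same ball twice
            pr = Fraction(1, n) * Fraction(1, n if replace else n - 1)
            x = 1 if balls[i] == "R" else 0
            y = 1 if balls[j] == "R" else 0
            h[(x, y)] += pr
    return h

with_r = joint(True)
without_r = joint(False)

# Identical marginals in both cases...
for h in (with_r, without_r):
    assert h[(1, 0)] + h[(1, 1)] == Fraction(3, 5)   # P(X = 1)
    assert h[(0, 1)] + h[(1, 1)] == Fraction(3, 5)   # P(Y = 1)

# ...but only sampling with replacement gives the multiplication property.
assert with_r[(1, 1)] == Fraction(3, 5) * Fraction(3, 5)
assert without_r[(1, 1)] == Fraction(3, 5) * Fraction(2, 4)
```

The two joint tables differ even though every marginal agrees, which is exactly why the joint table cannot be reconstructed from the margins alone.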
Further insight into the reasonableness of our definition of independence of random variables is obtained by looking at conditional probabilities. Suppose we are interested in the event that X has the value x_j, given that Y has the value y_k. We find directly from the definition of conditional probability that

(4.12) P(X = x_j | Y = y_k) = P(X = x_j, Y = y_k)/P(Y = y_k) = h(x_j, y_k)/g(y_k).

(Recall that in (4.3) we have assumed g(y_k) ≠ 0.) If we write

(4.13) f(x_j | y_k) = P(X = x_j | Y = y_k),
then for fixed k, we have a function defined with domain the set of possible values of the random variable X. To distinguish clearly the function from its value, we shall write f(· | y_k) for the function and f(x_j | y_k) for the value of the function at x = x_j. We have N such functions, one for each possible value y_k of Y. In terms of a function-machine, the inputs of the f(· | y_k)-machine are the possible values of the random variable X. If x_j is the input, then the corresponding output number is the conditional probability f(x_j | y_k) given in (4.13). We now show that each of the functions f(· | y_k) is a probability function. By Theorem 1.3, it suffices to show that the values of the function are nonnegative and that these values add to 1. Since f(x_j | y_k) is defined in (4.13) as a probability, it is clearly nonnegative. Furthermore,
(4.14) Σ_{j=1}^{M} f(x_j | y_k) = Σ_{j=1}^{M} h(x_j, y_k)/g(y_k) = g(y_k)/g(y_k) = 1.

Hence f(· | y_k) is a probability function. It is important enough to deserve a special name.
Definition 4.3. Let y_k be any possible value of Y. The function f(· | y_k) whose domain is the set of possible values of X and whose value f(x_j | y_k) is given by (4.13), or by (4.12), is called the conditional probability function of X, given Y = y_k. The conditional probability function of Y, given X = x_j, is similarly defined as the function g(· | x_j) whose value at y_k is given by

(4.15) g(y_k | x_j) = P(Y = y_k | X = x_j) = h(x_j, y_k)/f(x_j).
Example 4.3. Refer to Example 4.2 and suppose that Y has the value 1, i.e., that the second ball drawn is red. We want to determine
the conditional probability function of X when the balls are drawn (a) with replacement, and (b) without replacement. We must therefore calculate f(0 | 1) and f(1 | 1) in both cases. The probabilities we need in order to use Formula (4.12) can be read directly from Table 30, and we obtain the following results. (a) With replacement:

f(0 | 1) = (6/25)/(3/5) = 2/5,  f(1 | 1) = (9/25)/(3/5) = 3/5.

(b) Without replacement:

f(0 | 1) = (3/10)/(3/5) = 1/2,  f(1 | 1) = (3/10)/(3/5) = 1/2.
Observe that in case (a), where X and Y are independent, the conditional probability function of X given Y = 1 has the same values as the probability function of X; i.e., f(0 | 1) = f(0) and f(1 | 1) = f(1). But in case (b), where X and Y are dependent, knowing that the second ball drawn is red changes the probabilities of drawing a red or green ball on the first draw. For example, f(1) = 3/5 is the probability that the first ball is red in the absence of any information. When we are told that the second ball drawn is red, the conditional probability that the first ball is red decreases to f(1 | 1) = 1/2.
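The conditional probability functions of Example 4.3 follow directly from the joint tables by the division in (4.12). A sketch in modern Python (exact rational arithmetic; the function name conditional_X_given is ours, not the text's):

```python
from fractions import Fraction

# Joint tables of Example 4.2, entered directly (x = first draw, y = second;
# 1 means red, 0 means green).
with_r = {(0, 0): Fraction(4, 25), (0, 1): Fraction(6, 25),
          (1, 0): Fraction(6, 25), (1, 1): Fraction(9, 25)}
without_r = {(0, 0): Fraction(2, 20), (0, 1): Fraction(6, 20),
             (1, 0): Fraction(6, 20), (1, 1): Fraction(6, 20)}

def conditional_X_given(h, y):
    # f(x | y) = h(x, y) / g(y): the conditional probability function of X.
    g_y = sum(p for (x0, y0), p in h.items() if y0 == y)
    return {x0: p / g_y for (x0, y0), p in h.items() if y0 == y}

# With replacement: conditioning on Y = 1 changes nothing.
assert conditional_X_given(with_r, 1) == {0: Fraction(2, 5), 1: Fraction(3, 5)}

# Without replacement: knowing the second ball is red lowers P(first is red).
assert conditional_X_given(without_r, 1) == {0: Fraction(1, 2), 1: Fraction(1, 2)}
```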
The following result reformulates the definition of independence of random variables in terms of the conditional probability function. We leave the proof for the problems.
Theorem 4.1. The random variables X and Y are independent if and only if, for every possible value y_k of Y, the conditional probability function of X given Y = y_k and the (marginal) probability function of X have equal values for each possible value of X; i.e., if and only if

(4.16) f(x_j | y_k) = f(x_j) for j = 1, 2, ..., M; k = 1, 2, ..., N.
This result shows that X and Y are independent whenever knowing the value of Y does not change the probability with which X has any of its values, or equivalently (see Problem 4.14), whenever knowing
the value of X does not change the probability with which Y has any of its values.
We shall return to conditional probability functions in a later section, but now we take up two results that are strongly suggested by our intuition. The first of these asserts that if X and Y are independent, then so are u(X) and v(Y) for any functions u and v. For example, if X and Y are independent, then with u(x) = x − μ_X and v(y) = y − μ_Y this theorem will permit us to conclude that X − μ_X and Y − μ_Y are independent; with u(x) = x² and v(y) = y², that X² and Y² are independent; etc.
Theorem 4.2. Let X and Y be independent random variables. Let u and v be functions for which u(X) and v(Y) are defined in the sense of Definition 2.2. Then u(X) and v(Y) are also independent random variables.
Proof. According to Definition 4.2, to prove u(X) and v(Y) are independent it suffices to prove that

P(u(X) = x, v(Y) = y) = P(u(X) = x)P(v(Y) = y)

for every pair of numbers x and y. Now

P(u(X) = x, v(Y) = y) = Σ* P(X = x_j, Y = y_k),

where the asterisk indicates that the sum is to be taken over only those values of j and k for which u(x_j) = x and v(y_k) = y. Since X and Y are independent by hypothesis, we can apply (4.10) to obtain

P(u(X) = x, v(Y) = y) = Σ* P(X = x_j)P(Y = y_k) = P(u(X) = x)P(v(Y) = y),

and the proof is complete.
Our next result concerns two random variables X and Y such that the value of X is determined by the first trial and the value of Y is determined by the second trial of a two-trial experiment. If the trials are independent (as defined in Section II.9), then we would be dissatisfied with our theory if it did not enable us to prove that X and Y are independent random variables. For example, let X and Y denote respectively the sum obtained in the first and second rolls of a pair of dice. If the two rolls (trials) are independent, then we expect that X and Y are independent random variables. We leave for the reader
the task of showing that the following result is an immediate consequence of Theorem II.9.1.
Theorem 4.3. Let an experiment consist of two independent trials. If the value of a random variable X is determined by the first trial and the value of a random variable Y is determined by the second trial, then X and Y are independent.
We conclude this section by recording for later use the extension of some of our results to the case where more than two random variables are defined on the same sample space. First we make the natural extension of Definition 4.2.
Definition 4.4. Let n be any positive integer greater than 1. The random variables V_1, V_2, ..., V_n defined on a sample space S are said to be independent if and only if

(4.17) P(V_1 = v_1, V_2 = v_2, ..., V_n = v_n) = P(V_1 = v_1)P(V_2 = v_2) ··· P(V_n = v_n)

for all combinations of possible values v_1 of V_1, v_2 of V_2, ..., v_n of V_n. In other words, "V_1, V_2, ..., V_n are independent random variables" means that V_1 = v_1, V_2 = v_2, ..., V_n = v_n are independent events (in the sense of Definition II.8.3) for all possible values v_1, v_2, ..., v_n.
Corresponding to Theorems 4.2 and 4.3 we have the following results whose proofs we leave for the reader.
Theorem 4.4. Let V_1, V_2, ..., V_n be independent random variables. Let u_1, u_2, ..., u_n be functions for which u_1(V_1), u_2(V_2), ..., u_n(V_n) are defined in the sense of Definition 2.2. Then u_1(V_1), u_2(V_2), ..., u_n(V_n) are also independent random variables.
Theorem 4.5. Let an experiment consist of n independent trials. If the value of random variable V_j is determined by the jth trial for j = 1, 2, ..., n, then the random variables V_1, V_2, ..., V_n are independent.
We continue our study of joint probability functions in the next section.
PROBLEMS
4.1. Let Y and Z be the random variables defined in Example 4.1. Construct the joint probability table of Y and Z, sketch the corresponding three-dimensional probability chart, and determine whether or not Y and Z are independent.
4.2. Modify Example 4.1 by redefining Z as the algebraic difference between the number of heads and the number of tails. Construct the joint probability table of X, Z and sketch the corresponding three-dimensional probability chart. Are X and Z independent?
4.3. Suppose three indistinguishable objects are distributed at random into three numbered cells. Let X be the number of empty cells and Y the number of objects in the first cell. Construct the joint probability table of X and Y. Are X and Y independent?
4.4. Let X be the larger of the two numbers and Y be the sum of the numbers showing when two fair dice are rolled. Construct the joint probability table of X and Y. Are X and Y independent random variables?
4.5. Let X denote the number of spades and Y the number of hearts in a bridge hand. Write a formula for

h(x, y) = P(X = x, Y = y)

and prove that X and Y are dependent.
4.6. Three cards are drawn from the 12 face cards of an ordinary deck. Let X be the number of red jacks and Y be the number of red queens. Construct the joint probability table of X and Y, sketch the corresponding three-dimensional probability chart, and show that X and Y are identically distributed but dependent random variables.
4.7. A fair coin is tossed four independent times. Let X be the number of heads obtained in the first two tosses and Y the number of heads obtained in the last two tosses. Construct the joint probability table of X and Y and sketch the corresponding three-dimensional probability chart. Show that X and Y are independent by using Definition 4.2 and also by invoking Theorem 4.3.
4.8. The joint probability function of X and Y is given by

h(x, y) = (1/32)(x² + y²) for x = 0, 1, 2, 3 and y = 0, 1.

(a) Show that the marginal probability function of X is given by

f(x) = (1/32)(2x² + 1) for x = 0, 1, 2, 3.

(b) Show that the marginal probability function of Y is given by

g(y) = (1/16)(2y² + 7) for y = 0, 1.

(c) Show that the conditional probability function of X given Y = y is given by

f(x | y) = (x² + y²)/(4y² + 14) for x = 0, 1, 2, 3 and y = 0, 1.
(d) Show that the conditional probability function of Y given X = x is given by

g(y | x) = (x² + y²)/(2x² + 1) for x = 0, 1, 2, 3 and y = 0, 1.
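The formulas of Problem 4.8 can be verified numerically, assuming the constant of proportionality in h is 1/32 (the value that makes the eight probabilities sum to 1). A sketch in modern Python with exact rational arithmetic:

```python
from fractions import Fraction

# h(x, y) = (x^2 + y^2)/32 for x = 0, 1, 2, 3 and y = 0, 1
# (the constant 1/32 is assumed; it normalizes the table).
def h(x, y):
    return Fraction(x * x + y * y, 32)

xs, ys = range(4), range(2)
assert sum(h(x, y) for x in xs for y in ys) == 1      # a probability function

for x in xs:   # (a) marginal probability function of X
    assert sum(h(x, y) for y in ys) == Fraction(2 * x * x + 1, 32)
for y in ys:   # (b) marginal probability function of Y
    assert sum(h(x, y) for x in xs) == Fraction(2 * y * y + 7, 16)

for x in xs:   # (c) conditional probability function of X given Y = y
    for y in ys:
        g_y = Fraction(2 * y * y + 7, 16)
        assert h(x, y) / g_y == Fraction(x * x + y * y, 4 * y * y + 14)
```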
4.9. We select one of the integers 1, 2, 3, 4, 5. After discarding all integers (if any) less than the selected integer, we draw one of the remaining integers. (For example, if we select 3 first, then the second draw is made from the integers 3, 4, 5.) Let X and Y denote the numbers obtained on the first and second draws, respectively.
(a) Construct the joint probability table of X and Y.
(b) Determine the marginal probability functions of X and of Y.
(c) Determine the conditional probability function of Y given X = 3.
(d) Determine the conditional probability function of X given Y = 3.
(e) Find P(X + Y > 7) and P(Y − X > 0).
4.10. Suppose X has only one possible value, this value therefore occurring with probability 1. Show that X and Y are independent for any random variable Y.
4.11. For the random variables X and Z defined in Example 4.1 of the text, determine the value of P(X ≤ x, Z ≤ z) for all real numbers x and z. (Hint. Refer to Figure 24(a) and divide the xz plane into a number of regions such that P(X ≤ x, Z ≤ z) has the same value at all points of any one region.)
4.12. The joint distribution function of X and Y is defined for all real numbers x and y by the equation

H(x, y) = P(X ≤ x, Y ≤ y).
(e) Above each point (x, y) in the xy plane, imagine a point drawn at the height H(x, y). The set of all points drawn in this way is a surface which is the three-dimensional graph of the joint distribution function H. Describe the kind of surface one obtains.
4.13. Let X and Y be independent random variables. Choose any two rows of their joint probability table. Show that there is a number (which will depend on the rows you choose) such that the probabilities in one row are obtained by multiplying the corresponding probabilities in the other row by this number.
4.14. Let g(y_k | x_j) denote the conditional probability of Y = y_k, given X = x_j. Show that if f(x_j | y_k) = f(x_j) for all possible values x_j of X and y_k of Y, then for all these values also g(y_k | x_j) = g(y_k).
4.15. (a) Prove Theorem 4.1.
(b) Prove Theorem 4.3.
(c) Show that the converse of Theorem 4.2 is false by giving an example of two dependent random variables X and Y for which X² and Y² are independent.
4.16. (a) Show from Definition 4.4 that if V_1, V_2, ..., V_n are independent, then any smaller number of random variables taken from these n are also independent.
(b) Let an experiment consist of n independent trials. For any positive integer k < n, we can think of this experiment as made up of two supertrials, the first k trials being the first supertrial and the last n − k trials being the second supertrial. Show that these supertrials are independent, and hence conclude from Theorem 4.3 that if X is a random variable determined by the first k trials and Y is a random variable determined by the last n − k trials, then X and Y are independent.
5. Mean and variance of sums of random variables; the sample mean
We shall see in this section that if two random variables X and Y are defined on a sample space S, then there are automatically many other random variables also defined on S. In particular, the sum X + Y and the product XY turn out to be especially important. We also extend our results to the case where more than two random variables are defined on S, and are then able to prove some theorems of great interest in the branch of statistics known as sampling theory.
Our first task is to develop the bivariate analogue of Theorem 2.1 as an aid to computing means of random variables that are functions
of X and Y. Suppose z is a numerical-valued function whose domain is a set of ordered pairs of real numbers and let z(x, y) denote the value of z at the ordered pair or point (x, y). If the domain of z includes all the ordered pairs of values of X and Y, then for each element o_i ∈ S we can first find the corresponding values X(o_i) and Y(o_i), and then evaluate z at the point (X(o_i), Y(o_i)). See Figure 25. In
Figure 25
this way, to o_i ∈ S (the input) we make correspond the (output) real number z(X(o_i), Y(o_i)), and thus we have a new random variable defined on S. This random variable is denoted by z(X, Y). For example, if z(x, y) = x + y, then

z(X, Y) = X + Y

is the sum of the random variables X and Y; if

z(x, y) = (x − μ_X)(y − μ_Y),

then

z(X, Y) = (X − μ_X)(Y − μ_Y)
is the product of the deviations of X and Y from their respective means; etc. In the example that follows, we illustrate how to determine the probability function of the random variable z(X, Y) from the joint probability table of X and Y. The mean of z(X, Y) can then easily be computed.
Example 5.1. Consider the random variables X and Y of Example 4.1. The possible values of X and Y, together with their joint probabilities, are given in Table 27. Let z(x, y) = x + y so that U = z(X, Y) = X + Y. From the joint probability table, we can
determine the possible values of U as well as the probability with which each value occurs. For example,

P(U = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 1) = 1/8 + 1/8 = 1/4.
In this way, we obtain the entries in the following probability table for the random variable U = X + Y:
u 0 1 2 3 4
P(U = u) 1/8 1/4 1/4 1/4 1/8
From this table, we calculate the mean of U:
E(U) = E(X + Y) = 0(1/8) + 1(1/4) + 2(1/4) + 3(1/4) + 4(1/8) = 2.
From the marginal probability functions of X and Y, also given in Table 27, we find that

E(X) = 0(1/2) + 1(1/2) = 1/2, E(Y) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 3/2.
Observe that E(X + Y) = E(X) + E(Y), a result that we will soon establish for all random variables X and Y.
If we define z(x, y) as the product rather than the sum of x and y, then V = z(X, Y) = XY is a random variable whose probability table is similarly found:

v          0    1    2    3
P(V = v)  1/2  1/8  1/4  1/8

Now we compute the mean of V:

E(V) = E(XY) = 0(1/2) + 1(1/8) + 2(1/4) + 3(1/8) = 1.

Observe that E(XY) ≠ E(X)E(Y).
The reader should note that what we do to determine the probability function of z(X, Y) is collect all possible pairs of X and Y values that lead to the same value of z(X, Y). But it is more convenient not to do this when we want to compute the mean of z(X, Y). The following result tells us how to compute E[z(X, Y)] directly from the joint probability table of X and Y without first determining the probability function of z(X, Y). The proof is similar to that of Theorem 2.1, and we leave it for the problems.
Sec. 5 / MEAN AND VARIANCE OF SUMS; SAMPLE MEAN
Theorem 5.1. Let X and Y be random variables with joint probability function h. Then

(5.1) E[z(X, Y)] = Σ_{all j,k} z(x_j, y_k) h(x_j, y_k).

In words, we find E[z(X, Y)] by moving from cell to cell in the joint probability table of X and Y, multiplying the value of z(X, Y) corresponding to each cell by the probability appearing in that cell, and then adding these products for all cells.

Example 5.2. Refer to Example 5.1 and let us illustrate the use of Formula (5.1) by recalculating the means of X + Y and XY. We find directly from Table 27, moving across the first row and then the second,

E(X + Y) = 0(1/8) + 1(1/4) + 2(1/8) + 3(0) + 1(0) + 2(1/8) + 3(1/4) + 4(1/8)
         = 2, as before.

There is of course no need to write down terms that have zero factors. Indeed, any cell in the joint probability table for which either z(x_j, y_k) = 0 or h(x_j, y_k) = 0 can be skipped in computing E[z(X, Y)]. For example, we note that XY has the value 0 for five of the eight cells in Table 27. Hence we skip these and find, as in Example 5.1,

E(XY) = 1(1/8) + 2(1/4) + 3(1/8) = 1.
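Formula (5.1) is a one-line sum in code. A small sketch, reusing the joint-table entries inferred in Example 5.1 (the function name E_z is our own):

```python
from fractions import Fraction as F

# h[(x, y)] = P(X = x, Y = y), as in Examples 5.1 and 5.2.
h = {(0, 0): F(1, 8), (0, 1): F(1, 4), (0, 2): F(1, 8), (0, 3): F(0),
     (1, 0): F(0),    (1, 1): F(1, 8), (1, 2): F(1, 4), (1, 3): F(1, 8)}

def E_z(z):
    # (5.1): sum z's value at each cell times that cell's probability;
    # cells of probability zero contribute nothing and may be skipped.
    return sum(z(x, y) * p for (x, y), p in h.items() if p != 0)

print(E_z(lambda x, y: x + y))   # 2
print(E_z(lambda x, y: x * y))   # 1
```

No probability function of z(X, Y) is ever constructed, which is the point of Theorem 5.1.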
Theorem 5.1 enables us to prove the following extremely important and often-used results.

Theorem 5.2. Let X and Y be any random variables defined on a sample space S. Then

(5.2) E(X + Y) = E(X) + E(Y).
In words, the mean of the sum of two random variables is equal to the sum of their means.
Proof. According to Formula (5.1) we have

E(X + Y) = Σ_{all j,k} (x_j + y_k) h(x_j, y_k)
         = Σ_{all j,k} x_j h(x_j, y_k) + Σ_{all j,k} y_k h(x_j, y_k).

In the first term on the right, we sum over rows and then add the row-sums; in the second term, we sum over columns and then add the column-sums. We recall (4.6) and (4.7) and find
E(X + Y) = Σ_{j=1}^{M} x_j Σ_{k=1}^{N} h(x_j, y_k) + Σ_{k=1}^{N} y_k Σ_{j=1}^{M} h(x_j, y_k)
         = Σ_{j=1}^{M} x_j f(x_j) + Σ_{k=1}^{N} y_k g(y_k)
         = E(X) + E(Y).

Combining this result with Formula (2.7), we see that for any constants a and b,

(5.3) E(aX + bY) = aE(X) + bE(Y).
Still more general is the following theorem.
Theorem 5.3. Let n be any positive integer. If X₁, X₂, …, Xₙ are any random variables defined on a sample space S, and if a₁, a₂, …, aₙ are any constants, then*

(5.4) E(a₁X₁ + a₂X₂ + ⋯ + aₙXₙ) = a₁E(X₁) + a₂E(X₂) + ⋯ + aₙE(Xₙ).

Proof. The result is true for n = 1 and n = 2 by Formula (5.3). The theorem is proved (by mathematical induction) as soon as we show that if the theorem is true for any positive integer, say n = k, then it is also true for the next integer, n = k + 1. Let us therefore assume that (5.4) is true for n = k. That is, letting Y = a₁X₁ + ⋯ + a_kX_k, we are assuming

E(Y) = a₁E(X₁) + ⋯ + a_kE(X_k).

The key idea of the proof is the observation that the sum of k + 1 random variables can be thought of as the sum of two random variables to which (5.3) can be applied. In particular,

E(a₁X₁ + ⋯ + a_kX_k + a_{k+1}X_{k+1}) = E(Y + a_{k+1}X_{k+1})
                                      = E(Y) + a_{k+1}E(X_{k+1}).

But this last equality shows that (5.4) is true for n = k + 1, and so the proof is complete.
* Strictly speaking, the sum a₁X₁ + a₂X₂ + ⋯ + aₙXₙ appearing in Formula (5.4) has been defined only if n = 1 or n = 2. We make the natural definition that the sum for any positive integer n is the random variable whose value at each ω ∈ S is the number a₁X₁(ω) + a₂X₂(ω) + ⋯ + aₙXₙ(ω).
Example 5.3. We apply (5.4) to derive a useful identity:

E[(X - μ_X)(Y - μ_Y)] = E(XY - μ_X Y - μ_Y X + μ_X μ_Y)
                      = E(XY) - μ_X E(Y) - μ_Y E(X) + μ_X μ_Y.

Except for sign, the last three terms are equal. Hence

(5.5) E[(X - μ_X)(Y - μ_Y)] = E(XY) - μ_X μ_Y.
We turn now to some results leading to a formula for the variance of a sum of random variables.
Theorem 5.4. Let X and Y be independent random variables defined on a sample space S. Then
(5.6) E(XY) = E(X)E(Y).
In words, the mean of the product of two independent random variables is equal to the product of their means.
Proof. By Theorem 5.1 we write

E(XY) = Σ_{all j,k} x_j y_k h(x_j, y_k).

But the assumed independence of X and Y means, according to (4.11), that h(x_j, y_k) = f(x_j)g(y_k) for all j and k. Hence

E(XY) = Σ_{all j,k} x_j y_k f(x_j) g(y_k)
      = Σ_{j=1}^{M} x_j f(x_j) Σ_{k=1}^{N} y_k g(y_k)
      = E(X)E(Y).
It is very important to note that the converse of Theorem 5.4 is false. As the following example shows, it is possible for (5.6) to be true for random variables X and Y that are dependent.

Example 5.4. Suppose X has probability table

x          -1    0    1
P(X = x)  1/4  1/2  1/4

Let Y = X². Then X and Y are surely dependent, since the value of X determines the value of Y. This dependence is obvious from the joint probabilities of X and Y, as given in Table 31.
TABLE 31
x \ y      0     1    P(X = x)
-1         0    1/4     1/4
 0        1/2    0      1/2
 1         0    1/4     1/4
P(Y = y)  1/2   1/2      1
Nevertheless, the reader can quickly check that E(X) = 0, E(Y) = 1/2, and E(XY) = E(X³) = 0, so that (5.6) holds. Recalling that (5.6) did not hold for the dependent random variables in Example 5.1, we conclude that (5.6) holds for all pairs of independent random variables and some but not all pairs of dependent random variables.
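The quick check suggested for Example 5.4 can be done with exact fractions; since Y = X², the product XY is X³. A minimal sketch:

```python
from fractions import Fraction as F

# X takes -1, 0, 1 with probabilities 1/4, 1/2, 1/4, and Y = X^2 (Table 31).
px = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 4)}

EX  = sum(x * p    for x, p in px.items())
EY  = sum(x**2 * p for x, p in px.items())
EXY = sum(x**3 * p for x, p in px.items())   # XY = X * X^2 = X^3

print(EX, EY, EXY)        # 0 1/2 0
assert EXY == EX * EY     # (5.6) holds, although X and Y are dependent
```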
It is convenient to record here the following corollary:

(5.7) E[(X - μ_X)(Y - μ_Y)] = 0 if X, Y are independent.
This follows immediately from the identity in (5.5) if we apply Theorem 5.4.
We are now able to state a rule for finding the variance of the sum of two independent random variables.
Theorem 5.5. Let X and Y be independent random variables defined on a sample space S. Then

(5.8) Var(X + Y) = Var(X) + Var(Y).

In words, the variance of the sum of two independent random variables is equal to the sum of their variances.
Proof. By definition of variance, we have

Var(X + Y) = E([(X + Y) - E(X + Y)]²) = E([(X - μ_X) + (Y - μ_Y)]²),

where we have rearranged terms in the bracket after using (5.2). Now we perform the indicated squaring operation to obtain

      Var(X + Y) = E[(X - μ_X)² + 2(X - μ_X)(Y - μ_Y) + (Y - μ_Y)²]
(5.9)            = E[(X - μ_X)²] + 2E[(X - μ_X)(Y - μ_Y)] + E[(Y - μ_Y)²],
this last equality resulting from the use of (5.4). The middle term on the right vanishes according to (5.7). The other two terms on the right are, by Definition 3.1, precisely Var(X) and Var(Y). We have therefore completed the proof.

Now if X and Y are independent, then so are aX and bY for any constants a and b. (This obvious fact is technically a consequence of Theorem 4.2.) Hence we can apply Theorem 5.5 to aX and bY, and so find that

Var(aX + bY) = Var(aX) + Var(bY).

Now we use (3.11) to conclude that for any numbers a and b, if X and Y are independent, then

(5.10) Var(aX + bY) = a² Var(X) + b² Var(Y).
Still more general is the following result, whose proof we leave for the problems.
Theorem 5.6. Let n be any positive integer and suppose X₁, X₂, …, Xₙ are independent random variables defined on a sample space S. Then for any constants a₁, a₂, …, aₙ we have

(5.11) Var(a₁X₁ + a₂X₂ + ⋯ + aₙXₙ) = Σ_{j=1}^{n} a_j² Var(X_j).

In particular (if a₁ = a₂ = ⋯ = aₙ = 1), the variance of the sum of any finite number of independent random variables is equal to the sum of their variances. Note that the corresponding result for the mean holds for any random variables, independent or dependent.

Example 5.5. A deck of cards numbered 1, 2, …, n is shuffled and placed face down on the table. As each card is turned, a subject tries to guess what number it will be. Suppose the subject does not remember from one card to the next and calls his guesses independently and at random; i.e., he has the same probability 1/n for a correct guess at each trial (guess) of this n-trial experiment, and the trials are independent. (One way of doing this would be for the subject to have a duplicate deck and make each of his guesses by selecting one card at random from his deck. Note that our independence assumption means that the subject draws each card from the full duplicate deck; i.e., he is drawing a random sample of n cards with replacement from the deck of n cards. The corresponding problem in which his guesses are determined by drawing a random sample without replacement is discussed in Problem 6.8 of the next section.)
Let X denote the random variable whose value for any outcome of the n-trial experiment is the number of correct guesses made by the subject. We shall find the mean and variance of X by expressing X as a sum of n random variables, and then using Formulas (5.4) and (5.11).

For k = 1, 2, …, n, let

(5.12) X_k = 0 if the kth guess is wrong (probability 1 - 1/n),
       X_k = 1 if the kth guess is correct (probability 1/n).

Then X = X₁ + X₂ + ⋯ + Xₙ, since the value of the sum is equal to the number of 1's in the sum, and hence is precisely the number of correct guesses made by the subject, or the value of X. Now for k = 1, 2, …, n, we find

(5.13) E(X_k) = 0(1 - 1/n) + 1(1/n) = 1/n.

Hence, by (5.4),

E(X) = E(X₁ + X₂ + ⋯ + Xₙ) = E(X₁) + E(X₂) + ⋯ + E(Xₙ)
     = 1/n + 1/n + ⋯ + 1/n
     = 1.
Thus we see that the mean number of correct guesses is 1, and therefore does not depend upon n, the number of cards in the deck.
To compute Var(X), we note that X_k is determined by the kth trial of the experiment. Since the trials are independent, we conclude by Theorem 4.5 that X₁, X₂, …, Xₙ are independent random variables. Hence (5.11) is applicable. Since for k = 1, 2, …, n,

(5.14) Var(X_k) = 1/n - (1/n)² = (n - 1)/n²,

we obtain

Var(X) = Var(X₁) + Var(X₂) + ⋯ + Var(Xₙ) = n · (n - 1)/n² = (n - 1)/n.
In Chapter 5, we shall determine the probability function of X, but note that our method allows the calculation of the mean and variance of X without knowing this probability function. (Cf. Problem 3.11, Part 2, which is the special case of this problem when there are only three cards in the deck.)
We turn now to an application of our theorems to experiments made up of a number of independent repetitions of the same trial. Such experiments and random variables associated with them can be interpreted in a number of ways and, in particular, supply a mathematical model for repeated measurement in the sciences and for sampling with replacement in statistics.
Suppose a bowl contains N chips, each chip having a number on it. Some chips may have the same number on them, and we let x₁, x₂, …, x_M (M ≤ N) be all the different numbers in the bowl. Suppose the number x_j occurs f_j times, so that the relative frequency with which this number appears in the bowl, or the proportion of all chips having this number, is f_j/N. It follows that

(5.15) Σ_{j=1}^{M} f_j = N, or Σ_{j=1}^{M} f_j/N = 1.
If one chip is selected at random from the bowl and we denote by X the random variable whose value is the number on this chip, then the probability function of X is given by the following table:
x          x₁     x₂    ⋯    x_M
P(X = x)  f₁/N   f₂/N   ⋯   f_M/N

Thus we have a special probability function whose value for any x_j is the proportion of chips with x_j on them among all chips in the bowl:

(5.16) f(x_j) = P(X = x_j) = f_j/N for j = 1, 2, …, M.
The reader can verify that our definitions of mean and variance, when applied to this particular random variable X, yield the formulas
(5.17) μ_X = E(X) = (1/N) Σ_{j=1}^{M} x_j f_j,

(5.18) σ_X² = Var(X) = (1/N) Σ_{j=1}^{M} (x_j - μ_X)² f_j,
(5.19) σ_X² = Var(X) = (1/N) Σ_{j=1}^{M} x_j² f_j - μ_X²,

the last equality arising by use of (3.10).

It is customary in this context to say that we have a population of N chips and then to call μ_X and σ_X² the population mean and the population variance of X. As we know (Theorem 1.3), we can consider X defined on the sample space S = {x₁, x₂, …, x_M} whose simple events are assigned probabilities as given by the probability function of X, i.e., P({x_j}) = f_j/N for j = 1, 2, …, M.

From the population of N chips, we now draw n chips, replacing each before the next draw. We consider this as an experiment made up of n independent trials, each trial being defined by the sample space S and the trials being independent by virtue of our assumption that we are sampling with replacement. Our mathematical counterpart for this n-trial sampling experiment is the sample space given by the Cartesian product set S × S × ⋯ × S (n S's), together with an assignment of probabilities in accordance with the product rule discussed in Section II.9.

For k = 1, 2, …, n, let X_k be the random variable whose value is the number on the kth chip drawn from the bowl. Thus we have n random variables X₁, X₂, …, Xₙ defined on the Cartesian product sample space. Since each trial is an exact duplicate of any other, these random variables all have the same probability function. Furthermore, since X_k is determined by the kth trial and the trials are independent, it follows that X₁, X₂, …, Xₙ are independent random variables. We summarize by saying that the random variables X₁, X₂, …, Xₙ are independent and identically distributed, each with mean μ_X and variance σ_X².

It is thus clear that our sampling experiment is completely determined as soon as we know the common probability function of the X_k's. For then we are given the possible sample values that can arise in each trial, together with their probabilities. In other words, it is meaningful to talk of a population specified by the probability function of a random variable X. Indeed, for this reason sampling with replacement is often referred to as sampling from a probability function.
The random variable X̄ given by

(5.20) X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n
is called the sample mean of X. The value of X̄ for any selection of n chips is just the arithmetic mean or average of the numbers on the chips. An experimenter who has incomplete knowledge of the composition of the bowl could nevertheless draw his random sample from the population and obtain a value of the sample mean X̄. For example, if we think of the chips as corresponding to N people in a given population and the number on each chip as the income of the corresponding person, then the value of X̄ is just the average of the n incomes selected in the random sample. Or, if the numbers on the chips are N possible measurements of some quantity, say the length of a bar to the nearest thousandth of a centimeter or the time to the nearest tenth of a second that it takes a rat to complete a maze, then the value of X̄ is just the average of n such measurements. Before studying the random variable X̄ in general, we pause to present a particular example to help fix these ideas.
Example 5.6. Suppose our population is specified by the probability table
x          -1    0    2
P(X = x)   .1   .5   .4

The reader can easily verify that the population mean and variance are given by

μ_X = .7,    σ_X² = 1.21.
TABLE 32

Sample   Probability of Drawing This Sample   Value of X̄ for This Sample
-1, -1                 .01                            -1
-1, 0                  .05                            -1/2
-1, 2                  .04                             1/2
 0, -1                 .05                            -1/2
 0, 0                  .25                              0
 0, 2                  .20                              1
 2, -1                 .04                             1/2
 2, 0                  .20                              1
 2, 2                  .16                              2
In Table 32, we list all possible samples of size n = 2 taken with replacement from this population, the probability of obtaining each sample, and the corresponding value of the sample mean X̄. We thus find the following probability table for X̄, the sample mean:

x̄           -1   -1/2    0    1/2    1     2
P(X̄ = x̄)   .01   .10   .25   .08   .40   .16

And the reader can again verify that μ_X̄, the mean of X̄, and σ_X̄², the variance of X̄, are given by

μ_X̄ = .7,    σ_X̄² = .605.

We observe that

μ_X̄ = μ_X   and   σ_X̄² = σ_X²/2;

i.e., the mean of the sample mean X̄ is equal to the population mean of X, and the variance of the sample mean is equal to the population variance of X divided by the sample size. In other words, although the values of X and of X̄ have the same average, the values of X̄ spread less about this common average than do the values of X. A similar procedure for samples of size n = 3 (there are now 27 possible samples) yields the following probability table for the sample mean X̄. (Note that we do not complicate our notation by explicitly indicating the sample size when we write the symbol for the sample mean. It is therefore important to keep the sample size clearly in mind when writing X̄.)

x̄           -1   -2/3   -1/3    0    1/3    2/3    1    4/3    2
P(X̄ = x̄)  .001  .015   .075  .137  .120   .300  .048  .240  .064

We compute the mean and variance of X̄ and find

μ_X̄ = .7,    σ_X̄² ≈ .403,

so that for samples of size 3,

μ_X̄ = μ_X   and   σ_X̄² = σ_X²/3.
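Both probability tables of Example 5.6, and the relations μ_X̄ = μ_X and σ_X̄² = σ_X²/n, can be reproduced by brute-force enumeration of all samples. A sketch with exact fractions (the helper names are ours):

```python
from fractions import Fraction as F
from itertools import product

# Population of Example 5.6: value -> probability.
pop = {-1: F(1, 10), 0: F(5, 10), 2: F(4, 10)}

def mean_var(d):
    m = sum(v * p for v, p in d.items())
    return m, sum((v - m)**2 * p for v, p in d.items())

def sample_mean_dist(n):
    # Enumerate all ordered samples of size n drawn with replacement.
    d = {}
    for draws in product(pop, repeat=n):
        p = F(1)
        for x in draws:
            p *= pop[x]
        xbar = F(sum(draws), n)
        d[xbar] = d.get(xbar, F(0)) + p
    return d

mu, var = mean_var(pop)
for n in (2, 3):
    m, v = mean_var(sample_mean_dist(n))
    assert m == mu and v == var / n   # as in Theorem 5.7 below

print(mu, var)   # 7/10 121/100
```

For n = 2 this enumeration is exactly Table 32; for n = 3 it runs over the 27 possible samples mentioned in the text.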
We see that X and X̄ again have the same mean, but as compared with the values of X (which can be considered values of X̄ for samples of size 1) or the values of X̄ for samples of size 2, the values of X̄ for samples of size 3 show less spread about the common mean μ_X. This fact corresponds to our intuitive feeling that we improve our estimate of the population mean as we take averages based on larger and larger samples from the population. We return to this point and make it precise in the theorems that follow.
With the results of this example before us, it should come as no surprise that the following general theorem holds.
Theorem 5.7. Let n be any positive integer and let X₁, X₂, …, Xₙ be n independent, identically distributed random variables, each with mean μ_X and variance σ_X². If

(5.21) X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n,

then

(5.22) μ_X̄ = μ_X   and   σ_X̄² = σ_X²/n.
In words, for sampling with replacement from a population given by the probability function of a random variable X, the mean of the sample mean X is equal to the population mean of X, and the variance of the sample mean is equal to the variance of X divided by the sample size.
Proof. We first apply (5.4) to obtain

E(X̄) = E[(X₁ + ⋯ + Xₙ)/n] = (1/n)[E(X₁) + ⋯ + E(Xₙ)] = (1/n)(nμ_X) = μ_X.

Similarly, applying (5.11) with a₁ = ⋯ = aₙ = 1/n, we find

Var(X̄) = Var[(X₁ + ⋯ + Xₙ)/n]
        = (1/n²)[Var(X₁) + ⋯ + Var(Xₙ)]
        = (1/n²)(nσ_X²) = σ_X²/n.
Note that the standard deviation of the sample mean is the standard deviation of X divided by the square root of the sample size:

(5.23) σ_X̄ = σ_X/√n.

Thus, as the sample size n increases, the values of the sample mean X̄ tend to become more concentrated about the mean μ_X.

Observe that Theorem 5.7 was proved without finding the probability function of X̄. For applications of this theorem to statistical problems, it becomes important to know more about X̄ than its mean and standard deviation. Unfortunately, we must leave these interesting matters at this point, for they lead to probability problems that cannot be formulated using finite sample spaces.

We can however use Theorem 5.7 to prove the following result, which is a special form of the so-called law of large numbers.

Theorem 5.8. Let a population be specified by a random variable X with mean μ_X and standard deviation σ_X. Let X̄ be the mean of a random sample of size n drawn with replacement from this population. Let c be any positive number. Then as n increases without bound,

(5.24) P(μ_X - c < X̄ < μ_X + c)

approaches 1. In other words, by choosing the sample size n sufficiently large, the probability that the value of the sample mean differs from the population mean by at most c can be made as close to 1 as we like. Or, more colloquially, since c can be taken as small as we please: by choosing the sample size sufficiently large, we can be as sure as we like (short of certainty) that the value of the sample mean will be as near the population mean as we like.
Proof. To the random variable X̄ we apply Chebyshev's inequality (3.15) and find that

P(|X̄ - μ_X̄| ≥ c) ≤ σ_X̄²/c².

We use (5.22) to write this as

(5.25) P(|X̄ - μ_X| ≥ c) ≤ σ_X²/(nc²),   or   P(|X̄ - μ_X| < c) ≥ 1 - σ_X²/(nc²).
But as n increases, the quantity σ_X²/(nc²) decreases and approaches zero. Hence 1 - σ_X²/(nc²) approaches 1 as n gets larger and larger, and so P(|X̄ - μ_X| < c), which is just the probability in (5.24), can be made as close to 1 as we like by choosing n sufficiently large. This completes the proof.

Example 5.7. Let the value of X corresponding to each person in a certain population be that person's annual income in thousands of dollars. Suppose μ_X = 6.5 and σ_X = 2.1. A random sample of n persons is drawn with replacement from this population and a value of X̄, the average income of these n persons, is obtained. We want the probability to be greater than .9 that this value differs from the population mean by at most .5. How large must the sample size n be?

We seek the smallest value of n for which

(5.26) P(|X̄ - μ_X| < .5) > .9.

Putting c = .5 in (5.25), we have

P(|X̄ - μ_X| < .5) ≥ 1 - (2.1)²/(n(.5)²) = 1 - 17.64/n.

Hence (5.26) will be true if 17.64/n is less than .1, or if n > 176.4. The desired closeness of the sample mean income and the population mean income is therefore achieved with a sample of size n = 177. (This is a most conservative figure, since it applies no matter what probability function X has. In more advanced work, one derives the approximate form of the probability function of X̄, and it is then possible to show that a sample of size n = 50 will suffice in this example.)
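The Chebyshev computation of Example 5.7 generalizes: the smallest n making 1 - σ_X²/(nc²) exceed a target probability is the smallest integer greater than σ_X²/((1 - p)c²). A sketch (the function name is ours):

```python
import math

def chebyshev_sample_size(sigma, c, prob):
    # Smallest n with 1 - sigma**2 / (n * c**2) > prob,
    # i.e., the smallest integer n > sigma**2 / ((1 - prob) * c**2).
    return math.floor(sigma**2 / ((1 - prob) * c**2)) + 1

print(chebyshev_sample_size(2.1, 0.5, 0.9))   # 177, as in Example 5.7
```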
We conclude by showing how the law of large numbers can be used to supply a theoretical counterpart to our intuitive feeling that if an event A occurs f times in n identical trials and if n is large, then f/n, the proportion of times A occurs, should be near the probability P(A) of the event A. We let the random variable X have the value 1 if the event A occurs, and the value 0 otherwise. Thus X has the following probability table:

x             0         1
P(X = x)  1 - P(A)    P(A)
We note that μ_X = P(A). Also X̄, the mean of a random sample of size n drawn with replacement from the population specified by the random variable X, is just the proportion of times the event A occurs. (For X₁ + ⋯ + Xₙ is the number of times A occurs and X̄ equals this number divided by n.) According to Theorem 5.8, by taking n sufficiently large, the probability can be made arbitrarily close to 1 that the proportion of times A occurs will be as close as we like to the probability of A. (This fact, due to James Bernoulli, dates back to 1713.) It is in this form that we find support for the interpretation of probabilities as proportions in a large number of repeated independent trials.*
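For this 0-1 population, σ_X² = P(A)(1 - P(A)), so the Chebyshev bound behind Theorem 5.8 reads P(|f/n - P(A)| < c) ≥ 1 - P(A)(1 - P(A))/(nc²). A sketch of how the bound climbs toward 1 (the values of p, c, and n are illustrative, and the function name is ours):

```python
def lln_lower_bound(p, c, n):
    # Chebyshev lower bound on P(|f/n - p| < c) for n independent trials,
    # since a 0-1 variable with P(X = 1) = p has variance p(1 - p).
    return 1 - p * (1 - p) / (n * c * c)

for n in (10_000, 100_000, 1_000_000):
    print(n, lln_lower_bound(0.5, 0.01, n))
# The bound rises toward 1 as n grows (for small n it is vacuous).
```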
PROBLEMS
5.1. Suppose X and Y have the following joint probability table:

x \ y      1    2    3   P(X = x)
1         .1   .1   0      .2
2         .1   .2   .3     .6
3         .1   .1   0      .2
P(Y = y)  .3   .4   .3      1

(a) Determine the probability function of X + Y, and thus compute E(X + Y). Check your answer by using (5.2).
(b) Determine the probability function of XY and thus compute E(XY). Then check your answer by using (5.1) to find E(XY) directly from the joint probability table.
(c) Show that (5.6) is true but that X and Y are dependent.

5.2. Let X and Y have the joint probability table given in the preceding problem. In each of the following parts, a function z is defined by giving its value z(x, y) for any real numbers x and y. Determine the probability function of the random variable z(X, Y) and calculate E[z(X, Y)] from the probability function, and also by using (5.1).

* There are other interpretations of probability. See, for example, the discussions in L. J. Savage, The Foundations of Statistics, John Wiley and Sons, Inc., 1954, especially pp. 3-4, 56-68, and in E. Nagel, Principles of the Theory of Probability, International Encyclopedia of Unified Science, Vol. 1, No. 6, University of Chicago Press, 1939.
(a) z(x, y) = x if x ≤ y; z(x, y) = y if x > y (the minimum of x and y)
(b) z(x, y) = y if x ≤ y; z(x, y) = x if x > y (the maximum of x and y)
(c) z(x, y) = x/y
(d) z(x, y) = y/x
(e) z(x, y) = x² + y²
(f) z(x, y) = √(x² + y²)

5.3. Which of the following are true for all random variables X and Y defined on a sample space? For those that are true for some but not all X and Y, find a pair X, Y for which the statement is true and another pair for which it is false. (Cf. the preceding problem.)

(a) E[min(X, Y)] = min[E(X), E(Y)]
(b) E[max(X, Y)] = max[E(X), E(Y)]
(c) E(X/Y) = E(X)/E(Y)
(d) E(X/Y) = 1/E(Y/X)
(e) E(X² + Y²) = E(X²) + E(Y²)
(f) [E(√(X² + Y²))]² = E(X² + Y²)
5.4. In the text, corollary (5.7) is proved using the identity in (5.5). Prove the corollary without using this identity. (Hint: Use Theorem 4.2.)
5.5. X has mean 50 and standard deviation 12. Y has mean 30 and standard deviation 5. X and Y are independent. Find the mean and standard deviation of (a) X + Y, (b) X - Y, (c) 3X + 2Y. (Note: by (5.10) with a = 1, b = -1, Var(X - Y) = Var(X) + Var(Y).)

5.6. Start with the definition of Var(X) given in (3.6) and use the theorems of the present section to prove that Var(X) = E(X²) - μ_X². (Cf. Theorem 3.2 and its proof.)

5.7. (a) Interpret the result E(X + b) = E(X) + b established in Section 2 as a special case of Theorem 5.2. What then is the random variable Y?
(b) Interpret the result E(aX) = aE(X) established in Section 2 as a special case of Theorem 5.4. What then is the random variable Y and why (as needed to apply Theorem 5.4) are X and Y independent?
5.8. Prove Theorem 5.1.
5.9. Generalize Theorem 5.4 by proving that if X₁, X₂, …, Xₙ are independent (n any positive integer), then

E(X₁X₂ ⋯ Xₙ) = E(X₁)E(X₂) ⋯ E(Xₙ).
5.10. Let V₁, V₂, V₃ be independent random variables. Define X = V₁ + V₂ and Y = V₁ + V₃. Show that

E(XY) - E(X)E(Y) = Var(V₁).

5.11. (a) Let X₁, X₂, …, Xₙ be independent random variables. Show that if k is any positive integer less than n and Y_k = a₁X₁ + ⋯ + a_kX_k, then Y_k and X_{k+1} are independent. (b) Prove Theorem 5.6 by mathematical induction.

5.12. A random sample of size n is drawn with replacement from a population, and we find the sample mean X̄ has mean μ_X̄ and standard deviation σ_X̄. What happens to μ_X̄ and σ_X̄ if the sample size is quadrupled?

5.13. Let a population be specified by the following probability table of the random variable X:

x          0    1    2
P(X = x)  1/4  1/2  1/4

(a) Find μ_X and σ_X².
comes (X-values) are thus given coded incomes (Y-values) of -3, the four people with $5000 incomes are given coded incomes of 0, etc. Since the Y-values are small numbers, it is relatively easy to compute μ_Y and σ_Y² by using (5.17)-(5.19) with x_j replaced by y_j. By using the coding equation relating Y and X, you can easily find μ_X and σ_X² from μ_Y and σ_Y².)

(b) Determine the probability function of X̄, the sample mean, based on samples of size 2 drawn with replacement from the population of ten people. Then compute μ_X̄ and σ_X̄².
6. Covariance and correlation; the sample mean (cont.)

Suppose we are given the joint probability table of two random variables X and Y defined on the same sample space S. Each of these variables has a mean and a variance, but the joint probability table is not needed to compute μ_X, σ_X², and μ_Y, σ_Y²; these numbers are determined by the (marginal) probability functions of X and Y. In this section, we define some numbers that measure how the possible values of X are related to the possible values of Y; such numbers will depend upon the joint probability function of X and Y.

We are led to our first definition by reviewing the proof of Theorem 5.5. There we showed that

(6.1) Var(X + Y) = Var(X) + Var(Y) + 2E[(X - μ_X)(Y - μ_Y)],

and since X and Y were assumed to be independent, we invoked (5.7) to conclude that the last term in (6.1) vanishes. But (6.1) is a result worth having, since it holds for dependent as well as independent random variables. We therefore want now to study the last term in (6.1), a term that we so hurriedly skipped over in the preceding section. As usual, a special symbol and name are introduced.

Definition 6.1. Let X and Y be random variables defined on a sample space S. The covariance of X and Y, denoted by Cov(X, Y), is defined as the number given by

(6.2) Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)],

or equivalently, because of the identity in (5.5),

(6.3) Cov(X, Y) = E(XY) - μ_X μ_Y.

Using this notation we rewrite (6.1) and obtain

(6.4) Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

Let us also note here that Definition 6.1 treats X and Y symmetrically, i.e.,

(6.5) Cov(X, Y) = Cov(Y, X),

and since X - μ_X and Y - μ_Y each have mean zero,

(6.6) Cov(X - μ_X, Y - μ_Y) = Cov(X, Y).
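Covariance and identity (6.4) are easy to check numerically. A sketch using the joint table of Problem 5.1 from the preceding section, with exact fractions (the helper name E is ours):

```python
from fractions import Fraction as F

# Joint table of Problem 5.1: h[(x, y)] = P(X = x, Y = y).
h = {(1, 1): F(1, 10), (1, 2): F(1, 10), (1, 3): F(0),
     (2, 1): F(1, 10), (2, 2): F(2, 10), (2, 3): F(3, 10),
     (3, 1): F(1, 10), (3, 2): F(1, 10), (3, 3): F(0)}

def E(z):
    # Formula (5.1), applied cell by cell.
    return sum(z(x, y) * p for (x, y), p in h.items())

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)   # (6.3)
var_x = E(lambda x, y: x * x) - E(lambda x, y: x)**2
var_y = E(lambda x, y: y * y) - E(lambda x, y: y)**2
var_sum = E(lambda x, y: (x + y)**2) - E(lambda x, y: x + y)**2

assert var_sum == var_x + var_y + 2 * cov    # identity (6.4)
print(cov)   # 0 here, even though X and Y are dependent
```

This pair also illustrates the warning after Theorem 5.4: zero covariance does not imply independence.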
Many problems require a generalization of (6.4) to more than two random variables. For example, by the definition of variance we
have

Var(X₁ + X₂ + X₃) = E([X₁ + X₂ + X₃ - E(X₁ + X₂ + X₃)]²).

If we use (5.4) and write μ_j for E(X_j), then

(6.7) Var(X₁ + X₂ + X₃) = E([(X₁ - μ₁) + (X₂ - μ₂) + (X₃ - μ₃)]²).

Let us now recall from algebra (or by using the multinomial theorem of Chapter 3) that

(a₁ + a₂ + a₃)² = a₁² + a₂² + a₃² + 2(a₁a₂ + a₁a₃ + a₂a₃)
               = Σ_{j=1}^{3} a_j² + 2 Σ Σ a_j a_k,

where the last sum is understood to include the (3 choose 2) = 3 cross products a_j a_k with 1 ≤ j < k ≤ 3. Applying this to (6.7) by putting a_j = X_j - μ_j, we find

Var(X₁ + X₂ + X₃) = E[Σ_{j=1}^{3} (X_j - μ_j)² + 2 Σ Σ_{j<k} (X_j - μ_j)(X_k - μ_k)].

Now using (5.4) again and the definition of variance and covariance, we obtain

(6.8) Var(X₁ + X₂ + X₃) = Σ_{j=1}^{3} Var(X_j) + 2 Σ Σ_{j<k} Cov(X_j, X_k).

More generally, if X₁, X₂, …, Xₙ are any random variables (n > 1), then

(6.9) Var(X₁ + ⋯ + Xₙ) = Σ_{j=1}^{n} Var(X_j) + 2 Σ Σ Cov(X_j, X_k),

the last sum including the (n choose 2) terms Cov(X_j, X_k) with subscripts satisfying 1 ≤ j < k ≤ n.
In the preceding section (pp. 221226) we discussed some questions involving sampling with replacement from a population specified by the probability function of a random variable. In. particular, we derived Formulas (5.22) for the mean and variance of the sample
mean X based on random samples of size n drawn with replacement. We are now able to prove the corresponding results for the important case where random samples of size n are drawn without replacement from a finite population of N elements.
We suppose, as before, that we have a population of N chips and that each has a number on it. Let #1, £2, • • • , XM be all the different numbers, and suppose x3 appears on /y chips f or j — 1, 2, * • , M. Then, precisely as in (5.16)(5.19) of the preceding section, we define the random variable X and the population mean jux and population variance c&.
From this population of N chips we draw one chip at random, then another at random from the remaining N — 1 chips, and so on until a random sample of size n (n < N) is drawn without replacement from the population. We again let Xk be the random variable whose value is the number on the fcth chip drawn. Thus the sample mean 3T is given by (5.20), as before. Our task is to compute the mean and variance of X .
The random variables Xly X2, •  • , Xn are independent when the sample is drawn with replacement. What makes our present analysis more complicated is the fact that, in sampling without replacement, these random variables are dependent. For example, we have
(6.10) P(XX = a*) = for j = 1, 2, • • • , M,
but knowing the outcome of the first draw changes the probability of getting the number % on the second draw:
(6.11) P(Z2 = avIXx * «,) = y P(X* = x,Xx = X,) =
But, in spite of being dependent, the random variables Xiy X2,  * • , Xn are, as in the preceding section, all identically distributed with probability function the same as that of X] i.e., for k = 1, 2,   •, n, we have
(6.12)  P(X_k = x_j) = f_j/N  for j = 1, 2, …, M.
This means that in the absence of information about the preceding draws, the probability that the k-th draw results in a chip bearing the number x_j is just the proportion of chips bearing this number among all chips in the population. For the first draw this is clear and
Sec. 6 / COVARIANCE AND CORRELATION 235
recorded in (6.10). We now prove it is true for the second draw. By Formula (II.6.1),
P(X_2 = x_j) = Σ_{k=1}^{M} P(X_2 = x_j | X_1 = x_k) P(X_1 = x_k).

Because of (6.11), we must isolate the term with k = j. Then

P(X_2 = x_j) = P(X_2 = x_j | X_1 = x_j) P(X_1 = x_j) + Σ*_{k=1}^{M} P(X_2 = x_j | X_1 = x_k) P(X_1 = x_k),

where the asterisk means that the sum does not include the term with k = j. Continuing,

P(X_2 = x_j) = ((f_j − 1)/(N − 1))(f_j/N) + (f_j/(N − 1)) Σ* (f_k/N)
            = (f_j/(N(N − 1))) [(f_j − 1) + (N − f_j)]
            = f_j (N − 1)/(N(N − 1)) = f_j/N,

as claimed.
We leave the proof that (6.12) is also true for k = 3, 4, …, n for the problems.
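The identity (6.12) can also be checked by brute force for a small population. The sketch below is not part of the text; the chip numbers are illustrative. It enumerates every equally likely ordered sample and confirms that each draw, viewed without information about the others, has the population distribution:

```python
from itertools import permutations
from fractions import Fraction

# Hypothetical population of N = 5 chips; the number 100 appears on
# 2 chips, 130 on 2 chips, and 80 on 1 chip.
chips = [80, 100, 100, 130, 130]
N = len(chips)
n = 3  # sample size, drawn without replacement

# Every ordered sample of n distinct chips is equally likely.
samples = list(permutations(range(N), n))

def prob_kth_draw_equals(x, k):
    """P(X_k = x): fraction of ordered samples whose k-th chip bears x."""
    hits = sum(1 for s in samples if chips[s[k - 1]] == x)
    return Fraction(hits, len(samples))

# (6.12): for every k, P(X_k = x_j) equals the population proportion f_j/N.
for x in set(chips):
    for k in range(1, n + 1):
        assert prob_kth_draw_equals(x, k) == Fraction(chips.count(x), N)
```

The exact arithmetic with Fraction avoids any rounding questions; the check works for any small population one cares to substitute.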
From (6.12) it follows that E(X_k) = μ_X for k = 1, 2, …, n, and so we apply (5.4) to obtain

μ_X̄ = E(X̄) = μ_X,

as in the case of sampling with replacement. The fact that X_1, X_2, …, X_n are no longer independent does not influence the calculation of μ_X̄, since (5.4) holds for dependent as well as independent random variables.
It is in the calculation of the variance of X̄ that the dependence of the X_k's complicates matters. We must use (6.9) and so need to compute Cov(X_j, X_k). (We know that Var(X_j) = σ_X², as given in (5.18), since X_j has the same probability function as X for j = 1, 2, …, n.) A saving grace is the fact that
Cov(X_j, X_k) = Cov(X_1, X_2) for all j ≠ k. This equality follows from the observation (see Problem 6.6) that
each pair of random variables taken from X_1, X_2, …, X_n has the same joint probability function as any other pair. It therefore suffices to compute Cov(X_1, X_2), and it is to this task that we now turn our attention.
By the definition of covariance, we have

Cov(X_1, X_2) = Σ_{j=1}^{M} Σ_{k=1}^{M} (x_j − μ_X)(x_k − μ_X) P(X_1 = x_j, X_2 = x_k).

Again we isolate the terms with j = k and indicate their absence by placing an asterisk on the summation symbol. Then

(6.13)  Cov(X_1, X_2) = Σ_{j=1}^{M} (x_j − μ_X)² P(X_1 = x_j, X_2 = x_j) + Σ* (x_j − μ_X)(x_k − μ_X) P(X_1 = x_j, X_2 = x_k)
                     = (1/(N(N − 1))) [Σ_{j=1}^{M} (x_j − μ_X)² f_j(f_j − 1) + Σ* (x_j − μ_X)(x_k − μ_X) f_j f_k],

since P(X_1 = x_j, X_2 = x_j) = f_j(f_j − 1)/(N(N − 1)) and, for j ≠ k, P(X_1 = x_j, X_2 = x_k) = f_j f_k/(N(N − 1)).
To evaluate this last sum, we use the following device. Since E(X_1) = μ_X, we know that E(X_1 − μ_X) = 0, i.e.,

Σ_{j=1}^{M} (x_j − μ_X) f_j = 0.

Now square both sides of this equation to find

Σ_{j=1}^{M} (x_j − μ_X)² f_j² + Σ* (x_j − μ_X)(x_k − μ_X) f_j f_k = 0.

The second sum is therefore the negative of the first. We use this in (6.13) and obtain
Cov(X_1, X_2) = (1/(N(N − 1))) [Σ_{j=1}^{M} (x_j − μ_X)² f_j(f_j − 1) − Σ_{j=1}^{M} (x_j − μ_X)² f_j²]
             = −(1/(N(N − 1))) Σ_{j=1}^{M} (x_j − μ_X)² f_j
             = −σ_X²/(N − 1),

the final equality following from (5.18). Now at last we can apply Formula (6.9) to find σ_X̄²:
σ_X̄² = (1/n²) [Σ_{k=1}^{n} Var(X_k) + Σ_{j≠k} Cov(X_j, X_k)]
     = (1/n²) [n σ_X² − n(n − 1) σ_X²/(N − 1)]
     = (σ_X²/n) (N − n)/(N − 1).
With this lengthy calculation, we have completed the proof of the following important theorem.
Theorem 6.2. From a population of N elements with mean μ_X and variance σ_X², a random sample of size n is drawn without replacement. Let X̄ be the sample mean. Then
(6.14)  μ_X̄ = μ_X  and  σ_X̄² = (σ_X²/n) (N − n)/(N − 1).
Before discussing these results we give a numerical example.
Example 6.1. Suppose we have a population of N = 5 people and know the IQ score of each. Table 33 summarizes the available information concerning the population. By using Formulas (5.17)–(5.19),
TABLE 33

IQ Score    Number of People with This IQ Score
  80                        1
 100                        2
 130                        2
the reader can check that for this population the mean IQ score and the variance of the IQ scores are

(6.15)  μ_X = 108  and  σ_X² = 376,
so that the standard deviation of IQ scores in the population is σ_X = √376 ≈ 19.4.
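The arithmetic of Example 6.1, together with Formulas (6.14) for samples of size n = 2, can be verified mechanically. The enumeration below is our check, not part of the original text:

```python
from itertools import permutations
from fractions import Fraction

# Population of Table 33: IQ 80 once, 100 twice, 130 twice.
population = [80, 100, 100, 130, 130]
N = len(population)

mu = Fraction(sum(population), N)                 # 108, as in (6.15)
var = sum((x - mu) ** 2 for x in population) / N  # 376

# Exact distribution of the sample mean for n = 2, without replacement:
# all ordered pairs of distinct people are equally likely.
n = 2
means = [Fraction(population[i] + population[j], n)
         for i, j in permutations(range(N), n)]
mu_bar = sum(means) / len(means)
var_bar = sum((m - mu_bar) ** 2 for m in means) / len(means)

# Theorem 6.2: mu_bar = mu and var_bar = (var/n)(N - n)/(N - 1) = 141.
assert mu_bar == mu == 108
assert var_bar == (var / n) * Fraction(N - n, N - 1) == 141
```
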
Note that we have here a particular example to which Theorem 6.2 applies. The results in (6.16) can be checked against those predicted by the theorem. Several remarks comparing sampling with and without replacement are in order.

(1) In both sampling with and without replacement, μ_X̄ = μ_X: the mean of the sample mean is the population mean.

(2) In sampling with replacement, σ_X̄² = σ_X²/n, so that the variance of the sample mean is the population variance multiplied by the factor 1/n; in sampling without replacement, by (6.14), it is the population variance multiplied by the factor (1/n)(N − n)/(N − 1). Each factor is at most 1.

(3) In both sampling with and without replacement, the variance σ_X̄² decreases as the sample size n increases, for the factors mentioned in (2) are not only less than 1, but also decrease as n increases. Furthermore, in sampling without replacement, if the sample exhausts the population (n = N), then σ_X̄² = 0, as it must, since then the sample mean is certain to equal the population mean.

(4) For a sample of size n > 1, σ_X̄² is smaller when the sample is drawn without replacement than when it is drawn with replacement, for the variances differ by the factor (N − n)/(N − 1), which is less than 1 when n > 1.
(5) Also, since

(N − n)/(N − 1) = (1 − n/N)/(1 − 1/N),

this factor is close to 1 whenever N is very large compared to n, for then n/N and 1/N are close to zero. Thus, if samples of size n are drawn without replacement from a population of N elements, and if the population size N is very large compared to the sample size n, then σ_X̄² is very nearly equal to σ_X²/n, its value when the sample is drawn with replacement.

The covariance of X and Y serves as a measure of the way values of X and Y vary together. If values of X above μ_X tend to occur together with values of Y above μ_Y, and values of X below μ_X with values of Y below μ_Y, then Cov(X, Y) > 0. On the other hand, if values of X are above μ_X whenever values of Y are below μ_Y and vice versa, then Cov(X, Y) < 0. If X and Y are independent, then we know by Theorem 5.4 that Cov(X, Y) = 0.
By a suitable choice of two random variables, we can make their covariance any number we like. For example, if a and b are constants, then

Cov(aX, bY) = E(aX · bY) − E(aX)E(bY) = ab E(XY) − ab E(X)E(Y),

from which it follows that

(6.17)  Cov(aX, bY) = ab Cov(X, Y).

It is now clear that if Cov(X, Y) ≠ 0, then by varying a and b we
can make Cov(aX, bY) positive or negative, as small or as large as we please.
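Equation (6.17) is easy to confirm numerically; the joint distribution below is hypothetical and chosen only for the sake of the check:

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y), used only for this check.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 8)}

def cov(u, v):
    """Covariance of u(X, Y) and v(X, Y) under the joint distribution."""
    E = lambda g: sum(p * g(x, y) for (x, y), p in joint.items())
    return E(lambda x, y: u(x, y) * v(x, y)) - E(u) * E(v)

cov_XY = cov(lambda x, y: x, lambda x, y: y)   # Cov(X, Y) = -1/8 here

# (6.17): Cov(aX, bY) = ab Cov(X, Y) for any constants a and b.
for a, b in [(2, 3), (-5, 7), (Fraction(1, 2), -4)]:
    assert cov(lambda x, y: a * x, lambda x, y: b * y) == a * b * cov_XY
```

By choosing a and b, the covariance is scaled to any value whatever, which is exactly why the next paragraphs pass to a normalized measure.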
It is more convenient to have a measure of relation that cannot vary so widely. We shall prove shortly that the covariance of X* and Y*, where X* and Y* are the standardized random variables corresponding to X and Y (as defined in Theorem 3.4), can vary only between −1 and +1.
Now by (6.17),

Cov(X*, Y*) = Cov((X − μ_X)/σ_X, (Y − μ_Y)/σ_Y) = (1/(σ_X σ_Y)) Cov(X − μ_X, Y − μ_Y) = Cov(X, Y)/(σ_X σ_Y),

this last equality following from (6.6). We are thus led to the following definition.
Definition 6.2. Let X* and Y* be the standardized random variables corresponding to X and Y. The covariance of X* and Y* is called the correlation coefficient of X and Y and is denoted by ρ(X, Y). In symbols,

(6.18)  ρ(X, Y) = Cov(X*, Y*) = Cov(X, Y)/(σ_X σ_Y).
If σ_X > 0 and σ_Y > 0, then the correlation coefficient ρ(X, Y) is zero if and only if Cov(X, Y) = 0. In the exceptional case when one or both of the random variables have standard deviation zero, we know (see Problem 4.10) that X and Y are independent and hence Cov(X, Y) = 0. Thus ρ(X, Y) = 0 and Cov(X, Y) = 0 are equivalent conditions: X and Y are uncorrelated if and only if their covariance is zero.
Before commenting on this definition, let us see how to compute a correlation coefficient.
Example 6.2. Let X and F be random variables with joint probabilities as given in Table 27 on p. 199. In Example 5.1, we found that
μ_X = …, μ_Y = …, and E(XY) = …. Thus Cov(X, Y) = … ≠ 0, and we know that X and Y are correlated. We leave for the reader the verification that E(X²) = … and E(Y²) = …, so that σ_X² = … and σ_Y² = …. We now apply (6.18) to find

ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) = ….
Example 6.3. In Example 5.4, we defined two random variables X and Y and found that they were functionally dependent (Y = X²), but that (5.6) was true; i.e., Cov(X, Y) = 0. We therefore have an example of random variables that are uncorrelated but not independent. We conclude that one must exercise great care in interpreting the covariance or the correlation coefficient as a measure of relationship between values of X and Y. In particular, the fact that the correlation coefficient is zero does not mean that X and Y are unrelated, for we have just seen that ρ(X, Y) = 0 but X and Y are as strongly related as they can be: knowing the value of X, we are certain of the value of Y, since Y = X².
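A further illustration, with a hypothetical distribution rather than the one of Example 5.4: take X symmetric about zero and Y = X²; then Cov(X, Y) = 0 although Y is completely determined by X.

```python
from fractions import Fraction

# Hypothetical X taking the values -1, 0, 1, each with probability 1/3,
# and Y = X^2, a (non-linear) function of X.
xs = [-1, 0, 1]
p = Fraction(1, 3)

E_X = sum(p * x for x in xs)              # 0, by symmetry
E_Y = sum(p * x ** 2 for x in xs)         # 2/3
E_XY = sum(p * x * x ** 2 for x in xs)    # E(X^3) = 0, again by symmetry

cov = E_XY - E_X * E_Y
assert cov == 0                           # X and Y are uncorrelated ...

# ... yet dependent: P(Y = 1 | X = 1) = 1, while P(Y = 1) = 2/3.
assert sum(p for x in xs if x ** 2 == 1) == Fraction(2, 3)
```
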
Although it is merely a rephrasing of Theorem 5.4, we emphasize the point made in the last example by the following statement.
Theorem 6.3. If X and Y are independent random variables, then they are uncorrelated, but not conversely.
We turn now to some properties of the correlation coefficient.
Theorem 6.4. The correlation coefficient of X and Y is a number between −1 and +1 inclusive, i.e.,

(6.19)  −1 ≤ ρ(X, Y) ≤ 1.
Proof. Consider the variance of X* + Y*, where X* and Y* are the standardized random variables corresponding to X and Y, respectively. By (6.4),

Var(X* + Y*) = Var(X*) + Var(Y*) + 2 Cov(X*, Y*).

But Var(X*) = Var(Y*) = 1 and Cov(X*, Y*) = ρ(X, Y) by definition. Hence

(6.20)  Var(X* + Y*) = 2[1 + ρ(X, Y)].
Since Var(X* + Y*) ≥ 0, it follows that −1 ≤ ρ(X, Y). Similarly, the reader can show that
(6.21)  Var(X* − Y*) = 2[1 − ρ(X, Y)],

from which we conclude, again since the variance is nonnegative, that ρ(X, Y) ≤ 1. Thus the theorem is proved. (For another proof see Problem 6.16.)
It is important to understand the meaning of the extreme values ρ(X, Y) = ±1. Now the strongest relation exists between X and Y when the value of Y is uniquely determined as soon as the value of X is known. In such a case, Y is some function of X, say Y = g(X). This situation exists whenever each row of the joint probability table of X and Y has all entries but one equal to zero. As we saw in Example 6.3, Y can be a function of X and yet ρ(X, Y) = 0. In that example, Y = X², a quadratic function of X. But if Y is a linear function of X, then we can prove that ρ(X, Y) must have one of its extreme values. And we shall also be able to prove that, conversely, if ρ(X, Y) = ±1, then X and Y are linearly related. Before stating and proving these results, let us look at an example.
Example 6.4. Suppose the random variables X and Y have the joint probabilities given in Table 35.
TABLE 35

            y = 1    y = 3    y = 5    P(X = x)
x = 1         0        0       .2        .2
x = 2         0       .5        0        .5
x = 3        .3        0        0        .3
P(Y = y)     .3       .5       .2         1
Since each row contains exactly one nonzero entry, we know that Y is some function of X. We also observe that all nonzero probabilities occur on the diagonal along which the values of Y increase as the values of X decrease. On the probability graph in Figure 26 we see that the points (x, y) at which positive probabilities are indicated all lie on the dotted straight line with negative slope. In fact, the joint probability table was constructed assuming Y is the linear function of X given by Y = −2X + 7. But let us calculate the correlation
Figure 26
coefficient of X and Y without using this fact. It is easy to find directly from Table 35 that

σ_X = .7,  σ_Y = 1.4,  Cov(X, Y) = −.98.

Hence

ρ(X, Y) = −.98/((.7)(1.4)) = −1.
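The figures quoted for Example 6.4 can be reproduced directly from Table 35:

```python
# Joint probabilities of Table 35; entries not listed are zero.
table = {(1, 5): 0.2, (2, 3): 0.5, (3, 1): 0.3}

E = lambda g: sum(p * g(x, y) for (x, y), p in table.items())
mu_X, mu_Y = E(lambda x, y: x), E(lambda x, y: y)
sigma_X = E(lambda x, y: (x - mu_X) ** 2) ** 0.5   # 0.7
sigma_Y = E(lambda x, y: (y - mu_Y) ** 2) ** 0.5   # 1.4
cov = E(lambda x, y: x * y) - mu_X * mu_Y          # -0.98

rho = cov / (sigma_X * sigma_Y)
assert abs(rho - (-1)) < 1e-9   # rho(X, Y) = -1, the extreme value
```
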
Theorem 6.5. Let X be a random variable defined on a sample space S and suppose σ_X > 0. Let Y be a linear function of X; i.e., Y = mX + b, where m and b are numbers and m ≠ 0. Then ρ(X, Y) = +1 if m > 0 and ρ(X, Y) = −1 if m < 0.
Proof. Since Y = mX + b, we have μ_Y = mμ_X + b, and hence Y − μ_Y = m(X − μ_X), from which it follows that

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[m(X − μ_X)²] = m σ_X².

Also we know from (3.12) that σ_Y = |m| σ_X. Thus

ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) = m σ_X²/(σ_X · |m| σ_X) = m/|m|.

Since m/|m| equals 1 if m > 0 and equals −1 if m < 0, the proof is complete.
Before we can prove the converse of Theorem 6.5, we must be careful to isolate a minor difficulty. We are going to want to prove
that if ρ(X, Y) = ±1, then Y is a linear function of X; i.e., Y = mX + b for some numbers m and b. What this means is that Y(o_i) = mX(o_i) + b for each o_i ∈ S. But this is more than we can rightfully expect to prove. For suppose some simple event, say {o_1}, is assigned probability 0. Then we may as well forget about the element o_1, for X(o_1) is not one of the possible values of X unless it is the value of X for some other element o_i ∈ S for which P({o_i}) > 0. In any case, the element o_1 could have been deleted from our sample space, since it plays no role in the construction of the joint probability table of X and Y. Thus, changing the value Y(o_1) cannot change the correlation coefficient of X and Y. This means that our best hope is to be able to prove that Y(o_i) = mX(o_i) + b for all o_i ∈ S except possibly for elements of S that together make up an event with probability 0. Let us introduce the following handy terminology for this state of affairs: We shall say that two random variables are equal with probability 1 whenever their values are equal for all elements of the sample space S except possibly for elements that together make up an event with probability 0.
Now we can state and prove the converse of Theorem 6.5.
Theorem 6.6. Let X and Y be random variables defined on a sample space S, and suppose ρ(X, Y) = ±1. Then Y is a linear function of X with probability 1. In fact, numbers m > 0, b, and c exist so that Y = mX + b if ρ = +1 and Y = −mX + c if ρ = −1, each with probability 1.
Proof. Suppose ρ(X, Y) = +1 and proceed from Equation (6.21). We see that Var(X* − Y*) = 0, and it follows that X* − Y* has one value that occurs with probability 1. This value must be the mean of X* − Y*, which is zero, since the mean of each standardized random variable is zero. Thus X* = Y* with probability 1, or

(X − μ_X)/σ_X = (Y − μ_Y)/σ_Y  with probability 1.

Simplification yields the desired result Y = mX + b with

m = σ_Y/σ_X > 0  and  b = μ_Y − (σ_Y/σ_X) μ_X.
The reader can complete the proof if ρ(X, Y) = −1 by starting with (6.20) and proceeding as above.
As these theorems hint, the correlation coefficient is a meaningful
measure of relationship between values of X and Y only when this relationship is a linear one. For a fuller understanding of this fact, one must study the so-called regression functions of each random variable on the other. These functions are defined, and some of their properties most often used in statistics are stated, in Problem 6.20.
PROBLEMS
6.1. Prove Theorem 6.1.
6.2. From the population of N = 10 people whose incomes are given in the frequency table of Problem 5.15, a random sample of size n = 2 is drawn without replacement. If X̄ is the sample mean income, then determine the probability function of X̄, calculate μ_X̄ and σ_X̄², and thus check Formulas (6.14). Compare the results with those of Problem 5.15, where the sample is drawn with replacement.
6.3. From the population of N = 5 people whose IQ scores are given in Table 33, a random sample of size n is drawn without replacement. Let X̄ be the sample mean IQ score. Determine the probability function of X̄, and calculate μ_X̄ and σ_X̄², for (b) n = 2, (c) n = 3, (d) n = 9, (e) n = 10. In each case check Formulas (6.14).
6.5. In a population of 10,000 families, annual income has mean $5000 and standard deviation $750. According to Chebyshev's inequality, within what interval will the sample mean X̄ fall with probability at least … if a random sample of size 100 is drawn without replacement from the population?
6.6. The following parts refer to the text discussion of sampling without replacement from a finite population.
(a) Prove (6.12) for k = 3, 4, …, n and thus complete the proof that X_1, X_2, …, X_n are identically distributed.
(b) Determine the joint probability function of X_1 and X_2, and also of X_1 and X_3. What is the joint probability function of X_j and X_k for any j ≠ k?
(c) Show that ρ(X_1, X_2) = −1/(N − 1). Is this answer reasonable?
6.7. For sampling without replacement from a population of size N, you want to determine a sample size n_1 such that the standard deviation of the sample mean is half as big as it is for samples of size n. Show that

n_1 = 4nN/(N + 3n),

provided the right-hand side is an integer. (Note that n_1 and n are equal when n = N, and explain why this is reasonable.)
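Assuming the required relation is n_1 = 4nN/(N + 3n), which follows from (6.14) and agrees with the check that n_1 = n when n = N, a quick numerical test can be made; the values of N and n below are arbitrary, chosen so that n_1 comes out an integer:

```python
def var_of_mean(var, n, N):
    """Variance of the sample mean without replacement, Formula (6.14)."""
    return (var / n) * (N - n) / (N - 1)

# Candidate relation (a reconstruction, stated above): n1 = 4nN/(N + 3n).
N, n = 35, 5
n1 = 4 * n * N // (N + 3 * n)       # = 14, an integer for these values
assert n1 == 14

ratio = (var_of_mean(1.0, n1, N) / var_of_mean(1.0, n, N)) ** 0.5
assert abs(ratio - 0.5) < 1e-12     # the standard deviation is halved

assert 4 * N * N // (N + 3 * N) == N   # and n1 = n when n = N
```
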
6.8. Refer to the card guessing experiment described in Example 5.5, but now suppose that the subject chooses at random a permutation of the numbers 1, 2, …, n and then calls his guesses in the order specified by the selected permutation. As before, one way of doing this would be for the subject to have a duplicate deck. But now he makes his guesses by selecting a random sample of n cards, one by one without replacement from his deck.
Let X denote the random variable whose value is the number of correct guesses made by the subject, and thus write

X = X_1 + X_2 + … + X_n,
where X_k has the value 0 or 1 according as the k-th guess (trial) is wrong or right.
(a) Show that X_1, X_2, …, X_n are not independent.
(b) Prove that X_1, X_2, …, X_n are identically distributed, the probability of a correct guess at the k-th trial being 1/n for k = 1, 2, …, n.
(c) Show that E(X) = 1, as in Example 5.5.
(d) Prove that Cov(X_j, X_k) = 1/(n²(n − 1)) for all j ≠ k.
(e) Show that Var(X) = 1, a somewhat higher variance than in Example 5.5.
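For a small n, the claims of Problem 6.8 can be confirmed by enumerating all n! equally likely permutations; this is a check, not a proof:

```python
from itertools import permutations
from fractions import Fraction

n = 5
perms = list(permutations(range(n)))   # the n! equally likely guess orders
M = len(perms)

def correct(perm):
    """Number of correct guesses: X = X_1 + ... + X_n for this order."""
    return sum(1 for k in range(n) if perm[k] == k)

E_X = Fraction(sum(correct(p) for p in perms), M)
E_X2 = Fraction(sum(correct(p) ** 2 for p in perms), M)
assert E_X == 1                        # part (c)
assert E_X2 - E_X ** 2 == 1            # part (e): Var(X) = 1

# Part (d): Cov(X_j, X_k) = 1/(n^2 (n - 1)) for j != k; check j = 1, k = 2.
p_both = Fraction(sum(1 for p in perms if p[0] == 0 and p[1] == 1), M)
assert p_both - Fraction(1, n) ** 2 == Fraction(1, n ** 2 * (n - 1))
```
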
6.9. Show that all other things being equal, the greater the correlation coefficient of two random variables, the greater the variance of their sum and the less the variance of their difference.
6.10. The average covariance of X_1, X_2, …, X_n is denoted by Cov_Av(X_1, …, X_n) and defined as the sum of all Cov(X_j, X_k) with 1 ≤ j < k ≤ n, divided by C(n, 2) = n(n − 1)/2, the number of such covariances.
(a) Show that

Var(X̄) = (1/n²) Σ_{j=1}^{n} Var(X_j) + ((n − 1)/n) Cov_Av(X_1, …, X_n).

(b) Suppose a number K (not depending on n) exists such that Var(X_j) ≤ K for all j = 1, 2, …, n. Suppose further that, as n increases without bound, Cov_Av(X_1, …, X_n) approaches some limiting value, say C. Show that then the variance of X̄ also approaches C.
6.11. Let X and Y be the characteristic random variables (see the definition in Problem 1.14) of events A and B, respectively. Find ρ(X, Y) and determine whether X and Y are independent if

(a) P(A) = …, P(A|B) = …, P(B|A) = …;
(b) P(A) = …, P(A|B) = …, P(B|A) = ….
6.12. Let X be the larger of the two numbers and Y be the sum of the numbers showing when two fair dice are rolled. Find ρ(X, Y). (Cf. Problem 4.4.)
6.13. Let X be the number of empty cells and Y the number of objects in the first cell when three indistinguishable objects are randomly distributed into three numbered cells. Find ρ(X, Y). (Cf. Problem 4.3.)
6.14. A fair die is rolled two independent times. Let X and Y denote the number of points showing on the first roll and the second roll, respectively. Define U = X + Y and V = X − Y. Show that U and V are dependent random variables, but that ρ(U, V) = 0.
6.15. A fair coin is tossed four independent times. Let X be the number of heads obtained on the first two tosses and Y be the total number of heads. Find the joint probability table of X and Y and compute ρ(X, Y).
6.16. (An alternate proof of Theorem 6.4.) Let U = X − μ_X and V = Y − μ_Y.

(a) Note that y = E([xU + V]²) ≥ 0 for all real x.
(b) Expand and obtain

y = σ_X² x² + 2 Cov(X, Y) x + σ_Y².
(c) Interpret the inequality in (b) as showing that a certain parabola in the xy-plane lies entirely above the x-axis or has exactly one point of contact with the x-axis.
(d) Conclude that the quadratic equation y = 0 has either no real roots or two equal real roots. Hence the discriminant of the quadratic equation must be negative or zero. Thus find −1 ≤ ρ(X, Y) ≤ 1.
6.17. Suppose Y is a linear function of X. Let ρ_m be the value of ρ(X, Y) when Y = mX + b. Draw a graph showing how ρ_m depends upon the slope m. (Plot m along the horizontal axis.)
6.18. Let X and Y be random variables and suppose a, b, c, d are any numbers provided only that a ≠ 0, c ≠ 0. Show that

|ρ(aX + b, cY + d)| = |ρ(X, Y)|.

This result shows that the absolute value of the correlation coefficient is not altered by a change in location of the origin or a change in scale on either the x or y axis. This is a property expected of any reasonable measure of relationship. For example, the correlation between weight and height will have the same absolute value whether we measure height in inches, in feet, or in tenths of an inch above or below 68 inches. If a and c have opposite signs, then we see that ρ(aX + b, cY + d) = −ρ(X, Y). Is this change of sign reasonable?
6.19. Suppose X and Y each have only two possible values. Prove that, if X and Y are uncorrelated, then they are also independent.
6.20. The conditional mean of Y given X = x_j is denoted by E(Y | X = x_j) and defined for j = 1, 2, …, M by the equation

E(Y | X = x_j) = Σ_{k=1}^{N} y_k g(y_k | x_j).
(Note: Conditional probability functions are defined in Definition 4.3, p. 206.)
(a) Show that if X and Y are independent, then

E(Y | X = x_j) = E(Y)  for j = 1, 2, …, M.
(b) For any random variables X and Y, show that

E(Y) = Σ_{j=1}^{M} E(Y | X = x_j) f(x_j).
(c) The conditional mean of X given Y = y_k is denoted by E(X | Y = y_k) and defined for k = 1, 2, …, N by the equation

E(X | Y = y_k) = Σ_{j=1}^{M} x_j f(x_j | y_k).
State and prove the results analogous to those in parts (a) and (b), but now referring to the conditional mean of X given Y = y_k.
(d) The regression function of Y on X is defined as the function whose domain is the set of possible values of X and whose value at x_j is the conditional mean E(Y | X = x_j) for j = 1, 2, …, M. Similarly, the regression function of X on Y has the set of possible values
of Y as domain, and its value at y_k is the conditional mean E(X | Y = y_k) for k = 1, 2, …, N. The regression graph of Y on X is a set of M points in the xy-plane, the point with x-coordinate x_j having y-coordinate E(Y | X = x_j). Similarly, the regression graph of X on Y is the set of N points (E(X | Y = y_k), y_k) for k = 1, 2, …, N.
A regression function is said to be linear if all the points of the corresponding regression graph lie on a straight line. Otherwise, a regression function is said to be nonlinear.

(i) For X and Y with joint probabilities given in Table 27 (p. 199), show that both regression functions are linear.
(ii) For X and Y with joint probabilities given in Table 31 (p. 218), show that the regression function of Y on X is nonlinear, but the regression function of X on Y is linear.
(iii) Construct a joint probability table so that both regression functions are nonlinear.
(e) Suppose the regression function of Y on X is linear; i.e., constants m and b exist such that for j = 1, 2, …, M,

(*)  E(Y | X = x_j) = Σ_{k=1}^{N} y_k g(y_k | x_j) = m x_j + b.

To evaluate m and b, proceed as follows. First multiply (*) by f(x_j) and add the resulting equations for j = 1, 2, …, M. Obtain μ_Y = mμ_X + b. Then multiply (*) by x_j f(x_j) and add all M equations as before. Obtain E(XY) = mE(X²) + bμ_X. Solve these simultaneous linear equations and thus determine m and b. Finally, show that the linear regression function of Y on X can be written in the following form:

(**)  E(Y | X = x_j) = μ_Y + ρ(X, Y) (σ_Y/σ_X)(x_j − μ_X).
Conclude that the points of the graph of the linear regression function lie on a straight line passing through the point (μ_X, μ_Y) and that this line is horizontal if and only if X and Y are uncorrelated.
(f) The experiment is performed and we are given the incomplete information that the value of X is x_j. We want to estimate the value of Y. Suppose we use a "least-squares" criterion; i.e., we seek an estimate, say c_j, such that the mean squared deviation of values of Y from the estimated value c_j will be as small as possible. In symbols, we seek the number c_j which minimizes

E[(Y − c_j)² | X = x_j] = Σ_{k=1}^{N} (y_k − c_j)² g(y_k | x_j).
Show that this least-squares estimate is the conditional mean of Y given X = x_j; i.e., show that c_j = E(Y | X = x_j). [Hint: The proof
follows immediately from the property of the mean stated in Problem 3.8.]
(g) For any pair of values (x_j, y_k), the error made by using the estimate E(Y | X = x_j) in place of y_k is the difference y_k − E(Y | X = x_j). The mean squared error, denoted by σ_e², is therefore the sum

σ_e² = Σ_{all j,k} [y_k − E(Y | X = x_j)]² P(X = x_j, Y = y_k).

Show that if the estimate is given by the linear regression function (**), then

σ_e² = σ_Y² [1 − ρ²(X, Y)],

and thus conclude that this mean squared error decreases and approaches zero as ρ(X, Y) approaches either +1 or −1.

(h) State and prove the results analogous to those in parts (e)–(g), but now supposing the regression function of X on Y is linear and we are given that the value of Y is y_k.
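Parts (e)–(g) can be illustrated with any joint table whose regression of Y on X is linear. The 2 × 2 table below is hypothetical; with only two x-values the regression is automatically linear, so (**) and the mean-squared-error relation σ_e² = σ_Y²(1 − ρ²) can be checked directly:

```python
# Hypothetical joint table with X, Y each taking the values 0, 1.
table = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

E = lambda g: sum(p * g(x, y) for (x, y), p in table.items())
mu_X, mu_Y = E(lambda x, y: x), E(lambda x, y: y)
var_X = E(lambda x, y: (x - mu_X) ** 2)
var_Y = E(lambda x, y: (y - mu_Y) ** 2)
rho = (E(lambda x, y: x * y) - mu_X * mu_Y) / (var_X * var_Y) ** 0.5

def cond_mean_Y(x0):
    """E(Y | X = x0), computed from the conditional probabilities."""
    px = sum(p for (x, y), p in table.items() if x == x0)
    return sum(p * y for (x, y), p in table.items() if x == x0) / px

# (**): E(Y | X = x) = mu_Y + rho (sigma_Y / sigma_X)(x - mu_X)
for x0 in (0, 1):
    predicted = mu_Y + rho * (var_Y / var_X) ** 0.5 * (x0 - mu_X)
    assert abs(cond_mean_Y(x0) - predicted) < 1e-12

# (g): the mean squared error equals sigma_Y^2 (1 - rho^2).
mse = E(lambda x, y: (y - cond_mean_Y(x)) ** 2)
assert abs(mse - var_Y * (1 - rho ** 2)) < 1e-12
```
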
SUPPLEMENTARY READING
In addition to the references listed at the end of Chapter 2, the following books are among the many sources of material on random variables and related topics. As with most of the previously mentioned references, only parts of these books can be read without some knowledge of the differential and integral calculus.
1. Adams, J. K., Basic Statistical Concepts, McGraw-Hill Book Company, Inc., 1955.
2. Brunk, H. D., An Introduction to Mathematical Statistics, Ginn and Company, 1960.
3. David, F. N., Probability Theory for Statistical Methods, Cambridge University Press, 1949.
4. Fraser, D. A. S., Statistics: An Introduction, John Wiley and Sons, Inc., 1958.
5. Lindgren, B. W. and G. W. McElrath, Introduction to Probability and Statistics, The Macmillan Company, 1959.
6. Mood, A. M., Introduction to the Theory of Statistics, McGraw-Hill Book Company, Inc., 1950.
7. Wilks, S. S., Elementary Statistical Analysis, Princeton University Press, 1948.
Chapter 5
BINOMIAL DISTRIBUTION AND SOME APPLICATIONS
1. Bernoulli trials and the binomial distribution
Certain kinds of experiments and associated random variables occur time and again in the theory of probability and in its applications. They are therefore made the object of special study in which their properties are explored, values of frequently needed probabilities are tabulated, and so on. In this section, we describe a number of such experiments and random variables, paying special attention to the so-called binomial probability function. In the final sections of this chapter, we discuss two important problems of statistics in which this function plays a central role.
As we have seen in numerous examples throughout this book, many problems involve experiments made up of a number, say n, of individual trials. Each trial is itself really an arbitrary experiment, and is therefore defined in the mathematical theory by some sample space and assignment of probabilities to its simple events. The trials can be independent or dependent, and the simple events of the sample space for the n-trial experiment are assigned probabilities accordingly.
Although each trial may have many possible outcomes, we are often interested only in whether a certain result occurs or not. For
example, a machine turns out parts which are classified defective or good; a card is selected from a standard deck and it is an ace or not an ace; two dice are rolled and the sum of the numbers showing is seven or is different from seven; a student selected from the senior class has a part-time job or has not; etc.
In order to have a convenient standard terminology for discussing all such trials, we shall call one of the two possible results of each trial a success and the other a failure. Which result is called a success is, of course, completely arbitrary — whether one calls a defective part a success or a failure, or a student with a parttime job a success or a failure, is a matter of taste as far as the theory goes. We must however make sure that we are consistent in our language in any one problem.
If when a trial is performed we are interested solely in whether a success or a failure results, then it is sensible to make the sample space defining the trial reflect this fact by containing just two elements, say S for success and F for failure. If the simple event {S} is given probability p, then an acceptable assignment of probabilities is determined for every choice of the number p, provided only that 0 ≤ p ≤ 1. Writing q = 1 − p for convenience, we have

(1.1)  P({S}) = p,  P({F}) = q,  where p + q = 1.
As an example, consider drawing a card at random from a standard deck. Ordinarily we define as sample space a set containing 52 elements (one for each card) and assign probability 1/52 to each simple event. But if we are interested only in whether or not the card is an ace, and we call drawing an ace a success and drawing any other face value a failure, then we prefer to use {S, F} as sample space, with p = 4/52 = 1/13 and q = 48/52 = 12/13 as probability of success and failure, respectively.
Definition 1.1. Trials are called Bernoulli trials (after James Bernoulli, 1654–1705) if and only if they meet the following conditions:
(1) Each trial is defined by the sample space {S, F}; i.e., we consider that each trial has only two outcomes: either S (success) or F (failure).
(2) The same assignment of probabilities, as given in (1.1), is made to the simple events of each trial; i.e., the probability of a success is the same on each trial and is denoted by p.
(3) The trials are independent.
A sequence of any (not necessarily Bernoulli) trials can be thought of as a process in which outcomes of the individual trials are produced as the trials are performed. A process of this kind is called a stochastic (= probability) or random process, since the particular sequence of outcomes obtained depends upon chance. A random process made up of Bernoulli trials is called a Bernoulli process.
Example 1.1. Tossing a coin 100 independent times is interpreted to mean 100 Bernoulli trials in which each trial (toss of the coin) results in success (say, heads) or failure (tails), and the probability p of a head is the same for all 100 tosses. If the coin is fair, then p = 1/2 and q = 1/2; if the coin is biased, then p ≠ 1/2.
Example 1.2. Consider a manufacturing process in which a metal part is produced by an automatic machine. Suppose each part in a production run of 500 parts can be classified upon inspection as defective or good. We can think of the production of a part as a single trial which results in success (say, a defective part) or in failure (a good part). If we believe that the machine operation is just as likely to produce a defective on one trial as on any other, and if we also believe that the occurrence of a defective on any trial is made neither more nor less likely by the particular results obtained on the preceding trials, then it is reasonable to assume that the production run is a Bernoulli process with 500 trials. (The probability p of a defective on each trial is called the average fraction defective of the process.)
Of course, the Bernoulli process is a mathematical idealization of the actual production process. For example, if the machine setting wears down as the run proceeds, then the tendency of the machine to produce defectives will increase as time goes on, and the probability p is therefore not the same for all 500 trials. It is clear that a real manufacturing process cannot be exactly represented by a Bernoulli process. Nevertheless, it is often closely approximated by such a process, and useful results are obtained by means of this idealization.
Example 1.3. The sample space for an experiment made up of three Bernoulli trials with probability p for success on each trial is the Cartesian product set {S, F} × {S, F} × {S, F} containing 2³ = 8 three-tuples as elements. These three-tuples and the probabilities of the corresponding simple events, obtained by use of the product rule of Chapter 2, Section 9 (since the trials are independent), are listed
in the first two columns of Table 36. The number of successes obtained in this experiment, denoted by S_3, is a random variable whose possible values and probabilities appear in the remaining columns of Table 36.
TABLE 36

Outcome of    Probability of
Experiment    Corresponding Simple Event
   SSS                p³
   SSF                p²q
   SFS                p²q
   FSS                p²q
   SFF                pq²
   FSF                pq²
   FFS                pq²
   FFF                q³

Possible Value k of S_3    P(S_3 = k)
          0                    q³
          1                   3pq²
          2                   3p²q
          3                    p³
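Table 36 can be generated mechanically by enumerating the 2³ outcomes and applying the product rule; the sketch below uses an arbitrary numeric value of p:

```python
from itertools import product
from math import comb

p = 0.3          # illustrative success probability
q = 1 - p

# Enumerate the 2^3 = 8 three-tuples and the product-rule probability
# of each; accumulate them by the number of S's obtained.
dist = {k: 0.0 for k in range(4)}
for outcome in product("SF", repeat=3):
    prob = 1.0
    for trial in outcome:
        prob *= p if trial == "S" else q
    dist[outcome.count("S")] += prob

# Agreement with the column P(S_3 = k): q^3, 3pq^2, 3p^2 q, p^3.
for k in range(4):
    assert abs(dist[k] - comb(3, k) * p ** k * q ** (3 - k)) < 1e-12
```
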
256 BINOMIAL DISTRIBUTION / Chap. 5
conclude that there are (n choose k) n-tuples containing k S's and n − k F's, and that the corresponding simple events all have the same probability, namely p^k q^(n−k).
As in Example 1.3, we are interested in determining the probability function of the random variable whose value is the total number of successes obtained in the n-trial experiment. This random variable is denoted by Sn and clearly has possible values 0, 1, ···, n. Now Sn = k, where k is any one of these possible values, is the event for which exactly k S's (and therefore n − k F's) occur. This event is the union of the (n choose k) simple events determined by n-tuples with k S's and n − k F's. As we observed, each such simple event has probability p^k q^(n−k). Hence

(1.2)    P(Sn = k) = (n choose k) p^k q^(n−k),    k = 0, 1, ···, n.
We have therefore proved the following result.
Theorem 1.1. Suppose an experiment consists of n Bernoulli trials with probability p for success on each trial. If Sn is the random variable whose value for any outcome of the experiment is the total number of successes obtained, then the probability function of Sn is given by (1.2).
For given values of n and p, the probability function defined by (1.2) is called the binomial probability function or the binomial distribution* with parameters n and p. Formula (1.2) thus defines not just one binomial distribution, but a whole family of binomial distributions, one for every possible pair of values for n and p. To show the dependence of the probabilities on the parameters, we shall write b(k|n, p) for the probability in (1.2). Thus
(1.3)    b(k|n, p) = P(Sn = k) = (n choose k) p^k q^(n−k)

is the probability of exactly k successes, given the parameters n and p
* For the random variables considered in this volume the terms "probability distribution" and "probability function" are synonymous. We have avoided introducing the term probability distribution earlier due to possible confusion with the (cumulative) distribution function. But the reader should become familiar with the standard terminology, and we use it from now on. Note that one customarily shortens "binomial probability distribution" to "binomial distribution."
of the binomial distribution; i.e., b(k|n, p) is the probability of exactly k successes in n Bernoulli trials with probability p for success on each trial. The random variable Sn is said to be binomially distributed with parameters n and p when Sn has the probability distribution defined by (1.3).
The name binomial distribution arises from the fact that the probabilities b(k|n, p) for k = 0, 1, ···, n are the terms in the binomial expansion of (q + p)^n. (See Chapter 3, Section 2, for the binomial theorem and related identities involving binomial coefficients.) It follows, since p + q = 1, that
(1.4)    Σ_{k=0}^{n} b(k|n, p) = (q + p)^n = 1,
as required for a probability function.
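Formulas (1.2)–(1.4) are easy to check numerically. The following sketch (an editorial illustration, not part of the text; the function name follows the book's b(k|n, p)) uses Python's standard-library `math.comb`:

```python
# Sketch of the binomial probability function b(k|n, p) of (1.3).
from math import comb

def b(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Formula (1.4): the probabilities sum to (q + p)^n = 1.
total = sum(b(k, 10, 0.3) for k in range(11))
```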
Example 1.4. If a fair coin is tossed six times, the probability of getting exactly five heads is

b(5|6, 1/2) = (6 choose 5)(1/2)^5 (1/2)^1 = 6/64 = .09375.

The probability of at least five heads is obtained by adding the probability of five heads and the probability of six heads. Since

b(6|6, 1/2) = (1/2)^6 = 1/64 = .015625,

it follows that the probability of at least five heads is

P(S6 ≥ 5) = b(5|6, 1/2) + b(6|6, 1/2) = .109375,
where we write S6 for the random variable denoting the total number of heads (successes) among the six tosses.
Example 1.5. Five percent of the metal parts produced by a machine are defective; the other 95 percent are good. How many parts must be produced in order for the probability of at least one defective to be 1/2 or more? We assume that the production of parts is a Bernoulli process for which each trial (producing one part) results in a success (defective part) or failure (good part). The probability p for success on any trial is given as p = .05. We seek the smallest integer n such that P(Sn ≥ 1) ≥ 1/2. Now

P(Sn ≥ 1) = 1 − P(Sn = 0) = 1 − b(0|n, .05)
          = 1 − (n choose 0)(.05)^0 (.95)^n = 1 − (.95)^n,
so that we want the smallest integer n for which 1 − (.95)^n ≥ 1/2, or (.95)^n ≤ 1/2. Using logarithms (Table 22, p. 141) we find n log (.95) ≤ −log 2, from which n ≥ 13.5 approximately. Hence n = 14 is the smallest lot size that can be used in order to have an even chance or better of finding at least one defective part in the lot.
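The search for the smallest n in Example 1.5 can be replayed by direct trial (a sketch of ours, not the text's):

```python
# Example 1.5 by direct search: smallest n with 1 - (.95)^n >= 1/2.
n = 1
while 1 - 0.95**n < 0.5:
    n += 1
# n is now the smallest lot size giving at least an even chance
# of finding one or more defectives
```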
For many applications, it is necessary to compute the probability not of exactly r successes, but of at least r or at most r successes. Since such cumulative probabilities are obtained by computing all the included individual probabilities and adding, this task soon becomes laborious. For example, to compute the probability of at least six successes in ten Bernoulli trials with p = .3, we must compute b(k|10, .3) for k = 6, 7, 8, 9, 10 and then add these five probabilities. Fortunately, extensive tables are available to lighten the task of such computations.*
A small table of binomial probabilities is included here (Table 37) for our use. Wherever possible, examples and problems from now on will be formulated with numerical values for the parameters n and p that will allow the table to be used to find required probabilities. Note that we have tabulated P(Sn ≥ r), the probability of r or more (at least r) successes, for n = 1, 2, ···, 10, 20 and for p = .01, .05, .10, .20, .30, .40, .50. For each pair of values for n and p and each possible value of r we read
P(Sn ≥ r) = b(r|n, p) + b(r + 1|n, p) + ··· + b(n|n, p)
directly from the table. (We do not include a row for r = 0, since P(Sn ≥ 0) = 1 for all n and p.) We illustrate the use of this table in the following examples.
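Entries of Table 37 can be reproduced by summing the individual probabilities (a sketch; the helper name is ours):

```python
# Cumulative binomial probability P(Sn >= r), as tabulated in Table 37.
from math import comb

def p_at_least(r, n, p):
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(r, n + 1))
```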
Example 1.6. In Example 1.4, we computed P(S6 ≥ 5) for p = 1/2 and found the answer .109375. In our table, for n = 6, r = 5, p = .50, we read .109, which agrees to three decimals with the exact answer. To find P(S6 = 5) = b(5|6, 1/2) we note that

P(S6 = 5) = P(S6 ≥ 5) − P(S6 ≥ 6),
* See Tables of the Cumulative Binomial Probability Distribution, Annals of the Computation Laboratory of Harvard University, vol. XXXV, Harvard University Press, 1955; Tables of the Binomial Probability Distribution, National Bureau of Standards, Applied Mathematics Series, vol. 6, 1950; H. C. Romig, 50–100 Binomial Tables, John Wiley and Sons, Inc., 1953.
TABLE 37. Cumulative Binomial Probabilities
The entry is P(Sn ≥ r) = Σ_{k=r}^{n} b(k|n, p). Missing entries are less than .0005.

n  r  p = .01  p = .05  p = .10  p = .20  p = .30  p = .40  p = .50
1 1 .010 .050 .100 .200 .300 .400 .500
2 1 .020 .098 .190 .360 .510 .640 .750
2 .002 .010 .040 .090 .160 .250
3 1 .030 .143 .271 .488 .657 .784 .875
2 ,007 .028 .104 .216 .352 .500
3 .001 .008 .027 .064 .125
4 1 .039 .185 .344 .590 .760 .870 .938
2 .001 .014 .052 .181 .348 .525 .688
3 .004 .027 .084 .179 .312
4 .002 .008 .026 .062
5 1 .049 .226 .410 .672 .832 .922 .969
2 .001 .023 .081 .263 .472 .663 .812
3 .001 .009 .058 .163 .317 .500
4 .007 .031 .087 .188
5 .002 .010 .031
6 1 .059 .265 .469 .738 .882 .953 .984
2 .001 .033 .114 .345 .580 .767 .891
3 .002 .016 .099 .256 .456 .656
4 .001 .017 .070 .179 .344
5 .002 .011 .041 .109
6 .001 .004 .016
7 1 .068 .302 .522 .790 .918 .972 .992
2 .002 .044 .150 .423 .671 .841 .938
3 .004 .026 .148 .353 .580 .773
4 .003 .033 .126 .290 .500
5 .005 .029 .096 .227
6 .004 .019 .062
7 .002 .008
8 1 .077 .337 .570 .832 .942 .983 .996
2 .003 .057 .187 .497 .745 .894 .965
3 .006 .038 .203 .448 .685 .855
4 .005 .056 .194 .406 .637
5 .010 .058 .174 .363
6 .001 .011 .050 .145
7 .001 .009 .035
8 .001 .004
TABLE 37. Cumulative Binomial Probabilities (cont.)
The entry is P(Sn ≥ r) = Σ_{k=r}^{n} b(k|n, p). Missing entries are less than .0005.

n  r  p = .01  p = .05  p = .10  p = .20  p = .30  p = .40  p = .50
9 1 .086 .370 .613 .866 .960 .990 .998
2 .003 .071 .225 .564 .804 .929 .980
3 .008 .053 .262 .537 .768 .910
4 .001 .008 .086 .270 .517 .746
5 .001 .020 .099 .267 .500
6 .003 .025 .099 .254
7 .004 .025 .090
8 .004 .020
9 .002
10 1 .096 .401 .651 .893 .972 .994 .999
2 .004 .086 .264 .624 .851 .954 .989
3 .012 .070 .322 .617 .833 .945
4 .001 .013 .121 .350 .618 .828
5 .002 .033 .150 .367 .623
6 .006 .047 .166 .377
7 .001 .011 .055 .172
8 .002 .012 .055
9 .002 .011
10 .001
20 1 .182 .642 .878 .988 .999 1.000 1.000
2 .017 .264 .608 .931 .992 .999 1.000
3 .001 .075 .323 .794 .965 .996 1.000
4 .016 .133 .589 .893 .984 .999
5 .003 .043 .370 .762 .949 .994
6 .011 .196 .584 .874 .979
7 .002 .087 .392 .750 .942
8 .032 .228 .584 .868
9 .010 .113 .404 .748
10 .003 .048 .245 .588
11 .001 .017 .128 .412
12 .005 .057 .252
13 .001 .021 .132
14 .006 .058
15 .002 .021
16 .006
17 .001
18
19
20
since the event (S6 ≥ 5) is the union of the mutually exclusive events (S6 = 5) and (S6 ≥ 6). These cumulative probabilities are read directly from the table for n = 6, r = 5 and n = 6, r = 6, using the column headed p = .50. We find

P(S6 = 5) = .109 − .016 = .093,

as compared to the exact answer .09375 computed in Example 1.4. The exact answer when rounded to three decimals is .094 rather than the .093 obtained from the table. Since each cumulative probability in the table is itself a rounded figure, such discrepancies are to be expected when subtracting tabular entries.
Example 1.7. To find P(S10 ≤ 3) when p = .20 we write

P(S10 ≤ 3) = 1 − P(S10 ≥ 4),

and read this cumulative probability under p = .20 and in the row labeled n = 10, r = 4. We get

P(S10 ≤ 3) = 1 − .121 = .879.
Example 1.8. To use the table when p > .50, rephrase the problem in terms of q = 1 − p. For example, to find the probability of at least seven successes in ten Bernoulli trials with p = .80, we compute instead the equal probability of at most three failures in ten trials, but now entering the table with the probability appropriate to a failure, namely p = .20. This probability was computed in the preceding example.
Note that this method amounts to relabeling the two results of each trial so that S and F are interchanged. If the probability of a "success" is initially greater than .5, then after the relabeling it is less than .5 and the problem is reformulated in terms of this new language before using the table. (The formal identity used to justify this intuitively clear procedure is given in Problem 1.13.)
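The relabeling is easy to verify numerically (a sketch of ours, anticipating the identity of Problem 1.13):

```python
# At least 7 successes in 10 trials with p = .80 equals at most 3
# failures in 10 trials with failure probability .20.
from math import comb

def b(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

lhs = sum(b(k, 10, 0.8) for k in range(7, 11))   # P(S10 >= 7), p = .80
rhs = sum(b(k, 10, 0.2) for k in range(0, 4))    # P(S10 <= 3), p = .20
```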
Example 1.9. Two teams, A and B, compete in a series of games. Each trial (play of one game in the series) can result in success (say, A wins) or failure (B wins). If we assume that the probability p that A wins is the same for all games in the series and that the games are independent trials, then a Bernoulli process serves as a mathematical model of the series competition.
The probability p is taken as a measure of the relative strength of
the two teams. If p > 1/2, team A is better than team B; if p = 1/2, the teams are evenly matched; if p < 1/2, team B is better than team A. How does the kind of series affect the chance that the better team wins the series? For example, a tie at the end of the regular baseball season between two National League teams is broken by a three-game series in which the team first to win two games is declared the pennant winner. The American League breaks ties by having the teams play a single game. World Series competition, however, is a seven-game series in which the team first to win four games is the winner. We feel intuitively that a superior team has a better chance of showing its superiority in a seven-game series than in a three-game series or in a single game against the same opponent.*
Although the World Series ends as soon as one team wins four games, we could imagine it continued to the full seven games. Winning the series is equivalent to winning at least four of the seven games, and we can therefore use our table of cumulative binomial probabilities to compute the probability of a team winning the series for various values of p. For example, if p = .30, then we enter the table for n = 7, r = 4 and read the probability .126 for team A to win the series. If p = .90, then the probability that team A wins the series is not directly available from the table. Instead, we read the probability that team B wins (entering the table for p = .10, appropriate to the new meaning of a "success") and find .003. Hence, team A
TABLE 38

Probability of Team A         Probability that Team A Wins an n-Game Series
Winning a Single Game, p      n = 1    n = 3    n = 5    n = 7    n = 9
0 0 0 0 0 0
.1 .100 .028 .009 .003 .001
.3 .300 .216 .163 .126 .099
.5 .500 .500 .500 .500 .500
.7 .700 .784 .837 .874 .901
.9 .900 .972 .991 .997 .999
1.0 1.000 1.000 1.000 1.000 1.000
* For a complete discussion of this and related points, see F. Mosteller, "The World Series Competition," Journal of the American Statistical Association, vol. 47 (1952), pp. 355–380.
wins the series with probability .997 if p = .90. In Table 38, we summarize these computations for various values of p and for series containing an odd number of games, the winner being required to win a majority of the games. These probabilities are graphed in Figure 27,
[Figure 27. Probability that team A wins an n-game series, plotted against p, the probability of winning each game.]
and we can see how increasing the number of games in the series decreases the probability of a poorer team winning (the graphs get lower if p < .5) and increases the probability of a better team winning (the graphs get higher if p > .5). The fact that all five graphs are very close together around p = .5 means that, if one team is only slightly better than the other (say p = .51), then a nine-game series is not very much more effective than a single game as a discriminator between the teams. In fact, with p = .51 it turns out that the better team wins a nine-game series with probability .525, which is only slightly higher than .51, the probability that it wins a single game. Put differently, this means that the poorer team will win the nine-game series roughly 47.5% of the time in spite of the fact that it faces a superior opponent. Of course, one reduces the probability that the series will erroneously be won by the poorer team by increasing the number of games in the series. (Similar ideas appear in a variety of statistical problems, as we shall see in the next section.)
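Table 38 can be reproduced by treating the series as if played to the full n games (a computational sketch of ours, not the book's):

```python
# Probability that team A wins a majority of an odd number n of games.
from math import comb

def win_series(p, n):
    need = (n + 1) // 2          # games needed for a majority of n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(need, n + 1))
```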
We turn now to a discussion of some properties of the random variable Sn. In particular, we want to determine the mean and variance of this binomially distributed random variable. Since the probability function of Sn is given in (1.2), we could use the definitions of mean and variance given in the preceding chapter to compute E(Sn) and Var(Sn). We would then have to evaluate the sums
(1.5)    E(Sn) = Σ_{k=0}^{n} k·b(k|n, p) = Σ_{k=0}^{n} k (n choose k) p^k q^(n−k)

and

(1.6)    E(Sn^2) = Σ_{k=0}^{n} k^2·b(k|n, p) = Σ_{k=0}^{n} k^2 (n choose k) p^k q^(n−k),

from which we compute the variance of Sn by use of the formula

(1.7)    Var(Sn) = E(Sn^2) − [E(Sn)]^2.

This way of calculating the mean and variance of Sn is direct and offers useful practice in the manipulation of binomial coefficients. But we choose to leave this for the problems and instead present an alternate derivation which gives added insight into the nature of the binomial distribution.
Suppose (cf. Section 5 of the preceding chapter) we have a population specified by the random variable X whose probability table is as follows:

(1.8)
x          0    1
P(X = x)   q    p
X is here interpreted as the number of successes in a single trial, and we simulate a Bernoulli process by drawing random samples with replacement from this population, thinking of the occurrence of a success as corresponding to x = 1 and the occurrence of a failure as corresponding to x = 0. Indeed, if Xk is the kth sample value obtained, then the sum X1 + ··· + Xn is precisely the number of ones in a
random sample of size n or, equivalently, the number of successes in n Bernoulli trials with probability p for success on each trial. Hence the sample mean X̄ is related to Sn by the formula

(1.9)    X̄ = Sn/n, or Sn = nX̄,

and we can compute E(Sn) and Var(Sn) by using Theorem 5.7 of the preceding chapter, since from (1.9) we have

(1.10)    E(Sn) = nμ_X and Var(Sn) = n^2 Var(X̄) = n^2(σ_X^2 / n).
Now the population mean and variance are easily determined from (1.8) to be

(1.11)    μ_X = p,  σ_X^2 = pq.

Hence

E(Sn) = nμ_X = np

and

Var(Sn) = n^2(pq/n) = npq.
We have thus proved the following important result.
Theorem 1.2. A binomially distributed random variable with parameters n and p has mean np, variance npq, and standard deviation √(npq).
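Theorem 1.2 can be checked against the definitions directly (a numerical sketch of ours, using sums (1.5)–(1.7)):

```python
# E(Sn) and Var(Sn) computed from the probability function (1.2),
# compared with the closed forms np and npq of Theorem 1.2.
from math import comb

n, p = 20, 0.3
q = 1 - p
probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum(k**2 * pk for k, pk in enumerate(probs)) - mean**2
```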
Example 1.10. In 100 families containing four children, the numbers of families that had 0, 1, 2, 3, 4 girls were recorded in the following frequency table:
Number of Girls in Family 0 1 2 3 4 Total
Number of Families 4 31 35 25 5 100
If the probability of giving birth to a girl is assumed constant, then how can we use these data to estimate this unknown probability? We think of the sexes of the children in each family as being determined by four Bernoulli trials with the probability p of success (female child) fixed but unknown. Using this theoretical binomial distribution, Theorem 1.2 tells us that the mean number of girls in a family of four children is 4p. To estimate p we adopt the following
procedure: set the mean of the theoretical binomial probability distribution equal to the mean of the observed frequency distribution. The mean number of girls in the 100 families is

[0(4) + 1(31) + 2(35) + 3(25) + 4(5)] / 100 = 196/100 = 1.96.

Hence, according to the estimation procedure just stated, we equate 4p and 1.96 to obtain

p̂ = 1.96/4 = .49,
where we write p̂ to denote an estimate of p based on the particular data given in this problem.
TABLE 39

Number of Girls    Observed     "Fitted" Binomial           Theoretically Expected
in Family, k       Frequency    Probabilities b(k|4, .49)   Frequencies 100·b(k|4, .49)
0                  4            .068                        6.8
1                  31           .260                        26.0
2                  35           .375                        37.5
3                  25           .240                        24.0
4                  5            .058                        5.8
The binomial distribution with parameters n = 4 and p = p̂ = .49 is said to be "fitted" to the observed frequency distribution. From the fitted binomial distribution, we can compute the probabilities b(k|4, p̂) and thus the theoretically expected frequencies 100·b(k|4, p̂) for k = 0, 1, 2, 3, 4, and then can compare these with the actually observed frequencies to see how good a "fit" we have. The result is given in Table 39. How to test the "goodness of fit" between observed and theoretically expected frequencies, as well as how to appraise the given estimation procedure as compared with other possible procedures, are problems of great importance in statistics, but we cannot go into these matters here.
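The fitting procedure of Example 1.10 as code (a sketch; the data are the example's, the variable names ours):

```python
# Estimate p by equating the binomial mean 4p to the observed mean,
# then compute the fitted probabilities and expected frequencies.
from math import comb

freq = {0: 4, 1: 31, 2: 35, 3: 25, 4: 5}      # families with k girls
total = sum(freq.values())                     # 100 families
mean_girls = sum(k * f for k, f in freq.items()) / total
p_hat = mean_girls / 4

def b(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

expected = {k: total * b(k, 4, p_hat) for k in freq}
```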
Using Theorem 1.2, the standardized random variable corresponding to Sn is seen to be
(1.12)    Sn* = (Sn − np)/√(npq).

The event −c < Sn* < c is the same as

(1.13)    np − c√(npq) < Sn < np + c√(npq),

i.e., the event that the number of successes is within c standard deviations of its mean. For c = 1, the normal curve approximation tells us that this probability is about .68 if n is large. Similarly, we learn that P(−2 < Sn* < 2) is about .95 and P(−3 < Sn* < 3) is about .997, so that the number of successes in n Bernoulli trials is almost certain to be within three standard deviations of its mean if n is large. Unfortunately, we cannot do any more here than mention these results, which are of such great practical and theoretical significance in probability. We do, however, give one illustrative example.
Example 1.11. A coin is tossed 400 times and falls heads 210 times. If the coin is fair, is it unlikely to get this many heads? The number of heads is binomially distributed with parameters n = 400 and, we assume, p = 1/2. Hence the mean number of heads is 200 and the standard deviation is √(400·(1/2)·(1/2)) = 10. The probability that the number of heads is between 190 and 210 is approximately .68, since this is a one standard deviation interval on either side of the mean. To get as many as 210 heads is therefore not at all unlikely, on the assumption that the coin is fair. Similarly, we find that with a fair coin it is almost certain (probability .997) that the number of heads will fall within three standard deviations, or 30, on either side of the mean, i.e., in the range 170 to 230. Obtaining a number of heads less than 170 or more than 230 would therefore throw grave doubts on the hypothesis that the coin is fair.
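For Example 1.11 the exact probability is computable (a sketch of ours); the exact value comes out a little above the rule-of-thumb .68, since the binomial is discrete and both endpoints are included:

```python
# Exact probability that 400 tosses of a fair coin give between 190 and
# 210 heads, inclusive; the normal-curve rule of thumb puts it near .68.
from math import comb

n = 400
p_mid = sum(comb(n, k) for k in range(190, 211)) / 2**n
```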
PROBLEMS
1.1. If the production of parts by a machine is regarded as a Bernoulli process with process average defective equal to p = .20, is it more likely to have (a) no defectives among ten parts, or (b) at most one defective among 20 parts?
1.2. In a 20-question true-false examination, suppose a student tosses a fair coin to determine his answer to each question. If the coin falls heads, he answers "true"; if it falls tails, he answers "false." Find the probability that he answers at least 12 questions correctly and thus passes
the exam.
1.3. The probability of having no ace in a bridge hand is approximately .30. What is the probability that a person who plays ten hands of bridge will never receive an ace?
1.4. From the cumulative probabilities given in Table 37, determine the probability function of a binomially distributed random variable with parameters (a) n = 10 and p = .3, (b) n = 10 and p = .7, (c) n = 10 and p = .5.
1.5. Let X be a binomially distributed random variable with mean 12 and variance 4.8. Find (a) P(X > 5), (b) P(5 < X < 10), (c) P(X < 10).
1.6. A man is to throw a fair coin a certain number of independent times and is to receive a prize if he throws exactly five heads. At the outset, he is to choose the number of throws he will make. What number should he choose in order to maximize his chances of winning the prize? What then are the odds for his winning the prize?
1.7. For n = 20 Bernoulli trials, determine
(a) P(S20 ≥ 12) for p = 0.7.
(b) P(10 < S20 < 14) for p = 0.6.
(c) The value of p for which P(S20 ≥ 8) = .50. (Hint: Interpolate between two values found in the table.)
1.8. What is the probability of throwing exactly nine heads exactly twice in five throws of ten fair coins? (Hint: Use the binomial distribution twice.)
1.9. How many Bernoulli trials with probability .01 for success must be performed in order that the probability of at least one success be 1/2 or more?
1.10. We are given the information that n Bernoulli trials resulted in exactly k successes. Show that the conditional probability of a success on any particular trial is k/n.
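Problem 1.10 can be confirmed by brute-force enumeration for small n (a sketch of ours, not a proof):

```python
# Enumerate all outcomes of n Bernoulli trials and check that
# P(success on trial 1 | Sn = k) = k/n.
from itertools import product

n, k, p = 5, 2, 0.3
num = den = 0.0
for outcome in product([0, 1], repeat=n):     # 1 = success, 0 = failure
    if sum(outcome) != k:
        continue
    prob = 1.0
    for x in outcome:
        prob *= p if x else 1 - p
    den += prob
    if outcome[0] == 1:
        num += prob
cond = num / den
```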
1.11. In order to decide whether to accept or reject a very large lot of items offered for sale, the buyer takes a sample of 20 items at random from the lot and tests them. If at most one defective is found, he accepts the entire lot; if more than one defective is discovered in the sample, he rejects the lot.
(a) Find the probability that the buyer accepts the lot if in fact it contains a proportion of defectives equal to p, where p assumes the values in Table 37.
(b) Graph the probability that the buyer accepts the lot against the proportion of defectives, showing the probability of acceptance on the vertical axis. (This is called an operating characteristic curve or OC curve for the singlesample decision rule adopted by the buyer.)
(c) Draw the operating characteristic curve for the following alternative singlesample decision rule: a sample of only ten items is drawn at random from the lot and tested. The lot is accepted if no defectives are found and rejected otherwise.
(d) Where in your analysis of this problem have you used the fact that the lot is very large?
1.12. (a) Prove the following recursion formula for binomial probabilities:

b(k + 1|n, p) = [(n − k)/(k + 1)]·(p/q)·b(k|n, p).
(b) Denote by m the unique integer for which

(n + 1)p − 1 ≤ m ≤ (n + 1)p.

If (n + 1)p is not an integer, show that as k goes from 0 to n, b(k|n, p) increases up to a maximum value which occurs for k = m and then decreases. But if m = (n + 1)p, then show that b(k|n, p) increases up to b(m − 1|n, p), which is equal to b(m|n, p), and then decreases.
(c) Use Table 37 to compute the binomial probabilities for n = 4, p = .4 and n = 5, p = .4. For these special cases, check the assertions made in (b).
(d) The number m defined in (b) is called the most probable number of successes in n Bernoulli trials with probability of success equal to p. Determine b(m|n, p) for n = 20, p = .10 and n = 20, p = .50. Does the most probable number of successes occur with high probability?
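The assertion of Problem 1.12(b) can be spot-checked numerically (a sketch of ours):

```python
# When (n+1)p is not an integer, the most probable number of successes
# is m = floor((n+1)p); here (n+1)p = 2.1, so m should be 2.
from math import comb, floor

def b(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.1
m = floor((n + 1) * p)
k_max = max(range(n + 1), key=lambda k: b(k, n, p))
```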
1.13. Show that
(a) b(k|n, p) = b(n − k|n, 1 − p);
(b) Σ_{k=r}^{n} b(k|n, p) = 1 − Σ_{k=n−r+1}^{n} b(k|n, 1 − p).
Interpret these formulas in words and show how they are used in relation to Table 37.
1.14. Show that
b(k|n + 1, p) = p·b(k − 1|n, p) + q·b(k|n, p)
and interpret this formula in words. Show how the formula can be used
to extend Table 37 to n = 11.
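The extension rule of Problem 1.14 as code (a sketch of ours): build the probabilities for n + 1 trials from those for n trials.

```python
# b(k|n+1, p) = p*b(k-1|n, p) + q*b(k|n, p): one Pascal-like step.
from math import comb

def next_row(row, p):
    q = 1 - p
    n = len(row) - 1
    return [p * (row[k - 1] if k > 0 else 0.0) +
            q * (row[k] if k <= n else 0.0)
            for k in range(n + 2)]

p = 0.5
row10 = [comb(10, k) * p**k * (1 - p)**(10 - k) for k in range(11)]
row11 = next_row(row10, p)                    # probabilities for n = 11
direct11 = [comb(11, k) * p**k * (1 - p)**(11 - k) for k in range(12)]
```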
1.15. (a) Compute P(−c < Sn* < c) for c = 1, 2, 3 if n = 5 and p = .20, and compare with the corresponding normal curve approximations.
(b) Repeat part (a), but with n = 10 and then n = 20.
1.16. Compute E(Sn) and Var(Sn) by evaluating the sums in (1.5) and (1.6).
1.17. The function G whose value for every real number t is given by

G(t) = Σ_{k=0}^{n} b(k|n, p) t^k

is called the generating function of the binomial distribution with parameters n and p (or of the random variable Sn). Show that G(t) = (q + pt)^n. [Note: Let those readers who know some differential calculus show from the definition of G that G′(1) = E(Sn) and G″(1) + G′(1) − [G′(1)]^2 = Var(Sn).]
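Problem 1.17 can be verified by expanding (q + pt)^n as a polynomial (a sketch of ours):

```python
# The coefficients of (q + p*t)^n, obtained by repeated multiplication,
# should be the binomial probabilities b(k|n, p).
from math import comb

def expand(n, p):
    q = 1 - p
    coeffs = [1.0]                    # the constant polynomial 1
    for _ in range(n):                # multiply by (q + p*t), n times
        new = [0.0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += q * c           # contributes to t^i
            new[i + 1] += p * c       # contributes to t^(i+1)
        coeffs = new
    return coeffs

coeffs = expand(6, 0.3)
direct = [comb(6, k) * 0.3**k * 0.7**(6 - k) for k in range(7)]
```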
(Note: Recall the convention concerning binomial coefficients made in Formula (2.10) of Chapter 3.)
(b) Show that the sum of the probabilities h(k|n, p, N) taken over all possible values of Yn is equal to 1, as required of a probability function. (Hint: Use Formula (2.11) of Chapter 3.)
(c) Our notation has been chosen so that the population of N objects has the relative frequency table given in (1.8). If X̄ is the sample mean obtained in selecting the random sample, show that Yn = nX̄. Now use Theorem 6.2 of the preceding chapter to conclude that the mean and variance of Yn are given by

E(Yn) = np,  Var(Yn) = npq·(N − n)/(N − 1).
(d) Show that as the population size N increases without bound, the hypergeometric distribution with parameters n, p, and N approaches the binomial distribution with parameters n and p. In symbols,

h(k|n, p, N) → b(k|n, p) as N → ∞.
The importance of this limit theorem lies in the fact that when n/N is small enough, binomial probabilities can be used as approximations to hypergeometric probabilities. (Hint: Write out the binomial coefficients in (a) and thus show that h(k|n, p, N) is equal to

(n choose k) · [p(p − 1/N)(p − 2/N) ··· (p − (k−1)/N)] · [q(q − 1/N) ··· (q − (n−k−1)/N)] / [1·(1 − 1/N)(1 − 2/N) ··· (1 − (n−1)/N)].

Now note what happens to each factor as N → ∞.)
(e) Suppose a sample of size n is drawn without replacement from N objects of which Np are defective and Nq are good. In practice, one often knows N but the proportion defective p is unknown. If one obtains k defectives in the sample, then what is a reasonable estimate for this unknown proportion p? Since Np must be an integer, p is necessarily of the form j/N for some choice of j from among the integers 0, 1, ···, N. Estimating p is therefore equivalent to finding an estimate for the integer j.
Now the probability of getting exactly k defectives depends only on j once k, n, and N are fixed. Let us write

h_j = h(k|n, j/N, N)

for this probability. The method known as maximum-likelihood estimation directs us to find that value of j, say ĵ, such that h_ĵ is as
large as possible. In other words, the probability of getting the experimental outcome actually obtained (i.e., exactly k defectives) is maximized if j = ĵ. The number p̂ = ĵ/N is then called the maximum-likelihood estimate of the unknown proportion defective p. To find ĵ, proceed as follows: (I) Show that

h_j / h_(j−1) = j(N − j − n + k + 1) / [(j − k)(N − j + 1)].

(II) Show that h_j / h_(j−1) is greater than 1 if j < k(N + 1)/n and is less than 1 if j > k(N + 1)/n.
(III) Conclude that if ĵ is the greatest integer less than or equal to k(N + 1)/n, then the maximum-likelihood estimate of p is given by p̂ = ĵ/N.
(f) Repeat the preceding problem, but now assume the sample is drawn with replacement, so that the binomial distribution applies. Show that the maximum-likelihood estimate of p is given by p̂ = k/n, the actual proportion defective found in the sample.
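The maximum-likelihood recipe of part (e) can be spot-checked by computing h_j for every admissible j (a sketch of ours, with made-up numbers N = 20, n = 5, k = 2):

```python
# For a sample of n = 5 from N = 20 objects yielding k = 2 defectives,
# the j maximizing h_j should equal floor(k(N+1)/n).
from math import comb, floor

N, n, k = 20, 5, 2

def h(j):
    # hypergeometric probability of exactly k defectives when j of the
    # N objects are defective
    return comb(j, k) * comb(N - j, n - k) / comb(N, n)

j_hat = floor(k * (N + 1) / n)            # 2*21/5 = 8.4, so j_hat = 8
j_best = max(range(k, N - (n - k) + 1), key=h)
```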
2. Testing a statistical hypothesis
In this section, we illustrate how the binomial distribution is used in a problem of statistical inference. We cannot here go into the general theory of hypothesis testing in statistics. Instead, we analyze one particular example in detail, in order to point out the highlights of the method of testing hypotheses.
The Committee for the Re-Election of Smith as Mayor is meeting well ahead of election day to discuss campaign strategy. Smith, as the incumbent, is felt to have the edge on his opponent, but the committee wants some more information about this advantage as a guide to deciding whether to plan a very vigorous and expensive campaign, or a less vigorous and less expensive campaign. Since Smith's opponent is going all out to win, and will undoubtedly reduce Smith's advantage during the campaign, the committee decides that they will raise funds for the more expensive campaign if 60% or less of the population is in favor of Smith, but that they will relax and wage the less expensive campaign if Smith has more than 60% of the voters on his side.
Let us denote by p the actual proportion of all voters in favor of Smith. If p were known, then the committee would have no problem. It would (according to its agreed-upon plan) decide on one course of action if p ≤ .60 and on the alternative course of action if p > .60. But p is unknown, and some evidence will have to be obtained in order to choose between the two possible types of campaigns.
It is customary in statistics to say that there are two hypotheses, namely p ≤ .60 and p > .60, and the procedure by which a choice is made between these hypotheses is called a test of one of the hypotheses against the other. The hypothesis that is tested is called the null hypothesis; the other is then called the alternate hypothesis. Although the committee will choose to accept one hypothesis or the other, it is customary to say instead that the committee's choice is between acceptance or rejection of the null hypothesis. We shall shortly make some comments about which hypothesis is to be taken as the null hypothesis, but for now let us make the following agreement:
Null Hypothesis: p ≤ .60;
Alternate Hypothesis: p > .60.
The decision to accept or reject the null hypothesis will be based on the result of an experiment in which a certain number of people, say n, are randomly selected from the population of voters and then asked whether they are for or against candidate Smith. We cannot here discuss the very important practical problem of how to design a sample survey or opinion poll of this kind. But we shall assume that the selection of people is made in such a way that the process of sampling can reasonably be idealized as a Bernoulli process in which each trial (asking one of the selected people his voting intention) results in a success (will vote for Smith) or a failure (will not vote for Smith), the probability of a success on each trial being p, the proportion of people in the entire population who favor Smith. (Since the sample is ordinarily drawn without replacement, this theoretical model will be appropriate only if the sample size is very small compared to the number of voters in the entire population.)
Let us suppose that a sample of n = 20 people is drawn at random from the population and that each person is asked his voting intention. (We take so small a sample for illustrative purposes only; larger samples are discussed later.) For these n = 20 Bernoulli trials, let X denote the number of successes (people who say they favor Smith)
PROBABILITY: AN INTRODUCTION
SAMUEL GOLDBERG
274 BINOMIAL DISTRIBUTION / Chap. 5
obtained. Then X is binomially distributed with parameters n = 20 and p, but p is unknown. Our null and alternate hypotheses are thus statements about a parameter of a probability distribution. Such hypotheses are called statistical hypotheses.
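As a concrete picture of this idealization, the poll can be simulated as n = 20 Bernoulli trials. A minimal sketch in Python (the function name is ours, and the true proportion p = .55 and the seed are arbitrary illustrative values, not figures from the text):

```python
import random

def simulate_poll(n=20, p=0.55, seed=12345):
    # Each trial: a sampled voter favors Smith with probability p.
    # Returns the value of X, the number of successes among the n trials.
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n))

print(simulate_poll())
```

With a fixed seed the simulated poll is reproducible, which is convenient for checking a decision rule against the same hypothetical sample repeatedly.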
The committee decision to accept or reject the null hypothesis will be based on the outcome of the poll of the 20 people in the sample, in particular, on the value of X obtained. Roughly speaking, the committee will act on the assumption that p is low (and therefore accept the null hypothesis) if the value of X is small, and it will act on the assumption that p is high (and therefore reject the null hypothesis) if the value of X is large. But this is quite vague, and it is clear that what we need is a rule that unequivocally prescribes the committee's decision for each possible outcome of the poll. Consider for the moment the following example of such a decision rule: Reject the null hypothesis if and only if at least 13 of the 20 people in the sample say they are in favor of candidate Smith.
Note that the decision rule is completely described by giving the values of X that result in rejection of the null hypothesis. These values are called the critical set of values of X for the given decision rule. If the observed outcome falls in the critical set, the null hypothesis is rejected; otherwise the null hypothesis is accepted.
Now the null hypothesis is, as a matter of fact, either true or false. And our decision rule leads either to acceptance or rejection of the null hypothesis. Hence the following possibilities can arise by use of the rule: the null hypothesis may be rejected when it is in fact true (an error of the first kind), or accepted when it is in fact false (an error of the second kind). If the committee makes an error of the first kind, then it will with
Sec. 2 / TESTING A STATISTICAL HYPOTHESIS
275
a false sense of confidence wage a mild and less expensive campaign, even though Smith has no more than 60% of the voters on his side. This error, although it saves money, may lead to the defeat of Smith at the polls. If the committee makes an error of the second kind, then it will with a false sense of urgency wage a very expensive campaign, even though Smith has the support of more than 60% of the voters. This error leads to spending money for an expensive campaign which the committee, if it knew that Smith has such support, would regard as unnecessary.
Since the committee is dedicated to Smith's re-election at all costs, the consequences of an error of the first kind are considered much more serious than the consequences of an error of the second kind. This fact accounts for our choice of p < .60 as the null hypothesis and p > .60 as the alternate hypothesis, rather than vice versa. For it is customary to formulate the null hypothesis so that rejecting it when it is true (error of first kind) is more serious than accepting it when it is false (error of second kind). For example, in testing a new drug there are the two hypotheses "drug is toxic" and "drug is not toxic." The former would be taken as the null hypothesis, since rejecting it when it is true will lead to deaths of patients, whereas accepting this hypothesis when it is false will have the less undesirable consequences of loss of money by the manufacturer and unnecessary waste of the drug. Of course, in cases where the two kinds of errors are of the same importance, it is immaterial which of the two hypotheses is called the null hypothesis.
To study the decision rule stated above (for which n = 20 and the null hypothesis is rejected if and only if the value of X is at least 13) it is convenient to define the function π whose value for each possible value of the parameter p is the probability of rejecting the null hypothesis. That is,
π(p) = P(X ≥ 13) = Σ_{k=13}^{20} b(k | 20, p).
The function π is called the power function of the given decision rule. The reader should check the following values of this power function by referring to Table 37. (We write 0+ for a positive number less than .0005, and 1− for a number greater than .9995 but less than 1.)
p       0     .10    .20    .30    .40    .50    .60    .70    .80    .90    1
π(p)    0     0+     0+     .001   .021   .132   .416   .772   .968   1−     1
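These power-function values can also be checked by direct computation rather than from Table 37. A minimal sketch in Python (the helper names are ours; only the standard library is used):

```python
from math import comb

def binom_pmf(k, n, p):
    # b(k | n, p): probability of exactly k successes in n Bernoulli trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(p, n=20, c=13):
    # pi(p) = P(X >= c): probability that the rule rejects the null hypothesis
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))

for p in (0.30, 0.40, 0.50, 0.60, 0.70, 0.80):
    print(f"p = {p:.2f}   pi(p) = {power(p):.3f}")
```

Rounding to three decimals reproduces the tabulated row above.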
[Figure 28. Graph of the power function π(p) = Σ_{k=13}^{20} b(k | 20, p), together with the graph (two horizontal line segments) of the ideal power function: height 0 over the region p < .60, where the null hypothesis is true, and height 1 over the region p > .60, where it is false.]
The graph of this power function is shown in Figure 28 where we also indicate the graph (consisting of two horizontal line segments) of the power function for an ideal decision rule defined as a rule for which the probabilities of errors of both the first and second kind are zero. Since we are plotting the probability of rejecting the null hypothesis, we find in Figure 28 that this ideal power function has the value 0 when the null hypothesis is true and has the value 1 when the null hypothesis is false. For any value of p satisfying p < .60, the difference in heights of the actual graph and the ideal graph is the probability of an error of the first kind; for any value of p satisfying p > .60, the difference in heights of the two graphs is the probability of an error of the second kind.
For the decision rule whose power function is graphed in Figure 28, we observe that as p increases from p = 0 to p = .6, the probability of an error of the first kind increases from 0 to .416. Similarly, as we
move to the left from p = 1, the probability of an error of the second kind increases from 0 to .584 as we approach the borderline value p = .6.
The committee is not very happy with this decision rule, for it involves high error probabilities. For example, even if candidate Smith is favored by only 50% of the voting population, the probability is .132 that there will be at least 13 people in favor of Smith among the 20 people in the sample, thus leading the committee to plan a weak campaign when a strong one is clearly required. And if the percentage favoring candidate Smith is less than but near 60%, the committee is appalled to find that the decision rule will lead to a wrong decision roughly 40% of the time it is used. And even if candidate Smith is comfortably in the lead with, let us say, 70% of the voters on his side, the sample of 20 will contain less than 13 in favor of Smith with probability .228 and the decision rule will then lead the committee to erroneously wage a strong expensive campaign.
The committee therefore asks whether it is possible to formulate a decision rule for which errors of the first and second kind are both smaller than for the rule already mentioned. Let us see what happens if we keep the sample size fixed at n = 20. Then the only sensible rules that the committee will consider will be of the form:
Reject the null hypothesis (that p < .60) if and only if X, the number in favor of Smith among the 20 people in the sample, is at least some specified number, say c.
Each choice of the number c determines one decision rule. We have already discussed the rule with c = 13. In order to compare the various possible rules, one determines the power function of each decision rule by using the definition of π(p) as the probability of rejecting the null hypothesis when the parameter value is p. That is,
π(p) = P(X ≥ c) = Σ_{k=c}^{20} b(k | 20, p).
We have used our table of binomial probabilities to compute π(p) for c = 15, 16, 17. These values are given in Table 40 and graphs of the corresponding power functions are drawn in Figure 29.
From either the table or the graphs we see that as c increases, the probability of an error of the first kind decreases for all p satisfying p < .60, i.e., the graphs move down toward the ideal graph for which
TABLE 40

p              0     .10    .20    .30    .40    .50    .60    .70    .80    .90    1
π(p)  c = 15   0     0+     0+     0+     .002   .021   .126   .416   .804   .989   1
      c = 16   0     0+     0+     0+     0+     .006   .051   .238   .630   .957   1
      c = 17   0     0+     0+     0+     0+     .001   .016   .107   .411   .867   1
the error probability is zero. But, at the same time, the probability of an error of the second kind increases; i.e., the graphs move down away from the ideal graph for all p satisfying p > .60. The committee thus learns that with sample size 20 it cannot simultaneously lower both the probability of making an error of the first kind and the probability of making an error of the second kind.

[Figure 29. Graphs of the power functions for the decision rules with c = 15, 16, 17 (Table 40), with the region p < .60 (null hypothesis true) and the region p > .60 (null hypothesis false) marked on the horizontal axis.]

The customary statistical procedure in this circumstance is to concentrate on the errors of the first kind since, as we have noted earlier, they are presumably more serious than those of the second kind. The committee chooses a number, ordinarily denoted by α, which is the maximum probability of an error of the first kind that it will tolerate. In actual practice the number α is often chosen as one of the numbers .01, .05, or .10. Having picked α, the particular decision rule is adopted which not only meets the requirement that the maximum probability of an error of the first kind does not exceed α, but which in addition yields the lowest possible probabilities of errors of the second kind.
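The trade-off just described, in which raising c lowers the maximum probability of an error of the first kind while raising the probabilities of errors of the second kind, can be reproduced numerically. A sketch under the same binomial model (helper names are ours):

```python
from math import comb

def power(p, c, n=20):
    # pi(p) = P(X >= c) for the rule "reject the null hypothesis when X >= c"
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

for c in (13, 15, 16, 17):
    first_kind_max = power(0.60, c)       # largest error of the first kind
    second_kind_70 = 1 - power(0.70, c)   # error of the second kind at p = .70
    print(f"c = {c}:  max P(error I) = {first_kind_max:.3f}   "
          f"P(error II at p=.70) = {second_kind_70:.3f}")
```

As c runs through 13, 15, 16, 17, the first column falls while the second rises, which is exactly the movement of the graphs in Figure 29.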
For example, suppose the committee chooses α = .02. Then in Figure 29 we seek that value of c for which the height of the power curve for p = .60 (which gives the maximum probability of an error of the first kind) does not exceed α = .02, but is as close to .02 as possible. We find from the figure or from the values in Table 40, that c = 17 has the required properties. Thus the committee's choice of α = .02 dictates the use of the decision rule for which c = 17; i.e., the committee determines the value of X, the number of voters for candidate Smith in the sample of size 20, and rejects the null hypothesis if and only if X ≥ 17. Although the committee now has a very high probability of making an error of the second kind (wasting money on an unnecessarily strong campaign) it does have assurance that there is only at most a 2% chance for an error of the first kind (not waging a vigorous campaign when Smith needs it to win).
Now suppose that the committee is not willing to risk such large chances of an error of the second kind. From Table 40, for example, we find that P(X ≥ 17) = .107 when p = .70. Thus there is almost a 90% chance of accepting the null hypothesis (and therefore wasting money on an unnecessarily strong campaign), even when Smith has 70% of the voters on his side. What can be done to maintain the maximum risk level given by α = .02 but at the same time to lower the risks of errors of the second kind?
With the sample size fixed at n = 20, there is nothing that can be done. But if larger samples are permitted, then risks of errors of both first and second kind can be controlled. We illustrate this point by considering samples of size n = 50, n = 100, and n = 300.
Our decision procedure is stated in general terms as follows:
A sample of n people is drawn from the population of all voters. Let X be the random variable whose value is the number (among the n selected people) who are in favor of Smith. Reject the null hypothesis (that p < .60) if and only if X ≥ c, where c is determined so that the maximum probability of an error of the first kind does not exceed some prescribed value α (we suppose the committee chooses α = .02) and so that probabilities of errors of the second kind are as small as possible.
From our previous discussion, the reader can see that to determine c we proceed as follows: First put p = .60, since it is for this value that the probability of an error of the first kind is largest. Any number c for which P(X ≥ c) does not exceed α, i.e., for which
(2.1)    Σ_{k=c}^{n} b(k | n, .60) ≤ α,
will determine a decision rule whose maximum probability of an error of the first kind is at most α. To also minimize the probability of making an error of the second kind, we select the smallest value of c satisfying the inequality in (2.1). Put differently, we choose c as the smallest number in the set
(2.2)    { x | Σ_{k=x}^{n} b(k | n, .60) ≤ α }.
(We are assuming that α is chosen so that b(n | n, .60) ≤ α. The set in (2.2) therefore contains the number n and so is not empty.) This set, containing the values of X for which the null hypothesis is rejected, is called the critical set of values of X for the given decision rule.
The reader should refer to Table 37 and check that with α = .02 and n = 20, the value of c determined in this way is c = 17, as we have already seen in Table 40 and Figure 29. Using more extensive tables of cumulative binomial probabilities, we similarly find that with α = .02, the smallest value of c satisfying (2.1) is c = 38 for n = 50, c = 71 for n = 100, and c = 198 for n = 300. We therefore have four decision rules, all determined by the committee's setting of α = .02 as the maximum tolerable probability of an error of the first kind. Values of the power function of these four rules are given in Table 41. For comparison with Figure 29, we graph the three power functions for sample sizes n = 50, 100, and 300 in Figure 30.
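The critical values c = 17, 38, 71, and 198 can be recovered without tables by searching for the smallest c satisfying (2.1). A sketch of that search (function names are ours):

```python
from math import comb

def tail(c, n, p=0.60):
    # P(X >= c) for X binomial with parameters n and p:
    # the sum of b(k | n, p) over k = c, c+1, ..., n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def smallest_c(n, alpha=0.02):
    # Smallest c satisfying (2.1), evaluated at the borderline p = .60.
    # (The text assumes alpha >= b(n | n, .60), so some c always qualifies.)
    for c in range(n + 1):
        if tail(c, n) <= alpha:
            return c

for n in (20, 50, 100, 300):
    print(f"n = {n}:  c = {smallest_c(n)}")
```

The linear scan is adequate here; for very large n one could instead bisect, since the tail probability decreases as c increases.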
TABLE 41

p                        .50      .60      .70      .80      .90      1
π(p)   n = 20,  c = 17   .001     .016     .107     .411     .867     1
       n = 50,  c = 38   0+       .013     .223     .814     .999     1
       n = 100, c = 71   0+       .015     .462     .989     1        1
       n = 300, c = 198  0+       .019     .941     1        1        1
As expected, the risk of making an error of the second kind goes down as the sample size increases; i.e., for each p > .60, as n increases the curves move up toward the ideal graph for which the probability of an error of the second kind is zero. From Table 41 we read that
[Figure 30. Graphs of the power functions of the decision rules of Table 41 for sample sizes n = 50, 100, and 300, with the region p < .60 (null hypothesis true) and the region p > .60 (null hypothesis false) marked on the horizontal axis.]
with n = 300, the probability of finding at least 198 persons in favor of Smith when p = .70 is .941, so that the probability of an error of the second kind is reduced to .059. Thus we have demonstrated how the committee can maintain its maximum tolerable probability of an error of the first kind at α = .02 and can also control the risks of errors of the second kind by sampling a sufficiently large number of people from among the entire population of voters.
For the remainder of our discussion, in order that we may use Table 37, we return to the simple decision rule with n = 20, c = 17. Since c is chosen as the smallest number in the set defined in (2.2),
we know that

(2.3)    P(X ≥ x) = Σ_{k=x}^{20} b(k | 20, .60) ≤ .02    for all x ≥ 17,

and

(2.4)    P(X ≥ x) = Σ_{k=x}^{20} b(k | 20, .60) > .02    for all x < 17.
We are now able to see that although it is helpful in understanding the method of testing hypotheses to determine decision rules and power functions, it is in practice unnecessary to do so if all one wants to do is decide whether the experimental evidence leads to acceptance or rejection of the null hypothesis.
The result of the poll of 20 voters is the occurrence of the event X = x, where x is an integer from 0 (none in favor of Smith) to 20 (all in favor of Smith). The larger the value of X, the more unfavorable is the result of the poll to the null hypothesis that p < .60. The number P(X ≥ x), calculated for the borderline value p = .60, is the probability of getting a value of X at least as unfavorable to the null hypothesis as the one actually observed and is called the statistical significance or the descriptive level of significance of the observed event X = x.
According to (2.3) and (2.4), if the descriptive level of significance of X = x is less than or equal to α = .02, then the null hypothesis is rejected, since x must then be greater than or equal to 17; if the descriptive level of significance of X = x is greater than α = .02, then the null hypothesis is accepted, since x is then less than 17. A value of X that leads to rejection of the null hypothesis is said to be significant at the level α = .02 (or at the 2% level of significance); a value of X that leads to acceptance of the null hypothesis is not significant at the level α = .02. Testing the significance of the observed
value of X at the level α (i.e., computing P(X ≥ x) for p = .60 and comparing it with α) is therefore a way of determining the action to be taken without first finding the decision rule and its power function. For this reason, tests of statistical hypotheses are often called tests of significance.
To illustrate these ideas, suppose the committee has decided to sample n = 20 people and has set α = .02 as the maximum tolerable probability of an error of the first kind. The 20 people are polled and the event X = 16 occurs; i.e., 16 people are in favor of Smith. With p = .60, we find from Table 37 that
P(X ≥ 16) = .051.
Since .051 > .02, the observed event X = 16 is not significant at the 2% level of significance and the committee therefore accepts the null hypothesis. Note that if the committee had set α = .06, say, then this same value of X would be significant at the 6% level of significance (since .051 < .06) and would therefore lead to rejection of the null hypothesis. By increasing α, the committee increases its chances of rejecting the null hypothesis. It also, of course, increases the chances of making an error of the first kind.*
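This comparison is mechanical enough to automate. A sketch (the helper name is ours) that computes the descriptive level of significance of X = 16 and tests it at the two levels just mentioned:

```python
from math import comb

def descriptive_level(x, n=20, p0=0.60):
    # P(X >= x) computed at the borderline value p = .60: the
    # descriptive level of significance of the observed event X = x
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

s = descriptive_level(16)
for alpha in (0.02, 0.06):
    verdict = "reject" if s <= alpha else "accept"
    print(f"level {s:.3f} vs alpha = {alpha}: {verdict} the null hypothesis")
```

The same observation is thus not significant at the 2% level but significant at the 6% level, as in the text.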
We conclude by reminding the reader that we have discussed only one particular problem and that null hypotheses, decision rules, and tests of significance will generally assume different forms in different problems. Nevertheless, an understanding of this section should enable the reader to solve a variety of problems of the sort treated here where the binomial distribution applies. This hypothesis can be tested by trying the problems that follow.
PROBLEMS
2.1. Consider the decision problem discussed in the text, and suppose the committee chooses to base its decision on a sample of n = 20 people. As in the text, let α denote the maximum tolerable probability of an error of the first kind.
* The obvious fact that errors can be made when using tests of significance means that these tests are fraught with danger and must be interpreted with great caution. For a particularly impressive discussion of this point, with special reference to the field of psychology, see T. D. Sterling, "Publication Decisions and Their Possible Effects on Inferences Drawn From Tests of Significance, or Vice Versa," Journal of the American Statistical Association, vol. 54 (1959), pp. 30-34.
(a) What decision rule is determined if α = 0? What then is the probability of an error of the second kind for all p > .60? Draw the graph of the power function for this rule.
(b) What decision rule is determined if the committee insists that the probability of an error of the second kind must be zero for all p > .60? What then is the value of α? Draw the graph of the power function of this decision rule.
(c) What decision rule is determined if α = .10? For this decision rule, what is the probability of an error of the second kind if p = .70? if p = .80? What is the probability of an error of the first kind if p = .50?
(d) The committee has decided to use α = .10 as in part (c) and finds that 75% of the people in the sample are in favor of Smith. What is the descriptive level of significance of this observed event? Does the committee wage a very expensive or a less expensive campaign?
(e) The committee decides to use α = .01. How many people in the sample of 20 must be in favor of Smith before the observed event is significant at the level .01?
2.2. Suppose the committee decides on a sample of n = 10 people and sets α = .05. Determine the decision rule that should be used and draw the graph of its power function.
2.3. In a study of the effects of stress,* 20 college students were taught to tie a bowline knot by two different methods. Half the subjects learned method A first and the other half learned method B first. Later, after an active day and an evening final examination, each subject was asked to tie the knot. The prediction was that stress would induce regression, i.e., the subjects would tend to revert to the first-learned method of tying the knot. Each subject was classified as a success (used the knot-tying method he learned first) or a failure (used the method he learned last). Assume that the experiment can be idealized as a set of 20 Bernoulli trials with (unknown) probability p for success on each trial.
Suppose the null hypothesis expresses the fact that there is no regression and that under stress a subject is equally likely to use either of the two methods of tying the knot. (It is cases of this kind that explain the use of the word "null": the null hypothesis asserts that stress has no effect.) The alternate hypothesis will then state that regression does occur; i.e., it is more probable that under stress the first-learned method is used than the second-learned method.
(a) Formulate these hypotheses in terms of p and determine the decision
* Barthol, R. P. and N. D. Ku, "Regression Under Stress to First Learned Behavior," Journal of Abnormal and Social Psychology, vol. 59 (1959), pp. 131-138.
rule if α = .05; i.e., if an error of the first kind has probability at most .05.
(b) Suppose 15 of the 20 subjects use the first-learned method of tying the knot. What is the descriptive level of significance of this observed outcome? Is it significant at the 5% level? Is it significant at the 1% level?
2.4. Determine a decision rule to test the null hypothesis p = .20 against the alternate hypothesis p = .60, assuming a sample of size n = 10 and a maximum tolerable probability for an error of the first kind equal to .05. What is the actual probability of an error of the first kind for your test? What is the probability that you will incorrectly accept the null hypothesis when p = .60?
2.5. The production manager of a company submits a report recommending the hiring of additional repairmen. His conclusions are based on the assumption that, on the average, 20% of the machines in the shop will require maintenance on any given day; i.e., the probability is .20 that a machine observed for a period of a day (a machine-day) will need the services of a repairman. The president of the company is interested in testing this assumption, since the conclusions of the report will be different if the assumed 20% is either too high or too low. Suppose (unrealistically, but in order to be able to use Table 37) that only 20 machine-days are observed and the president is willing to take at most a 10% risk of rejecting the assumption if it is true (α = .10).
(a) Formulate a null and alternate hypothesis, explaining how the binomial distribution applies (i.e., define trial, success, failure, etc.)
(b) Determine a reasonable decision rule for testing the null hypothesis.
(c) Of the 20 machine-days observed, seven required the services of a repairman. What is the descriptive level of significance of this event? Is it significant at the .10 level?
(d) Draw the graph of the power function of the decision rule in (b) and on the same figure also draw the corresponding graph for an ideal decision rule for which the probabilities of both kinds of errors are zero.
2.6. A hair tonic manufacturer claims that his product will cure baldness at least 70% of the time it is used according to instructions. Formulate null and alternate hypotheses to test this claim. Determine a decision rule assuming the maximum tolerable probability of an error of the first kind is α = .05. Use a sample of size n = 20.
3. An example of decision-making under uncertainty
Analyses of the type discussed in the preceding section can be carried still further and made more realistic if we assign relative values to the losses that will arise when various kinds of errors are made. We should also take note of the fact that sampling involves certain expenses and that in some practical situations larger samples may cost more to obtain than the consequent reduction in probabilities of errors is worth. In short, statistical investigations are undertaken as a basis for action, and decisions should therefore be made in the light of all their relevant consequences.
We shall illustrate this approach by discussing a particularly simple problem that can be solved with the mathematical skills we have now accumulated.*
Before each production run, a machine used to produce a certain part must be adjusted by an operator. Five hundred parts are produced in each such run, and each part is classified as either good or defective. On the basis of his experience with the machine, the manufacturer is willing to assume that the production of the 500 parts can be thought of as a Bernoulli process in which the probability of a defective, denoted by p and called the average fraction defective, is the same on each of the 500 trials.
Two delicate adjustments must both be made perfectly by the operator before each run in order to have p = .01, which is the very best that the machine can do, because of mechanical limitations. But if only one of these adjustments is properly made, then the average fraction defective becomes p = .10, and if the operator happens to make neither adjustment properly then p = .20. We therefore have three possible "states" for the machine. On the basis of records of past production runs made by the operator, the manufacturer estimates that the operator will have both adjustments right 80% of the time, miss exactly one adjustment 15% of the time, and miss
* This problem, with changes in numerical values, is one treated in great detail by somewhat different methods in R. Schlaifer, Probability and Statistics for Business Decisions, McGraw-Hill Book Company, Inc., 1959, especially Chapters 22 and 33. I here express my appreciation to Professor Schlaifer for permission to use this material. I am also indebted to Professor Howard Raiffa for introducing me to this kind of decision problem and for the particular method of solution used in the text.
Sec. 3 / DECISIONMAKING UNDER UNCERTAINTY 287
both adjustments 5% of the time. These data are summarized in Table 42.
TABLE 42
State of Machine                    Probability of a Defective      Probability of
                                    Part Given This State           This State
I   (Both adjustments right)                 .01                         .80
II  (Only one adjustment right)              .10                         .15
III (Neither adjustment right)               .20                         .05
Each of the 500 parts produced by the machine, whether good or defective, is used in assembling the final product, but a defective part requires special hand fitting which costs $5.00 per part. This means that a faulty machine setup (i.e., machine in states II or III) can lead to a fairly high cost of using defective parts.
However, the manufacturer can reduce these costs by calling in a master mechanic before the production run. If this is done, the machine is certain to be properly adjusted and therefore will be in state I, where the average fraction defective is equal to its minimum value p = .01. This special use of the master mechanic, however, costs $50. Thus, if the regular operator has put the machine in state I (as he does most often), then this $50 would be a total loss. On the other hand, if the operator has missed one or both of the adjustments, then the saving in cost of using defective parts more than offsets the extra cost of hiring the master mechanic.
Finally, the manufacturer considers that he may be able to reduce his average costs by inspecting a sample of the product after the operator prepares the machine, but before beginning the actual production run. He might, for example, make a sample run of ten parts and then inspect each part. When the number of defectives among the ten parts in the sample is "high," he would call the master mechanic; when it is "low," he would order the regular run of 500 parts to be made without readjustment. But sample production and inspection cost $2.00 per part.
There are two possible decisions that the manufacturer can make:
1. Order the production run to proceed after the machine is prepared by the regular operator without calling the master mechanic; we shall call this a decision to proceed.
2. Call the master mechanic and have him readjust the machine so that its average fraction defective is certain to be p = .01; we shall call this a decision to readjust.
The manufacturer, whose aim is to make the average cost of the entire production run as low as possible, asks the following questions:
1. Should the decision to proceed or readjust be made without a sample run or on the basis of evidence accumulated (at a price due to inspection costs) in a sample run?
2. If a sample run is not indicated, then which of the two decisions should be made?
3. If a sample run is indicated, then how large a sample should be taken; i.e., how many parts should be made by the machine? (We assume that all of these parts will be inspected.) And what decision rule should then be adopted; i.e., for each possible outcome (as measured by the number of defective parts discovered in the sample), which of the two decisions should be made?
By answering these questions, we give the manufacturer a rule for action in the face of uncertainty (since the actual state of the machine is unknown). Moreover, this rule will be optimum in that it minimizes the average cost of the entire production run.
We first investigate the costs involved in each decision, assuming no sample run is made. If the decision is to proceed and we suppose for the moment that the state of the machine is given (i.e., p is known), then the mean number of defectives produced in the run is 500p, and so the mean cost of defectives is 500p × $5.00 = 2500p dollars. We compute this mean cost for each possible state of the machine in the third column of Table 43. Similarly, if the decision is to readjust, then the master mechanic adjusts the machine (makes p = .01) and the mean number of defectives produced in the run is
TABLE 43
State of   Probability of a     Mean Cost of Defectives     Loss Due to Proceeding   Loss Due to Readjusting   Probability
Machine    Defective Part       Given This State When       Rather Than Taking       Rather Than Taking        of This
           Given This State     Decision Is to:             Better Decision if       Better Decision if        State
           (p)                  Proceed      Readjust       State Were Known         State Were Known
I          .01                  $ 25*        $75            $  0                     $50                       .80
II         .10                   250          75*            175                       0                       .15
III        .20                   500          75*            425                       0                       .05
500p = 5 parts. Hence, no matter what state the machine was put in by the regular operator, the mean cost of defectives when the decision is to readjust is 5 X $5.00 = $25.00, plus the $50.00 cost of the master mechanic, or a total of $75.00. This cost is listed in the fourth column of Table 43.
Note that if the machine is in state I, then the better decision (i.e., the one with the lower mean cost) is the decision to proceed. But if the machine is in state II or III, then the better decision is to readjust. The asterisks in Table 43 indicate the mean costs of the better decision if the state of the machine were known; i.e., if perfect information concerning the quality of the adjustments made by the regular operator were available to the manufacturer. But such perfect information is not available. Thus, if the machine is known to be in state I, then the better decision is to proceed and this action involves a mean cost of $25. Since this is the better decision for state I, the loss due to proceeding in the absence of perfect information happens to be zero; but the loss due to readjusting rather than taking the better decision is the mean cost of readjusting ($75) minus the mean cost of the better decision ($25), and hence is $50. Similarly, if the machine is in state II, then the better decision is to readjust. Hence, the loss due to proceeding rather than taking this better decision is the mean cost of proceeding ($250) minus the mean cost of the better decision ($75) or $175. In this way, we compute the losses given in the fifth and sixth columns of Table 43.
The loss due to proceeding is therefore $0, $175, or $425 with probability .80, .15, and .05 respectively, these being the given probabilities of the three states of the machine. Hence we find:
Mean loss due to proceeding = 0(.80) + 175(.15) + 425(.05) = $47.50.

Similarly, we find:

Mean loss due to readjusting = 50(.80) + 0(.15) + 0(.05) = $40.00.
We conclude that if no sample run is ordered, then the mean loss due to the decision to proceed is $7.50 more than the mean loss due to the decision to readjust. The manufacturer should therefore decide to readjust; i.e., the master mechanic should be called in before the production run, thus making average costs $7.50 less per run than with the alternate decision to proceed.
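The two mean-loss computations above are easy to check mechanically. A sketch (Python, not part of the text; the priors and losses are the Table 43 values):

```python
# A sketch (not from the text) of the no-sample computation above.
# Priors and losses are the Table 43 values for states I, II, III.
priors        = [0.80, 0.15, 0.05]
loss_proceed  = [0, 175, 425]   # loss if we proceed in each state
loss_readjust = [50, 0, 0]      # loss if we readjust in each state

mean_loss_proceed  = sum(pr * l for pr, l in zip(priors, loss_proceed))
mean_loss_readjust = sum(pr * l for pr, l in zip(priors, loss_readjust))

print(round(mean_loss_proceed, 2))    # 47.5
print(round(mean_loss_readjust, 2))   # 40.0
```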
Although the decision to readjust is the better decision, we have
computed that the mean cost due to readjusting is $40.00. Hence $40.00 is the mean cost of uncertainty in this problem, and the manufacturer could therefore afford to pay any price up to $40.00 for the certain knowledge (never available in practice) of the value of p. In other words, his costs would average $40.00 less per run if he had information that allowed him to make the better decision for each state, i.e., if he knew the true value of p and took the decision to readjust only when p = .10 or p = .20 (when it paid to readjust).
The state of uncertainty is somewhat reduced by evidence accumulated in a sample run. But such evidence costs $2.00 for each part produced by the machine and inspected. We turn now to the problem of determining whether the mean loss of $40.00 just computed can be lowered by ordering a sample run before making a decision to proceed or readjust.
The decision rules we allow are of the following form:
Order a sample run of n parts. Let X be the random variable whose value is the number of defectives among the n parts in the sample. Make the decision to readjust if X ≥ c, where c is some specified number. Make the decision to proceed if X < c.
Each choice of n and c determines one such rule, which we call the (n, c) decision rule. For example, the (5, 1) decision rule requires that a sample of n = 5 parts be produced; the decision to readjust is then made if and only if the number of defectives turns out to be at least 1. We now demonstrate how to compute the mean loss from the use of such a decision procedure. We shall for the moment concentrate on explaining the construction of Table 44, which concerns the (5, 1) decision rule. A similar analysis applies to any (n, c) rule, no matter what the values of n and c.
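The readjustment probability under an (n, c) rule is a binomial tail probability, P(X ≥ c). A sketch of that computation (Python, not from the text):

```python
from math import comb

def p_readjust(n, c, p):
    # P(X >= c) for X binomial(n, p): the chance the (n, c) rule
    # calls for readjustment when the true defect rate is p.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

# For the (5, 1) rule these match column (3) of Table 44:
for p in (0.01, 0.10, 0.20):
    print(f"{p_readjust(5, 1, p):.3f}")   # 0.049, 0.410, 0.672
```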
TABLE 44

  (1)        (2)          (3)           (4)          (5)          (6)        (7)        (8)
            Probability  Probability   Probability                          Mean       Mean Total
 State      of a         of Decision   of Decision  Probability  Loss Due   Loss Due   Loss Due
 of         Defective    to Readjust   to Proceed   of Wrong     to Wrong   to Wrong   to Wrong
 Machine    Part Given   Using         Using        Decision     Decision   Decision   Decision
            This State   (5, 1) Rule   (5, 1) Rule
   I          .01          .049          .951         .049       $ 50       $  2.45    $ 12.45
   II         .10          .410          .590         .590        175        103.25     113.25
   III        .20          .672          .328         .328        425        139.40     149.40

Mean loss due to use of (5, 1) decision rule = 12.45(.80) + 113.25(.15) + 149.40(.05)
                                             = $34.42.
Columns (1) and (2) of Table 44 are clear. Column (3) is obtained directly from the cumulative binomial probabilities in Table 37. Under the (5, 1) rule, the decision to readjust is made when X, the number of successes (defectives) in the n = 5 Bernoulli trials making up the sample run, is at least 1. If p = .01, we find P(X ≥ 1) = .049. Similarly, we read directly from the binomial tables that P(X ≥ 1) equals .410 if p = .10 and equals .672 if p = .20. Thus column (3) is completed. The probability that the (5, 1) rule will lead to a decision to proceed is 1 minus the probability that it will lead to a decision to readjust. Hence, the entries in column (4) of Table 44 are obtained directly from those in column (3).
The probability of a wrong decision, entered in column (5), is merely the probability that the decision rule leads to readjustment if p = .01 (when the better decision is known to be to proceed) and to proceeding if p = .10 or p = .20 (when the better decision is known to be to readjust).
The loss due to a wrong decision has been computed in Table 43, and so column (6) of Table 44 is easily completed.
The entries in column (5) are multiplied by the entries in column (6) to give the mean loss due to a wrong decision; i.e., this mean loss (for given p) is the product of the loss and the probability with which it is sustained.
Finally, to the mean loss entered in column (7) we add the cost of the sample, which is $2 for each of the five parts sampled. The entries in column (8) are therefore merely $10 more than those in column (7).
Since we are given (in Table 42) the probabilities of the three possible states, we can compute the overall mean loss due to the use of the (5, 1) decision rule. This we do in the lower part of Table 44. Since this mean loss is $34.42 and is therefore less than the mean loss of $40.00 due to a decision to readjust without a sample run, we see immediately that it does pay to order a sample run. The only remaining question is therefore the choice of the best possible decision rule.
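The whole computation behind Table 44 can be reproduced from first principles. The following sketch (Python, not part of the text) uses exact binomial probabilities rather than the rounded entries of Table 37, so it reproduces the $34.42 figure to within rounding:

```python
from math import comb

# States I, II, III: prior probabilities, defect rates, and Table 43 losses.
priors     = [0.80, 0.15, 0.05]
p_def      = [0.01, 0.10, 0.20]
loss_wrong = [50, 175, 425]      # column (6) of Table 44
n, c, cost_per_part = 5, 1, 2    # the (5, 1) rule; $2 per sampled part

total = 0.0
for i, (prior, p, loss) in enumerate(zip(priors, p_def, loss_wrong)):
    p_readjust = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                     for k in range(c, n + 1))
    # Wrong decision: readjusting in state I, proceeding in states II and III.
    p_wrong = p_readjust if i == 0 else 1 - p_readjust
    total += prior * (p_wrong * loss + cost_per_part * n)

print(round(total, 2))   # 34.42
```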
To find the best decision rule, we must find the best pair of values of n, the sample size, and c, the smallest number of defectives leading to a decision to readjust. (Of course, "best" is interpreted as lowest mean loss for the production run of 500 parts.) We proceed by first finding the best value of c, given the sample size n; we then compare different values of n when each is used with the value of c that is best for it. From this point on, we only state results. The reader
can verify each of our statements by carrying out computations similar to those used in constructing Table 44. (We omit problems at the end of this section, since there is ample opportunity to test one's understanding by checking our results.)
By keeping the sample size fixed at n = 5 and varying c, we find the following mean losses due to use of (5, c) decision rules:
Decision Rule    Mean Loss
   (5, 1)         $34.42
   (5, 2)          49.82
   (5, 3)          56.03
It is clear that for samples of size n = 5, the best value of c is c = 1. In fact, similar computations show that for each of the sample sizes n = 4, 5, 6, 7, 8, 9 the best value of c is c = 1. (This is a peculiarity of our particular problem and is not generally true.) We thus obtain the following mean losses for decision rules with various sample sizes, each computed with the value c = 1 that is best for it.
Decision Rule Mean Loss
(4,1) $35.49
(5,1) 34.42
(6,1) 33.87
(7,1) 33.73
(8,1) 33.94
(9,1) 34.45
We note that rule (7, 1) has the lowest mean loss. It is therefore the decision rule preferred by the manufacturer. He orders a run of n = 7 sample parts. If at least one of these seven parts is defective, he spends the $50 required to have the master mechanic readjust the machine. If no defectives are found among the seven parts, then he orders the run to proceed without readjustment. His mean cost of uncertainty is thereby reduced to $33.73 from the $40.00 cost resulting from his best decision (namely, to always call the master mechanic) in the absence of a sample run.
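The search over decision rules described above is easy to mechanize. A sketch (Python, not from the text; the range of n searched is an arbitrary choice wide enough to bracket the minimum):

```python
from math import comb

priors     = [0.80, 0.15, 0.05]
p_def      = [0.01, 0.10, 0.20]
loss_wrong = [50, 175, 425]   # from Table 43

def mean_loss(n, c):
    # Overall mean loss of the (n, c) rule, including $2 per sampled part.
    total = 0.0
    for i, (prior, p, loss) in enumerate(zip(priors, p_def, loss_wrong)):
        p_readj = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                      for k in range(c, n + 1))
        p_wrong = p_readj if i == 0 else 1 - p_readj
        total += prior * (p_wrong * loss + 2 * n)
    return total

best = min(((n, c) for n in range(1, 16) for c in range(1, n + 1)),
           key=lambda nc: mean_loss(*nc))
print(best, round(mean_loss(*best), 2))   # (7, 1) 33.73
```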
SUPPLEMENTARY READING
The binomial distribution is discussed to a greater or lesser extent in the probability and statistics books included in reading lists at the end of
preceding chapters. Some idea of statistical applications will be obtained by also consulting the following books, in addition to the references mentioned in footnotes.
1. Ackoff, R. L., The Design of Social Research, University of Chicago Press, 1953.
2. Bross, I. D. J., Design for Decision, The Macmillan Company, 1953.
3. Cowden, D. J., Statistical Methods in Quality Control, Prentice-Hall, Inc., 1957.
4. Dodge, H. F., and H. G. Romig, Sampling Inspection Tables, 2nd edition, John Wiley and Sons, Inc., 1959.
5. Mosteller, F., "Applications," pp. xxxiv-lxi, in Tables of the Cumulative Binomial Probability Distribution, Annals of the Computation Laboratory of Harvard University, vol. XXXV, Harvard University Press, 1955.
6. Sprowls, R. C., Elementary Statistics for Students of Social Science and Business, McGraw-Hill Book Company, Inc., 1955.
7. Wallis, W. A., and H. V. Roberts, Statistics: A New Approach, The Free Press, 1956.
Note. Now that you are at the end of this book, you can review some of the things you have learned and prepare the way for continued study of probability by reading the following articles.
Curtiss, J. H., "Elements of a Mathematical Theory of Probability," Mathematics Magazine, vol. 26 (1953), 233-254.
Halmos, P. R., "The Foundations of Probability," American Mathematical Monthly, vol. 51 (1944), 493-510.
Robbins, H., "The Theory of Probability," Chap. XI in Insights Into Modern Mathematics, Twenty-third Yearbook, National Council of Teachers of Mathematics, Inc., 1957.
ANSWERS TO ODD-NUMBERED PROBLEMS
Chapter 1
1.1. (a) Finite, one element; (c) Infinite; (e) Finite, four elements, {1→2→4→3→2, 1→2→3→4→2, 2→4→3→2→1, 2→3→4→2→1}; (g) Finite, two elements, {2, 1}.
1.3. Consider numbers of the form n² + x(n − 1)(n − 2)(n − 3), and find x such that this number is 94 when n = 4. x = 13.
1.5. (a) {(2, 3)}, the point of intersection of the two lines;
(b) ∅, for the two lines, being parallel, have no points in common;
(c) {(x, y) | x + y = 5}, the set of all points on the graph of the equation x + y = 5, for the two equations define the same line.
1.7. (a) A = B; (c) A = B; (e) A = {1, 2} ≠ B = {0, 1, 2}.
2.1. (a) The same number appears on each die; (c) The sum of the numbers is 4.
2.3. B = ∅.
2.5. (a) Correct; (b) Incorrect, for the only subsets of {{1}} are ∅ and {{1}}; (c) Correct, for the elements of {1, {1}} are 1 and {1};
(d) Correct.
2.7. (a) {(0, 2), (0, −2)}; (c) ∅; (e) Upper semicircle, including the points (−2, 0) and (2, 0).
2.11. 5 · 4 = 20.
2.13. 8 · 8 · 9 · 10⁴ = 5,760,000.
2.15. (a) 6; (b) 9; (c) 3; (d) 6.
2.17. Assuming the coins distinguishable: 2³, 2⁴, 2ⁿ ways.
2.19. (a) 169; (b) 338; (c) 169.
2.21. (a) 2³, 3³, n³; (b) 2ʳ, 3ʳ, nʳ.
3.1. A′ = {b, c}, A ∪ B = {a, b}, A ∩ B = ∅, B′ = {a, c}, A′ ∩ (A ∪ B) = {b}.
3.3. (a) S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, where an element represents the outcome for the penny first, then the nickel, and then the dime;
(b) A′ = {THH, THT, TTH, TTT},
A ∪ B = {HHH, HHT, HTT, HTH, TTT},
A ∩ C = {HHH, HHT, HTH},
A′ ∩ C = {THH}, (A ∩ B) ∩ C = {HHH}.
3.5. (a) (i) 54, (iii) 3; (b) (1) V ∩ C, (3) (V ∪ N)′ or V′ ∩ N′.
3.7. (a) n(𝒰) = n(A) + n(A′); (b) Let B = A′ and note that then n(A ∩ B) becomes n(∅) = 0.
3.9. 4.
(A ∪ B) ∪ C = A ∪ (B ∪ C)
4.3. (a)

 A   B   A′   B′   A′∩B′   (A′∩B′)′   A∪B
 ∈   ∈   ∉    ∉     ∉         ∈        ∈
 ∈   ∉   ∉    ∈     ∉         ∈        ∈
 ∉   ∈   ∈    ∉     ∉         ∈        ∈
 ∉   ∉   ∈    ∈     ∈         ∉        ∉

(e)

 A   B   C   A′   B′   C′   B∩C   A′∩(B∩C)   (A′∩(B∩C))′   A∪B′   A∪B′∪C′
 ∈   ∈   ∈   ∉    ∉    ∉    ∈       ∉            ∈           ∈       ∈
 ∈   ∈   ∉   ∉    ∉    ∈    ∉       ∉            ∈           ∈       ∈
 ∈   ∉   ∈   ∉    ∈    ∉    ∉       ∉            ∈           ∈       ∈
 ∈   ∉   ∉   ∉    ∈    ∈    ∉       ∉            ∈           ∈       ∈
 ∉   ∈   ∈   ∈    ∉    ∉    ∈       ∈            ∉           ∉       ∉
 ∉   ∈   ∉   ∈    ∉    ∈    ∉       ∉            ∈           ∉       ∈
 ∉   ∉   ∈   ∈    ∈    ∉    ∉       ∉            ∈           ∈       ∈
 ∉   ∉   ∉   ∈    ∈    ∈    ∉       ∉            ∈           ∈       ∈

4.5. (a) In Figure 9 of the text, both sides are represented by R1 & R2 & R3;
(b) In Figure 9 of the text, both sides are represented by R1 & R2 & R4; (e) In Figure 10 of the text, both sides are represented by R1 & R2 & R3 & R4 & R5 & R7 & R8.
4.7. (a) If (1) C ∪ B = B and (2) B ∪ W = W, then C ∪ W = W. Proof: C ∪ W = C ∪ (B ∪ W) = (C ∪ B) ∪ W = B ∪ W = W, by using (2), law 8a, (1), and (2), respectively.
4.9. (A ∩ B) ∩ (C ∩ D) = [(A ∩ B) ∩ C] ∩ D by law 8b,
= [A ∩ (B ∩ C)] ∩ D by law 8b.
4.11. (a)
conclude that A × B ⊂ C × D. The converse is false, but if either A = ∅ and B = ∅, or A ≠ ∅ and B ≠ ∅, then the converse is true.
5.5. (A × 𝒰) ∩ (𝒰 × B) = (A × 𝒰) ∩ ((A ∪ A′) × B)
= (A × 𝒰) ∩ ((A × B) ∪ (A′ × B)), by Problem 5.4,
= ((A × 𝒰) ∩ (A × B)) ∪ ((A × 𝒰) ∩ (A′ × B)), by 9b of Theorem 4.1,
= (A × B) ∪ ∅ = A × B,
since (A × 𝒰) ∩ (A × B) = (A × B) and (A × 𝒰) ∩ (A′ × B) = ∅.
5.7. (a) a = d, b = e, and c = f implies that (a, b, c) = (d, e, f). Conversely, (a, b, c) = (d, e, f) implies ((a, b), c) = ((d, e), f), which implies (a, b) = (d, e) and c = f, which in turn implies that a = d and b = e, proving the assertion.
(b) If the corresponding objects of the r-tuples are equal, the equality follows immediately. We now show that, conversely, equality of two r-tuples implies equality of corresponding objects, for every integer r > 1. We know the result is true for r = 2 and r = 3 by part (a). Assume the result is true for the integer k, where k > 1. Using the definition of an ordered (k + 1)-tuple,
(a1, a2, …, ak, ak+1) = (b1, b2, …, bk, bk+1) means
((a1, a2, …, ak), ak+1) = ((b1, b2, …, bk), bk+1). This equality of ordered pairs implies that ak+1 = bk+1 and (a1, a2, …, ak) = (b1, b2, …, bk). But by the induction hypothesis, it follows that a1 = b1, …, ak = bk, which establishes the result for r = k + 1. Hence we have shown that if the statement is true for r = k, where k is any integer greater than 1, then it is also true for r = k + 1. This, together with the result that it is true for r = 2, completes the proof for all integers r > 1.
Chapter 2
1.1. (a) The set D in (1.3);
(c) S = {PN, PD, PQ, NP, ND, NQ, DP, DN, DQ, QP, QN, QD};
(e) S = {(0, 2), (1, 1), (2, 0)}, where (0, 2), for example, represents the outcome of zero objects in cell 1 and two objects in cell 2;
(g) S = {FFF, FFM, FMF, FMM, MFF, MFM, MMF, MMM}; (i) S = {0, 1, 2, …, r}, the set of possible numbers of heads, or with more detail,
S = {(x1, …, xr) | xi ∈ A, i = 1, 2, …, r}, where A = {H, T}.
1.3. 4.
1.5. All are suitable except (b) and (e).
1.7. S = {1b, 1w, 2b, 2w} or S = {1b1, 1w1, 1w2, 2b2, 2b3, 2w3}.
2.1. (a) E = {…}; (c) E = {…}.
2.3. (a) Let A = {1, 2, …, 365}. Then
E = {3} × A × A × ⋯ × A (r − 1 A's in all),
F = A × {28} × A × ⋯ × A (r − 1 A's in all),
E ∩ F = {3} × {28} × A × ⋯ × A (r − 2 A's);
(b) n(E) = n(F) = 365^(r−1), n(E ∩ F) = 365^(r−2), and n(E ∪ F) = 729(365)^(r−2) (cf. Example 1.3.4).
2.5. The relations are readily seen from the following:
S = {(0, 2), (1, 1), (2, 0)}, E = {(0, 2)}, F = {(2, 0)}, E ∩ F = ∅.
3.1. (a) A; (c) TV
3.3. (a) S = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF, DEF}; assign probability 1/20 to each simple event; (c) ½; (e) ¼.
3.5. (a) If S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} and 1/8 is assigned as the probability of each simple event, then P(exactly two tails) = 3/8;
(c) If S = {(0, 2), (1, 1), (2, 0)} and we assign 1/3 as the probability of each of the three simple events, then P(one cell empty) = 2/3.
If we assign probabilities of 1/4, 1/2, and 1/4 to the simple events {(0, 2)}, {(1, 1)}, and {(2, 0)} respectively, then P(one cell empty) = 1/2. The latter assignment is preferred;
(e) If S = {(x1, x2, …, xr) | xi ∈ A, i = 1, 2, …, r}, where A = {H, T}, and we assign the same probability to each simple event of S, then P(all coins fall heads) = (1/2)^r;
(g) If S = {Sun., Mon., Tues., Wed., Thurs., Fri., Sat.} and we assign to each simple event the probability 1/7, then P(13th day falls on Sunday) = 1/7. But see American Mathematical Monthly, vol. 40 (1933), p. 607, for a demonstration that the 13th day is more likely to be Friday than any other day of the week.
3.7. P(E ∩ F) = (1/365)², P(E ∪ F) = 729/(365)².
3.9. (a) 123, 132, 213, 231, 312, 321;
(c) P(E1) = P(E2) = P(E3) = 1/3, P(Ei ∪ Ej) = 1/2, P(Ei ∩ Ej) = 1/6.
4.1. P(E) = 11/12; 11 to 1.
4.3. 5 to 4.
4.5. The extreme values occur when E ∩ F = ∅ and E ∩ F = E, respectively.
4.7. (a) S = {(x, y) | x ∈ D, y ∈ D, x ≠ y}, where D is defined in Example 1.3, and we assign probability 1/2652 to each simple event of S;
(b) …; (c) 25 to 1.
4.9. 0.8.
4.11. (a) 1/4; (b) 1/4; (c) If, for any k = 1, 2, 3, …, an integer p is selected at random from among the first 2(10)^k positive integers, then the probability that p is divisible by either 6 or 8 is 1/4.
4.13. (a) P(E′ ∪ F′) = 1 − P(E ∩ F), the probability of not both E and F; (c) P(E′ ∪ F) = 1 − P(E) + P(E ∩ F), the probability of F or not E; (e) P(E ∩ F′) = P(E) − P(E ∩ F), the probability of E but not F.
4.15. If E1 represents selecting a spade, E2 an honor card, and E3 a deuce, then
4.17. The theorem follows immediately from Formula (4.6), Definitions 4.2 and 3.3, and by noting that if E1, E2, and E3 are mutually exclusive in pairs, then
E1 ∩ E2 ∩ E3 = E1 ∩ (E2 ∩ E3) = E1 ∩ ∅ = ∅.
4.19. The theorem is true for k = 2 and k = 3. (Cf. Theorem 4.5 and Problem 4.17.) Assuming the theorem to be true for any k events (the induction hypothesis), we must show that it is true for k + 1 events. This, plus the fact that the theorem is true for k = 2, will complete the proof for all integers k > 1. Now
But by Theorem 1.4.2 and since E1, E2, …, Ek+1 are mutually exclusive in pairs, it follows that
(E1 ∪ E2 ∪ ⋯ ∪ Ek) ∩ Ek+1 = ∅.
Hence, by Theorem 4.5 and the induction hypothesis,
establishing the theorem for k + 1 events and thus completing the proof.
5.1. (a) 1 − 1/2! + 1/3! − 1/4! = 5/8;
(c) In general, the probability of at least one match is
1/1! − 1/2! + 1/3! − ⋯ ± 1/N!,
where N! denotes the product of the first N positive integers.
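The alternating-series probability of at least one match given above can be checked against a brute-force count over permutations (a Python sketch, not part of the text):

```python
from itertools import permutations
from math import factorial

def p_match_series(N):
    # 1/1! - 1/2! + 1/3! - ... (+ or -) 1/N!
    return sum((-1) ** (k + 1) / factorial(k) for k in range(1, N + 1))

def p_match_brute(N):
    # Fraction of the N! permutations with at least one fixed point (a "match").
    perms = list(permutations(range(N)))
    hits = sum(any(i == v for i, v in enumerate(p)) for p in perms)
    return hits / len(perms)

print(round(p_match_series(4), 6))   # 0.625, i.e. 5/8
print(round(p_match_brute(4), 6))    # 0.625
```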
r: i (A\ ik _ 1. /ON A . 1. O.J.. ^a; Q> \w 10 «
A 6 if 6
_2_ 1
5.3. Y = • (Our sample space contains ten elements, but the simple
"2T •
events are not assigned equal probabilities.) 5.5. M ' fw = 59, approximately.
5.7. (a) (i) J. (ii) i;
(b) (i) S = {(«!,   , x^) 1 xt € {H, T}3 1 == 1, 2,   , N} ; assign probability (J)^ to each simple event of S.
(ii) 2*V2* = . (Hi) (i)^(i)^1 = i. 5.9. ^ f, . 5.11. i
5.13. (a) .00359; (b) (1 − .00359)(.00380) = .00379;
(c) (1 − .00359)(1 − .00380)(.00396) = .00393.
5.15. First derive the identity in Problem 5.16(f) and then use P(E ∩ F) ≥ P(E)P(F), which follows from the given inequality.
5.17. a, 6, c, d, and e must satisfy the following equations:
i irl & 3 c 2
, ,
2 a+o 8 c+a+e 5
=  Solving these equations, we find the unique solution
c + d { e 5
« = A, & = A, c = i (2 = Tk, and e =
5.19. (a) Plan l:
Plan3:
6.1. (a) Define S = {(x, y) | x ∈ C, y ∈ C, and x ≠ y}, where C = {B1, B2, B3, B4, G1, G2}, the set of six children. Assign probability 1/30 to each simple event and note that there are ten elements in the subset E for which the second child is a girl. Thus P(E) = 1/3;
(b) P(E) = (4/6)(2/5) + (2/6)(1/5) = 1/3.
6.3. The probabilities needed are: P(E) = 0.254, P(E′) = 0.746, P(E1 | E) = …
6.5. f.
6.7. f == .90, approximately. 6.9. TV 6.11. J.
6.13. (E ∩ Ei) ⊂ Ei, i = 1, 2, …, n, by Definition 1.3.1, demonstrating condition (i) of Definition 6.1. Also, use Theorem 1.4.1 to show that for i ≠ j, (E ∩ Ei) ∩ (E ∩ Ej) = ∅ follows from the hypothesis that Ei ∩ Ej = ∅. Thus condition (ii) holds. Finally, since {E1, …, En} is a partition of S, if x is any element of E ⊂ S, then there exists some Ei such that x ∈ Ei. Then x ∈ (E ∩ Ei), which demonstrates condition (iii).
7.1. Independent events in (a) and (b), dependent in (c).
7.3. (a) P(E)P(F) ≠ P(E ∩ F), so E and F are dependent; (b) If n = 3, the events are independent. (Cf. Example 7.3.) To prove the "only if" part, we note that if S = {(x1, …, xn) | xi ∈ {H, T}, i = 1, 2, …, n} and we assign probability (1/2)^n to each simple event of S, then
If E and F are independent, then
((n + 1)/2^n)((2^n − 2)/2^n) = n/2^n,
which implies that 2^(n−1) = n + 1, or n = 3.
7.5. (a) Let S be the set of 7460 females in the sample. 1/7460; (c) 0.143; (e) 0.014, approximately.
7.7. All independent.
7.9. Since P(F) = 1 and, by Theorem 4.2, P(E ∪ F) ≥ P(F), it follows that P(E ∪ F) = 1. Now use Theorem 4.4 to show that
P(E)P(F) = P(E ∩ F).
7.11. No. For a counterexample, choose any event F with P(F) = 1 and let E and G be any dependent events. (Cf. Problem 7.9.)
7.13. (a) P(S)P(A) = (i)(J) * PCS n 4) = A; (b) 9. 8.1.
but P(E1)P(E2)P(E3) ≠ P(E1 ∩ E2 ∩ E3) = 0.
8.3. P((E1 ∩ E2) ∩ E3) = P(E1 ∩ E2 ∩ E3) = P(E1)P(E2)P(E3), since we know Equation (8.3) holds. But then P(E1)P(E2) = P(E1 ∩ E2), and thus P((E1 ∩ E2) ∩ E3) = P(E1 ∩ E2)P(E3). That E1 and E3 are not necessarily independent may be seen by letting E2 = ∅, and E1 and E3 be any dependent events.
8.5. Twice, with probability .46.
8.7. P(E1 ∩ E2) = P(E1)P(E2) by hypothesis. Consider the case P(E3) = 0. Then since 0 ≤ P(E1 ∩ E3) ≤ P(E3) = 0, it follows that
0 = P(E1 ∩ E3) = P(E1)P(E3). By an identical argument, P(E2 ∩ E3) = P(E2)P(E3). Similarly,
P(E1 ∩ E2 ∩ E3) = P(E1)P(E2)P(E3) = 0.
In the case P(E3) = 1, since P(E3) ≤ P(E1 ∪ E3), it follows that P(E1 ∪ E3) = 1. Then, by Theorem 4.4, P(E1 ∩ E3) = P(E1)P(E3). By the same argument P(E2 ∩ E3) = P(E2)P(E3). Also since P(E3) = 1, it follows that P(E1 ∪ E2 ∪ E3) = 1, and then, by using the result of Problem 4.14,
P(E1 ∩ E2 ∩ E3) = P(E1)P(E2)P(E3).
8.9. One needs to prove that if E1, E2, …, En are independent events, then E1′, E2′, …, En′ are also independent.
8.11. .012.
9.1. (a) Sample space is S × S × S, where S = {H, T}, and 1/8 is the probability assigned to each simple event of S × S × S; (b) Using the same sample space as in (a), we assign the probability p^k q^(3−k) to each simple event whose 3-tuple contains k H's and therefore 3 − k T's.
9.3. The sample space is S × S × ⋯ × S (ten S's), where
S = {correct, incorrect}. (1/6)^k (5/6)^(10−k) is the probability assigned to each simple event whose 10-tuple contains exactly k "corrects." P(9 or 10 correct) = 51(1/6)^10 = .00000084, approximately.
9.5. .784. 9.7. 7.
9.9. (a) Sn × Sn−1, 1/(n(n − 1)); (b) B and D; (d) B, D, A, and I.
10.1. (a) u1 = u2 = …, 2v1 = 2v2 = …
Chapter 3
1.23. (a) 165; (b) 21; (c) 3.
1.25. We give the number of different poker hands of each kind. The required probability is this number divided by 2,598,960, the total number of poker hands. (a) 1,302,540; (b) 123,552; (c) 54,912; (d) 10,200; (e) 5108; (f) 3744; (g) 624; (h) 40.
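Several of the counts in 1.25 can be rederived from elementary combinatorics; a sketch (Python, not from the text — only a few of the eight kinds are computed here, under the standard 5-card poker definitions):

```python
from math import comb

total_hands    = comb(52, 5)                     # 2,598,960 possible hands
straight_flush = 10 * 4                          # (h): 10 straight ranks, 4 suits
four_of_a_kind = 13 * comb(48, 1)                # (g): rank of the four, then fifth card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)   # (f): triple rank, then pair rank
flush = 4 * comb(13, 5) - straight_flush         # (e): flushes that are not straights

print(total_hands, flush, full_house, four_of_a_kind, straight_flush)
```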
1.27. (a) S = S1 × S1 × S1 × S1, but the probabilities of simple events of S are not assigned according to the product rule. In general, knowledge of any hand changes the probability of any of the other hands having a certain makeup;
answer to (c); (d) Refer to Formula II.9.8, using P(Ei) = P(Ci).
1.29. (a) .13, approximately; (b) .21, approximately; (c) .11, approximately.
1.31. From the preceding problem, the probability that the queen falls is
.407 + (1/4)(.497) = .531. Hence the odds are approximately 53 to 47, or 1.13 to 1.
2.1. (a) p⁵;
(c) a⁴ − 12a³b + 54a²b² − 108ab³ + 81b⁴.
2.3. (a) 1.072; (b) 1.219; (c) .922.
2.5. C(n−1, r−1) + C(n−1, r) = (n − 1)!/[(r − 1)!(n − r)!] + (n − 1)!/[r!(n − r − 1)!]
= [r(n − 1)! + (n − r)(n − 1)!]/[r!(n − r)!]
= n!/[r!(n − r)!] = C(n, r).
2.7. C(n, r + 1) = n!/[(r + 1)!(n − r − 1)!] = [(n − r)/(r + 1)] · n!/[r!(n − r)!]
= [(n − r)/(r + 1)] C(n, r).
2.9. (a) C(x, r) = x(x − 1)(x − 2) ⋯ (x − r + 1)/r!. If x is an integer such that 0 ≤ x < r, then a term of the numerator above is zero. Hence C(x, r) = 0, as defined in Equation (2.10). If x ≥ r and an integer, then by multiplying the above expression by (x − r)!/(x − r)!, we have
C(x, r) = x!/[r!(x − r)!], as previously defined.
2.11. (a) 1; (b) 252; (c) 12,600.
Chapter 4
1.1. (a) f(x) = 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3, respectively; f(x) = 0 otherwise; (b) F(x) = 0 if x < 0, F(x) = 1/8 if 0 ≤ x < 1, F(x) = 1/2 if 1 ≤ x < 2, F(x) = 7/8 if 2 ≤ x < 3, F(x) = 1 if x ≥ 3.
1.3. (a) /(z) = ff, 4, 4i for x = 0, 1, 2, respectively, /(x) = 0 otherwise;
(b) F(x) = Oifx<0,F(x) = HifO 2.
1.5. (a) k = i; (b) J, 1, ; (c) 2; (d) F(x) = 0 if x < 0, F(x)  * if 0 < a? < 1, F(x) = J if 1 < x < 2, F(x) = 1 if x > 2.
i7. (b) i i, A, i i i 1, *;
(c) /(») = i, i ,  for x = 1, 1, 2, 3, respectively, /(x) = 0 otherwise.
1.9. f(x) = C(13, x) C(39, 13 − x)/C(52, 13) for x = 0, 1, …, 13; one finds (with three decimal place accuracy)
f(x) = .013, .080, .206, .286, .239, .125, .042, .009, .001 for x = 0, 1, 2, 3, 4, 5, 6, 7, 8, respectively, and f(x) = 0 (to three decimal places) for all other x.
1.11. (a) The event (X ≤ b) is the union of the two mutually exclusive events (X ≤ a) and (a < X ≤ b). Hence F(b) = F(a) + P(a < X ≤ b), from which the result follows;
(c) The event (a ≤ X ≤ b) is the union of the mutually exclusive events (X = a) and (a < X ≤ b).
2.21. (a) Unique if p ≠ F(xk) for k = 1, 2, …, N. If there is a possible value xk of X for which F(xk) = p, then there are infinitely many medians.
2.23. From the hypothesis it follows that if a + dj is a possible value of X for any number dj, then a − dj is also a possible value and
f(a + dj) = f(a − dj). Suppose there are p such pairs. Then
E(X) = a f(a) + Σ (a + dj)f(a + dj) + Σ (a − dj)f(a − dj)    (sums over j = 1 to p)
= a[f(a) + Σ f(a + dj) + Σ f(a − dj)] = a,
since the sum in brackets is the sum of f(xk) over all possible values xk of X, and hence equals 1.
3.1. μX = …, σX² = …, σX = 0.43, approximately;
μY = 7000, σY² = 4,500,000.
3.17. In each case, Chebyshev's inequality says the probability is greater than 5/9 for z = 1.5 and greater than 3/4 for z = 2. The actual probabilities are, for z = 1.5 and z = 2 respectively: (a) 1, 1; (b) …, …; (c) …, 1.
3.19. z = √2, √10, √20, 10.
4.1.

  z \ y       0     1     2     3     P(Z = z)
   1          0    3/8   3/8    0       3/4
   3         1/8    0     0    1/8      1/4
  P(Y = y)   1/8   3/8   3/8   1/8       1

Y and Z are dependent.
4.3.

  x \ y       0      1      2      3     P(X = x)
   0          0     6/27    0      0       2/9
   1         6/27   6/27   6/27    0       6/9
   2         2/27    0      0     1/27     1/9
  P(Y = y)   8/27  12/27   6/27   1/27      1

X and Y are dependent.
4.5. h(x, y) = C(13, x) C(13, y) C(26, 13 − x − y)/C(52, 13), where x and y are any nonnegative integers for which x + y ≤ 13; h(x, y) = 0 otherwise. X and Y are dependent since h(13, 13) = 0, but
P(X = 13) = P(Y = 13) > 0, so that h(13, 13) ≠ f(13)g(13).
4.7. Independence follows from Theorem 4.2 by considering the four tosses as two independent trials of two tosses each.
4.9. (a)

  x \ y       1       2       3       4       5       P(X = x)
   1         1/25    1/25    1/25    1/25    1/25       1/5
   2          0      1/20    1/20    1/20    1/20       1/5
   3          0       0      1/15    1/15    1/15       1/5
   4          0       0       0      1/10    1/10       1/5
   5          0       0       0       0      1/5        1/5
  P(Y = y)  12/300  27/300  47/300  77/300  137/300      1
(c) P(Y = y | X = 3) = 1/3 for y = 3, 4, 5;
(d) P(X = x | Y = 3) = 12/47, 15/47, 20/47 for x = 1, 2, 3, respectively;
(e) …
4.11. F(x, z) = 0 if x < 0 or if z < 1; … if 0 ≤ x < 1 and 1 ≤ z < 3; … if 0 ≤ x < 1 and 3 ≤ z; … if 1 ≤ x and 1 ≤ z < 3; 1 if 1 ≤ x and 3 ≤ z.
4.13. Using the notation in Table 29, let c = f(xj)/f(xi) and show that if X and Y are independent, then for k = 1, 2, …, N we have
4.15. (c) Let X have exactly two possible values differing only in sign, say +1 and −1. Let Y be any random variable such that X and Y are dependent. Since X² has only one possible value, X² and Y² are independent.
5.1. (a) Let f(x) = P(X + Y = x). Then f(x) = .1, .2, .3, .4 for x = 2, 3, 4, 5, respectively, and f(x) = 0 otherwise; E(X + Y) = 4;
(b) Let g(x) = P(XY = x). Then g(x) = .1, .2, .1, .2, .4 for x = 1, 2, 3, 4, 6, respectively, and g(x) = 0 otherwise; E(XY) = 4.
5.3. (a) Not true for all X, Y. False for the random variables in Problem 5.2(a). True if Y = X, for example;
(c) False for the random variables in Problem 5.2(c). True if Y = X; (e) True for all X, Y.
5.5. (a) 80 and 13; (b) 20 and 13; (c) 210 and √1396.
5.7. (a) Y(ok) = b for all ok ∈ S; i.e., Y is a constant function. Note that our notation does not distinguish a constant function from the number that is its constant value;
(b) Y is the function equal to a for all ok ∈ S. X and Y are independent by the result in Problem 4.10.
5.9. First generalize Theorem 5.1 to functions of n random variables, and thus show that
E(X1X2 ⋯ Xn) = Σ v1v2 ⋯ vn h(v1, v2, …, vn),
where h(v1, v2, …, vn) = P(X1 = v1, X2 = v2, …, Xn = vn) and the sum extends over all possible values v1 of X1, v2 of X2, …, vn of Xn. But by Definition 4.4,
h(v1, v2, …, vn) = f1(v1)f2(v2) ⋯ fn(vn),
where fk is the probability function of Xk. Hence (as in Theorem 5.4, where n = 2), the sum can be written as a product of the n sums Σ vk fk(vk) for k = 1, 2, …, n. Since the kth sum extends over all possible values vk of Xk, it is equal to E(Xk), and the result follows. (Note: A proof by mathematical induction is also possible. In such a proof one needs to use the following fact: If X1, X2, …, Xn are independent and if Y = X1X2 ⋯ Xn−1, then Y and Xn are independent. This result can be proved by a method similar to that used below in the solution to part (a) of Problem 5.11.)
5.11. (a) Let x and y be any numbers. Then
P(Yk = y, Xk+1 = x) = Σ P(X1 = v1, …, Xk = vk, Xk+1 = x),
the summation extending over all values v1 of X1, …, vk of Xk such that a1v1 + ⋯ + akvk = y. Now we invoke Definition 4.4 to obtain
P(Yk = y, Xk+1 = x) = Σ P(X1 = v1) ⋯ P(Xk = vk)P(Xk+1 = x).
Since x is a constant with respect to this summation, the term P(Xk+1 = x) can be placed before the summation sign. The remaining sum is just P(Yk = y). Hence,
P(Yk = y, Xk+1 = x) = P(Yk = y)P(Xk+1 = x),
which proves the independence of Yk and Xk+1.
(b) Let I be the set of positive integers for which the theorem is true. By Theorem 3.3 and (5.10), we know that 1 ∈ I and 2 ∈ I. Now let us assume that k ∈ I and show that then (k + 1) ∈ I for any integer k.
When there are k + 1 independent random variables X1, …, Xk+1, then by part (a), Yk and Xk+1 are independent. Hence, by (5.10),
Var(Yk + ak+1Xk+1) = Var(Yk) + ak+1² Var(Xk+1).
Now use the induction hypothesis (that k ∈ I) to expand Var(Yk) and thus show that (k + 1) ∈ I. This completes the proof.
5.13. (a) μX = 1, σX = 1/√2; (b) The probability function of X̄ for samples of size 2 is given by:

  x̄           0     1/2     1     3/2     2
  P(X̄ = x̄)  1/16   4/16   6/16   4/16   1/16

μX̄ = 1 and σX̄ = 1/2; (c) The probability function of X̄ for samples of size 3 is given by:

  x̄           0     1/3    2/3     1     4/3    5/3     2
  P(X̄ = x̄)  1/64   6/64  15/64  20/64  15/64   6/64   1/64

5.15. (a) μX = $5450, σX² = 3,322,500, σX = $1823, approximately; (b) The probability function of X̄ is given by:

  x̄          3500   4250   5000   5500   6250   7000   7500   8250   9000
  P(X̄ = x̄)  .09    .24    .16    .12    .22    .08    .04    .04    .01

μX̄ = $5450, σX̄² = 1,661,250, σX̄ = $1289, approximately.
6.1. Write μi for E(Xi) and use the definition of variance together with (5.4) to obtain
Var(X1 + ⋯ + Xn) = E([(X1 − μ1) + ⋯ + (Xn − μn)]²).
Now perform the indicated squaring of the sum in brackets and use Definitions 3.1 and 6.1 to complete the proof.
6.3. (a) Letting f(x) = P(X = x), we have
f(x) = .1, .4, .2, .1, .2
for x = …, and f(x) = 0 otherwise; E(X) = 108, Var(X) = …;
(c) f(x) = 1 for x = 108, f(x) = 0 otherwise; E(X) = 108, Var(X) = 0.
6.5.
6.17. ρm = −1 if m < 0; ρm = 0 if m = 0; ρm = 1 if m > 0.
6.19. Without loss of generality (see Problem 6.18), we can assume that X and Y each have possible values 0 and 1. Then
Cov(X, Y) = P(X = 1, Y = 1) − P(X = 1)P(Y = 1),
so that Cov(X, Y) = 0 implies
P(X = 1, Y = 1) = P(X = 1)P(Y = 1).
Show then that the other three joint probabilities must also be products of the corresponding marginal probabilities.
Chapter 5
1.1. Probabilities are .107 for (a) and .069 for (b).
1.3. .000006.
1.5. Using Theorem 1.2, find p = .60 and n = 20. Required probabilities are (a) .998 (b) .126 (c) .245.
1.7. (a) .772; (b) .746; (c) p = .376.
1.9. n > 69.
1.11. Corresponding to the values of p given in Table 37, the probabilities of accepting the lot are .983, .736, .392, .069, .008, .001, and .000 in part (a), and .904, .599, .349, .107, .028, .006, and .001 in part (c).
1.13. (a) By (1.3), b(n - k | n, 1 - p) = C(n, n - k)(1 - p)^{n-k} p^{k}. Now recall that C(n, n - k) = C(n, k), so that b(n - k | n, 1 - p) = b(k | n, p).
(b) By (a),
Σ_{k=r}^{n} b(k | n, p) = Σ_{k=r}^{n} b(n - k | n, 1 - p) = Σ_{k=0}^{n-r} b(k | n, 1 - p).
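Both parts of 1.13 are easy to confirm numerically; a sketch (the values of n, p, and r below are arbitrary choices for the check):

```python
from math import comb

def b(k, n, p):
    """Binomial probability b(k | n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, r = 12, 0.3, 5   # arbitrary choices for the check
# (a) term-by-term symmetry: b(k | n, p) = b(n - k | n, 1 - p)
part_a = all(abs(b(k, n, p) - b(n - k, n, 1 - p)) < 1e-12 for k in range(n + 1))
# (b) an upper tail in p equals the matching lower tail in 1 - p
upper = sum(b(k, n, p) for k in range(r, n + 1))
lower = sum(b(k, n, 1 - p) for k in range(0, n - r + 1))
print(part_a and abs(upper - lower) < 1e-12)  # True
```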
1.15. (The answer is a table comparing the probabilities for n = 5, n = 10, and n = 20 with the normal approximation; its entries are not recoverable from this copy.)
1.17. G(t) = Σ_{k=0}^{n} C(n, k)(pt)^k q^{n-k}, which equals (q + pt)^n by the binomial theorem.
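The identity in 1.17 can be spot-checked by evaluating both sides at a few values of t; a sketch (n and p below are arbitrary choices for the check):

```python
from math import comb

# Check G(t) = sum_k C(n, k) (pt)^k q^(n-k) against the closed form (q + pt)^n.
n, p = 10, 0.4   # arbitrary choices for the check
q = 1 - p
for t in (0.0, 0.5, 1.0, 2.0, -1.5):
    g = sum(comb(n, k) * (p * t)**k * q**(n - k) for k in range(n + 1))
    assert abs(g - (q + p * t)**n) < 1e-9
print("ok")
```

At t = 1 both sides equal 1, as any probability generating function must.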
2.1. (a) Accept null hypothesis if X ≥ 0; i.e., accept no matter what result is obtained from the sample of 20. The probability of an error of the second kind is then 1.
(b) Reject the null hypothesis no matter what value of X occurs. Then a = 1.
(c) Reject null hypothesis if and only if X > 16. From Table 40 with c = 16, find 1 - π(.70) = .762, 1 - π(.80) = .370, π(.50) = .006.
(d) .126, accept null hypothesis and wage very expensive campaign.
(e) P(X ≥ 17) = .016 and P(X ≥ 18) = .004; therefore at least 18 must favor Smith.
2.3. (a) Null hypothesis: p = 1/2; Alternate hypothesis: p > 1/2. From Table 37, find
P(X ≥ 14) = .058, P(X ≥ 15) = .021.
Hence reject null hypothesis if and only if X, the number who revert to the first-learned method, is at least 15.
(b) Significant at the 5% level, since P(X ≥ 15) = .021 < .05. Not significant at the 1% level.
2.5. (a) Let a trial (observing a machine for a day) result in success (machine needs repair) or failure (machine does not need repair). Let p = probability of a success. Null hypothesis: p = .20; Alternate hypothesis: p ≠ .20.
(b) Since the mean number of successes is np = 4 if the null hypothesis is true, we reject the null hypothesis if X, the number of successes observed, is either too much larger or too much smaller than four; i.e., we reject the null hypothesis if and only if either X ≤ 4 - d or X ≥ 4 + d, where d denotes the smallest deviation from the mean that makes X "too much" larger or "too much" smaller than the mean. The number d is determined by requiring the probability of an error of the first kind to be no larger than .10 but as close to .10 as possible. This error probability is P(X ≤ 4 - d) + P(X ≥ 4 + d), calculated for p = .20. If d = 3, this probability is greater than .10; if d = 4, it is less than .10 (from Table 37). Hence, reject the null hypothesis if and only if X = 0 or X ≥ 8. (This is called a two-tailed test.)
(c) The probability that X deviates from its mean in either direction by at least as much as the observed value does is P(X ≤ 1) + P(X ≥ 7) = .069 + .087 = .156. Not significant at the .10 level.
(d) The ideal decision rule has π(p) = 0 if p = .20, π(p) = 1 if p ≠ .20.
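The arithmetic in 2.5(b) and (c) can be reproduced directly. Since np = 4 under the null hypothesis p = .20, the sample size must be n = 20 (an inference; the problem statement is not reprinted here). A sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0 = 20, 0.20   # n = 20 follows from np = 4 under p = .20

def two_tail_prob(d):
    """P(X <= 4 - d) + P(X >= 4 + d) under the null hypothesis."""
    lo = sum(binom_pmf(k, n, p0) for k in range(0, 4 - d + 1)) if 4 - d >= 0 else 0.0
    hi = sum(binom_pmf(k, n, p0) for k in range(4 + d, n + 1))
    return lo + hi

print(round(two_tail_prob(3), 3))  # 0.156  (> .10, so d = 3 is too small)
print(round(two_tail_prob(4), 3))  # 0.044  (<= .10: reject iff X = 0 or X >= 8)
```

Note that two_tail_prob(3) reproduces the .069 + .087 = .156 figure quoted in part (c).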
INDEX
ABSOLUTE value, 191
Acceptable assignment of probabilities, 55
for repeated trials, 114, 120
Algebra of sets, 28-38
Alternate hypothesis, 273
A posteriori probability, 94
A priori probability, 94
Associative laws (for sets), 29
Average (see Mean)
Average covariance, 247
Average fraction defective, 254, 286
BAYES' formula, 93
Bayes, Thomas, 91
Bernoulli, James, 228, 253
Bernoulli process, 254
for production run, 286
by sampling with replacement, 264
Bernoulli trials, 253
Best decision rule, 291
Binomial coefficients, 150
generalized, 157
identities for, 156, 157
Pascal's triangle, 151
properties of, 151 ff.
recursion formula, 156
Binomial distribution, 256
fitted to frequency distribution, 266
generating function, 270
as limit of hypergeometric, 271
maximum value, 269
mean, 265
recursion formula, 269
standard deviation, 265
standardized, 267
table, 259-60
variance, 265
Binomial parameter p, maximum-likelihood estimate, 272
Binomial probabilities, approximation for hypergeometric, 271
Binomial probability function (see Binomial distribution)
Binomial theorem, 149, 257
Birthday example, 48, 52, 59, 88
Blocking coalition, 27
Blood type, 22
Boole, George, 34
Boolean algebra, 34
Brace notation (for sets), 3
Bridge, 49, 139, 148, 210, 268
CANTOR, Georg, 1
Card guessing, 196, 219, 247
Cards (see Bridge; Card guessing; Matching of cards; Poker)
Cartesian product, 40
graph of, 41
number of elements, 43
Center of gravity, 179
Certain event, 52
probability of, 65
Changes in scale and location of origin, effect of:
on correlation coefficient, 249
on mean, 178
on variance, 191
Characteristic random variable, 171
Chebyshev, P. L., 192
Chebyshev's inequality, 194
applied to binomial, 267
generalized, 197
in proof of law of large numbers, 226
Chuck-a-luck, 182
Coalitions, 27
Commutative laws (for sets), 29
Complementary event, 52
probability, 67
Complement (of a set), 17
laws for, 28
Composite function, 175
Compound experiment, 78
Compound probabilities, theorem on, 78
Conditional mean, 249
Conditional probability, 76
Conditional probability function, 206
Correlation coefficient, 241
effect of changes in scale and location of origin, 249
properties of, 242-246
Correlated random variables, 241
Cost of sample inspection, 287
Counting, fundamental principle of, 9
Counting techniques, 132-148
fundamental principle, 9
objects in cells, 133, 135
ordered r-tuples, 132, 135
permutations, 132, 135, 139
in probability problems, 139-48
r-subsets, 133, 135
Covariance, 232
average, 247
Critical set, 274, 280
Cross-partition, 101
Cumulative binomial probabilities, table, 259-60
Cumulative distribution function, 171 (see also Distribution function)
DECISION-MAKING example, 286
Decision rule:
ideal, 276
(n, c) type, 290
operating-characteristic curve, 90, 269
power function, 275
in sampling inspection, 90, 269
to test null hypothesis, 274
De Morgan's laws, 29
generalized, 36
proof by membership table, 31
verification by Venn diagram, 32
Dependent events, 102
Dependent random variables, 204
Descartes, René, 41
Dice, 12, 47, 57, 58, 160-65, 173, 190, 195
Dictator, 27
Difference equation, 128
Disjoint sets, 20
Dispersion, 185
Distribution function, 163
graph, 164
joint, 211
properties, 166
Distribution table, 164
Distributive laws (for sets), 29
generalized, 36
proof by membership table, 33
verification by Venn diagram, 38
Domain (of a function), 158
Dominant gene, 124
Duality principle, 34
ELEMENT (of a set), 2
Empty set, 5, 6
as impossible event, 52
probability of, 58
Equally likely outcomes, 69
Equivalence relation, 7
Error, of estimate, 251
of first kind, 274
of second kind, 274
Estimation, statistical, 181
of binomial p, 265
in hypergeometric, 271
maximum-likelihood method, 271
Euclid, 3
Eugenics example, 129
Events, characteristic random variable of, 171
dependent, 102
determined by a trial, 117, 120
glossary of terms, 52
as hypotheses, 94
independent, 102
from independent trials, 118, 120
mutually exclusive in pairs, 73
pairwise independence, 107
probability of, 58
probability of union of, 67
simple, 55
as subsets, 51 ff.
Expected value, 173
Experiments, mathematical description, 46
compound, 78
repeated, 113, 120
FACTORIAL (n!), 134
logarithms (table), 141
Failure, as result of trial, 253
Fair coins, dice, 61
"Favorable" outcomes, 69
Finite sample space, restriction to, 48
Finite set, 2
Fitted binomial distribution, 266
Flags, numbers on, 64
Frequency distribution, 221, 234, 266
Function, definition, 158
of random variable, 175, 213
regression, 249
Function-machine, 159, 175, 213
Fundamental principle of counting, 9
used to prove counting theorems, 135 ff.
GENERATING function, 270
Genetics example, 123-131
Greatest integer symbol, 69
Guessing of cards, 196, 219, 247
HARDY-WEINBERG law, 128
Hypergeometric distribution, 270-72
Hypotheses (see Testing statistical hypotheses)
IDEAL decision rule, 276
Idempotent laws (for sets), 28
Identically distributed random variables, 168
in sampling with replacement, 222
in sampling without replacement, 234
Identity laws (for sets), 28
Impossible event, 52
Independent events, 102, 109, 111
from independent trials, 118
multiplication rule for, 102, 109
pairwise, 107
Independent partitions, 107
Independent random variables, 204, 209
correlation coefficient, 242
in sampling with replacement, 222
in sampling without replacement, 234
Independent trials, 114, 120
in Bernoulli process, 253
product rule, 115, 120
Infinite set, 2, 3, 47
Inspection, sampling, 90, 269, 290
Intersection of sets, 17, 35
JOINT distribution function, 211
Joint probability function, 197 ff.
definition, 202
table, 203
LAPLACE definition of probability, 70
Law of large numbers, 226
Leastsquares criterion, 250
Limit theorem, 271
Linear function, 244
Linear regression function, 250
Logarithms (tables), 141
Losing coalition, 27
MARGINAL probability function, 200, 204
Master mechanic, 287
Matching of cards, 63, 74, 196, 219, 247
Mathematical expectation, 173
Mathematical induction, 36, 81, 216
Maximum-likelihood estimation, 271
Mechanical interpretation, of mean, 178
of variance, 195
Mean, of binomial, 265
conditional, 249
of function of random variable, 176, 215
of hypergeometric, 271
mechanical interpretation, 178
of product of random variables, 217
of random variable, 172-84
of sample mean, with replacement, 225
without replacement, 237
of sample variance, 231
of standardized random variable, 192
Mean absolute deviation, 196
Mean cost of uncertainty, 290
Median, 172, 184
Membership table, 30 ff.
Mendelian mating, 125
Mode, 172, 184
Moment of inertia, 195
Mortality table, 88, 105
Most probable number of successes, 26^
Multinomial coefficient, 154
Multinomial theorem, 154
Multiplication rule, 102, 109
Mutation, 126
Mutually exclusive events, 52
in pairs, 73
probability of union, 67
Mutually exclusive sets, 20
(n, c) DECISION rule, 290
Normal probability curve, 267, 270
Null hypothesis, 273 ff.
Null set (see Empty set)
Number of elements in a set, 2, 20, 43
Number of r-subsets, 133, 135
Numbers on flags, 64
OBSERVED event, statistical significance, 282
Odds, 70
Operating-characteristic curve, 90, 269
Ordered n-tuple, 42, 132
Ordered pair, 39
Outcomes, as elements of sample space, 46
equally likely, 69
"favorable," 69
PAIRWISE independence of events, 107
Panmixia, 125, 128
Parallel-axis theorem, 196
Parameters, of binomial, 256
of hypergeometric, 270
Partition, 91
cross-partition, 107
independent, 107
Pascal's triangle, 151
Percentile, 184
Permutation, 63, 132, 135, 139
Poker, 10, 143, 147, 148
Polya urn model, 87, 99
Population mean, 222, 225
Population, sampling from, 222, 234
Population variance, 222, 231
Possible values of random variable, 166
Power function, 275 ff.
Prime number, 3
Probability, acceptable assignment to simple events, 55, 114, 120
a posteriori, 94
a priori, 94
basic definition, 58
of complementary events, 67
conditional, 76
of empty set, 58
extreme values, 66
interpretations of, 228
odds, 70
of union, 67
Probability chart, 162
Probability function, 161
binomial, 256
conditional, 206
graph, 162
hypergeometric, 270
joint, 202
marginal, 200, 204
properties, 165
Probability table, 161
Product rule for independent trials, 115, 120
Product set (see Cartesian product)
Production run, 286
QUARTILES, 184
RANDOM mating, 125
Random process, 254
Random selection, 61 (see also Sampling)
Random variables, binomial, 257 ff. (see also Distribution function; Independent random variables; Probability function)
characteristic, 171
as composite function, 175
conditional mean, 249
conditional probability function, 206
correlation coefficient, 241
covariance, 232, 247
defined as function, 159
dependent, 204
determined by trials, 209
equal with probability one, 245
identically distributed, 168
mean, 172
mean absolute deviation, 196
median, 172
mode, 172
possible values, 166
regression function, 249
standard deviation, 187
standardized, 192, 241, 267
variance, 187
Range (of a function), 158
Rate of mortality, 89
Recessive gene, 124
Recursion formula, binomial coefficients, 156
binomial distribution, 269
Reflexive relation, 7, 8
Regression function, 249
Regression under stress, 284
Relation, equivalence, 7
Rencontre, problem of, 74
Repeated experiments, 113, 120
Roster method, 2
SAMPLE mean, 223 ff.
mean of, 225, 237
probability function, 224, 237
variance of, 225, 237
Sample run, 288
Sample space, 45 ff.
for Bernoulli trials, 253, 255
as certain event, 52
for compound experiment, 80
infinite, 47
for repeated trials, 113, 120
restriction to finite, 48
Sample variance, 231
Sampling inspection plan, 90, 269, 290
Sampling theory, 181
Sampling, with replacement, 115, 205, 221 ff., 239, 264
without replacement, 123, 205, 234 ff., 270
Selection force, 126
Series competition, 261
Sets, algebra of, 28 (see also Events; Sample spaces; Subsets)
associative laws, 29
brace notation, 3
Cartesian product, 40
commutative laws, 29
complement, 17, 28
defining property, 2
De Morgan's laws, 29
disjoint, 20
distributive laws, 29
element, 2
equality, 5
finite, 2
graph, 4
idempotent laws, 28
identity laws, 28
intersection, 17
mutually exclusive, 20
number of elements, 2, 20, 43
partition, 91
roster method, 2
symmetric difference, 37
union, 17
universal, 16
Significance level, 282
Simple event, 55
Smoking habits, 105
Square root, 191
Standard deviation, 187
of binomial, 265
of hypergeometric, 271
of sample mean, 226
Standard score, 192
Standardized random variable, 192, 241
binomial, 267
Statistical estimation (see Estimation)
Statistical hypothesis, 274 (see also Testing statistical hypotheses)
Statistical significance, 282
Step function, 164, 166
Stochastic process, 254
Subset, 8 (see also Events)
number of, 11, 150
r-subset, 133, 135
Success, as result of trial, 253
Sums of random variables, 212 ff.
mean, 215, 216
variance, 218, 219, 233
Symmetric difference, 37
Symmetric relation, 7, 8
TABLES, common logarithms, 141
cumulative binomial probabilities, 259-60
joint probabilities, 203
logarithms of factorials, 141
Testing statistical hypotheses, 272-285
critical set, 274, 280
decision rule, 274 ff.
errors, 274
power function, 275
significance tests, 282
Transitive relation, 7, 8
Tree diagram, 9, 78, 83, 84, 94, 96
Trials, Bernoulli, 253 (see also Independent trials)
determine an event, 117, 120
repeated, 113, 120
Type I error, 274
maximum tolerable, 279
Type II error, 274
UNCERTAINTY, cost of, 290
Uncorrelated random variables, 241
Union, of events, 67
of sets, 17, 35
of simple events, 55
Unit subset, 55
Universal set, 16
Urn problems, 64, 78, 81-85, 87, 96, 101, 170, 183, 205
VALID argument, 37
Value (of a function), 158
Variance, of binomial, 265
of hypergeometric, 271
mechanical interpretation, 195
of random variable, 187 ff.
sample, 231
of sample mean, with replacement, 225
without replacement, 237
of standardized variable, 192
of sums, 218, 219, 233
Venn, John, 19
Venn diagram, 19
Veto power, 27
WEATHER forecasting problem, 99
Well-defined collection, 2
Winning coalition, 27
World Series example, 261
X-RAY test, 93