Chapter 20: Learning Probabilistic Models — Exercises
Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.), end-of-chapter exercise prompts. The full text is included on this site (the same material as the exercises/ch20/ sources in the aima monorepo).
Exercise 20.1
The data used for Figure bayes-candy-figure can be viewed as being generated by hypothesis $h_5$. For each of the other four hypotheses, generate a data set of length 100 and plot the corresponding graphs for $P(h_i \mid d_1, \ldots, d_N)$ and $P(D_{N+1} = lime \mid d_1, \ldots, d_N)$. Comment on your results.
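As a starting point for the simulation, here is a minimal Python sketch. It assumes the standard candy setup from the chapter: five hypotheses with lime fractions 0, 0.25, 0.5, 0.75, 1 and priors 0.1, 0.2, 0.4, 0.2, 0.1 (check these numbers against your copy of the text).

```python
import random

# Assumed candy-bag hypotheses: h1..h5 have these lime fractions and priors.
LIME_FRACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]
PRIORS = [0.1, 0.2, 0.4, 0.2, 0.1]

def generate_data(true_h, n, seed=0):
    """Draw n candies (1 = lime, 0 = cherry) from hypothesis true_h (0-based index)."""
    rng = random.Random(seed)
    theta = LIME_FRACTIONS[true_h]
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def posteriors(data):
    """After each observation, record (P(h_i | d_1..d_j), P(next = lime | d_1..d_j))."""
    post = PRIORS[:]
    history = []
    for d in data:
        likes = [th if d == 1 else 1.0 - th for th in LIME_FRACTIONS]
        post = [p * l for p, l in zip(post, likes)]
        z = sum(post)
        post = [p / z for p in post]  # normalize; z > 0 because h3 never rules data out
        p_lime = sum(p * th for p, th in zip(post, LIME_FRACTIONS))
        history.append((post[:], p_lime))
    return history

history = posteriors(generate_data(true_h=1, n=100))  # h2: 25% lime
final_post, final_p_lime = history[-1]
```

Plotting `history` for each generating hypothesis gives the curves the exercise asks for.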
Exercise 20.2
Repeat Exercise 20.1, this time plotting the values of $P(D_{N+1} = lime \mid h_{MAP})$ and $P(D_{N+1} = lime \mid h_{ML})$.
Exercise 20.3
Suppose that Ann's utilities for cherry and lime candies are $c_A$ and $\ell_A$, whereas Bob's utilities are $c_B$ and $\ell_B$. (But once Ann has unwrapped a piece of candy, Bob won't buy it.) Presumably, if Bob likes lime candies much more than Ann does, it would be wise for Ann to sell her bag of candies once she is sufficiently sure of its lime content. On the other hand, if Ann unwraps too many candies in the process, the bag will be worth less. Discuss the problem of determining the optimal point at which to sell the bag. Determine the expected utility of the optimal procedure, given the prior distribution from the chapter's section on statistical learning.
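To make the trade-off concrete, here is a tiny sketch with placeholder numbers (the exercise's utility symbols were lost in extraction, so $c_A = 1$, $\ell_A = 2$, $c_B = 1$, $\ell_B = 4$ below are hypothetical, and assuming Bob would pay his expected utility for the unopened remainder):

```python
# Hypothetical utilities for illustration only.
C_A, L_A = 1.0, 2.0   # Ann's utility for a cherry / lime candy
C_B, L_B = 1.0, 4.0   # Bob's utility for a cherry / lime candy

# Assumed prior over bag types (lime fraction, prior) from the chapter's candy example.
LIME_FRACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]
PRIORS = [0.1, 0.2, 0.4, 0.2, 0.1]

def sell_vs_keep(posterior, candies_left):
    """Expected value of the unopened remainder to Ann (keep) and to Bob (sell)."""
    p_lime = sum(p * th for p, th in zip(posterior, LIME_FRACTIONS))
    keep = candies_left * (p_lime * L_A + (1 - p_lime) * C_A)
    sell = candies_left * (p_lime * L_B + (1 - p_lime) * C_B)
    return keep, sell
```

The exercise proper asks for the optimal stopping point, which requires also accounting for the value of the information gained by unwrapping one more candy; this function only scores the two terminal choices at a given posterior.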
Exercise 20.4
Two statisticians go to the doctor and are both given the same prognosis: a 40% chance that the problem is the deadly disease $A$, and a 60% chance of the fatal disease $B$. Fortunately, there are anti-$A$ and anti-$B$ drugs that are inexpensive, 100% effective, and free of side-effects. The statisticians have the choice of taking one drug, both, or neither. What will the first statistician (an avid Bayesian) do? How about the second statistician, who always uses the maximum likelihood hypothesis?
The doctor does some research and discovers that disease $B$ actually comes in two versions, dextro-$B$ and levo-$B$, which are equally likely and equally treatable by the anti-$B$ drug. Now that there are three hypotheses, what will the two statisticians do?
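A small sketch of the two decision rules (assuming a payoff of 1 for surviving and 0 otherwise, so with free, harmless drugs the Bayesian simply maximizes survival probability):

```python
def drug_for(disease):
    # dextro-B and levo-B are both cured by the anti-B drug
    return "anti-A" if disease == "A" else "anti-B"

def p_survive(taken, hyp_probs):
    """Survive iff the drug matching the actual disease is among those taken."""
    return sum(p for d, p in hyp_probs.items() if drug_for(d) in taken)

two_hyp = {"A": 0.4, "B": 0.6}
three_hyp = {"A": 0.4, "dextro-B": 0.3, "levo-B": 0.3}

# Bayesian: expected survival is maximized by taking both (cost-free) drugs.
# ML statistician: act as if the single most likely hypothesis were true.
def ml_choice(hyp_probs):
    best = max(hyp_probs, key=hyp_probs.get)
    return {drug_for(best)}
```

Note how splitting $B$ into two equally likely versions changes which hypothesis is "most likely" without changing the evidence at all, which is the point the exercise is probing.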
Exercise 20.5
Explain how to apply the boosting method of the concept-learning chapter to naive Bayes learning. Test the performance of the resulting algorithm on the restaurant learning problem.
Exercise 20.6
Consider $N$ data points $(x_j, y_j)$, where the $y_j$ are generated from the $x_j$ according to the linear Gaussian model defined in the chapter. Find the values of $\theta_1$, $\theta_2$, and $\sigma$ that maximize the conditional log likelihood of the data.
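A useful numerical check on your derivation: maximizing the conditional log likelihood of a linear Gaussian model $y = \theta_1 x + \theta_2 + \mathcal{N}(0, \sigma^2)$ is equivalent to minimizing squared error, so the ML parameters take the familiar least-squares form. A sketch (function name is mine):

```python
def fit_linear_gaussian(xs, ys):
    """ML estimates for y = theta1 * x + theta2 + Gaussian noise.
    theta1, theta2 come from ordinary least squares; sigma^2 is the
    mean squared residual (note: ML divides by N, not N - 2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta2 = my - theta1 * mx
    sigma2 = sum((y - (theta1 * x + theta2)) ** 2
                 for x, y in zip(xs, ys)) / n
    return theta1, theta2, sigma2
```

On noise-free data the fit should recover the line exactly with $\sigma^2 = 0$.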
Exercise 20.7
Consider the noisy-OR model for fever described in the chapter's section on canonical distributions. Explain how to apply maximum-likelihood learning to fit the parameters of such a model to a set of complete data. (Hint: use the chain rule for partial derivatives.)
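One way to check a derivation empirically is to fit the inhibition parameters $q_j$ by simple numeric gradient ascent on the log likelihood (an illustration only, not the closed-form analysis the hint points toward; function names are mine):

```python
import math

# Noisy-OR: P(effect = 1 | parents x) = 1 - prod_j q_j ** x_j, where q_j is
# the probability that the j-th active cause fails to produce the effect.
def noisy_or_p(q, x):
    fail = 1.0
    for qj, xj in zip(q, x):
        if xj:
            fail *= qj
    return 1.0 - fail

def log_likelihood(q, data):
    ll = 0.0
    for x, e in data:  # x: tuple of parent values, e: observed effect
        p = noisy_or_p(q, x)
        ll += math.log(p if e else 1.0 - p)
    return ll

def fit_noisy_or(data, k, lr=0.001, steps=500):
    """Numeric (central-difference) gradient ascent; simple, not fast."""
    q = [0.5] * k
    eps = 1e-6
    for _ in range(steps):
        for j in range(k):
            qp = q[:]; qp[j] = min(1 - eps, q[j] + eps)
            qm = q[:]; qm[j] = max(eps, q[j] - eps)
            g = (log_likelihood(qp, data) - log_likelihood(qm, data)) / (qp[j] - qm[j])
            q[j] = min(1 - 1e-3, max(1e-3, q[j] + lr * g))  # keep q in (0, 1)
    return q
```

With a single cause that is always active and an effect observed in 30 of 100 cases, the ML estimate should approach $q_1 = 0.7$.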
Exercise 20.8
This exercise investigates properties of the Beta distribution defined in Equation (beta-equation).
By integrating over the range $[0, 1]$, show that the normalization constant for the distribution $\mathrm{beta}[a, b]$ is given by $\alpha = \Gamma(a + b) / (\Gamma(a)\Gamma(b))$, where $\Gamma(x)$ is the Gamma function, defined by $\Gamma(x + 1) = x \cdot \Gamma(x)$ and $\Gamma(1) = 1$. (For integer $x$, $\Gamma(x + 1) = x!$.)
Show that the mean is $a / (a + b)$.
Find the mode(s) (the most likely value(s) of $\theta$).
Describe the distribution $\mathrm{beta}[\epsilon, \epsilon]$ for very small $\epsilon$. What happens as such a distribution is updated?
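A quick numeric sanity check for parts (a) and (b): with the normalization constant $\Gamma(a+b)/(\Gamma(a)\Gamma(b))$, the density should integrate to 1 and have mean $a/(a+b)$. The choice $a = 3$, $b = 5$ is arbitrary.

```python
import math

def beta_pdf(theta, a, b):
    # Normalization from part (a): Gamma(a+b) / (Gamma(a) * Gamma(b)).
    alpha = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return alpha * theta ** (a - 1) * (1 - theta) ** (b - 1)

def integrate(f, n=200000):
    # Midpoint rule on [0, 1]; ample accuracy for this smooth density.
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

a, b = 3.0, 5.0
total = integrate(lambda t: beta_pdf(t, a, b))
mean = integrate(lambda t: t * beta_pdf(t, a, b))
```

This does not replace the symbolic derivation the exercise asks for, but it will catch an algebra slip immediately.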
Exercise 20.9
Consider an arbitrary Bayesian network, a complete data set for that network, and the likelihood for the data set according to the network. Give a simple proof that the likelihood of the data cannot decrease if we add a new link to the network and recompute the maximum-likelihood parameter values.
Exercise 20.10
Consider a single Boolean random variable $Y$ (the "classification"). Let the prior probability $P(Y = true)$ be $\pi$. Let's try to find $\pi$, given a training set $D = (y_1, \ldots, y_N)$ with $N$ independent samples of $Y$. Furthermore, suppose $p$ of the $N$ are positive and $n$ of the $N$ are negative.
Write down an expression for the likelihood of $D$ (i.e., the probability of seeing this particular sequence of examples, given a fixed value of $\pi$) in terms of $\pi$, $p$, and $n$.
By differentiating the log likelihood $L$, find the value of $\pi$ that maximizes the likelihood.
Now suppose we add in $k$ Boolean random variables $X_1, X_2, \ldots, X_k$ (the "attributes") that describe each sample, and suppose we assume that the attributes are conditionally independent of each other given the goal $Y$. Draw the Bayes net corresponding to this assumption.
Write down the likelihood for the data including the attributes, using the following additional notation:
$\alpha_i$ is $P(X_i = true \mid Y = true)$.
$\beta_i$ is $P(X_i = true \mid Y = false)$.
$p_i^{+}$ is the count of samples for which $X_i = true$ and $Y = true$.
$n_i^{+}$ is the count of samples for which $X_i = false$ and $Y = true$.
$p_i^{-}$ is the count of samples for which $X_i = true$ and $Y = false$.
$n_i^{-}$ is the count of samples for which $X_i = false$ and $Y = false$.
[Hint: consider first the probability of seeing a single example with specified values for $x_1, \ldots, x_k$ and $y$.]
By differentiating the log likelihood $L$, find the values of $\alpha_i$ and $\beta_i$ (in terms of the various counts) that maximize the likelihood and say in words what these values represent.
Let $k = 2$, and consider a data set consisting of all four possible examples of the XOR function. Compute the maximum likelihood estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$.
Given these estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$, what are the posterior probabilities $P(y = true \mid x_1, x_2)$ for each example?
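Parts (f) and (g) can be checked mechanically. A sketch that computes the count-based ML estimates and the naive Bayes posterior on the four XOR examples:

```python
# The four XOR examples as (x1, x2, y) with y = x1 XOR x2.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

pos = [d for d in data if d[2] == 1]
neg = [d for d in data if d[2] == 0]
pi = len(pos) / len(data)                                       # estimate of P(Y = true)
alpha = [sum(d[i] for d in pos) / len(pos) for i in range(2)]   # P(X_i = true | Y = true)
beta = [sum(d[i] for d in neg) / len(neg) for i in range(2)]    # P(X_i = true | Y = false)

def posterior(x1, x2):
    """P(Y = true | x1, x2) under the naive Bayes independence assumption."""
    def lik(p, x):  # Bernoulli likelihood of one attribute value
        return p if x else 1 - p
    num = pi * lik(alpha[0], x1) * lik(alpha[1], x2)
    den = num + (1 - pi) * lik(beta[0], x1) * lik(beta[1], x2)
    return num / den
```

Printing `posterior` on all four examples is a good way to check your answer to part (g) against what the naive Bayes assumption can actually represent for XOR.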
Exercise 20.11
Consider the application of EM to learn the parameters for the candy-bag mixture network shown in the chapter (a hidden Bag variable with three observable attributes).
Explain why the EM algorithm would not work if there were just two attributes in the model rather than three.
Show the calculations for the first iteration of EM starting from Equation (candy-64-equation).
What happens if we start with all the parameters set to the same value $p$? (Hint: you may find it helpful to investigate this empirically before deriving the general result.)
Write out an expression for the log likelihood of the tabulated candy data on page candy-counts-page in terms of the parameters, calculate the partial derivatives with respect to each parameter, and investigate the nature of the fixed point reached in part (c).
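For the empirical investigation the hint suggests, here is a sketch of a single EM iteration for a two-bag mixture with three Boolean attributes. The parameter names are mine, and the candy counts used in the test are illustrative, not the book's table: `theta` is $P(Bag = 1)$, `t1[i]` is $P(\text{attr}_i = true \mid Bag = 1)$, and `t2[i]` is the same for bag 2.

```python
def em_step(counts, theta, t1, t2):
    """One EM iteration. counts maps an attribute triple (a, b, c) to its
    observed count; returns updated (theta, t1, t2)."""
    num_theta = 0.0
    num1 = [0.0] * 3
    num2 = [0.0] * 3
    n = sum(counts.values())
    for x, c in counts.items():
        p1 = theta
        p2 = 1.0 - theta
        for i in range(3):
            p1 *= t1[i] if x[i] else 1.0 - t1[i]
            p2 *= t2[i] if x[i] else 1.0 - t2[i]
        w = p1 / (p1 + p2)          # E-step: P(Bag = 1 | x)
        num_theta += w * c
        for i in range(3):
            if x[i]:
                num1[i] += w * c
                num2[i] += (1.0 - w) * c
    # M-step: normalized expected counts become the new ML parameters.
    theta = num_theta / n
    t1 = [num1[i] / num_theta for i in range(3)]
    t2 = [num2[i] / (n - num_theta) for i in range(3)]
    return theta, t1, t2
```

Iterating `em_step` from a symmetric starting point (all parameters equal to the same value $p$) is exactly the experiment part (c) asks about; watch what happens to the difference between `t1` and `t2` across iterations.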