Question sets

Question 1

Consider flipping a coin for which the probability of heads is p. Let X_i be the outcome of the i-th single toss, where X_i \in \{0, 1\}. Thus, p = P(X_i=1) = \mathbb{E}[X_i].

  • The sample proportion (fraction of heads) after n tosses is defined as \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i.
  • The variance of the sample proportion is \mathbb{V}(\bar{X}_n) = \frac{p(1-p)}{n}.

1.1

State the type of convergence and the value to which \bar{X}_n converges as n \to \infty.

By the Weak Law of Large Numbers (WLLN), \bar{X}_n converges in probability to the true parameter p: \bar{X}_n \xrightarrow{P} p
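This convergence can be illustrated numerically. The sketch below (using an illustrative p = 0.3, a fixed seed, and a hypothetical helper name sample_proportion) shows the sample proportion settling near p as n grows:

```python
import random

random.seed(0)

def sample_proportion(p, n):
    """Simulate n coin flips with P(heads) = p and return the fraction of heads."""
    return sum(random.random() < p for _ in range(n)) / n

# The sample proportion concentrates around p as n grows (illustrative values).
p = 0.3
for n in [10, 1000, 100000]:
    print(n, sample_proportion(p, n))
```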

1.2

Suppose the coin is fair (p = 0.5). Use Chebyshev’s Inequality to find the required sample size n such that P(0.475 \leq \bar{X}_n \leq 0.525) \geq 0.95.

\begin{aligned} P(0.475 \leq \bar{X}_n \leq 0.525) &= P(|\bar{X}_n - 0.5| \leq 0.025) \\ & = 1 - P(|\bar{X}_n - 0.5| > 0.025) \end{aligned} By Chebyshev’s Inequality, with \mathbb{V}(\bar{X}_n) = \frac{0.5(1-0.5)}{n} = \frac{1}{4n}: P(|\bar{X}_n - 0.5| > 0.025) \leq \frac{\mathbb{V}(\bar{X}_n)}{0.025^2} = \frac{1}{4n(0.000625)} = \frac{400}{n}. We want the probability to be at least 0.95: \begin{aligned} 1 - \frac{400}{n} & \geq 0.95 \\ \frac{400}{n} & \leq 0.05 \\ n & \geq 8000 \end{aligned}
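As a sanity check, a small simulation (illustrative trial counts, fixed seed, hypothetical helper name coverage) confirms that n = 8000 comfortably achieves the target coverage; the empirical coverage is in fact far above 0.95, reflecting how conservative Chebyshev's bound is:

```python
import random

random.seed(1)

def coverage(n, trials=500):
    """Empirical P(0.475 <= Xbar_n <= 0.525) for a fair coin over repeated experiments."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(xbar - 0.5) <= 0.025:
            hits += 1
    return hits / trials

# Chebyshev guarantees coverage >= 0.95 at n = 8000; the observed coverage
# is much higher, since the bound holds for any distribution.
print(coverage(8000))
```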

Question 2: Sample Size Estimation

We want to survey what percentage (p\%) of people in Hong Kong like coriander (a dichotomous outcome). The acceptable margin of error is 2.5\% at the 95\% confidence level (\alpha=0.05), meaning the total width of the 95\% confidence interval should be 5\%.

2.1

What is the probability distribution of a single survey outcome?

Bernoulli distribution.

2.2

What is the approximate distribution of the sample proportion \hat{p}?

By the Central Limit Theorem, for a sufficiently large sample size n: \hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)

2.3

Given the critical value Z_{1-\alpha/2} = 1.96, what is the formula for the confidence interval of the estimation?

\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
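A minimal sketch of this formula applied to a hypothetical survey result (400 of 1000 respondents liking coriander; the helper name proportion_ci is an assumption):

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation CI for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 400 of 1000 respondents like coriander, so p_hat = 0.4.
lo, hi = proportion_ci(0.4, 1000)
print(round(lo, 4), round(hi, 4))  # 0.3696 0.4304
```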

2.4

What condition yields the maximum possible width for this confidence interval?

The width is maximized when the standard error is maximized: \text{Width} = 2 \times Z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}} By the AM-GM inequality (or simple calculus), the term p(1-p) reaches its maximum value of \frac{1}{4} at p=0.5.
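A quick numerical check of this fact, evaluating p(1-p) on a fine grid:

```python
# The worst-case variance term p(1-p) peaks at p = 0.5 (value 0.25).
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=lambda p: p * (1 - p))
print(best_p)  # 0.5
```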

2.5

In the worst-case scenario (maximum variance), what is the necessary sample size to ensure the confidence interval has a total width of no more than 5\%?

Using the worst-case proportion p=0.5: \begin{aligned} 0.05 &= 2 \times 1.96 \times \sqrt{\frac{0.5(1-0.5)}{n}} \\ 0.05 &= 3.92 \times \frac{0.5}{\sqrt{n}} \\ \sqrt{n} &= \frac{1.96}{0.05} = 39.2 \\ n &= 39.2^2 = 1536.64 \end{aligned} Rounding up, the required sample size is 1537.
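The same computation as a small sketch (the helper name required_n is an assumption):

```python
import math

def required_n(width, z=1.96, p=0.5):
    """Smallest n such that 2 * z * sqrt(p(1-p)/n) <= width (worst case p = 0.5)."""
    return math.ceil((2 * z * math.sqrt(p * (1 - p)) / width) ** 2)

print(required_n(0.05))  # 1537
```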

2.6

Please briefly explain the marginal effect of increasing the sample size by one unit on the improvement of estimation accuracy.

Accuracy (inversely related to standard error) improves at a rate proportional to 1/\sqrt{n}. The marginal gain in accuracy is the derivative \frac{d}{dn}(n^{-1/2}) \propto -n^{-3/2}. As n grows larger, the marginal benefit of adding a single extra sample diminishes rapidly (diminishing returns). For example, adding 1 sample to n=1500 yields almost no noticeable improvement compared to adding 1 sample to n=10.
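The diminishing returns are easy to see numerically: compare the reduction in standard error from one extra observation at small versus large n (illustrative values, helper name se assumed):

```python
import math

def se(n, p=0.5):
    """Standard error of the sample proportion at worst-case p = 0.5."""
    return math.sqrt(p * (1 - p) / n)

# Adding one observation shrinks the standard error far more at small n.
gain_small = se(10) - se(11)       # gain from the 11th observation
gain_large = se(1500) - se(1501)   # gain from the 1501st observation
print(gain_small, gain_large)
```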

2.7

Comparing the sample size results from Question 1.2 and Question 2.5, why is there such a large difference (8000 vs 1537)?

Question 1.2 uses Chebyshev’s Inequality, which provides a distribution-free, conservative upper bound that holds for any distribution with finite variance, making it statistically less efficient.

Question 2.5 uses the Normal approximation via the Central Limit Theorem. By assuming the sample mean follows a specific distribution (the Normal curve), we can calculate a much tighter, more “efficient” probability bound, and therefore require a significantly smaller sample size to guarantee the same margin of error.

Question 3

Assume that our prior belief about the mean human body temperature (\mu) is normally distributed: \mu \sim N(36.5, 6.25) in degrees Celsius. The known population variance of body temperature is also 6.25.

Older adults usually possess lower body temperatures due to factors such as slower metabolism or reduced muscle mass. We conduct a brief survey of 100 older adults to find a better estimate of their average body temperature. The sample mean from this survey is 36^\circ\text{C}. Assuming the body temperature of older adults shares the same known variance (\sigma^2 = 6.25), what is the posterior estimate for the mean body temperature of older adults?

Prior Distribution: \mu \sim N(\mu_0, \sigma_0^2) - Prior Mean (\mu_0): 36.5^\circ\text{C} - Prior Variance (\sigma_0^2): 6.25

Sample Data (Likelihood): - Sample Mean (\bar{x}): 36.0^\circ\text{C} - Sample Size (n): 100 - Population Variance (\sigma^2): 6.25

Posterior Precision (Inverse of Variance): \text{Precision}_{post} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} = \frac{1}{6.25} + \frac{100}{6.25} = 0.16 + 16 = 16.16

Posterior Mean (Precision-Weighted Average): \begin{aligned} \mu_{post} &= \frac{\frac{1}{\sigma_0^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} \mu_0 + \frac{\frac{n}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} \bar{x} \\ &= \left(\frac{0.16}{16.16}\right) 36.5 + \left(\frac{16}{16.16}\right) 36.0 \\ &= (0.0099 \times 36.5) + (0.9901 \times 36.0) \\ &\approx 0.36135 + 35.6436 = 36.005^\circ\text{C} \end{aligned} The posterior mean strongly shifts toward the sample mean due to the large sample size providing high precision compared to the prior.
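The precision-weighted update above can be packaged as a short sketch (the helper name normal_posterior is an assumption); it also reports the posterior variance, 1/16.16 \approx 0.0619:

```python
def normal_posterior(mu0, var0, xbar, n, var):
    """Conjugate update for a Normal mean with known variance.
    Returns (posterior mean, posterior variance)."""
    precision = 1 / var0 + n / var          # prior precision + data precision
    mean = (mu0 / var0 + n * xbar / var) / precision
    return mean, 1 / precision

mu_post, var_post = normal_posterior(36.5, 6.25, 36.0, 100, 6.25)
print(round(mu_post, 3), round(var_post, 4))  # 36.005 0.0619
```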

Question 4

Please show that the normal distribution is the conjugate prior for the mean of a normal distribution when the variance is known. Given IID observations x_1,\dots,x_n \sim N(\mu,\sigma^2), where \sigma^2 is known but \mu is unknown. Let the prior distribution be \mu \sim N(\mu_0,\sigma^2_0).

Calculation tips: You can define the sample mean as \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i, which implies \sum_{i=1}^n x_i =n\bar{x}.

\begin{aligned} p(\mu|X) & \propto p(X|\mu)p(\mu) \\ & \propto \left[ \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \right] \frac{1}{\sqrt{2\pi\sigma^2_0}} \exp\left(-\frac{(\mu-\mu_0)^2}{2\sigma^2_0}\right) \\ & \propto \exp\left( -\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i^2 - 2x_i\mu + \mu^2) \right) \exp\left( -\frac{1}{2\sigma^2_0}(\mu^2 - 2\mu\mu_0 + \mu_0^2) \right) \\ & \propto \exp\left( -\frac{1}{2\sigma^2}(n\mu^2 - 2n\bar{x}\mu) - \frac{1}{2\sigma^2_0}(\mu^2 - 2\mu\mu_0) \right) \\ \end{aligned} Grouping the \mu^2 and \mu terms to complete the square with respect to \mu: \begin{aligned} p(\mu|X) & \propto \exp\left( -\frac{1}{2} \left[ \left(\frac{n}{\sigma^2} + \frac{1}{\sigma^2_0}\right)\mu^2 - 2\left(\frac{n\bar{x}}{\sigma^2} + \frac{\mu_0}{\sigma^2_0}\right)\mu \right] \right) \\ & \propto \exp\left( -\frac{1}{2} \left(\frac{n}{\sigma^2} + \frac{1}{\sigma^2_0}\right) \left[ \mu^2 - 2\frac{\frac{n\bar{x}}{\sigma^2} + \frac{\mu_0}{\sigma^2_0}}{\frac{n}{\sigma^2} + \frac{1}{\sigma^2_0}}\mu \right] \right) \end{aligned} This takes the form of the kernel of a normal distribution \exp\left(-\frac{(\mu - \mu_{post})^2}{2\sigma_{post}^2}\right), proving conjugacy.
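Reading off the coefficients, the posterior parameters are \sigma_{post}^2 = \left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)^{-1} and \mu_{post} = \sigma_{post}^2\left(\frac{n\bar{x}}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}\right). A numerical sketch (with an illustrative prior and hypothetical data) checks that the log of the unnormalized posterior differs from the analytic Normal log-kernel only by a constant in \mu, which is exactly what conjugacy requires:

```python
# Numerical check of conjugacy: the unnormalized posterior should match the
# analytic Normal kernel up to a constant factor. All values are illustrative.
mu0, var0 = 0.0, 4.0         # prior N(mu0, var0)
sigma2 = 1.0                 # known observation variance
data = [1.2, 0.8, 1.5, 0.9]  # hypothetical observations
n, xbar = len(data), sum(data) / len(data)

post_prec = 1 / var0 + n / sigma2
mu_post = (mu0 / var0 + n * xbar / sigma2) / post_prec
var_post = 1 / post_prec

def log_unnorm_posterior(mu):
    """Log of likelihood * prior, dropping mu-independent constants."""
    loglik = sum(-(x - mu) ** 2 / (2 * sigma2) for x in data)
    logprior = -(mu - mu0) ** 2 / (2 * var0)
    return loglik + logprior

def log_normal_kernel(mu):
    """Log-kernel of the claimed posterior N(mu_post, var_post)."""
    return -(mu - mu_post) ** 2 / (2 * var_post)

# The difference of the two log-kernels should be constant in mu.
diffs = [log_unnorm_posterior(mu) - log_normal_kernel(mu) for mu in (-2.0, 0.0, 1.0, 3.0)]
print(max(diffs) - min(diffs))  # ~0 up to floating-point error
```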

Question 5

Please show that the Gamma distribution is the conjugate prior for the precision of a normal distribution when the mean is known. Given IID observations x_1,\dots,x_n \sim N(\mu,\sigma^2), where \mu is known but the variance \sigma^2 is unknown. Let the prior distribution on the precision \lambda = \frac{1}{\sigma^2} be \lambda \sim \text{Gamma}(\alpha_0,\beta_0). Please derive the posterior distribution of \lambda.

\begin{aligned} p(\mathbf{x}|\lambda) & = \prod_{i=1}^n \frac{\sqrt{\lambda}}{\sqrt{2\pi}} \exp\left(-\frac{\lambda(x_i-\mu)^2}{2}\right) \\ & \propto \lambda^{n/2} \exp\left( -\lambda \frac{\sum_{i=1}^n (x_i - \mu)^2}{2} \right) \\ p(\lambda | \mathbf{x}) & \propto p(\mathbf{x}|\lambda)p(\lambda) \\ & \propto \left[ \lambda^{n/2} \exp\left( -\lambda \frac{\sum_{i=1}^n (x_i - \mu)^2}{2} \right) \right] \times \left[ \lambda^{\alpha_0-1} \exp(-\beta_0\lambda) \right] \\ & \propto \lambda^{(\alpha_0 + \frac{n}{2}) - 1} \exp\left( -\lambda \left( \beta_0 + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2} \right) \right) \end{aligned}

The resulting expression is the kernel of a Gamma distribution with updated parameters:

  • Updated \alpha_{post}: \alpha_0 + \frac{n}{2}
  • Updated \beta_{post}: \beta_0 + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}

Therefore, the posterior distribution is: \lambda | \mathbf{x} \sim \text{Gamma}\left(\alpha_0 + \frac{n}{2}, \beta_0 + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}\right)
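The update rule is a two-line computation; here is a sketch with hypothetical data and a known mean of 0 (the helper name gamma_posterior is an assumption):

```python
def gamma_posterior(alpha0, beta0, data, mu):
    """Conjugate Gamma update for the precision of a Normal with known mean mu."""
    n = len(data)
    alpha_post = alpha0 + n / 2
    beta_post = beta0 + sum((x - mu) ** 2 for x in data) / 2
    return alpha_post, beta_post

# Hypothetical data with known mean mu = 0 and prior Gamma(2, 1).
a, b = gamma_posterior(2.0, 1.0, [0.5, -0.5, 1.0, -1.0], 0.0)
print(a, b)  # 4.0 2.25
```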

Question 6

6.1

Let X be a non-negative continuous random variable and suppose \mathbb{E}(X) exists. For any t>0, please show that Markov’s inequality holds:

P(X \geq t) \leq \frac{\mathbb{E}(X)}{t}

By the definition of expectation for a continuous random variable: \begin{aligned} \mathbb{E}(X) & = \int_0^\infty x f(x) dx = \int_0^t x f(x) dx + \int_t^\infty x f(x) dx \\ & \geq \int_t^\infty x f(x) dx \\ & \geq \int_t^\infty t f(x) dx \\ & = t \int_t^\infty f(x) dx = t P(X \geq t) \end{aligned} Dividing both sides by t yields P(X \geq t) \leq \frac{\mathbb{E}(X)}{t}. (Note: The proof logic is identical for discrete variables using summations).
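The inequality can be checked empirically on any non-negative variable; the sketch below uses X \sim \text{Exponential}(1) (so \mathbb{E}(X) = 1 and the bound is 1/t), with an illustrative sample size and fixed seed:

```python
import random

random.seed(2)

# Empirical check of Markov's inequality: P(X >= t) <= E(X) / t for t > 0.
samples = [random.expovariate(1.0) for _ in range(100000)]
mean = sum(samples) / len(samples)
for t in [2.0, 3.0, 5.0]:
    tail = sum(x >= t for x in samples) / len(samples)
    bound = mean / t
    print(t, tail, bound)  # the empirical tail stays below the Markov bound
```

Note how loose the bound is here: the true tail e^{-t} is far below \mathbb{E}(X)/t, consistent with Markov being a worst-case guarantee.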

6.2

Let \mu=\mathbb{E}(X) and \sigma^2=\mathbb{V}(X). Please show that Chebyshev’s inequality holds for any t>0: P(|X-\mu| \geq t) \leq \frac{\sigma^2}{t^2}

Define a new non-negative random variable Y = (X-\mu)^2. Applying Markov’s inequality to Y with the threshold t^2: \begin{aligned} P(|X-\mu| \geq t) & = P((X-\mu)^2 \geq t^2) \\ & \leq \frac{\mathbb{E}[(X-\mu)^2]}{t^2} \\ & = \frac{\sigma^2}{t^2} \end{aligned}

6.3

Let X_1,\dots,X_n be IID random variables where \mu=\mathbb{E}(X_1) and \sigma^2=\mathbb{V}(X_1). Let the sample mean be \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.

Please show the Weak Law of Large Numbers (WLLN): if X_1,\dots,X_n are IID with finite variance, then \bar{X}_n \xrightarrow{P} \mu.

Assuming \sigma^2 < \infty, we know that \mathbb{E}(\bar{X}_n) = \mu and \mathbb{V}(\bar{X}_n) = \frac{\sigma^2}{n}. Applying Chebyshev’s inequality to the sample mean \bar{X}_n for any \epsilon > 0: P(|\bar{X}_n-\mu| \geq \epsilon) \leq \frac{\mathbb{V}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} As n \rightarrow \infty, the term \frac{\sigma^2}{n\epsilon^2} \rightarrow 0. Therefore, \lim_{n \rightarrow \infty} P(|\bar{X}_n-\mu| \geq \epsilon) = 0, which is the definition of convergence in probability (\bar{X}_n \xrightarrow{P} \mu).