import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# Example data
data = np.random.normal(loc=5.8, scale=0.5, size=100)
# Calculate mean and standard deviation of the sample
mean_value = np.mean(data)
std_dev = np.std(data, ddof=1)  # ddof=1 gives the sample standard deviation
# Calculate the standard error of the mean (SEM)
sem = std_dev / np.sqrt(len(data))
# Calculate the margin of error for a 95% confidence interval
margin_of_error = norm.ppf(0.975) * sem
# Calculate the confidence interval
confidence_interval = (mean_value - margin_of_error, mean_value + margin_of_error)
# Plotting the data
plt.hist(data, bins=20, alpha=0.7, color='#39729E', edgecolor='black', label='Sample Data')
# Plotting the confidence interval
plt.axvline(x=confidence_interval[0], color='grey', linestyle='--', label='95% Confidence Interval')
plt.axvline(x=confidence_interval[1], color='grey', linestyle='--')
# Adding labels and title
plt.xlabel('Height')
plt.ylabel('Frequency')
plt.title('95% Confidence Interval for Adult Height')
# Adding legend
plt.legend()
# Show the plot
plt.show()
Imagine you want to estimate the average height of all adults. Measuring every adult is impractical, so you take a sample of 100 adults and find their average height to be 5’7” (5 feet 7 inches). You know this is only an estimate: if you took another sample, you would probably get a different average. So how can you express the uncertainty around your estimate?
Confidence Intervals
A confidence interval is a range of values, derived from the sample data, that is likely to contain the true value of an unknown population parameter. For example, you can say you are 95% confident that the average height of all adults is between 5’5” and 5’9”. This range is a 95% confidence interval.
How to calculate
\[ \bar{x} \pm z\cdot \frac{s}{\sqrt{n}} \]
where \(\bar{x}\) = sample mean, \(s\) = sample standard deviation, \(z\) = critical value of the standard normal distribution for the chosen confidence level (about 1.96 for 95%), and \(n\) = sample size.
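The formula can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_confidence_interval` is made up for this example, the standard deviation of 3 inches is an assumed value, and `statistics.NormalDist().inv_cdf` from the standard library stands in for `norm.ppf`.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, level=0.95):
    """Return (lower, upper) for the z-based interval x_bar ± z * s / sqrt(n)."""
    # Two-sided critical value: for level=0.95 this is the 97.5th percentile.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin_of_error = z * sample_sd / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Hypothetical summary statistics, in inches: the mean of 67 (5 feet 7 inches)
# matches the example in the text, while the standard deviation of 3 is assumed.
lower, upper = z_confidence_interval(sample_mean=67.0, sample_sd=3.0, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) inches")
```

With these assumed inputs the interval runs from roughly 66.4 to 67.6 inches. A larger standard deviation or a smaller sample would widen it.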
Interpretation
Saying “I am 95% confident that the average height is between 5’5” and 5’9”” doesn’t mean that there is a 95% chance that the true average falls in this range. The true average is a fixed number, so any single interval either contains it or does not. Instead, it means that if you were to take many samples and compute a 95% confidence interval for each one, about 95% of these intervals would contain the true average height.
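A small simulation can make this interpretation concrete. The sketch below assumes a made-up population (normal, mean 67 inches, standard deviation 3 inches), draws many samples of 100 from it, builds a 95% interval from each, and counts how often the interval contains the true mean. The names `TRUE_MEAN`, `TRUE_SD` and `coverage` are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

TRUE_MEAN = 67.0  # assumed population mean height, in inches
TRUE_SD = 3.0     # assumed population standard deviation, in inches
Z_95 = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval


def coverage(num_samples=2000, n=100, seed=0):
    """Fraction of 95% intervals (one per simulated sample) containing TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
        x_bar = mean(sample)
        margin = Z_95 * stdev(sample) / sqrt(n)  # stdev is the sample SD (n - 1)
        if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
            hits += 1
    return hits / num_samples


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. Each individual interval is either a hit or a miss; the 95% describes the procedure over many repetitions.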
Choosing CI
To choose a confidence level, consider the consequences of being wrong. If they are severe, you might choose a 99% level; otherwise 95% is the most common choice.
A higher confidence level results in a wider interval, and a lower level in a narrower one. So there is always a trade-off between precision and confidence.
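The trade-off is easy to see by holding the data fixed and changing only the confidence level. The sketch below uses assumed values (standard deviation of 3 inches, sample size of 100) and prints the critical value and the interval width at three common levels.

```python
from math import sqrt
from statistics import NormalDist

sample_sd, n = 3.0, 100  # assumed values; sample_sd is in inches
widths = {}
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value for this confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Full width of the interval is twice the margin of error.
    widths[level] = 2 * z * sample_sd / sqrt(n)
    print(f"{level:.0%} level: z = {z:.3f}, interval width = {widths[level]:.2f} inches")
```

The critical value grows from about 1.645 to 1.960 to 2.576, so the interval widens as the confidence level rises.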
Limitations
The formula above assumes that your data is normally distributed, or that the sample is large enough for the sample mean to be approximately normal. This may not always be the case.
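One way to see this limitation is to apply the same formula to strongly skewed data. The sketch below is an illustration under stated assumptions: it draws samples from an exponential distribution with rate 1 (true mean 1), a population chosen only for this example, and compares how often the interval contains the true mean for a small and a larger sample size. The function name `skewed_coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

Z_95 = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval


def skewed_coverage(n, num_samples=2000, seed=1):
    """Share of z-based 95% intervals containing the true mean of a skewed population."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of an exponential distribution with rate 1
    hits = 0
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        margin = Z_95 * stdev(sample) / sqrt(n)
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / num_samples


for n in (10, 100):
    print(f"n = {n}: coverage = {skewed_coverage(n):.3f}")
```

In runs like this, the small sample tends to fall noticeably short of the nominal 95%, while the larger sample comes much closer, which is why sample size matters when the data is not normal.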