Lecture 15
2024-06-06
critics
and audience
movie_scores
A regression model is a function that describes the relationship between the outcome, \(Y\), and the predictor, \(X\).
\[\begin{aligned} Y &= \color{black}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{black}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{black}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}\]
\[ \begin{aligned} Y &= \color{#325b74}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{#325b74}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{#325b74}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned} \]
Use simple linear regression to model the relationship between a quantitative outcome (\(Y\)) and a single quantitative predictor (\(X\)): \[\Large{Y = \beta_0 + \beta_1 X + \epsilon}\]
\[\Large{\hat{Y} = b_0 + b_1 X}\]
\[\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}\]
\[e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i\]
\[e^2_1 + e^2_2 + \dots + e^2_n\]
The regression line goes through the center of mass point (the coordinates corresponding to average \(X\) and average \(Y\)): \(b_0 = \bar{Y} - b_1~\bar{X}\)
Slope has the same sign as the correlation coefficient: \(b_1 = r \frac{s_Y}{s_X}\)
Sum of the residuals is zero: \(\sum_{i = 1}^n \epsilon_i = 0\)
Residuals and \(X\) values are uncorrelated
The slope of the model for predicting audience score from critics score is 0.519. Which of the following is the best interpretation of this value?
\[\widehat{\text{audience}} = 32.3 + 0.519 \times \text{critics}\]
✅ The intercept is meaningful in context of the data if
🛑 Otherwise, it might not be meaningful!
ae-11-penguins-modeling
ae
.ae-11-penguins-modeling.qmd
.