Simple Linear Regression
#Math
Topics
$\displaystyle \hat{Y}=\hat{\beta}{1}X+\hat{\beta}{0}+\varepsilon$
- The hats are considered estimators, or values we have to calculate to predict $\displaystyle Y$
- $\displaystyle \hat{\beta}{1}$ and $\displaystyle \hat{\beta}{0}$ are regression coefficients
- $\displaystyle \varepsilon$ is random noise
$\displaystyle L({\beta}{0},{\beta}{1})=\frac{1}{n}\sum_{i = 1}^{n}[y_{i}-({\beta}{1}X+{\beta}{0})^{2}]=J(\boldsymbol{\theta})=\left\lVert \boldsymbol{y}-\boldsymbol{X\theta}\right\rVert^{2}_{2}$
- The loss function for simple linear regression
- Minimizing this by varying $\displaystyle {\beta}{0}$/$\displaystyle {\beta}{1}$ or $\displaystyle \boldsymbol{\theta}$ gives the linear model that best fits with the data $\displaystyle \boldsymbol{y}$
- $\displaystyle \boldsymbol{y}$ is the target vector
- $\displaystyle \boldsymbol{X}$ is the design matrix
- $\displaystyle \boldsymbol{\theta}$ is the Parameter Matrix
$\displaystyle \nabla J(\boldsymbol{\theta})=2(\mathbf{X^{^{\top}}X}\boldsymbol{\theta}-\mathbf{X^{^{\top}}y})$
- Gradient of loss function
$\displaystyle \hat{\beta}{1}= w=\frac{\sum{i}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum_{i}(x_{i}-\bar{x})^{2}}$
- Coefficient of regression that minimizes $\displaystyle L$
- Same as $\displaystyle \frac{\text{cov}(X,Y)}{\text{var}(X)}$
$\displaystyle \hat{\beta}{0}=b=\bar{y}-\hat{\beta}{1}\bar{x}$
- Coefficient of regression that minimizes $\displaystyle L$
$\displaystyle \hat{\beta}=\theta=\underset{\beta}{\text{argmin}}\text{,MSE}(\beta)=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}$
- If $\displaystyle \mathbf{X}^{^{\top}}\mathbf{X}$ is not invertible, this happens both/either because
- $\displaystyle N<D$, or there are fewer data points than features
- $\displaystyle \mathbf{X}$ is not linearly independent, in which case
$\displaystyle SE(\hat{\beta}{0})=\sigma\sqrt{ \frac{1}{n}+\frac{\bar{x}^{2}}{\sum{i}(x_{i}-\bar{x})^{2}} }$
- $\displaystyle \sigma$ is the variance in $\displaystyle \hat{y}$
- $\displaystyle \bar{x}$ is the average value of our sample
$\displaystyle SE(\hat{\beta}{1})=\frac{\sigma}{\sqrt{ \sum{i}(x_{i}-\bar{x})^{2} }}$
$\displaystyle \sigma\approx \sqrt{ \sum_{i} \frac{(\hat{f}(x)-y_{i})^{2}}{n-2}}$
- Used for when the noise $\displaystyle \varepsilon$ is unknown