This function \(h\) is called a hypothesis.
The accuracy of our hypothesis function \(h\) is measured using a cost (loss) function. One common choice for linear regression is the squared error function (also known as "mean squared error").
\[J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}_{i}- y_{i} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2\]
The mean is halved (the \(\frac{1}{2}\) factor) to simplify the gradient computation for gradient descent: differentiating the squared term produces a factor of 2 that cancels the \(\frac{1}{2}\).
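As a quick illustration, the cost can be computed directly in R (a minimal sketch; the data and parameter values below are made up):

```r
# Squared error cost J(theta0, theta1) for simple linear regression
cost <- function(theta0, theta1, x, y) {
  m <- length(y)
  h <- theta0 + theta1 * x      # hypothesis h_theta(x)
  sum((h - y)^2) / (2 * m)      # halved mean of the squared errors
}

# Toy data (hypothetical)
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 8.1)
cost(0, 2, x, y)
```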
Why squared loss (and not absolute loss)?
The absolute value is not convenient, because it doesn't have a continuous derivative, which makes the function not smooth. Functions that are not smooth create unnecessary difficulties when employing linear algebra to find closed form solutions to optimization problems. Closed form solutions to finding an optimum of a function are simple algebraic expressions and are often preferable to using complex numerical optimization methods, such as gradient descent (used, among others, to train neural networks). — The Hundred-Page Machine Learning Book
To have an inverse, a matrix must be square (\(N_{row} = N_{col}\)); being square is necessary but not sufficient.
Inverse of a value vs. inverse of a matrix:

* When we multiply a number by its reciprocal, we get 1.
* When we multiply a matrix by its inverse, we get the identity matrix: \(A \times A^{-1} = A^{-1} \times A = I\).
For a \(2 \times 2\) matrix \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\), the inverse is:

\[A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\]
According to the invertible matrix theorem, the inverse does not exist when the determinant (\(ad - bc\) in the \(2 \times 2\) case) is zero; such a matrix is called "singular".
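In R, a matrix inverse can be checked with `solve()` and `det()` (a small sketch with made-up matrices):

```r
A <- matrix(c(4, 7, 2, 6), nrow = 2)  # 2x2 matrix with det(A) = 10
solve(A)              # the inverse A^{-1}
A %*% solve(A)        # the 2x2 identity matrix (up to rounding)

B <- matrix(c(1, 2, 2, 4), nrow = 2)  # columns are linearly dependent
det(B)                # 0, so B is singular and solve(B) throws an error
```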
# Packages for data science: statistical analysis for high-dimensional data
install.packages("e1071")
# Multiclass Logistic Regression
install.packages("glmnet")
install.packages(c("lar","RandomForest","rpart","SIS","tilting"))
# Packages for data science: survival analysis case study
install.packages(c("survival","mstate","p3state.msm","msm"))
Slides and Resources are here.
Slides and Resources are here.
Comparing, say, a 75-year-old with a 70-year-old whose other covariates are identical, every term except age cancels:

\[HR = e^{\beta_1(75-70) + \beta_2(Sex-Sex) + \beta_3(pstat-pstat) + \beta_4(mspike-mspike)} = e^{5\beta_1}\]
Given \(\beta_1 = 0.06\), \(HR = \exp(5 \times 0.06) = 1.35\): the hazard of the event increases by about 35% per 5 units (years) of age, controlling for the other factors.
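A one-line check of that arithmetic in R:

```r
exp(5 * 0.06)  # 1.3499: a 5-year age increase multiplies the hazard by ~1.35
```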
Even though \(h_0(t)\) is left unspecified, we can still estimate the \(\beta\)s, and \(S(t,x)\) can be estimated under minimal assumptions. There are two common techniques for adjusting the partial likelihood for tied lifetimes: Breslow and Efron.
The code can be found in the Google Drive folder.
library(survival)
mgus.data<-read.csv("C:/Users/ytian/Google Drive/PhD/Data Science Bootcamp June 10-21 2019/D3 Code/mgus.data.csv",header = TRUE)
Slides and R Markdown code can be found here.
install.packages(c("here","olsrr","modelr","broom","caret","neuralnet","DescTools","PredictABEL"))
library(magrittr)
library(here)
library(olsrr)
library(modelr)
library(neuralnet)
library(dplyr)
library(PredictABEL)
library(ggplot2)
library(caret)
library(ROCR)
library(broom)
library(DescTools)
diabetes<-read.csv("C:/Users/ytian/Google Drive/PhD/Data Science Bootcamp June 10-21 2019/C3 Case Study_Diabetes/diabetes_data_full.csv")
diabetes_data <- read.csv("C:/Users/ytian/Google Drive/PhD/Data Science Bootcamp June 10-21 2019/C3 Case Study_Diabetes/diabetes_analytic_data.csv")