
Summary 1 - Linear Regression and Logistic Regression

20170904

  • Revised code: replaced e .^ (-z) with exp(-z)

  • Revised code: replaced

    gradients = 1 / m * X' * (H - y) + lambda / m * theta;
    gradients(1) = 1 / m * X(:,1)' * (H - y);

    with

    temp = theta;
    temp(1) = 0;
    gradients = 1 / m * X' * (H - y) + lambda / m * temp;

1. Hypothesis Function

1.1. Linear Regression

$$ \begin {split} h_{\theta }\left( x\right) &=\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\ldots +\theta _{n}x_{n}\\ &=\sum _{j=0}^{n}\theta _{j}x_{j} \qquad \left( x_{0}=1\right) \\ &=\theta ^{T}x \qquad \left(\theta =\left[ \begin{matrix} \theta _{0}\\ \theta _{1}\\ \vdots \\ \theta _{n}\end{matrix} \right] \in \mathbb{R} ^{n+1},x=\left[ \begin{matrix} x_{0}\\ x_{1}\\ \vdots \\ x_{n}\end{matrix} \right] \in \mathbb{R} ^{n+1},x_{0}=1 \right) \\ \end {split} $$
function [y_test] = hypothesisFunction(X_test, theta)
% Linear regression hypothesis h_theta(x) = theta' * x, vectorized over all rows of X_test.
y_test = X_test * theta;
end

1.2. Logistic Regression

We need $0\leq h_{\theta }\left( x\right) \leq 1$, so we introduce the sigmoid function $g\left( z\right) =\dfrac {1} {1+e^{-z}}$.
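A minimal vectorized implementation of $g\left( z\right)$ (a sketch; the name sigmoid is my choice, and exp(-z) follows the revision note above):

function g = sigmoid(z)
% Element-wise sigmoid; works for scalars, vectors, and matrices.
g = 1 ./ (1 + exp(-z));
end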

$$ \begin {split} h_{\theta }\left( x\right) &=g\left(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\ldots +\theta _{n}x_{n}\right)\\ &=g\left(\sum _{j=0}^{n}\theta _{j}x_{j}\right) \qquad \left( x_{0}=1\right) \\ &=g\left(\theta ^{T}x\right) \qquad \left(\theta =\left[ \begin{matrix} \theta _{0}\\ \theta _{1}\\ \vdots \\ \theta _{n}\end{matrix} \right] \in \mathbb{R} ^{n+1},x=\left[ \begin{matrix} x_{0}\\ x_{1}\\ \vdots \\ x_{n}\end{matrix} \right] \in \mathbb{R} ^{n+1},x_{0}=1 \right) \\ \Rightarrow h_{\theta }\left( x\right) &=\dfrac {1} {1+e^{-\theta ^{T}x}} \\ \end {split} $$
function [y_test] = hypothesisFunction(X_test, theta)
% Logistic regression hypothesis h_theta(x) = g(theta' * x).
y_test = 1 ./ (1 + exp(-X_test * theta));
end
function p = predict(theta, X)
% Used to check prediction accuracy: round(h) maps h >= 0.5 to 1 and h < 0.5 to 0.
p = round(1 ./ (1 + exp(-X * theta)));
end

2. Cost Function

2.1. Linear Regression Cost Function

$$ \begin {split} J\left( \theta \right) &=\dfrac {1} {2m}\sum _{i=1}^{m}\left( h_{\theta} \left( x^{\left( i\right)}\right) -y^{\left( i\right)}\right) ^{2} \end {split} $$
function J = costFunction(X, y, theta)
% Mean squared error cost for linear regression.
m = length(y);
J = sum((X * theta - y) .^ 2) / (2 * m);
end
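A quick sanity check of this cost (a sketch; the data is made up for illustration): a perfect fit gives $J=0$.

X = [1 1; 1 2; 1 3];   % x0 column plus one feature
y = [1; 2; 3];
theta = [0; 1];        % X * theta equals y exactly
J = costFunction(X, y, theta)   % returns 0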

2.2. Logistic Regression Cost Function

$$ \begin {split} J\left( \theta \right) =-\dfrac {1} {m}\sum _{i=1}^{m}\left[ y^{\left( i\right)}\log h_{\theta }\left( x^{\left( i\right)}\right)+\left( 1-y^{\left( i\right)}\right) \log \left( 1-h_{\theta }\left( x^{\left( i\right)}\right) \right)\right]\\ \end {split} $$
function J = costFunction(X, y, theta)
% Cross-entropy cost for logistic regression.
m = length(y);
H = 1 ./ (1 + exp(-X * theta));
J = -1 / m * sum(y .* log(H) + (1 - y) .* log(1 - H));
end
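Another sanity check (a sketch with made-up data): with $\theta =0$ every prediction is $0.5$, so the cost is $\log 2\approx 0.693$ regardless of the labels.

X = [1 0.5; 1 -0.5];
y = [1; 0];
theta = [0; 0];
J = costFunction(X, y, theta)   % returns log(2), approximately 0.6931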

3. Regularized Cost Function

3.1. Linear Regression Regularized Cost Function

$$ \begin {split} J\left( \theta \right) &=\dfrac {1} {2m}\left[\sum _{i=1}^{m}\left( h_{\theta} \left( x^{\left( i\right)}\right) -y^{\left( i\right)}\right) ^{2}+\lambda \sum _{j=1}^{n}\theta _{j}^{2}\right]\\ \end {split} $$
function J = linearRegCostFunction(X, y, theta, lambda)
% Regularized MSE cost; theta(1) (i.e. theta_0) is not penalized.
m = length(y);
J = sum((X * theta - y) .^ 2) / (2 * m) + lambda / (2 * m) * sum(theta(2:end,:) .^ 2);
end

3.2. Logistic Regression Regularized Cost Function

$$ \begin {split} J\left( \theta \right) =-\dfrac {1} {m}\sum _{i=1}^{m}\left[ y^{\left( i\right)}\log h_{\theta }\left( x^{\left( i\right)}\right)+\left( 1-y^{\left( i\right)}\right) \log \left( 1-h_{\theta }\left( x^{\left( i\right)}\right) \right)\right]+\dfrac {\lambda } {2m}\sum _{j=1}^{n}\theta _{j}^{2}\\ \end {split} $$
function J = costFunctionRegularized(X, y, theta, lambda)
% Regularized cross-entropy cost; theta(1) is not penalized.
m = length(y);
H = 1 ./ (1 + exp(-X * theta));
J = -1 / m * sum(y .* log(H) + (1 - y) .* log(1 - H)) + lambda / (2 * m) * sum(theta(2:end,:) .^ 2);
end

4. Gradient Descent

Complexity: $O\left( kn^{2}\right)$, where $k$ is the number of iterations.

$$ \begin {split} \text{Repeat}\ \{ \\ \theta _{j}&:=\theta _{j}-\alpha \dfrac {\partial } {\partial \theta _{j}}J\left( \theta \right) \qquad \left( j=0,1,\cdots ,n\right)\\ &:=\theta _{j}-\alpha \dfrac {1} {m}\sum _{i=1}^{m}\left(\left( h_{\theta }\left( x^{\left( i\right)}\right) -y^{\left( i\right)}\right)x_{j}^{\left( i\right)}\right)\\ \} \end {split} $$
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iterations)
% Not needed if you optimize with fminunc instead.
m = length(y);
J_history = zeros(num_iterations, 1);

for iter = 1:num_iterations
	theta = theta - alpha / m * X' * (X * theta - y);
	J_history(iter) = costFunction(X, y, theta); % linear regression cost from section 2.1
end
end
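A typical call (a sketch; alpha and the iteration count are illustrative values to tune per dataset), plotting J_history to confirm the cost decreases on every iteration:

alpha = 0.01;
num_iterations = 400;
initial_theta = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescent(X, y, initial_theta, alpha, num_iterations);
plot(1:num_iterations, J_history);
xlabel('Iteration'); ylabel('J(\theta)');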

4.1. Linear Regression Gradient Descent: Partial Derivatives

function gradients = gradientsFunction(X, y, theta)
% When using fminunc, merge this code into the corresponding cost function.
m = length(y);
H = X * theta;
gradients = 1 / m * X' * (H - y);
end

4.2. Logistic Regression Gradient Descent: Partial Derivatives

function gradients = gradientsFunction(X, y, theta)
% When using fminunc, merge this code into the corresponding cost function.
m = length(y);
H = 1 ./ (1 + exp(-X * theta));
gradients = 1 / m * X' * (H - y);
end
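Either gradient function can be verified with a centered-difference numerical gradient (a sketch; costFunction here is whichever cost matches the gradient being checked, and epsilon = 1e-4 is a conventional choice):

epsilon = 1e-4;
numgrad = zeros(size(theta));
for j = 1:numel(theta)
	e = zeros(size(theta));
	e(j) = epsilon;
	numgrad(j) = (costFunction(X, y, theta + e) - costFunction(X, y, theta - e)) / (2 * epsilon);
end
% numgrad should agree with gradientsFunction(X, y, theta) to several decimal places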

5. Regularized Gradient Descent

$$ \begin {split} \text{Repeat}\ \{ \\ \theta _{0}&:=\theta _{0}-\alpha \dfrac {\partial } {\partial \theta _{0}}J\left( \theta \right)\\ &:=\theta _{0}-\alpha \dfrac {1} {m}\sum _{i=1}^{m}\left(\left( h_{\theta }\left( x^{\left( i\right)}\right) -y^{\left( i\right)}\right)x_{0}^{\left( i\right)}\right)\\ \theta _{j}&:=\theta _{j}-\alpha \dfrac {\partial } {\partial \theta _{j}}J\left( \theta \right) \qquad \left( j=1,\cdots ,n\right)\\ &:=\theta _{j}-\alpha \left[ \left( \dfrac {1} {m}\sum _{i=1}^{m}\left(\left( h_{\theta }\left( x^{\left( i\right)}\right) -y^{\left( i\right)}\right)x_{j}^{\left( i\right)}\right)\right) +\dfrac {\lambda } {m}\theta _{j}\right]\\ &:=\theta _{j}\left( 1-\alpha \dfrac {\lambda } {m}\right)-\alpha \dfrac {1} {m}\sum _{i=1}^{m}\left(\left( h_{\theta }\left( x^{\left( i\right)}\right) -y^{\left( i\right)}\right)x_{j}^{\left( i\right)}\right)\\ \} \end {split} $$

Here $\theta _{0}$ is handled separately and is not regularized.

Note: since $1-\alpha \dfrac {\lambda } {m} < 1$, we have $\theta _{j}\left( 1-\alpha \dfrac {\lambda } {m}\right) < \theta _{j}$, i.e. each iteration first shrinks the magnitude of $\theta _{j}$ slightly before applying the usual gradient step.
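For example, with $\alpha =0.01$, $\lambda =1$, $m=100$: $1-\alpha \dfrac {\lambda } {m}=1-0.0001=0.9999$, so each iteration shrinks $\theta _{j}$ by $0.01\%$ before the gradient step.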

5.1. Linear Regression Regularized Gradient Descent: Partial Derivatives

function gradients = gradientsFunctionRegularized(X, y, theta, lambda)
% When using fminunc, merge this code into the corresponding cost function.
m = length(y);
H = X * theta;
temp = theta;
temp(1) = 0; % do not regularize theta_0
gradients = 1 / m * X' * (H - y) + lambda / m * temp;
end

5.2. Logistic Regression Regularized Gradient Descent: Partial Derivatives

function gradients = gradientsFunctionRegularized(X, y, theta, lambda)
% When using fminunc, merge this code into the corresponding cost function.
m = length(y);
H = 1 ./ (1 + exp(-X * theta));
temp = theta;
temp(1) = 0; % do not regularize theta_0
gradients = 1 / m * X' * (H - y) + lambda / m * temp;
end

6. Feature Scaling

Constrain each $x_{i}$ to a fixed range so that gradient descent converges more efficiently.

This range is typically $-1\leq x_{i}\leq 1$, $-3\leq x_{i}\leq 3$, or $-\dfrac {1} {3}\leq x_{i}\leq \dfrac {1} {3}$.

$$ \begin {split} x_{i}:=\dfrac {x_{i}-\mu _{i}} {s_{i}}\\ \end {split} $$
$\mu _{i}$ is the mean of $x_{i}$.

$s_{i}$ is the range of $x_{i}$ (max - min); in code, the standard deviation $\sigma$ is commonly used instead.

function [X_normalize, mu, sigma] = featureNormalize(X)
% X must not include the x0 column.
mu = mean(X);
sigma = std(X);
X_normalize = (X - mu) ./ sigma; % element-wise, broadcasting mu and sigma over rows
end
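Typical usage (a sketch): normalize first, then add the x0 column (see 9.1), so that x0 itself is never scaled.

[X_norm, mu, sigma] = featureNormalize(X);
X_norm = [ones(size(X_norm, 1), 1) X_norm]; % add x0 after normalizing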

7. Normal Equation

Complexity: $O\left( n^{3}\right)$, dominated by inverting $X^{T}X$.

Conditions:

  • n <= 10000; the more powerful the machine, the larger this threshold can be.
  • m > n; otherwise $X^{T}X$ has no inverse.
  • No redundant (linearly dependent) features.
$$ \begin {split} \theta =\left( X^{T}X\right) ^{-1}X^{T}\overrightarrow {y} \end {split} $$
function [theta] = normalEquation(X, y)
% pinv is more robust than inv when X' * X is near-singular.
theta = pinv(X' * X) * X' * y; % or use inv
end

8. Regularized Normal Equation

Conditions:

  • n <= 10000; the more powerful the machine, the larger this threshold can be.
  • No redundant (linearly dependent) features.

(The $m>n$ condition is no longer required: for $\lambda >0$, $X^{T}X+\lambda \cdot L$ is invertible even when $m\leq n$.)
$$ \begin {split} \theta &=\left( X^{T}X+\lambda \cdot L\right) ^{-1}X^{T}\overrightarrow {y}\\ L&=\left[ \begin{matrix} 0& & & & \\ & 1& & & \\ & & 1& & \\ & & & \cdots & \\ & & & & 1\end{matrix} \right] \qquad \left( L\in \mathbb{R} ^{\left( n+1\right) \times \left( n+1\right) }\right)\\ \end {split} $$
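Section 7 has a code counterpart, so here is the analogous sketch (the name normalEquationRegularized is my choice; X is assumed to include the x0 column):

function [theta] = normalEquationRegularized(X, y, lambda)

L = eye(size(X, 2)); % (n+1) x (n+1) identity...
L(1, 1) = 0;         % ...with a 0 in the corner so theta_0 is not regularized
theta = pinv(X' * X + lambda * L) * X' * y;
end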

9. Miscellaneous

9.1. Adding the x0 Column to X

function [X] = addx0(X)

m = size(X, 1); % number of training examples
X = [ones(m, 1) X];
end

9.2. Feature Scaling for X_test

function [X_test_normalize] = featureNormalizeForTest(X_test, mu, sigma)
% X_test must not include the x0 column; mu and sigma come from featureNormalize on the training set.
X_test_normalize = (X_test - mu) ./ sigma;
end

9.3. Using fminunc

Cost function code:

function [J, gradients] = costFunction(X, y, theta, lambda)
% Regularized logistic regression: cost and gradient in one function, as fminunc expects.
m = length(y);
H = 1 ./ (1 + exp(-X * theta));
J = -1 / m * sum(y .* log(H) + (1 - y) .* log(1 - H)) + lambda / (2 * m) * sum(theta(2:end,:) .^ 2);
temp = theta;
temp(1) = 0; % do not regularize theta_0 (per the revision note at the top)
gradients = 1 / m * X' * (H - y) + lambda / m * temp;
end

Calling code:

initial_theta = zeros(size(X, 2), 1);
lambda = 1;
options = optimset('GradObj', 'on', 'MaxIter', 100); % supply the gradient; at most 100 iterations

[theta, J, exit_flag] = fminunc(@(t)(costFunction(X, y, t, lambda)), initial_theta, options);
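
After training, the fit can be checked with the predict helper from section 1.2 (a sketch; this measures training-set accuracy only):

p = predict(theta, X);
fprintf('Train accuracy: %f\n', mean(double(p == y)) * 100);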