• [Study Notes] Gradient Descent for Multiple Linear Regression



    Predict the house price from its size, number of bedrooms, number of floors, and age.

    Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars)
    2104        | 5                  | 1                | 45          | 460
    1416        | 3                  | 2                | 40          | 232
    852         | 2                  | 1                | 35          | 178

    Prediction (plugging in the values)

    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      p (scalar):  prediction
    """
    def predict(x, w, b): 
        p = np.dot(x, w) + b     
        return p    
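
    A quick usage sketch of predict (x_vec is the first row of the table above; w_example and b_example are made-up placeholder parameters for illustration, not learned values):

    x_vec = np.array([2104, 5, 1, 45])
    w_example = np.array([0.1, 10.0, -5.0, -2.0])   # placeholder weights, illustration only
    b_example = 80.0                                # placeholder bias, illustration only
    print(f"prediction: {predict(x_vec, w_example, b_example):0.2f}")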
    

    Cost function

    $$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2$$

    $$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$$

    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
    
    Returns:
      cost (scalar): cost
    """
    
    def compute_cost(X, y, w, b):
        m = X.shape[0]
        cost = 0.0
        for i in range(m):
            f_wb_i = np.dot(X[i], w) + b
            cost += (f_wb_i - y[i])**2
        cost /= 2*m
        return cost
    
    
    X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
    y_train = np.array([460, 232, 178])
    b_init = 785.1811367994083
    w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
    
    cost = compute_cost(X_train, y_train, w_init, b_init)
    print(f'Cost at optimal w : {cost}')
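
    For reference, the same cost can be computed without the explicit Python loop. A minimal vectorized sketch (compute_cost_vectorized is an added helper, not part of the original notes; it assumes the same X, y, w, b shapes as above):

    def compute_cost_vectorized(X, y, w, b):
        f_wb = X @ w + b                                 # (m,) vector of predictions
        return np.sum((f_wb - y)**2) / (2 * X.shape[0])  # mean squared error over 2m

    print(f'Vectorized cost: {compute_cost_vectorized(X_train, y_train, w_init, b_init)}')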
    

    Cost at optimal w : 1.5578904330213735e-12

    Gradient descent with multiple variables

    Repeat until convergence:
    $$\begin{align*}
    &\text{repeat until convergence:} \; \lbrace \\
    &\quad w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 0..n{-}1 \\
    &\quad b \;\; = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
    &\rbrace
    \end{align*}$$

    n: number of features; m: number of training examples
    $$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \, x_j^{(i)}$$

    $$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})$$

    Compute the gradient

    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
    
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    def compute_gradient(X, y, w, b):
        m, n = X.shape          #(m:number of examples, n:number of features)
        dj_dw = np.zeros((n,))
        dj_db = 0.
        
        for i in range(m):
            dif = np.dot(X[i], w) + b - y[i]
            for j in range(n):
                dj_dw[j] = dj_dw[j] + dif * X[i, j]
            dj_db = dj_db + dif
        dj_dw /= m
        dj_db /= m
            
        return dj_db, dj_dw
    			
    			
    tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
    print(f'dj_db at initial w,b: {tmp_dj_db}')
    print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')
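
    As a sanity check on the analytic gradient, one partial derivative can be compared against a central-difference estimate. A minimal sketch (eps and numerical_dj_db are illustrative additions, not part of the course code):

    eps = 1e-4
    # central-difference estimate of dJ/db at (w_init, b_init)
    numerical_dj_db = (compute_cost(X_train, y_train, w_init, b_init + eps)
                       - compute_cost(X_train, y_train, w_init, b_init - eps)) / (2 * eps)
    print(f'analytic dj_db : {tmp_dj_db}')
    print(f'numerical dj_db: {numerical_dj_db}')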
    

    dj_db at initial w,b: -1.6739251122999121e-06
    dj_dw at initial w,b:
    [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]

    Gradient descent

    """
    Performs batch gradient descent to learn theta. Updates theta by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
    """
    def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
        
        # An array to store cost J and w's at each iteration primarily for graphing later
        J_history = []
        w = copy.deepcopy(w_in)  #avoid modifying global w within function
        b = b_in
        
        for i in range(num_iters):
    
            # Calculate the gradient and update the parameters
            dj_db,dj_dw = gradient_function(X, y, w, b)
    
            # Update Parameters using w, b, alpha and gradient
            w = w - alpha * dj_dw
            b = b - alpha * dj_db
          
            # Save cost J at each iteration
            if i<100000:      # prevent resource exhaustion 
                J_history.append( cost_function(X, y, w, b))
    
            # Print cost every at intervals 10 times or as many iterations if < 10
            if i% math.ceil(num_iters / 10) == 0:
                print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
            
        return w, b, J_history
    

    Test

    # initialize parameters
    initial_w = np.zeros_like(w_init)
    initial_b = 0.
    # some gradient descent settings
    iterations = 1000
    alpha = 5.0e-7
    # run gradient descent 
    w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                        compute_cost, compute_gradient, 
                                                        alpha, iterations)
    print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
    m,_ = X_train.shape
    for i in range(m):
        print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")
    


    Iteration 0: Cost 2529.46
    Iteration 100: Cost 695.99
    Iteration 200: Cost 694.92
    Iteration 300: Cost 693.86
    Iteration 400: Cost 692.81
    Iteration 500: Cost 691.77
    Iteration 600: Cost 690.73
    Iteration 700: Cost 689.71
    Iteration 800: Cost 688.70
    Iteration 900: Cost 687.69
    b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07]
    prediction: 426.19, target value: 460
    prediction: 286.17, target value: 232
    prediction: 171.47, target value: 178
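
    The learned w_final and b_final can also be plugged back into the predict helper from the top of the notes for a new house. A small sketch (the 1200 sqft, 3-bedroom, 1-floor, 40-year-old house is a made-up example, not from the training set):

    x_house = np.array([1200, 3, 1, 40])        # hypothetical new example
    print(f"predicted price: {predict(x_house, w_final, b_final):0.2f} (1000s dollars)")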

    Plot cost versus iteration

    # plot cost versus iteration  
    fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
    ax1.plot(J_hist)
    ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
    ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
    ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost') 
    ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step') 
    plt.show()
    



  • Original post: https://blog.csdn.net/qq_39391544/article/details/127569072