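Data Science - Linear Regression

We are missing one important variable that affects calorie burnage: the duration of the training session.

Duration, combined with Average_Pulse, will explain Calorie_Burnage more precisely.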

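Linear Regression

The term regression is used when you try to find the relationship between variables.

In machine learning and statistical modeling, that relationship is used to predict the outcome of events.

In this module, we will cover the following questions:

Can we conclude that Average_Pulse and Duration are related to Calorie_Burnage?
Can we use Average_Pulse and Duration to predict Calorie_Burnage?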

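Least Squares Method

Linear regression uses the least squares method.

The concept is to draw a line through the plotted data points. The line is positioned so that its distance to all of the data points is as small as possible; formally, it minimizes the sum of the squared vertical distances between the points and the line.

These distances are called "residuals" or "errors".

In the figure, the red dashed lines represent the distance from the data points to the drawn mathematical function.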
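
The least squares idea described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code: the data values and the function names `least_squares_line` and `sum_squared_residuals` are made up for the example. It computes the slope and intercept with the standard closed-form formulas for one explanatory variable, then checks that nudging the line away from that fit makes the sum of squared residuals larger.

```python
import numpy as np

# Made-up values for illustration only; they are NOT taken from data.csv
x = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
y = np.array([240.0, 260.0, 310.0, 330.0, 360.0])

def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

def sum_squared_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)

slope, intercept = least_squares_line(x, y)
best = sum_squared_residuals(x, y, slope, intercept)

# A slightly different line should fit worse (larger sum of squared residuals)
worse = sum_squared_residuals(x, y, slope + 0.1, intercept)

print("slope:", slope, "intercept:", intercept)
print("sum of squared residuals (fitted line):", best)
print("sum of squared residuals (nudged line):", worse)
```

In practice a library routine such as `stats.linregress` does this calculation for you; the sketch only shows what "minimizing the distance" means.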

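Linear Regression Using One Explanatory Variable

In this example, we will try to predict Calorie_Burnage from Average_Pulse using linear regression.

Example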

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the health data set (expects data.csv in the working directory)
full_health_data = pd.read_csv("data.csv", header=0, sep=",")

# Explanatory variable (x) and response variable (y)
x = full_health_data["Average_Pulse"]
y = full_health_data["Calorie_Burnage"]

# Fit the least squares line
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Predicted y value on the regression line for a given x
    return slope * x + intercept

# Predicted y value for every x in the data
mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.ylim(0, 2000)
plt.xlim(0, 200)
plt.xlabel("Average_Pulse")
plt.ylabel("Calorie_Burnage")
plt.show()

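Example Explained

Import the modules you need: Pandas, matplotlib and SciPy
Isolate Average_Pulse as x. Isolate Calorie_Burnage as y
Get the important key values with: slope, intercept, r, p, std_err = stats.linregress(x, y)
Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will be placed
Run each value of the x array through the function. This results in a new array with new values for the y-axis: mymodel = list(map(myfunc, x))
Draw the original scatter plot: plt.scatter(x, y)
Draw the line of linear regression: plt.plot(x, mymodel)
Define the maximum and minimum values of the axes
Label the axes: "Average_Pulse" and "Calorie_Burnage"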
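
Once the slope and intercept are known, the same function can give a prediction for a pulse value that is not in the data. The following is a minimal sketch under stated assumptions: the data values are made up for illustration (the tutorial reads them from data.csv), and `predict_calories` is a hypothetical helper name. It reads `slope` and `intercept` as attributes of the object returned by `stats.linregress`.

```python
from scipy import stats

# Made-up values for illustration only; they are NOT taken from data.csv
average_pulse = [80, 90, 100, 110, 120]
calorie_burnage = [240, 260, 310, 330, 360]

# Fit the regression line
result = stats.linregress(average_pulse, calorie_burnage)

def predict_calories(pulse):
    """Predicted Calorie_Burnage on the fitted line for a given Average_Pulse."""
    return result.slope * pulse + result.intercept

# Predict for a pulse value that does not appear in the data
predicted = predict_calories(105)
print(f"Predicted Calorie_Burnage at Average_Pulse 105: {predicted:.1f}")
```

A prediction like this is only as trustworthy as the fit itself, and it is generally safer for pulse values inside the range covered by the data.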

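Output

[Figure: Linear Regression - One variable - Least Square]

Do you think the line is able to predict Calorie_Burnage precisely?

We will show that the variable Average_Pulse alone is not enough to make a precise prediction of Calorie_Burnage.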
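
One common way to put a number on how well the line fits is to square the r value returned by `stats.linregress`. The sketch below is an illustration only: the data values are made up (deliberately scattered, and not taken from data.csv), so the printed numbers say nothing about the real data set. It also recomputes the same quantity from the residuals as a cross-check.

```python
import numpy as np
from scipy import stats

# Made-up values for illustration only; they are NOT taken from data.csv
average_pulse = np.array([80, 90, 100, 110, 120, 130], dtype=float)
calorie_burnage = np.array([300, 250, 400, 280, 420, 330], dtype=float)

result = stats.linregress(average_pulse, calorie_burnage)

# Squared correlation coefficient: share of the variation in y explained by the line
r_squared = result.rvalue ** 2

# Same quantity computed from the residuals, as a cross-check
predicted = result.slope * average_pulse + result.intercept
ss_res = np.sum((calorie_burnage - predicted) ** 2)
ss_tot = np.sum((calorie_burnage - calorie_burnage.mean()) ** 2)
r_squared_from_residuals = 1 - ss_res / ss_tot

print(f"r = {result.rvalue:.3f}, r squared = {r_squared:.3f}")
```

A value close to 1 means the points lie close to the line; a value close to 0 means the line explains little of the variation in Calorie_Burnage.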
