python-3.x - Is the data underfit?

Question

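Is the regression line underfitting, and if so, what can I do to get accurate results? I can't tell whether a regression line is overfitting, underfitting, or accurate, so any advice on that would also be appreciated. The file "Advertising.csv": https://github.com/marcopeix/ISL-linear-regression/tree/master/data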

#Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_squared_error

#reading and knowing the data
data=pd.read_csv('Advertising.csv')
#print(data.head())
#print(data.columns)
#print(data.shape)

#plotting the data
plt.figure(figsize=(10,8))
plt.scatter(data['TV'],data['sales'], c='black')
plt.xlabel('Money Spent on TV ads')
plt.ylabel('Sales')
plt.show()

#storing data into variable and shaping data
X=data['TV'].values.reshape(-1,1)
Y=data['sales'].values.reshape(-1,1)

#calling the model and fitting the model
reg=LinearRegression()
reg.fit(X,Y)

#making predictions
predictions=reg.predict(X)

#plotting the predicted data
plt.figure(figsize=(16,8))
plt.scatter(data['TV'],data['sales'], c='black')
plt.plot(data['TV'],predictions, c='blue',linewidth=2)
plt.xlabel('Money Spent on TV ads')
plt.ylabel('Sales')
plt.show()

#R^2 on the training data; reg.score returns the same R^2, not a classification accuracy
r2= r2_score(Y,predictions)
print("R2 score is: ",r2)
print("R2 via reg.score: {:.2f}".format(reg.score(X,Y)))

score 0 · Accepted Answer

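To determine whether your model is underfitting (or overfitting), you need to look at the model's bias (the distance between the output the model predicts and the expected output). As far as I know, you can't do that just by reading your code; you also need to evaluate your model (run it).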
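One way to look at that distance is to compute the residuals and summarize them. This is only a minimal sketch: the data below is synthetic (an assumption standing in for the TV and sales columns of Advertising.csv), so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the TV/sales columns (an assumption, not the real Advertising.csv)
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
Y = 7.0 + 0.05 * X + rng.normal(0, 3.0, size=(200, 1))

reg = LinearRegression().fit(X, Y)
predictions = reg.predict(X)

# Residuals: expected output minus predicted output, one per sample
residuals = Y - predictions

# RMSE summarizes the typical size of a residual, in the units of Y
rmse = np.sqrt(mean_squared_error(Y, predictions))
print("RMSE:", rmse)
print("Largest absolute residual:", np.abs(residuals).max())
```

A large RMSE relative to the spread of Y, or residuals that show a clear pattern when plotted against X, would both suggest the line is missing structure in the data.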

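Since it is a linear regression, it is likely that you are underfitting: a straight line cannot capture any curvature in the data.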

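I'd suggest splitting your data into a training set and a test set. You fit your model on the training set and use the test set to see how well it performs on unseen data. If the model does badly on both the training data and the test data, it is underfitting. If it does well on the training data but badly on the test data, it is overfitting.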

Try something along these lines:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split the data into a train set and a test set, leaving 20% (the test_size parameter) for testing
X, X_test, Y, Y_test = train_test_split(data['TV'].values.reshape(-1,1), data['sales'].values.reshape(-1,1), test_size=0.2)

# Then fit your model on the training set only
reg = LinearRegression()
reg.fit(X, Y)

# Finally evaluate how well it does on the training and test data.
print("Train score " + str(reg.score(X, Y)))
print("Test score " + str(reg.score(X_test, Y_test)))
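The rule of thumb above (bad on both sets means underfitting, good on train but bad on test means overfitting) can be written down as a small helper. This is a rough sketch, not a standard API: `diagnose` and its thresholds `low` and `gap` are hypothetical names and arbitrary values chosen for illustration, and the data is synthetic rather than the real Advertising.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def diagnose(train_score, test_score, low=0.5, gap=0.15):
    """Rough verdict from train/test R^2; `low` and `gap` are arbitrary illustrative thresholds."""
    if train_score < low:
        return "underfitting"    # poor even on the data the model was trained on
    if train_score - test_score > gap:
        return "overfitting"     # good on train, clearly worse on unseen data
    return "reasonable fit"

# Synthetic stand-in for the TV/sales columns (an assumption, not the real file)
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=(200, 1))
sales = 7.0 + 0.05 * tv + rng.normal(0, 3.0, size=(200, 1))

X_train, X_test, Y_train, Y_test = train_test_split(tv, sales, test_size=0.2, random_state=0)
reg = LinearRegression().fit(X_train, Y_train)

train_score = reg.score(X_train, Y_train)
test_score = reg.score(X_test, Y_test)
print(train_score, test_score, diagnose(train_score, test_score))
```

What counts as a "low" score depends on the problem and on how noisy the data is, so treat the thresholds as something to tune, not as fixed cutoffs.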

score 0 · Accepted Answer

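Instead of training and testing on the same data, split your dataset into two or three sets (train, validation, test). Two (train, test) are probably enough here; you can use the sklearn library function train_test_split for that. Train your model on the training data, then evaluate it on the test data and see whether you get good results. If the model's training accuracy is very high but its test accuracy is very low, you can say it is overfitting. If the model doesn't reach high accuracy even on the training data, it is underfitting. Hope this helps. :)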
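For the three-set variant, one common approach is to call train_test_split twice. A minimal sketch, assuming a 60/20/20 split and synthetic data in place of Advertising.csv (both are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TV/sales columns (an assumption, not the real file)
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
Y = 7.0 + 0.05 * X + rng.normal(0, 3.0, size=(200, 1))

# First split: hold out 20% of all samples as the final test set
X_rest, X_test, Y_rest, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Second split: 25% of the remaining 80% becomes the validation set (20% of the total)
X_train, X_val, Y_train, Y_val = train_test_split(X_rest, Y_rest, test_size=0.25, random_state=0)

reg = LinearRegression().fit(X_train, Y_train)
print("Train R2:", reg.score(X_train, Y_train))
print("Validation R2:", reg.score(X_val, Y_val))
print("Test R2:", reg.score(X_test, Y_test))
```

The validation set is the one to look at while you compare models or settings; the test set is kept aside and scored only once at the end.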
