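I keep running into a problem with my regression. I went through my database and replaced every blank cell with 0, so my dataset should not contain any NaN values. I have a feeling the problem is caused by my having read in 3 different databases? Here is my current code: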
import seaborn as sns
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from scipy import stats
covid = pd.read_csv(r'C:\Users\ISAAC\Documents\Business Analytics\Capstone project\complete covid database.csv')
covid.head()
covidEA = pd.read_csv(r'C:\Users\ISAAC\Documents\Business Analytics\Capstone project\europe north america.csv')
covidEA.head()
covidEU = pd.read_csv(r'C:\Users\ISAAC\Documents\Business Analytics\Capstone project\europe.csv')
covidEU.head()
feature = ['total_vaccinations']
label = ['new_deaths']
x = covidEU[feature]
y = covidEU[label]
x_train, x_test, y_train, y_test = train_test_split(x,y)
linreg = LinearRegression()
linreg.fit(x_train, y_train)
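
If the failure is related to missing values, one way to narrow it down is to check the exact DataFrame that is passed to `fit`, since each of the three files is loaded into its own independent DataFrame and zeros filled into one file do not carry over to the others. The sketch below is only an illustration under assumptions: `demo` is a hypothetical stand-in for `covidEU` with made-up values (the real CSV is not available here), and filling with 0 mirrors the approach described above rather than being a recommendation. Dropping the affected rows with `dropna` may be a better fit for some analyses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for covidEU (made-up values, for illustration only).
demo = pd.DataFrame({
    "total_vaccinations": [100.0, np.nan, 300.0, 400.0, 500.0, 600.0, np.nan, 800.0],
    "new_deaths": [5.0, 4.0, np.nan, 3.0, 2.0, 2.0, 1.0, 1.0],
})

feature = ["total_vaccinations"]
label = ["new_deaths"]
cols = feature + label

# Count missing values per column in the frame that is actually used for fitting.
nan_counts = demo[cols].isna().sum()
print(nan_counts)

# Coerce both columns to numeric (non-numeric text becomes NaN), then fill NaN with 0.
clean = demo[cols].apply(pd.to_numeric, errors="coerce").fillna(0)

# Fit on the cleaned columns; random_state is fixed only to make the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    clean[feature], clean[label], random_state=0
)
linreg = LinearRegression()
linreg.fit(x_train, y_train)
```

Running the same `isna().sum()` check on `covidEU[feature + label]` would show whether the file that is actually used still contains blanks.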