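I have a CSV file of LSAT data that I want to pull some statistics out of.
Question: how do I divide two data frames (strictly, two Series) to get the percentage of correct answers for each "Question Type"?
--> For each "Question Type" in qTypeTotal: divide the number of is_correct == True rows for that "Question Type" by its qTypeTotal count
The code below shows the two pieces of information I need to compare.
qTypeTotal returns each question type and how many times it was asked.
correct_answers returns each "Question Type" together with whether the answer was correct (True/False), followed by how many times it was True/False.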
import pandas as pd
df = pd.read_csv('C:/Users/Kenny/Downloads/logicReasoning.csv')
qTypeTotal = df['Question Type'].value_counts()
print(qTypeTotal)
df['is_correct'] = df['Your Answer'] == df['Correct Answer']
correct_answers = df.groupby(['Question Type', 'is_correct']).size()
print(correct_answers)
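For reference, the division described above can be sketched as follows. This is only a minimal sketch: the small in-memory DataFrame and its question-type names are made up to stand in for logicReasoning.csv, and only the three column names are taken from the code above.

```python
import pandas as pd

# Hypothetical sample rows standing in for logicReasoning.csv
df = pd.DataFrame({
    'Question Type':  ['Flaw', 'Flaw', 'Assumption', 'Assumption', 'Assumption', 'Inference'],
    'Your Answer':    ['A', 'B', 'C', 'D', 'E', 'A'],
    'Correct Answer': ['A', 'C', 'C', 'D', 'A', 'B'],
})
df['is_correct'] = df['Your Answer'] == df['Correct Answer']

# Total number of questions per type
qTypeTotal = df['Question Type'].value_counts()

# Number of correctly answered questions per type
correct_counts = df[df['is_correct']]['Question Type'].value_counts()

# Series division aligns on the index; reindex so that types with
# zero correct answers become 0 instead of NaN
pct_correct = correct_counts.reindex(qTypeTotal.index, fill_value=0) / qTypeTotal * 100
print(pct_correct)

# Shortcut: the mean of a boolean column is the fraction of True values
pct_correct_alt = df.groupby('Question Type')['is_correct'].mean() * 100
print(pct_correct_alt)
```

Both results hold one percentage per "Question Type"; the groupby version skips the intermediate counts entirely.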
Attempt: DataFrame.merge
import pandas as pd
# use dtype to specify data type ex dtype={"name": str, "age": np.int32}
df = pd.read_csv('C:/Users/Kenny/Downloads/logicReasoning.csv')
qTypeTotal = df['Question Type'].value_counts()
print(qTypeTotal)
df['is_correct'] = df['Your Answer'] == df['Correct Answer']
correct_answers = df.merge(['Question Type', 'is_correct'])
print(correct_answers)
Result:
File "C:\Users\Kenny\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 526, in __init__
'type {right}'.format(right=type(right)))
ValueError: can not merge DataFrame with instance of type <class 'list'>
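The error arises because DataFrame.merge expects another DataFrame (or a named Series) as its first argument, not a list of column names. A rough sketch of what a merge-based version might look like is below; the sample data and the column names 'total', 'correct' and 'pct_correct' are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for logicReasoning.csv
df = pd.DataFrame({
    'Question Type':  ['Flaw', 'Flaw', 'Assumption', 'Assumption', 'Assumption', 'Inference'],
    'Your Answer':    ['A', 'B', 'C', 'D', 'E', 'A'],
    'Correct Answer': ['A', 'C', 'C', 'D', 'A', 'B'],
})
df['is_correct'] = df['Your Answer'] == df['Correct Answer']

# Turn each summary into a DataFrame with an explicit 'Question Type' column
totals = (df['Question Type'].value_counts()
          .rename_axis('Question Type')
          .reset_index(name='total'))
correct = (df.groupby('Question Type')['is_correct'].sum()
           .reset_index(name='correct'))

# merge takes a DataFrame on the right-hand side and joins on the shared column
merged = totals.merge(correct, on='Question Type')
merged['pct_correct'] = merged['correct'] / merged['total'] * 100
print(merged)
```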
Alternative attempt: I tried converting the list into a Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.read_csv('C:/Users/Kenny/Downloads/logicReasoning.csv')
qTypeTotal = df['Question Type'].value_counts()
#print(qTypeTotal)
df['is_correct'] = df['Your Answer'] == df['Correct Answer']
correct_answers = df.groupby(['Question Type', 'is_correct']).size()
#print(correct_answers)
dframe = pd.DataFrame(np.array(qTypeTotal.reshape(50,3), columns = list('Question Type')))
print(dframe)
Result:
File "C:\Users\Kenny\Anaconda3\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'reshape'
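As the message says, the Series returned by value_counts() has no reshape method here. Note also that list('Question Type') splits the string into single characters, and that the columns argument sits inside the np.array(...) call rather than pd.DataFrame(...). A small sketch of two ways around this follows, using made-up sample values; the column name 'count' is an assumption.

```python
import pandas as pd

# Hypothetical question types standing in for the 'Question Type' column
answers = pd.Series(
    ['Flaw', 'Flaw', 'Assumption', 'Assumption', 'Assumption', 'Inference'],
    name='Question Type',
)
qTypeTotal = answers.value_counts()

# Option 1: convert the Series to a two-column DataFrame directly
dframe = qTypeTotal.rename_axis('Question Type').reset_index(name='count')
print(dframe)

# Option 2: if a reshaped array is really needed, reshape the underlying
# NumPy array rather than the Series itself
arr = qTypeTotal.to_numpy().reshape(-1, 1)
print(arr.shape)
```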
Some sources:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html