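I'm using Python datatable (https://github.com/h2oai/datatable) to read a csv file that contains only integer values. I then convert the datatable Frame to a pandas DataFrame. During the conversion, columns that contain only 0/1 are treated as booleans instead of integers.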

Given the following csv file (small_csv_file_test.csv):

a1,a2,a3,a4,a5,a6,a7,a8,a9,a10
 1, 1, 1, 1, 1, 1, 1, 0, 1, 1
 2, 2, 2, 2, 2, 2, 2, 1, 0, 1
 3, 3, 3, 3, 3, 3, 3, 0, 0, 1
 4, 4, 4, 4, 4, 4, 4, 1, 0, 0
 5, 5, 5, 5, 5, 5, 5, 0, 0, 0
 6, 6, 6, 6, 6, 6, 6, 0, 0, 0
 7, 7, 7, 7, 7, 7, 7, 1, 1, 0
 8, 8, 8, 8, 8, 8, 8, 1, 1, 1
 9, 9, 9, 9, 9, 9, 9, 1, 1, 1
 0, 0, 0, 0, 0, 0, 0, 1, 0, 1

Source code:

import pandas as pd
import datatable as dt

test_csv_matrix = "small_csv_file_test.csv"

# Read the csv into a datatable Frame
data = dt.fread(test_csv_matrix)
print(data.head(5))

# Convert the Frame to a pandas DataFrame
matrix = data.to_pandas()
print(matrix.head())

Result:

   | a1  a2  a3  a4  a5  a6  a7  a8  a9  a10
-- + --  --  --  --  --  --  --  --  --  ---
 0 |  1   1   1   1   1   1   1   0   1    1
 1 |  2   2   2   2   2   2   2   1   0    1
 2 |  3   3   3   3   3   3   3   0   0    1
 3 |  4   4   4   4   4   4   4   1   0    0
 4 |  5   5   5   5   5   5   5   0   0    0

[5 rows x 10 columns]

   a1  a2  a3  a4  a5  a6  a7     a8     a9    a10
0   1   1   1   1   1   1   1  False   True   True
1   2   2   2   2   2   2   2   True  False   True
2   3   3   3   3   3   3   3  False  False   True
3   4   4   4   4   4   4   4   True  False  False
4   5   5   5   5   5   5   5  False  False  False

Edit 1: Columns a8, a9 and a10 are wrong. I want them as integer values, not booleans.

Thanks for your help.

4 Answers

You can cast every column to int64:

matrix = data.to_pandas().astype('int64')
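If you would rather not touch columns that are already integers, one option is to cast only the columns that came back as bool. This is a minimal sketch that uses pandas alone: the small `matrix` built here is a made-up stand-in for the result of `data.to_pandas()`, and `bool_cols` is just an illustrative name.

```python
import pandas as pd

# Stand-in for data.to_pandas(): a8 came back as bool (assumed example data)
matrix = pd.DataFrame({
    "a1": [1, 2, 3],
    "a8": [False, True, False],
})

# Pick out the columns pandas stores as bool
bool_cols = matrix.select_dtypes(include="bool").columns

# Cast only those columns; every other column keeps its dtype
matrix = matrix.astype({col: "int64" for col in bool_cols})

print(matrix.dtypes)
```

The advantage over a blanket `astype('int64')` is that any float or string columns in a larger file would be left alone.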
Answered 2020-07-20T13:28:14.830

You can do it like this:

import datatable as dt
x = dt.Frame({"a": ["1", "2", "3"], "b":["20", "30", "40"]})
x.stypes
#(stype.str32, stype.str32)
x[:,:] = dt.int64
x.stypes
#(stype.int64, stype.int64)
Answered 2020-10-30T16:10:59.743

You can always force the dtypes:

df = pd.DataFrame({"a1":[1,2,3,4,5,6,7,8,9,0],"a2":[1,2,3,4,5,6,7,8,9,0],"a3":[1,2,3,4,5,6,7,8,9,0],"a4":[1,2,3,4,5,6,7,8,9,0],"a5":[1,2,3,4,5,6,7,8,9,0],"a6":[1,2,3,4,5,6,7,8,9,0],"a7":[1,2,3,4,5,6,7,8,9,0],"a8":[0,1,0,1,0,0,1,1,1,1],"a9":[1,0,0,0,0,0,1,1,1,0],"a10":[1,1,1,0,0,0,0,1,1,1]})
df = df.astype({c:"int64" for c in df.columns})
df.dtypes
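Because `astype` accepts a column-to-dtype mapping, the same call can also give different columns different types. The sketch below is only an illustration: the data and the choice of `int8` for the 0/1 columns are assumptions, not something the question asks for.

```python
import pandas as pd

# Made-up frame: a1 holds ordinary integers, a8 and a9 are boolean flags
df = pd.DataFrame({
    "a1": [1, 2, 3],
    "a8": [False, True, False],
    "a9": [True, False, False],
})

# Hypothetical mapping: the flag columns become compact int8, a1 is not listed
dtype_map = {"a8": "int8", "a9": "int8"}
df = df.astype(dtype_map)

print(df.dtypes)
```

Columns that are not listed in the mapping keep whatever dtype they already had.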


Answered 2020-07-20T13:32:54.563

Add this code to your snippet:

matrix = matrix.astype(int)
matrix

Output:

   a1  a2  a3  a4  a5  a6  a7  a8  a9  a10
0   1   1   1   1   1   1   1   0   1    1
1   2   2   2   2   2   2   2   1   0    1
2   3   3   3   3   3   3   3   0   0    1
3   4   4   4   4   4   4   4   1   0    0
4   5   5   5   5   5   5   5   0   0    0
5   6   6   6   6   6   6   6   0   0    0
Answered 2020-07-20T13:40:47.623