
I want to use PyJulia to speed up part of my code.

import numpy as np
import julia
import pandas as pd
import random
from julia import Base
from julia import Main
from julia import DataFrames

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.dfj = df

Main.eval(""" 
for i = 1:10
    #println(i)
    if dfj.Score[i] >= 10
        println(dfj.Score[i])
    end
end
"""
)

But I get the following error message:

JuliaError: Exception 'TypeError: non-boolean (PyObject) used in boolean context' occurred while calling julia code:

In addition, the following command:

Main.eval(""" 
println(dfj.Score[1])
"""
)

gives the following output (which does not appear to be a Julia DataFrame):

PyObject 84

Is there a way to convert a pandas DataFrame to a Julia DataFrame?
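The output PyObject 84 shows that the DataFrame reaches Julia as a wrapped Python object, so indexing it yields PyObjects rather than Julia numbers. One way to sidestep this, assuming PyCall's default conversion turns numeric NumPy arrays into Julia Arrays, is to pass individual columns instead of the whole DataFrame. The helper columns_for_julia below is hypothetical, and the PyJulia assignments appear only in its docstring because they need a working Julia install.

```python
import numpy as np
import pandas as pd


def columns_for_julia(df):
    """Split a DataFrame into plain, contiguous NumPy arrays, one per column.

    Hypothetical helper. The intent is to hand PyJulia numeric arrays instead
    of the DataFrame object itself, e.g. (requires a working Julia install):
        cols = columns_for_julia(df)
        Main.score = cols["Score"]        # assumed to arrive as a Julia Vector
        Main.scorebin = cols["ScoreBin"]
    """
    return {name: np.ascontiguousarray(df[name].to_numpy()) for name in df.columns}


# Small example frame with the same column layout as in the question.
df = pd.DataFrame({"Score": [84, 7, 51], "ScoreBin": [0.0, 0.0, 0.0]})
cols = columns_for_julia(df)
```

Passing columns avoids the Pandas.jl dependency, at the cost of handling each column separately on both sides.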

Edit 1

Thanks to @PrzemyslawSzufel's answer, the following code now works:

import numpy as np
import julia
import pandas as pd
import random
import copy
from julia import Base
from julia import Main
from julia import DataFrames
from julia import Pandas
#julia.install(DataFrame)
%load_ext julia.magic

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;

Main.eval("""
dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;
""")

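However, even though I put a ; at the end of the line, I always get a printout of dfj, which I don't need; it is very long (100000 rows) and takes about a second. Is there a way to avoid the printout?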

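Also, if I now modify the DataFrame in Julia (which is much faster than doing it in Python, and is the goal of this whole question) and want to convert it back to a Python pandas DataFrame, I also get an error: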

Main.eval(""" 
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1 
    end
end
"""
)

dfjpy = pd.DataFrame(Main.dfj)
dfjpy


RuntimeError: Julia exception: MethodError: no method matching iterate(::DataFrames.DataFrame)
Closest candidates are:
  iterate(!Matched::Core.SimpleVector) at essentials.jl:568
  iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:568
  iterate(!Matched::ExponentialBackOff) at error.jl:199
  ...
Stacktrace:
 [1] jlwrap_iterator(::DataFrames.DataFrame) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:144
 [2] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:125

By the way, the command type(dfjpy) outputs PyCall.jlwrap.
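The stack trace shows why pd.DataFrame(Main.dfj) fails: pandas tries to iterate over the wrapped Julia object, and DataFrames.DataFrame has no iterate method. An alternative to the Pandas.jl round trip, assuming PyJulia returns numeric Julia vectors to Python as NumPy arrays, is to pull the columns out one at a time and assemble the pandas DataFrame in Python. The helper frame_from_columns is hypothetical; the Main.eval calls in its docstring are illustrative and need a working Julia install.

```python
import numpy as np
import pandas as pd


def frame_from_columns(columns, order=None):
    """Build a pandas DataFrame from a mapping of column name -> 1-D array.

    Hypothetical helper. In a PyJulia session the arrays would be Julia column
    vectors returned to Python, e.g. (requires a working Julia install):
        columns = {"Score": Main.eval("dfj.Score"),
                   "ScoreBin": Main.eval("dfj.ScoreBin")}
    """
    # Preserve the caller's column order (dict insertion order by default).
    order = list(columns) if order is None else list(order)
    return pd.DataFrame({name: np.asarray(columns[name]) for name in order},
                        columns=order)


# Stand-in arrays playing the role of columns returned from Julia.
dfjpy = frame_from_columns({"Score": np.array([84, 7, 51]),
                            "ScoreBin": np.array([1.0, 0.0, 1.0])})
```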

Edit 2

To convert a Julia DataFrame to a Python pandas DataFrame, you must first convert it to a Pandas.jl DataFrame. Here is the latest working code:

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;

Main.eval("""
dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;

for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1 
    end
end

dfjp = dfj |> Pandas.DataFrame;
"""
)

dfjpy = Main.dfjp
dfjpy
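For a baseline when timing the Julia version, the same ScoreBin rule can be written as a vectorized pandas operation with no Python-level loop. This is a sketch for comparison only: bin_scores is a hypothetical name, and whether it beats the Julia path (including both DataFrame conversions) is something to measure, not assume.

```python
import numpy as np
import pandas as pd


def bin_scores(df, threshold=50):
    """Return a copy of df with ScoreBin set to 1.0 wherever Score > threshold.

    Mirrors the Julia loop: rows at or below the threshold keep their
    existing ScoreBin value (0.0 in this example).
    """
    out = df.copy()
    out.loc[out["Score"] > threshold, "ScoreBin"] = 1.0
    return out


rng = np.random.default_rng(0)  # seeded only so the run is reproducible
n = 100_000
df = pd.DataFrame({
    "Score": rng.integers(1, 101, size=n),  # 1..100, like random.randint(1, 100)
    "ScoreBin": np.zeros(n),
})
result = bin_scores(df)
```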

1 Answer


You need to have Pandas.jl installed. This library lets Julia work with your Python pandas DataFrame, which you can then convert to a DataFrames.jl DataFrame.

Here is the Julia code (assuming dfj is your Python variable):

import DataFrames
import Pandas
juliandf = dfj |> Pandas.DataFrame |> DataFrames.DataFrame;

Note that the last line can also be written as:

juliandf = DataFrames.DataFrame(Pandas.DataFrame(dfj));

Converting back with Pandas.DataFrame(juliandf) should work.

answered 2020-09-04T01:42:36.850