python - Is there a way in numba to determine the day and hour from numpy datetime64 variables?

Question

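I am running a very large function over a very large file (30 GB). Because Python is slow, I decided to try implementing the function in numba. After some initial reading, it seems that numba's support for datetime operations is quite limited: it only allows operations on np.datetime64 objects, and only timedeltas and very basic np.datetime64 operations at that.

One column in the file holds datetime objects. One check I need to run is whether the day has changed (defined as 5:00 PM in the dataset's time zone), and if it has, perform an action. Unfortunately, I have not found a clean way to perform this check on numpy datetime64 objects, and I would like to know whether there is one.

Currently the function takes integer arrays for the year, month, week, weekday, day, hour, minute and second. That is how I handle time inside the numba function, and it is very inefficient.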

# What I have right now:
import numba as nb

@nb.jit
def check(hour):
    # hour is an integer array holding the hour of each row
    for i in range(1, len(hour)-1):
        if hour[i-1] == 4 and hour[i] == 5:
            pass  # run code

# What I would like (timestamp is a numpy datetime64 array;
# hour() is a hypothetical helper, not an existing numba function):
@nb.jit
def check(timestamp):
    for i in range(1, len(timestamp)-1):
        if hour(timestamp[i-1]) == 4 and hour(timestamp[i]) == 5:
            pass  # run code



Desired outcome: the same result I get now, but without the function needing integer array variables.

score 0 · Accepted Answer

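I think the basic rule with Numba is "don't use objects!"

You should do the conversion shown here and work with the result as a 2D integer array. Do it outside Numba.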

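import numpy as np
import pandas as pd

# Split each date into integer year, month and day columns (shape: n x 3)
dates = pd.DatetimeIndex(['2010-10-17', '2011-05-13', '2012-01-15'])
year_month_days = np.stack([dates.year, dates.month, dates.day], axis=1)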
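
The same idea can cover the 5:00 PM day change from the question with a single integer array instead of one array per field. What follows is only a minimal sketch, assuming the timestamps are naive datetime64 values already expressed in the dataset's time zone. The names day_changed_mask, SECONDS_PER_DAY and CUTOFF_SECONDS are made up for this illustration, and the sample timestamps are invented. The numba import is optional here so that the loop also runs as plain Python.

```python
import numpy as np

try:
    from numba import njit  # optional; without numba the loop runs as plain Python
except ImportError:
    def njit(func):
        return func

SECONDS_PER_DAY = 86_400
CUTOFF_SECONDS = 17 * 3_600  # assumption: the "day" rolls over at 17:00 local time


@njit
def day_changed_mask(seconds):
    """Mark rows whose 17:00-to-17:00 day differs from the previous row's."""
    out = np.zeros(len(seconds), dtype=np.bool_)
    for i in range(1, len(seconds)):
        # Shifting by the cutoff makes the rollover line up with a multiple of 86400
        prev_day = (seconds[i - 1] - CUTOFF_SECONDS) // SECONDS_PER_DAY
        curr_day = (seconds[i] - CUTOFF_SECONDS) // SECONDS_PER_DAY
        out[i] = curr_day != prev_day
    return out


# Invented sample data, converted outside the jitted function
timestamps = np.array(
    [
        "2019-01-01T16:59:00",
        "2019-01-01T17:00:00",
        "2019-01-01T23:30:00",
        "2019-01-02T16:00:00",
        "2019-01-02T17:05:00",
    ],
    dtype="datetime64[s]",
)
seconds = timestamps.astype(np.int64)  # seconds since 1970-01-01
changed = day_changed_mask(seconds)
hours = (seconds // 3_600) % 24  # hour of day, if it is still needed
```

Here changed is True at positions 1 and 4, the first rows at or after 17:00 on each calendar day. The conversion to int64 happens once, outside the compiled function, so the loop itself only sees plain integers.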
