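I want to read the names of the files in a folder, which I already collect with file=glob.glob..., and add each file's last-modified time to the 'file_last_mod_t' column.

Part of my code: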

df=pd.DataFrame(columns=['filename','file_last_mod_t','else'])

df.set_index('filename')
for file in glob.glob('folder_path'): #inside this folder is file.txt
    file_name=os.path.basename('folder_path')
    df.loc[file_name]= os.path.getmtime(file)

This gives me:

df:
filename,file_last_mod_t,else
file.txt,123456,123456          # 123456 is an example time value

I only want the last-modified time added to the file_last_mod_t column, not to every column.

I would like to get:

df:
filename,file_last_mod_t,else
file.txt,123456,

Thanks for your suggestions.

After modifying the code:

df=pd.read_csv('C:/df.csv')
filename_list= pd.Series(result_from_other_definition)# it looks same as in #filename column
df['filename']=filename_list # so now i have dataframe with 3 columns and firs column have files list
df.set_index('filename')
for file in glob.glob('folder_path'): # inside this folder is file.txt
    df['file_last_mod_t']=df['filename'].apply(lambda x: os.path.getmtime(x)) # how getmtime is presented does not matter for now; it could be float numbers
    df.to_csv('C:/df.csv')

Sample printout, first run:

df['filename']=filename_list
print (df)
,'filename','file_last_mod_t','else'
0,file1.txt,NaN,NaN
1,file2.txt,NaN,NaN

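The code above works fine on the first run, when df is empty and contains only the header. On a later run, when df.csv already has some content (I am manually changing the timestamp value in the file), I get an error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not float. The code should replace the manually modified cell with the correct timestamp. I think it is related to apply. I also don't know why an index appears in df.

**Solved**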

1 Answer

See the comments in the code below:

import os
import pandas as pd
import datetime as dt
import glob

# Return a file's last-modified time as a formatted string
def getmtime(x):
    # %S is seconds; %d in that position would repeat the day of the month
    return dt.datetime.fromtimestamp(os.path.getmtime(x)).strftime("%Y-%m-%d %H:%M:%S")

df=pd.DataFrame(columns=['filename','file_last_mod_t','else'])

# Fill df['filename'] with the names matched by glob
df['filename'] = pd.Series(glob.glob('*'))

# Apply getmtime to each name to fill df['file_last_mod_t']
df['file_last_mod_t'] = df['filename'].apply(getmtime)

print (df)

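The result is shown below. This output was produced with %d in the seconds position of the format string, so the last field repeats the day of the month instead of the seconds: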

          filename      file_last_mod_t else
0        dataframe  2019-05-04 18:43:04  NaN
1      fer2013.csv  2018-05-26 12:18:26  NaN
2         file.txt  2019-05-04 18:49:04  NaN
3        file2.txt  2019-05-04 18:51:04  NaN
4   Untitled.ipynb  2019-05-04 17:41:04  NaN
5  Untitled1.ipynb  2019-05-04 20:51:04  NaN
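
For the loop in the original question, the value ends up in every column because `df.loc[file_name] = value` assigns to the whole row. Below is a minimal sketch of writing to one cell instead by naming both the row and the column. The function name `collect_mtimes` and its `folder` parameter are my own, not from the question, and I am assuming the file name should serve as the index.

```python
import glob
import os

import pandas as pd


def collect_mtimes(folder):
    """Return a DataFrame indexed by file name with only 'file_last_mod_t' filled."""
    # Hypothetical helper: 'filename' is kept as the index rather than a column.
    df = pd.DataFrame(columns=['file_last_mod_t', 'else'])
    df.index.name = 'filename'
    for path in glob.glob(os.path.join(folder, '*')):
        file_name = os.path.basename(path)
        # Naming the row AND the column writes a single cell;
        # df.loc[file_name] = value would broadcast the value across the row.
        df.loc[file_name, 'file_last_mod_t'] = os.path.getmtime(path)
    return df
```

Calling `collect_mtimes('folder_path')` should leave the 'else' column as NaN for every file, which is the layout asked for in the question.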

For the updated question, I started from a df.csv containing the following data:

filename,file_last_mod_t,else
file1.txt,,

I also assumed that you want to add new files, so I wrote the following code:

import os
import pandas as pd

df=pd.read_csv('df.csv')

df_adding=pd.DataFrame(columns=['filename','file_last_mod_t','else'])
df_adding['filename'] = pd.Series(['file2.txt'])
df = pd.concat([df, df_adding], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
df = df.drop_duplicates('filename')

df['file_last_mod_t']=df['filename'].apply(lambda x: os.path.getmtime(x)) # the display format does not matter here; raw float timestamps are fine
df.to_csv('df.csv', index=False)

df_adding creates a DataFrame for the new files, which is appended to the df read from df.csv. Finally, we apply getmtime and save the result to df.csv.
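
As for the TypeError in the question: `os.path.getmtime` raises exactly that message when it is handed a float, and a NaN cell in the 'filename' column is a float. My guess is that the CSV read back on the second run has rows whose 'filename' ends up empty, but I cannot confirm that from the question. Here is a hedged sketch that skips such cells; `safe_getmtime` and `refresh_mtimes` are names I made up for illustration.

```python
import os

import pandas as pd


def safe_getmtime(path):
    """Return the modification time, or None for missing or non-string paths."""
    # NaN cells read back from a CSV are floats, which os.path.getmtime rejects.
    if not isinstance(path, str) or not os.path.exists(path):
        return None
    return os.path.getmtime(path)


def refresh_mtimes(df):
    """Return a copy of df with 'file_last_mod_t' re-read from disk for every row."""
    out = df.copy()
    # Overwrites any manually edited timestamp with the value on disk.
    out['file_last_mod_t'] = out['filename'].map(safe_getmtime)
    return out
```

Writing the result with `to_csv(..., index=False)`, as in the code above, also keeps the row index out of the file, which may be why an extra index column showed up after re-reading.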

Answered 2019-05-04T09:48:11.330