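I am using some code to merge two CSV files, sort the result by two columns, and write it out as a new CSV. The input CSVs have the same name except that they are numbered 1 and 2. I repeat this code for many sets of data. How can I make the code produce an output filename that contains the first part of the original filenames?

My current code: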

import pandas as pd

df1 = pd.read_csv("data csv 1\September 2013 1 UUedit1.csv", delimiter = ",")
df2 = pd.read_csv("data csv 1\September 2013 2 UUedit2.csv", delimiter = ",")
merged = df1.merge(df2, on="Unique Element")
delcols = "Element_y", "number_y", "date_y", "title_y", "name_y"

for delcol in delcols:
    del merged[delcol]
    
merged.rename(columns={"name_x": "name", "rdate_x": "date", "title_x": "title", "number_x": "number", "Element_x": "Element"}, inplace = True)
merged = merged.sort("Element").reset_index(drop=True)
merged = merged.sort("date").reset_index(drop=True)
merged.to_csv("MRG.csv", index=False, sep = ",")

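So in this example both input files are named September 2013 <number> UUedit, and I would like my code to name the output file September 2013 MRG.csv directly. How can I code this? To clarify: if the two original files were October 2013, the output would be October 2013 MRG.csv. Many thanks, GTPE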

Edit

After running the code provided by Christian Ternus, I get the following output and traceback:

Usage: C:/Test.py <month> <year>
Traceback (most recent call last):
  File "C:/Test.py", line 7, in <module>
    month, year = sys.argv[1:]
ValueError: need more than 0 values to unpack

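I'm not sure what the second variable should be set to.
Many thanks
GTPE

Edit 2

I managed to get the code working by calling it from CMD, but my attempt to call the script from Python does not seem to work. I tried the following: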

import subprocess
p = subprocess.Popen(['python', 'RawDataSheetMergerPandasTest.py September 2013'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print out
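
One likely cause of the failure in the attempt above is that the script name and its two arguments are passed as a single list element, so the interpreter looks for a file literally named "RawDataSheetMergerPandasTest.py September 2013". A minimal sketch of passing each argument as its own element follows. It assumes the script sits in the current working directory; `run_script` is a hypothetical helper name, and `sys.executable` is used only so that the child process runs under the same interpreter.

```python
import subprocess
import sys


def run_script(script, month, year):
    """Run `script` with month and year as separate arguments (hypothetical helper)."""
    # Every command-line argument is its own list element.
    cmd = [sys.executable, script, month, year]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, out, err


returncode, out, err = run_script("RawDataSheetMergerPandasTest.py", "September", "2013")
print(returncode)
print(out)  # bytes under Python 3
print(err)  # check this too: a traceback from the child ends up here
```

Printing `err` as well as `out` matters here, because an error in the child script is written to stderr and would otherwise go unnoticed.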

2 Answers

Given the name of the current month, here is how to get the name of the next month:

import calendar
nextmonth = calendar.month_name[1:][(calendar.month_name[1:].index(month) + 1) % 12]
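
As a quick check of the one-liner above, here is a small sketch that wraps it in a function and shows the December wrap-around. `next_month_name` is a hypothetical helper name, and the month names shown assume the default English locale.

```python
import calendar


def next_month_name(month):
    """Return the name of the month following `month` (hypothetical helper)."""
    names = calendar.month_name[1:]  # index 0 is an empty string, so skip it
    return names[(names.index(month) + 1) % 12]


print(next_month_name("September"))  # October
print(next_month_name("December"))   # January
```

The modulo handles only the month name. The year still has to be incremented separately when the input month is December.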

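Here is the same logic applied to your script, with a few other improvements :) Run this script as "./myscript.py somemonth someyear". It will output a CSV file named nextmonth year MRG.csv, and it even accounts for localization and correctly wraps the year.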

import pandas as pd
import calendar
import sys

if len(sys.argv) != 3:
    print "Usage: {0} <month> <year>".format(sys.argv[0])
month, year = sys.argv[1:]

if not month in calendar.month_name:
    print "Invalid month! Month must be one of:{0}".format(str(calendar.month_name))
if not year.isdigit():
    print "Invalid year! Year must be a number."

nextmonth = calendar.month_name[1:][(calendar.month_name[1:].index(month) + 1) % 12]

# Raw strings keep the backslash in the Windows path from being read as an escape.
df1 = pd.read_csv(r"data csv 1\{0} {1} 1 UUedit1.csv".format(month, year), delimiter = ",")
df2 = pd.read_csv(r"data csv 1\{0} {1} 2 UUedit2.csv".format(month, year), delimiter = ",")
merged = df1.merge(df2, on="Unique Element")
delcols = "Element_y", "number_y", "date_y", "title_y", "name_y"

for delcol in delcols:
    del merged[delcol]

merged.rename(columns={"name_x": "name", "rdate_x": "date", "title_x": "title", "number_x": "number", "Element_x": "Element"}, inplace = True)
merged = merged.sort("Element").reset_index(drop=True)
merged = merged.sort("date").reset_index(drop=True)

if month == calendar.month_name[-1]: year = str(int(year) + 1)

merged.to_csv("{0} {1} MRG.csv".format(nextmonth, year), index=False, sep = ",")

If you don't need the next-month functionality (and it sounds like you actually don't), take out the following two lines:

nextmonth = calendar.month_name[1:][(calendar.month_name[1:].index(month) + 1) % 12]
[...]
if month == calendar.month_name[-1]: year = str(int(year) + 1)

and replace the last line with:

merged.to_csv("{0} {1} MRG.csv".format(month, year), index=False, sep = ",")
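
Note that the script prints its usage message and then carries on, which is what produces the "need more than 0 values to unpack" traceback quoted in the question when it is started without arguments. A minimal sketch of stopping early instead is shown below. `parse_args` is a hypothetical helper name, and the argument list is passed in explicitly so the example can be run on its own; in the real script it would be `sys.argv`.

```python
import calendar


def parse_args(argv):
    """Return (month, year) from an argv-style list, or exit with a message (hypothetical helper)."""
    if len(argv) != 3:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit("Usage: {0} <month> <year>".format(argv[0]))
    month, year = argv[1:]
    if month not in calendar.month_name[1:]:  # skip the empty entry at index 0
        raise SystemExit("Invalid month! Month must be one of: {0}".format(
            ", ".join(calendar.month_name[1:])))
    if not year.isdigit():
        raise SystemExit("Invalid year! Year must be a number.")
    return month, year


month, year = parse_args(["Test.py", "September", "2013"])
print("{0} {1} MRG.csv".format(month, year))  # September 2013 MRG.csv
```

With this in place, running the script with no arguments prints only the usage line, and the two values to supply are the month name and the year, for example `September 2013`.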
answered 2013-10-28T01:48:32.933

You can use os.path.commonprefix from the standard library, which accepts a list of any number of input filenames:

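import os

# Raw strings keep the backslash in the Windows path literal.
filenames = [r'data csv 1\September 2013 1 UUedit1.csv',
             r'data csv 1\September 2013 2 UUedit2.csv']

# The common prefix ends with a space, so strip it before appending the suffix.
merged_filename = os.path.commonprefix(filenames).rstrip(' ') + ' MRG.csv'
print(merged_filename)  # --> data csv 1\September 2013 MRG.csv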
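
If the output should not include the input directory (the question writes the merged file to the working directory), the same idea can be combined with a basename step. The sketch below is one possible way to do that, not the only one. `merged_name` is a hypothetical helper name, and `ntpath` is used on the assumption that the paths use Windows-style backslashes, so the example behaves the same on any platform.

```python
import ntpath
import os


def merged_name(filenames, suffix=" MRG.csv"):
    """Build an output name from the prefix shared by all inputs (hypothetical helper)."""
    prefix = os.path.commonprefix(filenames)  # compares character by character
    return ntpath.basename(prefix).rstrip(" ") + suffix


print(merged_name([r"data csv 1\October 2013 1 UUedit1.csv",
                   r"data csv 1\October 2013 2 UUedit2.csv"]))
# October 2013 MRG.csv
```

Because os.path.commonprefix works character by character, inputs numbered 1 and 10 would share the prefix "October 2013 1", so this approach relies on the numbering differing at its first character.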
answered 2013-10-30T17:32:47.087