The program runs fine in PyCharm. This is my first attempt at using PyInstaller to create an .exe file.
My logic is as follows:
- Read the CSV into a DataFrame with Pandas
- Split it into 12 roughly equal-sized DataFrames
- Start 12 processes to process the data and upload it to MongoDB
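For reference, the split step can be sketched with `numpy.array_split`, which spreads the remainder across the first chunks so no separate leftover frame is needed (a minimal sketch using small hypothetical data instead of the real 12.6M-row CSV):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ip": range(10)})  # stand-in for the large CSV DataFrame
num_processes = 3                     # the real script uses 12

# array_split gives each chunk a size differing by at most one row,
# so 10 rows across 3 chunks becomes sizes 4, 3, 3.
df_split_list = np.array_split(df, num_processes)
sizes = [len(chunk) for chunk in df_split_list]
```

Note that `df_split_list` here is a hypothetical stand-in for the question's list of split DataFrames.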
Sample code is below, but this is the error I get. One thing that really surprised me is that the initial "Start DateTime ..." message appears twice.
```
Start DateTime=2020-11-18 11:29:22
Pandas reading large CSV file
Number of rows in CSV loaded into memory: 12,632,654
group_size= 1,052,721
remainder_size= 2
Number of split_dataframes= 12
Revised Number of split_dataframes= 13
split_df: 0 size= 1052721
split_df: 1 size= 1052721
split_df: 2 size= 1052721
split_df: 3 size= 1052721
split_df: 4 size= 1052721
split_df: 5 size= 1052721
split_df: 6 size= 1052721
split_df: 7 size= 1052721
split_df: 8 size= 1052721
split_df: 9 size= 1052721
split_df: 10 size= 1052721
split_df: 11 size= 1052721
split_df: 12 size= 2
Number of Concurrent Process= 12
Defining Process: 0
Starting Process: 0
Start DateTime=2020-11-18 11:29:41
Pandas reading large CSV file
Traceback (most recent call last):
File "UploadIP2LocationsToMongoDBMultiProc.py", line 243, in <module>
File "pandas\io\parsers.py", line 688, in read_csv
File "pandas\io\parsers.py", line 454, in _read
File "pandas\io\parsers.py", line 948, in __init__
File "pandas\io\parsers.py", line 1180, in _make_engine
File "pandas\io\parsers.py", line 2010, in __init__
File "pandas\_libs\parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: 'parent_pid=22208'
[22956] Failed to execute script UploadIP2LocationsToMongoDBMultiProc
```
Line 243 in the traceback points to this line, but it ran fine the first time and should not be run a second time:

```
df = pd.read_csv(input_csv_filename, dtype=dataTypes)
```
Part of the code:
```
print("Start DateTime=" + str(startDateNowFmt))
print("Pandas reading large CSV file")
df = pd.read_csv(input_csv_filename, dtype=dataTypes)
print(f"Number of rows in CSV loaded into memory: {len(df.index):,}")

print("Number of Concurrent Process=", num_current_processes_to_use)
processes = []  # hold our processes
for index, split_df in enumerate(df_split_list):
    print("Defining Process:", index)
    process = mp.Process(target=load_dataframe_to_mongodb,
                         args=(index, split_df, arg_mongodb_collection_name))
    processes.append(process)
    print("Starting Process:", index)
    process.start()

for indx, process in enumerate(processes):
    print("Joining process:", indx)
    process.join()
```
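The duplicated "Start DateTime ..." line is the clue: on Windows, `multiprocessing` starts children with the spawn method, and in a PyInstaller executable each child re-runs the script from the top unless the entry point is guarded. The child then re-executes `pd.read_csv`, and the `FileNotFoundError` on `'parent_pid=22208'` suggests `input_csv_filename` came from `sys.argv`, where the child instead received multiprocessing's bootstrap arguments. The standard remedy is an `if __name__ == '__main__':` guard plus `multiprocessing.freeze_support()`; a minimal sketch, with a hypothetical `load_chunk` worker standing in for `load_dataframe_to_mongodb`:

```python
import multiprocessing as mp

def load_chunk(index, chunk, results):
    # Hypothetical stand-in for load_dataframe_to_mongodb: report the
    # chunk size instead of uploading to MongoDB.
    results.put((index, len(chunk)))

def main():
    chunks = [[1, 2], [3, 4, 5]]  # stand-in for df_split_list
    results = mp.Queue()
    processes = []
    for index, chunk in enumerate(chunks):
        process = mp.Process(target=load_chunk, args=(index, chunk, results))
        processes.append(process)
        process.start()
    for process in processes:
        process.join()
    return sorted(results.get() for _ in chunks)

if __name__ == '__main__':
    # freeze_support() must be the first call under the guard so a frozen
    # child process bootstraps itself instead of re-running the script.
    mp.freeze_support()
    main()
```

With the guard in place, everything at module level (the `read_csv`, the splitting, the process startup) runs only in the parent; the children execute just the worker function.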