python - 合并单元格，在同一列，在同一个 df-Python

Question

我正在尝试将两个单元格合并在一起。原因是“机箱”下的每个单元都应该是一个字母数字（ABCD123456），但是提供的 PO 偶尔会将最后一个数字移到下一行（所述行上没有其他数据）使得数据看起来像这样示例我最初尝试创建一个查看单元格的语句，确认它小于一个数字，然后查看下一个单元格，并将两者合并。从来没有达到甚至接近于表现出任何结果。然后我决定复制数据框，移动第二个数据框（因此丢失的数字在同一行），并将它们合并在一起。这就是我现在的位置。错误信息这是我在 Python 中的第一个真正的代码，所以我相当肯定我在做低效的事情，所以一定要让我知道我可以改进的地方。

目前我有这个...

Col1	机壳	其他栏目...	其他栏目 2...
楠	ABCD12345	美国广播公司	123
楠	6	楠	楠
楠	WXYZ987654	GHI	456
楠	QRSTU654987	楠	789
楠	MNOP999999	XYZ	楠

最终目标是这个...

Col1	机壳	其他栏目...	其他栏目 2...
楠	ABCD12345 6	美国广播公司	123
楠	WXYZ987654	GHI	456
楠	QRSTU654987	楠	789
楠	MNOP999999	XYZ	楠

import PyPDF2 as pdf2
import tabula as tb
import pandas as pd
import re
import csv
import os

os.listdir()
pd.set_option('display.max_columns', None)

#bring in pdf, remove first page, convert to csv
PO = 'PO.pdf'
pages = open(PO, 'rb')
readPDF = pdf2.PdfFileReader(pages)
totalpages = readPDF.numPages
x = '2-' + str(totalpages)
POCSV = tb.convert_into(PO, 'POCSV.csv', output_format = 'csv', pages = x)

#Convert column to string, create second data frame, shift said data frame up 1
df = pd.read_csv('POCSV.csv')
df['Chassis'] = df['Chassis'].astype(str)
dfshift = df.shift(-1)
dfshift.rename(columns=({'Chassis': 'Chassis Shifted'}), inplace = True,)
dfMerged = pd.concat([df, dfshift], axis=1)

#For each row combine rows, create new column
for ind, row in df.iterrows():
    dfMerged.loc[ind, 'Complete Chassis'] = row['Chassis'] + row["Chassis Shifted"]
    
print(dfMerged['Complete Chassis'])

score 0 · Accepted Answer

为列创建一个虚拟组并合并该组的Chassis行：

# Convert 'NaN' string to pd.NA
df = df.replace('Nan', pd.NA)

cols = df.columns.difference(['Chassis'])
m = df[cols].any(1)
df = df.assign(Chassis=df.groupby(m.cumsum())['Chassis'] \
       .transform('sum')).loc[m].reset_index(drop=True)
print(df)

# Output
   Col1      Chassis Other Columns Other Columns 2
0  <NA>   ABCD123456           ABC             123
1  <NA>   WXYZ987654           GHI             456
2  <NA>  QRSTU654987          <NA>             789
3  <NA>   MNOP999999           XYZ            <NA>

设置：

import pandas as pd
import numpy as np

data = {'Col1': ['Nan', 'Nan', 'Nan', 'Nan', 'Nan'],
        'Chassis': ['ABCD12345', '6', 'WXYZ987654', 'QRSTU654987', 'MNOP999999'],
         'Other Columns': ['ABC', 'Nan', 'GHI', 'Nan', 'XYZ'],
         'Other Columns 2': ['123', 'Nan', '456', '789', 'Nan']}
df = pd.DataFrame(data)
print(df)

# Output
  Col1      Chassis Other Columns Other Columns 2
0  Nan    ABCD12345           ABC             123
1  Nan            6           Nan             Nan
2  Nan   WXYZ987654           GHI             456
3  Nan  QRSTU654987           Nan             789
4  Nan   MNOP999999           XYZ             Nan

python - 合并单元格，在同一列，在同一个 df-Python

1 回答 1

Related

Reference