0

I need to calculate the value which has the similar name in the header. Here i have Bill and Non Bill fields in the column. I need to calculate all the Bill separately and non-bill separately and need to sum in another column Bill Amt Total and Non Bill Amt Total using xlswrriter in python.

Input file

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS
Alex    |   30          |                   |      10    |
Ram     |   10          |          20       |            |
Stephen |               |                   |      20    |
Robert  |               |          10       |            |      10
Mohan   |               |          20       |      10    |

Output file:

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS | Bill Total Amt | Non Bill Total Amt
Alex    |   30          |                   |      10    |              |    40          |       
Ram     |   10          |          20       |            |              |    10          |   20
Stephen |               |                   |      20    |              |                |   20
Robert  |               |          10       |            |      10      |                |   20
Mohan   |               |          20       |      10    |              |     10         |   20

输入文件

输出文件

4

1 回答 1

0

只需按字符串前缀 ( str.startswith()) 选择列,然后按 . 执行水平求和df[selected_columns].sum(axis=1)

此外,Pandas带有 Excel 保存/加载功能,因此您实际上不需要导入xlsxwriteropenpyxl.

数据

数据以 Excel 格式复制。

import pandas as pd
import io
import numpy as np

df = pd.read_csv(io.StringIO("""
Name    | Bill - Php  | Non Bill - Php  | Bill - JS  | Non Bill -JS
Alex    |   30        |                 |      10    |
Ram     |   10        |          20     |            |
Stephen |             |                 |      20    |
Robert  |             |          10     |            |      10
Mohan   |             |          20     |      10    |
"""), sep=r"\|\s*", engine='python')

# cleanup
df.columns = [c.strip() for c in df.columns]
df["Name"] = df["Name"].str.strip()

# save .xlsx
df.to_excel("/mnt/ramdisk/data.xlsx", index=False)

解决方案

# load .xlsx
df = pd.read_excel("/mnt/ramdisk/data.xlsx")

for prefix in ("Bill", "Non Bill"):
    # select the columns to be summed
    cols_to_sum = [c for c in df.columns if c.startswith(prefix)]
    # new column name
    col = f"{prefix} Amt Total"
    # sum the selected columns horizontally
    df[col] = df[cols_to_sum].sum(axis=1)
    # (optional) replace 0 with nan
    df[col] = df[col].replace({0.0: np.nan})

# save a new file
df.to_excel("/mnt/ramdisk/out.xlsx", index=False)

查看添加的列:

print(df.iloc[:,-2:])
# Out[219]: 
#    Bill Amt Total  Non Bill Amt Total
# 0            40.0                 NaN
# 1            10.0                20.0
# 2            20.0                 NaN
# 3             NaN                20.0
# 4            10.0                20.0
于 2021-04-06T11:43:20.903 回答