python - Selecting specific columns from df -h output in Python

Question

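I am trying to create a simple script that selects specific columns from the output of the Unix df -h command. I can do this with awk, but how can it be done in Python?

Here is the df -h output:

Filesystem Size Used Avail Use% Mounted on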
/dev/mapper/vg_base-lv_root 28G 4.8G 22G 19% /
tmpfs 814M 176K 814M 1% /dev/shm
/dev/sda1 485M 120M 340M 27% /boot

I want something like:

Column 1:

Filesystem
/dev/mapper/vg_base-lv_root           
tmpfs                 
/dev/sda1

Column 2:

Size
28G
814M
485M

score 12 · Accepted Answer

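You can use os.popen to run a command and retrieve its output, then splitlines and split to break it into lines and fields. Run df -Ph rather than df -h so that a line is not split in two when a column is too long.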

import os
df_output_lines = [s.split() for s in os.popen("df -Ph").read().splitlines()]

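The result is a list of lines, each of which is a list of fields. To extract the first column, you can use [line[0] for line in df_output_lines] (note that columns are numbered from 0), and so on. You may want to use df_output_lines[1:] instead of df_output_lines to strip the header line.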
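As a minimal sketch of the indexing described above, the snippet below parses a hard-coded sample so that it runs without calling df. The `sample` string (copied from the question) and the `column` helper are illustrative assumptions, not part of the original answer.

```python
# Sample text standing in for the output of `df -Ph`; real output would
# come from os.popen or subprocess instead of this hard-coded string.
sample = """Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_base-lv_root 28G 4.8G 22G 19% /
tmpfs 814M 176K 814M 1% /dev/shm
/dev/sda1 485M 120M 340M 27% /boot"""

# One list of fields per line, as in the one-liner above.
df_output_lines = [s.split() for s in sample.splitlines()]


def column(rows, index, skip_header=True):
    """Return one column (0-based) from the parsed rows (hypothetical helper)."""
    body = rows[1:] if skip_header else rows
    return [row[index] for row in body]


filesystems = column(df_output_lines, 0)
sizes = column(df_output_lines, 1)
print(filesystems)
print(sizes)
```

Passing `skip_header=False` keeps the header field as the first element of the returned list.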

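If you already have the output of df -h stored in a file, you need to join the wrapped lines first (raw_df_output is the open file object).

import re
fixed_df_output = re.sub(r'\n\s+', ' ', raw_df_output.read())
df_output_lines = [s.split() for s in fixed_df_output.splitlines()]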

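Note that this assumes that neither the filesystem name nor the mount point contains whitespace. If they do (which is possible with some setups on some Unix variants), it is practically impossible to parse the output of df, even df -P. You can use os.statvfs to obtain information about a given filesystem (this is the Python interface to the C function that df calls internally for each filesystem), but there is no portable way of enumerating the filesystems.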
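For a single known path, the os.statvfs route mentioned above might look like the sketch below. The `usage` helper name and the choice of "/" as the path are assumptions for illustration; this only works on Unix-like systems where os.statvfs is available.

```python
import os


def usage(path):
    """Return (total, available) in bytes for the filesystem containing path."""
    st = os.statvfs(path)
    # f_blocks and f_bavail are counted in units of f_frsize.
    total = st.f_blocks * st.f_frsize
    available = st.f_bavail * st.f_frsize  # space available to unprivileged users
    return total, available


total, available = usage("/")
print("/ : %.1f GiB total, %.1f GiB available"
      % (total / 2.0**30, available / 2.0**30))
```

This sidesteps text parsing entirely, at the cost of needing the list of mount points from somewhere else.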

score 2 · Accepted Answer

Here is a complete example:

import subprocess
import re

p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True)
dfdata, _ = p.communicate()

dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")

columns = [list() for i in range(10)]
for line in dfdata.split("\n"):
    line = re.sub(" +", " ", line)
    for i,l in enumerate(line.split(" ")):
        columns[i].append(l)

print(columns[0])

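It assumes that mount points do not contain spaces.

Here is a more complete (and more complicated) solution that does not hard-code the number of columns: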

import subprocess
import re

def yield_lines(data):
    for line in data.split("\n"):
        yield line

def line_to_list(line):
    return re.sub(" +", " ", line).split()

p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True)
dfdata, _ = p.communicate()

dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")

lines = yield_lines(dfdata)

headers = line_to_list(next(lines))

columns = [list() for i in range(len(headers))]
for i,h in enumerate(headers):
    columns[i].append(h)
 
for line in lines:
    for i,l in enumerate(line_to_list(line)):
        columns[i].append(l)

print(columns[0])

score 2 · Accepted Answer

Not an answer to the question, but I tried to solve the underlying problem. :)

from os import statvfs

with open("/proc/mounts", "r") as mounts:
    split_mounts = [s.split() for s in mounts.read().splitlines()]

    print("{0:24} {1:24} {2:16} {3:16} {4:15} {5:13}".format(
            "FS", "Mountpoint", "Blocks", "Blocks Free", "Size", "Free"))
    for p in split_mounts:
        stat = statvfs(p[1])
        # f_blocks and f_bavail are counted in units of f_frsize
        block_size = stat.f_frsize
        blocks_total = stat.f_blocks
        blocks_free = stat.f_bavail

        size_mb = float(blocks_total * block_size) / 1024 / 1024
        free_mb = float(blocks_free * block_size) / 1024 / 1024

        print("{0:24} {1:24} {2:16} {3:16} {4:10.2f}MiB {5:10.2f}MiB".format(
                p[0], p[1], blocks_total, blocks_free, size_mb, free_mb))

score 1 · Accepted Answer

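This avoids os.popen, which the Python 2 documentation marks as deprecated (http://docs.python.org/library/os#os.popen).

I have put the output of df -h in a file, test.txt, and read from that file. However, you can also read it with subprocess. Assuming you are able to read each line of the df -h output, the following code will help: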

# Read every line of the saved df -h output.
with open('test.txt') as f:
    lines = [line.strip() for line in f]

splittedLines = (line.split() for line in lines)
# zip(*rows) transposes the rows into columns.
listOfColumnData = zip(*splittedLines)
for eachColumn in listOfColumnData:
    print(eachColumn)

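eachColumn holds an entire column as a tuple, which you can iterate over. If you need it, I can give the code for reading the output of df -h directly so that you can remove the dependency on test.txt, but the subprocess documentation shows how to do that quite clearly.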
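The subprocess route mentioned above could look roughly like the sketch below. The function names `read_df` and `to_columns` are hypothetical, and it assumes a `df` binary is on the PATH; nothing is executed until `read_df()` is called.

```python
import subprocess


def read_df(args=("df", "-Ph")):
    """Run df and return its output as text (assumes df is on the PATH)."""
    return subprocess.check_output(list(args)).decode()


def to_columns(text):
    """Split df-style text into columns, one tuple per column."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # zip(*rows) transposes rows into columns and stops at the shortest row.
    return list(zip(*rows))
```

Calling `to_columns(read_df())[0]` would then give the filesystem column. Because zip stops at the shortest row, the extra "on" field produced by the "Mounted on" header is dropped.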

score 1 · Accepted Answer

I have a mount point with a space in it, which threw off most of the examples. This borrows heavily from @ZarrHai's example, but puts the result in a dict.

#!/usr/bin/python
import subprocess
import re
from pprint import pprint

DF_OPTIONS = "-laTh" # remove h if you want bytes.

def yield_lines(data):
    for line in data.split("\n"):
        yield line

def line_to_list(line):
    pattern = re.compile(r"([\w\/\s\-\_]+)\s+(\w+)\s+([\d\.]+?[GKM]|\d+)"
                         r"\s+([\d\.]+[GKM]|\d+)\s+([\d\.]+[GKM]|\d+)\s+"
                         r"(\d+%)\s+(.*)")
    matches = pattern.search(line)
    if matches:
        return matches.groups()
    _line = re.sub(r" +", " ", line).split()
    return _line

p = subprocess.Popen(["df", DF_OPTIONS], stdout=subprocess.PIPE)
dfdata, _ = p.communicate()

# communicate() returns bytes on Python 3, so decode before replacing
dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")

lines = yield_lines(dfdata)

headers = line_to_list(next(lines))

columns = [list() for i in range(len(headers))]
for i,h in enumerate(headers):
    columns[i].append(h)

grouped = {}
for li, line in enumerate(lines):
    if not line:
        continue
    grouped[li] = {}
    for i,l in enumerate(line_to_list(line)):
        columns[i].append(l)
        key = headers[i].lower().replace("%","")
        grouped[li][key] = l.strip()

pprint(grouped)
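A lighter-weight alternative to the regex above is to cap the number of splits so that everything after the last gap stays in the final field. This is a different technique from the one in this answer, sketched under the assumption that only the mount point (the last column) may contain spaces. The `parse_df_posix` name and the second sample row are hypothetical.

```python
def parse_df_posix(text):
    """Parse `df -P` style text into a list of dicts keyed by header name.

    Assumes the mount point is the last column and is the only field
    that may contain spaces.
    """
    lines = [l for l in text.splitlines() if l.strip()]
    headers = lines[0].replace("Mounted on", "Mounted_on").split()
    rows = []
    for line in lines[1:]:
        # Split on at most len(headers) - 1 gaps, so the remainder of the
        # line, spaces included, ends up in the last field.
        fields = line.strip().split(None, len(headers) - 1)
        rows.append(dict(zip(headers, fields)))
    return rows


# Hypothetical sample; the second row has a space in its mount point.
sample = """Filesystem Size Used Avail Use% Mounted on
/dev/sda1 485M 120M 340M 27% /boot
/dev/sdb1 1.0T 200G 800G 20% /media/My Passport"""

rows = parse_df_posix(sample)
print(rows[1]["Mounted_on"])
```

This breaks down if the filesystem name itself contains spaces, which is the case the regex in this answer tries to handle.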

score 1 · Accepted Answer

I found this to be a simple way to do it...

df -h |  awk '{print $1}'

answered 2015-08-11T12:33:24.083

score 0 · Accepted Answer

This works:

#!/usr/bin/python

import os, re

l=[]
p=os.popen('df -h')
for line in p.readlines():
    l.append(re.split(r'\s{2,}',line.strip()))


p.close()

for subl in l:
    print(subl)

score 0 · Accepted Answer

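On all the systems I have access to, I noticed one thing: df with the -P option prints whitespace-aligned columns. This means each header is the same width as the items below it (padded with spaces). Building on 7erm's answer, this uses the width of each header to make sure it captures the whole mount point, even if it contains spaces.

This has been tested on Ubuntu 14.04, 16.04, and FreeBSD 9.2.

I have solved this in two different ways. The first is a direct answer to the OP's question, giving 6 columns, each starting with its header, followed by the value for each mount point in order: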

import pprint
import subprocess
import re

DF_OPTIONS = "-PlaTh" # remove h if you want bytes.

# Get the entire output of df
dfdata = subprocess.getoutput("df " + DF_OPTIONS)

# Split it based on newlines
lines = dfdata.split("\n")

dfout = {}
headers = []

# Grab the headers, retain whitespace!
# df formats in such a way that each column header has trailing whitespace 
# so the header is equal to the maximum column width. We want to retain
# this for len()
headersplit = re.split(r'(\s+)', lines[0].replace("Mounted on","Mounted_on "))
headers = [i+j for i,j in zip(headersplit[0::2],headersplit[1::2])]

for hi,head in enumerate(headers):
  dfout[hi] = [head.strip()]

for line in lines[1:]:
  pos = 0
  dfstruct = {}
  for hi,head in enumerate(headers):
    # For the last item, grab the rest of the line
    if head == headers[-1]:
      item = line[pos:]
    else:
      # Get the current item
      item = line[pos:pos+len(head)]

    pos = pos + len(head)

    #Strip whitespace and add it to the list

    dfstruct[head.strip()] = item.strip()
    dfout[hi].append(item.strip())

pprint.pprint(dfout)

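The second is more useful to me, and addresses the reason I stumbled on this question in the first place. It puts the information into a list of dictionaries: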

import pprint
import subprocess
import re

DF_OPTIONS = "-PlaTh" # remove h if you want bytes.

# Get the entire output of df
dfdata = subprocess.getoutput("df " + DF_OPTIONS)

# Split it based on newlines
lines = dfdata.split("\n")

dfout = []
headers = []

# Grab the headers, retain whitespace!
# df formats in such a way that each column header has trailing whitespace 
# so the header is equal to the maximum column width. We want to retain
# this for len()
headersplit = re.split(r'(\s+)', lines[0].replace("Mounted on","Mounted_on "))
headers = [i+j for i,j in zip(headersplit[0::2],headersplit[1::2])]

for line in lines[1:]:
  pos = 0
  dfstruct = {}
  for head in headers:
    # For the last item, grab the rest of the line
    if head == headers[-1]:
      item = line[pos:]
    else:
      # Get the current item
      item = line[pos:pos+len(head)]

    pos = pos + len(head)
    #Strip whitespace for our own structure
    dfstruct[head.strip()] = item.strip()

  dfout.append(dfstruct)

pprint.pprint(dfout)
