
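I recently wrote a script in Python that processes a Microsoft Windows DHCP server dump file and generates an XML file of the current reservations in the Spreadsheet XML format.

The script basically opens a file with Python's open(), then iterates over each line (for line in file) and looks for the keyword reservedip. When the keyword is found, the line is split into fields with shlex.split().

However, when I run the script against the default dump file from the Microsoft DHCP server, I get no results. Note also that I cannot search the file with the Linux grep command either.

I then tried opening the file in gedit and saving it as a Unix text file. After doing that, I got results and was able to grep the file. However, this approach defeats the whole purpose of writing a script to automate my work.

I have been searching on Google but have not found what I am looking for. I also tried opening the file in binary mode, but that did not help either.

I hope someone can help me solve this problem.

As requested, here is what the script does (at least the loop part) and a sample of the DHCP server output:

Script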

import shlex

# Set up an empty dictionary to store the extracted records
records = {}

# Open the DHCP dump file (the file name must be a quoted string)
f = open("dhcp.txt", "r")

# Iterate over the file line by line
for line in f:

  # Only use lines with the word "reservedip" in them
  if "reservedip" in line:

    # Split line into fields by spaces (excluding quoted substrings)
    field = shlex.split(line)

    # Add a new entry for each record, using the 32-bit IP address int as its key
    records[addr_to_int(field[7])] = [field[7], field[8], field[9], field[10]]

# Close the dump file once all lines have been processed
f.close()

*Note: addr_to_int is a function I wrote that converts a dotted IPv4 address to an integer.*
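
The addr_to_int helper itself is not shown in the post. As a rough illustration only, and assuming the standard-library socket and struct modules are acceptable, such a function might look like the sketch below. The body is an assumption, not the asker's actual code.

```python
import socket
import struct


def addr_to_int(addr):
    # Hypothetical stand-in for the asker's helper (the original is not shown).
    # inet_aton packs the dotted address into 4 bytes in network byte order;
    # "!I" unpacks those 4 bytes as one unsigned 32-bit big-endian integer.
    return struct.unpack("!I", socket.inet_aton(addr))[0]


print(addr_to_int("172.16.104.207"))
```

Using the integer as the dictionary key means that sorting the keys orders the records numerically by address rather than as strings.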

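DHCP dump

Unfortunately, due to company policy, I cannot include a real DHCP server dump. But the lines I am trying to extract from the file look like this:

Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 003386dd00gg "hostname.company.local" "Host Description" "BOTH"

Thanks in advance, Pascal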


3 Answers


One way to rule out an end-of-line character problem is to use re to convert the line endings to Unix style:

import re

dhcp_file = open( path_to_dhcp_file, 'r' )
for line in dhcp_file:
    # Change the end-of-line characters to Unix style
    line = re.sub( "\r\n", r"\n", line )

    # now do your things on line
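
The same normalization can also be done without a regular expression. The sketch below is only an illustration, and strip_eol is a hypothetical name that does not appear in the answer. It removes the line ending entirely instead of rewriting it, which is usually what a field parser wants.

```python
def strip_eol(line):
    # rstrip with "\r\n" removes any trailing CR and LF characters,
    # so Windows (CR LF) and Unix (LF) endings are handled alike.
    return line.rstrip("\r\n")


sample = "Add reservedip 172.16.104.207\r\n"
print(repr(strip_eol(sample)))
```

Note that this only addresses line endings. It would not help if the file were stored in a multi-byte encoding.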
Answered on 2012-12-17T10:28:16.510

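Based on the two lines you gave as a sample of the DHCP dump file content, I built the following test case (for clarity in this example, I prefixed each line with l1, l2, l3, ..., which refer to the line numbers).

So here is the dump file, data.txt, which I created on Linux Fedora Core 17 (x86_64):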

l1: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 
l2: 003386dd00gg "hostname.company.local" "Host Description" "BOTH"
l3: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 
l4: 003386dd00gg "hostname.company.local" "Host Description" "BOTH"
l5: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add  172.16.104.207 
l6: 003386dd00gg "hostname.company.local" "Host Description" "BOTH"
l7: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add  172.16.104.207 
l8: 003386dd00gg "hostname.company.local" "Host Description" "BOTH"
l9: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 
l10: 003386dd00gg "hostname.company.local" "Host Description" "BOTH"  

You said earlier:

Also note that I cannot search the file with the Linux grep command either

Here is what I get when I run grep on the sample file above:

$ cat data.txt | grep reservedip
l1: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 
l3: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 
l9: Dhcp Server \\servername.company.local Scope 172.16.104.0 Add reservedip 172.16.104.207 
$ 

Here is also the test I ran with a Python script, to check whether the script can find the keyword "reservedip" in the sample file:

lineNumber = 0
with open("./data.txt") as dhcpDumpFile:
    for line in dhcpDumpFile:
        lineNumber += 1
        if "reservedip" in line:
            print("Found 'reservedip' at the line: ", lineNumber)

The result I got was:

$ python -tt myscript.py
("Found 'reservedip' at the line: ", 1)
("Found 'reservedip' at the line: ", 3)
("Found 'reservedip' at the line: ", 9)
$

So, it works for me.

Regards,

Dariyoosh

Answered on 2012-12-17T12:14:25.023

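The strings in that file may be stored in a character encoding that is not ASCII compatible. UTF-8 and Latin-1 should be compatible, since they use only one byte for ASCII characters. UTF-16 and UTF-32 are not compatible: they always use more than one byte per character. UTF-16 is not uncommon in MS files, and sometimes files are even mixed.

The dump may be using 2 bytes even for ASCII characters. In that case you would have r~e~s~e~r~v~e~d~i~p in the file, where ~ is some other byte (it could also be ~r~e~..., which would still decode to r...).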
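
To make the interleaving concrete, here is a small sketch that encodes the keyword both ways and shows why a plain byte search for it fails on UTF-16 data. Little-endian UTF-16 is assumed purely for illustration. The actual encoding of the dump is unknown.

```python
keyword = "reservedip"

as_utf8 = keyword.encode("utf-8")       # one byte per ASCII character
as_utf16 = keyword.encode("utf-16-le")  # two bytes per ASCII character

print(as_utf8)
print(as_utf16)

# The single-byte form is not a substring of the two-byte form, which is
# why grep and a naive "in" test on undecoded data find nothing.
print(as_utf8 in as_utf16)
```

Here the extra byte after each letter is a zero byte, which matches the r~e~s~... pattern described above.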

Just a wild guess, since you are not allowed to post the actual file and I don't know anything about MS DHCP server dumps.

What does

file file.txt

give you?

What about

file --mime-type --mime-encoding file.txt

That won't necessarily tell you the encoding if it is a "mixed" binary/strings file, but if it is plain UTF/ASCII text, it should tell you.
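
If the file does turn out to be plain UTF-16 or UTF-8 text, the script could decode it on the fly instead of converting it by hand in an editor. The following is a minimal sketch under that assumption. detect_encoding and reserved_lines are hypothetical helper names, and the BOM check is only a heuristic that falls back to a default when no byte-order mark is present.

```python
import codecs
import io


def detect_encoding(path, default="utf-8"):
    # Guess the encoding from the byte-order mark (BOM), if there is one.
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 one.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


def reserved_lines(path):
    # Yield decoded lines containing "reservedip", without line endings.
    with io.open(path, "r", encoding=detect_encoding(path)) as f:
        for line in f:
            if "reservedip" in line:
                yield line.rstrip("\r\n")
```

Iterating over reserved_lines("dhcp.txt") would then give decoded text lines that can be passed to shlex.split() as in the question. A "mixed" binary/strings file would still need different handling.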

Answered on 2012-12-17T12:42:15.767