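As promised, here is a solution in Python. The program works with Python 3.x or Python 2.7. If you are very new to programming, I suggest using Python 3.x, as I think it is easier to learn. You can get Python for free here: http://python.org/download/
The latest version of Python at the time of writing is 3.2.3; I suggest you get that one.
Save the Python code in a file named add_null.py and run it with the following command:
python add_null.py input_file.txt output_file.txt
The code, with lots of comments: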
# import brings in "modules" which contain extra code we can use.
# The "sys" module has useful system stuff, including the way we can get
# command-line arguments.
import sys
# sys.argv is an array of command-line arguments. We expect 3 arguments:
# the name of this program (which we don't care about), the input file
# name, and the output file name.
if len(sys.argv) != 3:
    # If we didn't get the right number of arguments, print a message and exit.
    print("Usage: python add_null.py <input_file> <output_file>")
    sys.exit(1)
# Unpack the arguments into variables. Use '_' for any argument we don't
# care about.
_, input_file, output_file = sys.argv
# Define a function we will use later. It takes two arguments, a string
# and a width.
def s_padded(s, width):
    if len(s) >= width:
        # if it is already wide enough, return it unchanged
        return s
    # Not wide enough! Figure out how many spaces we need to pad it.
    len_padding = width - len(s)
    # Return string with spaces appended. Use the Python "string repetition"
    # feature to repeat a single space, len_padding times.
    return s + ' ' * len_padding
# These are the column numbers we will use for splitting, plus a width.
# Numbers put together like this, in parentheses and separated by commas,
# are called "tuples" in Python. These tuples are: (low, high, width)
# The low and high numbers will be used for ranges, where we do use the
# low number but we stop just before the high number. So the first pair
# will get column 0 through column 11, but will not actually get column 12.
# We use 999 to mean "the end of the line"; if the line is too short, it will
# not be an error. In Python "slicing", if the full slice can't be done, you
# just get however much can be done.
#
# If you want to cut off the end of lines that are too long, change 999 to
# the maximum length you want the line ever to have. Longer than
# that will be chopped short by the "slicing".
#
# So, this tells the program where the start and end of each column is, and
# the expected width of the column. For the last column, the width is 0,
# so if the last column is a bit short no padding will be added. If you want
# to make sure that the lines are all exactly the same length, change the
# 0 to the width you want for the last column.
columns = [ (0, 12, 12), (12, 29, 17), (29, 999, 0) ]
num_columns = len(columns)
# Open input and output files in text mode.
# Use a "with" statement, which will close the files when we are done.
with open(input_file, "rt") as in_f, open(output_file, "wt") as out_f:
    # read the first line that has the field headings
    line = in_f.readline()
    # write that line to the output, unchanged
    out_f.write(line)
    # now handle each input line from input file, one at a time
    for line in in_f:
        # strip off only the line ending
        line = line.rstrip('\n')
        # start with an empty output line string, and append to it
        output_line = ''
        # handle each column in turn
        for i in range(num_columns):
            # unpack the tuple into convenient variables
            low, high, width = columns[i]
            # use "slicing" to get the columns we want
            field = line[low:high]
            # Strip removes spaces and tabs; check to see if anything is left.
            if not field.strip():
                # Nothing was left after spaces removed, so put "NULL".
                field = "NULL"
            # Append field to output_line. field is either the original
            # field, unchanged, or else it is a "NULL". Either way,
            # append it. Make sure it is the right width.
            output_line += s_padded(field, width)
        # Add a line ending to the output line.
        output_line += "\n"
        # Write the output line to the output file.
        out_f.write(output_line)
Output from running this program:
field1      field2           field3
AAAAA       BBBBB            CCCCC
DDDDD       NULL             EEEEE
FFFFF       NULL             NULL
GGGGG       HHHHH            NULL
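
If you want to check the logic without creating any files, the per-line work can be pulled into a small function and run on strings in memory. This is only a sketch: the name add_null_to_line and the sample lines are made up for illustration and are not part of add_null.py. It also uses the built-in str.ljust in place of s_padded; as far as I know the two behave the same here, since ljust pads with spaces and returns the string unchanged when it is already wide enough.

```python
# Same (low, high, width) tuples as in the program above.
COLUMNS = [(0, 12, 12), (12, 29, 17), (29, 999, 0)]

def add_null_to_line(line, columns=COLUMNS):
    """Return line with blank fixed-width fields replaced by "NULL".

    Hypothetical helper for illustration; not part of add_null.py.
    """
    output_line = ''
    for low, high, width in columns:
        # Slicing past the end of a short line is not an error;
        # you just get whatever is there (possibly an empty string).
        field = line[low:high]
        if not field.strip():
            field = "NULL"
        # ljust pads with spaces up to width and never truncates.
        output_line += field.ljust(width)
    return output_line

# Made-up sample lines: one with a blank middle field, one cut short.
samples = [
    "DDDDD" + " " * 24 + "EEEEE",
    "FFFFF",
]
for sample in samples:
    # repr() makes the padding spaces visible
    print(repr(add_null_to_line(sample)))
```

The second sample shows why a short line is handled gracefully: both missing fields slice to an empty string, so both become "NULL", and the first of them is still padded out to its column width.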