
What's the easiest way to count the newlines in a string whose newlines follow the cross-platform newline pattern '\r\n?|\n'?

Say we're skipping whitespace, or whitespace plus some other characters, in a buffer, and meanwhile we want to increment the line count. I'm doing something like:

import re

nlinePat = re.compile(r'\r\n?|\n')

wsPat = re.compile(r'[ \t\r\n]+')  # skip (specific) whitespace chars
commaPat = re.compile(r'[ \t\r\n]*,[ \t\r\n]*')  # skip comma and surrounding whitespace
# ...

m1 = wsPat.match(buffer)
bufferPos += len(m1.group(0))

# second regex pass, only to count the line breaks inside the skipped text
m2 = nlinePat.findall(m1.group(0))
nlineCounter += len(m2)

(For example, can the above be done with a single regex operation? It feels like overhead to skip the whitespace first and then run a second regex just to count the newlines in it.)
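One way to avoid the second regex pass, sketched under the assumption that the whitespace pattern is the same as in the question: let the regex do the skipping and count the line breaks in the matched text with str.count. The helper names count_line_breaks and skip_ws_counting are made up for illustration, and whether this is measurably faster than findall on your data is worth checking with timeit.

```python
import re

# Same whitespace pattern as in the question: spaces, tabs, CR and LF.
wsPat = re.compile(r'[ \t\r\n]+')


def count_line_breaks(s):
    r"""Count '\r\n', a lone '\r' and a lone '\n' as one line break each."""
    # '\r\n' is counted once by s.count('\n') and once by s.count('\r'),
    # so subtract it once to count it a single time.
    return s.count('\n') + s.count('\r') - s.count('\r\n')


def skip_ws_counting(buffer, pos, nlineCounter):
    """Skip whitespace at buffer[pos:]; return (new_pos, new_nlineCounter)."""
    m = wsPat.match(buffer, pos)
    if m is None:  # no whitespace at pos, nothing to skip
        return pos, nlineCounter
    return m.end(), nlineCounter + count_line_breaks(m.group(0))


print(skip_ws_counting('\r\n \r\n\rabc', 0, 0))  # (6, 3)
```

Passing pos to Pattern.match also avoids slicing the buffer, since the returned m.end() is the new absolute position.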


2 Answers


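If all you want to do is count newlines, regardless of how they're represented ('\r', '\r\n', '\n'), then open your file in universal newlines mode ('rU' in Python 2) and every newline will appear as a '\n' character, so you only need to count '\n' characters. In Python 3, universal newlines is already the default for text mode (newline=None), and the 'U' mode flag was deprecated and then removed in Python 3.11.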
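A minimal Python 3 sketch of the universal-newlines approach. The function name count_newlines_in_file and the UTF-8 encoding are assumptions for illustration; the demo writes a throwaway temporary file containing all three newline styles.

```python
import os
import tempfile


def count_newlines_in_file(path):
    r"""Count line breaks ('\r\n', '\r' or '\n') in a text file.

    Text mode with the default newline=None translates every '\r\n' and
    lone '\r' to '\n' while reading, so counting '\n' is enough.
    """
    with open(path, encoding="utf-8") as f:  # newline=None is the default
        return f.read().count("\n")


# Demo: write raw bytes with mixed line endings to a temporary file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\r\ntwo\rthree\nfour")
    path = tmp.name
try:
    print(count_newlines_in_file(path))  # 3
finally:
    os.remove(path)
```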

If you're trying to parse CSV, use the csv module that's built into Python (e.g. see: Handling extra newlines (carriage returns) in csv files parsed with Python?).
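A minimal sketch of the csv route, assuming the data is already in a string (the sample text is made up). Opening the source with newline='' lets the reader see the raw '\r' and '\r\n' endings, as the csv documentation recommends; reader.line_num then reports how many physical lines were consumed, which can differ from the number of records when a quoted field spans lines.

```python
import csv
import io

# Made-up sample: mixed '\r\n', '\r' and '\n' endings, plus a newline
# inside a quoted field.
data = 'name,comment\r\nalice,"multi\nline"\rbob,plain\n'

# newline='' keeps line endings untranslated so the csv module can
# interpret them itself.
reader = csv.reader(io.StringIO(data, newline=''))
rows = list(reader)

print(rows)             # [['name', 'comment'], ['alice', 'multi\nline'], ['bob', 'plain']]
print(reader.line_num)  # 4 physical lines read for 3 records
```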

answered 2013-08-19T05:52:32.187

What you're doing is pretty good. Another way is to split the buffer on nlinePat and process each piece, adding 1 to nlineCounter for every piece except the last (n line breaks split the buffer into n + 1 pieces). The drawback is that you lose track of the character count, because each split consumes either one or two characters, and you don't know how many whitespace characters strip() removes.

I think you will have a hard time doing this with a single built-in Python operation: you need to do two things at once (count newlines and count characters), so you may be better off parsing the buffer character by character yourself.

My example:

#!/usr/bin/env python

import re

buffer = '''
\tNow is the time\t
for all good men\r\tto come to the aid\t\r
of their party.
'''


nlinePat = re.compile(r'\r\n?|\n')

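nlineCounter = 0

# n line breaks split the buffer into n + 1 pieces
bl = nlinePat.split(buffer)

for line in bl:
    print(line.strip())
    nlineCounter += 1

# the last piece is not followed by a line break, so don't count it
nlineCounter -= 1
print(nlineCounter)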
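The character-by-character parsing suggested above could look roughly like this. It is a sketch, not a drop-in replacement: the function name skip_ws_chars is hypothetical, and it only skips spaces, tabs and line breaks (the same set as wsPat), while tracking both the position and the line count in one pass.

```python
def skip_ws_chars(buffer, pos, line_no):
    r"""Skip spaces, tabs and line breaks starting at pos.

    Returns (new_pos, new_line_no). '\r\n', a lone '\r' and a lone '\n'
    each count as one line break.
    """
    n = len(buffer)
    while pos < n:
        c = buffer[pos]
        if c in ' \t':
            pos += 1
        elif c == '\n':
            line_no += 1
            pos += 1
        elif c == '\r':
            line_no += 1
            pos += 1
            # treat '\r\n' as a single line break
            if pos < n and buffer[pos] == '\n':
                pos += 1
        else:
            break  # first non-whitespace character
    return pos, line_no


print(skip_ws_chars('  \r\n\t\r\n\nx', 0, 0))  # (8, 3)
```

A pure-Python loop like this is usually slower than the regex-based approaches on large buffers, so it mainly pays off when you need per-character control anyway.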
answered 2013-08-18T20:40:43.923