To give you an idea, I'm trying to grab any string that looks like this:

IP Address for: John Doe on 05/20/13

I basically need to find all strings in that format.

date '+%m/%d/%y' is used to get today's date.

Basically I need:

"'IP Address for: '[A-Za-z]'on 'date ''+%m/%d/%y''"

Edit:

Sample strings

IP Address for: John Doe on 05/20/13
another random string
IP Address for: Jane Doe on 05/20/13
IP Address for: John Appleseed on 05/20/13
random string
IP Address for: Mr. Beans on 05/14/13
IP Address for: Steve Jobs on 05/03/13
IP Address for: Bill Gates on 05/19/13

What I need returned is the following, i.e. the lines that match "IP Address for: " + name + " on " + today's date:

IP Address for: John Doe on 05/20/13
IP Address for: Jane Doe on 05/20/13
IP Address for: John Appleseed on 05/20/13

4 Answers

1

I wrote a handy method for you.

import re

s = '''
IP Address for: John Doe on 05/20/13
another random string
IP Address for: Jane Doe on 05/20/13
IP Address for: John Appleseed on 05/20/13
random string
IP Address for: Mr. Beans on 05/14/13
IP Address for: Steve Jobs on 05/03/13
IP Address for: Bill Gates on 05/19/13
'''

regex = re.compile(r'IP Address for: (.+) on (\d\d/\d\d/\d\d)')

def method(data, matcher, name=None, date=None):
    '''
    Takes data and runs the matcher on it to find name and date.
    ARGS:
    data    := the data (string, or fileobject)
    matcher := the regex object to match with.
    name    := specify only specific name to find (optional)
    date    := specify only specific date to find (optional)
    '''
    if isinstance(data, str):
        content = data.split('\n')
    else:
        # anything else (e.g. an open file) is assumed to be an iterable of lines
        content = data
    for line in content:
        line = line.strip()
        ms = matcher.match(line)
        if not ms:
            continue
        if name and ms.group(1) != name:
            continue
        if date and ms.group(2) != date:
            continue
        yield ms.groups()

Using it:

# no options
for result in method(s, regex):
    print(result)

('John Doe', '05/20/13')
('Jane Doe', '05/20/13')
('John Appleseed', '05/20/13')
('Mr. Beans', '05/14/13')
('Steve Jobs', '05/03/13')
('Bill Gates', '05/19/13')

# with a name
for result in method(s, regex, name='John Doe'):
    print(result)

('John Doe', '05/20/13')

# with a date
for result in method(s, regex, date='05/20/13'):
    print(result)

('John Doe', '05/20/13')
('Jane Doe', '05/20/13')
('John Appleseed', '05/20/13')
answered 2013-05-22T15:02:11.817
1

For the AppleScript tag:

set myText to "Starting Text
IP Address for: Mr. Beans on 05/14/13
Leading Text IP Address for: Steve Jobs on 05/03/13 Trailing Text
Middle Text
IP Address for: Bill Gates on 05/19/13
Ending Text
"

set variableName to do shell script "grep -Eo 'IP Address for:.*on ([[:digit:]]{2}/){2}[[:digit:]]{2}' <<< " & quoted form of myText
answered 2013-05-22T16:13:24.543
0

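If the format is always fixed, you can match the name more broadly. If you don't care about validation, you can also match the date very loosely.

When we write a regular expression, we never include the string quotes unless we are showing it as part of a code example.

An example that matches your string,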

IP Address for: John Doe on 05/20/13

could be the following regex:

1. 
IP Address for: .+ on (\d\d/\d\d/\d\d)

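This gives you the date in group 1, but it allows any characters for the name and any digits for the date. If you want to restrict the allowed characters, you can replace the .+ with a character class, as you did in your example: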

[A-Za-z]+

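The problem with this character class is that it cannot match spaces, so it will not work for John Doe. To match the space between names, you need to include it in the character class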

2.
[A-Za-z\s]+

or match multiple words.

3.
([A-Za-z]+\s?)+

The advantage of the latter is that it will not match when there is no name, or when the name contains no a-z characters.

A few examples:

IP Address for: .$%1 on 05/20/13       matches 1.
IP Address for:   on 05/20/13          matches 1. and 2.
IP Address for: John Doe on 05/20/13   matches 1., 2. and 3.
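
The comparison above can be checked with a small Python sketch. It wraps each of the three name patterns in the full line format and reports which ones match each example line. The names DATE, PATTERNS and matching are made up for this illustration, and pattern 3 uses a non-capturing group so that the date stays in group 1.

```python
import re

# DATE, PATTERNS and matching() are hypothetical names for this sketch.
DATE = r"\d\d/\d\d/\d\d"
PATTERNS = {
    1: re.compile(r"IP Address for: .+ on (" + DATE + ")"),
    2: re.compile(r"IP Address for: [A-Za-z\s]+ on (" + DATE + ")"),
    # (?:...) keeps the repeated name group from shifting the date out of group 1
    3: re.compile(r"IP Address for: (?:[A-Za-z]+\s?)+ on (" + DATE + ")"),
}

def matching(line):
    """Return the sorted numbers of the patterns that match the whole line."""
    return sorted(n for n, rx in PATTERNS.items() if rx.fullmatch(line))

for line in (
    "IP Address for: .$%1 on 05/20/13",
    "IP Address for:   on 05/20/13",
    "IP Address for: John Doe on 05/20/13",
):
    print(line, "->", matching(line))
```

Keep in mind that a nested quantifier like the one in pattern 3 can backtrack heavily on long non-matching input, so it is best suited to short lines like these.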

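So, depending on what your input looks like, you may want to avoid using .* in your regular expressions. People use it all the time and it usually works fine, but I try not to use the dot unless I can't find another way.

answered 2013-05-22T15:17:49.050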
0

Given that you mention date, I assume you only want the lines that match today's date, whatever day you happen to run the check.

$ grep "IP Address for: .* on $(date +'%m/%d/%y')" file.txt
answered 2013-05-22T15:30:17.133
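
If you would rather do the same filtering in Python than in the shell, here is a minimal sketch of the idea. The function name lines_for_date and its day parameter are made up for this example. It formats the date with the two-digit year (%y) so that it lines up with the sample data, and like grep it matches anywhere in the line.

```python
import re
from datetime import date

def lines_for_date(lines, day=None):
    """Yield lines containing 'IP Address for: <name> on <mm/dd/yy>' for one day.

    lines_for_date and `day` are hypothetical names; `day` is a datetime.date
    and defaults to today when omitted.
    """
    day = day or date.today()
    stamp = day.strftime("%m/%d/%y")  # two-digit year, as in the sample strings
    pattern = re.compile(r"IP Address for: .* on " + re.escape(stamp))
    for line in lines:
        line = line.rstrip("\n")
        if pattern.search(line):  # search, not match: grep is unanchored too
            yield line

sample = [
    "IP Address for: John Doe on 05/20/13",
    "another random string",
    "IP Address for: Mr. Beans on 05/14/13",
]
for line in lines_for_date(sample, date(2013, 5, 20)):
    print(line)
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.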