1

I have:

TYPO3 4.2 is installed on machine ...
Winamp is installed on machine ...
Winrar 3.20 is installed on machine ...

How can I write a regular expression that extracts the software package name from a sentence? The lines above are examples containing a software name and version, but the sentence is not always the same, and sometimes the version is not shown at all. Any hints on what the regex could look like? I found this topic, but it only covers versions: Regular expression for version numbers

After reading some comments, I realize I forgot to mention a few things:

  • The software version has no standard form, but it is dot-separated

  • The name of the software comes before the version

  • Sometimes I already have the software name. Is there a way to find its version in text that doesn't have the same structure as the sentences above?
  • The sentence structure shown above is not standard!
4

3 Answers

6

For the data you've shown:

version = sentence.partition(" is installed on")[0]

No regular expression is needed; just take everything before " is installed on".
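As a minimal sketch of this approach on the sample sentences: note that the text before the marker still contains the version when one is present. The `split_name_version` helper below is hypothetical, and it assumes the version, if any, is the last space-separated token and consists only of digits and dots.

```python
def split_name_version(sentence):
    """Return (name, version) from '<name> [version] is installed on ...'.

    Hypothetical helper: assumes the version, if present, is the last
    space-separated token before " is installed on" and is made up of
    digits and dots only.
    """
    # Everything before the marker, e.g. "TYPO3 4.2" or "Winamp".
    head = sentence.partition(" is installed on")[0]
    # Split off the last token; name is empty if head is a single word.
    name, _, last = head.rpartition(" ")
    # Treat the last token as a version only if it looks like one.
    if name and last.replace(".", "").isdigit():
        return name, last
    return head, None
```

If the sentence does not contain the marker at all, `partition` returns the whole sentence as the first element, so the caller may want to check for that case separately.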

answered 2012-06-27T11:27:45.017
0

Please provide more information about your data (see my comment).

If the program name is always a single word:
import re
m = re.search(r'(?P<name>\S+?) (?P<version>([\d.]+ )?)', text)

If the name and version are always followed by "is" or "installed":
m = re.search(r'(?P<name>(\S+\s)+?)(?P<version>([\d.]+ )?)(is|installed)', text)

name = m.group('name').strip()
version = m.group('version').strip()
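A self-contained sketch of the multi-word variant, assuming each word of the name is matched as `\S+\s` (a run of non-space characters plus one whitespace character). The `parse` function and `PATTERN` constant are illustrative names; the guard covers the case where `re.search` finds nothing and returns `None`.

```python
import re

# Assumption: each word of the name is a run of non-space characters
# followed by one whitespace character.
PATTERN = re.compile(
    r'(?P<name>(\S+\s)+?)(?P<version>([\d.]+ )?)(is|installed)'
)

def parse(text):
    """Return (name, version) or None if the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Both groups carry a trailing space when non-empty, so strip them.
    return m.group('name').strip(), m.group('version').strip()
```

With this sketch the version comes back as an empty string when the sentence has none.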
answered 2012-06-27T12:31:26.040
0

Well, we can use the following heuristics:

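  1. "is installed" marks the end of the software name and version
  2. The version contains no spaces, only digits or dots
  3. Everything before the version is the software's name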

Then we can use something like the following:

(.*?) ([\d.]+ )?is installed

The first group is the software name, and the second group is the version (if present).

A quick PowerShell test:

PS> $strings = 'TYPO3 4.2 is installed on machine ...','Winamp is installed on machine ...', 'Winrar 3.20 is installed on machine ...'
PS> $strings | %{ $null = $_ -match '(.*?) ([\d.]+ )?is installed'; "Software: " + $Matches[1] + ", version: " + $Matches[2] }
Software: TYPO3, version: 4.2
Software: Winamp, version:
Software: Winrar, version: 3.20
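Since the question's other answers use Python, here is a rough Python equivalent of the same pattern. This is a sketch, not part of the original test; `extract` and `results` are illustrative names. In Python an unmatched optional group is `None`, and a matched version group keeps its trailing space, so both cases are handled explicitly.

```python
import re

# Same pattern as in the PowerShell test.
PATTERN = re.compile(r'(.*?) ([\d.]+ )?is installed')

sentences = [
    'TYPO3 4.2 is installed on machine ...',
    'Winamp is installed on machine ...',
    'Winrar 3.20 is installed on machine ...',
]

def extract(sentence):
    """Return (name, version); version is None when absent."""
    m = PATTERN.match(sentence)
    if m is None:
        return None
    # Group 2 is None without a version, else it ends with a space.
    version = m.group(2).strip() if m.group(2) else None
    return m.group(1), version

results = [extract(s) for s in sentences]
```

Because the name group is lazy and the version group is optional, this should also cope with names made of several words, as long as "is installed" follows.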
answered 2012-06-27T11:30:00.303