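When counting sentences, what you are looking for is where each sentence ends. Splitting, however, returns the collection of sentence fragments around those ending characters, with the ends themselves represented by the gaps between elements. The number of sentences therefore equals the number of gaps, which is the number of fragments in the split result minus one.
Of course, as Keith Hill pointed out in a comment above, the actual splitting is unnecessary when you can count the ends directly.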
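# Start both counters at zero so repeated runs do not accumulate.
$SplitSentences = 0
$CountedSentences = 0
foreach( $Sentence in (Get-Content test.txt) ) {
    # Split at every occurrence of '.' and '?', and count the gaps.
    # The [char[]] cast selects Split(char[]) in both Windows PowerShell and PowerShell 7+.
    $Split = $Sentence.Split( [char[]]'.?' )
    $SplitSentences += $Split.Count - 1
    # Count every occurrence of '.' and '?'.
    $Ends = [char[]]$Sentence -match '[.?]'
    $CountedSentences += $Ends.Count
}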
Contents of test.txt:
Is this a sentence? This is a
sentence. Is this a sentence?
This is a sentence. Is this a
very long sentence that spans
multiple lines?
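As a minimal sketch of counting the ends directly without a per-line loop, the lines can be joined and every '.' and '?' counted in one pass with [regex]::Matches. The sample lines are inlined here so the snippet runs on its own, and the variable names $Lines, $Text, and $SentenceCount are illustrative only.

```powershell
# Sample text, matching the contents of test.txt shown above.
$Lines = @(
    'Is this a sentence? This is a'
    'sentence. Is this a sentence?'
    'This is a sentence. Is this a'
    'very long sentence that spans'
    'multiple lines?'
)

# Join the lines into one string, then count every '.' and '?' in a single pass.
$Text = $Lines -join "`n"
$SentenceCount = [regex]::Matches( $Text, '[.?]' ).Count

$SentenceCount   # 5
```

Like the loop above, this treats every '.' or '?' as a sentence end, so abbreviations or decimal numbers would be overcounted.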
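Also, to clarify the comment on Vasili's answer: the PowerShell -split operator interprets its string as a regular expression by default, whereas the .NET Split method treats its separators only as literal values.
For example: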
'Unclosed [bracket?' -split '[?]'
treats [?] as a regular expression character class and matches the ? character, returning two strings: 'Unclosed [bracket' and ''.
'Unclosed [bracket?'.Split( '[?]' )
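calls the Split(char[]) overload in Windows PowerShell and matches each [, ?, and ] character, returning three strings: 'Unclosed ', 'bracket', and ''. In PowerShell 7 and later, a plain string argument binds to a newer Split(string) overload instead, so cast the separator to [char[]] to get the same result.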
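The difference between the two can be checked side by side. The following is a small sketch rather than a definitive reference: it casts the separator to [char[]] so the .NET call behaves the same across PowerShell versions, and it adds [regex]::Escape as one way to make -split treat a pattern literally. The variable names are illustrative.

```powershell
$Text = 'Unclosed [bracket?'

# -split treats '[?]' as a regex character class, so it splits on '?' only.
$ByRegex = $Text -split '[?]'

# The [char[]] cast forces the Split(char[]) overload: splits on '[', '?', and ']'.
$ByChars = $Text.Split( [char[]]'[?]' )

# [regex]::Escape makes -split match the literal text '[?]', which does not occur here.
$Literal = $Text -split [regex]::Escape( '[?]' )

$ByRegex.Count   # 2
$ByChars.Count   # 3
$Literal.Count   # 1
```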