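I have a Chinese news feed, and I want to split its sentences into smaller chunks to pass to an API.

How can I do this on iOS? For English I use a chunk length of 50 characters.

Currently I use rangeOfString: to find periods, commas, and similar punctuation, and split the text into sentences at those points.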

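NSString *str = nil, *rem = nil;

// Take the first MAX_CHAR_Private characters as the initial chunk; rem is the rest.
str = [final substringToIndex:MAX_CHAR_Private];
rem = [final substringFromIndex:MAX_CHAR_Private];
// Look for the first break character in rem, trying "?", "!", ",", "." and finally a space.
NSRange rng = [rem rangeOfString:@"?"];
if (rng.location == NSNotFound) {
    rng = [rem rangeOfString:@"!"];
    if (rng.location == NSNotFound) {
        rng = [rem rangeOfString:@","];
        if (rng.location == NSNotFound) {
            rng = [rem rangeOfString:@"."];
            if (rng.location == NSNotFound) {
                rng = [rem rangeOfString:@" "];
            }
        }
    }
}
// If breaking there would push the chunk past MAXIMUM_LIMIT_Private, break at the first space instead.
if (rng.location+1 + MAX_CHAR_Private > MAXIMUM_LIMIT_Private) {
    rng = [rem rangeOfString:@" "];
}

if (rng.location == NSNotFound) {
    // No break character found: cut at exactly MAX_CHAR_Private.
    remaining = [[final substringFromIndex:MAX_CHAR_Private] retain];
}
else{
    //NSRange rng = [rem rangeOfString:@" "];
    // Extend the chunk up to (not including) the break character; keep the rest for the next pass.
    str = [str stringByAppendingString:[rem substringToIndex:rng.location]];
    remaining = [[final substringFromIndex:MAX_CHAR_Private + rng.location+1] retain];
}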

This does not work for Chinese and Japanese characters.

2 Answers

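Take a look at NSLinguisticTagger; it should work for Chinese:

From Apple: "The NSLinguisticTagger class is used to automatically segment natural-language text and tag it with information, such as parts of speech. It can also tag languages, scripts, stem forms of words, etc."

Apple documentation: NSLinguisticTagger Class Reference

See also NSHipster: NSLinguisticTagger

See also objc.io Issue 7

Answered 2014-07-02T12:19:19.963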
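
As a rough illustration of the segmentation the quote describes, the sketch below asks NSLinguisticTagger for word tokens using the token-type scheme. The helper name GRWordTokens is hypothetical, and the code assumes ARC and lightweight generics (unlike the manual retain calls in the question). How the tagger groups Chinese characters into words depends on the system, so no particular output is claimed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any Apple API): returns the word tokens
// of `text` in order, skipping whitespace and punctuation.
static NSArray<NSString *> *GRWordTokens(NSString *text) {
    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeTokenType]
                                               options:0];
    tagger.string = text;

    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeTokenType
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        // tokenRange covers one token; sentenceRange covers the sentence containing it.
        [tokens addObject:[text substringWithRange:tokenRange]];
    }];
    return tokens;
}
```

The sentenceRange argument passed to the block may also be useful here, since it marks the sentence that contains each token, which is the unit the question wants to split on.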

NSString offers the NSStringEnumerationBySentences enumeration option out of the box:

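// Characters to strip from both ends of each sentence
NSCharacterSet *whiteSpaceSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationBySentences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop)
    {
        // substring holds one sentence; trim surrounding whitespace before use
        NSString *sentence = [substring stringByTrimmingCharactersInSet:whiteSpaceSet];
        // process sentence
    }
];
Answered 2014-07-02T12:20:29.143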
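
To connect this back to the question's length limit, here is a minimal sketch that greedily packs whole sentences into chunks of at most maxLength UTF-16 code units. The helper name GRChunksBySentence and its parameter are assumptions for illustration, and the code assumes ARC. A single sentence longer than maxLength is kept as one oversized chunk; splitting it further is left out of this sketch.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: groups consecutive sentences of `text` into chunks
// whose length (in UTF-16 code units) stays within maxLength where possible.
static NSArray<NSString *> *GRChunksBySentence(NSString *text, NSUInteger maxLength) {
    NSMutableArray<NSString *> *chunks = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Flush the current chunk when adding this sentence would overflow it.
        if (current.length > 0 && current.length + substring.length > maxLength) {
            NSString *chunk = [current stringByTrimmingCharactersInSet:whitespace];
            if (chunk.length > 0) {
                [chunks addObject:chunk];
            }
            [current setString:@""];
        }
        // Append untrimmed so spaces between English sentences are preserved.
        [current appendString:substring];
    }];

    // Emit whatever is left after the last sentence.
    NSString *last = [current stringByTrimmingCharactersInSet:whitespace];
    if (last.length > 0) {
        [chunks addObject:last];
    }
    return chunks;
}
```

Calling GRChunksBySentence(final, 50) would roughly mirror the 50-character setting mentioned in the question, without hard-coding any punctuation marks.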