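I am calculating the number of possible words for a given list of syllable-combination strings. The list of syllable combinations looks like this: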
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV", "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C", "CCCV-CV", "CV-C-CCCV")
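Based on this list, I want to calculate the number of possible English words given English phonotactic rules. To do this, I need to loop over the items in the list of syllable combinations and calculate the number of possible words for each combination.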
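To get the number of possible words for a given syllable combination, I need to walk through the combination and look at each character in turn, depending on its environment. For example, for a combination such as CV-CV, I need to:
- determine that the word starts with a single consonant C (rather than 2 or 3 consonants);
- determine that this first consonant is followed by a vowel V;
- determine that the word continues with another syllable (indicated by the hyphen);
- determine that the second syllable also starts with a single consonant C;
- and determine that it ends with another vowel V.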
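The steps above amount to splitting each syllable into the consonants before the vowel (onset), the vowel itself, and the consonants after it (coda). A minimal sketch, assuming every syllable template contains at most one V; `describe_syllable()` and `walk_combination()` are hypothetical names, and treating a vowel-less syllable such as "C" as onset-only is an assumption:

```r
# Hypothetical helper: split one syllable template (e.g. "CCV", "CVC")
# into the number of consonants before the vowel (onset), whether it
# has a vowel, and the number of consonants after it (coda).
# Assumes at most one "V" per syllable.
describe_syllable <- function(syllable) {
  v_pos <- as.integer(regexpr("V", syllable, fixed = TRUE))
  if (v_pos == -1L) {
    # No vowel, e.g. "C" or "CC": counted as onset only (an assumption)
    return(list(onset = nchar(syllable), vowel = FALSE, coda = 0L))
  }
  list(onset = v_pos - 1L, vowel = TRUE, coda = nchar(syllable) - v_pos)
}

# Hypothetical helper: walk a whole combination, syllable by syllable,
# using the hyphen as the syllable boundary
walk_combination <- function(combination) {
  syllables <- strsplit(combination, "-", fixed = TRUE)[[1]]
  lapply(syllables, describe_syllable)
}

str(walk_combination("CV-CV"))
```

Each list element then says which inventory (initial consonants of a given length, vowels, final consonants) applies at that position.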
This information then needs to be linked to information about which sounds can occur in these positions:
number_of_vowels <- 20
number_of_initial_consonants_length_1 <- 22
number_of_initial_consonants_length_2 <- 47
number_of_final_consonants_length_1 <- 24
To calculate the number of possible English words with the syllable structure "CVCV":
number_of_CVCV_words <- number_of_initial_consonants_length_1*number_of_vowels*number_of_initial_consonants_length_1*number_of_vowels
number_of_CVCV_words
193600
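The same product can be computed by reading the template character by character and looking up an inventory size for each character. A sketch under a simplifying assumption: every C is treated as a single initial consonant, which holds for CVCV but not for clusters or syllable-final consonants:

```r
# Inventory sizes from the question: single initial consonant and vowel
sizes <- c(C = 22, V = 20)

# Split the template into single characters: "C" "V" "C" "V"
template <- strsplit("CVCV", "")[[1]]

# Look up each position's inventory size and multiply
prod(sizes[template])
# [1] 193600
```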
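Any suggestions on how to do this?
I've had a go at this, but have run into some problems.
First, split the syllable combinations into individual syllables: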
split_syllables <- c()
for(i in 1:length(syllable_combinations)){
  strsplit(as.character(syllable_combinations[i]), split = "-") -> split_syllable
  split_syllables <- append(split_syllables, split_syllable)
}
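Because `strsplit()` is vectorised, the loop and `append()` are not strictly needed. A minimal equivalent sketch; `fixed = TRUE` simply tells R to treat "-" as a literal string rather than a regular expression:

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")

# One call returns a list with one character vector of syllables
# per combination
split_syllables <- strsplit(syllable_combinations, "-", fixed = TRUE)

split_syllables[[3]]
# [1] "CCCV" "CCV"
```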
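Then, a function that matches each syllable (there is only a limited number of unique syllables, so this is feasible). The counter1 variable gives the number of possible sound combinations in English for a given syllable structure: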
detect_syllables <- function(syllable){
  if(syllable == "C") {
    counter1 <- 25
  } else if(syllable == "CC") {
    counter1 <- 528
  } else if(syllable == "CCCV") {
    counter1 <- 200
  } else if(syllable == "CCV") {
    counter1 <- 940
  } else if(syllable == "CV") {
    counter1 <- 440
  } else if(syllable == "CVC") {
    counter1 <- 10560
  } else {
    # print() does not paste a message onto its first argument; warn instead
    warning(syllable, ": syllable not matched")
    counter1 <- NA
  }
  counter1  # return the count; counter1 only exists inside this function
}
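The if/else chain can also be replaced by a lookup in a named vector, which handles several syllables at once. A sketch, with the counts copied from `detect_syllables()`; `syllable_counts` and `lookup_syllables()` are hypothetical names:

```r
# Counts per syllable template, copied from detect_syllables() above
syllable_counts <- c(C = 25, CC = 528, CCCV = 200, CCV = 940,
                     CV = 440, CVC = 10560)

# Hypothetical vectorised replacement for the if/else chain:
# indexing a named vector by an unknown name yields NA
lookup_syllables <- function(syllables) {
  counts <- unname(syllable_counts[syllables])
  if (anyNA(counts)) {
    warning("syllable not matched: ",
            paste(syllables[is.na(counts)], collapse = ", "))
  }
  counts
}

lookup_syllables(c("CCCV", "CCV", "CV"))
# [1] 200 940 440
```

Adding a new syllable type then only means adding one entry to `syllable_counts`.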
Then, functions that run detect_syllables on each syllable of the original syllable combination:
one_syllable <- function(first_syllable){
  lapply(split_syllables[[i]][1], FUN = detect_syllables)
  counter1 -> first_syl
  first_syl -> number1
  print(number1)
}
two_syllables <- function(first_syllable, second_syllable){
  lapply(split_syllables[[i]][1], FUN = detect_syllables)
  counter1 -> first_syl
  lapply(split_syllables[[i]][2], FUN = detect_syllables)
  counter1 -> second_syl
  first_syl*second_syl -> number2
  print(number2)
}
three_syllables <- function(first_syllable, second_syllable, third_syllable){
  lapply(split_syllables[[i]][1], FUN = detect_syllables)
  counter1 -> first_syl
  lapply(split_syllables[[i]][2], FUN = detect_syllables)
  counter1 -> second_syl
  lapply(split_syllables[[i]][3], FUN = detect_syllables)
  counter1 -> third_syl
  first_syl*second_syl*third_syl -> number3
  print(number3)
}
four_syllables <- function(first_syllable, second_syllable, third_syllable, fourth_syllable){
  lapply(split_syllables[[i]][1], FUN = detect_syllables)
  counter1 -> first_syl
  lapply(split_syllables[[i]][2], FUN = detect_syllables)
  counter1 -> second_syl
  lapply(split_syllables[[i]][3], FUN = detect_syllables)
  counter1 -> third_syl
  lapply(split_syllables[[i]][4], FUN = detect_syllables)
  counter1 -> fourth_syl
  first_syl*second_syl*third_syl*fourth_syl -> number4
  print(number4)
}
And a for loop to make sure the right function is applied, depending on the number of syllables:
for(i in 1:10){
  if(length(split_syllables[[i]]) == 1) {
    lapply(split_syllables[[i]][1], FUN = one_syllable)
  } else if(length(split_syllables[[i]]) == 2) {
    lapply(split_syllables[[i]][1], split_syllables[[i]][2], FUN = two_syllables)
  } else if(length(split_syllables[[i]]) == 3) {
    lapply(split_syllables[[i]][1], split_syllables[[i]][2], split_syllables[[i]][3], FUN = three_syllables)
  } else if(length(split_syllables[[i]]) == 4) {
    lapply(split_syllables[[i]][1], split_syllables[[i]][2], split_syllables[[i]][3], split_syllables[[i]][4], FUN = four_syllables)
  } else
    print("number of syllables is bigger than 4")
}
However, when I try to run the for loop, I get the following error message:
Error in four_syllables(split_syllables[[1]]) : object 'counter1' not found
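I realise this has to do with the environment in which counter1 is evaluated, as described here: Using get inside lapply, inside a function, but I don't know how to fix it. If I try to point them to the right environment, neither lapply call seems to like it (Error in FUN("C"[[1L]], ...) : unused argument).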
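A likely explanation: counter1 is created inside `detect_syllables()` and disappears when that call returns, while the value that `lapply()` returns (which holds the count) is thrown away, so the next line cannot find counter1. Any extra arguments given to `lapply()` are forwarded to `FUN`, which is a plausible source of the "unused argument" error. A minimal sketch of a fix, assuming the counts from the question: let each function use its own arguments and the value that `detect_syllables()` returns. The `switch()` version of `detect_syllables()` below is only a compact stand-in for the if/else chain:

```r
# Compact stand-in for the question's detect_syllables(); the last,
# unnamed argument of switch() is the fallback for unmatched templates
detect_syllables <- function(syllable) {
  switch(syllable,
         C = 25, CC = 528, CCCV = 200, CCV = 940, CV = 440, CVC = 10560,
         NA_real_)
}

# Corrected two_syllables(): it uses its arguments instead of the global
# i, and stores the *return values* of detect_syllables()
two_syllables <- function(first_syllable, second_syllable) {
  first_syl  <- detect_syllables(first_syllable)
  second_syl <- detect_syllables(second_syllable)
  first_syl * second_syl
}

syls <- c("CCCV", "CCV")
two_syllables(syls[1], syls[2])
# [1] 188000

# do.call() spreads a vector of syllables over the arguments
do.call(two_syllables, as.list(syls))
# [1] 188000
```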
Without lapply(), the desired result can be obtained, albeit very inelegantly. If anyone has a better solution, I'd be glad to learn about it:
for(i in 1:10){
  if(length(split_syllables[[i]]) == 1) {
    detect_syllables(split_syllables[[i]][1]) -> counter1
    counter1 -> first_syl
    first_syl -> number1
    print(number1)
  } else if(length(split_syllables[[i]]) == 2) {
    detect_syllables(split_syllables[[i]][1]) -> counter1
    counter1 -> first_syl
    detect_syllables(split_syllables[[i]][2]) -> counter1
    counter1 -> second_syl
    first_syl*second_syl -> number2
    print(number2)
  } else if(length(split_syllables[[i]]) == 3) {
    detect_syllables(split_syllables[[i]][1]) -> counter1
    counter1 -> first_syl
    detect_syllables(split_syllables[[i]][2]) -> counter1
    counter1 -> second_syl
    detect_syllables(split_syllables[[i]][3]) -> counter1
    counter1 -> third_syl
    first_syl*second_syl*third_syl -> number3
    print(number3)
  } else if(length(split_syllables[[i]]) == 4) {
    detect_syllables(split_syllables[[i]][1]) -> counter1
    counter1 -> first_syl
    detect_syllables(split_syllables[[i]][2]) -> counter1
    counter1 -> second_syl
    detect_syllables(split_syllables[[i]][3]) -> counter1
    counter1 -> third_syl
    detect_syllables(split_syllables[[i]][4]) -> counter1
    counter1 -> fourth_syl
    first_syl*second_syl*third_syl*fourth_syl -> number4
    print(number4)
  } else
    print("number of syllables is bigger than 4")
}
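Since the number of words for a combination is the product of its per-syllable counts, one `vapply()` over the split list covers any number of syllables, with no separate one_/two_/three_/four_syllables functions. A sketch assuming the counts from `detect_syllables()`; `syllable_counts` and `word_counts` are hypothetical names:

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")

# Counts per syllable template, copied from detect_syllables()
syllable_counts <- c(C = 25, CC = 528, CCCV = 200, CCV = 940,
                     CV = 440, CVC = 10560)

split_syllables <- strsplit(syllable_combinations, "-", fixed = TRUE)

# Multiply the counts of all syllables in each combination;
# an unknown syllable makes the product NA
word_counts <- vapply(split_syllables,
                      function(syls) prod(syllable_counts[syls]),
                      numeric(1))
names(word_counts) <- syllable_combinations
word_counts
```

Keeping the result in a named vector, rather than printing inside the loop, also makes it easy to sum or compare the counts afterwards.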