
I only recently started using R and the googleLanguageR package, and it's great!

I have more than 400 .flac files (each about 20 seconds long) that I want to convert to text using gl_speech.

I wrote the following loop:

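library(googleLanguageR)  # gl_speech
library(stringr)          # str_sub
library(readr)            # write_delim

for (i in files) {
  possibleError <- tryCatch({
    # The first three characters of the file name are used as the participant ID
    participantid <- str_sub(i, 1, 3)
    patha <- paste(path_in, i, sep = '')
    result <- gl_speech(patha,
                        sampleRateHertz = 44100L,
                        customConfig = my_config)
    # Drop the columns that are not needed in the output
    transcript2 <- subset(result$transcript, select = -c(languageCode, channelTag))
    transcriptid <- cbind(participantid, transcript2)
    write_delim(as.data.frame(transcriptid), file.path(path_in, paste0(participantid, '_text.csv')))
  },
  error = function(e) e)

  # Skip to the next file if this one raised an error
  if (inherits(possibleError, "error")) next
}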

For completeness, my_config is:

my_config <- list(audioChannelCount = 2,
                  encoding = "FLAC",
                  maxAlternatives = 30,
                  languageCode = "en-US",
                  model = "video")

I run this on 2, 5, or 10 files at a time. Several of them return an error. The error message is:

ℹ 2020-10-08 20:07:56 > Request Status Code:  408
Error : lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 408 (Request Timeout)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>408.</b> <ins>That’s an error.</ins>
  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>

However, when I later run the files that returned an error one at a time, gl_speech does not return this error.

Is there a different way to loop over multiple files that avoids as many of these errors as possible?
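
Since the failing files succeed when they are run again later, one idea I am considering is to retry each request after a short pause instead of skipping the file straight away. Below is a minimal sketch of that idea, assuming gl_speech raises an ordinary R error on a 408 (which the tryCatch in my loop seems to confirm). The names with_retry, max_attempts, and base_delay are hypothetical and not part of googleLanguageR.

```r
# Hypothetical helper (not part of googleLanguageR): call `f()` and, if it
# raises an error, wait and try again, up to `max_attempts` times in total.
with_retry <- function(f, max_attempts = 3, base_delay = 2) {
  for (attempt in seq_len(max_attempts)) {
    # Capture any error as a value instead of letting it abort the loop
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_attempts) {
      # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  # Every attempt failed: re-raise the last error so the caller can skip the file
  stop(result)
}

# Inside the loop, the gl_speech call might then become:
# result <- with_retry(function() gl_speech(patha,
#                                           sampleRateHertz = 44100L,
#                                           customConfig = my_config))
```

Because with_retry re-raises the last error once all attempts are used up, the surrounding tryCatch and next would still skip a file that keeps failing. Whether a short pause is enough to avoid the 408 is something I have not verified.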
