30

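I need to parse the first 10 characters of a file name to see whether they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/, but I don't see anything regex-y in the AppleScript reference, so I'm curious what other options I have for this validation.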

4

7 Answers

25

Don't despair: on OS X you can also access sed and grep through "do shell script". So:

set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
set sedResult to do shell script thecommandstring
set isgood to sedResult starts with "*good*"

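My sed skills aren't too hot, so there may be a more elegant way than prepending *good* to any name that matches [0-9]{10} and then looking for *good* at the start of the result. But basically, if the file name is "1234567890foo.mov", this will run the command: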

echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"

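Note the escaped quotes \" and the escaped backslashes \\ in the AppleScript. If you are escaping things for the shell, you have to escape the escapes. So to run a shell script that contains a backslash, you first escape it for the shell as \\ and then escape each of those backslashes in AppleScript, giving \\\\. This can get hard to read.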

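So anything you can do on the command line you can also do by calling it from AppleScript (woohoo!). Anything written to standard output is returned to the script as the result.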
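
To see the sed trick without the AppleScript escaping in the way, here is a rough sketch of the same command as the shell itself runs it. The helper name has_ten_digit_prefix is made up for illustration; it is not part of the answer above.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): prints "good" if the name
# starts with ten digits, "bad" otherwise.
has_ten_digit_prefix() {
  # Same idea as above: sed wraps the first run of ten digits in *good*(...).
  tagged=$(printf '%s\n' "$1" | sed 's/[0-9]\{10\}/*good*(&)/')
  # Only a match at the very start leaves *good* at the front of the result.
  case $tagged in
    '*good*'*) echo good ;;
    *)         echo bad ;;
  esac
}

has_ten_digit_prefix "1234567890foo.mov"   # prints: good
has_ten_digit_prefix "foo1234567890.mov"   # prints: bad
```

Inside single quotes the shell needs only one backslash before each brace; the doubled backslashes in the AppleScript version exist purely because AppleScript string literals consume one level of escaping. A name that literally begins with *good* would be a false positive with this approach.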

answered 2009-07-17T10:58:35.623
19

There is an easier way to do regex matching with the shell (works with bash 3.2+):

set isMatch to "0" = (do shell script ¬
  "[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")

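Notes:

  • Makes use of a modern bash test expression, [[ ... ]], with the regex-matching operator =~. On bash 3.2+ the right operand (or at least its special regex characters) must not be quoted, unless you prepend shopt -s compat31;
  • The do shell script statement executes the test and returns its exit code via an additional command, printf $? (thanks, @LauriRanta); "0" indicates success.
  • Note that the =~ operator does not support shortcut character classes such as \d or assertions such as \b (true as of OS X 10.9.4, and unlikely to change anytime soon).
  • For case-insensitive matching, prepend shopt -s nocasematch; to the command string.
  • For locale-awareness, prepend export LANG='" & user locale of (system info) & ".UTF-8'; to the command string.
  • If the regex contains capture groups, you can access the captured strings via the built-in ${BASH_REMATCH[@]} array variable.
  • As in the accepted answer, you have to \-escape double quotes and backslashes.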
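
The notes above can be tried directly in a terminal. This is a minimal sketch assuming bash 3.2 or later; show_match is a hypothetical helper name of mine, not something the shell provides.

```shell
#!/bin/bash
# Hypothetical helper: prints the overall match followed by each capture
# group, one per line; prints nothing if there is no match.
show_match() {
  local s=$1 re=$2
  # $re is deliberately left unquoted so bash 3.2+ treats it as a regex.
  if [[ $s =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[@]}"
  fi
}

# The prefix test from the question: exit status 0 means "matched".
[[ "1234567890foo.mov" =~ ^[[:digit:]]{10} ]]; echo "status: $?"

# Capture groups: element 0 is the whole match, 1..n are the groups.
show_match "1234567890foo.mov" '^([[:digit:]]{10})(.*)$'
```

The first command prints status: 0; the second prints the whole name, then 1234567890, then foo.mov. Keeping the pattern in a variable and expanding it unquoted sidesteps most of the quoting questions around =~.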

Here is an alternative that uses egrep:

set isMatch to "0" = (do shell script ¬
  "egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")

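Although this presumably performs worse, it has two advantages and one limitation:

  • You can use shortcut character classes such as \d and assertions such as \b.
  • You can more easily make matching case-insensitive by calling egrep with -i.
  • You cannot, however, access sub-matches via capture groups; use the [[ ... =~ ... ]] approach if you need that.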
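
For comparison, here is a small sketch of the grep route as a plain shell function. It uses grep -E (the same thing as egrep) and [0-9] rather than \d, on the assumption that not every grep implementation accepts \d in extended mode; starts_with_ten_digits is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if $1 starts with ten digits.
starts_with_ten_digits() {
  # -E: extended regular expressions; -q: no output, only the exit status.
  printf '%s\n' "$1" | grep -Eq '^[0-9]{10}'
}

if starts_with_ten_digits "1234567890foo.mov"; then echo "match"; else echo "no match"; fi
if starts_with_ten_digits "12345foo.mov"; then echo "match"; else echo "no match"; fi
```

The first call prints match and the second prints no match. Because only the exit status matters, the AppleScript side just needs the trailing printf $? to turn it into a string it can compare with "0".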

Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):

# SYNOPSIS
#   doesMatch(text, regexString) -> Boolean
# DESCRIPTION
#   Matches string s against regular expression (string) regex using bash's extended regular expression language *including* 
#   support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
#   there is a match or not.
#    - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
#      a 'considering case' block.
#    - The current user's locale is respected.
# EXAMPLE
#    my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
on doesMatch(s, regex)
    local ignoreCase, extraGrepOption
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraGrepOption to "i"
    else
        set extraGrepOption to ""
    end if
    # Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
    #       Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
    tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
end doesMatch

# SYNOPSIS
#   getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
# DESCRIPTION
#   Matches string s against regular expression (string) regex using bash's extended regular expression language and
#   *returns the matching string and substrings matching capture groups, if any.*
#   
#   - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
#     a 'considering case' block.
#   - The current user's locale is respected.
#   
#   IMPORTANT: 
#   
#   Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
#   Instead, use one of the following POSIX classes (see `man re_format`):
#       [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
#       [[:alnum:]] [[:digit:]] [[:xdigit:]]
#       [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]] 
#       [[:graph:]]  [[:print:]] 
#   
#   Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
#   
#   Always returns a *list*:
#    - an empty list, if no match is found
#    - otherwise, the first list element contains the matching string
#       - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
#  EXAMPLE
#       my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
on getMatch(s, regex)
    local ignoreCase, extraCommand
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraCommand to "shopt -s nocasematch; "
    else
        set extraCommand to ""
    end if
    # Note: 
    #  So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
    #  Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
    #  Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
    tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
    return paragraphs of result
end getMatch
answered 2012-09-06T02:30:48.437
11

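I recently needed regular expressions in a script and wanted a scripting addition to handle them, so that it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like this: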

find text "n(.*)" in "to be or not to be" with regexp

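The only downside was that (as of August 11, 2010) it was a 32-bit addition, so it threw errors when called from a 64-bit process. This bit me in a Mail rule on Snow Leopard, because I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it's really great, lets you pick whatever regex syntax you want, and supports back-references.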

Update, May 28, 2011

Thanks to Mitchell Model for pointing out in the comments below that it has been updated to 64-bit, so no more reservations: it does everything I need.

answered 2010-11-08T14:09:30.163
4

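I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for the simple stuff. I use this style of pattern all the time...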

set filename to "1234567890abcdefghijkl"

return isPrefixGood(filename)

on isPrefixGood(filename) -- returns the string "good prefix" or "bad prefix" (not a boolean)
    set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}

    set thePrefix to (characters 1 thru 10) of filename as text

    set badPrefix to false

    repeat with thisChr from 1 to (get count of characters in thePrefix)
        set theChr to character thisChr of thePrefix
        if theChr is not in legalCharacters then
            set badPrefix to true
        end if
    end repeat

    if badPrefix is true then
        return "bad prefix"
    end if

    return "good prefix"
end isPrefixGood
answered 2009-06-16T12:27:30.930
3

Here is another way to check whether the first ten characters of any string are digits.

    on checkFilename(thisName)
        set {n, isOk} to {length of thisName, true} -- was "fileName", which is undefined inside this handler
        try
            repeat with i from 1 to 10
                set isOk to (isOk and ((character i of thisName) is in "0123456789"))
            end repeat
            return isOk
        on error
            return false
        end try
    end checkFilename
answered 2014-02-01T07:27:45.993
2

I was able to call JavaScript directly from AppleScript (on High Sierra) with the following.

# Returns a list of strings from _subject that match _regex
# _regex in the format of /<value>/<flags>
on match(_subject, _regex)
    set _js to "(new String(`" & _subject & "`)).match(" & _regex & ")"
    set _result to run script _js in "JavaScript"
    if _result is null or _result is missing value then
        return {}
    end if
    return _result
end match

match("file-name.applescript", "/^\\d+/g") #=> {}
match("1234_file.js", "/^\\d+/g") #=> {"1234"}
match("5-for-fighting.mp4", "/^\\d+/g") #=> {"5"}

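It seems that most JavaScript String methods work as expected. I haven't found a reference for which version of ECMAScript JavaScript for macOS Automation is compatible with, so test before you rely on it.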

answered 2019-03-01T22:26:29.420
1

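I have an alternative: I have the bare-bones parts of the Thompson NFA algorithm working in AppleScript, although character classes are not implemented yet. If anyone is interested in parsing very basic regular expressions with AppleScript, the code is posted in the CodeExchange at MacScripter, so have a look!

Here is a solution for determining whether the first ten characters of a text/string are digits: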

set mstr to "1234567889Abcdefg"
set isnum to prefixIsOnlyDigits for mstr
to prefixIsOnlyDigits for aText
    set aProbe to text 1 thru 10 of aText
    set isnum to false
    if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
        try
            set aNumber to aProbe as number
            set isnum to true
        end try
    end if
    return isnum
end prefixIsOnlyDigits
answered 2013-07-07T16:37:21.037