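After creating a TAGS file for my project (find . -name "*.py" | xargs etags), I can use M-. to jump to the definition of a function. That's great. But if I want the definition of a global constant, say x = 3, Emacs doesn't know where to find it.

Is there any way to explain to Emacs where constants are defined, not just functions? I don't need this for anything defined within a function (or a for loop or the like), only for globals.

More detail

An earlier version of this question used "top-level" instead of "global", but with @Thomas's help I realized that was imprecise. By a global definition I mean anything that the module defines. Thus in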

import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)

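the things the module defines are f and y, even though those definitions are indented to the right. x is a local variable, and z's definition is deleted before the end of the module.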

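I believe a sufficient rule for capturing all global assignments is to simply ignore them inside def expressions (noting that the def keyword itself might be indented at any level), and otherwise to parse for any symbol to the left of an = (noting that there might be more than one, because Python supports tuple assignment).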
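
A rough sketch of this rule, for illustration only. It assumes Python's standard ast module, skips def bodies, and collects the names to the left of an =. The function name module_assignments is made up here, and the sketch deliberately ignores del, so for the example above it would still report z.

```python
import ast

def module_assignments(source):
    """Collect names assigned outside of any def body (hypothetical helper)."""
    names = []

    def visit(node):
        # do not descend into function bodies: assignments there are local
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            return
        if isinstance(node, ast.Assign):
            for target in node.targets:
                # a target may be a single name or a tuple of names; only
                # names in "store" context are actually being assigned
                for sub in ast.walk(target):
                    if (isinstance(sub, ast.Name)
                            and isinstance(sub.ctx, ast.Store)
                            and sub.id not in names):
                        names.append(sub.id)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return names
```

Note that f itself is not reported, since a def is not an assignment; etags already handles function definitions.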

2 Answers

Etags seems unable to generate this kind of information for Python files, which you can easily verify by running it on a simple test file:

x = 3

def fun():
    pass

Running etags test.py produces a TAGS file with the following content:

/tmp/test.py,13
def fun(3,7

As you can see, x is completely absent from this file, so Emacs has no chance of finding it.
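
To read such an entry: as far as I know, the TAGS format puts an unprintable DEL character (\x7f) between the tag text and the position, so the line displayed as def fun(3,7 is really def fun( followed by line 3 and byte offset 7, and the 13 in the header is the byte size of the section. The helper below, parse_tags_entry, is a made-up name used only to illustrate this layout; it is a sketch under that assumption, not part of etags.

```python
def parse_tags_entry(entry):
    """Split one TAGS entry into (text, name, line, byte_offset).

    Assumed layout: <text> DEL [<name> SOH] <line>,<offset>
    where the explicit tag name is optional.
    """
    text, _, rest = entry.partition("\x7f")
    name = None
    if "\x01" in rest:
        # an explicit tag name precedes the position
        name, _, rest = rest.partition("\x01")
    line, _, offset = rest.partition(",")
    return text, name, int(line), int(offset)


# the single entry from the TAGS file above, with the DEL made explicit
entry = "def fun(\x7f3,7"
section_size = len((entry + "\n").encode("utf-8"))
```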

Consulting etags' man page tells us that there is an option --globals:

   --globals
          Create tag entries for global variables in  Perl  and  Makefile.
          This is the default in C and derived languages.

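However, this seems to be one of those sad cases where documentation and implementation are out of sync, as this option doesn't seem to exist. (etags -h doesn't list it either, only --no-globals, probably because --globals is the default, as stated above.)

However, even if --globals is the default, the documentation snippet says that it only applies to Perl, Makefiles, C, and derived languages. We can check whether this is the case by creating another simple test file, this time for C: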

int x = 3;

void fun() {
}

And indeed, running etags test.c produces the following TAGS file:

/tmp/test.c,26
int x 1,0
void fun(3,12

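You can see that x is correctly identified for C. So it seems that etags simply does not support global variables for Python.

However, because of Python's use of whitespace, it is not too hard to identify global variable definitions in source files: you can basically grep for all lines that don't start with whitespace but contain an = sign (there are exceptions, of course).

Thus, I wrote the following script to do just that. You can use it as a drop-in replacement for etags, since it calls etags internally: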

#!/bin/bash

# make sure that some input files are provided, or else there's
# nothing to parse
if [ $# -eq 0 ]; then
    # the following message is just a copy of etags' error message
    echo "$(basename ${0}): no input files specified."
    echo "  Try '$(basename ${0}) --help' for a complete list of options."
    exit 1
fi

# extract all non-flag parameters as the actual filenames to consider
TAGS2="TAGS2"
argflags=($(etags -h | grep '^-' | sed 's/,.*$//' | grep ' ' | awk '{print $1}'))
files=()
skip=0 
for arg in "${@}"; do
    # the variable 'skip' signals arguments that should not be
    # considered as filenames, even though they don't start with a
    # hyphen
    if [ ${skip} -eq 0 ]; then
        # arguments that start with a hyphen are considered flags and
        # thus not added to the 'files' array
        if [ "${arg:0:1}" = '-' ]; then
            if [ "${arg:0:9}" = "--output=" ]; then
                TAGS2="${arg:9}2"
            else
                # however, since some flags take a parameter, we also
                # check whether we should skip the next command line
                # argument: the arguments for which this is the case are
                # contained in 'argflags'
                for argflag in ${argflags[@]}; do
                    if [ "${argflag}" = "${arg}" ]; then
                        # we need to skip the next 'arg', but in case the
                        # current flag is '-o' we should still look at the
                        # next 'arg' so as to update the path to the
                        # output file of our own parsing below
                        if [ "${arg}" = "-o" ]; then
                            # the next 'arg' will be etags' output file
                            skip=2                  
                        else
                            skip=1
                        fi
                        break
                    fi
                done
            fi
        else
            files+=("${arg}")
        fi
    else
        # the current 'arg' is not an input file, but it may be the
        # path to the etags output file
        if [ "${skip}" = 2 ]; then
            TAGS2="${arg}2"
        fi
        skip=0
    fi
done

# create a separate TAGS file specifically for global variables
for file in "${files[@]}"; do
    # find all lines that are not indented, are not comments or
    # decorators, and contain a '=' character, then turn them into
    # TAGS format, except that the filename is prepended
    grep -P -Hbn '^[^[# \t@].*=' "${file}" | sed -E 's/([0-9]+):([0-9]+):([^= \t]+)\s*=.*$/\3\x7f\1,\2/'
done |\

# count the bytes of each entry - this is needed for the TAGS
# specification
while IFS= read -r line; do
    echo "$(echo "$line" | sed 's/^.*://' | wc -c):$line"
done |\

# turn the information above into the correct TAGS file format
awk -F: '
    BEGIN { filename=""; numlines=0 }
    { 
        if (filename != $2) {
            if (numlines > 0) {
                print "\x0c\n" filename "," bytes+1

                for (i in lines) {
                    print lines[i]
                    delete lines[i]
                }
            }

            filename=$2
            numlines=0
            bytes=0
        }

        lines[numlines++] = $3;
        bytes += $1;
    }
    END {
        if (numlines > 0) {
            print "\x0c\n" filename "," bytes+1

            for (i in lines)
                print lines[i]
        }
    }' > "${TAGS2}"

# now run the actual etags, instructing it to include the global
# variables information
if ! etags -i "${TAGS2}" "${@}"; then
    # if etags failed to create the TAGS file, also delete the TAGS2
    # file
    /bin/rm -f "${TAGS2}"
fi

Store this script somewhere on your $PATH under a convenient name (I suggest something like etags+), and then call it like so:

find . -name "*.py" | xargs etags+

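In addition to creating a TAGS file, the script also creates a TAGS2 file for all global variable definitions, and adds a line to the original TAGS file that references the latter.

From Emacs' perspective, there is no difference in usage.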
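
To check that reference from the outside, here is a small sketch. It assumes the usual TAGS layout, in which every section starts with a form feed line followed by a header line, and in which an included tags file appears as a header of the form NAME,include. The function name included_files is made up for this illustration.

```python
def included_files(tags_text):
    """Return the names of the tag files that a TAGS file includes."""
    names = []
    lines = tags_text.split("\n")
    for i, line in enumerate(lines[:-1]):
        # a section header is the line right after a form feed line
        if line == "\x0c":
            name, _, size = lines[i + 1].rpartition(",")
            # regular sections carry a byte size here; includes do not
            if size == "include":
                names.append(name)
    return names


# a TAGS file with one regular section and one include (assumed layout)
sample = "\x0c\n/tmp/test.py,13\ndef fun(\x7f3,7\n\x0c\nTAGS2,include\n"
```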

Answered 2021-03-23T08:55:50.403

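The other answer only considers lines without indentation as containing global variable declarations. While this effectively excludes the bodies of function and class definitions, it misses global variables that are defined inside if statements. Such declarations are not uncommon, for instance for constants that differ depending on the operating system in use.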
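
A small illustration of that gap, as a sketch: the regular expression below only approximates the "unindented line containing =" heuristic, and the constants SEP and LIMIT are invented for this example.

```python
import re

# an OS-dependent constant defined inside an if, plus a plain global
source = (
    "import sys\n"
    "if sys.platform == 'win32':\n"
    "    SEP = ';'\n"
    "else:\n"
    "    SEP = ':'\n"
    "LIMIT = 10\n"
)

# names that start an unindented line and are directly followed by '='
# (but not by '==', which would be a comparison)
unindented = [m.group(1)
              for m in re.finditer(r"^([A-Za-z_]\w*)\s*=(?!=)", source, re.M)]
```

Here unindented contains LIMIT but not SEP, although both are defined by the module.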

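As argued in the comments under the question, any static analysis is bound to be imperfect, because Python's dynamic nature makes it impossible to decide with full accuracy which variables are defined globally unless the program is actually executed.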
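
For instance, in the following sketch (the snippet in source is invented for illustration) the names a and b only come into existence when the code runs, so a look at the parse tree alone does not reveal them:

```python
import ast

# module code that defines globals dynamically
source = (
    "for name in ('a', 'b'):\n"
    "    globals()[name] = 1\n"
)

# static view: plain names that appear as assignment targets
tree = ast.parse(source)
static_names = [node.id for node in ast.walk(tree)
                if isinstance(node, ast.Name)
                and isinstance(node.ctx, ast.Store)]

# dynamic view: execute the code and inspect the resulting namespace
namespace = {}
exec(source, namespace)
dynamic_names = sorted(k for k in namespace if not k.startswith('__'))
```

The static view only finds the loop variable name, while the executed module also defines a and b.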

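Thus, the following is also just an approximation. It does, however, take into account global variable definitions inside ifs such as the ones mentioned above. Since this is best done by actually analyzing the parse tree of the source file, a bash script is no longer the appropriate choice. Conveniently, though, Python itself allows easy access to the parse tree via the ast package, which is used here.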

from argparse import ArgumentParser, SUPPRESS
import ast
from collections import Counter
from re import match as re_startswith
import os
import subprocess
import sys

# extract variable information from assign statements
def process_assign(target, results):
    if isinstance(target, ast.Name):
        results.append((target.lineno, target.col_offset, target.id))
    elif isinstance(target, ast.Tuple):
        for child in ast.iter_child_nodes(target):
            process_assign(child, results)

# extract variable information from delete statements
def process_delete(target, results):
    if isinstance(target, ast.Name):
        results[:] = filter(lambda t: t[2] != target.id, results)
    elif isinstance(target, ast.Tuple):
        for child in ast.iter_child_nodes(target):
            process_delete(child, results)

# recursively walk the parse tree of the source file
def process_node(node, results):
    if isinstance(node, ast.Assign):
        for target in node.targets:
            process_assign(target, results)
    elif isinstance(node, ast.Delete):
        for target in node.targets:
            process_delete(target, results)
    elif type(node) not in [ast.FunctionDef, ast.ClassDef]:
        for child in ast.iter_child_nodes(node):
            process_node(child, results)

def get_arg_parser():
    # create the parser to configure
    parser = ArgumentParser(usage=SUPPRESS, add_help=False)

    # run etags to find out about the supported command line parameters
    dashlines = list(filter(lambda line: re_startswith('\\s*-', line),
                            subprocess.check_output(['etags', '-h'],
                                                    encoding='utf-8').split('\n')))

    # ignore lines that start with a dash but don't have the right
    # indentation
    most_common_indent = max([(v,k) for k,v in
                              Counter([line.index('-') for line in dashlines]).items()])[1]
    arglines = filter(lambda line: line.index('-') == most_common_indent, dashlines)

    for argline in arglines:
        # the various 'argline' entries contain the command line
        # arguments for etags, sometimes more than one separated by
        # commas.
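        for arg in argline.split(','):
            # strip first, so that the space following a comma is not
            # mistaken for the start of the option's parameter
            arg = arg.strip()
            # of two alternatives ('... or ...'), keep only the first;
            # match ' or ' with spaces so that options such as
            # --ignore-indentation are not cut short
            if ' or ' in arg:
                arg = arg[:arg.index(' or ')]
            if ' ' in arg or '=' in arg:
                arg = arg[:min(arg.index(' ') if ' ' in arg else len(arg),
                               arg.index('=') if '=' in arg else len(arg))]
                action = 'store'
            else:
                action = 'store_true'
            if arg and not (arg == '-h' or arg == '--help'):
                parser.add_argument(arg, action=action)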

    # we know we need files to run on
    parser.add_argument('files', nargs='*', metavar='file')

    # the parser is configured now to accept all of etags' arguments
    return parser


if __name__ == '__main__':
    # construct a parser for the command line arguments, unless
    # -h/-help/--help is given in which case we just print the help
    # screen
    etags_args = sys.argv[1:]
    if '-h' in etags_args or '-help' in etags_args or '--help' in etags_args:
        unknown_args = True
    else:
        argparser = get_arg_parser()
        known_ns, unknown_args = argparser.parse_known_args()

    # if something's wrong with the command line arguments, print
    # etags' help screen and exit
    if unknown_args:
        subprocess.run(['etags', '-h'], encoding='utf-8')
        sys.exit(1)

    # we base the output filename on the TAGS file name.  Other than
    # that, we only care about the actual filenames to parse, and all
    # other command line arguments are simply passed to etags later on
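    # -o and --output may end up as separate attributes (or be missing),
    # depending on how the options were registered above
    output = getattr(known_ns, 'o', None) or getattr(known_ns, 'output', None)
    tags_file = (output or 'TAGS') + '2'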
    filenames = known_ns.files

    if filenames:
        # TAGS file sections, one per source file
        sections = []

        # process all files to populate the 'sections' list
        for filename in filenames:
            # read source file
            # offsets[i] is the byte offset at which line i+1 starts
            offsets, lines = [0], []
            with open(filename, 'r', encoding='utf-8') as f:
                for line in f.readlines():
                    offsets.append(offsets[-1] + len(bytes(line, 'utf-8')))
                    lines.append(line)

            offsets = offsets[:-1]

            # parse source file
            source = ''.join(lines)
            root_node = ast.parse(source, filename)

            # extract global variable definitions
            vardefs = []
            process_node(root_node, vardefs)

            # create TAGS file section
            sections.append("")
            for lineno, column, varname in vardefs:
                line = lines[lineno-1]
                offset = offsets[lineno-1]
                end = line.index('=') if '=' in line else -1
                sections[-1] += f"{line[:end]}\x7f{varname}\x01{lineno},{offset + column - 1}\n"

        # write TAGS file
        with open(tags_file, 'w') as f:
            for filename, section in zip(filenames, sections):
                if section:
                    f.write("\x0c\n")
                    f.write(filename)
                    f.write(",")
                    f.write(str(len(bytes(section, 'utf-8'))))
                    f.write("\n")
                    f.write(section)
                    f.write("\n")

        # make sure etags includes the newly created file
        etags_args += ['-i', tags_file]

    # now run the actual etags to take care of all other definitions
    try:
        cp = subprocess.run(['etags'] + etags_args, encoding='utf-8')
        status = cp.returncode
    except Exception:
        status = 1

    # if etags did not finish successfully, remove the tags_file
    if status != 0:
        try:
            os.remove(tags_file)
        except FileNotFoundError:
            # nothing to be removed
            pass

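As with the other answer, this script is meant to be a drop-in replacement for standard etags, since it calls the latter internally. Thus it also accepts all of etags' command line parameters (but currently does not respect -a).

I suggest adding an alias to your shell's init file, e.g., by adding the following line to ~/.bashrc: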

alias etags+='python3 -u /path/to/script.py'

where /path/to/script.py is the path to the file in which the above code was saved. With such an alias in place, you can simply call

etags+ /path/to/file

etc.

Answered 2021-04-13T13:56:06.853