python - Parsing code with Python

Question

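I'm really new to Python, so please forgive anything silly I say.

I have a script that goes through a Fortran module line by line, calls .split() on each line, and saves the result to an array. However, .split() does not include the newline character. Is there any way to make it do that?

Also, the code in my Fortran module is mostly written like this: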

integer x, & !comment
        y, & !comment 
        z    !comment

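I don't want to include any of the comment text. I just want a list of the variables in the module. Is there a way to do this with .split() or a regular expression? Perhaps by taking only the substring up to the comma and the & that follows it?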

score 1 · Accepted Answer

One way to do this might be to use a buffer.

>>> s = """Some code with\n newlines and other stuff\n"""
>>> from StringIO import StringIO
>>> buffer = StringIO(s)
>>> list(buffer)
['Some code with\n', ' newlines and other stuff\n']
>>>

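Note: in Python 3.x, replace from StringIO import StringIO with from io import StringIO.

However...

I'm guessing you are using a Python file object to read the Fortran code from a separate file. A file object already behaves like a buffer. Suppose the file whatever.f95 contains the text Some code with\n newlines and other stuff\n. Then you can simply do: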

with open('whatever.f95') as f:
    print(list(f))

which will print

['Some code with\n', ' newlines and other stuff\n']
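For the newline part of the question, here is a minimal Python 3 sketch. It uses the declaration from the question as sample text; the variable names are only illustrative. Iterating over a text buffer and calling str.splitlines(keepends=True) both keep the line terminator, whereas str.split() with no arguments splits on any whitespace and therefore drops it.

```python
import io

source = "integer x, & !comment\n        y, & !comment\n        z    !comment\n"

# Iterating over a text buffer yields each line with its trailing newline.
lines_from_buffer = list(io.StringIO(source))

# str.splitlines(keepends=True) does the same for a plain string.
lines_from_string = source.splitlines(keepends=True)

# str.split() with no arguments splits on any whitespace, so the newline is lost.
tokens = lines_from_buffer[0].split()

print(lines_from_buffer[0].endswith("\n"))
print(tokens)
```

So if the newline matters, it is probably simpler to keep the unsplit line alongside its tokens than to try to make .split() preserve it.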

score 1 · Accepted Answer

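Since you are using "!" to start comments, I assume you are using Fortran 90 or later.

You can use regular expressions to find the variable declarations.

Here is a simple example that finds integer variables: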

In [1]: import re

In [2]: integer_re = re.compile(r'[ ]*integer[^:]*::\s+(.+)')

In [3]: progtext = '''  program average

  ! Read in some numbers and take the average
  ! As written, if there are no data points, an average of zero is returned
  ! While this may not be desired behavior, it keeps this example simple

  implicit none

  real, dimension(:), allocatable :: points
  integer                         :: number_of_points
  real                            :: average_points=0., positive_average=0., negative_average=0.

  write (*,*) "Input number of points to average:"
  read  (*,*) number_of_points

  allocate (points(number_of_points))

  write (*,*) "Enter the points to average:"
  read  (*,*) points

  ! Take the average by summing points and dividing by number_of_points
  if (number_of_points > 0) average_points = sum(points) / number_of_points

  ! Now form average over positive and negative points only
  if (count(points > 0.) > 0) then
     positive_average = sum(points, points > 0.) / count(points > 0.)
  end if

  if (count(points < 0.) > 0) then
     negative_average = sum(points, points < 0.) / count(points < 0.)
  end if

  deallocate (points)

  ! Print result to terminal
  write (*,'(a,g12.4)') 'Average = ', average_points
  write (*,'(a,g12.4)') 'Average of positive points = ', positive_average
  write (*,'(a,g12.4)') 'Average of negative points = ', negative_average

  end program average'''

In [4]: integer_re = re.compile(r'[ ]*integer[^:]*::\s+(.+)')

In [5]: integer_re.findall(progtext)
Out[5]: ['number_of_points']

The same can be done for other types, for example real:

In [6]: real_re = re.compile(r'[ ]*real[^:]*::\s+(.*)')

In [7]: real_re.findall(progtext)
Out[7]: ['average_points=0., positive_average=0., negative_average=0.']

You could refine the regular expression to strip the initializers and capture only the variable names, but splitting may be easier.

In [8]: real_re.findall(progtext)[0].split()
Out[8]: ['average_points=0.,', 'positive_average=0.,', 'negative_average=0.']

Or you can use another regular expression:

In [9]: re.findall('([a-z_]+)', real_re.findall(progtext)[0])
Out[9]: ['average_points', 'positive_average', 'negative_average']
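As a rough sketch of the "splitting" route, the captured entity list can be split on commas and anything after an "=" dropped. The helper name declared_names is made up for this illustration. It assumes comments have already been removed and that no commas appear inside parentheses or initializers (so a declaration like a(3,3) would need more care). It also uses a lazy .*? before "::" so that attributes such as dimension(:) do not block the match.

```python
import re

# Matches "<type> [attributes] :: <entity list>" on a single line.
decl_re = re.compile(r'^[ \t]*(?:integer|real)\b.*?::\s*(.+)$',
                     re.IGNORECASE | re.MULTILINE)

def declared_names(text):
    """Return variable names from '::'-style integer/real declarations."""
    names = []
    for entity_list in decl_re.findall(text):
        for entity in entity_list.split(','):
            # Keep only what precedes an initializer such as "=0.".
            name = entity.split('=', 1)[0].strip()
            if name:
                names.append(name)
    return names

sample = """
  real, dimension(:), allocatable :: points
  integer                         :: number_of_points
  real                            :: average_points=0., positive_average=0.
"""
print(declared_names(sample))
# prints ['points', 'number_of_points', 'average_points', 'positive_average']
```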

score 0 · Accepted Answer

First, get an array of all the lines in the Fortran script:

with open(fortran_script) as f:
    script = [i.strip() for i in f]

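This gives you the array you want, with each line as a separate element. Note that strip() removes the trailing '\n' along with any other leading and trailing whitespace.

Then, remove the comments: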

for i, line in enumerate(script):
        script[i] = line[:line.find('!')] if '!' in line else line

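This does the following: --> iterates over each line and tests whether it contains a comment; --> if it does, truncates the line so that only the statement before the comment remains.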
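Applied to the declaration from the question, the truncation step combined with joining the & continuation lines might look like the sketch below. It assumes that no '!' appears inside a string literal and that everything after the first word of the joined statement is a comma-separated variable list; the variable names are illustrative.

```python
fortran_lines = [
    "integer x, & !comment",
    "        y, & !comment",
    "        z    !comment",
]

statement = ""
for line in fortran_lines:
    # Keep only the text before the first '!' (the whole line if there is none).
    code = line.partition("!")[0].strip()
    # A trailing '&' means the statement continues on the next line.
    statement += code.rstrip("&").strip() + " "

# Drop the type keyword, then split the remainder on commas.
type_keyword, _, entity_list = statement.strip().partition(" ")
variables = [name.strip() for name in entity_list.split(",") if name.strip()]
print(type_keyword, variables)
# prints: integer ['x', 'y', 'z']
```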

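--- Edit: it was pointed out (see the comments below) that this does not allow for a '!' inside a string. To handle that, we need to parse each line character by character and keep track of the "state" we are in (i.e. is_literal):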

output = []
def parse_fortran(script, output):
    for line in script:

        # flag to maintain state is_literal
        is_literal = False

        line_out = ''
        for c in line:

            # toggle the is_literal state whenever a ' or " is found
            if c == '"' or c == "'":
                is_literal = not is_literal

            # break to next line as soon as comment is reached
            elif c == '!' and not is_literal: 
                break

            # otherwise, add the statement to the output
            line_out += c
        output.append(line_out)
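One case a single is_literal flag cannot distinguish is a quote of the other kind inside a literal, such as "it's". A possible variation, sketched below, remembers which quote character opened the literal. The helper name strip_comment is hypothetical, and the sketch does not handle literals that continue across lines.

```python
def strip_comment(line):
    """Return line without its trailing '!' comment, ignoring '!' in literals."""
    open_quote = None  # the quote character that opened the current literal
    kept = []
    for c in line:
        if open_quote is None:
            if c == "!":
                break  # the comment starts here
            if c in ('"', "'"):
                open_quote = c  # entering a literal
        elif c == open_quote:
            open_quote = None  # the literal is closed
        kept.append(c)
    return "".join(kept)

print(strip_comment("print *, \"it's done!\"  ! report"))
print(strip_comment("integer x, & !comment"))
```

Whitespace before the comment is kept, so callers may want to call rstrip() on the result.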

Hope this helps.
