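I'm writing a bash script that detects certain classes of strings in SQL queries (all uppercase letters, all lowercase letters, all numeric characters, and so on). Before classifying, I want to extract all of the quoted strings, but I can't come up with a regular expression that correctly extracts quoted strings from the query text. For example, take the following query from the TPC-H benchmark: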
select
o_year,
sum(case
when nation = 'JAPAN' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'MEDIUM BRUSHED BRASS'
) as all_nations
group by
o_year
order by
o_year;
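The query is complicated, but that's beside the point. I need to extract every single-quoted string from this file and print each one on its own line, i.e.: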
'JAPAN'
'ASIA'
'1995-01-01'
'1996-12-31'
'MEDIUM BRUSHED BRASS'
Right now (I'm not very familiar with regular expressions), what I have is:
printf '%s\n' $SQL_FILE_VARIABLE | grep -E "'*'"
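But this doesn't handle strings that contain spaces, and it doesn't work when several strings appear on the same line of the file. Ideally I could get this working inside my bash script, so a grep/sed/perl solution would be best. I did some googling and found solutions to similar problems, but I couldn't get any of them to work for this particular case.
Any ideas on how to accomplish this? Thanks.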
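For reference, here is a minimal sketch of one way this might be done with grep, not a definitive answer. It assumes the variable holds the query text itself, that the installed grep supports -o (GNU and BSD grep both do), and that no literal contains an escaped quote such as 'O''Brien'. The helper name extract_quoted is made up for illustration. Two things seem to break the attempt above: the unquoted $SQL_FILE_VARIABLE is word-split by the shell, so every word lands on its own line before grep sees it, and the pattern '*' means "zero or more quotes followed by a quote" rather than "anything between two quotes".

```shell
#!/usr/bin/env bash
# Sketch: print every single-quoted string found on stdin, one per line.
# extract_quoted is a hypothetical helper name, not part of the original post.
# Assumption: literals contain no escaped quotes (e.g. 'O''Brien').
extract_quoted() {
    # -o prints only the matching part of a line, one match per output line,
    # so several literals on the same input line are all reported.
    # '[^']*' = opening quote, any run of non-quote characters, closing quote.
    grep -o "'[^']*'"
}

# Small sample taken from the query above.
sample="and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'MEDIUM BRUSHED BRASS'"

# Quote the variable so spaces and newlines inside it are preserved.
printf '%s\n' "$sample" | extract_quoted
```

If the query lives in a file rather than a variable, the same helper can read it directly, e.g. extract_quoted < query.sql (the file name here is only a placeholder).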