bash - Bash script for updating a postgres database

Question

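I currently have some HTML data stored in text files. I recently decided to store the HTML data in a pgsql database instead of flat files. Right now, the "entries" table contains a "path" column that points to the file. I have added a "content" column that should now hold the data stored in the file that "path" points to. Once that is done, the "path" column will be dropped. The problem I'm having is that the files contain apostrophes that break my script. What can I do to fix this?

Here is the script: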

#!/bin/sh
dbname="myDB"
username="username"
fileroot="/path/to/the/files/*"

for f in $fileroot
do
psql $dbname $username -c "
  UPDATE entries
  SET content='`cat $f`'
  WHERE id=SELECT id FROM entries WHERE path LIKE '*`$f`';"
done

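Note: the logic in id=SELECT...FROM...WHERE path LIKE "" is not the problem. I have tested it in the pgsql environment with sample file names.

The problem is that when I cat $f, any apostrophes in the contents of $f close the SQL string, and I get a syntax error.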

score 2 · Accepted Answer

For the single quote escaping issue, a reasonable workaround might be to double the quotes, so you'd use:

`sed "s/'/''/g" < "$f"`

to include the file contents instead of the cat, and for the second invocation in the LIKE where you appeared to intend to use the file name use:

${f//"'"/"''"}

to include the literal string content of $f instead of executing it, and double the quotes. The doubled slash makes the replacement apply to every apostrophe rather than only the first. The ${varname//match/replace} expression is bash syntax and may not work in all shells; use:

`echo "$f" | sed "s/'/''/g"`

if you need to worry about other shells.

There are a bunch of other problems in that SQL.

You're trying to execute $f in your second invocation. I'm pretty sure you didn't intend that; I imagine you meant to include the literal string.
Your subquery is also wrong: it lacks parentheses. It should be (SELECT ...), not just SELECT.
Your LIKE expression is also probably not doing what you intended; you probably meant % instead of *, since % is the SQL wildcard.

If I also change the backticks to $() (because it's clearer and easier to read, IMO), fix the subquery syntax, add an alias to disambiguate the columns, and pass the SQL to psql's stdin in a here-document instead of using -c, the result is:

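# Both substitutions sit inside SQL string literals, so each needs its own
# surrounding single quotes; sed doubles any apostrophes in the value.
psql "$dbname" "$username" <<__END__
  UPDATE entries
  SET content='$(sed "s/'/''/g" < "$f")'
  WHERE id=(SELECT e.id FROM entries e WHERE e.path LIKE '%$(echo "$f" | sed "s/'/''/g")');
__END__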

The above assumes you're using a reasonably modern PostgreSQL with standard_conforming_strings = on. If you aren't, change the regexp to escape apostrophes with \ instead of doubling them, and prefix the string with E, so O'Brien becomes E'O\'Brien'. In modern PostgreSQL it'd instead become 'O''Brien'.
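To make the two quoting styles concrete, here is a minimal sketch in Python. The function names are made up for this illustration, and hand-built quoting like this is only meant to show what the resulting literals look like; it is not a substitute for parameterised statements.

```python
def quote_standard(value):
    """Quote the standard SQL way: wrap in ' and double each embedded '."""
    return "'" + value.replace("'", "''") + "'"


def quote_escape_string(value):
    """Quote using PostgreSQL's E'...' escape-string syntax.

    Backslashes are doubled first, because backslash is the escape
    character inside E'...'; then each apostrophe gets a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "E'" + escaped + "'"


print(quote_standard("O'Brien"))       # 'O''Brien'
print(quote_escape_string("O'Brien"))  # E'O\'Brien'
```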

In general, I'd recommend using a real scripting language like Perl with DBD::Pg or Python with psycopg to solve scripting problems with databases. Working with the shell is a bit funky. This expression would be much easier to write with a database interface that supported parameterised statements.

For example, I'd write this as follows:

import sys
import psycopg2

try:
        connstr = sys.argv[1]
        filename = sys.argv[2]
except IndexError:
        print("Usage: %s connect_string filename" % sys.argv[0])
        print("Eg: %s \"dbname=test user=fred\" \"some_file\"" % sys.argv[0])
        sys.exit(1)


def load_file(connstr, filename):
        # Read the file as text so it is sent as a string rather than bytea
        with open(filename, "r") as f:
                content = f.read()
        conn = psycopg2.connect(connstr)
        curs = conn.cursor()
        # Parameters are bound in order: the content first, then the file name
        curs.execute("""
        UPDATE entries
        SET content = %s
        WHERE id = (SELECT e.id FROM entries e WHERE e.path LIKE '%%'||%s);
        """, (content, filename))
        # psycopg2 opens a transaction implicitly; commit or the update is lost
        conn.commit()
        curs.close()
        conn.close()

if __name__ == '__main__':
        load_file(connstr, filename)

Note that the SQL wildcard % is doubled to escape it, so it results in a single % in the final SQL. That's because psycopg2 uses Python's % format-specifier style for its placeholders, so a literal % must be doubled to escape it.
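The doubling rule is the same one plain Python string formatting follows, which can be seen without a database. This is only an illustration of how %% and %s are treated: psycopg2 does its own, safer quoting of the value, and the pre-quoted 'some_file' below is a stand-in for that step.

```python
# Plain Python %-formatting, used here only to show how "%%" and "%s" behave.
template = "WHERE e.path LIKE '%%'||%s"

# Pretend the driver has already quoted the value as a SQL string literal.
rendered = template % ("'some_file'",)

print(rendered)  # WHERE e.path LIKE '%'||'some_file'
```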

You can trivially modify the above script to accept a list of file names, connect to the database once, and loop over the list of all file names. That'll be a lot faster, especially if you do it all in one transaction. It's a real pain to do that with psql scripting; you have to use bash co-process as shown here ... and it isn't worth the hassle.
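A rough sketch of that modification might look like the following. The function name load_files is made up for this example, and it takes an already-open connection (assumed to be a psycopg2 connection, or anything with the same cursor/commit/rollback methods) so that all files share one connection and one transaction.

```python
def load_files(conn, filenames):
    """Update entries.content for every file in a single transaction."""
    curs = conn.cursor()
    try:
        for filename in filenames:
            # Read each file as text so it is bound as a string parameter
            with open(filename, "r") as f:
                content = f.read()
            curs.execute(
                "UPDATE entries SET content = %s "
                "WHERE id = (SELECT e.id FROM entries e "
                "WHERE e.path LIKE '%%'||%s)",
                (content, filename),
            )
        # One commit at the end: either every file is loaded or none is
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        curs.close()
```

With psycopg2 installed, it could be called as load_files(psycopg2.connect(sys.argv[1]), sys.argv[2:]), so the shell expands /path/to/the/files/* into the list of file names.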

score 0 · Accepted Answer

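In my original post I made it sound as if the file names represented by $f contained apostrophes. That is not the case, so a simple echo "$f" solves my problem.

To be clearer, my file contents are formatted as HTML fragments, typically something like <p>Blah blah <b>blah</b>...</p>. After trying the solution Craig posted, I realized that I had used single quotes in some anchor tags, and I didn't want those converted to something else. Only a few files had this problem, so I just changed them to double quotes by hand. I also realized that, rather than escaping the apostrophes, it would be better to convert them to &apos;. Here is the final script I ended up using: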

dbname="myDB"
username="username"
fileroot="/path/to/files/*"

for f in $fileroot
do
psql $dbname $username << __END__
  UPDATE entries
  SET content='$(sed "s/'/\&apos;/g" < "$f")'
  WHERE id=(SELECT e.id FROM entries e WHERE path LIKE '%$(echo "$f")');
__END__
done

The syntax highlighting here may make it look syntactically incorrect, but I have verified that it is correct.
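For comparison with the Python approach in the other answer, the same apostrophe-to-entity conversion can be sketched as below. The function name is made up for this example. Like the sed command, it replaces every apostrophe, including any used as attribute quotes, which is why those had to be changed to double quotes first.

```python
def apostrophes_to_entity(html_fragment):
    """Replace each apostrophe with the &apos; character entity."""
    return html_fragment.replace("'", "&apos;")


sample = "<p>It's a <b>test</b></p>"
print(apostrophes_to_entity(sample))  # <p>It&apos;s a <b>test</b></p>
```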
