Imagine my page has a bunch of sections that look like this (example page):
<div class="content">
</div>
My goal is to scrape the entire page into a MySQL database entry. I currently do it like this:
//Declare SQL statement
String sql = "INSERT into rns " +
"(rns_pub_date, rns_headline, rns_link, rns_fulltext, constituent_id) values (\""+
rns.getRnsPubDate() + "\",\"" +
rns.getRnsHeadline() + "\",\"" +
rns.getRnsLink() + "\",\"" +
rns.getRnsFullText() + "\",\"" +
"(select constituent_id from constituent where constituent_name = " + rns.getRnsConstituentName() + "\")";
//SQL Statement Debug
Log.d(CLASS_NAME, "createRns. sqlStatement: " + sql);
//Initialize insertValues
insertValues = connect.prepareStatement(sql);
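However, this fails because the page contains multiple " characters, and each one ends the quoted SQL string early.
I can see a few options:
- Escape the characters like this: ' \" '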
- Replace the characters with the HTML entity: ' &quot; '
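- Strip all the irrelevant data (the HTML) and save only the relevant data to the database
I realize there are also best practices for preventing SQL injection. This is a standalone system, so that is not a concern right now. That said, if an answer can explain how to prevent it, I would rather implement it.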
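To make the failure concrete, here is a minimal sketch of what happens to a concatenated statement when the value itself contains a double quote. The class name, the helper `firstLiteral`, the one-column statement and the sample headlines are all made up for illustration, and the helper ignores MySQL's escape rules (backslashes, doubled quotes); it only shows where the first quoted string would end.

```java
class QuoteBreakDemo {
    // Builds a statement by concatenation, in the same style as the snippet above.
    // The single-column INSERT is a simplification for illustration.
    static String concatenated(String headline) {
        return "INSERT into rns (rns_headline) values (\"" + headline + "\")";
    }

    // Returns the text between the first double quote and the next one,
    // i.e. roughly what a parser would take as the first string literal.
    // Hypothetical helper: it does not handle escaped or doubled quotes.
    static String firstLiteral(String sql) {
        int open = sql.indexOf('"');
        int close = sql.indexOf('"', open + 1);
        return sql.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String plain = concatenated("Results announced");
        String withQuotes = concatenated("<a href=\"x.html\">Results</a>");

        // The plain headline survives intact as one literal.
        System.out.println(firstLiteral(plain));
        // The HTML headline is cut off at the first embedded quote,
        // and the rest of the value spills into the SQL syntax.
        System.out.println(firstLiteral(withQuotes));
    }
}
```

The same mechanism is what makes concatenation open to SQL injection: whatever follows the embedded quote is read as SQL rather than data.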
Edit 1: Following @chrylis's comment, this is what I have:
//Insert values into variables
String rns_pub_date = rns.getRnsPubDate();
String rns_headline = rns.getRnsHeadline();
String rns_link = rns.getRnsLink();
String rns_fulltext = rns.getRnsFullText();
String rns_constituent_name = rns.getRnsConstituentName();
//Prepare the SQL string
String sql = "INSERT into rns (rns_pub_date, rns_headline, rns_link, rns_fulltext,constituent_id) VALUES" + "(?,?,?,?,(select constituent_id from constituent where constituent_name = \"" + rns.getRnsConstituentName() + "\")";
//Prepare the statement
PreparedStatement prest = connect.prepareStatement(sql);
prest.setString(1, rns_pub_date);
prest.setString(2, rns_headline);
prest.setString(3, rns_link);
prest.setString(4, rns_fulltext);
prest.setString(5, rns_constituent_name);
But it gives this error:
Parameter index out of range (5 > number of parameters, which is 4).
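The message seems to follow from the SQL string itself: it contains only four `?` markers, because the constituent name is still concatenated in as a quoted value, so there is no fifth parameter for `setString(5, ...)` to bind. A rough sketch that counts the markers is below. The class and method names and the sample name "Example plc" are assumptions for illustration, and this naive counter is not how the MySQL driver actually parses statements.

```java
class PlaceholderCount {
    // Naive count of '?' characters that fall outside double-quoted literals.
    // Hypothetical helper; it ignores single quotes, comments and escapes.
    static int countPlaceholders(String sql) {
        int count = 0;
        boolean inQuotes = false;
        for (char c : sql.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == '?' && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Example plc"; // made-up constituent name

        // Same shape as the Edit 1 statement: four markers plus a concatenated name.
        String sql = "INSERT into rns (rns_pub_date, rns_headline, rns_link, rns_fulltext,constituent_id) VALUES"
                + "(?,?,?,?,(select constituent_id from constituent where constituent_name = \""
                + name + "\")";

        System.out.println(countPlaceholders(sql)); // prints 4
    }
}
```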
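Edit 2:
Fixed the insert by removing the escaped double quotes and the concatenated value for the 5th parameter, using a ? placeholder instead (and closing the outer parenthesis):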
String sql = "INSERT into rns (rns_pub_date, rns_headline, rns_link, rns_fulltext, constituent_id) VALUES" + "(?,?,?,?,(select constituent_id from constituent where constituent_name = ?))";
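Putting the pieces together, a minimal sketch of the whole insert could look like the following. It assumes an already opened `java.sql.Connection` and the table and column names from the snippets above. The class name `RnsInsert` and the method signature are invented for illustration. Because every value is bound through a placeholder, quotes inside the HTML need no escaping and cannot change the statement, which also addresses the SQL injection concern.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class RnsInsert {
    // Five placeholders: four column values plus the name used in the subselect.
    static final String SQL =
            "INSERT INTO rns (rns_pub_date, rns_headline, rns_link, rns_fulltext, constituent_id) VALUES "
            + "(?,?,?,?,(SELECT constituent_id FROM constituent WHERE constituent_name = ?))";

    // Binds all five values and runs the insert.
    // Returns the row count reported by executeUpdate().
    static int insert(Connection connect, String pubDate, String headline,
                      String link, String fullText, String constituentName) throws SQLException {
        // try-with-resources closes the statement even if the insert fails
        try (PreparedStatement prest = connect.prepareStatement(SQL)) {
            prest.setString(1, pubDate);
            prest.setString(2, headline);
            prest.setString(3, link);
            prest.setString(4, fullText);        // may contain any number of " characters
            prest.setString(5, constituentName); // bound, not concatenated
            return prest.executeUpdate();
        }
    }
}
```

A call would then be something like `RnsInsert.insert(connect, rns.getRnsPubDate(), rns.getRnsHeadline(), rns.getRnsLink(), rns.getRnsFullText(), rns.getRnsConstituentName())`.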