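Hi everyone, I have a folder containing many nearly identical HTML files (they differ only in a few strings). I want to remove one tag from every HTML file in the folder. Here is an example of one of the files: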

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>P0001 Generic DTC: Fuel Volume Regulator Control Circuit/Open</title>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <meta name="description" content="P0001 diagnostic trouble code details." />
    <meta name="keywords" content="P0001, obd code, obd codes, diagnostic codes, trouble codes, diagnostic trouble codes, ford, gm, toyota, chrysler, dodge, nissan, chevy, dtc, dtcs, engine code, engine codes, check engine light" />
    <link rel="stylesheet" type="text/css" href="/static/css/base.css"/>
    <link rel="shortcut icon" href="/static/img/favicon.ico" />

    <script type="text/javascript">
      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', 'UA-1196991-3']);
      _gaq.push(['_trackPageview']);
      (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
      })();
    </script>
  </head>
  <body  class="results one">
    <div id="main">
      <div id="header">
        <div id="logo"><a href="/">DTCSearch.com</a></div>
        <form action="/" method="post">
          <p>
            <input type="text" id="query" name="query" value="P0001"/>
            <input type="submit" id="submit" value=""/>
          </p>

        </form>
      </div>
      <div id="content">

<h1>P0001 OBD Trouble Code</h1>
<p>1 result found</p>


<div id="content-body">

  <table cellspacing="0" class="one">
    <caption>P0001 - Generic</caption>
    <tr>
      <th>Type</th>
      <td>Powertrain - Fuel and Air Metering - ISO/SAE Controlled</td>
    </tr>
    <tr>
      <th>Description</th>
      <td><p>Fuel Volume Regulator Control Circuit/Open</p></td>
    </tr>



  </table>

  <p style="font-weight:bold">Try also: <a href="http://www.obd-codes.com/p0001" target="_blank">http://www.obd-codes.com/p0001</a></p>
</div>

      </div>
      <div id="footer">
   <div id="footer_banner">

<a href="http://affiliates.eautorepair.net/z/15/CD65/&dp=84"><img src="http://affiliates.eautorepair.net/42/65/15/&dp=84" alt="Do it Yourself Automobile Repair Information" border="0"></a>

   </div>
        <p>Copyright &copy; 2008&ndash;2012 DTCSearch.com<br/>
        DTCSearch.com is hosted by <a href="http://www.scantool.net">ScanTool.net, LLC</a></p>
      </div>
    </div>
  </body>
</html>

I want to remove this tag from every file:

<a href="http://affiliates.eautorepair.net/z/15/CD65/&dp=84"><img src="http://affiliates.eautorepair.net/42/65/15/&dp=84" alt="Do it Yourself Automobile Repair Information" border="0"></a>

How can I do this? A solution in Java, bash, or any other technology is fine.

2 Answers

You can use sed to delete lines from a file.

For example, the command:

sed '/foo/d' myfile

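writes to standard output the contents of myfile minus every line that contains foo; the file on disk is not modified.

If you have multiple files, you can run: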

sed -i '/foo/d' *.html

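The -i option tells sed to edit the files in place. This form works with GNU sed; BSD/macOS sed expects a backup suffix after -i (for example -i '').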
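If sed is not available, the same line-deletion idea can be sketched in Python. This is only an illustration: the function names `drop_lines_containing` and `drop_lines_in_folder` are made up for this example, and unlike `sed '/foo/d'` the match here is a plain substring, not a regular expression.

```python
from pathlib import Path


def drop_lines_containing(text: str, needle: str) -> str:
    """Return text without the lines that contain needle (literal substring)."""
    kept = [line for line in text.splitlines(keepends=True) if needle not in line]
    return "".join(kept)


def drop_lines_in_folder(folder: str, needle: str) -> int:
    """Rewrite every *.html file in folder in place; return how many changed.

    Assumes the files are UTF-8, as the sample page declares.
    """
    changed = 0
    for path in Path(folder).glob("*.html"):
        # Work on bytes so the original line endings are preserved.
        original = path.read_bytes().decode("utf-8")
        cleaned = drop_lines_containing(original, needle)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            changed += 1
    return changed
```

Calling `drop_lines_in_folder(".", "affiliates.eautorepair.net")` would drop the banner line from each HTML file in the current directory. Back up the folder first, since the files are overwritten.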

answered 2012-10-04T09:58:30.903

I used:

sed -i 's#<a href="http://affiliates.eautorepair.net/z/15/CD65/&dp=84"><img src="http://affiliates.eautorepair.net/42/65/15/&dp=84" alt="Do it Yourself Automobile Repair Information" border="0"></a>##g' *
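A rough Python equivalent of this substitution, for comparison. It is a sketch, not a drop-in replacement: `BANNER`, `strip_banner` and `strip_banner_in_folder` are names invented here, the files are assumed to be UTF-8, and only `*.html` files are touched, whereas the `*` in the sed command matches every file in the directory. One difference in behaviour: `str.replace` matches the tag literally, while in the sed pattern each `.` matches any character.

```python
from pathlib import Path

# The banner markup copied from the question, treated as a literal string.
BANNER = (
    '<a href="http://affiliates.eautorepair.net/z/15/CD65/&dp=84">'
    '<img src="http://affiliates.eautorepair.net/42/65/15/&dp=84" '
    'alt="Do it Yourself Automobile Repair Information" border="0"></a>'
)


def strip_banner(html: str) -> str:
    """Remove every literal occurrence of BANNER; everything else is untouched."""
    return html.replace(BANNER, "")


def strip_banner_in_folder(folder: str) -> int:
    """Rewrite each *.html file in folder in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob("*.html"):
        original = path.read_bytes().decode("utf-8")
        cleaned = strip_banner(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            changed += 1
    return changed
```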

Now I have a harder problem. I want to remove

<p>
            <input type="text" id="query" name="query" value="XXXXX"/>
            <input type="submit" id="submit" value=""/>
          </p>

where XXXXX stands for 5 characters that differ from file to file. How can I do this?

answered 2012-10-04T13:14:39.320
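One possible approach to the follow-up question is a regular expression in which the value attribute is a pattern rather than a fixed string. The sketch below is in Python and rests on several assumptions: the value is always exactly five characters with no double quote among them, the two input tags are otherwise identical in every file, only whitespace separates the tags, and the files are UTF-8. `SEARCH_FORM`, `strip_search_form` and `strip_search_form_in_folder` are hypothetical names.

```python
import re
from pathlib import Path

# <p>, the query input with any 5-character value, the submit input, </p>.
# \s* absorbs the indentation and line breaks between the tags.
SEARCH_FORM = re.compile(
    r'<p>\s*'
    r'<input type="text" id="query" name="query" value="[^"]{5}"/>\s*'
    r'<input type="submit" id="submit" value=""/>\s*'
    r'</p>'
)


def strip_search_form(html: str) -> str:
    """Remove the <p>...</p> block that wraps the two search inputs."""
    return SEARCH_FORM.sub("", html)


def strip_search_form_in_folder(folder: str) -> int:
    """Rewrite each *.html file in folder in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob("*.html"):
        original = path.read_bytes().decode("utf-8")
        cleaned = strip_search_form(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            changed += 1
    return changed
```

Because the pattern requires an `<input>` right after `<p>`, other paragraphs such as `<p>1 result found</p>` are left alone. For markup that varies more than this, an HTML parser would likely be more robust than a regular expression.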