html - How do I extract values from specific tags in an XML file into an HTML page?

Question

I have an XML file.

<key>457</key>
<dict>
    <key>Track ID</key><integer>457</integer>
    <key>Name</key><string>Love me do</string>
    <key>Artist</key><string>The Beatles</string>
    <key>Album Artist</key><string>The Beatles</string>
    <key>Composer</key><string>John Lennon/Paul McCartney</string>
    <key>Album</key><string>The Beatles No.1</string>
    <key>Genre</key><string>Varies</string>
    <key>Kind</key><string>AAC audio file</string>
</dict>

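I've removed a lot from the file for the purposes of this question (this is one song, and each song has about 20-30 more lines of XML). What I'd like to do is extract the "Artist" string from each song, remove all the duplicate strings, and output the result to an HTML file, preferably in a way that refreshes automatically when a new version of the .xml is found, so the file stays up to date. If that overcomplicates things, it's fine to leave it out.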

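I've looked into ways of doing this with jQuery, and PHP has been suggested to me, but I'm not sure which is better/cleaner, and I'm not sure how I would go about it.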

Many thanks,

Henry.

score 1 · Accepted Answer

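What exactly are you trying to achieve? If you need an HTML file that is regenerated periodically from the XML file, you probably want to write a program for it (for example, the BeautifulSoup Python library lets you parse XML/HTML files very easily) and run it each time the HTML file needs updating (you could also set up a cron job for it).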
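To make the "run it whenever the HTML needs updating" idea concrete, here is a minimal sketch of a freshness check that a cron-driven script could run first. It assumes the file names file.xml and song_information.html used in this answer; the helper name needs_regeneration is hypothetical.

```python
import os


def needs_regeneration(xml_path, html_path):
    """Return True if the HTML file is missing or older than the XML file."""
    if not os.path.exists(html_path):
        return True
    # Compare modification times: a newer XML file means the HTML is stale.
    return os.path.getmtime(xml_path) > os.path.getmtime(html_path)


if __name__ == "__main__":
    # File names are the ones used in this answer; adjust as needed.
    if os.path.exists("file.xml") and needs_regeneration(
        "file.xml", "song_information.html"
    ):
        print("song_information.html is out of date; regenerate it")
```

A cron job can then call the script as often as you like, and the HTML is rebuilt only when the XML has actually changed.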

If you need to fetch the data from the XML dynamically, you can use a JavaScript library to load the XML file and add its contents to the page on the fly.

For example, this Python program parses an XML file (file.xml) and creates an HTML file (song_information.html) containing the data from the XML file.

from BeautifulSoup import BeautifulStoneSoup

f = open("file.xml")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = open("song_information.html", "w")
f.write(html)
f.close()

It writes the following HTML to song_information.html:

<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
<h1>Track ID</h1>
<p>457</p>
<h1>Name</h1>
<p>Love me do</p>
<h1>Artist</h1>
<p>The Beatles</p>
<h1>Album Artist</h1>
<p>The Beatles</p>
<h1>Composer</h1>
<p>John Lennon/Paul McCartney</p>
<h1>Album</h1>
<p>The Beatles No.1</p>
<h1>Genre</h1>
<p>Varies</p>
<h1>Kind</h1>
<p>AAC audio file</p>
</body>
</html>
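
The program above dumps every key of a single song. For the original goal (one de-duplicated list of artists), a sketch along the following lines may be closer. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, and it assumes the file is well-formed XML in which each song is a <dict> whose <key> elements are each immediately followed by their value element. The function names and the <songs> wrapper in the sample are illustrative, not part of the original file.

```python
import html
import xml.etree.ElementTree as ET


def unique_artists(xml_text):
    """Return the distinct Artist values in first-seen order."""
    root = ET.fromstring(xml_text)
    seen = []
    # iter("dict") visits every <dict> element, including a <dict> root.
    for song in root.iter("dict"):
        children = list(song)
        # Pair each child with the element that follows it.
        for key, value in zip(children, children[1:]):
            if key.tag == "key" and key.text == "Artist" and value.text:
                if value.text not in seen:
                    seen.append(value.text)
    return seen


def artists_to_html(artists):
    """Render the artists as a minimal HTML page, escaping special characters."""
    items = "".join("<p>%s</p>\n" % html.escape(name) for name in artists)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<title>Artists</title>\n</head>\n"
        "<body>\n%s</body>\n</html>\n" % items
    )


if __name__ == "__main__":
    # A tiny stand-in for the real library file (assumed structure).
    sample = """<songs>
      <dict><key>Name</key><string>Love me do</string>
            <key>Artist</key><string>The Beatles</string></dict>
      <dict><key>Name</key><string>Help!</string>
            <key>Artist</key><string>The Beatles</string></dict>
    </songs>"""
    print(artists_to_html(unique_artists(sample)))
```

Matching on the exact key text "Artist" keeps "Album Artist" entries out of the list. To produce a file instead of printing, write the returned string to an .html file opened with UTF-8 encoding.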

Of course, this is simplified. If you need Unicode support, you'll need to edit it like this:

from BeautifulSoup import BeautifulStoneSoup
import codecs

f = codecs.open("file.xml", "r", "utf-8")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = codecs.open("song_information.html", "w", "utf-8")
f.write(html)
f.close()

Also, you will probably need to generate more complex HTML, so you may want to try a templating system such as Jinja2.
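
As a rough illustration of what a template buys you, the sketch below keeps the page layout separate from the data. It uses string.Template from the Python standard library as a lightweight stand-in, not Jinja2 itself, and the placeholder names title and body are assumptions made for this example. Jinja2 builds on the same idea and adds loops, conditionals, and optional autoescaping.

```python
import html
from string import Template

# The page layout lives in one place; $title and $body are filled in later.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<title>$title</title>
</head>
<body>
$body
</body>
</html>
""")


def render_page(title, paragraphs):
    """Fill the template with an escaped title and one <p> per paragraph."""
    body = "\n".join("<p>%s</p>" % html.escape(text) for text in paragraphs)
    return PAGE.substitute(title=html.escape(title), body=body)


if __name__ == "__main__":
    print(render_page("Song information", ["Love me do", "The Beatles"]))
```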

score 1 · Accepted Answer

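I would do this in PHP: put your XML into a string, then (since only you will be using it) encode it as JSON and decode it into an associative array. Run a foreach loop to extract the artists, remove the duplicates, and save the result as HTML. You can then add a cron job to run it periodically and regenerate the HTML. Run this code, then link to the result it gives.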

$contents = '<key>Blah.... lots of XML';

$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$array = json_decode($json, true);

print_r($array);

Once I know the structure of the resulting array, I can finish the code. But it would look something like this:

// Collect the artists (the array path below is a placeholder)
$artists = array();
foreach($array['dict']['artist'] as $artist) {
    $artists[] = $artist;
}

// Now $artists holds an array of the artists

$artists = array_unique($artists);

// Now there are no duplicates

$output = '';
foreach($artists as $artist) {
    $output .= '<p>' . $artist . '</p>';
}

// Now each artist is put in its own paragraph.

// Either output the output
echo $output;

// Or save it to a file (in this case, 'artists.html')

$fh = fopen('artists.html', 'w') or die("Can't open file");
fwrite($fh, $output);
fclose($fh);

This won't work exactly as written, because the line in the first foreach loop needs some tweaking, but it's a starting point.
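
The tweak mentioned above comes down to the shape of the data: in the sample from the question, "Artist" is the text of a <key> element and its value is the element that follows, so the two have to be paired up before an "Artist" lookup can work. Here is a small sketch of that pairing step, written in Python rather than PHP, assuming keys and values strictly alternate inside each <dict>. The function name dict_to_mapping is hypothetical.

```python
import xml.etree.ElementTree as ET


def dict_to_mapping(dict_element):
    """Turn alternating <key>/<value> children into a Python dict."""
    children = list(dict_element)
    mapping = {}
    # Step through the children two at a time: key element, then value element.
    for key, value in zip(children[0::2], children[1::2]):
        mapping[key.text] = value.text
    return mapping


if __name__ == "__main__":
    song = ET.fromstring(
        "<dict>"
        "<key>Track ID</key><integer>457</integer>"
        "<key>Name</key><string>Love me do</string>"
        "<key>Artist</key><string>The Beatles</string>"
        "</dict>"
    )
    print(dict_to_mapping(song)["Artist"])  # prints: The Beatles
```

Once each song is a plain mapping, collecting the "Artist" values and de-duplicating them is a simple loop.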
