
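I'm working on a mixed-language script whose parent script is bash (don't ask why, it's a long story). Part of my script pulls the source of an XML page into a variable. I want to use bash to process the XML in that variable into several arrays. The XML is laid out like this: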

<event>
    <id>34287352</id>
    <what>New Post</what>
    <when>1 Minute Ago 03:50 PM</when>
    <title>This is a title</title>
    <preview>sdfasd</preview>
    <poster>
            <![CDATA[ USERNAME ]]>
    </poster>
    <threadid>2346566</threadid>
    <postid>34287352</postid>
    <lastpost>1360021837</lastpost>
    <userid>3291696</userid>
    <forumid>2</forumid>
    <forumname>General Discussion</forumname>
    <views>201,913</views>
    <replies>6,709</replies>
    <statusicon>images/statusicon/thread.gif</statusicon>
</event>

There are 20 <event> elements in the XML. I want to extract the what, title and preview of each event and put each field into its own array.
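A minimal sketch of that goal in plain bash, assuming (as in the sample) that every <event> carries its what, title and preview, each on a single line. The function names extract_tag and parse_events are hypothetical, and the loops use read rather than mapfile so they also run under the bash 3.2 that ships with macOS:

```shell
#!/usr/bin/env bash
# extract_tag TAG XML (hypothetical helper): print the text of every
# <TAG>value</TAG> pair that sits on a single line of the XML string.
extract_tag() {
    local tag=$1
    # -n together with the p flag prints only lines where the substitution matched
    sed -n "s:.*<$tag>\([^<]*\)</$tag>.*:\1:p" <<< "$2"
}

# parse_events XML (hypothetical helper): fill W_ARRAY, T_ARRAY and P_ARRAY.
# An empty field such as <preview></preview> becomes an empty element,
# so index i refers to the same <event> in all three arrays.
parse_events() {
    local v
    W_ARRAY=() T_ARRAY=() P_ARRAY=()
    while IFS= read -r v; do W_ARRAY+=("$v"); done < <(extract_tag what "$1")
    while IFS= read -r v; do T_ARRAY+=("$v"); done < <(extract_tag title "$1")
    while IFS= read -r v; do P_ARRAY+=("$v"); done < <(extract_tag preview "$1")
}

# Usage: parse_events "$source"; echo "${T_ARRAY[0]}"
```

This is still line-based text matching, not XML parsing: a tag split across lines or missing from one event would shift the indices.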

I followed an example on SO:

for tag in  what title preview 
do
OUT=`grep  $tag $source | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/' `

# This is what I call the eval_trick, difficult to explain in words.
eval ${tag}=`echo -ne \""${OUT}"\"`
done

W_ARRAY=( `echo ${what}` )
T_ARRAY=( `echo ${title}` )
P_ARRAY=( `echo ${preview}` )

echo ${W_ARRAY[0]}
echo ${T_ARRAY[0]}
echo ${P_ARRAY[0]}

But with the above, my script always freaks out and keeps repeating grep: <part of the xml>: No such file or directory

Ideas?

Edit:

OK, it's ugly, but I managed to get the pseudo-XML into arrays:

windex=0
tindex=0
pindex=0
while read -r line
do
WHAT=$(echo "${line}" | awk -F "</?what>" '{ print $2 }')
if [ "$WHAT" != "" ]; then
    W_ARRAY[$windex]=$WHAT
    let windex+=1
fi
TITLE=$(echo "${line}" | awk -F "</?title>" '{ print $2 }')
if [ "$TITLE" != "" ]; then
    T_ARRAY[$tindex]=$TITLE
    let tindex+=1
fi
PREVIEW=$(echo "${line}" | awk -F "</?preview>" '{ print $2 }')
if [ "$PREVIEW" != "" ]; then
    P_ARRAY[$pindex]=$PREVIEW
    let pindex+=1
fi
done <<< "$source"
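The same loop can avoid starting echo and awk three times per line by using bash's own regex match. A sketch, assuming each value sits on one line and contains no '<'; parse_lines is a hypothetical name:

```shell
#!/usr/bin/env bash
# parse_lines XML (hypothetical helper): same idea as the loop above, but matched
# with bash's built-in [[ =~ ]] instead of echo | awk three times per line.
parse_lines() {
    local line
    # Keeping the regex in a variable avoids quoting pitfalls: since bash 3.2,
    # quoted parts of an inline regex are matched literally.
    local re='<(what|title|preview)>([^<]*)</'
    W_ARRAY=() T_ARRAY=() P_ARRAY=()
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            # BASH_REMATCH[1] is the tag name, BASH_REMATCH[2] is its text
            case ${BASH_REMATCH[1]} in
                what)    W_ARRAY+=("${BASH_REMATCH[2]}") ;;
                title)   T_ARRAY+=("${BASH_REMATCH[2]}") ;;
                preview) P_ARRAY+=("${BASH_REMATCH[2]}") ;;
            esac
        fi
    done <<< "$1"
}
```

Unlike the != "" checks above, this also keeps empty values, so the three arrays stay index-aligned.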

4 Answers


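I have something very similar, parsing-wise; here is a hacked-up version.

I use xsltproc (on Ubuntu, though I don't remember whether I installed it specifically).

Command line:

xsltproc tfile.xslt tfile.xml

tfile.xml is your example copied 3 times, wrapped in an events tag, i.e.: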

<events>
     <event> ... </event>
     <event> ... </event>
     <event> ... </event>
</events>

tfile.xslt:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method='text'/>
<!-- ================================================================== -->
<xsl:template match="/">
    <xsl:apply-templates select="//event"/>
</xsl:template>

<xsl:template match="event">
 <xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['id']=</xsl:text>
 <xsl:value-of select="id"/> <xsl:text> </xsl:text>

 <xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['what']=</xsl:text>
 <xsl:value-of select="what"/><xsl:text> </xsl:text>

 <xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['preview']=</xsl:text>
 <xsl:value-of select="preview"/><xsl:text> </xsl:text>

 <xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>

Output:

event[1]['id']=34287352 event[1]['what']=New Post event[1]['preview']=sdfasd 
event[2]['id']=34287353 event[2]['what']=New Post3 event[2]['preview']=sdfasd 
event[3]['id']=34287354 event[3]['what']=New Post4 event[3]['preview']=sdfasd

Hopefully you know a bit of XSLT; change the output to whatever you need.
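If you want that output back inside bash, one option is to parse each line into an associative array keyed by event number and field. This is a sketch under assumptions: it expects exactly the event[N]['field']=value layout shown above, needs bash 4 or later for declare -A (not the stock macOS /bin/bash), and parse_xslt_output, EVENT, re and nl are hypothetical names:

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
declare -A EVENT

# One "event[N]['field']=value" segment; '.' stands in for the quotes around field.
re='^event\[([0-9]+)\]\[.([a-z]+).\]=(.*)$'
nl=$'\n'

# parse_xslt_output (hypothetical helper): read the stylesheet's text output
# on stdin and store each value as EVENT[N,field].
parse_xslt_output() {
    local line seg val
    while IFS= read -r line; do
        # Break the line before every " event[" so values may contain spaces
        line=${line// event\[/${nl}event[}
        while IFS= read -r seg; do
            if [[ $seg =~ $re ]]; then
                val=${BASH_REMATCH[3]}
                EVENT[${BASH_REMATCH[1]},${BASH_REMATCH[2]}]=${val% }  # drop trailing separator space
            fi
        done <<< "$line"
    done
}

# Usage: parse_xslt_output < <(xsltproc tfile.xslt tfile.xml); echo "${EVENT[2,what]}"
```

Splitting on " event[" is the fragile part; having the stylesheet emit one field per line (or a delimiter that cannot appear in the data) would make the bash side simpler.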

answered 2013-02-05T01:41:10.363

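OK, this is no help right now, but I'm currently writing a command-line XML parser. Once it's finished (and it would be finished already if the TopCoder Marathon Match hadn't distracted me...), you could simply write: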

eval $(echo "$source" | xidel - -e '<event>
    <what>{$W_ARRAY}</what>
    <title>{$T_ARRAY}</title>
    <preview>{$P_ARRAY}</preview>
</event>*' --output-format bash)

Looks like magic, doesn't it?

answered 2013-02-05T00:38:54.643

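To recap my comments, here is what's wrong with your code:

1- Since your $source variable is not a file name, you should pipe it into grep instead (keep the double quotes around $source; without them, echo puts the whole XML on a single line):

OUT=`echo "$source" | grep  $tag | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/' `

2- Your tr command deletes all the tabs in the XML-like variable. However, your variable does not contain tabs; it is indented with 4 spaces.

So you need to strip the spaces instead. Note that tr -d ' ' would delete every space, including the one inside New Post, so it is safer to remove only the leading whitespace:

... | sed 's/^[[:space:]]*//' | ...

3- Another solution is:

OUT=`echo "$source" | grep  $tag | sed 's/<.*>\([^<].*\)<.*>$/\1/' `

(Note that the ^ has been removed from the sed expression.)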
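Putting the three points together, here is a sketch of the question's loop with these fixes. Two substitutions are not from the answer: printf -v replaces the eval trick, and the arrays are split on newlines only, because W_ARRAY=( `echo ${what}` ) also splits a value such as New Post into two elements. extract_all is a hypothetical name:

```shell
#!/usr/bin/env bash
# extract_all XML (hypothetical helper): the question's loop with the fixes applied.
#  - the XML comes from the variable via a here-string, not from a file name
#  - leading indentation is stripped whether it is spaces or tabs
#  - printf -v sets the variable named by $tag (instead of the eval trick)
extract_all() {
    local tag out
    for tag in what title preview; do
        out=$(grep "<$tag>" <<< "$1" |
              sed 's/^[[:space:]]*<[^>]*>\(.*\)<\/[^>]*>$/\1/')
        printf -v "$tag" '%s' "$out"
    done
    # Split on newlines only, so "New Post" stays one element
    IFS=$'\n' read -r -d '' -a W_ARRAY <<< "$what"
    IFS=$'\n' read -r -d '' -a T_ARRAY <<< "$title"
    IFS=$'\n' read -r -d '' -a P_ARRAY <<< "$preview"
    return 0   # read -d '' reports hitting EOF with a non-zero status
}
```

One caveat: read collapses empty lines, so an empty <preview></preview> would be dropped and the array indices could drift.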

answered 2013-02-05T01:28:32.510

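Everything works now. For anyone planning to do something similar, here's how I did it. First, the AppleScript that pulls the page source through Safari (saved as FRcli.scpt, which the bash script calls via osascript):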

on run argv
set region to item 1 of argv
set XML_URL to "http://" & region & ".<URL REMOVED>.com/board/vaispy-secret.php?do=xml"
try
    tell application "Safari"
        set URL of tab 1 of front window to XML_URL
        my waitforload()
        --delay 5
        -- Get page source
        set currentTab to current tab of front window
        set currentSource to currentTab's source
        return currentSource
    end tell
on error err
    log "Could not retrieve source."
    log err
    display dialog err
    --return "NULL"
end try

end run

on waitforload()
--check if page has loaded
local loadflag, zarg, test_html
set loadflag to 0
repeat until loadflag is 1
    delay 0.5
    tell application "Safari"
        set test_html to source of document 1
    end tell
    try
        set zarg to text ((count of characters in test_html) - 10) thru (count of characters in test_html) of test_html
        if "</events>" is in text ((count of characters in test_html) - 10) thru (count of characters in test_html) of test_html then
            set loadflag to 1
        end if
    end try
end repeat
end waitforload

Then the bash script:

#!/bin/bash
clear

if [ "$1" == "na" ]; then
region="na"
elif [ "$1" == "eu" ]; then
region="euw"
else
echo "FRcli requires an argument."
echo "usage: [eu|na]"
echo "[eu scans EUW & EUNE]"
echo "[na scans NA]"
exit 1
fi


while true; do
clear
echo "Region: $region"
echo "...Importing Naughty"

declare -a NAUGHTY=()
nindex=0
while read line
do
    NAUGHTY[$nindex]=$line
    let nindex+=1

done < $HOME/Desktop/naughty.txt
NC=${#NAUGHTY[@]}
let NC-=1
echo "...Pulling Source"

source=$(osascript FRcli.scpt $region)

echo "...Extracting Arrays"

windex=0
tindex=0
pindex=0
dindex=0
while read -r line
do
    #WHAT=$(echo ${line} | awk -F "</?what>" '{ print $2 }')
    WHAT=$(echo ${line} | sed -n 's/^.*<what>\([^<]*\).*/\1/p')
    if [ "$WHAT" != "" ]; then
        W_ARRAY[$windex]=$WHAT
        let windex+=1
    fi

    #TITLE=$(echo ${line} | awk -F "</?title>" '{ print $2 }')
    TITLE=$(echo ${line} | sed -n 's/^.*<title>\([^<]*\).*/\1/p')
    if [ "$TITLE" != "" ]; then
        T_ARRAY[$tindex]=$TITLE
        let tindex+=1
    fi

    #PREVIEW=$(echo ${line} | awk -F "</?preview>" '{ print $2 }')
    #PREVIEW=$(echo ${line} | sed -n '/<preview*/,/<\/preview>/p')
    PREVIEW=$(echo ${line} | sed -n 's/^.*<preview>\([^<]*\).*/\1/p')
    if [ "$PREVIEW" != "" ]; then
        P_ARRAY[$pindex]=$PREVIEW
        let pindex+=1
    fi

    POSTID=$(echo ${line} | sed -n 's/^.*<postid>\([^<]*\).*/\1/p')
    if [ "$POSTID" != "" ]; then
        D_ARRAY[$dindex]=$POSTID
        let dindex+=1
    fi


done <<< "$source"

echo "What: ${#W_ARRAY[@]}"
echo "Title: ${#T_ARRAY[@]}"
echo "Preview: ${#P_ARRAY[@]}"
echo "PostID: ${#D_ARRAY[@]}"

for ((i=0; i <= 19; i++))
do
    found=0
    fpid=""
    if [ "${W_ARRAY[$i]}" = "New Thread" ]; then
        echo "Scanning Thread"
        scan=$(echo ${T_ARRAY[$i]} ${P_ARRAY[$i]})
        echo "Title: ${T_ARRAY[$i]}"
        echo "Post: ${P_ARRAY[$i]}"
    else
        echo "Scanning Post"
        scan=$(echo ${P_ARRAY[$i]})
        echo "Post: ${scan}"        
    fi
    sleep .5
    for ((n=0; n<=$NC; n++))
    do
        nw=${NAUGHTY[$n]}
        a=$(echo "${scan}" | tr '[:lower:]' '[:upper:]')
        b=$(echo "${nw}" | tr '[:lower:]' '[:upper:]')
        echo "Checking: $b"
        #echo "$a"

        if [[ $a == *$b* ]]; then
        ## Change != to == in release
            echo "Found: $b"
            found=1
            echo "...Loading PID"
            declare -a PID=()
            pindex=0
            while read -r line
            do
                PID[$pindex]=$line
                let pindex+=1

            done < $HOME/Desktop/pid.txt
            PIDC=${#PID[@]}

            # Assume the post has not been opened yet and set fpid=1 only on a match
            # (breaking on the first mismatch meant only the first saved PID was compared)
            fpid=0
            for (( p=0; p<PIDC; p++ ))
            do
                lpid=${PID[$p]}
                if [ "$region ${D_ARRAY[$i]}" == "$lpid" ]; then
                    echo "Found: $lpid"
                    echo "Ignoring Flag"
                    fpid=1
                    break
                fi
            done
            if [ "$fpid" == "0" ]; then
                echo "$region ${D_ARRAY[$i]}"
                echo "PID not found, opening URL."
            fi


            if [ "$found" == "1" -a "$fpid" == "0" ]; then
                FFURL="http://$region.<URL REMOVED>.com/board/showthread.php?p=${D_ARRAY[$i]}&highlight=$nw"
                open -a Firefox "$FFURL"
                echo $region ${D_ARRAY[$i]} >> $HOME/Desktop/pid.txt            
                found=0
                fpid=""
            fi
        fi
    done
    sleep .5
done

if [ "$1" == "eu" ]; then
    if [ "$region" == "euw" ]; then
        region="eune"
    else
        region="euw"
    fi
fi
clear
done

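Done. I'm sure there are more efficient ways to do this. Using cURL in the bash script would make this a one-script deal (that isn't possible here because of this board's iSpy security). But this works, and it's very snappy. It only uses AVG 32.7 Mem, and as far as I can tell it has no memory leaks (unlike my 100% AppleScript version).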
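On the efficiency point: the inner loop starts two echo | tr pipelines per naughty word and re-reads pid.txt on every hit. Below is a sketch of two hypothetical helpers that could replace those parts. It uses bash's nocasematch option (bash 3.1 or later) for the case-insensitive test and grep -qxF for the exact-line lookup in pid.txt:

```shell
#!/usr/bin/env bash
# contains_ci TEXT WORD (hypothetical helper): case-insensitive substring test
# without spawning echo | tr for each comparison.
contains_ci() {
    local rc=1 restore=1
    shopt -q nocasematch && restore=0   # leave the option on if it already was
    shopt -s nocasematch
    # Quoting "$2" makes the word literal, so * or ? in naughty.txt are not globs
    [[ $1 == *"$2"* ]] && rc=0
    (( restore )) && shopt -u nocasematch
    return $rc
}

# already_opened LINE FILE (hypothetical helper): succeed if FILE contains LINE exactly,
# e.g. already_opened "$region ${D_ARRAY[$i]}" "$HOME/Desktop/pid.txt"
already_opened() {
    # -q: no output, -x: whole-line match, -F: fixed string, not a regex
    grep -qxF -- "$1" "$2" 2>/dev/null
}
```

In the main loop these would be used as if contains_ci "$scan" "$nw"; then ... and if ! already_opened "$region ${D_ARRAY[$i]}" "$HOME/Desktop/pid.txt"; then open the URL.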

answered 2013-02-05T19:23:25.453