
I have a shell script that calls PhantomJS (yslow.js) to gather page sizes.

Apparently PhantomJS crashes when it is run on too many URLs, so I tried to rewrite the script to process only part of the list at a time. But now I hit a bug where the $url variable never changes, so the script writes down the same URL every time, even though it does appear to record the actual size of each page.

If someone could help me through that error, that would be glorious.

#!/bin/bash
echo "Importing URLs..."
file=sizeurls.csv
url=""
while IFS= read -r line
do
     url+=" "
     url+=$line
done < "$file"
echo "Gathering page sizes..."
phantomjs yslow.js --info basic --format plain $url | grep 'url\|size' > temp.txt
echo "Formatting data..."
sed -i 's/size://g' temp.txt
sed -i 's/url://g' temp.txt
paste - - -d, < temp.txt > pagesize.csv
echo "Done!"

Here is the version that is supposed to process part of the list at a time. It may be mortally screwed, because I messed with it a bit and I am not sure I returned it to its previous state:

#!/bin/bash 
echo "Importing URLs..."
file=sizeurls.csv
url=""
i=0;

while  IFS= read -r line  
do

  while [ $i -le 10 ] #10 at a time i < 10
  do
     url+=" "
     url+=$line
     i=$((i+1));

  done < "$file"

phantomjs yslow.js --info basic --format plain $url | grep 'url\|size' >> temp.txt
#echo "Formatting data..."
sed -i 's/size://g' temp.txt
sed -i 's/url://g' temp.txt
paste - - -d, < temp.txt >> pagesize.csv
done < "$file"
i = 0

echo "Done!"

2 Answers


Why not just do one at a time?

#!/bin/bash
echo "Importing URLs..."
file=sizeurls.csv

echo "Gathering page sizes..."
while IFS= read -r url
do
  phantomjs yslow.js --info basic --format plain $url | grep 'url\|size'
done < "$file" > temp.txt

echo "Formatting data..."
sed -i -e 's/size://g' -e 's/url://g' temp.txt
paste - - -d, < temp.txt > pagesize.csv

echo "Done!"
Answered 2013-02-19T20:00:05.950

This might give you some ideas (untested, but it looks about right). Processing is moved into a function that is called once for every 10 URLs, and called one final time at the end if any URLs remain.

#!/bin/bash
echo "Importing URLs..."
file=sizeurls.csv
rm -f pagesize.csv

ProcessURLs () {
    echo "Gathering page sizes..."
    phantomjs yslow.js --info basic --format plain "$@" | grep 'url\|size' > temp.txt
    echo "Formatting data..."
    sed -i 's/size://g' temp.txt
    sed -i 's/url://g' temp.txt
    paste - - -d, < temp.txt >> pagesize.csv
}


url=""
count=0
while IFS= read -r line
do
    url+="$line$ "
    (( count++ ))
    # Procss URLs in 10-URL chunks
    if [[ $count -ge 10 ]] ; then
        ProcessURLs $url
        url=''
        count=0
    fi
done < "$file"

# Handle any remaining URLs
[ -n "$url" ] && ProcessURLs $url

echo "Done!"
Answered 2013-02-19T19:55:13.790