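I'm seeing some strange differences between running Nokogiri locally and running it on my server. On my local machine the entire document appears to parse and be usable, but on the server I seem to get only the doctype tag and a few random comment tags.
First, to rule out a problem with open-uri, I checked it on both machines. The lengths aren't identical, but both responses contain the correct markup.
Local: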
ruby-1.8.7-p352 :005 > s = open('http://www.pennstateind.com/store/PK2WAY.html')
=> #<File:/var/folders/G8/G8bsAGBk1o82Eyks3ZmFtq-+3Y6/-Tmp-/open-uri20120626-5891-10y2ncr-0>
ruby-1.8.7-p352 :006 > s.length
=> 88408
Server:
irb(main):008:0> s = open('http://www.pennstateind.com/store/PK2WAY.html')
=> #<File:/tmp/open-uri20120626-22167-1td2l72-0>
irb(main):009:0> s.length
=> 98184
When I run it on my local machine, I get this:
ruby-1.8.7-p352 :003 > d = Nokogiri::HTML(open('http://www.pennstateind.com/store/PK2WAY.html'))
=> [ OUTPUT OMITTED FOR BREVITY - CAN SUPPLY ON REQUEST ]
ruby-1.8.7-p352 :004 > d.to_s.length
=> 85212
But when I run it on the server, I get this:
irb(main):006:0> d = Nokogiri::HTML(open('http://www.pennstateind.com/store/PK2WAY.html'))
=> #<Nokogiri::HTML::Document:0x36620e14b580 name="document" children= [#<Nokogiri::XML::DTD:0x36620e14b1c0 name="html">, #<Nokogiri::XML::Comment:0x36620e14b170 " Open Graph Tags ">, #<Nokogiri::XML::Comment:0x36620e14a98c " Customer_Session_Verified: 0 ">]>
irb(main):007:0> d.to_s.length
=> 172
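The only obvious gem difference is the JS compiler (libv8), which differs only in platform. All other gems are the exact same version locally and on the server: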
Local => libv8 (3.3.10.4 x86-darwin-10)
Server => libv8 (3.3.10.4 x86_64-linux)
Any ideas on how to figure out what's going on and/or fix this?
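One more comparison that might be worth making: Nokogiri wraps the native libxml2 library, so two machines with identical gem versions can still be parsing with different libxml2 builds. The sketch below prints the libxml2 versions Nokogiri reports. It assumes `Nokogiri::VERSION_INFO` is a hash with a `"libxml"` entry holding `"compiled"` and `"loaded"` keys, and `libxml_summary` is a hypothetical helper name, not part of Nokogiri.

```ruby
# Hypothetical helper: summarize the libxml2 versions found in a
# Nokogiri::VERSION_INFO-style hash (the key layout is an assumption).
def libxml_summary(info)
  libxml = info["libxml"] || {}
  compiled = libxml["compiled"] || "unknown"
  loaded   = libxml["loaded"]   || "unknown"
  "compiled=#{compiled} loaded=#{loaded}"
end

begin
  require 'rubygems'
  require 'nokogiri'
  # Run this on both machines and compare the two lines of output.
  puts libxml_summary(Nokogiri::VERSION_INFO)
rescue LoadError
  puts "nokogiri is not installed here"
end
```

If the two machines print different values, the libxml2 build becomes a suspect alongside the gems themselves.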
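UPDATE: To pin down where the problem actually originates, I saved one copy of the page as fetched on the server and one as fetched on localhost, then ran both files through Nokogiri on each machine. The results below show the problem is definitely in Nokogiri. What the problem actually is, I'm still baffled...
Run locally: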
# FILE ORIGINALLY PULLED FROM SERVER
ruby-1.8.7-p352 :015 > server_file = File.open("/Users/jmcdonald/Desktop/files/SERVER.txt", "r")
=> #<File:/Users/jmcdonald/Desktop/files/SERVER.txt>
ruby-1.8.7-p352 :016 > server_file.read.length
=> 93071
ruby-1.8.7-p352 :022 > Nokogiri::HTML(server_file).to_s.length
=> 98793
# FILE ORIGINALLY PULLED FROM LOCALHOST
=> #<File:/Users/jmcdonald/Desktop/files/LOCAL.txt>
ruby-1.8.7-p352 :018 > local_file.read.length
=> 89622
ruby-1.8.7-p352 :026 > Nokogiri::HTML(local_file).to_html.length
=> 94632
Run on the server:
# FILE ORIGINALLY PULLED FROM SERVER
irb(main):001:0> sf = File.open('/home/charlest/public_html/files/nokogiri_issue/SERVER.txt', 'r')
=> #<File:/home/charlest/public_html/files/nokogiri_issue/SERVER.txt>
irb(main):002:0> sf.read.length
=> 93071
irb(main):004:0> Nokogiri::HTML(sf).to_s.length
=> 896 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< WRONG
# FILE ORIGINALLY PULLED FROM LOCALHOST
irb(main):008:0> lf = File.open('/home/charlest/public_html/files/nokogiri_issue/LOCAL.txt', 'r')
=> #<File:/home/charlest/public_html/files/nokogiri_issue/LOCAL.txt>
irb(main):009:0> lf.read.length
=> 89622
irb(main):011:0> Nokogiri::HTML(lf).to_s.length
=> 896 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< WRONG