0

I'm working on a gem. Here is the homepage of it: https://github.com/scaryguy/vakit

If you check out the source code, you can see that I'm parsing an external HTML page to filter some data from it.

The issue is, eventhough I fetch all data I want with one request, each time I call Vakit.sabah or Vakit.oglen a new request is done.

require "vakit/version"
require 'vakit/connect'
require 'Nokogiri'
require 'open-uri'

module Vakit

  def self.today
    Vakit::Connect.shaber
  end

  def self.imsak
    Vakit::Connect.shaber[:imsak]
  end

  def self.sabah
    Vakit::Connect.shaber[:sabah]
  end

  def self.oglen
    Vakit::Connect.shaber[:oglen]
  end

  def self.ikindi
    Vakit::Connect.shaber[:ikindi]
  end

  def self.aksam
    Vakit::Connect.shaber[:aksam]
  end

  def self.yatsi
    Vakit::Connect.shaber[:yatsi]
  end

end

I don't think that it's an efficient way.

I should be able to access attributes of my hash without new request, shouldn't I?

module Vakit
class Connect
    def initialize(opt={})
        @path = opt[:path]
    end

    def self.shaber
        doc = Nokogiri::HTML(open('http://www.samanyoluhaber.com/'))
        x = doc.css('#hnmzT')

        times = []
        x.each do |vakit|
            data = vakit.children.first.children.last.content
            data_add = data.slice(0..data.length-2)
            times.push(data_add)
        end
        times
        vakit = {

            imsak: times[0],
            sabah: times[1],
            oglen: times[2],
            ikindi: times[3],
            aksam: times[4],
            yatsi: times[5]


        }

    end

end
end

I need some enlightment.

4

3 回答 3

1

Every time you use shaber you explicitly open and reparse the content. You're not making an attempt to locally store the content or parsed DOM and check to see if you already have it.

Instead of doc = use @@doc ||= and change the occurrences of doc to @@doc.

The ||= operator will only assign when @@doc is empty. Once it's assigned to a non-nil or non-false value it won't trigger again so it's a poor-mans "memoize".

Because you are using a class-method I recommended using a class variable. @@doc could be an instance variable @doc instead if you have multiple instances of the class that are looking at different pages. As is, that won't make a difference because you've hard-coded only one page, but for future code-growth it might be useful.


The code you've written to access the page isn't very idiomatic Ruby. I'd write it more like the following, which doesn't work because the URL isn't returning a page that has enough time values:

require 'nokogiri'
require 'open-uri'

module Vakit

  URL = 'http://www.samanyoluhaber.com/'

  class Connect
    def initialize(opt={})
      @path = opt[:path]
      @url = opt[:url] || URL
    end

    def shaber(url=nil)
      doc = Nokogiri::HTML(open(url || @url))
      doc.at_css('#hnmzT').to_html # => "<li id=\"hnmzT\" name=\"imsak\"><a id=\"at\"><span>\u0130msak:</span>4:22\u00A0</a></li>"

      x = doc.at_css('li#hnmzT a')
      x.to_html # => "<a id=\"at\"><span>\u0130msak:</span>4:22\u00A0</a>"

      times = x.text.scan(/\d+:\d+/)

      Hash[[:imsak, :sabah, :oglen, :ikindi, :aksam, :yatsi].zip(times)]
    end

  end
end

connection = Vakit::Connect.new
connection.shaber # => {:imsak=>"4:22", :sabah=>nil, :oglen=>nil, :ikindi=>nil, :aksam=>nil, :yatsi=>nil}

Connect isn't a good name for a class. A class is an object, a thing. Connect is a verb, something that occurs, or happens, to a thing. connect would be a good name for a method.

于 2013-08-11T14:10:15.770 回答
0

This line doc = Nokogiri::HTML(open('http://www.samanyoluhaber.com/')) is what's making the requested call several times.

I tested this with your gem and this seems to work.

if @doc.nil?
  @doc = Nokogiri::HTML(open('http://www.samanyoluhaber.com/'))
end

In addition in vakit.rb change

require 'Nokogiri' to require 'nokogiri' (simple n)

于 2013-08-11T14:07:02.247 回答
0

I will make the following suggestions:

  1. Vakit::Connect should be an instance method inside a class. so it can be instantiated & saved in memory. this way each request will be different instances of the same object. if class doesn't exist. Do create it.
  2. use memorization to cache costly operations.
  3. If the operation takes long time, say greater than 3sec. convert it into a background job & use workers to process them.however, since you are authoring a gem. I think this responsibility should be delagated to developers using the gem.

so I would do it something like this: the connect class:

module Vakit
class Connect
attr_accessor :path, :doc
def initialize(opt={}, url)
  @path = opt[:path]
  @doc = Nokogiri::HTML(open(url))
end

def shaber
 x = @doc.css('#hnmzT')
  #no changes here
end
end

the Vakit module

module Vakit
 class Main #name it something more meaningful   
  def initialize(opt={})
   @connected = Connect.new(opt[:path], 'http://www.samanyoluhaber.com/')
  end
  def today
   @connected.shaber
  end

  def imsak
   today[:imsak]
  end

  def sabah
   today[:sabah]
  end

  def oglen
   today[:oglen]
  end

  def ikindi
   today[:ikindi]
  end

  def aksam
   today[:aksam]
  end

  def yatsi
   today[:yatsi]
  end

end

then in code, you can use it like this:

@v = Vakit::Main.new
@v.aksam

This may not be 100% perfect because I really don't understand the purpose of your code, O understand what its doing but not why. But this will not make new request everytime you access your hash.

于 2013-08-11T14:10:07.923 回答