
I wrote a script that fetches scan results from Qualys, to be run weekly to collect metrics.

The first part of the script involves fetching a list of references for each scan that was run in the past week, for further processing.

The problem is that, while it works fine sometimes, the script occasionally hangs at c.perform(). This is manageable when running the script manually, since it can just be re-run until it works. However, I'm looking to run this as a scheduled task every week without any manual interaction.

Is there a foolproof way to detect if a hang has occurred and re-send the PyCurl request until it works?

I have tried setting the c.TIMEOUT and c.CONNECTTIMEOUT options, but these don't seem to have any effect. Also, since no exception is thrown, simply wrapping it in a try-except block won't fly either.

The function in question is below:

# Imports needed by the snippet below
import datetime as DT
from io import BytesIO

import certifi
import pycurl

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("getting scan references...")

    with open('refs_raw.txt','wb') as refsraw: 
        today = DT.date.today()
        week_ago = today - DT.timedelta(days=7)
        strtoday = str(today)
        strweek_ago = str(week_ago)

        c = pycurl.Curl()

        c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
        c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
        c.setopt(c.USERPWD, usr + ':' + pwd)
        c.setopt(c.POST, 1)
        c.setopt(c.PROXY, 'companyproxy.net:8080')
        c.setopt(c.CAINFO, certifi.where())
        c.setopt(c.SSL_VERIFYPEER, 0)
        c.setopt(c.SSL_VERIFYHOST, 0)
        c.setopt(c.CONNECTTIMEOUT, 3)
        c.setopt(c.TIMEOUT, 3)

        refsbuffer = BytesIO()
        c.setopt(c.WRITEDATA, refsbuffer)
        c.perform()

        body = refsbuffer.getvalue()
        refsraw.write(body)
        c.close()

    print("Got em!")

2 Answers


I solved this myself by using multiprocessing to launch the API call in a separate process, killing and restarting it if it goes on for longer than 5 seconds. It's not very pretty, but it is cross-platform. For those seeking a solution that is more elegant but only works on *nix, look into the signal library, specifically SIGALRM.
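For completeness, the SIGALRM approach mentioned above could look roughly like this minimal *nix-only sketch (the names RequestTimeout and run_with_timeout are illustrative, not part of any library):

```python
import signal


class RequestTimeout(Exception):
    """Raised when the wrapped call exceeds the time limit."""


def _on_alarm(signum, frame):
    raise RequestTimeout


def run_with_timeout(func, seconds, *args):
    """Run func(*args), raising RequestTimeout if it takes longer than `seconds`."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # schedule SIGALRM after `seconds` seconds
    try:
        return func(*args)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

The retry loop then becomes a try/except around run_with_timeout(performRequest, 5, usr, pwd), retrying on RequestTimeout.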

Code below:

# Imports needed by the snippet below
import datetime as DT
import multiprocessing
import time
from io import BytesIO

import certifi
import pycurl

# As this request for scan references sometimes hangs, it is run in a separate process here
# This will be terminated and relaunched if no response is received within 5 seconds
def performRequest(usr, pwd):
    today = DT.date.today()
    week_ago = today - DT.timedelta(days=7)
    strtoday = str(today)
    strweek_ago = str(week_ago)

    c = pycurl.Curl()

    c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
    c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
    c.setopt(c.USERPWD, usr + ':' + pwd)
    c.setopt(c.POST, 1)
    c.setopt(c.PROXY, 'companyproxy.net:8080')
    c.setopt(c.CAINFO, certifi.where())
    c.setopt(c.SSL_VERIFYPEER, 0)
    c.setopt(c.SSL_VERIFYHOST, 0)

    refsBuffer = BytesIO()
    c.setopt(c.WRITEDATA, refsBuffer)
    c.perform()
    c.close()
    body = refsBuffer.getvalue()
    with open('refs_raw.txt', 'wb') as refsraw:
        refsraw.write(body)

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("Getting scan references...") 

    # Occasionally the request will hang infinitely. Launch in separate method and retry if no response in 5 seconds
    success = False
    while not success:
        sendRequest = multiprocessing.Process(target=performRequest, args=(usr, pwd))
        sendRequest.start()

        for seconds in range(5):
            print("...")
            time.sleep(1)

        if sendRequest.is_alive():
            print("Maximum allocated time reached... Resending request")
            sendRequest.terminate()
            del sendRequest
        else:
            success = True

    print("Got em!")
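As a side note, the five one-second sleeps followed by the is_alive() check can be expressed more directly with Process.join(timeout=...). A minimal sketch with a stand-in worker (timings shortened here; the answer above uses 5 seconds):

```python
import multiprocessing
import time


def worker():
    # stand-in for performRequest; pretend the request hangs
    time.sleep(30)


if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join(2)  # wait up to 2 seconds for the process to finish
    if p.is_alive():
        print("Maximum allocated time reached... Resending request")
        p.terminate()
        p.join()  # reap the terminated process
```

join(timeout) returns as soon as the child exits, so a fast response isn't forced to wait out the full window.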
Answered 2017-10-13T15:05:40.330

This question is old, but I'll add this answer; it might help someone.

The only way to terminate a running curl transfer after perform() has been called is to use callbacks:

1- Using CURLOPT_WRITEFUNCTION, as stated in the documentation:

Your callback should return the number of bytes actually taken care of. If that amount differs from the amount passed to your callback function, it'll signal an error condition to the library. This will cause the transfer to get aborted and the libcurl function used will return CURLE_WRITE_ERROR.

The drawback of this approach is that curl only calls the write function when it receives new data from the server, so if the server stops sending data, curl will just keep waiting on the server side and never receive your kill signal.

2- The better alternative by far is to use a progress callback:

The beauty of the progress callback is that curl will call it at least once per second, even if there is no data coming from the server. This gives you the opportunity to return a non-zero value as a kill switch for curl (returning non-zero from the progress callback aborts the transfer with CURLE_ABORTED_BY_CALLBACK).

Use the option CURLOPT_XFERINFOFUNCTION; note that it is preferred over the older CURLOPT_PROGRESSFUNCTION, as quoted in the documentation:

We encourage users to use the newer CURLOPT_XFERINFOFUNCTION instead, if you can.

You also need to set the option CURLOPT_NOPROGRESS:

CURLOPT_NOPROGRESS must be set to 0 to make this function actually get called.

Here is an example showing write and progress function implementations in Python:

# example of using write and progress callbacks to terminate curl
import pycurl

f = open('mynewfile', 'wb')  # used to save downloaded data
counter = 0

# define callback functions which will be used by curl
def my_write_func(data):
    """Write received data to file, aborting once 1024 bytes have arrived."""
    global counter
    f.write(data)
    counter += len(data)

    # an example of terminating curl: returning any number not equal to
    # len(data) makes curl abort with CURLE_WRITE_ERROR
    if counter >= 1024:
        return 0

def progress(d_size, downloaded, u_size, uploaded):
    """Receives progress figures from curl at least once per second."""
    # an example of terminating curl: returning a non-zero value makes
    # curl abort with CURLE_ABORTED_BY_CALLBACK
    if downloaded >= 1024:
        return 1


# initialize curl object and options
c = pycurl.Curl()

# callback options
c.setopt(pycurl.WRITEFUNCTION, my_write_func)

c.setopt(pycurl.NOPROGRESS, 0)  # required for the progress function to be called
c.setopt(pycurl.XFERINFOFUNCTION, progress)
# c.setopt(pycurl.PROGRESSFUNCTION, progress)  # works too, but pycurl.XFERINFOFUNCTION is recommended
# set the URL and any other curl options as required

# execute curl
c.perform()
f.close()
Answered 2019-04-15T04:39:52.997