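I have an Express server with a POST endpoint that starts a crawler. When the crawler finishes, it shuts down the entire server. Am I doing something wrong? How can I prevent this from happening?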

The project looks like this:

// server.js
const express = require('express')
const bodyParser = require('body-parser')
const startSearch = require('./crawler.js')

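const app = express()
const PORT = process.env.PORT || 3000 // assumed value; PORT is not defined in the original snippet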

app.use(bodyParser.json())

app.post('/crawl', async (req, res) => {
  const { foo, bar } = req.body

  startSearch({ foo, bar })
  res.end()
})

app.listen(PORT, () => console.log(`listening on port ${PORT}`))

// crawler.js
const Apify = require('apify')

const startSearch = ({ foo, bar }) => {
  Apify.main(async () => {
    const sources = [{
      url: 'https://path_to_website.com',
      userData: { foo, bar }
    }]
    const requestList = await Apify.openRequestList(null, sources)

    const crawler = new Apify.PuppeteerCrawler({
      requestList,
      handlePageFunction: async ({ request, page }) => {
        // do things using puppeteer
      }
    })

    await crawler.run()
  })
}

module.exports = startSearch

1 Answer

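Simply avoid using Apify.main(). It is meant for standalone scripts and exits the Node.js process once the function passed to it finishes, which is what takes the Express server down with it. For details, see "How to use Apify on Google Cloud Functions".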
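
A minimal sketch of what crawler.js might look like without Apify.main(), assuming the rest of the crawler stays as in the question. The optional `sdk` parameter is a hypothetical addition, not part of the original code; it defaults to the real `apify` package and only exists so the function can be exercised with a stub.

```javascript
// crawler.js: a sketch that runs the crawler without Apify.main()

// `sdk` is a hypothetical optional parameter (not in the original code).
// It defaults to the real SDK; a stub can be passed in for testing.
const startSearch = async ({ foo, bar }, sdk = require('apify')) => {
  const sources = [{
    url: 'https://path_to_website.com',
    userData: { foo, bar }
  }]
  const requestList = await sdk.openRequestList(null, sources)

  const crawler = new sdk.PuppeteerCrawler({
    requestList,
    handlePageFunction: async ({ request, page }) => {
      // do things using puppeteer
    }
  })

  // Resolves when the crawl is done. Nothing here calls process.exit(),
  // so the process that called startSearch() keeps running.
  await crawler.run()
}

module.exports = startSearch
```

Since startSearch now returns a promise, the route handler in server.js can keep responding immediately and attach a handler for failures, for example `startSearch({ foo, bar }).catch(err => console.error(err))`, so that a failed crawl is logged instead of surfacing as an unhandled rejection.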

(I thought I was posting an answer, but it seems I had only left a comment.)

Answered on 2020-01-13T19:16:11.710