
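I'm trying to parse some websites with SwiftSoup; say one of them is a Medium article. How can I extract the body of the page the way Instapaper does, and load that body into another UIViewController?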

Here is the code I use to extract the title:

import UIKit
import SwiftSoup

class WebViewController: UIViewController, UIWebViewDelegate {

...

    override func viewDidLoad() {
        super.viewDidLoad()

        let urlString = "https://medium.com/@timjwise/stop-lying-to-yourself-when-you-snub-panhandlers-its-not-for-their-own-good-199d0aa7a513"

        // Unwrap the URL once, before it is used anywhere (no force unwrap)
        guard let myURL = URL(string: urlString) else {
            print("Error: \(urlString) doesn't seem to be a valid URL")
            return
        }
        webView.loadRequest(URLRequest(url: myURL))

        do {
            // Synchronous download: blocks the main thread until the page arrives
            let html = try String(contentsOf: myURL, encoding: .utf8)
            let doc: Document = try SwiftSoup.parseBodyFragment(html)
            let headerTitle = try doc.title()
            print("Header title: \(headerTitle)")
        } catch Exception.Error(_, let message) {
            // SwiftSoup parsing error
            print("Message: \(message)")
        } catch {
            // Download/decoding error from String(contentsOf:), or anything else
            print("error: \(error)")
        }
    }

}

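But I've had no luck extracting the body of this site or any other site. Is there a way to make this work? Would it take CSS or JavaScript? (I know nothing about CSS or JavaScript.)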
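On the CSS part of the question: SwiftSoup's select(_:) takes CSS selectors, so no JavaScript is needed as long as the article text is already present in the HTML the server returns. The sketch below assumes the article lives inside an <article> element; that is a guess about the markup, not something Medium guarantees, and extractArticleText is a hypothetical helper name. It parses the full page with SwiftSoup.parse, collects headings and paragraphs in document order, and falls back to the whole body text if nothing matched.

```swift
import SwiftSoup

// Hypothetical helper: pulls readable text out of an HTML page.
// ASSUMPTION: the article body sits inside an <article> element,
// which is common on blog platforms but not guaranteed for any site.
func extractArticleText(from html: String) throws -> String {
    // Parse as a full document (head + body), not as a body fragment
    let doc: Document = try SwiftSoup.parse(html)

    // CSS selector: headings and paragraphs that are descendants of <article>
    let blocks: Elements = try doc.select("article h1, article h2, article h3, article p")

    var paragraphs: [String] = []
    for element in blocks.array() {
        let text = try element.text()
        if !text.isEmpty {
            paragraphs.append(text)
        }
    }

    // Fall back to all text in <body> if no <article> content was found
    if paragraphs.isEmpty {
        return try doc.body()?.text() ?? ""
    }
    return paragraphs.joined(separator: "\n\n")
}
```

The selector is the part most likely to need tuning per site: inspect the page source and narrow it to whatever container actually wraps the article.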


1 Answer


Use the body() function (see https://github.com/scinfu/SwiftSoup#parsing-a-body-fragment). Try this:

do {
    let html = try String(contentsOf: myURL, encoding: .utf8)
    let doc: Document = try SwiftSoup.parseBodyFragment(html)
    let headerTitle = try doc.title()

    // my body
    let body = doc.body()
    // elements to remove, in this case images
    let undesiredElements: Elements? = try body?.select("img[src]")
    // remove them from the document (Elements.remove() can throw)
    try undesiredElements?.remove()

    // text of whatever is left in the body
    let bodyText = try body?.text() ?? ""

    print("Header title: \(headerTitle)")
    print("Body: \(bodyText)")
} catch Exception.Error(_, let message) {
    print("Message: \(message)")
} catch {
    print("error: \(error)")
}
answered 2018-03-07T09:22:26.543
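The answer removes images but never reads what is left, and the question also wants the body shown in another UIViewController. A minimal sketch, assuming the page is parsed as a full document with SwiftSoup.parse: strip elements that are rarely part of the article, then return both the plain text and the cleaned HTML. The selector list is an assumption to tune per site, and cleanedBody is a hypothetical name.

```swift
import SwiftSoup

// Hypothetical helper: removes non-content elements from <body> and
// returns what remains as plain text and as HTML.
// ASSUMPTION: script/style/img/nav/footer are never part of the article.
func cleanedBody(from html: String) throws -> (text: String, html: String) {
    let doc: Document = try SwiftSoup.parse(html)
    guard let body = doc.body() else {
        return ("", "")
    }

    // Same idea as select("img[src]") + remove() in the answer, with a wider selector
    try body.select("script, style, img, nav, footer").remove()

    let text = try body.text()      // whitespace-normalised plain text
    let cleaned = try body.html()   // inner HTML of <body>
    return (text, cleaned)
}
```

The html value can be passed to the second view controller and displayed with WKWebView's loadHTMLString(_:baseURL:), giving the original page URL as baseURL so relative links still resolve; the text value suits a UITextView.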