
Essentially, given any URL, I could fetch the webpage in Ruby using

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open(my_url))  # Kernel#open via open-uri was removed in Ruby 3.0
title = doc.at('meta[property="og:title"]')['content']
...

and extract the elements I need.

Is there a best practice to follow before fetching arbitrary links? It seems like a potential security risk as well.

I'm assuming large companies like Facebook might run an image through some model to determine if it should be censored?


1 Answer


Essentially, given any URL, I could fetch the webpage in Ruby using

I am using the metainspector gem to get OG data from various media URLs. It works very well and might save you some headaches.
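
A minimal sketch of that usage, assuming the metainspector gem's documented interface (the URL here is just a placeholder):

require 'metainspector'

page = MetaInspector.new('https://example.com/some-article')  # fetches and parses the page

page.title              # contents of the <title> tag
page.meta['og:title']   # Open Graph title, if present
page.meta['og:image']   # Open Graph image URL, if present
page.description        # meta description, with fallbacks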

Is there a best practice to follow before fetching arbitrary links? It seems like a potential security risk as well.

It depends on your application: what info you scrape and what you show to the user. If you are concerned about obscene words, you can filter them out (there are probably gems for that), though I usually didn't see any of them in the OG meta. You could blacklist adult website domains, or allow just some domains.
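
For instance, a minimal allowlist check before fetching anything; the host list and helper name here are made up for illustration:

require 'uri'

ALLOWED_HOSTS = ['example.com', 'www.example.com']  # hypothetical allowlist

def safe_to_fetch?(url)
  host = URI.parse(url).host
  ALLOWED_HOSTS.include?(host)
rescue URI::InvalidURIError
  false
end

safe_to_fetch?('https://example.com/article')    # => true
safe_to_fetch?('http://some-other-site.test/x')  # => false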

I'm assuming large companies like Facebook might run an image through some model to determine if it should be censored?

Image recognition is one way to do it, but it requires a lot of work. A lot.
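
A lighter-weight alternative is a hosted moderation API instead of training your own model. Here is a rough sketch using the google-cloud-vision gem's SafeSearch detection; this is my own example (not necessarily what Facebook does), and it assumes you have Google Cloud credentials configured:

require 'google/cloud/vision'

annotator = Google::Cloud::Vision.image_annotator

# SafeSearch scores each category with a likelihood (VERY_UNLIKELY .. VERY_LIKELY).
response = annotator.safe_search_detection(image: 'downloaded_og_image.jpg')

response.responses.each do |res|
  ss = res.safe_search_annotation
  puts "adult: #{ss.adult}, violence: #{ss.violence}, racy: #{ss.racy}"
end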

answered 2021-03-17T08:31:00.340