
I would use `$email =~ /\.edu$/`, but universities outside the US do not restrict themselves to the .edu domain.


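I would run a SPARQL query: reduce the email address to its parent domain (zu.de), then search for University entities whose homepage (web.zu.de) has a parent domain (zu.de) matching that of the email address, but...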

Wikipedia/DBpedia does not have an article for every university, and less popular universities are sometimes grouped together in a single article.



1 Answer


If you can't find a reliable list of university domains, you could fall back on moderation. Keep your own database of domains; when an address arrives from a domain you don't recognize, mark that domain as pending review and have a real person verify whether it belongs to a university, then record the result either way. You can pre-populate the database with known universities (based on the userbase you expect) and blacklist the major free email providers (i.e. mark them as non-university), so you don't have to verify those one by one.

This approach has the additional benefit that you end up with a curated database of university-related email domains, a useful resource that you could then offer as a service.
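The moderation workflow could be sketched roughly as below. This is only an illustration in Python (not the language of the regex in the question), and everything in it is an assumption rather than an existing library: the `DomainRegistry` class, the `Status` enum, and the in-memory dictionary standing in for a real database are all hypothetical names. Matching a subdomain against its registered parent domains is also a design choice of this sketch, not something the answer prescribes.

```python
from enum import Enum


class Status(Enum):
    UNIVERSITY = "university"
    NOT_UNIVERSITY = "not_university"
    PENDING = "pending"  # waiting for a human reviewer


class DomainRegistry:
    """Hypothetical in-memory stand-in for the domain database."""

    def __init__(self, known_universities=(), free_providers=()):
        self._status = {}
        # Pre-populate with domains already known to be universities.
        for domain in known_universities:
            self._status[domain.lower()] = Status.UNIVERSITY
        # Blacklist free email providers up front (mark as non-university).
        for domain in free_providers:
            self._status[domain.lower()] = Status.NOT_UNIVERSITY

    @staticmethod
    def domain_of(email):
        # Everything after the last "@", lower-cased.
        return email.rsplit("@", 1)[-1].strip().lower()

    def classify(self, email):
        domain = self.domain_of(email)
        labels = domain.split(".")
        # Try the full domain first, then each parent domain,
        # stopping before the bare top-level label.
        for i in range(len(labels) - 1):
            candidate = ".".join(labels[i:])
            if candidate in self._status:
                return self._status[candidate]
        # Unrecognized: queue the domain for manual review.
        self._status[domain] = Status.PENDING
        return Status.PENDING

    def pending(self):
        # Domains a reviewer still has to look at.
        return sorted(d for d, s in self._status.items() if s is Status.PENDING)

    def review(self, domain, is_university):
        # Record the reviewer's verdict, whichever way it goes.
        self._status[domain.lower()] = (
            Status.UNIVERSITY if is_university else Status.NOT_UNIVERSITY
        )
```

A caller might seed the registry with `DomainRegistry(known_universities=["zu.de"], free_providers=["gmail.com"])`, call `classify()` on each sign-up address, and periodically show the result of `pending()` to a moderator, who settles each entry with `review()`. The seed domains here are only examples; a real deployment would also need persistent storage.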

answered 2012-11-09T22:50:28.333