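I have a question about extracting a category from a group of words. I have several words in a cluster ("apple", "iMac", "snowleopard"), and I want to retrieve the category they belong to.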

("apple","iMac","snowleopard") --> "Mac OS X"

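I tried lexical databases such as WordNet, but that did not work. While looking for other approaches, I found that Wikipedia might help. Is there a Java library for Wikipedia? And how can I accomplish the task described above? Thanks.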

1 Answer

You could try using Wikipedia to extract some meaning from these terms. For example, the following query against the Wikipedia API:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&format=json&clshow=!hidden&cllimit=10&generator=search&gsrsearch=apple%20iMac%20snowleopard%22&gsrnamespace=0&gsrprop=titlesnippet&gsrredirects=&gsrlimit=10
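Since the question asks about Java, here is a minimal sketch of how a query like this might be assembled with only the standard library. The class name `WikipediaCategoryQuery` and the method `buildUrl` are made up for illustration. The sketch leaves out `gsrprop`, `gsrredirects`, and the trailing `%22` from the search string. Note that `URLEncoder` writes `!` as `%21`, which decodes to the same value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a search-plus-categories query URL.
final class WikipediaCategoryQuery {
    // Endpoint copied from the query in this answer.
    static final String ENDPOINT = "http://en.wikipedia.org/w/api.php";

    /** Returns the query URL for the given search terms. */
    static String buildUrl(String... terms) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("prop", "categories");
        params.put("format", "json");
        params.put("clshow", "!hidden");
        params.put("cllimit", "10");
        params.put("generator", "search");
        params.put("gsrsearch", String.join(" ", terms));
        params.put("gsrnamespace", "0");
        params.put("gsrlimit", "10");

        StringBuilder url = new StringBuilder(ENDPOINT);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator)
               .append(e.getKey())
               .append('=')
               .append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            // URLEncoder emits '+' for spaces; switch to %20 to match the URL style used here.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }
}
```

Calling `WikipediaCategoryQuery.buildUrl("apple", "iMac", "snowleopard")` gives a URL whose search parameter reads `gsrsearch=apple%20iMac%20snowleopard`. Fetching the URL and parsing the JSON is left out; any HTTP client and JSON library should do.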

The query returns the following result:

    {
        "query": {
            "searchinfo": {
                "totalhits": 3,
                "suggestion": "apple iMac snow leopard\"\""
            },
            "pages": {
                "2020710": {
                    "pageid": 2020710,
                    "ns": 0,
                    "title": "Apple's transition to Intel processors",
                    "categories": [
                        {
                            "ns": 14,
                            "title": "Category:Apple Inc."
                        },
                        {
                            "ns": 14,
                            "title": "Category:Intel Corporation"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mac OS X"
                        }
                    ]
                },
                "14059031": {
                    "pageid": 14059031,
                    "ns": 0,
                    "title": "Mac OS X Snow Leopard",
                    "categories": [
                        {
                            "ns": 14,
                            "title": "Category:2009 software"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mac OS X"
                        }
                    ]
                },
                "20640": {
                    "pageid": 20640,
                    "ns": 0,
                    "title": "OS X",
                    "categories": [
                        {
                            "ns": 14,
                            "title": "Category:1999 software"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Apple Inc. operating systems"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Apple Inc. software"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mac OS X"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mach"
                        }
                    ]
                }
            }
        },
        "query-continue": {
            "categories": {
                "clcontinue": "14059031|X86-64 operating systems"
            }
        }
    }

It may not be easy to determine the "right" category from this data, but it's a start.
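One simple heuristic, offered as an assumption and not a proven method, is to count how many of the returned pages list each category and keep the most widely shared ones. The sketch below assumes the JSON has already been parsed into a map from page title to category titles. The class name `CategoryTally` and the method `mostCommon` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper: picks the categories shared by the most pages.
final class CategoryTally {
    /** Returns every category that appears on the largest number of pages. */
    static List<String> mostCommon(Map<String, List<String>> categoriesByPage) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> categories : categoriesByPage.values()) {
            // De-duplicate so a page counts at most once per category.
            for (String category : new LinkedHashSet<>(categories)) {
                counts.merge(category, 1, Integer::sum);
            }
        }

        int best = 0;
        for (int count : counts.values()) {
            best = Math.max(best, count);
        }

        // Ties are all returned, in first-seen order.
        List<String> winners = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == best) {
                winners.add(e.getKey());
            }
        }
        return winners;
    }
}
```

For the three pages in the result above, the only category listed on all of them is `Category:Mac OS X`, so that is what this heuristic would pick. The response is truncated (`query-continue`), so the counts only reflect the categories returned so far.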

answered 2012-05-20T01:46:57.093