0

我是 python 新手,并且在数据类型概念及其转换方面遇到了困难。

我有 NLTK 树格式的句子(从斯坦福解析器获得并转换为 NLTK 树)。我需要应用为 NLTK Chunker 编写的函数。但是,NLTK 树格式与 NLTK Chunker 格式不同。两种格式都是 NLTK 树,但元素结构似乎不同(见下文)。

您能帮我将 NLTK 树转换为 NLTK Chunker 输出格式吗?

提前致谢!

这是一个 NLTK Chunker 输出:

(S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS old/JJ)
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

现在按元素和每个元素类型打印:

class 'nltk.tree.Tree' (NP Pierre/NNP Vinken/NNP)
type 'tuple' (',', ',')
class 'nltk.tree.Tree' (NP 61/CD years/NNS old/JJ)
type 'tuple' (',', ',')
type 'tuple' ('will', 'MD')
type 'tuple' ('join', 'VB')
class 'nltk.tree.Tree' (NP the/DT board/NN)
type 'tuple' ('as', 'IN')
class 'nltk.tree.Tree' (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
type 'tuple' ('.', '.')

这是一个 NLTK “纯”树输出(与 NLTK 文档中的完全相同):

(S
  (NP
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director) (NNP Nov.) (CD 29)))
      ))
  (. .))

现在按元素和每个元素类型打印:

class 'nltk.tree.Tree' (NP
  (NP (NNP Pierre) (NNP Vinken))
  (, ,)
  (ADJP (NP (CD 61) (NNS years)) (JJ old))
  (, ,))
class 'nltk.tree.Tree' (NP (NNP Pierre) (NNP Vinken))
class 'nltk.tree.Tree' (NNP Pierre)
type 'str' Pierre
class 'nltk.tree.Tree' (NNP Vinken)
type 'str' Vinken
class 'nltk.tree.Tree' (, ,)
type 'str' ,
class 'nltk.tree.Tree' (ADJP (NP (CD 61) (NNS years)) (JJ old))
class 'nltk.tree.Tree' (NP (CD 61) (NNS years))
class 'nltk.tree.Tree' (CD 61)
type 'str' 61
class 'nltk.tree.Tree' (NNS years)
type 'str' years
class 'nltk.tree.Tree' (JJ old)
type 'str' old
class 'nltk.tree.Tree' (, ,)
type 'str' ,
class 'nltk.tree.Tree' (VP
  (MD will)
  (VP
    (VB join)
    (NP (DT the) (NN board))
    (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
    (NP (NNP Nov.) (CD 29))))
class 'nltk.tree.Tree' (MD will)
type 'str' will
class 'nltk.tree.Tree' (VP
  (VB join)
  (NP (DT the) (NN board))
  (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
  (NP (NNP Nov.) (CD 29)))
class 'nltk.tree.Tree' (VB join)
type 'str' join
class 'nltk.tree.Tree' (NP (DT the) (NN board))
class 'nltk.tree.Tree' (DT the)
type 'str' the
class 'nltk.tree.Tree' (NN board)
type 'str' board
class 'nltk.tree.Tree' (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
class 'nltk.tree.Tree' (IN as)
type 'str' as
class 'nltk.tree.Tree' (NP (DT a) (JJ nonexecutive) (NN director))
class 'nltk.tree.Tree' (DT a)
type 'str' a
class 'nltk.tree.Tree' (JJ nonexecutive)
type 'str' nonexecutive
class 'nltk.tree.Tree' (NN director)
type 'str' director
class 'nltk.tree.Tree' (NP (NNP Nov.) (CD 29))
class 'nltk.tree.Tree' (NNP Nov.)
type 'str' Nov.
class 'nltk.tree.Tree' (CD 29)
type 'str' 29
class 'nltk.tree.Tree' (. .)
type 'str' .
4

1 回答 1

2

部分答案(即没有代码):

NLTK 使用类表示分块数据,Tree该类实际上是为任意句法树设计的。分块句子是一棵只有一层分组的树,因此要从完整解析变为分块结构,您需要丢弃除一种非递归组之外的所有组。哪些组?这取决于您的应用程序,因为有不同种类的“块”(例如,命名实体)。

您的示例显示了 NP 块,因此您可以遍历树并省略除 NP 顶层之外的所有结构(或最低层,如果您想将复杂的 NP 块分解为小块)。

于 2014-06-28T08:15:02.813 回答