Suppose we have a string like this:
4 pallets of books with a weight of 437 kg. The pallets measure 80 x 120 x 120 cm each and are protected with red shrinkwrap.
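What is the best way to extract this kind of information (especially color, weight, and dimensions) with OpenNLP? I have considered building a custom corpus and training my own model, but I don't know which approach is the best place to start. The annotated result I am after would look like this: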
<pallet amount>4</pallet amount> pallets of <product>books</product> with a weight of <weight>437</weight> <weightUnit>kg</weightUnit>. The pallets measure <height>80</height> x <width> 120 </width> x <length>120 </length> <measurementUnit>cm</measurementUnit> each and are protected with <color>red</color> shrinkwrap.
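
For the custom-training route, OpenNLP's name finder is trained on whitespace-tokenized text, typically one sentence per line, with entities marked as `<START:type> tokens <END>`. The following is a minimal Python sketch, not part of OpenNLP itself, that converts the XML-like annotation above into that marker style. The helper name `to_opennlp` and the choice to replace spaces in tag names with underscores (so `pallet amount` becomes `pallet_amount`) are assumptions made for illustration. Sentence splitting and proper tokenization are left out.

```python
import re

# Matches <tag>text</tag>; tag names may contain spaces, as in "<pallet amount>".
TAG = re.compile(r"<([^<>/]+)>(.*?)</\1>")


def to_opennlp(annotated: str) -> str:
    """Rewrite <tag>text</tag> spans as '<START:tag> text <END>' spans.

    Hypothetical helper: it only rewrites the markers and normalizes
    whitespace; it does not split sentences or tokenize punctuation.
    """
    def repl(match: re.Match) -> str:
        # Assumption: entity type names should not contain spaces.
        label = match.group(1).strip().replace(" ", "_")
        text = match.group(2).strip()
        return f" <START:{label}> {text} <END> "

    converted = TAG.sub(repl, annotated)
    # Collapse the extra whitespace introduced around the markers.
    return re.sub(r"\s+", " ", converted).strip()


if __name__ == "__main__":
    sample = (
        "<pallet amount>4</pallet amount> pallets of <product>books</product> "
        "with a weight of <weight>437</weight> <weightUnit>kg</weightUnit>."
    )
    print(to_opennlp(sample))
```

Each converted sentence would go on its own line of a training file. Whether a statistical model is worth the effort here, compared with plain pattern matching for the numeric fields, is exactly the open question.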