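I have used TreeTagger to get the part-of-speech tags and lemmas (stems) of a text file. Now I need to read the lemma of each word, i.e. the third column of the output file shown below. How do I do that? How can I skip the first two columns and read the third?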

the DT  the
girls   NNS girl
own VVP own
paper## JJ  paper##
vol##   JJ  vol##
viii##  NN  viii##
no##    NN  no##
357##   JJ  357##
october NN  October
30  CD  @card@
1886##  JJ  1886##
price   NN  price
one CD  one
penny## NN  penny##
as  IN  as
the DT  the
baron   NN  baron
had VHD have
conjectured VVN conjecture
the DT  the
housemaid   NN  housemaid
whom    WP  whom
he  PP  he
had VHD have
called  VVN call
out RP  out
of  IN  of
the DT  the
nursery NN  nursery
to  TO  to
look    VV  look
for##   JJ  for##
lons    NNS lons
cane    NN  cane
on  IN  on
finding VVG find
her PP$ her
master  NN  master
had VHD have
gone    VVN go
without IN  without
it  PP  it
did VVD do
not RB  not
hurry   VV  hurry
back    RB  back
but CC  but
stopped VVN stop
talking##   NN  talking##
to  TO  to
some    DT  some
of  IN  of
the DT  the
other   JJ  other
servants    NNS servant
for IN  for
perhaps RB  perhaps
a   DT  a
quarter NN  quarter
of  IN  of
an  DT  an
hour    NN  hour
when    WRB when
she PP  she
returned    VVD return
to  TO  to
the DT  the
nursery##   NN  nursery##
and CC  and
to  TO  to
her PP$ her
amazement   NN  amazement
found   VVD find
the DT  the
baby    NN  baby
was VBD be
gone##  JJ  gone##
she PP  she
was VBD be
not RB  not
alarmed VVN alarm
at  IN  at
first   JJ  first
except  IN  except
she PP  she
supposed##  JJ  supposed##
she PP  she
should  MD  should
get VV  get
a   DT  a
scolding    NN  scolding
from    IN  from
the DT  the
nurse   NN  nurse
who WP  who
she PP  she
imagined    VVD imagine
had VHD have
come    VVN come
in  IN  in
and CC  and
taken   VVN take
the DT  the
child## NN  child##
to  TO  to
another DT  another
room    NN  room
however RB  however
having  VHG have
the DT  the
excellent   JJ  excellent
excuse  NN  excuse
that    IN/that that
her PP$ her
master  NN  master
had VHD have
called  VVN call
her PP$ her
away    RB  away
she##   NN  she##
went    VVD go
in  IN  in
search  NN  search
of  IN  of
the DT  the
nurse   NN  nurse
but CC  but
now RB  now
not RB  not
finding VVG find
her PP  her
anywhere    RB  anywhere
and CC  and
hearing VVG hear
from    IN  from
the DT  the
footman##   NN  footman##
that    IN/that that
she PP  she
was VBD be
not RB  not
expected    VVN expect
back    RB  back
till    IN  till
very    RB  very
late    JJ  late
marie   NN  marie
became  VVD become
seriously   RB  seriously
alarmed##   JJ  alarmed##
perhaps RB  perhaps
madame  NN  madame
has VHZ have
taken   VVN take
it  PP  it
into    IN  into
her PP$ her
room    NN  room
she PP  she
might   MD  might
have    VH  have
heard   VVN hear
it  PP  it
crying  VVG cry
and CC  and
fetched VVD fetch
it##    NN  it##
suggested   VVD suggest
the DT  the
footman NN  footman
and CC  and
marie   NN  marie
very    RB  very
much    RB  much
against IN  against
her PP  her
will    MD  will
felt    VVD feel
she PP  she
was VBD be
in  IN  in
duty    NN  duty
bound   VVN bind
to##    JJ  to##
go  NN  go
and CC  and
see##   NN  see##
so  IN  so
knocking    VVG knock
at  IN  at
her PP$ her
mistresss   NP  mistresss
door    NN  door
she PP  she
called  VVD call
out RP  out
madame  NN  madame
has VHZ have
she PP  she
taken   VVN take
the DT  the
baby##  NN  baby##
the DT  the
poor    JJ  poor
little  JJ  little
baroness    NN  baroness
who WP  who
was VBD be
asleep  RB  asleep
started VVN start
up  RP  up
and CC  and
called  VVD call
to  TO  to
the DT  the
servant NN  servant
to  TO  to
come    VV  come
in##    JJ  in##
madame  NN  madame
has VHZ have
she PP  she
the DT  the
baby    NN  baby
repeated    VVD repeat
the DT  the
girl##  NN  girl##
the DT  the
baby    NN  baby
no  DT  no
what    WP  what
do  VVP do
you PP  you
mean    VV  mean
where   WRB where
is  VBZ be
it  PP  it
and CC  and
where   WRB where
is  VBZ be
nurse   NN  nurse
cried   VVD cry
the DT  the
baroness    NN  baroness
jumping##   NN  jumping##
up  RB  up
and CC  and
slipping    VVG slip
on  IN  on
a   DT  a
dressinggown    NN  dressinggown
and CC  and
slippers##  NN  slippers##
marie   NN  marie
began   VVD begin
to  TO  to
cry VV  cry
and CC  and
to  TO  to
pour    VV  pour
forth   RB  forth
such    PDT such
a   DT  a
volley  NN  volley
of  IN  of
words   NNS word
excuses NNS excuse
fears   VVZ fear
alarms  NNS alarm
and CC  and
wonders##   NN  wonders##
that    IN/that that
the DT  the
baroness    NN  baroness
could   MD  could
make    VV  make
out RP  out
nothing NN  nothing
and CC  and
rushed  VVD rush
to  TO  to
the DT  the
nursery NN  nursery
to  TO  to
see VV  see
for IN  for
herself PP  herself
what##  NN  what##
had VHD have
happened##  NN  happened##
the DT  the
empty   JJ  empty
cradle  NN  cradle
did VVD do
not RB  not
however RB  however
throw   VV  throw
much    RB  much
light   JJ  light
upon    IN  upon
it  PP  it
and CC  and
the DT  the
servants##  NN  servants##
who WP  who
answered    VVD answer
the DT  the
bell    NN  bell
which   WDT which
the DT  the
baroness    NN  baroness
clashed VVD clash
wildly  RB  wildly
looked  VVN look
as  IN  as
scared  VVN scare
as  IN  as
the DT  the
sobbing VVG sob
marie## NN  marie##
to  TO  to
find    VV  find
the DT  the
baby    NN  baby
had VHD have
disappeared##   NN  disappeared##
a   DT  a
search  NN  search
from    IN  from
attic   NN  attic
to  TO  to
basement    NN  basement
was VBD be
at  IN  at
once    RB  once
instituted  VVN institute
the##   JJ  the##
menservants NNS manservant
were    VBD be
sent    VVN send
into    IN  into
the DT  the
grounds NNS ground
with    IN  with
lanterns    NNS lantern
the DT  the
whole   JJ  whole
house   NN  house
was VBD be
turned  VVN turn
topsyturvy##    NN  topsyturvy##
in  IN  in
the DT  the
midst   NN  midst
of  IN  of
which   WDT which
the DT  the
nurse   NN  nurse
returned    VVD return
and CC  and
finding VVG find
her PP$ her
baby    NN  baby
was VBD be
gone    VVN go
went    VVD go
into    IN  into
violent##   JJ  violent##
hysterics   NNS hysteric
while   IN  while
the DT  the
young   JJ  young
baroness    NN  baroness
with    IN  with
flying  VVG fly
hair    NN  hair
and CC  and
dilated VVN dilate
eyes    NNS eye
rushed  VVN rush
about   IN  about
wringing##  NN  wringing##
her PP$ her
hands   NNS hand
and CC  and
looking VVG look
as  IN  as
she PP  she
felt    VVD feel
distracted  VVN distract
with    IN  with
grief## NN  grief##
the DT  the
search  NN  search
was VBD be
of  IN  of
course  NN  course
in  IN  in
vain    JJ  vain
and CC  and
they    PP  they
were    VBD be
just    RB  just
coming  VVG come
to  TO  to
the DT  the
conclusion  NN  conclusion
that    IN/that that
the DT  the
baby##  NN  baby##
had VHD have
been    VBN be
stolen  VVN steal
when    WRB when
the DT  the
baron   NN  baron
returned    VVN return
from    IN  from
seeing  VVG see
lon NN  lon
off##   NN  off##
the DT  the
moment  NN  moment
the DT  the
baroness    NN  baroness
heard   VVD hear
his PP$ his
voice   NN  voice
in  IN  in
the DT  the
hall    NN  hall
she PP  she
flew    VVD fly
down    RP  down
the DT  the
wide    JJ  wide
oak NN  oak
staircase   NN  staircase
crying##    NN  crying##
arnaud  NN  arnaud
arnaud  NN  arnaud
my  PP$ my
precious    JJ  precious
baby    NN  baby
is  VBZ be
gone    VVN go
it  PP  it
is  VBZ be
stolen  JJ  stolen
find    VV  find
her PP  her
find    VV  find
her PP  her
or  CC  or
i   NP  i
shall   MD  shall
go##    JJ  go##
mad##   NN  mad##
and CC  and
a   DT  a
glance  NN  glance
at  IN  at
her PP$ her
wild    JJ  wild
eyes    NNS eye
almost  RB  almost
testified   VVD testify
she PP  she
spoke   VVD speak
the DT  the
truth## NN  truth##
she PP  she
is  VBZ be
not RB  not
stolen  VVN steal
she PP  she
is  VBZ be
safe    JJ  safe
enough  RB  enough
said    VVD say
the DT  the
baron   NN  baron
sulkily##   NN  sulkily##

Assuming you want the third token of each line, where tokens are separated by whitespace, you can do the following:

  • Read the file line by line (using a BufferedReader)
  • Split each line into 3 parts (using String#split)
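The two steps above can be combined into one small program. This is a minimal sketch, assuming the tagger output is plain text with whitespace-separated columns; the class name LemmaReader, the method readLemmas and the default file name tagged.txt are hypothetical. Lines with fewer than three columns are simply skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LemmaReader {

    // Reads whitespace-separated tagger output and returns column 3 (index 2) of every line.
    // Lines with fewer than 3 columns are skipped.
    public static List<String> readLemmas(Reader source) throws IOException {
        List<String> lemmas = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit 3: at most 3 parts, the third being everything after the second separator
                String[] parts = line.trim().split("\\s+", 3);
                if (parts.length > 2) {
                    lemmas.add(parts[2]);
                }
            }
        }
        return lemmas;
    }

    public static void main(String[] args) throws IOException {
        // "tagged.txt" is a hypothetical file name; pass the real path as the first argument
        String path = args.length > 0 ? args[0] : "tagged.txt";
        Reader in = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8);
        for (String lemma : readLemmas(in)) {
            System.out.println(lemma);
        }
    }
}
```

Run against the file in the question, this should print the, girl, own, paper##, and so on, one lemma per line.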

The split method returns an array of strings; the third part is array[2].

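The split method takes a regular expression as its argument, plus an optional limit. If the limit is positive, the pattern is applied at most limit - 1 times, so the returned array has at most limit elements.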
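To make the limit parameter concrete, here is a small sketch comparing a zero, a negative and a positive limit on one line (the class name SplitLimitDemo and the trailing spaces in the input are just for illustration):

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String line = "the DT  the  ";

        // limit 0 (what split(regex) uses): applied as often as possible, trailing empty strings dropped
        System.out.println(Arrays.toString(line.split("\\s+")));      // [the, DT, the]

        // negative limit: applied as often as possible, trailing empty strings kept
        System.out.println(Arrays.toString(line.split("\\s+", -1)));  // [the, DT, the, ]

        // positive limit n: applied at most n - 1 times, so at most n elements
        System.out.println(Arrays.toString(line.split("\\s+", 2)));   // [the, DT  the  ]
    }
}
```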

The regular expression for splitting on whitespace is \s+ (written "\\s+" inside a Java string literal).
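As a quick check of what \s+ matches, the sketch below splits the same row written with tabs and with runs of spaces, and shows one pitfall: leading whitespace produces an empty first element, which shifts the lemma away from index 2 unless the line is trimmed first. The class name and the assumption that the tagger may emit tab-separated columns are illustrative.

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {
    public static void main(String[] args) {
        // \s matches spaces, tabs and other whitespace; + treats a run of them as one separator
        String tabbed = "girls\tNNS\tgirl";   // tab-separated, as a tagger may emit it
        String spaced = "girls   NNS girl";   // several spaces between columns
        System.out.println(Arrays.toString(tabbed.split("\\s+")));  // [girls, NNS, girl]
        System.out.println(Arrays.toString(spaced.split("\\s+")));  // [girls, NNS, girl]

        // Leading whitespace yields an empty first element, shifting every column by one
        String indented = "  girls NNS girl";
        System.out.println(Arrays.toString(indented.split("\\s+")));        // [, girls, NNS, girl]
        // trim() first keeps the lemma at index 2
        System.out.println(Arrays.toString(indented.trim().split("\\s+"))); // [girls, NNS, girl]
    }
}
```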

If you want at most 3 elements in the array, pass a limit of 3: if the regex could match more than twice, the last element contains all the rest of the string.

Here is how to use the split method:

import org.junit.Test;

public class SplitTest {

    @Test
    public void split3() {
        String[] array = "the DT  the".split("\\s+", 3);
        System.out.println(array[2]);

        array = "the DT  the foo bar".split("\\s+", 3);
        // with limit 3, the last element contains all the rest of the string
        System.out.println(array[2]);

        array = "the DT".split("\\s+", 3);
        // only 2 elements: array[2] would throw java.lang.ArrayIndexOutOfBoundsException: 2
        System.out.println(array.length);

        array = "the DT".split("\\s+");
        // same without a limit: array[2] would throw ArrayIndexOutOfBoundsException
        System.out.println(array.length);

        array = "the DT  the foo bar".split("\\s+");
        // without a limit every token is separate, so array[2] is just "the"
        System.out.println(array[2]);
    }
}

The output is:

the
the foo bar
2
2
the

To be safe from ArrayIndexOutOfBoundsException, always check the length of the resulting array before accessing the third element:

array = "the DT the".split("\\s+");
if(array.length > 2) {
    System.out.println(array[2]);
} else {
    // there is no 3rd column
}
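The length check can be wrapped in a small reusable helper. This is a sketch only: the name thirdColumn and the convention of returning null for a missing column are illustrative choices, not part of any library. It also trims the line so leading whitespace does not shift the columns.

```java
public class ThirdColumn {

    // Returns column 3 of a whitespace-separated line, or null when the line has fewer than 3 columns.
    // Hypothetical helper: the name and null-on-missing convention are illustrative.
    public static String thirdColumn(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.trim().split("\\s+");
        return parts.length > 2 ? parts[2] : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdColumn("the DT  the"));    // the
        System.out.println(thirdColumn("30  CD  @card@")); // @card@
        System.out.println(thirdColumn("the DT"));         // null
    }
}
```

Callers then only need a null check instead of repeating the array-length test for every line.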
answered 2013-10-30T08:55:11.267