
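So Python behaves as if it can't hear anything from my microphone at all.

Here's the problem. I have a Python (2.7) script that is supposed to use GStreamer to access my microphone and do speech recognition for me through Pocketsphinx. I'm using PulseAudio, and my device is a Raspberry Pi. My microphone is a PlayStation 3 Eye.

Now, I've already gotten pocketsphinx_continuous to run correctly and recognize the words I defined in my .dic and .lm files. After a few test runs, accuracy is around 85-90%. So I know from the start that my microphone picks up sound properly through pocketsphinx + PulseAudio.

For reference, I ran the following: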

pocketsphinx_continuous -lm /home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm -dict /home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic -hmm /home/pi/dev/scarlettPi/config/speech/model/hmm/en_US/hub4wsj_sc_8k -silprob  0.1 -wip 1e-4 -bestpath 0

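In my Python code I'm trying to do the same thing, but I'm using GStreamer to access the microphone from Python. (Note: I'm fairly new to Python.)

Here is my code (thanks to Josip Lisec for getting me this far):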

import pi
from pi.becore import ScarlettConfig
from recorder import Recorder
from brain import Brain

import os
import json
import tempfile
#import sys

import pygtk
pygtk.require('2.0')
import gtk
import gobject
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst

scarlett_config=ScarlettConfig()

class Listener:
  def __init__(self, gobject, gst):
    self.failed = 0

    self.pipeline = gst.parse_launch(' ! '.join(['pulsesrc',
                                               'audioconvert',
                                               'audioresample',
                                               'vader name=vader auto-threshold=true',
                                               'pocketsphinx lm=' + scarlett_config.get('LM') + ' dict=' + scarlett_config.get('DICT') + ' hmm=' + scarlett_config.get('HMM') + ' name=listener',
                                               'fakesink']))
    listener = self.pipeline.get_by_name('listener')
    listener.connect('result', self.__result__)
    listener.set_property('configured', True)
    print "KEYWORDS WE'RE LOOKING FOR: " + scarlett_config.get('ourkeywords')

    bus = self.pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect('message::application', self.__application_message__)
    self.pipeline.set_state(gst.STATE_PLAYING)

  def result(self, hyp, uttid):
    if hyp in scarlett_config.get('ourkeywords'):
      self.failed = 0
      self.listen()
    else:
      self.failed += 1
      if self.failed > 4:
        pi.speak("" + scarlett_config.get('scarlett_owner') + ", if you need me, just say my name.")
        self.failed = 0

  def listen(self):
    self.pipeline.set_state(gst.STATE_PAUSED)
    pi.play('pi-listening')
    Recorder(self)

  def cancel_listening(self):
    pi.play('pi-cancel')
    self.pipeline.set_state(gst.STATE_PLAYING)

  # question - sound recording
  def answer(self, question):
    pi.play('pi-cancel')

    print " * Contacting Google"
    destf = tempfile.mktemp(suffix='piresult')
    os.system('wget --post-file %s --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7" --header="Content-Type: audio/x-flac; rate=16000" -O %s -q "https://www.google.com/speech-api/v1/recognize?client=chromium&lang=en-US"' % (question, destf))
    #os.system("speech2text %s > %s" % (question, destf))
    b = open(destf)
    result = b.read()
    b.close()

    os.unlink(question)
    os.unlink(destf)

    if len(result) == 0:
      print " * nop"
      pi.play('pi-cancel')
    else:
      brain = Brain(json.loads(result))
      if brain.think() == False:
        print " * nop2"
        pi.play('pi-cancel')

    self.pipeline.set_state(gst.STATE_PLAYING)

  def __result__(self, listener, text, uttid):
    struct = gst.Structure('result')
    struct.set_value('hyp', text)
    struct.set_value('uttid', uttid)
    listener.post_message(gst.message_new_application(listener, struct))

  def __application_message__(self, bus, msg):
    msgtype =  msg.structure.get_name()
    if msgtype == 'result':
      self.result(msg.structure['hyp'], msg.structure['uttid'])

The application is supposed to match the keyword "Scarlett" and then perform an action.
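A side note on the keyword check in result(): the log further down prints the keywords as the text [ 'scarlett', 'SCARLETT' ], which suggests the config value is a plain string rather than a list. If that is the case (an assumption, since ScarlettConfig is not shown), then `hyp in ...` is a substring test, so an empty hypothesis or a fragment such as "scar" would also count as a match. A minimal sketch of a stricter check, with the hypothetical helper names parse_keywords and is_keyword:

```python
import ast

def parse_keywords(raw):
    """Turn a config string such as "[ 'scarlett', 'SCARLETT' ]" into a set of lowercase keywords."""
    value = ast.literal_eval(raw.strip())
    if isinstance(value, str):
        # A single quoted keyword instead of a list.
        value = [value]
    return set(word.lower() for word in value)

def is_keyword(hyp, keywords):
    """True only when the whole hypothesis equals one of the keywords, ignoring case."""
    return bool(hyp) and hyp.strip().lower() in keywords

# Assumed raw config value, copied from the "KEYWORDS WE'RE LOOKING FOR" log line.
RAW = "[ 'scarlett', 'SCARLETT' ]"
KEYWORDS = parse_keywords(RAW)
```

This does not explain why nothing is recognized at all; it only makes the match stricter once hypotheses start arriving.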

When I run my application, I get the following output:

pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $ ./pi 
/usr/lib/python2.7/dist-packages/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
  warnings.warn(str(e), _gtk.Warning)
INFO: cmd_ln.c(691): Parsing command line:
gst-pocketsphinx \
    -samprate 8000 \
    -cmn prior \
    -fwdflat no \
    -bestpath no \
    -maxhmmpf 2000 \
    -maxwpf 20 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ascale     20.0        2.000000e+01
-aw     1       1
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   no      no
-bestpathlw 9.5     9.500000e+00
-bghist     no      no
-ceplen     13      13
-cmn        current     prior
-cmninit    8.0     8.0
-compallsen no      no
-debug              0
-dict               
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict              
-feat       1s_c_d_dd   1s_c_d_dd
-featparams         
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg                
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     no
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                
-input_endian   little      little
-jsgf               
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree             
-latsize    5000        5000
-lda                
-ldadim     0       0
-lextreedump    0       0
-lifter     0       0
-lm             
-lmctl              
-lmname     default     default
-logbase    1.0001      1.000100e+00
-logfn              
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhmmpf   -1      2000
-maxnewoov  20      20
-maxwpf     -1      20
-mdef               
-mean               
-mfclogdir          
-min_endfr  0       0
-mixw               
-mixwfloor  0.0000001   1.000000e-07
-mllr               
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-pl_beam    1e-10       1.000000e-10
-pl_pbeam   1e-5        1.000000e-05
-pl_window  0       0
-rawlogdir          
-remove_dc  no      no
-round_filters  yes     yes
-samprate   16000       8.000000e+03
-seed       -1      -1
-sendump            
-senlogdir          
-senmgau            
-silprob    0.1     1.000000e-01
-smoothspec no      no
-svspec             
-tmat               
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-topn_beam  0       0
-toprule            
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-var                
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        1e-4        1.000000e-04
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(691): Parsing command line:
\
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     56,-3,1
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   1s_c_d_dd
-frate      100     100
-input_endian   little      little
-lda                
-ldadim     0       0
-lifter     0       0
-logspec    no      no
-lowerf     133.33334   1.000000e+00
-ncep       13      13
-nfft       512     512
-nfilt      40      20
-remove_dc  no      yes
-round_filters  yes     no
-samprate   16000       8.000000e+03
-seed       -1      -1
-smoothspec no      no
-svspec             0-12/13-25/26-38
-transform  legacy      dct
-unit_area  yes     yes
-upperf     6855.4976   4.000000e+03
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.500000e-02

INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 4120 * 20 bytes (80 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(335): 13 words read
INFO: dict.c(341): Reading filler dictionary: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=18, 3=17
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516):       12 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533):       18 = #bigrams created
INFO: ngram_model_arpa.c(534):        3 = #prob2 entries
INFO: ngram_model_arpa.c(542):        3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555):       17 = #trigrams created
INFO: ngram_model_arpa.c(556):        2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 12 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 152
INFO: ngram_search_fwdtree.c(338): after: 12 root, 24 non-root channels, 11 single-phone words
KEYWORDS WE'RE LOOKING FOR: [ 'scarlett', 'SCARLETT' ]    

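But it fails to match anything. I almost think Python can't hear anything from the microphone, and it doesn't even attempt to recognize anything. pocketsphinx_continuous normally prints a READY status when it is ready to start listening... I would expect the same in Python?

Here are my Python packages: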

pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $ dpkg -l | grep -i python
ii  idle                                  2.7.3-4                              all          IDE for Python using Tkinter (default version)
ii  idle-python2.7                        2.7.3-6                              all          IDE for Python (v2.7) using Tkinter
rc  idle3                                 3.2.3-6                              all          IDE for Python using Tkinter (default version)
ii  libpyside1.1:armhf                    1.1.1-3                              armhf        Python bindings for Qt 4 (base files)
ii  libpython2.6                          2.6.8-1.1                            armhf        Shared Python runtime library (version 2.6)
ii  libpython2.7                          2.7.3-6                              armhf        Shared Python runtime library (version 2.7)
ii  libshiboken1.1:armhf                  1.1.1-1                              armhf        CPython bindings generator for C++ libraries - shared library
ii  python                                2.7.3-4                              all          interactive high-level object-oriented language (default version)
ii  python-alsaaudio                      0.5+svn36-1                          armhf        Alsa bindings for Python
ii  python-cairo                          1.8.8-1                              armhf        Python bindings for the Cairo vector graphics library
ii  python-dbg                            2.7.3-4                              all          debug build of the Python Interpreter (version 2.7)
ii  python-dbus                           1.1.1-1                              armhf        simple interprocess messaging system (Python interface)
ii  python-dbus-dev                       1.1.1-1                              all          main loop integration development files for python-dbus
ii  python-dev                            2.7.3-4                              all          header files and a static library for Python (default)
ii  python-gi                             3.2.2-2                              armhf        Python 2.x bindings for gobject-introspection libraries
ii  python-gi-dbg                         3.2.2-2                              armhf        Python bindings for the GObject library (debug extension)
ii  python-gi-dev                         3.2.2-2                              all          development headers for GObject Python bindings
ii  python-gobject                        3.2.2-2                              all          Python 2.x bindings for GObject - transitional package
ii  python-gobject-2                      2.28.6-10                            armhf        deprecated static Python bindings for the GObject library
ii  python-gobject-2-dbg                  2.28.6-10                            armhf        deprecated static Python bindings for the GObject library (debug extension)
ii  python-gobject-2-dev                  2.28.6-10                            all          development headers for the static GObject Python bindings
ii  python-gobject-dbg                    3.2.2-2                              all          Python 2.x debugging modules for GObject - transitional package
ii  python-gobject-dev                    3.2.2-2                              all          Python 2.x development headers for GObject - transitional package
ii  python-gst0.10                        0.10.22-3                            armhf        generic media-playing framework (Python bindings)
ii  python-gst0.10-dbg                    0.10.22-3                            armhf        generic media-playing framework (Python debug bindings)
ii  python-gst0.10-dev                    0.10.22-3                            armhf        generic media-playing framework (Python bindings)
ii  python-gst0.10-rtsp                   0.10.8-3                             armhf        GStreamer RTSP server plugin (Python bindings)
ii  python-gtk2                           2.24.0-3                             armhf        Python bindings for the GTK+ widget set
ii  python-iplib                          1.1-3                                all          Python library to convert amongst many different IPv4 notations
ii  python-libxml2                        2.8.0+dfsg1-7+nmu1                   armhf        Python bindings for the GNOME XML library
ii  python-minimal                        2.7.3-4                              all          minimal subset of the Python language (default version)
ii  python-numpy                          1:1.6.2-1.2                          armhf        Numerical Python adds a fast array facility to the Python language
ii  python-pexpect                        2.4-1                                all          Python module for automating interactive applications
ii  python-pip                            1.1-3                                all          alternative Python package installer
ii  python-pkg-resources                  0.6.24-1                             all          Package Discovery and Resource Access using pkg_resources
ii  python-pyalsa                         1.0.25-1                             armhf        Official ALSA Python binding library
ii  python-pyside                         1.1.1-3                              all          Python bindings for Qt4 (big metapackage)
ii  python-pyside.phonon                  1.1.1-3                              armhf        Qt 4 Phonon module - Python bindings
ii  python-pyside.qtcore                  1.1.1-3                              armhf        Qt 4 core module - Python bindings
ii  python-pyside.qtdeclarative           1.1.1-3                              armhf        Qt 4 Declarative module - Python bindings
ii  python-pyside.qtgui                   1.1.1-3                              armhf        Qt 4 GUI module - Python bindings
ii  python-pyside.qthelp                  1.1.1-3                              armhf        Qt 4 help module - Python bindings
ii  python-pyside.qtnetwork               1.1.1-3                              armhf        Qt 4 network module - Python bindings
ii  python-pyside.qtopengl                1.1.1-3                              armhf        Qt 4 OpenGL module - Python bindings
ii  python-pyside.qtscript                1.1.1-3                              armhf        Qt 4 script module - Python bindings
ii  python-pyside.qtsql                   1.1.1-3                              armhf        Qt 4 SQL module - Python bindings
ii  python-pyside.qtsvg                   1.1.1-3                              armhf        Qt 4 SVG module - Python bindings
ii  python-pyside.qttest                  1.1.1-3                              armhf        Qt 4 test module - Python bindings
ii  python-pyside.qtuitools               1.1.1-3                              armhf        Qt 4 UI tools module - Python bindings
ii  python-pyside.qtwebkit                1.1.1-3                              armhf        Qt 4 WebKit module - Python bindings
ii  python-pyside.qtxml                   1.1.1-3                              armhf        Qt 4 XML module - Python bindings
ii  python-rpi.gpio                       0.5.3a-1                             armhf        Python GPIO module for Raspberry Pi
ii  python-setuptools                     0.6.24-1                             all          Python Distutils Enhancements (setuptools compatibility)
ii  python-simplejson                     2.5.2-1                              armhf        simple, fast, extensible JSON encoder/decoder for Python
ii  python-support                        1.0.15                               all          automated rebuilding support for Python modules
ii  python-tk                             2.7.3-1                              armhf        Tkinter - Writing Tk applications with Python
ii  python-yaml                           3.10-4                               armhf        YAML parser and emitter for Python
ii  python-yaml-dbg                       3.10-4                               armhf        YAML parser and emitter for Python (debug build)
ii  python2.6                             2.6.8-1.1                            armhf        Interactive high-level object-oriented language (version 2.6)
ii  python2.6-minimal                     2.6.8-1.1                            armhf        Minimal subset of the Python language (version 2.6)
ii  python2.7                             2.7.3-6                              armhf        Interactive high-level object-oriented language (version 2.7)
ii  python2.7-dbg                         2.7.3-6                              armhf        Debug Build of the Python Interpreter (version 2.7)
ii  python2.7-dev                         2.7.3-6                              armhf        Header files and a static library for Python (v2.7)
ii  python2.7-minimal                     2.7.3-6                              armhf        Minimal subset of the Python language (version 2.7)
pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $

Also, just to confirm that pocketsphinx is linked against the correct libraries:

pi@scarlettpi ~ $ ldd /usr/local/bin/pocketsphinx_continuous 
    /usr/lib/arm-linux-gnueabihf/libcofi_rpi.so (0xb6f9b000)
    libpocketsphinx.so.1 => /usr/local/lib/libpocketsphinx.so.1 (0xb6f5a000)
    libsphinxad.so.0 => /usr/local/lib/libsphinxad.so.0 (0xb6f4e000)
    libsphinxbase.so.1 => /usr/local/lib/libsphinxbase.so.1 (0xb6f07000)
    libpulse.so.0 => /usr/lib/arm-linux-gnueabihf/libpulse.so.0 (0xb6ea8000)
    libpulse-simple.so.0 => /usr/lib/arm-linux-gnueabihf/libpulse-simple.so.0 (0xb6e9c000)
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6e7d000)
    libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb6e0c000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6cdd000)
    libjson.so.0 => /lib/arm-linux-gnueabihf/libjson.so.0 (0xb6ccd000)
    libpulsecommon-2.0.so => /usr/lib/arm-linux-gnueabihf/pulseaudio/libpulsecommon-2.0.so (0xb6c6b000)
    libdbus-1.so.3 => /lib/arm-linux-gnueabihf/libdbus-1.so.3 (0xb6c29000)
    libcap.so.2 => /lib/arm-linux-gnueabihf/libcap.so.2 (0xb6c1e000)
    librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb6c0f000)
    libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb6c04000)
    libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6bdb000)
    /lib/ld-linux-armhf.so.3 (0xb6fa8000)
    libX11-xcb.so.1 => /usr/lib/arm-linux-gnueabihf/libX11-xcb.so.1 (0xb6bd2000)
    libX11.so.6 => /usr/lib/arm-linux-gnueabihf/libX11.so.6 (0xb6abe000)
    libxcb.so.1 => /usr/lib/arm-linux-gnueabihf/libxcb.so.1 (0xb6a9f000)
    libICE.so.6 => /usr/lib/arm-linux-gnueabihf/libICE.so.6 (0xb6a82000)
    libSM.so.6 => /usr/lib/arm-linux-gnueabihf/libSM.so.6 (0xb6a73000)
    libXtst.so.6 => /usr/lib/arm-linux-gnueabihf/libXtst.so.6 (0xb6a67000)
    libwrap.so.0 => /lib/arm-linux-gnueabihf/libwrap.so.0 (0xb6a57000)
    libsndfile.so.1 => /usr/lib/arm-linux-gnueabihf/libsndfile.so.1 (0xb69ee000)
    libasyncns.so.0 => /usr/lib/arm-linux-gnueabihf/libasyncns.so.0 (0xb69e2000)
    libattr.so.1 => /lib/arm-linux-gnueabihf/libattr.so.1 (0xb69d4000)
    libXau.so.6 => /usr/lib/arm-linux-gnueabihf/libXau.so.6 (0xb69ca000)
    libXdmcp.so.6 => /usr/lib/arm-linux-gnueabihf/libXdmcp.so.6 (0xb69be000)
    libuuid.so.1 => /lib/arm-linux-gnueabihf/libuuid.so.1 (0xb69b1000)
    libXext.so.6 => /usr/lib/arm-linux-gnueabihf/libXext.so.6 (0xb699b000)
    libXi.so.6 => /usr/lib/arm-linux-gnueabihf/libXi.so.6 (0xb6986000)
    libnsl.so.1 => /lib/arm-linux-gnueabihf/libnsl.so.1 (0xb696a000)
    libFLAC.so.8 => /usr/lib/arm-linux-gnueabihf/libFLAC.so.8 (0xb691f000)
    libvorbisenc.so.2 => /usr/lib/arm-linux-gnueabihf/libvorbisenc.so.2 (0xb67b2000)
    libvorbis.so.0 => /usr/lib/arm-linux-gnueabihf/libvorbis.so.0 (0xb6782000)
    libogg.so.0 => /usr/lib/arm-linux-gnueabihf/libogg.so.0 (0xb6775000)
    libresolv.so.2 => /lib/arm-linux-gnueabihf/libresolv.so.2 (0xb6761000)
pi@scarlettpi ~ $

In case you need to see any information about my microphone (PS3 Eye):

I had to put this in a pastebin; there was no room left in this post.

http://pastebin.com/gSDZwRHc

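Does anyone know why this isn't working? If my question needs any clarification, or if I can provide more information to help debug, please let me know.

Thanks.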


1 Answer


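So I finally got this working.

A few key things I needed to realize:

1. Even if you are using PulseAudio on the Raspberry Pi, you can still use ALSA as long as it is still installed. (This may seem obvious to others, but honestly I didn't realize I could use both at the same time.) Hat tip to syb0rg.

2. When sending large amounts of raw audio data (.wav format in my case) to Pocketsphinx through GStreamer, the queue element is your friend.

After messing around with gst-launch-0.10 on the command line for a while, I came across something that actually worked: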

gst-launch-0.10 alsasrc device=hw:1 ! queue ! audioconvert ! audioresample ! queue ! vader name=vader auto-threshold=true ! pocketsphinx lm=/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm dict=/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic hmm=/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k name=listener ! fakesink dump=1

So what's going on here?

  • GStreamer is listening on device hw:1 (my PS3 Eye USB device). The device number may differ on your system; you can determine it by running:
pi@scarlettpi ~ $ pacmd dump
Welcome to PulseAudio! Use "help" for usage information.

....

load-module module-alsa-card device_id="0" name="platform-bcm2835_AUD0.0" card_name="alsa_card.platform-bcm2835_AUD0.0" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

load-module module-udev-detect

load-module module-bluetooth-discover

load-module module-esound-protocol-unix

load-module module-native-protocol-unix

load-module module-gconf

load-module module-default-device-restore

load-module module-rescue-streams

load-module module-always-sink

load-module module-intended-roles

load-module module-console-kit

load-module module-systemd-login

load-module module-position-event-sounds

load-module module-role-cork

load-module module-filter-heuristics

load-module module-filter-apply

load-module module-dbus-protocol

load-module module-switch-on-port-available

load-module module-cli-protocol-unix

load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

....

The important line to note is:

load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

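That's my PlayStation 3 Eye, on device_id=1. Hence hw:1.

  • The audio data from the PS3 Eye is converted, resampled, and queued, and it has to pass through the vader element before it reaches pocketsphinx. With the auto-threshold=true flag turned on, the vader element lets GStreamer determine the background noise level, which can matter if you have a poor sound card or a far-field microphone. This is how the pocketsphinx element knows when an utterance starts and ends.

  • The usual pocketsphinx arguments that we already determined are added to the pipeline.

  • Everything is piped into a fakesink, because we don't need to hear anything right now; we just need pocketsphinx to listen to everything. The dump=1 flag gives us more debugging information, so we can see what is being processed and whether audio is being accepted at all.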
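The device lookup from the first bullet can also be scripted. The following is only a sketch: it assumes, as the reasoning above does, that the device_id PulseAudio reports for a module-alsa-card line matches the ALSA card index used in hw:N (running arecord -l is one way to cross-check). The names alsa_cards, find_hw_device, and SAMPLE are hypothetical, and SAMPLE is an excerpt of the pacmd dump output shown above.

```python
import re

# Matches lines such as: load-module module-alsa-card device_id="1" name="usb-..."
CARD_RE = re.compile(r'load-module module-alsa-card\s+device_id="(\d+)"\s+name="([^"]+)"')

def alsa_cards(dump_text):
    """Return (device_id, name) pairs for every module-alsa-card line in the dump."""
    return [(int(dev), name) for dev, name in CARD_RE.findall(dump_text)]

def find_hw_device(dump_text, needle):
    """Return 'hw:N' for the first card whose name contains needle, else None."""
    for device_id, name in alsa_cards(dump_text):
        if needle.lower() in name.lower():
            return 'hw:%d' % device_id
    return None

# Excerpt of the `pacmd dump` output from this answer.
SAMPLE = '''
load-module module-alsa-card device_id="0" name="platform-bcm2835_AUD0.0" card_name="alsa_card.platform-bcm2835_AUD0.0" namereg_fail=false tsched=yes
load-module module-udev-detect
load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes
'''

PS3_EYE = find_hw_device(SAMPLE, 'OmniVision')
```

On a live system the text would come from running pacmd dump instead of the SAMPLE string.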

**Once that ran successfully, the new Python code looks like this:**

self.pipeline = gst.parse_launch(' ! '.join(['alsasrc device=' + scarlett_config.gimmie('audio_input_device'),
                                           'queue',
                                           'audioconvert',
                                           'audioresample',
                                           'queue',
                                           'vader name=vader auto-threshold=true',
                                           'pocketsphinx lm=' + scarlett_config.gimmie('LM') + ' dict=' + scarlett_config.gimmie('DICT') + ' hmm=' + scarlett_config.gimmie('HMM') + ' name=listener',
                                           'fakesink dump=1']))
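Since the pipeline description is just a string, it can be built and checked without GStreamer installed, which makes it easy to compare against the gst-launch-0.10 command that worked. This is a rough sketch rather than part of the actual project: build_pipeline and _check are hypothetical helpers, and the rule of rejecting values that contain whitespace or "!" is my own precaution, since the values are placed in the description unquoted.

```python
def _check(value):
    """Reject values that would break an unquoted gst-launch style description."""
    value = str(value)
    if not value or '!' in value or any(ch.isspace() for ch in value):
        raise ValueError('unsafe pipeline property value: %r' % value)
    return value

def build_pipeline(device, lm, dic, hmm):
    """Return the pipeline description string for the given device and model paths."""
    elements = [
        'alsasrc device=%s' % _check(device),
        'queue',
        'audioconvert',
        'audioresample',
        'queue',
        'vader name=vader auto-threshold=true',
        'pocketsphinx lm=%s dict=%s hmm=%s name=listener'
        % (_check(lm), _check(dic), _check(hmm)),
        'fakesink dump=1',
    ]
    return ' ! '.join(elements)

DESCRIPTION = build_pipeline(
    'hw:1',
    '/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm',
    '/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic',
    '/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k',
)
```

The resulting DESCRIPTION is the same text that follows gst-launch-0.10 in the command above, and it could be handed to gst.parse_launch in place of the inline join.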

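Hope this helps someone.

Note: Apologies if my GStreamer pipeline uses more elements than necessary. I'm still pretty new to GStreamer, and I'm open to more efficient approaches.

Answered 2013-08-13T22:48:56.000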