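After downloading and training SyntaxNet, I am trying to write a program that can open new or existing files, for example AutoCAD files, and save them in a specific directory by analyzing a text command such as: open LibreOffice file X. Consider the following SyntaxNet output: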

echo "save AUTOCAD file  X in directory Y" | ./test.sh > output.txt


Input: save AUTOCAD file X in directory Y
Parse:
save VB ROOT
 +-- X NNP dobj
 |   +-- file NN compound
 |       +-- AUTOCAD CD nummod
 +-- directory NN nmod
     +-- in IN case
     +-- Y CD nummod

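At first I thought about converting the parsed text to XML and then parsing the XML file with semantic analysis (such as SPARQL) to find ROOT=save, dobj=X and nummod=Y, and then writing a Python program that performs the action described in the text.

  1. I don't know whether converting the parsed text to XML and then querying it with semantic analysis is the right way to match the ROOT with a corresponding function or script that saves the dobj in the directory named by the nummod.

  2. I have some ideas for connecting Python to the terminal with the subprocess package, but I have not found anything that helps me save, for example, an AUTOCAD file or any other file from the terminal. Or do I need to write a .sh script with the help of Python?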

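I have done a lot of research on syntactic and semantic analysis of text, such as Christian Chiarcos, 2011; Hunter and Cohen, 2006; and Verspoor et al., 2015, and I have also studied Microsoft Cortana, Sirius, and Google, but none of them go into detail about how they turn parsed text into executable commands. This leads me to conclude that the task is considered too easy to be worth discussing, but since I am not a computer science major, I cannot figure out what to do.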

1 Answer

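I am a beginner in the world of computer science and SyntaxNet. I wrote a simple SyntaxNet-Python algorithm that uses SyntaxNet to analyze a text command entered by the user, "open the file book which I have written with laboratory writer with LibreOffice writer", and then analyzes the SyntaxNet output with a Python algorithm to turn it into an executable command, in this case opening a file in any supported format with LibreOffice in a Linux (Ubuntu 14.04) environment. You can see here the different command lines defined by LibreOffice for using the different applications in this package.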

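  1. After installing and running SyntaxNet (the installation process is explained here), open the shell script demo.sh in the directory ~/models/syntaxnet/syntaxnet/ and remove the conll2tree function (lines 54 to 56) in order to get tab-delimited output from SyntaxNet instead of tree-format output.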

  2. Type this command in a terminal window:

    echo 'open the file book which I have written with laboratory writer with libreOffice writer' | syntaxnet/demo.sh > output.txt

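The output.txt file is saved in the directory where demo.sh is located and looks like the figure below:

[Figure: screenshot of the tab-delimited output.txt]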

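  3. Use output.txt as the input file and analyze the SyntaxNet output with the Python algorithm below, which identifies the name of the file, the target application from the LibreOffice package, and the command the user wants to run.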

#!/usr/bin/env python

import csv
import os
import subprocess
import sys

# SyntaxNet output, used as the input file of this script
filename='/home/username/models/syntaxnet/work/output.txt'

# command-line invocations for opening a file with each LibreOffice application
commands={
('open',  'libreoffice',  'writer'):  ('libreoffice', '--writer'),
('open',  'libreoffice',  'calculator'):  ('libreoffice' ,'--calc'),
('open',  'libreoffice',  'draw'):  ('libreoffice' ,'--draw'),
('open',  'libreoffice',  'impress'): ('libreoffice' ,'--impress'),
('open',  'libreoffice',  'math'):  ('libreoffice' ,'--math'),
('open',  'libreoffice',  'global'):  ('libreoffice' ,'--global'),
('open',  'libreoffice',  'web'): ('libreoffice' ,'--web'),
('open',  'libreoffice',  'show'):  ('libreoffice', '--show'),
}
# words a user might use to refer to each LibreOffice application
comments={
 'writer': ['word','text','writer'],
 'calculator': ['excel','calc','calculator'],
 'draw': ['paint','draw','drawing'],
 'impress': ['powerpoint','impress'],
 'math': ['mathematic','calculator','math'],
 'global': ['global'],
 'web': ['html','web'],
 'show':['presentation','show']
 }

root ='ROOT'            # ROOT of the sentence
noun='NOUN'             # noun tag
verb='VERB'             # verb tag
adjmod='amod'           # adjectival modifier
dirobj='dobj'           # direct object
apposmod='appos'        # appositional modifier
prepos_obj='pobj'       # object of a preposition
app='libreoffice'       # name of the package
preposition='prep'      # prepositional modifier
noun_modi='nn'          # noun compound modifier

# read the tab-delimited SyntaxNet output; each row is one token
def readata(filename):
    with open(filename,'r') as f:
        # blank lines (such as the one that ends the parse) are skipped
        return [row for row in csv.reader(f,delimiter='\t') if row]

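# identifies the action, the name of the file and whether the user mentioned the name of the application implicitly
def exe(root,noun,verb,adjmod,dirobj,apposmod,commands,noun_modi):
    interprete='null'
    # defaults, so the function still returns when no ROOT or dobj is found
    action='null'
    direct_object='null'
    dep_num_obj=None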
    lists=readata(filename)
    for sublist in lists:
        if sublist[7]==root and sublist[3]==verb: # when the ROOT is verb the dobj is probably the name of the file you want to have
                action=sublist[1]
                dep_num=sublist[0]
                for sublist in lists:
                    if sublist[6]==dep_num and sublist[7]==dirobj:
                        direct_object=sublist[1]
                        dep_num=sublist[0]
                        dep_num_obj=sublist[0]
                        for sublist in lists:
                            if direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==apposmod:
                                direct_object=sublist[1]
                            elif  direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==adjmod:
                                direct_object=sublist[1]
                for sublist in lists:
                    if sublist[6]==dep_num_obj and sublist[7]==adjmod:
                            for key, v in comments.items():
                                if sublist[1] in v:
                                    interprete=key
                for sublist in lists:
                    if sublist[6]==dep_num_obj and sublist[7]==noun_modi:
                        dep_num_nn=sublist[0]
                        for key, v in comments.items():
                            if sublist[1] in v:
                                interprete=key
                        if interprete=='null':
                            for sublist in lists:
                                if sublist[6]==dep_num_nn and sublist[7]==noun_modi:
                                    for key, v in comments.items():
                                        if sublist[1] in v:
                                            interprete=key
        elif  sublist[7]==root and sublist[3]==noun: # you have to find the word which is in a adjective form and depends on the root
            dep_num=sublist[0]
            dep_num_obj=sublist[0]
            direct_object=sublist[1]
            for sublist in lists:
                if sublist[6]==dep_num and sublist[7]==adjmod:
                    actionis=any(t1==sublist[1] for (t1, t2, t3) in commands)
                    if actionis==True:
                        action=sublist[1]
                elif sublist[6]==dep_num and sublist[7]==noun_modi:
                    dep_num=sublist[0]
                    for sublist in lists:
                        if sublist[6]==dep_num and sublist[7]==adjmod:
                            if any(t1==sublist[1] for (t1, t2, t3) in commands):
                                action=sublist[1]
            for sublist in lists:
                if direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==apposmod and sublist[1]!=action:
                    direct_object=sublist[1]
                if  direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==adjmod and sublist[1]!=action:
                    direct_object=sublist[1]
            for sublist in lists:
                if sublist[6]==dep_num_obj and sublist[7]==noun_modi:
                    dep_num_obj=sublist[0]
                    for key, v in comments.items():
                        if sublist[1] in v:
                            interprete=key
                        else:
                            for sublist in lists:
                                if sublist[6]==dep_num_obj and sublist[7]==noun_modi:
                                    for key, v in comments.items():
                                        if sublist[1] in v:
                                            interprete=key
    return action, direct_object, interprete

action, direct_object, interprete = exe(root,noun,verb,adjmod,dirobj,apposmod,commands,noun_modi)

# find the application (we assume the user wants LibreOffice, but we do not know which sub-application should be used)
def application(app,prepos_obj,preposition,noun_modi):
    lists=readata(filename)
    subapp='not mentioned'
    for sublist in lists:
        if sublist[1]==app:
            dep_num=sublist[6]
            for sublist in lists:
                if sublist[0]==dep_num and sublist[7]==prepos_obj:
                    # the head of 'libreoffice' is the object of a preposition
                    actioni=any(t3==sublist[1] for (t1, t2, t3) in commands)
                    if actioni:
                        subapp=sublist[1]
                    else:
                        for sublist in lists:
                            if sublist[6]==dep_num and sublist[7]==noun_modi:
                                actioni=any(t3==sublist[1] for (t1, t2, t3) in commands)
                                if actioni:
                                    subapp=sublist[1]
                elif sublist[0]==dep_num and sublist[7]==preposition:
                    # the head of 'libreoffice' is itself a prepositional modifier
                    sublist[6]=dep_num
                    for subline in lists:
                        if subline[0]==dep_num and subline[7]==prepos_obj:
                            if any(t3==sublist[1] for (t1, t2, t3) in commands):
                                subapp=sublist[1]
                            else:
                                for subline in lists:
                                    if subline[0]==dep_num and subline[7]==noun_modi:
                                        if any(t3==sublist[1] for (t1, t2, t3) in commands):
                                            subapp=sublist[1]
    return subapp

sub_application=application(app,prepos_obj,preposition,noun_modi)

# fall back to the implicitly mentioned application ('null' if none was found)
if sub_application=='not mentioned':
    sub_application=interprete

# the file name with the extension that matches the sub-application
def format_function(sub_application):
    # default extension of each application; 'global' and 'show' add none
    extensions={'writer':'.odt','calculator':'.ods','impress':'.odp',
                'draw':'.odg','math':'.odf','web':'.html'}
    if sub_application=='null':
        return 'null'
    Dobj=exe(root,noun,verb,adjmod,dirobj,apposmod,commands,noun_modi)[1]
    return Dobj+extensions.get(sub_application,'')

def get_filepaths(directory):
    myfile=format_function(sub_application)
    file_paths = []  # full paths of every file whose name matches
    # walk the directory tree
    for dirpath, directories, files in os.walk(directory):
        for name in files:
            if name==myfile:
                # join directory and file name to form the full path
                file_paths.append(os.path.join(dirpath, name))
    return file_paths

# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/home/ubuntu/")

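if not full_file_paths:
    print('No file with name %s is found' % format_function(sub_application))
else:
    path=full_file_paths
    prompt='> '
    if len(full_file_paths)>1:
        print(full_file_paths)
        print('Which of these files do you mean?')
        # raw_input exists only on Python 2; input is its Python 3 equivalent
        read_line=raw_input if sys.version_info[0]<3 else input
        inputname=read_line(prompt)
        if inputname in full_file_paths:
            path=[inputname]  # keep a list so that path[0] is the whole path
    # the main code structure
    if sub_application!='null':
        command=commands.get((action,app,sub_application))
        if command is None:
            print('No command is defined for %s %s %s' % (action,app,sub_application))
        else:
            subprocess.call([command[0],command[1],path[0]])
    else:
        print("The sub application is not mentioned clearly")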
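
The numeric indices used throughout the script (0, 1, 3, 6, 7) refer to columns of the tab-delimited CoNLL output: token ID, word form, coarse POS tag, head ID, and dependency relation. As a rough illustration, the Python 3 sketch below performs the same "find the ROOT verb, then its dobj" lookup on named fields. The sample rows are hand-written for the example and are not actual SyntaxNet output, and `Token`, `parse_conll`, and `find_action_and_object` are hypothetical names that do not appear in the script.

```python
import csv
import io
from collections import namedtuple

# Only the CoNLL columns the script relies on:
# 0=ID, 1=FORM, 3=CPOSTAG, 6=HEAD, 7=DEPREL
Token = namedtuple("Token", "id form tag head deprel")


def parse_conll(text):
    """Turn tab-delimited CoNLL text into Token tuples, skipping blank lines."""
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [Token(r[0], r[1], r[3], r[6], r[7]) for r in rows if r]


def find_action_and_object(tokens):
    """Return (root verb, its direct object); either may be None if missing."""
    root = next((t for t in tokens if t.deprel == "ROOT" and t.tag == "VERB"), None)
    if root is None:
        return None, None
    dobj = next((t for t in tokens if t.head == root.id and t.deprel == "dobj"), None)
    return root.form, (dobj.form if dobj else None)


# Hand-written sample rows (an assumption, not real parser output)
SAMPLE = (
    "1\topen\t_\tVERB\tVB\t_\t0\tROOT\t_\t_\n"
    "2\tthe\t_\tDET\tDT\t_\t3\tdet\t_\t_\n"
    "3\tbook\t_\tNOUN\tNN\t_\t1\tdobj\t_\t_\n"
    "\n"
)

tokens = parse_conll(SAMPLE)
action, direct_object = find_action_and_object(tokens)
print(action, direct_object)
```

Naming the columns once makes the traversal easier to follow than repeated `sublist[6]` and `sublist[7]` comparisons, at the cost of an extra parsing step.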

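I say again that I am a beginner, and the code may not look tidy or professional, but I just tried to apply everything I know about the fascinating SyntaxNet to a practical algorithm. This simple algorithm:

  1. can open a file in any format supported by LibreOffice, e.g. .odt, .odf, .ods, .html, .odp

  2. can understand implicit references to the different LibreOffice applications, for example "open the text file book with libreoffice" instead of "open the file book with libreoffice writer"

  3. can overcome the problem of SyntaxNet interpreting a file name as an adjective.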
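
The implicit references in point 2 depend on the `comments` synonym table in the script. One possible way to make that lookup explicit is to invert the table once into a word-to-application index, as in the Python 3.7+ sketch below. `SYNONYMS` copies part of the `comments` table; `build_index` and `resolve_application` are hypothetical names, and letting the first listed application win for an ambiguous word such as "calculator" is an assumption of this sketch, not something the script guarantees.

```python
# Subset of the synonym table from the script
SYNONYMS = {
    "writer": ["word", "text", "writer"],
    "calculator": ["excel", "calc", "calculator"],
    "math": ["mathematic", "calculator", "math"],
}


def build_index(synonyms):
    """Invert {application: [words]} into {word: application}.

    If a word is listed under several applications, the first one wins
    (an assumption made here to keep the result deterministic).
    """
    index = {}
    for application, words in synonyms.items():
        for word in words:
            index.setdefault(word.lower(), application)
    return index


def resolve_application(word, index):
    """Return the application a word refers to, or None if it is unknown."""
    return index.get(word.lower())


INDEX = build_index(SYNONYMS)
print(resolve_application("Text", INDEX))        # writer
print(resolve_application("calculator", INDEX))  # calculator
print(resolve_application("autocad", INDEX))     # None
```

With such an index each word is resolved in a single dictionary lookup, and the ambiguity between "calculator" as a spreadsheet and as a math tool is settled in one visible place.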

answered 2016-07-18T19:43:09.740