java - 如何使用 Java 在印地语和拉丁语字符之间音译印地语文本？

Question

如何使用 Java 将用英文字母书写的印地语意思转换为印地语？

例如。

输入文本为：anil NE lath marke apko Ganga me hi Fenk diya。

在印地语

输出文本为：अनिल ने लात मार्के आपको गंगा में ही फेंक दिया

如何使用 Java 或任何 Java API 进行转换？

我喜欢谷歌以外的一个名为 Jitter 的 API，但出现错误

Source is: inko
Input is: a2b45xdsfsdf
Output is: 
Matches is: 0
Exception in thread "main" org.jtr.transliterate.CharacterParseException: No valid delimiter found for start of expression
    at org.jtr.transliterate.Perl5Parser.parsePerlString(Perl5Parser.java:133)

源代码是

/*
 * Copyright (c) 2001-2005, Nicholas Cull
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 *  * Neither the name "jtr" nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.jtr.transliterate;

/**
 * A utility class for providing Perl 5 syntactic sugar on top of the
 * {@link CharacterParser} class. This parses Perl-style transliteration strings
 * into a form suitable for <code>CharacterParser</code>. For instance, the
 * string <code>"tr/a-zA-Z/0-9a-zA-Z/cd"</code> is parsed into two strings
 * <code>"a-zA-Z"</code>, <code>"0-9a-zA-Z"</code>, and the flags
 * <code>COMPLEMENT_MASK | DELETE_UNREPLACEABLES_MASK</code>.
 *
 * @author <a href="mailto:run2000@users.sourceforge.net">Nicholas Cull</a>
 * @version $Id: Perl5Parser.java,v 1.3 2005/03/14 06:40:18 run2000 Exp $
 * @since 1.1
 */
public final class Perl5Parser {

    /** Initial state of the sequence parser. */
    private static final short INITIAL_STATE = 0;
    /** Sequence parser encountered an escape character. */
    private static final short ESCAPE_STATE = 1;
    /** Sequence parser encountered the delimiter. */
    private static final short DELIMITER_STATE = 2;

    /** Private constructor to indicate this is a static class. */
    private Perl5Parser() {
    }

    /**
     * <p>Parses the given string in Perl syntax and returns a populated
     * {@link CharacterReplacer} object that can be used to perform the
     * specified transliteration.</p>
     *
     * <p>This is a simple factory method that calls the {@link #parsePerlString
     * parsePerlString} method below, creates a new {@link CharacterReplacer}
     * object and populates it with the results.</p>
     *
     * @param source the String to be parsed and compiled
     * @return a new CharacterReplacer ready for transliterations
     * @throws CharacterParseException something went wrong during parsing
     * @throws NullPointerException source is <code>null</code>
     */
    public static CharacterReplacer makeReplacer( String source )
            throws CharacterParseException {

        StringBuffer input = new StringBuffer();
        StringBuffer output = new StringBuffer();
        int flags = 0;
        CharacterReplacer replacer;

        flags = parsePerlString( source, input, output );
        replacer = new CharacterReplacer( input.toString(), output.toString() );
        replacer.setFlags( flags );
        return replacer;
    }

    /**
     * <p>Parses the given Perl-style transliteration string into two parts:</p>
     * <ol>
     * <li>The input character string to be transliterated
     * <li>The replacement character string
     * </ol>
     * <p>These strings can then be fed into the constructors for
     * {@link CharacterReplacer}. It also returns any flags encountered at the
     * end of the string into a form suitable for CharacterReplacer.</p>
     *
     * @param source the string to be parsed
     * @param input (out) the characters to be transliterated
     * @param output (out) the replacement characters
     * @return any flags parsed at the end of the character sequence
     * @throws CharacterParseException there was a problem parsing the source
     * String
     * @throws NullPointerException source is <code>null</code>
     */
    public static int parsePerlString( String source, StringBuffer input,
            StringBuffer output ) throws CharacterParseException {

        int length = source.length();
        int pos = 0;
        int flags = 0;
        char delimiter;

        if( length < 3 ) {
            throw new CharacterParseException(
                    "Source is too small to be parsed", pos );
        }

        if( input == null || output == null ) {
            throw new CharacterParseException(
                    "String buffers have not been initialized", pos );
        }

        if( source.startsWith( "tr" )) {
            pos = 2;
        } else if( source.startsWith( "y" )) {
            pos = 1;
        }

        delimiter = source.charAt( pos );
        if( delimiter == '-' || delimiter == '\\' ||
                Character.isLetterOrDigit( delimiter )) {
            throw new CharacterParseException(
                    "No valid delimiter found for start of expression", pos );
        }

        pos++;
        pos = parseSequence( source, pos, delimiter, input );

        if( pos == length ) {
            throw new CharacterParseException(
                    "Cannot parse replacement sequence, no character sequence found",
                    pos );
        }

        pos = parseSequence( source, pos, delimiter, output );

        // Parse any flags at the end
        while( pos < length ) {
            char flag = source.charAt( pos );
            switch( flag ) {
                case 'c':
                    flags = flags | CharacterParser.COMPLEMENT_MASK;
                    break;

                case 'd':
                    flags = flags | CharacterParser.DELETE_UNREPLACEABLES_MASK;
                    break;

                case 's':
                    flags = flags | CharacterParser.SQUASH_DUPLICATES_MASK;
                    break;

                default:
                    throw new CharacterParseException(
                            "Unknown flag passed into character parser", pos );
            }
            pos++;
        }
        return flags;
    }

    /**
     * Parse the first or second character sequence and place the parsed result
     * into the given StringBuffer. The source string is scanned from initial
     * position pos until an unescaped delimiter character is found. We use a
     * simple finite state machine to determine when we encounter an escape
     * character or a delimiter.
     *
     * @param source the source String to be scanned
     * @param pos the starting position for the scan
     * @param delimiter the delimiter character to indicate the end of the
     * sequence
     * @param buffer the character buffer to store the parsed result
     * @return the new position of the parser
     * @throws NullPointerException source or buffer are <code>null</code>
     */
    private static int parseSequence( String source, int pos, char delimiter,
                                      StringBuffer buffer ) {
        int length = source.length();
        short state = INITIAL_STATE;
        int startPos = pos;
        char curr = '\0';

        while(( pos < length ) && ( state != DELIMITER_STATE )) {
            curr = source.charAt( pos );
            switch( state ) {
                case INITIAL_STATE:
                    if( curr == '\\' ) {
                        state = ESCAPE_STATE;
                    } else if( curr == delimiter ) {
                        // Copy the current source to the buffer
                        buffer.append( source.substring( startPos, pos ));
                        state = DELIMITER_STATE;
                    }
                    break;

                case ESCAPE_STATE:
                    if( curr == delimiter ) {
                        // Previous character was to escape the delimiter.
                        // Have to add the previous characters to the buffer.
                        buffer.append( source.substring( startPos, pos - 1 ));
                        startPos = pos;
                    }
                    state = INITIAL_STATE;
                    break;
            }
            pos++;
        }

        if( state != DELIMITER_STATE ) {
            buffer.append( source.substring( startPos ));
        }
        return pos;
    }

    /**
     * A simple test case for this class.
     *
     * @param args ignored
     * @throws Exception if an exception is encountered, throw it to the caller
     */
    public static void main( String args[] ) throws Exception {
        String source = "inko";
        String input = "a2b45xdsfsdf";
        String output = "";
        int matches = 0;

        try {
            CharacterReplacer replacer = makeReplacer( source );
            output = replacer.doReplacement( input );
            matches = replacer.getMatches();
        } finally {
            System.out.println( "Source is: " + source );
            System.out.println( "Input is: " + input );
            System.out.println( "Output is: " + output );
            System.out.println( "Matches is: " + matches );
        }
    }
}

at org.jtr.transliterate.Perl5Parser.makeReplacer(Perl5Parser.java:82)
at org.jtr.transliterate.Perl5Parser.main(Perl5Parser.java:240)

score 1 · Accepted Answer

如果我对您的理解正确，那么您希望能够音译成印地语文本。（即从一种书写系统转换为另一种书写系统）。这通常比翻译容易得多（即从一种语言转换为另一种语言）。

我不知道有一个图书馆可以做到这一点，但是这个关于印地语音译的维基词典页面可能会让你开始。

score 0 · Accepted Answer

I seen an android dictionary hinkhoj which does the same thing without internet access. So definitely it is possible in java.

Replacing each letter or group of letters with corresponding Hindi unicode will work. Like replacing 'ma' with '\u092E'

Sample working program;

public class EnglishToHindiTranslator {

    static Map<String, String> phoneticMap = new HashMap<String, String>();
    static Map<String, String> maatraMap = new HashMap<String, String>();

    static {
        phoneticMap.put("ma","\u092E");
        phoneticMap.put("ta","\u0924");
        phoneticMap.put("t","\u0924\u094D");
        phoneticMap.put("ra","\u0930");
        maatraMap.put("a","\u093E");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String engWord = "maatra";
        engWord = engWord.replaceAll("ma",phoneticMap.get("ma") );
        engWord = engWord.replaceAll("t",phoneticMap.get("t") );
        engWord = engWord.replaceAll("ra",phoneticMap.get("ra") );
        engWord = engWord.replaceAll("a",maatraMap.get("a") );

        System.out.println(new String(engWord.getBytes("UTF-8")));
    }
}

java - 如何使用 Java 在印地语和拉丁语字符之间音译印地语文本？

2 回答 2

Related

Reference