3

我有一个我猜是逻辑问题。我正在使用 C# 进行编码,但欢迎使用一般的伪代码解决方案。

我有这个文本文件,其中包含例如这个文本:

blah "hello john"
blah 'the code is "flower" '
blah "good night"

我想遍历双引号并对它们做一些事情,但我想忽略单引号中包含的双引号。我得到了开始双引号和结束双引号的位置(string data包含文本文件的内容):

C#

// Start searching from beginning
int quotestart = 0, quoteend = 0;

while (data.IndexOf('"', quotestart) != -1)
{
  // Get opening double quote
  quotestart = data.IndexOf('"', quotestart);
  // Get ending double quote
  quoteend = data.IndexOf('"', quotestart + 1);

  string sub = data.Substring(quotestart + 1, quoteend - quotestart - 1);
  Console.WriteLine(sub);

  // Set the start position for the next round
  quotestart = quoteend + 1;
}

使用我的代码,输出将是:

hello john
flower
good night

因为“花”在单引号内,我希望我的输出是:

hello john
good night

编辑

我目前正在研究一种方法,首先用“A”填充单引号之间的所有数据。这样,当我遍历双引号时,单引号之间的任何数据都会被忽略。不确定这是否是正确的方法。

4

4 回答 4

7

我尝试在谷歌上搜索有限状态机,但没有接受过计算机工程方面的正式培训,我必须承认我有点迷茫。你有任何额外的指针吗?

FSM 是最简单的计算机形式之一。这个想法是您拥有有限数量的“状态”信息和稳定的输入流。每个输入都会导致状态以可预测的方式发生变化,仅基于当前状态和当前输入,并导致发生可预测的输出

因此,假设您的输入是单个字符,而您的输出是单个字符或“空”字符。这是一个可以满足您要求的 FSM:

  • 状态是OUTSIDE和。INSIDEDOUBLEINSIDESINGLE
  • 输入是"和。(WOLOG代表任何其他角色。)'xx

我们有三个状态和三个输入,所以有九种可能的组合。

  • 如果我们是OUTSIDE和得到x,停留OUTSIDE和发射null
  • 如果我们是OUTSIDE和得到",去INSIDEDOUBLE和发射null
  • 如果我们是OUTSIDE和得到',去INSIDESINGLE和发射null
  • 如果我们是INSIDEDOUBLE并得到x,停留INSIDEDOUBLE并发射x
  • 如果我们是INSIDEDOUBLE和得到",去OUTSIDE和发射null
  • 如果我们是INSIDEDOUBLE并得到',停留INSIDEDOUBLE并发射'
  • 如果我们是INSIDESINGLE并得到x,停留INSIDESINGLE并发射null
  • 如果我们是INSIDESINGLE并得到",停留INSIDESINGLE并发射null
  • 如果我们是INSIDESINGLE和得到',去OUTSIDE和发射null

唯一剩下的就是说开始状态是OUTSIDE.

所以让我们假设输入是1 " 2 " 3 ' 4 " 5 " ' 6. 状态和输出是:

  • OUTSIDE得到1,发出null,停留OUTSIDE
  • OUTSIDE得到",发出null,去INSIDEDOUBLE
  • INSIDEDOUBLE得到2,发出2,停留INSIDEDOUBLE
  • INSIDEDOUBLE得到",发出null,去OUTSIDE
  • OUTSIDE得到3,发出null,停留OUTSIDE
  • OUTSIDE得到',发出null,去INSIDESINGLE

...自己填写其余部分。

这足以让您编写代码吗?

于 2013-10-22T22:49:29.110 回答
5

很好的解决方案;对于小型 FSM,使用 switch 语句是执行此操作的传统方法,但是当状态和输入的数量变得庞大且复杂时,它会变得笨拙。这是一个更易于扩展的替代解决方案:表驱动解决方案。也就是将关于转移和动作的事实放入一个数组中,然后 FSM 无非就是一系列数组查找:

// States
const int Outside = 0;
const int InDouble = 1;
const int InSingle = 2;

// Inputs
const int Other = 0;
const int DoubleQuote = 1;
const int SingleQuote = 2;

static readonly int[,] stateTransitions =
{   /*               Other     DoubleQ   SingleQ */
    /* Outside */  { Outside,  InDouble, InSingle },
    /* InDouble */ { InDouble, Outside,  InDouble },
    /* InSingle */ { InSingle, InSingle, Outside }
};

// Do we emit the character or ignore it?
static readonly bool[,] actions =
{   /*              Other   DoubleQ SingleQ */
    /* Outside */ { false,  false,  false },
    /* InDouble */{ true,   false,  true  },
    /* InSingle */{ false,  false,  false }
};

static int Classify(char c)
{
    switch (c)
    {
        case '\'': return SingleQuote;
        case '\"': return DoubleQuote;
        default: return Other;
    }
}

static IEnumerable<char> FSM(IEnumerable<char> inputs)
{
    int state = Outside;
    foreach (char input in inputs)
    {
        int kind = Classify(input);
        if (actions[state, kind]) 
            yield return input;
        state = stateTransitions[state, kind];
    }
}

现在我们可以得到结果

string.Join("", FSM(@"1""2'3""4""5'6""7'8""9""A'B"))
于 2013-10-23T18:48:57.847 回答
2

非常感谢 Eric Lippert 提供了这个解决方案背后的逻辑。我在下面提供我的 C# 解决方案,以防有人需要。为了清楚起见,我留下了一些不必要的重新分配。

string state = "outside";

for (int i = 0; i < data.Length; i++)
{
    c = data[i];
    switch (state)
    {
        case "outside":
            switch (c)
            {
                case '\'':
                    state = "insidesingle";
                    break;
                case '"':
                    state = "insidedouble";
                    break;
                default:
                    state = "outside";
                    break;
            }
            break;

        case "insidedouble":
            switch (c)
            {
                case '\'':
                    state = "insidedouble";
                    Console.Write(c);
                    break;
                case '"':
                    state = "outside";
                    break;
                default:
                    state = "insidedouble";
                    Console.Write(c);
                    break;
            }
            break;  

        case "insidesingle":
            switch (c)
            {
                case '\'':
                    state = "outside";
                    break;
                case '"':
                    state = "insidesingle";
                    break;
                default:
                    state = "insidesingle";
                    break;
            }
            break;
    }
}
于 2013-10-23T16:33:28.623 回答
2

只是为了好玩,我决定使用一个名为stateless的非常轻量级的 FSM 库来解决这个问题。

如果您要使用这个库,下面是代码的样子。

就像 Eric 的解决方案一样,可以轻松更改以下代码以满足新要求。

void Main()
{
    Console.WriteLine(string.Join("", GetCharacters(@"1""2'3""4""5'6""7'8""9""A'B")));  
}

public enum CharacterType
{
    Other,
    SingleQuote,
    DoubleQuote
}

public enum State
{
    OutsideQuote,
    InsideSingleQuote,
    InsideDoubleQuote
}

public IEnumerable<char> GetCharacters(string input)
{
    //Initial state of the machine is OutSideQuote.
    var sm = new StateMachine<State, CharacterType>(State.OutsideQuote);

    //Below, we configure state transitions.
    //For a given state, we configure how CharacterType 
    //transitions state machine to a new state.

    sm.Configure(State.OutsideQuote)
        .Ignore(CharacterType.Other)        
        //If you are outside quote and you receive a double quote, 
        //state transitions to InsideDoubleQuote.
        .Permit(CharacterType.DoubleQuote, State.InsideDoubleQuote)
        //If you are outside quote and you receive a single quote,
        //state transitions to InsideSingleQuote.
        //Same logic applies for other state transitions below.
        .Permit(CharacterType.SingleQuote, State.InsideSingleQuote);

    sm.Configure(State.InsideDoubleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.SingleQuote)
        .Permit(CharacterType.DoubleQuote, State.OutsideQuote);

    sm.Configure(State.InsideSingleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.DoubleQuote)
        .Permit(CharacterType.SingleQuote, State.OutsideQuote);

    foreach (var character in input)
    {
        var characterType = GetCharacterType(character);
        sm.Fire(characterType);
        if(sm.IsInState(State.InsideDoubleQuote) && characterType != CharacterType.DoubleQuote)
            yield return character;
    }       

}

public CharacterType GetCharacterType(char input)
{
    switch (input)
    {
        case '\'': return CharacterType.SingleQuote;
        case '\"': return CharacterType.DoubleQuote;
        default: return CharacterType.Other;
    }
}
于 2013-10-23T22:49:03.520 回答