I am in an intro to programming class, and one of our final projects is to create a sentence generator. The requirements are to take a sample input, strip it down to only lowercase letters and spaces, use a Markov model to determine the transition probabilities (a to e, e to t, etc.), and store them in dictionaries. For example, the dictionary for e would look something like this:
e_trans = {'em': 0.0769, 'e ': 0.2307, 'ea': 0.3077, 'es': 0.1538, 'et': 0.0769, 'ee': 0.1538}
Then we have to create a generator that uses these probabilities to create random sentences.
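To make "uses these probabilities" concrete, here is a minimal sketch of drawing one next character from a dictionary shaped like `e_trans`. The helper name `next_char` is hypothetical, and the sketch assumes every key is a two-character string whose second character is the letter that follows.

```python
import random

# Example dictionary from above: keys are two-character transitions
# starting with 'e', values are their probabilities.
e_trans = {'em': 0.0769, 'e ': 0.2307, 'ea': 0.3077,
           'es': 0.1538, 'et': 0.0769, 'ee': 0.1538}

def next_char(trans):
    """Pick the following character, weighted by probability (hypothetical helper)."""
    pairs = list(trans)
    weights = [trans[p] for p in pairs]
    # random.choices normalises the weights, so they need not sum to exactly 1
    pair = random.choices(pairs, weights=weights, k=1)[0]
    # The second character of the chosen pair is the next letter
    return pair[1]

print(next_char(e_trans))
```

Calling something like this repeatedly, each time with the dictionary for the character just produced, would presumably yield the random text.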
I haven't gotten very far because I don't know where to start on computing the probabilities. We cannot use any of the Markov model packages for Python. Any help would be greatly appreciated.
The code I have so far is:
import random

# Read the sample text and normalise it to lower case
with open("input.txt", 'r') as inputFile:
    rawdata = inputFile.read()
rawdata = rawdata.lower()
rawdata = rawdata.replace('-', ' ')

# Keep only spaces and the letters a-z
data = ' '
for character in rawdata:
    if character == ' ' or 'a' <= character <= 'z':
        data += character
data += ' '
print(data)

# Collect the distinct characters (a set, since a dict does not support +=)
S = set()
for letter in data:
    S.add(letter)
print(S)
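One possible way to get from the cleaned text to the per-letter dictionaries is to count every adjacent pair of characters and divide by the total for the first character. The sketch below is an assumption about what the assignment wants, not a confirmed solution. The function name `transition_probabilities` and the sample string are made up for illustration, and it uses only the standard library `collections` module.

```python
from collections import Counter, defaultdict

def transition_probabilities(text):
    """Map each character to a dict such as {'ea': 0.3, 'et': 0.7} (hypothetical helper)."""
    counts = defaultdict(Counter)
    # zip pairs each character with the one that follows it
    for current, following in zip(text, text[1:]):
        counts[current][current + following] += 1
    probs = {}
    for current, pair_counts in counts.items():
        # Total number of times 'current' was followed by anything
        total = sum(pair_counts.values())
        probs[current] = {pair: n / total for pair, n in pair_counts.items()}
    return probs

# Made-up sample standing in for the cleaned 'data' string
sample = " the eel sees me "
trans = transition_probabilities(sample)
print(trans['e'])
```

A character that only ever appears as the very last character of the text gets no entry, so a generator built on this would need to handle that case. With text that begins and ends with a space, as `data` does, the issue should not arise.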