python - Put bar at the end of every line that includes foo

Question

I have a list with a large number of lines, each taking the subject-verb-object form, e.g.:

Jane likes Fred
Chris dislikes Joe
Nate knows Jill

To plot a network graph that expresses the different relationships between the nodes as directed, color-coded edges, I need to replace the verb with an arrow and put a color code at the end of each line, like this (somewhat simplified):

Jane -> Fred red;
Chris -> Joe blue;
Nate -> Jill black;

There's only a small number of verbs, so replacing them with an arrow is just a matter of a few search-and-replace commands. Before doing that, however, I need to put at the end of every line the color code that corresponds to that line's verb. I'd like to do this using Python.

These are my baby steps in programming, so please be explicit and include the code that reads in the text file.

Thanks for your help!

score 5 · Accepted Answer

It sounds like you will want to read up on dictionaries and string formatting. In general, when you need help programming, break the problem down into very small, discrete chunks, search for each chunk independently, and then combine the pieces into a larger solution. Stack Overflow is a great resource for this kind of searching.

Also, if you have general questions about Python, search or browse the official Python documentation. If you keep finding that you don't know where to begin, work through the Python tutorial or a book. Investing a week or two in a solid foundation will pay off over and over again as you complete your work.

verb_color_map = {
    'likes': 'red',
    'dislikes': 'blue',
    'knows': 'black',
}

with open('infile.txt') as infile:  # assuming you've stored your data in 'infile.txt'
    for line in infile:
        # 'object' is a Python built-in name, so use object_ instead
        subject, verb, object_ = line.split()
        print("%s -> %s %s;" % (subject, object_, verb_color_map[verb]))
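The loop above stops with a ValueError on a blank line (there are not three words to unpack) and with a KeyError on a verb that is not in the dictionary. A slightly more defensive sketch, assuming Python 3 and a hypothetical output file name, writes the results to a file and collects the lines it could not convert instead of crashing:

```python
verb_color_map = {
    'likes': 'red',
    'dislikes': 'blue',
    'knows': 'black',
}

def convert_file(in_path, out_path):
    """Write one 'subject -> object color;' line per input line.

    Blank lines are skipped. Lines that are not exactly three words, or
    whose verb is not in verb_color_map, are returned in a list instead
    of raising an exception.
    """
    unknown = []
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            words = line.split()
            if not words:
                continue  # skip blank lines
            if len(words) != 3 or words[1] not in verb_color_map:
                unknown.append(line.rstrip('\n'))
                continue
            subject, verb, object_ = words
            outfile.write('%s -> %s %s;\n' % (subject, object_, verb_color_map[verb]))
    return unknown
```

For example, `problems = convert_file('infile.txt', 'outfile.txt')` leaves the converted lines in the (hypothetical) `outfile.txt` and lets you print `problems` to see which lines need attention.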

score 3 · Accepted Answer

Simple enough; assuming the list of verbs is fixed and small, this is easy to do with a dictionary and a for loop:

VERBS = {
    "likes": "red",
    "dislikes": "blue",
    "knows": "black",
}

def replace_verb(line):
    # Compare whole words, so that "likes" does not match inside "dislikes",
    # and drop the trailing newline before appending the color.
    words = line.split()
    for verb, color in VERBS.items():
        if verb in words:
            words[words.index(verb)] = "->"
            return "%s %s;" % (" ".join(words), color)
    return line.rstrip("\n")

def main():
    filename = "my_file.txt"
    with open(filename, "r") as fp:
        for line in fp:
            print(replace_verb(line))

# Allow the module to be executed directly on the command line
if __name__ == "__main__":
    main()
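Why it matters whether the verb is looked up in the raw line or in its list of words: `in` between two strings is a substring test, so "likes" is found inside "dislikes", while `in` on a list only matches whole elements. A small demonstration on one of the question's lines:

```python
line = "Chris dislikes Joe\n"

# Substring test on the raw line: "likes" occurs inside "dislikes".
print("likes" in line)      # True

# Membership test on the list of words: only whole words count.
# split() with no argument also discards the trailing newline.
words = line.split()
print(words)                # ['Chris', 'dislikes', 'Joe']
print("likes" in words)     # False
print("dislikes" in words)  # True
```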

score 2 · Accepted Answer

verbs = {"dislikes": "blue", "knows": "black", "likes": "red"}
for s in open("/tmp/infile"):
    words = s.split()
    for verb in verbs:
        # compare whole words, so "likes" cannot match inside "dislikes"
        if verb in words:
            words[words.index(verb)] = "->"
            print(" ".join(words) + " " + verbs[verb] + ";")
            break

Edit: iterate over the file directly with "for s in open(...)" instead.
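Instead of looping over the verbs for every line, another option is a single regular expression built from the dictionary keys. This is a sketch of a different technique from the answer above, not the author's code: `\b` marks a word boundary, so "likes" cannot match in the middle of "dislikes", and `re.escape` guards against any verb that happens to contain regex metacharacters.

```python
import re

verbs = {"dislikes": "blue", "knows": "black", "likes": "red"}

# Builds a pattern like r"\b(dislikes|knows|likes)\b" from the dictionary keys.
verb_re = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, verbs)))

def convert(line):
    """Replace the first known verb with '->' and append its color."""
    line = line.strip()
    m = verb_re.search(line)
    if m is None:
        return None  # no known verb on this line
    color = verbs[m.group(1)]
    return line[:m.start()] + "->" + line[m.end():] + " " + color + ";"

print(convert("Chris dislikes Joe"))  # Chris -> Joe blue;
print(convert("Jane likes Fred\n"))   # Jane -> Fred red;
```

Returning None for unknown lines lets the caller decide whether to skip them or report them.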

score 1 · Accepted Answer

Are you sure this isn't a little homework-y? :) If so, it's okay to fess up. Without going into too much detail, think about the tasks you're trying to do:

For each line:

read it
split it into words (on whitespace, with .split())
convert the middle word into a color (based on a mapping; cf. Python's dict())
print the first word, arrow, third word and the color

Code using NetworkX (networkx.lanl.gov/)

'''
plot relationships in a social network
'''

import networkx

## make a fake file 'ex.txt' in this directory,
## then write fake relationships to it.
with open('ex.txt', 'w') as example_relationships:
    example_relationships.write('''\
Jane Doe likes Fred
Chris dislikes Joe
Nate knows Jill
''')

rel_colors = {
    'likes':  'blue',
    'dislikes' : 'black',
    'knows'   : 'green',
}

def split_on_verb(sentence):
    ''' we know the verb is the only lower cased word

    >>> split_on_verb("Jane Doe likes Fred")
    ('Jane Doe', 'Fred', 'likes')

    '''
    words = sentence.strip().split()  # take off any outside whitespace, then split
                                       # on whitespace
    if not words:
        return None  # if there aren't any words, just return nothing

    verbs = [x for x in words if x.islower()]
    verb = verbs[0]  # we want the '1st' one (python numbers from 0,1,2...)
    verb_index = words.index(verb)  # where is the verb?
    subject = ' '.join(words[:verb_index])
    obj = ' '.join(words[(verb_index+1):])  # 'object' is already used in python
    return (subject, obj, verb)


def graph_from_relationships(fh, color_dict):
    '''
    fh:  a filehandle, i.e., an opened file, from which we can read lines
        and loop over
    color_dict: maps each verb to an edge color
    '''
    G = networkx.DiGraph()

    for line in fh:
        if not line.strip():  continue  # move on to the next line,
                                          # if our line is empty-ish
        (subj, obj, verb) = split_on_verb(line)
        color = color_dict[verb]
        # cf: python 'string templates', there are other solutions here
        print("'%s' -> '%s' [color='%s'];" % (subj, obj, color))
        # store the color as an edge attribute
        G.add_edge(subj, obj, color=color)

    return G

with open('ex.txt') as fh:
    G = graph_from_relationships(fh, rel_colors)
print(G.edges())
# from here you can use the various networkx plotting tools on G, as you're inclined.
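The line printed inside graph_from_relationships is already close to Graphviz DOT edge syntax, but DOT quotes names with double quotes rather than single quotes. If you would rather render with Graphviz than with networkx's plotting tools, here is a sketch (not part of the original answer; the graph name `relationships` is arbitrary) that builds a complete DOT document from (subject, object, color) triples:

```python
def to_dot(edges):
    """Build Graphviz DOT source from (subject, object, color) triples."""
    lines = ["digraph relationships {"]
    for subj, obj, color in edges:
        # Names with spaces must be double-quoted in DOT; escape embedded quotes.
        s = subj.replace('"', '\\"')
        o = obj.replace('"', '\\"')
        lines.append('    "%s" -> "%s" [color="%s"];' % (s, o, color))
    lines.append("}")
    return "\n".join(lines)

dot = to_dot([("Jane Doe", "Fred", "blue"), ("Chris", "Joe", "black")])
print(dot)
```

If the graph's edges carry a `color` attribute, networkx 2.x's `G.edges(data='color')` should yield exactly such triples. Save the string to a file and render it with, for example, `dot -Tpng relationships.dot -o relationships.png`.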

score 0 · Accepted Answer

Python 2.5:

import sys
from collections import defaultdict

codes = defaultdict(lambda: ("---", "Missing action!"))
codes["likes"] =    ("-->", "red")
codes["dislikes"] = ("-/>", "green")
codes["loves"] =    ("==>", "blue")

for line in sys.stdin:
    words = line.split()
    if not words:
        continue  # skip blank lines
    subject, verb, object_ = words
    arrow, color = codes[verb]
    print("%s %s %s %s;" % (subject, arrow, object_, color))
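One subtlety of the defaultdict fallback above: looking up a missing verb with codes[verb] does not just return the default, it also stores it under that key. A short demonstration (the verb "hates" is only an example), with dict.get as the alternative that leaves the dictionary unchanged:

```python
from collections import defaultdict

codes = defaultdict(lambda: ("---", "Missing action!"))
codes["likes"] = ("-->", "red")

print("hates" in codes)        # False: nothing stored yet
arrow, color = codes["hates"]  # missing key, so the default factory is called
print(color)                   # Missing action!
print("hates" in codes)        # True: the lookup inserted the default

# dict.get returns a fallback without inserting anything.
plain = {"likes": ("-->", "red")}
print(plain.get("hates", ("---", "Missing action!")))
print("hates" in plain)        # False
```

For a one-pass conversion script the insertion is harmless, but it matters if you later iterate over codes to list the known verbs.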

score 0 · Accepted Answer

In addition to the question, Karasu also said (in a comment on one answer): "In the actual input both subjects and objects vary unpredictably between one and two words."

Okay, here's how I would solve this.

color_map = \
{
    "likes" : "red",
    "dislikes" : "blue",
    "knows" : "black",
}

def is_verb(word):
    return word in color_map

def make_noun(lst):
    if not lst:
        return "--NONE--"
    elif len(lst) == 1:
        return lst[0]
    else:
        return "_".join(lst)


for line in open("filename").readlines():
    words = line.split()
    # subject could be one or two words
    if is_verb(words[1]):
        # subject was one word
        s = words[0]
        v = words[1]
        o = make_noun(words[2:])
    else:
        # subject was two words
        assert is_verb(words[2])
        s = make_noun(words[0:2])
        v = words[2]
        o = make_noun(words[3:])
    color = color_map[v]
    print("%s -> %s %s;" % (s, o, color))

Some notes:

0) We don't really need "with" for this problem, and writing it this way makes the program more portable to older versions of Python. This should work on Python 2.2 and newer, I think (I only tested on Python 2.6).

1) You can change make_noun() to have whatever strategy you deem useful for handling multiple words. I showed just chaining them together with underscores, but you could have a dictionary with adjectives and throw those out, have a dictionary of nouns and choose those, or whatever.

2) You could also use regular expressions for fuzzier matching. Instead of a plain dictionary for color_map, you could have a list of tuples, each pairing a regular expression with a color, and use the paired color when the regular expression matches.
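Note 1's suggestion of throwing out adjectives could look like the sketch below. The ADJECTIVES set is hypothetical and would need to be filled in for your data. Because "_".join() of a one-element list returns that element unchanged, the separate len(lst) == 1 branch is not needed:

```python
# Hypothetical set of words to drop from subjects and objects.
ADJECTIVES = {"old", "young", "little", "big"}

def make_noun(lst):
    """Join the remaining words with underscores, dropping known adjectives."""
    kept = [w for w in lst if w.lower() not in ADJECTIVES]
    if not kept:
        return "--NONE--"
    return "_".join(kept)

print(make_noun(["old", "Fred"]))  # Fred
print(make_noun(["Jane", "Doe"]))  # Jane_Doe
```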

score 0 · Accepted Answer

Here is an improved version of my previous answer. This one uses regular expression matching to make a fuzzy match on the verb. These all work:

Steve loves Denise
Bears love honey
Maria interested Anders
Maria interests Anders

The regular expression pattern "loves?" matches "love" plus an optional 's'. The pattern "interest.*" matches "interest" plus anything. Patterns with multiple alternatives separated by vertical bars match if any one of the alternatives matches.
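One detail worth knowing about these patterns: re.match only anchors at the start of the string, so "loves?" also accepts a longer word such as "lovesick". If that matters for your data, re.fullmatch (Python 3.4 and later) requires the entire word to match; on older versions, wrapping the pattern as "(?:...)$" has a similar effect. A quick comparison using the first pattern from the list that follows:

```python
import re

pattern = "likes?|loves?|interest.*"

# re.match anchors only at the start of the string...
print(bool(re.match(pattern, "loves")))      # True
print(bool(re.match(pattern, "lovesick")))   # True: "loves" matched, rest ignored
print(bool(re.match(pattern, "dislikes")))   # False: must match at position 0

# ...while re.fullmatch requires the whole word to match.
print(bool(re.fullmatch(pattern, "lovesick")))    # False
print(bool(re.fullmatch(pattern, "interested")))  # True
```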

import re

re_map = \
[
    ("likes?|loves?|interest.*", "red"),
    ("dislikes?|hates?", "blue"),
    ("knows?|tolerates?|ignores?", "black"),
]

# compile the regular expressions one time, then use many times
pat_map = [(re.compile(s), color) for s, color in re_map]

# We don't use is_verb() in this version, but here it is.
# A word is a verb if any of the patterns match.
def is_verb(word):
    return any(pat.match(word) for pat, color in pat_map)

# Return color from matched verb, or None if no match.
# This detects whether a word is a verb, and looks up the color, at the same time.
def color_from_verb(word):
    for pat, color in pat_map:
        if pat.match(word):
            return color
    return None

def make_noun(lst):
    if not lst:
        return "--NONE--"
    elif len(lst) == 1:
        return lst[0]
    else:
        return "_".join(lst)


for line in open("filename"):
    words = line.split()
    # subject could be one or two words
    color = color_from_verb(words[1])
    if color:
        # subject was one word
        s = words[0]
        o = make_noun(words[2:])
    else:
        # subject was two words
        color = color_from_verb(words[2])
        assert color
        s = make_noun(words[0:2])
        o = make_noun(words[3:])
    print("%s -> %s %s;" % (s, o, color))

I hope it is clear how to take this answer and extend it. You can easily add more patterns to match more verbs. You could add logic to detect "is" and "in" and discard them, so that "Anders is interested in Maria" would match. And so on.
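The "is"/"in" idea above can be sketched as follows. This is an assumption-laden variant, not the author's code: the FILLER set is hypothetical, it uses the compiled pattern's fullmatch (Python 3.4+) so a pattern must match the whole word, and instead of checking only positions 1 and 2 it scans for the first word that matches a verb pattern, so subjects of any length work:

```python
import re

# Hypothetical filler words to discard before parsing; extend as needed.
FILLER = {"is", "in", "was", "by"}

PATTERNS = [
    (re.compile(r"likes?|loves?|interest.*"), "red"),
    (re.compile(r"dislikes?|hates?"), "blue"),
    (re.compile(r"knows?|tolerates?|ignores?"), "black"),
]

def color_from_verb(word):
    """Return the color for a word that fully matches a verb pattern, else None."""
    for pat, color in PATTERNS:
        if pat.fullmatch(word):
            return color
    return None

def parse(sentence):
    """Drop filler words, then split the sentence around the first verb."""
    words = [w for w in sentence.split() if w not in FILLER]
    for i, word in enumerate(words):
        color = color_from_verb(word)
        if color:
            subject = "_".join(words[:i]) or "--NONE--"
            obj = "_".join(words[i + 1:]) or "--NONE--"
            return "%s -> %s %s;" % (subject, obj, color)
    return None  # no verb found

print(parse("Anders is interested in Maria"))  # Anders -> Maria red;
print(parse("Jane Doe likes Fred"))            # Jane_Doe -> Fred red;
```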

If you have any questions, I'd be happy to explain this further. Good luck.
