1

I work in program FreeMind which allows to create trees and import them as HTML files and I need to get every "path" of this tree and put them into list for example to work with each "path" separately after. enter image description here

For example from this code:

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>1</li>
            <li>2</li>
            <li>3</li>
         </ul>
      </li>
      <li>
         The Second List
         <ul>
            <li>4.1</li>
            <li>4.2</li>
         </ul>
      </li>
   </ul>
</body>

I need to get next separate branches of code:

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>1</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>2</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>3</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The Second List
         <ul>
            <li>4.1</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The Second List
         <ul>
            <li>4.2</li>
         </ul>
      </li>
   </ul>
</body>

I am trying that code and getting error "maximum recursion depth exceeded while calling a Python object":

from bs4 import BeautifulSoup

parsed = BeautifulSoup(open("example.html"))

body = parsed.body

def all_nodes(obj):
    for node in obj:
        print node
        all_nodes(node)

print all_nodes(body)

I think that I should explain what I want to do with all this stuff later. I am writing test cases in FreeMind and I am trying to write tool which could create csv table for example with all test cases. But for now I am just trying to get all test cases as texts.

4

1 回答 1

2

Here's one way to do it. It's not that easy and pythonic though. Personally I don't like the solution, but it should be a good start for you. I bet there is a more beautiful and short way to do the same.

The idea is to iterate over all elements that don't have children. For every such element iterate recursively over it's parents until we hit body:

from bs4 import BeautifulSoup, Tag


data = """
your xml goes here
"""
soup = BeautifulSoup(data)
for element in soup.body.find_all():
    children = element.find_all()
    if not children:
        tag = Tag(name=element.name)
        tag.string = element.string
        for parent in element.parentGenerator():
            parent = Tag(name=parent.name)
            parent.append(tag)
            tag = parent
            if tag.name == 'body':
                break
        print tag

It produces:

<body><p>Example</p></body>
<body><ul><li><ul><li>1</li></ul></li></ul></body>
<body><ul><li><ul><li>2</li></ul></li></ul></body>
<body><ul><li><ul><li>3</li></ul></li></ul></body>
<body><ul><li><ul><li>4.1</li></ul></li></ul></body>
<body><ul><li><ul><li>4.2</li></ul></li></ul></body>

UPD (writing parent's text too):

soup = BeautifulSoup(data)
for element in soup.body.find_all():
    children = element.find_all()
    if not children:
        tag = Tag(name=element.name)
        tag.string = element.string
        for parent in element.parentGenerator():
            parent_tag = Tag(name=parent.name)
            if parent.string:
                parent_tag.string = parent.string
            parent_tag.append(tag)
            tag = parent_tag
            if tag.name == 'body':
                break
        print tag

Hope that helps.

于 2014-04-07T02:38:52.620 回答