python - Create a list from DataFrame Column A where Column A items appear in the text of Column B, to build a binary tree

Question

I am trying to build a binary tree with the anytree module ( https://anytree.readthedocs.io/en/latest/ ) from a DataFrame in which items from Column A appear inside the string text of Column B:

|   | Column A    |          Column B          |
|---|-------------|----------------------------|
| 0 | foo         | prelim_foo                 |
| 1 | bar         | nz(0, prelim_bar)          |
| 2 | beyond      | iif(foo>1, foo - bar, bar) |
| 3 | recognition | bar - beyond               |

I want to create a list from Column A based on whether any Column A item appears in Column B. The desired output looks something like this:

|   | Column A    |          Column B          |  Column C     |
|---|-------------|----------------------------|---------------|
| 0 | foo         | prelim_foo                 | [foo]         |
| 1 | bar         | nz(0, prelim_bar)          | [bar]         |
| 2 | beyond      | iif(foo>1, foo - bar, bar) | [foo, bar]    |
| 3 | recognition | bar - beyond               | [bar, beyond] |

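I referred to these posts (Read data from a file and create a tree using anytree in python, Read data from a pandas DataFrame and create a tree using anytree in python) to create a preliminary tree node structure, but I am having trouble turning the contents of Column B into usable nodes for branches beyond the second level.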

I can detect whether Column B contains an item from Column A:

df['AinB'] = df['Column B'].str.contains('|'.join(df['Column A']), case=False)

But I cannot find a way to look back up through the Column A series and place the matching items in a Python list on the same row as Column B.
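
One possible way to get those per-row lists is to reuse the same joined pattern with `re.findall` instead of a boolean `contains`. The sketch below is only an illustration under assumptions: it works on plain Python lists holding the sample values from the table, `find_names` and `canonical` are hypothetical names, and it matches plain substrings (so `prelim_foo` counts as containing `foo`, as in the desired output).

```python
import re

# Sample values copied from the question's table (plain lists stand in for the DataFrame columns).
column_a = ["foo", "bar", "beyond", "recognition"]
column_b = [
    "prelim_foo",
    "nz(0, prelim_bar)",
    "iif(foo>1, foo - bar, bar)",
    "bar - beyond",
]

# Try longer names first so a short name cannot pre-empt a longer name that contains it.
names = sorted(column_a, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(n) for n in names), flags=re.IGNORECASE)

# Map lower-cased matches back to the spelling used in Column A.
canonical = {n.lower(): n for n in column_a}

def find_names(text):
    """Return the Column A names found in text, de-duplicated, in order of first appearance."""
    hits = (canonical[m.lower()] for m in pattern.findall(text))
    return list(dict.fromkeys(hits))

column_c = [find_names(text) for text in column_b]
print(column_c)
```

With the DataFrame from the question, the same function could presumably be applied row by row with `df['Column C'] = df['Column B'].apply(find_names)`, after building `pattern` and `canonical` from `df['Column A']`.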

Ultimately I want to use these lists to build a tree like this:

foo
├── foo
bar
├── bar
beyond
├── foo
└── bar
     └── recognition

Or perhaps I am not thinking correctly about where the parent/child node structure fits, and it should be organized like this:

foo
├── foo
bar
├── bar
beyond
├── foo
└── bar
recognition
├── bar
└── beyond
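
To illustrate how branches beyond the second level might fall out of the per-row lists, here is a minimal pure-Python sketch that expands each child recursively. It is not anytree code: `deps`, `expand`, and `render` are hypothetical names, the input is the Column C lists from the desired output, and self-matches (such as `foo` found in `prelim_foo`) are skipped, which is an assumption made so the recursion terminates.

```python
# Assumed input: the Column C lists from the desired output, keyed by Column A.
deps = {
    "foo": ["foo"],
    "bar": ["bar"],
    "beyond": ["foo", "bar"],
    "recognition": ["bar", "beyond"],
}

def expand(name, deps, path=()):
    """Return (name, children), with each child expanded recursively."""
    path = path + (name,)
    children = [
        expand(child, deps, path)
        for child in deps.get(name, [])
        if child not in path  # assumption: skip self-references and cycles
    ]
    return (name, children)

def render(node, prefix="", is_root=True, is_last=True):
    """Return the tree as a list of text lines using box-drawing connectors."""
    name, children = node
    if is_root:
        lines = [name]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + name]
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(children):
        lines += render(child, child_prefix, False, i == len(children) - 1)
    return lines

tree = expand("recognition", deps)
print("\n".join(render(tree)))
```

The same parent/child pairs could be handed to anytree by creating `Node(name, parent=parent_node)` for each expanded child and printing with `RenderTree`; whether this nesting or the flat layout above is the right model depends on what the tree is meant to represent.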
