python - CSV parsing for a book

Question

I'm having trouble with a project that parses a csv file containing the chapters, sections, and lessons of a textbook. The file looks like this:

Chapter, Section, Lesson  #this line shows how the book will be organized
Ch1Name, Secion1Name, Lesson1Name
Ch1Name, Secion2Name, Lesson1Name
Ch1Name, Secion2Name, Lesson2Name

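I am creating a Django model object for each section, and each section has a parent attribute that points to the section containing it. I can't figure out a way to iterate over the csv file so that the parent assignments come out correct. Any ideas on how to get started would be great.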

score 1 · Accepted Answer

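First, hopefully you're already using the csv module rather than trying to parse the file by hand.

Second, it's not entirely clear from your question, but it sounds like you're trying to build a simple tree structure from the data as you read it.

So, something like this?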

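import collections
import csv

with open('book.csv') as book:
    # defaultdict needs a callable factory, hence the lambda for the inner dict
    chapters = collections.defaultdict(lambda: collections.defaultdict(list))
    book.readline() # to skip the headers
    # skipinitialspace drops the blank that follows each comma in the sample file
    for chapter_name, section_name, lesson_name in csv.reader(book, skipinitialspace=True):
        chapters[chapter_name][section_name].append(lesson_name)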
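
To make the resulting shape concrete, here is a minimal sketch that runs the same kind of loop over the sample rows from the question, with io.StringIO standing in for the file. The build_tree helper and the SAMPLE constant are names made up for illustration, and skipinitialspace=True is an assumption about the file format (a blank after every comma).

```python
import collections
import csv
import io

# Sample rows from the question, held in memory instead of book.csv.
SAMPLE = """Chapter, Section, Lesson
Ch1Name, Secion1Name, Lesson1Name
Ch1Name, Secion2Name, Lesson1Name
Ch1Name, Secion2Name, Lesson2Name
"""

def build_tree(lines):
    """Return {chapter: {section: [lessons]}} from an iterable of CSV lines."""
    chapters = collections.defaultdict(lambda: collections.defaultdict(list))
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        chapters[chapter_name][section_name].append(lesson_name)
    return chapters

tree = build_tree(io.StringIO(SAMPLE))
for chapter_name, sections in tree.items():
    for section_name, lessons in sections.items():
        print(chapter_name, section_name, lessons)
```

Each chapter key maps to a dict of sections, and each section key maps to the list of lesson names in file order.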

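Of course, this assumes you want an "associative tree": a dict of dicts. A more ordinary linear tree, such as a list of lists, or an implicit tree in "parent pointer" form, is even simpler.

For example, suppose you have classes defined like this: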

class Chapter(object):
    def __init__(self, name):
        self.name = name

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

You need a dict for each, mapping names to objects. So:

import csv

with open('book.csv') as book:
    chapters, sections, lessons = {}, {}, {}
    book.readline() # to skip the headers
    for chapter_name, section_name, lesson_name in csv.reader(book, skipinitialspace=True):
        chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
        # Key on the full path: names such as Lesson1Name repeat across sections
        section = sections.setdefault((chapter_name, section_name),
                                      Section(chapter, section_name))
        lesson = lessons.setdefault((chapter_name, section_name, lesson_name),
                                    Lesson(section, lesson_name))

Now you can pick a random lesson and print its chapter and section:

import random

lesson = random.choice(list(lessons.values()))  # list() is required on Python 3
print('Chapter {}, Section {}: Lesson {}'.format(lesson.section.chapter.name,
                                                 lesson.section.name, lesson.name))
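
For the single-model case in the question, where every level is the same kind of object with a parent attribute, the same idea can be sketched with one class. Node, build_nodes, and SAMPLE are hypothetical names and not Django API; with a real model, the Node(...) call would be replaced by whatever creates and saves the object. Keys are full paths, so identically named lessons in different sections stay distinct.

```python
import csv
import io

class Node(object):
    """Hypothetical stand-in for a model with a self-referential parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

# Sample rows from the question, held in memory instead of a file.
SAMPLE = """Chapter, Section, Lesson
Ch1Name, Secion1Name, Lesson1Name
Ch1Name, Secion2Name, Lesson1Name
Ch1Name, Secion2Name, Lesson2Name
"""

def build_nodes(lines):
    """Return a dict mapping each path tuple to its Node, parents first."""
    nodes = {}
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        parent = None
        # Walk the row left to right, so each level's parent already exists.
        for depth in range(1, len(row) + 1):
            path = tuple(row[:depth])
            if path not in nodes:
                nodes[path] = Node(row[depth - 1], parent)
            parent = nodes[path]
    return nodes

nodes = build_nodes(io.StringIO(SAMPLE))
lesson = nodes[("Ch1Name", "Secion2Name", "Lesson2Name")]
print(lesson.name, lesson.parent.name, lesson.parent.parent.name)
```

Because each row is processed from the outermost column inward, a node is only ever created after its parent, which is the ordering a parent foreign key would need.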

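One last thing to keep in mind: in this example, the parent references don't create any circular references, because the parents hold no references to their children. But what if you need that?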

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name
        self.lessons = {}

# ...

chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
section = sections.setdefault((chapter_name, section_name), Section(chapter, section_name))
chapter.sections[section_name] = section  # register the child on its parent

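So far, so good... but what happens when you're done with all these objects? They have circular references, which can cause problems for garbage collection. These are not insurmountable problems, but it does mean that on most implementations the objects won't be collected as promptly. For example, in CPython, things are normally collected as soon as the last reference goes out of scope; but if you have circular references, that never happens, so nothing is collected until the next pass of the cycle detector. The solution is to use a weakref for the parent pointer (or a collection of weakrefs for the child pointers).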
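
A minimal sketch of the weak parent pointer, assuming the Chapter and Section classes from above. The _chapter attribute and the chapter property are illustrative choices, not the only way to wire this up.

```python
import gc
import weakref

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}  # strong references from parent to children

class Section(object):
    def __init__(self, chapter, name):
        self._chapter = weakref.ref(chapter)  # weak reference back to the parent
        self.name = name
        chapter.sections[name] = self

    @property
    def chapter(self):
        # Returns the Chapter, or None once it has been collected.
        return self._chapter()

chapter = Chapter("Ch1Name")
section = Section(chapter, "Secion1Name")
alive = section.chapter is chapter

del chapter   # drop the last strong reference to the parent
gc.collect()  # not required in CPython for this case; may matter elsewhere
gone = section.chapter is None
print(alive, gone)
```

The trade-off is that a child can outlive its parent, so any code that reads section.chapter has to handle None.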
