python - Python help - dictionaries, keys, values

Question

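I am trying to write a program and am having a lot of trouble. Here are my instructions: For this program you will create a simple database from some US census data. The database will consist of a dictionary whose keys are state names and whose values are lists of the population for each year from 1900 to 1990. Once the database is created, you will write a simple command-driven program that prompts the user for a state name and a year and then reports the population of that state in that year. Your program will do this until the user types any word beginning with "q" or "Q".

The census data is here: http://www.census.gov/population/www/censusdata/files/urpop0090.txt I have saved all of it to a flat ASCII file named "database".

Take some time to study the file. It contains some extraneous information (at least for our purposes). You will need to develop a strategy for extracting exactly the information you need from the file to put into the database (dictionary).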

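Here is the pattern I came up with to describe the information needed:

A line contains state data when it begins with 6 spaces followed by a capital letter. The state name ends where two consecutive spaces follow it.
If you have a line containing state data, you can find the first total population on that line by going to character 43 and backing up until you find a space.
If you have a line containing state data, you can find the second total population on that line by going to character 101 and backing up until you find a space.
If you have a line containing state data, you can find the third total population on that line by going to character 159 and backing up until you find a space.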
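
The pattern above could be sketched roughly as follows (Python 3). This is only an illustration: the helper names `is_state_line`, `state_name` and `field_ending_at` are made up for this sketch, and `end` is treated as an exclusive slice index, so the exact value to pass for each column should be checked against the real file.

```python
import re

# Six leading spaces followed by a capital letter marks a state line,
# following the pattern described above.
STATE_LINE = re.compile(r"^ {6}[A-Z]")


def is_state_line(line):
    """Return True if the line looks like it carries state data."""
    return STATE_LINE.match(line) is not None


def state_name(line):
    """Return the text after the 6-space indent, up to the first double space."""
    body = line[6:]
    end = body.find("  ")
    name = body if end == -1 else body[:end]
    return name.strip()


def field_ending_at(line, end):
    """Take everything before index `end`, then back up to the previous space."""
    return line[:end].split(" ")[-1]
```

Because the name ends at the first double space, a name with a single internal space such as "New York" should come through in one piece.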

Here is what I have so far:

# gets rid of commas in the populations
def convert_string_to_number(comma_string):
    number = comma_string.replace(",", "")
    parts = number.split(".")  # check for a decimal point
    if len(parts) == 1 and parts[0].isdigit():  # we really have an integer
        number = float(parts[0])
    elif len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():  # float
        number = float(parts[0] + "." + parts[1])
    else:
        number = None
    return number


def getsub(str, endindex):
    sublist = str[:endindex].split(' ')
    substring = sublist[-1]
    return substring


def main():
    data = open('database', 'r')
    lines = data.readlines()

    for line in lines:
        # Now do the line processing.
        if line.startswith('      '):
            # Now process the state data
            firsttotalpop = getsub(line, 42)
            secondtotalpop = getsub(line, 100)
            thirdtotalpop = getsub(line, 158)

    return 0

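I am having some trouble figuring out how to actually create the dictionary with keys and values, and how to attach the population values to the state-name keys. I am also not sure how to take user input and use it as a key. Nor am I sure whether the code above gets the state name and population information correctly.

Any suggestions/help would be greatly appreciated!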

score 1 · Accepted Answer

To create a dict you would do something like this:

censusvalues = {}
censusvalues['CA'] = {}
censusvalues['CA']['1960'] = <1960 census value>

You can populate the dict from the data you extract:

censusvalues['CA'] = {}
censusvalues['CA']['1960'] = 456
censusvalues['CA']['1970'] = 789
>>> censusvalues
{'CA': {'1960': 456, '1970': 789}}

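To prompt the user for the state name and year (raw_input is the Python 2 name; in Python 3 the same function is called input):

state = raw_input("Enter the state: ")
year = raw_input("Enter the year: ")

Then you would do something like:

censusvalues[state][year]

to print the output.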
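
Putting the prompt and the lookup together, a command loop might look something like the sketch below (Python 3). The names `lookup` and `run` are hypothetical, the `read` and `write` parameters exist only so the loop can be exercised without a keyboard, and the years are assumed to be stored as string keys as in the snippets above.

```python
def lookup(censusvalues, state, year):
    """Return the population for state/year, or None if either key is missing."""
    return censusvalues.get(state, {}).get(year)


def run(censusvalues, read=input, write=print):
    """Prompt for a state and a year until the user types a word starting with q or Q."""
    while True:
        state = read("Enter the state (q to quit): ").strip()
        if state.lower().startswith("q"):
            break
        year = read("Enter the year (q to quit): ").strip()
        if year.lower().startswith("q"):
            break
        population = lookup(censusvalues, state, year)
        if population is None:
            write("No data for %s in %s" % (state, year))
        else:
            write("%s had %s people in %s" % (state, population, year))
```

Calling `run(censusvalues)` after the dict has been filled would start the prompt. Using `.get` rather than `censusvalues[state][year]` means a mistyped state or year produces a message instead of a KeyError.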

I'll address some of the problems I see in your code here (with these edits, make sure to import re at the top):

import re

def main():
    data = open('database', 'r')
    lines = data.readlines()
    year = 0
    censusvalues = {}
    for line in lines:
        # Now do the line processing.
        # The first thing you need to do here is see which years
        # you are about to grab data from.  To do this, you need to figure out
        # how to extract that from the file.  Every line that has a year in it
        # is prefixed by the same number of spaces followed by a number, so you
        # can get it that way:
        if re.match('<insert number of spaces here...too lazy to count>[0-9]', line):
            year = int(line[<number of spaces>:].strip())
            continue

        if line.startswith('      '):
            # Now process the state data
            <you need to insert code here to grab the state name>

            firsttotalpop = getsub(line, 42)
            secondtotalpop = getsub(line, 100)
            thirdtotalpop = getsub(line, 158)
            censusvalues[state][year] = firsttotalpop
            censusvalues[state][year-10] = secondtotalpop
            censusvalues[state][year-20] = thirdtotalpop
    return 0

Finally, you need to consider what happens when you only have one year instead of three. I'll leave that as an exercise for you...
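
One possible way to approach that exercise, as a sketch only (Python 3): loop over the column end positions and skip any column that is missing or blank. This assumes that a block with fewer years simply has shorter lines or empty columns, which would need to be checked against the file. The helper names and the default `ends=(44, 102, 160)` are assumptions for illustration, not values confirmed from the data.

```python
def to_int(text):
    """Convert '1,234,567' to 1234567; return None for anything that is not a number."""
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def record_populations(censusvalues, state, year, line, ends=(44, 102, 160)):
    """Store up to three populations from one line, skipping missing columns.

    `ends` holds assumed exclusive end indices for the three columns; the
    columns are taken to be for `year`, `year - 10` and `year - 20`.
    """
    line = line.rstrip("\n")
    row = censusvalues.setdefault(state, {})
    for i, end in enumerate(ends):
        if len(line) < end:
            continue  # the line stops before this column
        value = to_int(line[:end].split(" ")[-1])
        if value is not None:  # a blank column yields None and is skipped
            row[year - 10 * i] = value
```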

Edit: One more thing: before trying to add a K/V pair to it, you also need to check that the inner dict exists... maybe like this:

if <state> not in censusvalues:
    censusvalues[<state>] = {}
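
As an alternative to the explicit check, the standard library can create the inner dict on demand. A small sketch (the population numbers are placeholders, not real census values):

```python
from collections import defaultdict

# defaultdict(dict) builds an empty inner dict the first time a state is used.
censusvalues = defaultdict(dict)
censusvalues['CA'][1960] = 456
censusvalues['CA'][1970] = 789

# With a plain dict, setdefault does the same thing in one step.
plain = {}
plain.setdefault('NY', {})[1960] = 123
```

One thing to watch with `defaultdict`: merely looking up a state that is not there, for example `censusvalues['ZZ']`, also creates an empty entry, so `setdefault` on a plain dict may be the safer choice for the lookup side of the program.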

score 0 · Accepted Answer

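As for creating the dictionary:

my_dict = {}
my_dict['Texas'] = [1,2,5,10,2000] #etc etc 
my_dict['Florida'] = [2,3,6,10,1000] #etc etc

You can also do this:

temp = 'Florida'
print(my_dict[temp])

You can store the data however you need to, but the general syntax is dict[key] = value. The key can be an integer or a string (a string in your case), and the value can be almost any data structure (a list, an int, a string, a list of ints, even another dict, or a list of dicts... you get the idea).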
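
If you go with a list per state as the assignment describes, you need some way to turn a year into a list position. A minimal sketch (Python 3), assuming the data has one value per decade from 1900 to 1990; the helper names are hypothetical:

```python
# Assumed layout: one slot per decade, 1900, 1910, ..., 1990.
YEARS = list(range(1900, 2000, 10))


def year_index(year):
    """Map a year to its position in the per-state list (ValueError if absent)."""
    return YEARS.index(year)


def set_population(db, state, year, population):
    """Store a population, creating the state's list of empty slots if needed."""
    row = db.setdefault(state, [None] * len(YEARS))
    row[year_index(year)] = population


def get_population(db, state, year):
    """Return the stored population for the given state and year."""
    return db[state][year_index(year)]
```

Slots that have not been filled yet stay `None`, which makes gaps in the data easy to spot.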

score 0 · Accepted Answer

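Given: we know that population one starts at character 34, because no state has more than 100 million people. We know that population one ends at character 44.

However, some states have fewer than ten million people, so their population must start at character 35 or 36. Does that matter? No.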

# where line is a line containing STATE information
def get_population_one( line ):
    populationOne = line[34:44]
    populationOne = populationOne.replace(',','') # remove the commas
    # remove any spaces, for states with fewer than 10 million people
    populationOne = populationOne.replace(' ', '')
    return int(populationOne) # convert the string to an integer

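Then for population two and population three, you just change the indices into the state line and use the same logic as above.

This can all be done in one line: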

 def get_population_one(line):
     return int(line[34:44].replace(',', '').strip())
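
To avoid writing three near-identical functions, the end index can be passed in as a parameter. A sketch (Python 3), where the end positions 102 and 160 are an assumption made by stepping 58 characters from 44, matching the spacing of the character positions quoted in the question; they should be verified against the file:

```python
# Assumed exclusive end indices of the three population columns.
POPULATION_ENDS = (44, 102, 160)


def get_population(line, end, width=10):
    """Return the integer in the `width` characters that end just before `end`."""
    return int(line[end - width:end].replace(',', '').strip())


def get_populations(line):
    """Return the three populations on a state line, first column first."""
    return [get_population(line, end) for end in POPULATION_ENDS]
```

Note that `int` raises a ValueError on a blank column, so lines with fewer than three populations would need separate handling.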
