I'm having difficulty understanding the import statement and its variations.

Suppose I'm using the lxml module for scraping websites.

The following examples show...

from lxml.html import parse
parse( 'http://somesite' )

...Google's Python style guide prefers the basic import statement, to preserve namespaces.

I'd prefer to do that, but when I try this:

import lxml
lxml.html.parse( 'http://somesite' )

...then I get the following error message:

AttributeError: 'module' object has no attribute 'html'

Can anyone help me understand what is going on? I'd much prefer to use modules within their namespaces, but need some assistance understanding the semantics.


3 Answers

import lxml.html as LH
doc = LH.parse('http://somesite')

lxml.html is a module. When you import lxml, the html module is not imported into the lxml namespace. This is a developer's decision. Some packages automatically import some modules, some don't. In this case, you have to do it yourself with import lxml.html.

import lxml.html as LH imports the html module and binds it to the name LH in the current module's namespace. So you can access the parse function with LH.parse.
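As a rough illustration of the two spellings, here is a sketch that uses the standard library's xml.etree.ElementTree in place of lxml.html (a substitution made only so the snippet runs without lxml installed). It shows that importing the submodule by its full dotted name keeps the namespace the question asks for, and that the "as" form binds the same module object to a shorter name.

```python
# Importing the submodule by its dotted name binds the top-level name "xml"
# and makes xml.etree.ElementTree reachable through it.
import xml.etree.ElementTree

# The "as" form imports the same submodule but binds it to the name ET.
import xml.etree.ElementTree as ET

markup = "<root><item/></root>"

doc_full = xml.etree.ElementTree.fromstring(markup)   # fully qualified access
doc_short = ET.fromstring(markup)                     # access through the alias

# Both names refer to one and the same module object.
same_module = ET is xml.etree.ElementTree
print(same_module, doc_full.tag, doc_short.tag)
```

By analogy, import lxml.html followed by lxml.html.parse(...) should work in the same way as the LH alias above.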

If you want to delve deeper into when a package (like lxml) imports modules (like lxml.html) automatically, open an IPython session and type

In [16]: import lxml

In [17]: lxml
Out[17]: <module 'lxml' from '/usr/lib/python2.7/dist-packages/lxml/__init__.pyc'>

Here you see the path to the lxml package's __init__.py file. If you look at its contents, you will find it is empty, so no submodules are imported. If you look in numpy's __init__.py, you see lots of code, amongst which is

import linalg
import fft
import polynomial
import random
import ctypeslib
import ma

These are all submodules which are imported into the numpy namespace. So from a user's perspective, import numpy automatically gives you access to numpy.linalg, numpy.fft, etc.
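The same check can be tried on a package that ships with Python. The sketch below uses the standard library's json package as a stand-in for numpy (my choice, not the answer's): its __init__.py imports names from its decoder and encoder submodules, so those submodules appear as attributes after a bare import json.

```python
import json
import os

# __file__ points at the package's __init__.py, as in the lxml session above.
init_path = json.__file__
init_name = os.path.basename(init_path)
print(init_path)

# json/__init__.py imports from .decoder and .encoder, so both submodules
# become attributes of the package without any extra import statement.
has_decoder = hasattr(json, "decoder")
has_encoder = hasattr(json, "encoder")
print(has_decoder, has_encoder)
```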

Answered 2012-10-26T20:07:47.577

Let's take the example of a package pkg with two modules in it, a.py and b.py:

pkg
   | -- a.py
   | -- b.py
   | -- __init__.py

In __init__.py you import a but not b:

from . import a   # explicit relative import; a bare "import a" only works on Python 2

So if you open your terminal and do:

>>> import pkg
>>> pkg.a
>>> pkg.b
AttributeError: 'module' object has no attribute 'b'

As you can see, because we imported a.py in pkg's __init__.py, we were able to access it as an attribute of pkg, but b is not there. To access the latter we should use:

>>> import pkg.b   # OR: from pkg import b
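To try this without creating files by hand, here is a minimal sketch that writes such a package into a temporary directory and repeats the session. The package name demo_pkg and the NAME constants are made up for the illustration, and the __init__.py uses an explicit relative import so that it also runs on Python 3.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk ("demo_pkg" is a hypothetical name).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from . import a\n")      # the package imports a, but not b
with open(os.path.join(pkg_dir, "a.py"), "w") as f:
    f.write("NAME = 'a'\n")
with open(os.path.join(pkg_dir, "b.py"), "w") as f:
    f.write("NAME = 'b'\n")

# Make the temporary directory importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

a_before = hasattr(demo_pkg, "a")     # a was imported by __init__.py
b_before = hasattr(demo_pkg, "b")     # b has not been imported yet

import demo_pkg.b                     # explicit import of the submodule

b_after = hasattr(demo_pkg, "b")      # now b is an attribute of the package
print(a_before, b_before, b_after)
```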


Answered 2012-10-26T20:23:00.623

When you import a package, the interpreter looks it up on the module search path (sys.path) and, if it is found, runs the package's __init__.py, builds a module object for the package from it, and inserts that object into sys.modules. Importing a plain module works the same way, except that the module's own file is run. When you subsequently access an attribute (a function, class, submodule, or subpackage), Python attempts a getattr on the module or package object for the child you want. However, if the child is a submodule or subpackage that has not yet been imported, it has been added neither to sys.modules nor to the parent's attributes, so you get an AttributeError. Thus, you have to import a submodule or subpackage explicitly, either in your own code or in the package's __init__.py, for it to be available at runtime as an attribute of its parent.
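The bookkeeping described here can be observed directly. The following is only a sketch: it creates a temporary package whose names (sketch_pkg, child, make_package) are hypothetical, with an empty __init__.py, and checks sys.modules and the parent's attributes before and after importing the submodule.

```python
import importlib
import os
import sys
import tempfile


def make_package(name, submodules):
    """Create a package with an empty __init__.py; return its parent directory."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    # An empty __init__.py imports nothing, as in the lxml case.
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    for sub in submodules:
        with open(os.path.join(pkg_dir, sub + ".py"), "w") as f:
            f.write("VALUE = 42\n")
    return root


sys.path.insert(0, make_package("sketch_pkg", ["child"]))
importlib.invalidate_caches()

parent = importlib.import_module("sketch_pkg")
parent_registered = sys.modules["sketch_pkg"] is parent

# Before the submodule is imported it is neither registered nor an attribute.
in_modules_before = "sketch_pkg.child" in sys.modules
attr_before = hasattr(parent, "child")

child = importlib.import_module("sketch_pkg.child")

# Importing it registers the module and binds it on the parent package.
in_modules_after = "sketch_pkg.child" in sys.modules
same_object = parent.child is sys.modules["sketch_pkg.child"]
print(in_modules_before, attr_before, in_modules_after, same_object)
```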

Answered 2012-10-26T20:15:07.997