
I optimized this expression in my Python console:

texts = re.findall(r"text[^>]*\>(?P<text>(:?[^<]|</\s?[^tT])*)\</text", text)

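It works well: when I run it in the console it executes almost instantly, but when I put it into my code and run it through the interpreter, it seems to hang.

I tested it again in the console, and again it ran in less than a second.

I verified that the blocking statement is the regex execution, and that the text being processed is the same in both cases.

What is going on?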

-------------------- Code --------------------

import re

class Wiki:
    # Regex definition (pattern kept exactly as posted)
    search_text_regex = re.compile(r"text[^>]*\>(?P<text>(:?[^<]|</\s?[^tT])*)\</text")

    def search_by_title(self, name, text):
        """Search the slice (the last) of the text that contains the exact name
        and return the slice index.
        """
        print("Backoff Launched:")
        # Extract the text from Wikipedia pages
        print("\tExtracting Texts from pages...")
        texts = self.search_text_regex.findall(text)  # <= The Regex Launch
        # Find the name in the text
        print("\tFinding names on text...")
        for index, text in enumerate(texts):
            if name in text:
                return index
        return None

-------------------- Source --------------------

<page><title>Andrew Johnson</title><id>1624</id><revision><id>244612901</id><timestamp>2008-10-11T18:30:44Z</timestamp><contributor><username>Excirial</username><id>5499713</id></contributor><minor/><comment>Reverted edits by [[Special:Contributions/71.113.103.209|71.113.103.209]] to last version by Soliloquial ([[WP:HG|HG]])</comment><text xml:space="preserve">{{otherpeople2|Andrew Johnson (disambiguation)}}
{{Infobox President
|name=Andrew Johnson
|nationality=American
|image=Andrew Johnson - 3a53290u.png
|caption=President Andrew Johnson, taken in 1865 by [[Mathew Brady|Matthew Brady]].
|order=17th [[President of the United States]]
|vicepresident=none
|term_start=April 15, 1865
|term_end=March 4, 1869
|predecessor=[[Abraham Lincoln]]
|successor=[[Ulysses S. Grant]]
|birth_date={{birth date|mf=yes|1808|12|29}}
|birth_place=[[Raleigh, North Carolina]]
|death_date={{death date and age|mf=yes|1875|7|31|1808|12|29}}
|death_place=[[Elizabethton, Tennessee]]
|spouse=[[Eliza McCardle Johnson]]
|occupation=[[Tailor]]
|party=[[History of the Democratic Party (United States)|Democratic]] until 1864 and after 1869; elected Vice President in 1864 on a [[National Union Party (United States)|National Union]] ticket; no party affiliation 1865–1869
|signature=Andrew Johnson Signature.png
|order2=16th [[Vice President of the United States]]
|term_start2=March 4, 1865
|term_end2=April 15, 1865
|president2=[[Abraham Lincoln]]
|predecessor2=[[Hannibal Hamlin]]
|successor2=[[Schuyler Colfax]]
|jr/sr3=United States Senator
|state3=[[Tennessee]]
|term_start3=October 8, 1857
|term_end3=March 4, 1862
|preceded3=[[James C. Jones]]
|succeeded3=[[David T. Patterson]]
|term_start4=March 4, 1875
|term_end4=July 31, 1875
|preceded4=[[William Gannaway Brownlow|William G. Brownlow]]
|succeeded4=[[David M. Key]]
|order5=17th
|title5=[[Governor of Tennessee]]
|term_start5=October 17, 1853
|term_end5=November 3, 1857
|predecessor5=[[William B. Campbell]]
|successor5=[[Isham G. Harris]]
|religion=[[Christian]] (no denomination; attended Catholic and Methodist services)<ref>[http://www.adherents.com/people/pj/Andrew_Johnson.html Adherents.com: The Religious Affiliation of Andrew Johnson]</ref>
}}
Johnson was nominated for the [[Vice President of the United States|Vice President]] slot in 1864 on the [[National Union Party (United States)|National Union Party]] ticket.  He and Lincoln were [[United States presidential election, 1864|elected in November 1864]].  Johnson succeeded to the Presidency upon Lincoln's assassination on April 15, 1865.

==Bibliography==
{{portal|Tennessee}}
{{portal|United States Army|United States Department of the Army Seal.svg}}
{{portal|American Civil War}}
* Howard K. Beale, ''The Critical Year. A Study of Andrew Johnson and Reconstruction'' (1930). ISBN 0-8044-1085-2
*  Winston; Robert W. ''Andrew Johnson: Plebeian and Patriot'' (1928) [http://www.questia.com/PM.qst?a=o&d=3971949 online edition]

===Primary sources===
* Ralph W. Haskins, LeRoy P. Graf, and Paul H. Bergeron et al, eds. ''The Papers of Andrew Johnson'' 16 volumes; University of Tennessee Press, (1967–2000). ISBN 1572330910.) Includes all letters and speeches by Johnson, and many letters written to him. Complete to 1875.
* [http://www.impeach-andrewjohnson.com/ Newspaper clippings, 1865–1869]
* [http://www.andrewjohnson.com/09ImpeachmentAndAcquittal/ImpeachmentAndAcquittal.htm Series of [[Harper's Weekly]] articles covering the impeachment controversy and trial]
*[http://starship.python.net/crew/manus/Presidents/aj2/aj2obit.html Johnson's obituary, from the ''New York Times'']

==Notes==
{{reflist|2}}

==External links==
{{sisterlinks|s=Author:Andrew Johnson}}
*{{gutenberg author|id=Andrew+Johnson | name=Andrew Johnson}}
{{s-start}}
{{s-par|us-hs}}
{{s-aft|after=[[Ulysses S. Grant]]}}
{{s-par|us-sen}}
{{s-bef|before=[[James C. Jones]]}}
{{s-ttl|title=[[List of United States Senators from Tennessee|Senator from Tennessee (Class 1)]]|years=October 8, 1857{{ndash}} March 4, 1862|alongside=[[John Bell (Tennessee politician)|John Bell]], [[Alfred O. P. Nicholson]]}}
{{s-vac|next=[[David T. Patterson]]|reason=[[American Civil War|Secession of Tennessee from the Union]]}}
{{s-bef|before=[[William Gannaway Brownlow|William G. Brownlow]]}}
{{s-ttl|title=[[List of United States Senators from Tennessee|Senator from Tennessee (Class 1)]]| years=March 4, 1875{{ndash}} July 31, 1875|alongside=[[Henry Cooper (U.S. Senator)|Henry Cooper]]}}
{{s-aft|after=[[David M. Key]]}}
{{s-ppo}}
{{s-bef|before=[[Hannibal Hamlin]]}}
{{s-ttl|title=[[List of United States Republican Party presidential tickets|Republican Party¹ vice presidential candidate]]|years=[[U.S. presidential election, 1864|1864]]}}

{{Persondata
|NAME= Johnson, Andrew
|ALTERNATIVE NAMES=
|SHORT DESCRIPTION= seventeenth [[President of the United States]]<br/> [[Union (American Civil War)|Union]] [[Union Army|Army]] [[General officer|General]]
|DATE OF BIRTH={{birth date|mf=yes|1808|12|29|mf=y}}
|PLACE OF BIRTH= [[Raleigh, North Carolina]]
|DATE OF DEATH={{death date|mf=yes|1875|7|31|mf=y}}
|PLACE OF DEATH= [[Greeneville, Tennessee]]
}}

{{Lifetime|1808|1875|Johnson, Andrew}}
[[Category:Presidents of the United States]]
[[vi:Andrew Johnson]]
[[tr:Andrew Johnson]]
[[uk:Ендрю Джонсон]]
[[ur:انڈریو جانسن]]
[[yi:ענדרו זשאנסאן]]
[[zh:安德鲁·约翰逊]]</text></revision></page>

2 Answers


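I solved it. The code has a text-cleaning pipeline, and that pipeline removed some of the tags the regex needs in order to match. Because the text is so long, a search that cannot match takes far too long.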
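
To illustrate the point above: when the closing tag has been stripped, a backtracking regex may try a very large number of ways to consume the text before it gives up. Below is a minimal sketch of one way to sidestep that, using plain str.find instead of a regex, so a missing tag is detected in a single pass. The function name extract_texts is hypothetical (it is not part of the original code), and the sketch assumes simple markup: literal <text ...> and </text> tags, no nesting, and no self-closing <text/> elements.

```python
def extract_texts(dump):
    """Return the bodies of all <text ...>...</text> elements in dump.

    Hypothetical helper for illustration only. Assumes non-nested,
    non-self-closing <text> elements; "<text" would also match a tag
    such as <textarea>, which this sketch does not try to exclude.
    """
    close_tag = "</text>"
    texts = []
    pos = 0
    while True:
        start = dump.find("<text", pos)
        if start == -1:
            break  # no more opening tags
        body_start = dump.find(">", start)
        if body_start == -1:
            break  # opening tag is never closed with ">"
        body_start += 1
        end = dump.find(close_tag, body_start)
        if end == -1:
            break  # closing tag missing: stop right away, nothing to backtrack
        texts.append(dump[body_start:end])
        pos = end + len(close_tag)
    return texts


page = '<page><text xml:space="preserve">Andrew Johnson</text></page>'
found = extract_texts(page)
stripped = extract_texts('<page><text xml:space="preserve">Andrew Johnson</page>')
```

Here found holds the single body "Andrew Johnson", while stripped is an empty list because the closing tag is gone. Each str.find call scans forward from the current position only, so the total work stays proportional to the length of the input whether or not the tags are present.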

Answered 2012-06-06T19:37:16.073

I would use this:

result = re.findall(r"(?s)<text[^>]*>(?P<text>(?:(?!</?text>).)*)</text>", subject)

(?:(?!</?text>).)* consumes one character at a time, but only after the lookahead has verified that it is not the first character of a <text> or </text> tag.
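
As a small sketch of how that pattern behaves, the snippet below runs it on a made-up sample string (the sample and the variable names are assumptions for illustration, not data from the question). It shows that other tags inside the body, such as <b>, are consumed normally, and that matching stops at </text>.

```python
import re

# Same pattern as above; (?s) lets "." match newlines as well.
pattern = re.compile(r"(?s)<text[^>]*>(?P<text>(?:(?!</?text>).)*)</text>")

# Hypothetical sample input, much shorter than a real Wikipedia dump.
sample = '<page><text xml:space="preserve">line one\nline <b>two</b></text></page>'

# Collect the named group "text" from every match.
matches = [m.group("text") for m in pattern.finditer(sample)]

# With the closing tag missing there is simply no match.
no_close = pattern.findall("<page><text>never closed</page>")
```

Because this pattern has only one capturing group, re.findall returns a list of plain strings, so a check like name in text compares against the string itself. A pattern with more than one capturing group makes re.findall return a list of tuples instead.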

Answered 2012-05-23T21:12:18.100