How can I vectorize the following code (without changing the functions)?

sms['Digit'] = 0
sms['URL'] = 0
###THE FOR LOOP IS MAKING MY CODE VERY SLOW
for i in range(len(sms)):
    sms['Message'].iloc[i],sms['Digit'].iloc[i] = nm.remove_numbers(sms['Message'].iloc[i]) 
    sms['Message'].iloc[i],sms['URL'].iloc[i] = nm.trim_urls(sms['Message'].iloc[i])
sms['Message'] = sms['Message'].apply(nm.stem)
sms.head()

where the functions nm.remove_numbers and nm.trim_urls are defined as follows:

# removes large numbers from sms text
def remove_numbers(message):
    # identifies number with digits in [4,25]
    number_re = r"(?<!\d)\d{4,25}(?!\d)"
    numbers = re.findall(number_re, message)
    for number in numbers:
        message = message.replace(number, '')
    return message, len(numbers) > 0


# trims all urls in sms text down to their domain names
def trim_urls(message):
    # identifies if string is url
    url_re = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'  #a better version of this
    urls = re.findall(url_re, message)
    for url in urls:
        trimmed_url = url.split("//")[-1].split("/")[0].split('?')[0].replace('www.', '')
        message = message.replace(url, trimmed_url)
    return message, len(urls) > 0

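So each function returns a pair of values, and I want to unpack that pair and assign its elements to sms['Message'] and sms['Digit'] for the first function (the second one is analogous).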

I tried unpacking with *, but that raises an exception. So does any explicit assignment, for example:

sms['Message'],sms['Digit'] = sms['Message'].apply(nm.remove_numbers)

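Is there any way to get rid of my for loop and vectorize my code? Of course, if that can't be done and my only option is to edit the functions themselves, then please help me with that option instead.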
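
For reference, here is a minimal sketch of one way the pair could be unpacked without the explicit loop. The direct assignment above fails because apply returns a single Series of (message, flag) tuples, so unpacking it into two targets iterates over rows rather than over the two tuple positions. Transposing with zip(*...) is one possible workaround. The sample data below is made up for illustration, and remove_numbers is copied in so that the snippet runs on its own.

```python
import re

import pandas as pd


def remove_numbers(message):
    # Same behaviour as nm.remove_numbers in the question.
    number_re = r"(?<!\d)\d{4,25}(?!\d)"
    numbers = re.findall(number_re, message)
    for number in numbers:
        message = message.replace(number, '')
    return message, len(numbers) > 0


# Hypothetical sample data, not taken from the real sms frame.
sms = pd.DataFrame({'Message': ['call 123456 now', 'hello there']})

# apply yields one (message, flag) tuple per row; zip(*...) transposes
# those rows into two tuples, one per output column.
sms['Message'], sms['Digit'] = zip(*sms['Message'].apply(remove_numbers))

print(sms)
```

The same pattern should work for nm.trim_urls and the 'URL' column. Note that this is not vectorization in the NumPy sense: apply still calls the function once per row in Python, but it avoids the repeated chained .iloc assignments, which are likely the main cost of the loop. On an empty frame the unpacking raises a ValueError, so that case would need a guard.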
