How can I vectorize the following code (without changing the functions)?
sms['Digit'] = 0
sms['URL'] = 0
### THE FOR LOOP IS MAKING MY CODE VERY SLOW
for i in range(len(sms)):
    sms['Message'].iloc[i], sms['Digit'].iloc[i] = nm.remove_numbers(sms['Message'].iloc[i])
    sms['Message'].iloc[i], sms['URL'].iloc[i] = nm.trim_urls(sms['Message'].iloc[i])
sms['Message'] = sms['Message'].apply(nm.stem)
sms.head()
where the functions nm.remove_numbers and nm.trim_urls are as follows:
import re

# removes large numbers from sms text
def remove_numbers(message):
    # identifies number with digits in [4,25]
    number_re = r"(?<!\d)\d{4,25}(?!\d)"
    numbers = re.findall(number_re, message)
    for number in numbers:
        message = message.replace(number, '')
    return message, len(numbers) > 0

# trims all urls in sms text down to their domain names
def trim_urls(message):
    # identifies if string is url
    url_re = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'  # a better version of this
    urls = re.findall(url_re, message)
    for url in urls:
        trimmed_url = url.split("//")[-1].split("/")[0].split('?')[0].replace('www.', '')
        message = message.replace(url, trimmed_url)
    return message, len(urls) > 0
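So each function returns a pair of values, and I want to unpack that pair and assign its elements to sms['Message'] and sms['Digit'] for the first function (the second one works the same way, with sms['URL']).
I tried unpacking with *, but that raises an exception. So does any explicit assignment such as: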
sms['Message'],sms['Digit'] = sms['Message'].apply(nm.remove_numbers)
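Is there any way to get rid of my for loop and vectorize my code? Of course, if that cannot be done and my only option is to edit the functions themselves, then please just help me with that option.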
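
For reference, here is a minimal sketch of the kind of loop-free version I am after. It assumes that apply returns a Series of (message, flag) tuples, which zip(*...) can then transpose into two sequences. The sample DataFrame is made up for illustration, and remove_numbers is copied here only as a stand-in for nm.remove_numbers. This still calls the function once per row, so it removes the slow per-row .iloc assignment rather than being true NumPy-style vectorization.

```python
import re
import pandas as pd

# Stand-in for nm.remove_numbers, copied from the question.
def remove_numbers(message):
    number_re = r"(?<!\d)\d{4,25}(?!\d)"
    numbers = re.findall(number_re, message)
    for number in numbers:
        message = message.replace(number, '')
    return message, len(numbers) > 0

# Hypothetical sample data, only for illustration.
sms = pd.DataFrame({'Message': ['call 12345 now', 'no digits here', 'pin 9876']})

# apply() yields one (message, flag) tuple per row; zip(*...) transposes
# them into one tuple of messages and one tuple of flags. The right-hand
# side is fully evaluated before either column is assigned.
sms['Message'], sms['Digit'] = zip(*sms['Message'].apply(remove_numbers))

print(sms)
```

The URL column would presumably be handled the same way with nm.trim_urls. Note that this pattern fails on an empty DataFrame, because zip(*...) then produces nothing to unpack.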