157

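We do business largely in the United States and are trying to improve the user experience by combining all the address fields into a single text area. But there are a few problems:

  • The address the user types may be incorrect or not in a standard format
  • The address must be separated into parts (street, city, state, etc.) to process credit card payments
  • Users may enter more than just their address (such as their name or company)
  • Google can do this, but the Terms of Service and query limits are prohibitive, especially on a tight budget

Apparently, this is a common question:

Is there a way to isolate an address from the text around it and break it into pieces? Is there a regular expression to parse addresses?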

4

7 Answers

320

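I saw this question a lot when I worked for an address verification company. I'm posting the answer here to make it more accessible to programmers searching with the same question. The company I was at processed billions of addresses, and we learned a lot in the process.

First, we need to understand a few things about addresses.

Addresses are not regular

This means that regular expressions are out. I've seen it all, from simple regular expressions that match addresses in one very specific format, to this: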

/\s+(\d{2,5}\s+)(?![a|p]m\b)(([a-zA-Z|\s+]{1,5}){1,2})?([\s|,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)([\s|,|.|;]+)?(([a-zA-Z|\s+]{1,30}){1,2})([\s|,|.]+)?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|GU|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VI|VT|WA|WI|WV|WY)([\s|,|.]+)?(\s+\d{5})?([\s|,|.]+)/i

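... to a 900+ line class file that generates a supermassive regular expression on the fly to match even more. I don't recommend these (for example, a fiddle of the regex above makes plenty of mistakes). There is no easy magic formula to get this to work. In theory, matching addresses with a regular expression is not possible.

USPS Publication 28 documents the many possible address formats, with all their keywords and variations. Worst of all, addresses are often ambiguous. Words can mean more than one thing ("St" can be "Saint" or "Street"), and there are words I'm pretty sure they invented. (Who knew that "Stravenue" was a street suffix?)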
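To make the "St" ambiguity concrete, here is a minimal Python sketch. The function name and the sample address are made up for illustration; the point is only that a pattern cannot tell "Saint" from "Street" without understanding the address.

```python
import re


def naive_expand_suffix(address: str) -> str:
    """Blindly expand every standalone "St" (optionally followed by a period) to "Street"."""
    return re.sub(r"\bSt\b\.?", "Street", address)


# Illustrative input: two of the three "St" tokens mean "Saint", not "Street".
print(naive_expand_suffix("100 St Charles St, St Louis MO"))
```

All three occurrences are rewritten, including the two that stand for "Saint". Deciding which is which needs context (position in the address, known city and street names), which is exactly what a bare pattern does not have.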

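You'd need some code that really understands addresses, and if that code exists, it's a trade secret. But you could probably roll your own if you're really into that.

Addresses come in unexpected shapes and sizes

Here are some contrived (but complete) addresses: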

1)  102 main street
    Anytown, state

2)  400n 600e #2, 52173

3)  p.o. #104 60203

Even these are possibly valid:

4)  829 LKSDFJlkjsdflkjsdljf Bkpw 12345

5)  205 1105 14 90210

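Obviously, none of these is standardized. Punctuation and line breaks are not guaranteed. Here's what's going on:

  1. Number 1 is complete because it contains a street address plus a city and state. That is enough information to identify the address, and it can be considered "deliverable" (with some standardization).

  2. Number 2 is complete because it contains a street address (with a secondary/unit number) and a 5-digit ZIP code, which is enough to identify an address.

  3. Number 3 is a complete post office box format, because it contains a ZIP code.

  4. Number 4 is also complete because the ZIP code is unique, meaning that a private entity or corporation has purchased that address space. A unique ZIP code is for high-volume or concentrated delivery spaces. Anything addressed to ZIP code 12345 goes to General Electric in Schenectady, NY. This example won't reach anyone in particular, but the USPS would still be able to deliver it.

  5. Number 5 is also complete, believe it or not. With just those numbers, the full address can be discovered when parsed against a database of all possible addresses. Filling in the missing directionals, secondary designator, and ZIP+4 code is trivial once you see each number as a component. Here's what it looks like, fully expanded and standardized:

205 N 1105 W Apt 14

Beverly Hills CA 90210-5221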
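The idea behind example 5 can be sketched in Python as a lookup against a reference table. This is only an illustration under loud assumptions: `Record`, `RECORDS`, and `resolve` are hypothetical names, the two rows are toy data (the second row is invented purely to show disambiguation by ZIP code), and a real system would query a licensed address database rather than a list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    number: str
    predir: str
    street: str
    postdir: str
    unit_type: str
    unit: str
    city: str
    state: str
    zip5: str
    plus4: str


# Toy stand-in for a database of all known addresses (hypothetical data).
RECORDS = [
    Record("205", "N", "1105", "W", "Apt", "14", "Beverly Hills", "CA", "90210", "5221"),
    Record("205", "S", "1105", "E", "Unit", "14", "Sampletown", "UT", "84000", "0000"),
]


def resolve(text: str) -> Optional[str]:
    """Treat each number as a component: house number, street, unit, ZIP."""
    parts = text.split()
    if len(parts) != 4:
        return None
    number, street, unit, zip5 = parts
    matches = [
        r for r in RECORDS
        if (r.number, r.street, r.unit, r.zip5) == (number, street, unit, zip5)
    ]
    if len(matches) != 1:
        return None  # unknown or ambiguous: refuse to guess
    r = matches[0]
    return (f"{r.number} {r.predir} {r.street} {r.postdir} {r.unit_type} {r.unit}\n"
            f"{r.city} {r.state} {r.zip5}-{r.plus4}")


print(resolve("205 1105 14 90210"))
```

The missing directionals, unit designator, and ZIP+4 all come from the matched record, not from the input. Without the reference data, the four numbers mean nothing.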

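Address data is not your own

In most countries that provide official address data to licensed vendors, the address data itself belongs to the governing agency. In the US, the USPS owns the addresses. The same is true for Canada Post, Royal Mail, and others, though each country enforces or defines ownership a little differently. Knowing this is important, because it usually forbids reverse-engineering the address database. You have to be careful how you acquire, store, and use the data.

Google Maps is a common go-to for quick address fixes, but the TOS is rather prohibitive; for example, you can't use their data or APIs without showing a Google Map, and only for non-commercial purposes (unless you pay), and you can't store the data (except for temporary caching). Makes sense. Google's data is among the best in the world. However, Google Maps does not verify the address. If an address does not exist, it will still show you where the address would be if it did exist (try it on your own street; use a house number that you know does not exist). This is useful sometimes, but be aware of it.

Nominatim's usage policy is similarly limiting, especially for high-volume and commercial use, and the data is mostly drawn from free sources, so it isn't as well maintained (such is the nature of open projects). However, it may still suit your needs. It is supported by a great community.

The USPS itself has an API, but it goes down a lot and comes with no guarantees or support. It can also be hard to use. Some people use it sparingly with no problems. But it's easy to miss that the USPS requires you to use their API only for confirming addresses you ship to through them.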

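People expect addresses to be hard

Unfortunately, we've conditioned our society to expect addresses to be complicated. There are plenty of good UX articles about this all over the Internet. Still, the fact is that if you have an address form with individual fields, that's what users expect, even though it makes things harder for edge-case addresses that don't fit the format the form expects, or the form may require a field it shouldn't. Or users don't know where to put a certain part of their address.

I could go on and on about the bad UX of checkout forms these days, but instead I'll just say that combining the address into a single field will be a welcome change: people will be able to type their address however they see fit, rather than trying to figure out your lengthy form. However, this change will be unexpected, and users may find it a little jarring at first. Just be aware of that.

Part of this pain can be alleviated by putting the country field first, before the address. When users fill out the country field first, you know how to present your form. Maybe you have a good way to handle single-field US addresses, so if they select the United States, you can reduce your form to a single field, and otherwise show the component fields. Just things to think about!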

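Now we know why it's hard; what can you do about it?

The USPS licenses vendors through a process called CASS™ Certification to provide verified addresses to customers. These vendors have access to the USPS database, which is updated monthly. Their software must conform to rigorous standards to be certified, and they don't often require agreement to limiting terms like those discussed above.

Many CASS-certified companies can process lists or have APIs: Melissa Data, Experian QAS, and SmartyStreets, to name a few.

(Due to getting flak for "advertising", I've truncated my answer at this point. It's up to you to find a solution that works for you.)

The truth: really, folks, I don't work at any of these companies. It's not an advertisement.

answered 2012-06-22T16:19:48.187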
39

libpostal: an open-source library for parsing addresses, trained on data from OpenStreetMap, OpenAddresses, and OpenCage.

More information: https://github.com/openvenues/libpostal

Other tools/services:

answered 2017-07-11T08:34:28.167
14

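There are many street address parsers. They come in two basic flavors: those that have databases of place names and street names, and those that don't.

A regular expression street address parser can reach about a 95% success rate without much trouble. Then you start hitting the unusual cases. The Perl one in CPAN, "Geo::StreetAddress::US", is about that good. There are Python and JavaScript ports of it, all open source. I have an improved version in Python that moves the success rate up slightly by handling more cases. To get the last 3% right, though, you need databases to help with disambiguation.

A database of 3-digit ZIP code prefixes and US state names and abbreviations is a big help. When a parser sees a consistent postal code and state name, it can start to lock on to the format. This works very well for the US and the UK.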
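A minimal Python sketch of that consistency check follows. The table names, the function name, and the tiny excerpt of ZIP prefixes are assumptions for illustration; a real table would cover every 3-digit prefix and every state.

```python
import re
from typing import Optional, Tuple

# Illustrative excerpt only; a real table covers all 3-digit ZIP prefixes.
ZIP3_TO_STATE = {"019": "MA", "100": "NY", "199": "DE", "606": "IL", "902": "CA"}
STATE_NAMES = {"MASSACHUSETTS": "MA", "NEW YORK": "NY", "DELAWARE": "DE",
               "ILLINOIS": "IL", "CALIFORNIA": "CA"}


def trailing_state_and_zip(address: str) -> Optional[Tuple[str, str]]:
    """Return (state, zip5) only if the trailing ZIP and the state before it agree."""
    m = re.search(r"(\d{5})(?:-\d{4})?\s*$", address)
    if not m:
        return None
    zip5 = m.group(1)
    expected = ZIP3_TO_STATE.get(zip5[:3])
    if expected is None:
        return None  # prefix not in our excerpt
    head = address[:m.start()].rstrip(" ,").upper()
    # Accept either the abbreviation or the full state name right before the ZIP.
    candidates = [expected] + [n for n, a in STATE_NAMES.items() if a == expected]
    for cand in candidates:
        if re.search(r"(?:^|[\s,])" + re.escape(cand) + r"$", head):
            return expected, zip5
    return None


print(trailing_state_and_zip("9757 East Arcadia Ave. Saugus MA 01906"))
```

When the function returns a pair, the parser knows where the address ends and can work on what is left. When it returns `None`, the tail is either not an address or is inconsistent and should be flagged.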

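Proper street address parsing starts from the end and works backwards. That's how the USPS systems do it. Addresses are least ambiguous at the end, where country names, city names, and postal codes are relatively easy to recognize. Street names can usually be isolated. Locations on streets are the hardest to parse; there you encounter things such as "Fifth Floor" and "Staples Pavilion". That's when a database is a big help.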
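As a rough Python sketch of the right-to-left idea (not the USPS algorithm, whose details are not given here), the code below peels the least ambiguous components off the end first. The `STATES` set is a small excerpt and the function name is hypothetical. Splitting the remainder into street and city is deliberately left out, because that is where a place-name database is needed.

```python
import re

# Excerpt for illustration; a real parser would list every state and territory.
STATES = {"CA", "DE", "MA", "MD", "NY", "VA", "WA"}


def parse_from_end(address: str) -> dict:
    """Peel ZIP, then state, off the end; leave everything else in 'rest'."""
    tokens = address.replace(",", " ").split()
    result = {"zip": "", "state": "", "rest": ""}
    # Step 1: a trailing 5-digit or ZIP+4 token is taken as the postal code.
    if tokens and re.fullmatch(r"\d{5}(-\d{4})?", tokens[-1]):
        result["zip"] = tokens.pop()
    # Step 2: only trust a two-letter state if a ZIP was found, since a bare
    # trailing token could be something else entirely.
    if result["zip"] and tokens and tokens[-1].upper() in STATES:
        result["state"] = tokens.pop().upper()
    # Step 3: street vs. city needs more knowledge than this sketch has.
    result["rest"] = " ".join(tokens)
    return result


print(parse_from_end("380 New York St, Redlands, CA 92373"))
```

Note that "New York" inside the street name never confuses the parser, because by the time it is reached the state has already been taken from the end.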

answered 2015-04-05T05:25:59.197
10

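Update: Geocode.xyz now works worldwide. For examples, see https://geocode.xyz

For the USA, Mexico, and Canada, see geocoder.ca

For example:

Input: something going on near the intersection of main and arthur kill rd new york

Output: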

<geodata>
  <latt>40.5123510000</latt>
  <longt>-74.2500500000</longt>
  <AreaCode>347,718</AreaCode>
  <TimeZone>America/New_York</TimeZone>
  <standard>
    <street1>main</street1>
    <street2>arthur kill</street2>
    <stnumber/>
    <staddress/>
    <city>STATEN ISLAND</city>
    <prov>NY</prov>
    <postal>11385</postal>
    <confidence>0.9</confidence>
  </standard>
</geodata>

You can also check the results in the web interface, or get the output as JSON or JSONP. E.g.: I'm looking for restaurants around 123 Main Street, New York
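The XML output shown above can be read with Python's standard library. This is a minimal sketch that works on the sample response pasted into a string. It makes no network call, and the `summarize` helper and its output keys are names invented for illustration.

```python
import xml.etree.ElementTree as ET

# The sample response from above, embedded so the sketch is self-contained.
SAMPLE = """<geodata>
  <latt>40.5123510000</latt>
  <longt>-74.2500500000</longt>
  <AreaCode>347,718</AreaCode>
  <TimeZone>America/New_York</TimeZone>
  <standard>
    <street1>main</street1>
    <street2>arthur kill</street2>
    <stnumber/>
    <staddress/>
    <city>STATEN ISLAND</city>
    <prov>NY</prov>
    <postal>11385</postal>
    <confidence>0.9</confidence>
  </standard>
</geodata>"""


def summarize(xml_text: str) -> dict:
    """Pull the coordinates and the standardized components out of the response."""
    root = ET.fromstring(xml_text)
    std = root.find("standard")
    streets = [std.findtext("street1"), std.findtext("street2")]
    return {
        "lat": float(root.findtext("latt")),
        "lon": float(root.findtext("longt")),
        "streets": [s for s in streets if s],  # intersections carry two streets
        "city": std.findtext("city"),
        "state": std.findtext("prov"),
        "postal": std.findtext("postal"),
        "confidence": float(std.findtext("confidence")),
    }


print(summarize(SAMPLE))
```

Checking `confidence` before trusting the parsed components is probably wise, since free-text input like the example can match loosely.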

answered 2015-12-22T00:41:38.673
4

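No code? For shame!

Here is a simple JavaScript address parser. It's pretty awful, for every single reason Matt gives in his dissertation above (which I almost 100% agree with: addresses are complex types, and humans make mistakes; better to outsource and automate this, when you can afford to).

But rather than cry, I decided to try:

This code works OK for parsing most Esri findAddressCandidate results, and also for some other (reverse) geocoders that return a single-line address where street/city/state are delimited by commas. You can extend it if you want, or write country-specific parsers. Or just use it as a case study of how challenging this exercise can be, or of how lousy I am at JavaScript. I admit I only spent about thirty minutes on this (future iterations could add caches, ZIP validation, and state lookups, as well as user location context), but it worked for my use case: the end user sees a form that parses the geocode search response into 4 text boxes. If address parsing comes out wrong (which is rare unless the source data was poor), it's no big deal, because the user gets to verify and fix it! (An automated solution, however, could either discard/ignore the result or flag it as an error, so that a developer can support the new format or fix the source data.)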

/* 
address assumptions:
- US addresses only (probably want separate parser for different countries)
- No country code expected.
- if last token is a number it is probably a postal code
-- 5 digit number means more likely
- if last token is a hyphenated string it might be a postal code
-- if both sides are numeric, and in form #####-#### it is more likely
- if city is supplied, state will also be supplied (city names not unique)
- zip/postal code may be omitted even if has city & state
- state may be two-char code or may be full state name.
- commas: 
-- last comma is usually city/state separator
-- second-to-last comma is possibly street/city separator
-- other commas are building-specific stuff that I don't care about right now.
- token count:
-- because units, street names, and city names may contain spaces token count highly variable.
-- simplest address has at least two tokens: 714 OAK
-- common simple address has at least four tokens: 714 S OAK ST
-- common full (mailing) address has at least 5-7:
--- 714 OAK, RUMTOWN, VA 59201
--- 714 S OAK ST, RUMTOWN, VA 59201
-- complex address may have a dozen or more:
--- MAGICICIAN SUPPLY, LLC, UNIT 213A, MAGIC TOWN MALL, 13 MAGIC CIRCLE DRIVE, LAND OF MAGIC, MA 73122-3412
*/

var rawtext = $("textarea").val();
var rawlist = rawtext.split("\n");

function ParseAddressEsri(singleLineaddressString) {
  var address = {
    street: "",
    city: "",
    state: "",
    postalCode: ""
  };

  // tokenize by space (retain commas in tokens)
  var tokens = singleLineaddressString.split(/[\s]+/);
  var tokenCount = tokens.length;
  var lastToken = tokens.pop();
  if (
    // if numeric assume postal code (ignore length, for now)
    !isNaN(lastToken) ||
    // if hyphenated assume long zip code, ignore whether numeric, for now
    lastToken.split("-").length - 1 === 1) {
    address.postalCode = lastToken;
    lastToken = tokens.pop();
  }

  if (lastToken && isNaN(lastToken)) {
    if (address.postalCode.length && lastToken.length === 2) {
      // assume state/province code ONLY if had postal code
      // otherwise it could be a simple address like "714 S OAK ST"
      // where "ST" for "street" looks like two-letter state code
      // possibly this could be resolved with registry of known state codes, but meh. (and may collide anyway)
      address.state = lastToken;
      lastToken = tokens.pop();
    }
    if (address.state.length === 0) {
      // check for special case: might have State name instead of State Code.
      var stateNameParts = [lastToken.endsWith(",") ? lastToken.substring(0, lastToken.length - 1) : lastToken];

      // check remaining tokens from right-to-left for the first comma
      while (2 + 2 != 5) {
        lastToken = tokens.pop();
        if (!lastToken) break;
        else if (lastToken.endsWith(",")) {
          // found separator, ignore stuff on left side
          tokens.push(lastToken); // put it back
          break;
        } else {
          stateNameParts.unshift(lastToken);
        }
      }
      address.state = stateNameParts.join(' ');
      lastToken = tokens.pop();
    }
  }

  if (lastToken) {
    // here is where it gets trickier:
    if (address.state.length) {
      // if there is a state, then assume there is also a city and street.
      // PROBLEM: city may be multiple words (spaces)
      // but we can pretty safely assume next-from-last token is at least PART of the city name
      // most cities are single-name. It would be very helpful if we knew more context, like
      // the name of the city user is in. But ignore that for now.
      // ideally would have zip code service or lookup to give city name for the zip code.
      var cityNameParts = [lastToken.endsWith(",") ? lastToken.substring(0, lastToken.length - 1) : lastToken];

      // assumption / RULE: street and city must have comma delimiter
      // addresses that do not follow this rule will be wrong only if city has space
      // but don't care because Esri formats put comma before City
      var streetNameParts = [];

      // check remaining tokens from right-to-left for the first comma
      while (2 + 2 != 5) {
        lastToken = tokens.pop();
        if (!lastToken) break;
        else if (lastToken.endsWith(",")) {
          // found end of street address (may include building, etc. - don't care right now)
          // add token back to end, but remove trailing comma (it did its job)
          tokens.push(lastToken.endsWith(",") ? lastToken.substring(0, lastToken.length - 1) : lastToken);
          streetNameParts = tokens;
          break;
        } else {
          cityNameParts.unshift(lastToken);
        }
      }
      address.city = cityNameParts.join(' ');
      address.street = streetNameParts.join(' ');
    } else {
      // if there is NO state, then assume there is NO city also, just street! (easy)
      // reasoning: city names are not unique (Portland, OR and Portland, ME), so if the user wants the city they need to store the state also (but if you are only ever in Portland, OR, you don't care about city/state)
      // put last token back in list, then rejoin on space
      tokens.push(lastToken);
      address.street = tokens.join(' ');
    }
  }
  // when parsing right-to-left hard to know if street only vs street + city/state
  // hack fix for now is to shift stuff around.
  // assumption/requirement: will always have at least street part; you will never just get "city, state"  
  // could possibly tweak this with options or more intelligent parsing&sniffing
  if (!address.city && address.state) {
    address.city = address.state;
    address.state = '';
  }
  if (!address.street) {
    address.street = address.city;
    address.city = '';
  }

  return address;
}

// get list of objects with discrete address properties
var addresses = rawlist
  .filter(function(o) {
    return o.length > 0
  })
  .map(ParseAddressEsri);
$("#output").text(JSON.stringify(addresses));
console.log(addresses);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea>
27488 Stanford Ave, Bowden, North Dakota
380 New York St, Redlands, CA 92373
13212 E SPRAGUE AVE, FAIR VALLEY, MD 99201
1005 N Gravenstein Highway, Sebastopol CA 95472
A. P. Croll &amp; Son 2299 Lewes-Georgetown Hwy, Georgetown, DE 19947
11522 Shawnee Road, Greenwood, DE 19950
144 Kings Highway, S.W. Dover, DE 19901
Intergrated Const. Services 2 Penns Way Suite 405, New Castle, DE 19720
Humes Realty 33 Bridle Ridge Court, Lewes, DE 19958
Nichols Excavation 2742 Pulaski Hwy, Newark, DE 19711
2284 Bryn Zion Road, Smyrna, DE 19904
VEI Dover Crossroads, LLC 1500 Serpentine Road, Suite 100 Baltimore MD 21
580 North Dupont Highway, Dover, DE 19901
P.O. Box 778, Dover, DE 19903
714 S OAK ST
714 S OAK ST, RUM TOWN, VA, 99201
3142 E SPRAGUE AVE, WHISKEY VALLEY, WA 99281
27488 Stanford Ave, Bowden, North Dakota
380 New York St, Redlands, CA 92373
</textarea>
<div id="output">
</div>

answered 2018-03-15T18:59:44.003
2

For US address parsing, I prefer the usaddress package, which can be installed with pip.

python3 -m pip install usaddress

Usage example:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# address_parser.py
import sys
from usaddress import tag
from json import dumps, loads

if __name__ == '__main__':
    tag_mapping = {
        'Recipient': 'recipient',
        'AddressNumber': 'addressStreet',
        'AddressNumberPrefix': 'addressStreet',
        'AddressNumberSuffix': 'addressStreet',
        'StreetName': 'addressStreet',
        'StreetNamePreDirectional': 'addressStreet',
        'StreetNamePreModifier': 'addressStreet',
        'StreetNamePreType': 'addressStreet',
        'StreetNamePostDirectional': 'addressStreet',
        'StreetNamePostModifier': 'addressStreet',
        'StreetNamePostType': 'addressStreet',
        'CornerOf': 'addressStreet',
        'IntersectionSeparator': 'addressStreet',
        'LandmarkName': 'addressStreet',
        'USPSBoxGroupID': 'addressStreet',
        'USPSBoxGroupType': 'addressStreet',
        'USPSBoxID': 'addressStreet',
        'USPSBoxType': 'addressStreet',
        'BuildingName': 'addressStreet',
        'OccupancyType': 'addressStreet',
        'OccupancyIdentifier': 'addressStreet',
        'SubaddressIdentifier': 'addressStreet',
        'SubaddressType': 'addressStreet',
        'PlaceName': 'addressCity',
        'StateName': 'addressState',
        'ZipCode': 'addressPostalCode',
    }
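    # Join all arguments so an unquoted, space-separated address is handled as one string.
    raw_address = ' '.join(sys.argv[1:])
    try:
        address, _ = tag(raw_address, tag_mapping=tag_mapping)
    except Exception:
        # The address could not be tagged; log the whole input for later review.
        with open('failed_address.txt', 'a') as fp:
            fp.write(raw_address + '\n')
        print(dumps({}))
    else:
        print(dumps(dict(address)))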

Running address_parser.py:

python3 address_parser.py 9757 East Arcadia Ave. Saugus MA 01906
{"addressStreet": "9757 East Arcadia Ave.", "addressCity": "Saugus", "addressState": "MA", "addressPostalCode": "01906"}
answered 2019-06-04T23:27:17.717
0

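I'm late to the party, but here is an Excel VBA script I wrote years ago for Australia. It can easily be modified to support other countries. I've also made a GitHub repository of the C# code. The workbook is hosted on my site, and you can download it here: http://jeremythompson.net/Rocks/ParseAddress.xlsm

Strategy

For any country whose postcode is numeric or can be matched with a regular expression, my strategy works very well:

  1. First, we detect the first name and surname, which are assumed to be on the top line. It's easy to skip the name and start with the address by unticking the checkbox (called "Name is top row", as shown below).

  2. Next, it's safe to expect that the address, consisting of the street and number, comes before the suburb, and that St, Pde, Ave, Av, Rd, Cres, Loop, etc. act as the separator.

  3. Detecting the suburb versus the state, and even the country, can trick the most sophisticated parsers because there can be conflicts. To overcome this, I use a postcode lookup, based on the fact that after stripping the street and apartment/unit numbers as well as the PO Box, Ph, Fax, Mobile, etc., only the postcode number remains. This is easy to match with a regular expression and then look up the suburb(s) and state.

    Your national postal service will provide, free of charge, a list of postcodes with suburbs and states, which you can store in an Excel sheet, database table, text/JSON/XML file, etc.

  4. Finally, since some postcodes cover multiple suburbs, we check which of those suburbs appears in the address.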
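Steps 3 and 4 of the strategy above can be sketched in Python. This is an illustration and not a port of the VBA: the `POSTCODES` table holds a few sample rows only, the `locate` function is a hypothetical name, and the input is assumed to have had street numbers and phone numbers stripped already, as the strategy requires.

```python
import re
from typing import Optional

# A few sample rows; the full list comes from your national postal service.
POSTCODES = {
    "2000": [("SYDNEY", "NSW"), ("THE ROCKS", "NSW"), ("HAYMARKET", "NSW")],
    "3000": [("MELBOURNE", "VIC")],
}


def locate(remaining_text: str) -> Optional[dict]:
    """Find the 4-digit postcode, then pick the listed suburb that appears in the text."""
    m = re.search(r"\b\d{4}\b", remaining_text)
    if not m:
        return None
    postcode = m.group()
    upper = remaining_text.upper()
    # Some postcodes cover several suburbs, so check which one the user wrote.
    for suburb, state in POSTCODES.get(postcode, []):
        if suburb in upper:
            return {"postcode": postcode, "suburb": suburb, "state": state}
    return None


print(locate("The Rocks NSW 2000"))
```

Because the state comes from the lookup table and not from the text, a typo or omission in the state does not matter as long as the postcode and suburb agree.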


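Example

(Screenshot of the Excel cells)

VBA Code

Disclaimer: I know this code isn't perfect, or even well written, but it is very easy to convert to any programming language and run in any type of application. The strategy is the real answer, and the details depend on your country and its rules. Take this code as an example: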

Option Explicit

Private Const TopRow As Integer = 0

Public Sub ParseAddress()
Dim strArr() As String
Dim sigRow() As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim Stat As String
Dim SpaceInName As Integer
Dim Temp As String
Dim PhExt As String

On Error Resume Next

Temp = ActiveSheet.Range("Address")

'Split info into array
strArr = Split(Temp, vbLf)

'Trim the array
For i = 0 To UBound(strArr)
strArr(i) = VBA.Trim(strArr(i))
Next i

'Remove empty items/rows    
ReDim sigRow(LBound(strArr) To UBound(strArr))
For i = LBound(strArr) To UBound(strArr)
    If Trim(strArr(i)) <> "" Then
        sigRow(j) = strArr(i)
        j = j + 1
    End If
Next i
ReDim Preserve sigRow(LBound(strArr) To j)

'Find the name (MUST BE ON THE FIRST ROW UNLESS CHECKBOX UNTICKED)
i = TopRow
If ActiveSheet.Shapes("chkFirst").ControlFormat.Value = 1 Then

SpaceInName = InStr(1, sigRow(i), " ", vbTextCompare) - 1

If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("FirstName") = VBA.Left(sigRow(i), SpaceInName)
Else
 If MsgBox("First Name: " & VBA.Mid$(sigRow(i), 1, SpaceInName), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("FirstName") = VBA.Left(sigRow(i), SpaceInName)
End If

If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("Surname") = VBA.Mid(sigRow(i), SpaceInName + 2)
Else
  If MsgBox("Surname: " & VBA.Mid(sigRow(i), SpaceInName + 2), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Surname") = VBA.Mid(sigRow(i), SpaceInName + 2)
End If
sigRow(i) = ""
End If

'Find the Street by looking for a "St, Pde, Ave, Av, Rd, Cres, loop, etc"
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
    For j = 0 To 19 'Street() defines separators 0 to 19
    If InStr(1, VBA.UCase(sigRow(i)), Street(j), vbTextCompare) > 0 Then
    
    'Find the position of the street in order to get the suburb
    SpaceInName = InStr(1, VBA.UCase(sigRow(i)), Street(j), vbTextCompare) + Len(Street(j)) - 1
    
    'If its a po box then add 5 chars
    If VBA.Right(Street(j), 3) = "BOX" Then SpaceInName = SpaceInName + 5
    
    If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
    ActiveSheet.Range("Street") = VBA.Mid(sigRow(i), 1, SpaceInName)
    Else
      If MsgBox("Street Address: " & VBA.Mid(sigRow(i), 1, SpaceInName), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Street") = VBA.Mid(sigRow(i), 1, SpaceInName)
    End If
    'Remove the Street and Number, leaving the Suburb if it exists on the same line
    sigRow(i) = Replace(sigRow(i), VBA.Mid(sigRow(i), 1, SpaceInName), "")
    
    GoTo PastAddress:
    End If
    Next j
End If
Next i
PastAddress:

'Mobile
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
    For j = 0 To 3
    Temp = Mb(j)
        If VBA.Left(VBA.UCase(sigRow(i)), Len(Temp)) = Temp Then
        If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
        ActiveSheet.Range("Mobile") = VBA.Mid(sigRow(i), Len(Temp) + 2)
        Else
          If MsgBox("Mobile: " & VBA.Mid(sigRow(i), Len(Temp) + 2), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Mobile") = VBA.Mid(sigRow(i), Len(Temp) + 2)
        End If
    sigRow(i) = ""
    GoTo PastMobile:
    End If
    Next j
End If
Next i
PastMobile:

'Phone
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
    For j = 0 To 1
    Temp = Ph(j)
        If VBA.Left(VBA.UCase(sigRow(i)), Len(Temp)) = Temp Then
            
            'TODO: Detect the intl or national extension here.. or if we can from the postcode.
            If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
            ActiveSheet.Range("Phone") = VBA.Mid(sigRow(i), Len(Temp) + 3)
            Else
              If MsgBox("Phone: " & VBA.Mid(sigRow(i), Len(Temp) + 3), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Phone") = VBA.Mid(sigRow(i), Len(Temp) + 3)
            End If
        
        sigRow(i) = ""
        GoTo PastPhone:
        End If
    Next j
End If
Next i
PastPhone:


'Email
For i = 1 To UBound(sigRow)
    If Len(sigRow(i)) > 0 Then
        'replace with regEx search
        If InStr(1, sigRow(i), "@", vbTextCompare) > 0 And InStr(1, VBA.UCase(sigRow(i)), ".CO", vbTextCompare) > 0 Then
        Dim email As String
        email = sigRow(i)
        email = Replace(VBA.UCase(email), "EMAIL:", "")
        email = Replace(VBA.UCase(email), "E-MAIL:", "")
        email = Replace(VBA.UCase(email), "E:", "")
        email = Replace(VBA.UCase(Trim(email)), "E ", "")
        email = VBA.LCase(email)
        
            If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
            ActiveSheet.Range("Email") = email
            Else
              If MsgBox("Email: " & email, vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Email") = email
            End If
        sigRow(i) = ""
        Exit For
        End If
    End If
Next i

'Now the only remaining items will be the postcode, suburb, country
'there shouldn't be any numbers (eg. from PoBox,Ph,Fax,Mobile) except for the Post Code

'Join the string and filter out the Post Code
Temp = Join(sigRow, vbCrLf)
Temp = Trim(Temp)

For i = 1 To Len(Temp)

Dim postCode As String
postCode = VBA.Mid(Temp, i, 4)
    
'In Australia PostCodes are 4 digits
If VBA.Mid(Temp, i, 1) <> " " And IsNumeric(postCode) Then

    If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
    ActiveSheet.Range("PostCode") = postCode
    Else
      If MsgBox("Post Code: " & postCode, vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("PostCode") = postCode
    End If

    'Lookup the Suburb and State based on the PostCode, the PostCode sheet has the lookup
    Dim mySuburbArray As Range
    Set mySuburbArray = Sheets("PostCodes").Range("A2:B16670")
    
    Dim suburbs As String
    For j = 1 To mySuburbArray.Columns(1).Cells.Count
    If mySuburbArray.Cells(j, 1) = postCode Then
        'Check if the suburb is listed in the address
        If InStr(1, UCase(Temp), mySuburbArray.Cells(j, 2), vbTextCompare) > 0 Then

        'Set the Suburb and State
        ActiveSheet.Range("Suburb") = mySuburbArray.Cells(j, 2)
        Stat = mySuburbArray.Cells(j, 3)
        ActiveSheet.Range("State") = Stat
                
        'Knowing the State - for Australia we can get the telephone Ext
        PhExt = PhExtension(VBA.UCase(Stat))
        ActiveSheet.Range("PhExt") = PhExt
        
        'remove the phone extension from the number
        Dim prePhone As String
        prePhone = ActiveSheet.Range("Phone")
        prePhone = Replace(prePhone, PhExt & " ", "")
        prePhone = Replace(prePhone, "(" & PhExt & ") ", "")
        prePhone = Replace(prePhone, "(" & PhExt & ")", "")
        ActiveSheet.Range("Phone") = prePhone
        Exit For
        End If
    End If
    Next j
Exit For
End If
Next i

End Sub

  
Private Function PhExtension(ByVal State As String) As String
'Australian geographic area codes: 02 NSW/ACT, 03 VIC/TAS, 07 QLD, 08 SA/WA/NT
Select Case State
Case Is = "NSW"
PhExtension = "02"
Case Is = "QLD"
PhExtension = "07"
Case Is = "VIC"
PhExtension = "03"
Case Is = "NT"
PhExtension = "08"
Case Is = "WA"
PhExtension = "08"
Case Is = "SA"
PhExtension = "08"
Case Is = "TAS"
PhExtension = "03"
End Select
End Function

Private Function Ph(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Ph = "PH"
Case Is = 1
Ph = "PHONE"
'Case Is = 2
'Ph = "P"
End Select
End Function

Private Function Mb(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Mb = "MB"
Case Is = 1
Mb = "MOB"
Case Is = 2
Mb = "CELL"
Case Is = 3
Mb = "MOBILE"
'Case Is = 4
'Mb = "M"
End Select
End Function

Private Function Fax(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Fax = "FAX"
Case Is = 1
Fax = "FACSIMILE"
'Case Is = 2
'Fax = "F"
End Select
End Function

Private Function State(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
State = "NSW"
Case Is = 1
State = "QLD"
Case Is = 2
State = "VIC"
Case Is = 3
State = "NT"
Case Is = 4
State = "WA"
Case Is = 5
State = "SA"
Case Is = 6
State = "TAS"
End Select
End Function

Private Function Street(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Street = " ST"
Case Is = 1
Street = " RD"
Case Is = 2
Street = " AVE"
Case Is = 3
Street = " AV"
Case Is = 4
Street = " CRES"
Case Is = 5
Street = " LOOP"
Case Is = 6
Street = "PO BOX"
Case Is = 7
Street = " STREET"
Case Is = 8
Street = " ROAD"
Case Is = 9
Street = " AVENUE"
Case Is = 10
Street = " CRESENT"
Case Is = 11
Street = " PARADE"
Case Is = 12
Street = " PDE"
Case Is = 13
Street = " LANE"
Case Is = 14
Street = " COURT"
Case Is = 15
Street = " BLVD"
Case Is = 16
Street = "P.O. BOX"
Case Is = 17
Street = "P.O BOX"
Case Is = 18
Street = "PO BOX"
Case Is = 19
Street = "POBOX"
End Select
End Function
answered 2019-08-25T11:40:44.533