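I want to validate a long list of URL strings, but some of them contain accented characters such as ä, à, è and ö.
Is there a way to configure the Apache Commons UrlValidator to accept these characters?
This test fails (note the ã):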
    @Test
    public void urlValidatorShouldPassWithUmlaut()
    {
        // Given
        org.apache.commons.validator.routines.UrlValidator validator;
        validator = new UrlValidator( new String[] { "http", "https" }, UrlValidator.ALLOW_ALL_SCHEMES );
        // When
        String url = "http://dbpedia.org/resource/São_Paulo";
        // Then
        assertThat( validator.isValid( url ), is( true ) );
    }
This test passes (ã replaced with a):
    @Test
    public void urlValidatorShouldPassWithoutUmlaut()
    {
        // Given
        org.apache.commons.validator.routines.UrlValidator validator;
        validator = new UrlValidator( new String[] { "http", "https" }, UrlValidator.ALLOW_ALL_SCHEMES );
        // When
        String url = "http://dbpedia.org/resource/Sao_Paulo";
        // Then
        assertThat( validator.isValid( url ), is( true ) );
    }
Software version:
    <dependency>
        <groupId>commons-validator</groupId>
        <artifactId>commons-validator</artifactId>
        <version>1.4.0</version>
    </dependency>
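A possible workaround, sketched here under assumptions rather than as a confirmed UrlValidator setting: percent-encode the non-ASCII characters with the JDK's java.net.URI before validating, so the validator only ever sees ASCII. The class name UrlAsciiSketch and the helper toAsciiUrl are hypothetical names made up for this illustration. The sketch assumes the host part is already plain ASCII and that only the path, query or fragment contain accented characters. Whether commons-validator 1.4.0 accepts the encoded result still has to be checked with the test above.

```java
import java.net.URI;

final class UrlAsciiSketch {

    // Hypothetical helper (not part of commons-validator): returns the URL with
    // every non-ASCII character replaced by its percent-encoded UTF-8 bytes.
    static String toAsciiUrl(String rawUrl) {
        // URI.create accepts non-ASCII letters such as ã in the path, but throws
        // IllegalArgumentException for spaces and other illegal characters.
        return URI.create(rawUrl).toASCIIString();
    }

    public static void main(String[] args) {
        // \u00e3 is ã, written as an escape so the source file encoding does not matter.
        String url = "http://dbpedia.org/resource/S\u00e3o_Paulo";
        // prints http://dbpedia.org/resource/S%C3%A3o_Paulo
        System.out.println(toAsciiUrl(url));
    }
}
```

The encoded string could then be passed to validator.isValid( ... ) in place of the raw URL. A URL that is already pure ASCII comes back unchanged.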
Update:
validator.isValid( IDN.toASCII(url) ) also fails. IDN.toASCII(url) does things I don't understand yet; for example, it converts http://dbpedia.org/resource/São_Paulo into http://dbpedia.xn--org/resource/so_paulo-w1b, which, according to UrlValidator