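I'm new to Java and trying to understand the essentials and fundamentals of the language.

Is it accurate to say that a Java String object is essentially a class defined as an immutable array of chars?

I ask because I'm a bit confused by what the spec says about char arrays versus the String class...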

JLS 10.9

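10.9 An Array of Characters is Not a String. In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character). A String object is immutable, that is, its contents never change, while an array of char has mutable elements. The method toCharArray in class String returns an array of characters containing the same character sequence as a String. The class StringBuffer implements useful methods on mutable arrays of characters.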

JLS 4.3.3

4.3.3 The Class String. Instances of class String represent sequences of Unicode code points.

1 Answer

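Is it accurate to say that a Java String object is essentially a class defined as an immutable array of chars?

No. A Java String object is (currently; this is an implementation detail which I gather may be changing) a class containing a few fields:

  • A char[] containing the actual characters
  • A start index into the array
  • A length
  • A cached hash code, lazily computed

The reason for the index and length is that several strings can hold references to the same char[]. Some operations, such as substring, take advantage of this (in many implementations, anyway).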
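To make that layout concrete, here is a minimal sketch of a string-like class with those four fields. SimpleString and all of its members are hypothetical names chosen for illustration; this is not the source of java.lang.String, and newer JDKs are understood to have moved away from the offset/length layout, so treat it as a model of the description above and nothing more.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the field layout described above.
// This is NOT the real java.lang.String implementation.
final class SimpleString {
    private final char[] value; // backing characters, possibly shared between instances
    private final int offset;   // start index into value
    private final int count;    // number of characters belonging to this string
    private int hash;           // cached hash code; 0 means "not computed yet"

    SimpleString(char[] chars) {
        // Defensive copy, so later writes to the caller's array cannot change this string.
        this(Arrays.copyOf(chars, chars.length), 0, chars.length);
    }

    // Private: only code in this class may create an instance over a shared array.
    private SimpleString(char[] shared, int offset, int count) {
        this.value = shared;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value[offset + index];
    }

    // No characters are copied: the result points into the same array.
    SimpleString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("range: " + begin + ".." + end);
        }
        return new SimpleString(value, offset + begin, end - begin);
    }

    // Exposed only so the sharing can be observed in this sketch.
    boolean sharesArrayWith(SimpleString other) {
        return value == other.value;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && count > 0) { // compute lazily, on first use
            for (int i = 0; i < count; i++) {
                h = 31 * h + value[offset + i];
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleString && toString().equals(o.toString());
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SimpleString whole = new SimpleString("hello world".toCharArray());
        SimpleString part = whole.substring(6, 11);
        System.out.println(part);                        // world
        System.out.println(part.sharesArrayWith(whole)); // true
    }
}
```

Because the array is never exposed and never written to after construction, sharing it between instances is safe, which is what makes the cheap substring possible in this model.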

The important thing is the API for String though, which is very different from the API for an array. It's the API you would think of when you take the JLS definition into account: a String represents a sequence of Unicode code points. So you can take a subsequence (substring), find a given subsequence (indexOf), convert it to an upper case sequence (toUpperCase) etc.
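As a rough illustration of that difference, the sketch below uses only standard String methods (indexOf, substring, toUpperCase, toCharArray). The class StringApiDemo and the helper shoutAfter are made-up names for this example.

```java
import java.util.Locale;

final class StringApiDemo {
    // Hypothetical helper: returns everything after the first occurrence of
    // marker, upper-cased, or "" if marker does not occur in text.
    static String shoutAfter(String text, String marker) {
        int at = text.indexOf(marker);                       // find a subsequence
        if (at < 0) {
            return "";
        }
        String rest = text.substring(at + marker.length());  // take a subsequence
        return rest.toUpperCase(Locale.ROOT);                // new String; rest is unchanged
    }

    public static void main(String[] args) {
        System.out.println(shoutAfter("key=value", "="));    // VALUE

        // toCharArray hands back a copy, so writing to it cannot alter the String.
        String s = "immutable";
        char[] copy = s.toCharArray();
        copy[0] = 'X';
        System.out.println(s);                               // immutable
        System.out.println(new String(copy));                // Xmmutable
    }
}
```

A char[] offers none of these operations directly: it has a length and indexed reads and writes, and anything more goes through utility classes.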

In fact the JLS would be slightly more accurate to call it a sequence of UTF-16 code units; it's entirely possible to construct a string which isn't a valid sequence of Unicode code points, e.g. by including one half of a "surrogate pair" of UTF-16 code units but not the other. There are parts of the API which do deal with the String in terms of code units, but frankly most developers spend most of the time treating strings as if non-BMP characters didn't exist.
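The code unit versus code point distinction can be seen with a single non-BMP character. In this sketch, CodeUnitDemo and hasUnpairedSurrogate are hypothetical names; the Character and String methods it calls are standard.

```java
final class CodeUnitDemo {
    // Hypothetical helper: true if s contains a high or low surrogate that is
    // not part of a well-formed high+low pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the matching low surrogate
                } else {
                    return true; // high surrogate with no partner
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no preceding high surrogate
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                         // 2 (UTF-16 code units)
        System.out.println(clef.codePointCount(0, clef.length())); // 1 (code point)

        // substring works on code units, so it can split the pair.
        String broken = clef.substring(0, 1);
        System.out.println(hasUnpairedSurrogate(clef));            // false
        System.out.println(hasUnpairedSurrogate(broken));          // true
    }
}
```

The broken value is still a perfectly legal String object, which is the sense in which "sequence of UTF-16 code units" describes the class more precisely than "sequence of code points".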

Answered 2012-11-02T19:50:08.290