java - 使用对先前标记的引用以及与某个类的子类型相对应的子项来解析 XML

Question

我必须处理以下情况（的变体）。我的模型类是：

class Car {
    String brand;
    Engine engine;
}

abstract class Engine {
}

class V12Engine extends Engine {
    int horsePowers;
}

class V6Engine extends Engine {
    String fuelType;
}

我必须反序列化（不需要序列化支持 ATM）以下输入：

<list>

    <brand id="1">
        Volvo
    </brand>

    <car>
        <brand>BMW</brand>
        <v12engine horsePowers="300" />
    </car>

    <car>
        <brand refId="1" />
        <v6engine fuel="unleaded" />
    </car>

</list>

我尝试过的/问题：

我尝试过使用 XStream，但它希望我编写如下标签：

<engine class="cars.V12Engine">
    <horsePowers>300</horsePowers>
</engine>

等等（我不想要-tag ，<engine>我想要<v6engine>-tag或<v12engine>-tag。

此外，我需要能够根据标识符引用“预定义”品牌，如上面的品牌 ID 所示。（例如通过Map<Integer, String> predefinedBrands在反序列化期间维护 a ）。我不知道 XStream 是否适合这种情况。

我意识到这可以通过推或拉解析器（例如 SAX 或 StAX）或 DOM 库“手动”完成。然而，我希望有更多的自动化。理想情况下，我应该能够添加类（例如 new Engines）并立即开始在 XML 中使用它们。（XStream 绝不是必需的，最优雅的解决方案会赢得赏金。）

score 11 · Accepted Answer

JAXB ( javax.xml.bind) 可以完成您所追求的一切，尽管有些位比其他位更容易。为简单起见，我将假设您的所有 XML 文件都有一个命名空间——如果它们没有命名空间会比较棘手，但可以使用 StAX API 来解决。

<list xmlns="http://example.com/cars">

    <brand id="1">
        Volvo
    </brand>

    <car>
        <brand>BMW</brand>
        <v12engine horsePowers="300" />
    </car>

    <car>
        <brand refId="1" />
        <v6engine fuel="unleaded" />
    </car>

</list>

并假设对应package-info.java的

@XmlSchema(namespace = "http://example.com/cars",
           elementFormDefault = XmlNsForm.QUALIFIED)
package cars;
import javax.xml.bind.annotation.*;

按元素名称的引擎类型

这很简单，使用@XmlElementRef：

package cars;
import javax.xml.bind.annotation.*;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Car {
    String brand;
    @XmlElementRef
    Engine engine;
}

@XmlRootElement
abstract class Engine {
}

@XmlRootElement(name = "v12engine")
@XmlAccessorType(XmlAccessType.FIELD)
class V12Engine extends Engine {
    @XmlAttribute
    int horsePowers;
}

@XmlRootElement(name = "v6engine")
@XmlAccessorType(XmlAccessType.FIELD)
class V6Engine extends Engine {
    // override the default attribute name, which would be fuelType
    @XmlAttribute(name = "fuel")
    String fuelType;
}

各种类型的Engine都@XmlRootElement用适当的元素名称进行注释和标记。在解组时，在 XML 中找到的元素名称用于决定使用哪个Engine子类。所以给定 XML

<car xmlns="http://example.com/cars">
    <brand>BMW</brand>
    <v12engine horsePowers="300" />
</car>

和解组代码

JAXBContext ctx = JAXBContext.newInstance(Car.class, V6Engine.class, V12Engine.class);
Unmarshaller um = ctx.createUnmarshaller();
Car c = (Car)um.unmarshal(new File("file.xml"));

assert "BMW".equals(c.brand);
assert c.engine instanceof V12Engine;
assert ((V12Engine)c.engine).horsePowers == 300;

要添加新类型，Engine只需创建新的子类，@XmlRootElement适当地对其进行注释，然后将此新类添加到传递给的列表中JAXBContext.newInstance()。

品牌的交叉引用

JAXB 有一个基于交叉引用的机制@XmlID，@XmlIDREF但是这些要求 ID 属性是一个有效的 XML ID，即一个 XML 名称，特别是不完全由数字组成。但是，只要您不需要“前向”引用（即引用<car>尚未<brand>“声明”的 a），您自己跟踪交叉引用并不难。

第一步是定义一个 JAXB 类来表示<brand>

package cars;

import javax.xml.bind.annotation.*;

@XmlRootElement
public class Brand {
  @XmlValue // i.e. the simple content of the <brand> element
  String name;

  // optional id and refId attributes (optional because they're
  // Integer rather than int)
  @XmlAttribute
  Integer id;

  @XmlAttribute
  Integer refId;
}

现在我们需要一个“类型适配器”来在Brand对象和String所需的 by之间进行转换Car，并维护 id/ref 映射

package cars;

import javax.xml.bind.annotation.adapters.*;
import java.util.*;

public class BrandAdapter extends XmlAdapter<Brand, String> {
  private Map<Integer, Brand> brandCache = new HashMap<Integer, Brand>();

  public Brand marshal(String s) {
    return null;
  }


  public String unmarshal(Brand b) {
    if(b.id != null) {
      // this is a <brand id="..."> - cache it
      brandCache.put(b.id, b);
    }
    if(b.refId != null) {
      // this is a <brand refId="..."> - pull it from the cache
      b = brandCache.get(b.refId);
    }

    // and extract the name
    return (b.name == null) ? null : b.name.trim();
  }
}

我们将适配器链接到使用另一个注解的brand字段：Car

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Car {
    @XmlJavaTypeAdapter(BrandAdapter.class)
    String brand;
    @XmlElementRef
    Engine engine;
}

难题的最后一部分是确保<brand>在顶层找到的元素保存在缓存中。这是一个完整的例子

package cars;

import javax.xml.bind.*;
import java.io.File;
import java.util.*;

import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Main {
  public static void main(String[] argv) throws Exception {
    List<Car> cars = new ArayList<Car>();

    JAXBContext ctx = JAXBContext.newInstance(Car.class, V12Engine.class, V6Engine.class, Brand.class);
    Unmarshaller um = ctx.createUnmarshaller();

    // create an adapter, and register it with the unmarshaller
    BrandAdapter ba = new BrandAdapter();
    um.setAdapter(BrandAdapter.class, ba);

    // create a StAX XMLStreamReader to read the XML file
    XMLInputFactory xif = XMLInputFactory.newFactory();
    XMLStreamReader xsr = xif.createXMLStreamReader(new StreamSource(new File("file.xml")));

    xsr.nextTag(); // root <list> element
    xsr.nextTag(); // first <brand> or <car> child

    // read each <brand>/<car> in turn
    while(xsr.getEventType() == XMLStreamConstants.START_ELEMENT) {
      Object obj = um.unmarshal(xsr);

      // unmarshal from an XMLStreamReader leaves the reader pointing at
      // the event *after* the closing tag of the element we read.  If there
      // was a text node between the closing tag of this element and the opening
      // tag of the next then we will need to skip it.
      if(xsr.getEventType() != XMLStreamConstants.START_ELEMENT && xsr.getEventType() != XMLStreamConstants.END_ELEMENT) xsr.nextTag();

      if(obj instanceof Brand) {
        // top-level <brand> - hand it to the BrandAdapter so it can be
        // cached if necessary
        ba.unmarshal((Brand)obj);
      }
      if(obj instanceof Car) {
        cars.add((Car)obj);
      }
    }
    xsr.close();

    // at this point, cars contains all the Car objects we found, with
    // any <brand> refIds resolved.
  }
}

score 4 · Accepted Answer

这是一个使用 XStream 的解决方案，因为您似乎已经熟悉它，而且它是一个非常灵活的 XML 工具。它是在 Groovy 中完成的，因为它比 Java 好得多。移植到 Java 相当简单。请注意，我选择对结果进行一些后处理，而不是尝试让 XStream 为我完成所有工作。具体来说，“品牌参考”是事后处理的。我可以在编组内部进行，但我认为这种方法更清洁，并且让您的选择更加开放以供将来修改。此外，这种方法允许“品牌”元素出现在整个文档中的任何位置，包括引用它们的汽车之后——如果你在运行中进行替换，我认为你无法做到这一点。

带注释的解决方案

import com.thoughtworks.xstream.XStream
import com.thoughtworks.xstream.annotations.*
import com.thoughtworks.xstream.converters.*
import com.thoughtworks.xstream.converters.extended.ToAttributedValueConverter
import com.thoughtworks.xstream.io.*
import com.thoughtworks.xstream.mapper.Mapper

// The classes as given, plus toString()'s for readable output and XStream
// annotations to support unmarshalling. Note that with XStream's flexibility,
// all of this is possible with no annotations, so no code modifications are
// actually required.

@XStreamAlias("car")
// A custom converter for handling the oddities of parsing a Car, defined
// below.
@XStreamConverter(CarConverter)
class Car {
    String brand
    Engine engine
    String toString() { "Car{brand='$brand', engine=$engine}" }
}

abstract class Engine {
}

@XStreamAlias("v12engine")
class V12Engine extends Engine {
    @XStreamAsAttribute int horsePowers
    String toString() { "V12Engine{horsePowers=$horsePowers}" }
}

@XStreamAlias("v6engine")
class V6Engine extends Engine {
    @XStreamAsAttribute @XStreamAlias("fuel") String fuelType
    String toString() { "V6Engine{fuelType='$fuelType'}" }
}

// The given input:
String xml = """\
    <list>
        <brand id="1">
            Volvo
        </brand>
        <car>
            <brand>BMW</brand>
            <v12engine horsePowers="300" />
        </car>
        <car>
            <brand refId="1" />
            <v6engine fuel="unleaded" />
        </car>
    </list>"""

// The solution:

// A temporary Brand class to hold the relevant information needed for parsing
@XStreamAlias("brand")
// An out-of-the-box converter that uses a single field as the value of an
// element and makes everything else attributes: a perfect match for the given
// "brand" XML.
@XStreamConverter(value=ToAttributedValueConverter, strings="name")
class Brand {
    Integer id
    Integer refId
    String name
    String toString() { "Brand{id=$id, refId=$refId, name='$name'}" }
}

// Reads Car instances, figuring out the engine type and storing appropriate
// brand info along the way.
class CarConverter implements Converter {
    Mapper mapper

    // A Mapper can be injected auto-magically by XStream when converters are
    // configured via annotation.
    CarConverter(Mapper mapper) {
        this.mapper = mapper
    }

    Object unmarshal(HierarchicalStreamReader reader,
                     UnmarshallingContext context) {
        Car car = new Car()
        reader.moveDown()
        Brand brand = context.convertAnother(car, Brand)
        reader.moveUp()
        reader.moveDown()
        // The mapper knows about registered aliases and can tell us which
        // engine type it is.
        Class engineClass = mapper.realClass(reader.getNodeName())
        def engine = context.convertAnother(car, engineClass)
        reader.moveUp()
        // Set the brand name if available or a placeholder for later 
        // reference if not.
        if (brand.name) {
            car.brand = brand.name
        } else {
            car.brand = "#{$brand.refId}"
        }
        car.engine = engine
        return car
    }

    boolean canConvert(Class type) { type == Car }

    void marshal(Object source, HierarchicalStreamWriter writer,
                 MarshallingContext context) {
        throw new UnsupportedOperationException("Don't need this right now")
    }
}

// Now exercise it:

def x = new XStream()
// As written, this line would have to be modified to add new engine types,
// but if this isn't desirable, classpath scanning or some other kind of
// auto-registration could be set up, but not through XStream that I know of.
x.processAnnotations([Car, Brand, V12Engine, V6Engine] as Class[])
// Parsing will create a List containing Brands and Cars
def brandsAndCars = x.fromXML(xml)
List<Brand> brands = brandsAndCars.findAll { it instanceof Brand }
// XStream doesn't trim whitespace as occurs in the sample XML. Maybe it can
// be made to?
brands.each { it.name = it.name.trim() }
Map<Integer, Brand> brandsById = brands.collectEntries{ [it.id, it] }
List<Car> cars = brandsAndCars.findAll{ it instanceof Car }
// Regex match brand references and replace them with brand names.
cars.each {
    def brandReference = it.brand =~ /#\{(.*)\}/
    if (brandReference) {
        int brandId = brandReference[0][1].toInteger()
        it.brand = brandsById.get(brandId).name
    }
}
println "Brands:"
brands.each{ println "  $it" }
println "Cars:"
cars.each{ println "  $it" }

输出

Brands:
  Brand{id=1, refId=null, name='Volvo'}
Cars:
  Car{brand='BMW', engine=V12Engine{horsePowers=300}}
  Car{brand='Volvo', engine=V6Engine{fuelType='unleaded'}}

没有注释的解决方案

PS 只是为了笑，这里是同样的东西，没有注释。除了没有对类进行注释之外，其他的都是一样的，在下面有几行额外的行new XStream()来完成注释之前所做的所有事情。输出是相同的。

import com.thoughtworks.xstream.XStream
import com.thoughtworks.xstream.converters.*
import com.thoughtworks.xstream.converters.extended.ToAttributedValueConverter
import com.thoughtworks.xstream.io.*
import com.thoughtworks.xstream.mapper.Mapper

class Car {
    String brand
    Engine engine
    String toString() { "Car{brand='$brand', engine=$engine}" }
}

abstract class Engine {
}

class V12Engine extends Engine {
    int horsePowers
    String toString() { "V12Engine{horsePowers=$horsePowers}" }
}

class V6Engine extends Engine {
    String fuelType
    String toString() { "V6Engine{fuelType='$fuelType'}" }
}

String xml = """\
    <list>
        <brand id="1">
            Volvo
        </brand>
        <car>
            <brand>BMW</brand>
            <v12engine horsePowers="300" />
        </car>
        <car>
            <brand refId="1" />
            <v6engine fuel="unleaded" />
        </car>
    </list>"""

class Brand {
    Integer id
    Integer refId
    String name
    String toString() { "Brand{id=$id, refId=$refId, name='$name'}" }
}

class CarConverter implements Converter {
    Mapper mapper

    CarConverter(Mapper mapper) {
        this.mapper = mapper
    }

    Object unmarshal(HierarchicalStreamReader reader,
                     UnmarshallingContext context) {
        Car car = new Car()
        reader.moveDown()
        Brand brand = context.convertAnother(car, Brand)
        reader.moveUp()
        reader.moveDown()
        Class engineClass = mapper.realClass(reader.getNodeName())
        def engine = context.convertAnother(car, engineClass)
        reader.moveUp()
        if (brand.name) {
            car.brand = brand.name
        } else {
            car.brand = "#{$brand.refId}"
        }
        car.engine = engine
        return car
    }

    boolean canConvert(Class type) { type == Car }

    void marshal(Object source, HierarchicalStreamWriter writer,
                 MarshallingContext context) {
        throw new UnsupportedOperationException("Don't need this right now")
    }
}

def x = new XStream()
x.alias('car', Car)
x.alias('brand', Brand)
x.alias('v6engine', V6Engine)
x.alias('v12engine', V12Engine)
x.registerConverter(new CarConverter(x.mapper))
x.registerConverter(new ToAttributedValueConverter(Brand, x.mapper, x.reflectionProvider, x.converterLookup, 'name'))
x.useAttributeFor(V12Engine, 'horsePowers')
x.aliasAttribute(V6Engine, 'fuelType', 'fuel')
x.useAttributeFor(V6Engine, 'fuelType')
def brandsAndCars = x.fromXML(xml)
List<Brand> brands = brandsAndCars.findAll { it instanceof Brand }
brands.each { it.name = it.name.trim() }
Map<Integer, Brand> brandsById = brands.collectEntries{ [it.id, it] }
List<Car> cars = brandsAndCars.findAll{ it instanceof Car }
cars.each {
    def brandReference = it.brand =~ /#\{(.*)\}/
    if (brandReference) {
        int brandId = brandReference[0][1].toInteger()
        it.brand = brandsById.get(brandId).name
    }
}
println "Brands:"
brands.each{ println "  $it" }
println "Cars:"
cars.each{ println "  $it" }

PPS 如果您安装了 Gradle，您可以将其放入 abuild.gradle并将上述脚本之一放入src/main/groovy/XStreamExample.groovy，然后就gradle run可以查看结果：

apply plugin: 'groovy'
apply plugin: 'application'

mainClassName = 'XStreamExample'

dependencies {
    groovy 'org.codehaus.groovy:groovy:2.0.5'
    compile 'com.thoughtworks.xstream:xstream:1.4.3'
}

repositories {
    mavenCentral()
}

score 1 · Accepted Answer

您可以尝试在此处参考以获取一些想法。

就个人而言，我会使用DOM Parser来获取 XML 文件的内容。

例子：

import java.io.*;
import javax.xml.parsers.*;

import org.w3c.dom.*;

public class DOMExample {

  public static void main(String[] args) throws Exception {

    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

    File file = new File("filename.xml");
    Document doc = builder.parse(file);

    NodeList carList = doc.getElementsByTagName("car");
    for (int i = 0; i < carList.getLength(); ++i) {

        Element carElem = (Element)carList.item(i);

        Element brandElem = (Element)carElem.getElementsByTagName("brand").item(0);
        Element engineElem = (Element)carElem.getElementsByTagName("v12engine").item(0);

        String brand= brandElem.getTextContent();
        String engine= engineElem.getTextContent();

        System.out.println(brand+ ", " + engine);

        // TODO Do something with the desired information.
    }       
  }
}

如果您知道标签名称的可能内容，这将非常有效。有很多方法可以解析 XML 文件。希望你能想出适合你的东西。祝你好运！

java - 使用对先前标记的引用以及与某个类的子类型相对应的子项来解析 XML

3 回答 3

按元素名称的引擎类型

品牌的交叉引用

Related

Reference