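I have managed to retrieve the text from an XPS document and use it as needed (thanks to this answer), but I would like to know whether there is a richer object model (rather than an XmlReader) that automatically puts all of the elements into a collection of objects you can loop over in code.
This is a contrived example, but I mean something along the lines of this pseudocode: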
Imports System.Windows.Xps.Packaging 'needs a reference to ReachFramework.dll
'open the xps document
Dim xpsDoc As New XpsDocument(pathToTestXps, System.IO.FileAccess.Read)
'load the fixed document sequence
Dim fixedDocSeqReader As IXpsFixedDocumentSequenceReader = xpsDoc.FixedDocumentSequenceReader
'the content will go here
Dim sbContent As New System.Text.StringBuilder()
'loop the fixed documents
For Each docReader As IXpsFixedDocumentReader In fixedDocSeqReader.FixedDocuments
    'loop the fixed pages
    For Each fixedPageReader As IXpsFixedPageReader In docReader.FixedPages
        'BEGIN PSEUDO CODE (the IXpsContent* types and the Contents property are made up)
        Dim content As IXpsContentCollection = fixedPageReader.Contents
        For Each contentItem As IXpsContentItem In content
            Select Case contentItem.Type
                Case IXpsContentItem.ContentType.Canvas 'Group
                    'loop content items, check their type, do stuff
                Case IXpsContentItem.ContentType.Glyph 'Text
                    Dim str As String = DirectCast(contentItem, Glyph).UnicodeString
                    'do something with the string
                Case IXpsContentItem.ContentType.Path 'Shape
                    'get the shape properties etc
                Case Else
                    Throw New ApplicationException("XPS Content Type Not Expected: " & contentItem.Type.ToString())
            End Select
        Next
        'END PSEUDO CODE
    Next
Next
'release the underlying package when done
xpsDoc.Close()
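If there is no such model, what is the simplest way to work with the XmlReader, and is there a good reference for the XML elements and attributes?
For context, this is what I currently do in place of the pseudocode above: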
'XmlElementCanvas, XmlElementGlyphs and XmlAttribUnicodeString are String constants
'defined elsewhere (presumably "Canvas", "Glyphs" and "UnicodeString")
'get the xml for the fixed page
Dim pageContentReader As System.Xml.XmlReader = fixedPageReader.XmlReader
While pageContentReader.Read()
    'if it is a canvas, it's a new line or some other stuff
    If pageContentReader.Name = XmlElementCanvas Then
        'other stuff won't have attributes
        If pageContentReader.HasAttributes Then
            'remove the last char as it will be an excess comma
            If sbContent.Length > 0 Then
                sbContent.Length = sbContent.Length - 1
                sbContent.AppendLine()
            End If
        End If
    End If
    'if it is a glyph, it's the text we want
    If pageContentReader.Name = XmlElementGlyphs Then
        'unsure, but it was in the example code, so we'll keep it
        If pageContentReader.HasAttributes Then
            'unicode string attribute has the text we want
            Dim text As String = pageContentReader.GetAttribute(XmlAttribUnicodeString)
            If text IsNot Nothing Then
                'add the text and a comma
                sbContent.Append(text)
                sbContent.Append(",")
            End If
        End If
    End If
End While