In XSLT, there is a remove() function for sequences. Given a sequence and a position, it returns the sequence minus the item at the given position.

The question is: How do I employ this function in an actual XSLT file?

The only place I've found mention of an example that isn't just a regurgitation of the function spec completely devoid of context is here: http://books.google.com/books?id=W6SpffnfEPoC&pg=PA776&lpg=PA776&dq=xslt+%22remove+function%22&source=bl&ots=DQQrnXF_nB&sig=nrJtpEvYjBaZU0K8iAtdPTGUIbI&hl=en&sa=X&ei=QOq8T7aPDOyI6AHh-JBP&ved=0CEQQ6AEwAQ#v=onepage&q=xslt%20%22remove%20function%22&f=false

Unfortunately, the stylesheet examples are on pages 777 and 778, which are, of course, not included. And I don't own that book.

So, does anyone have an example of using the remove() XSLT function in an actual stylesheet?

Edit: Let's provide a slightly more concrete example, shall we?

I have a sequence in an XSLT stylesheet. It consists of all the lines from a text file.

<xsl:variable name="lines" select="tokenize(unparsed-text($filePath), '\r?\n')" />

Every one of these lines is a record...except for one, which gives me the record count. So I have the following code for finding that line:

<xsl:variable name="recordCount">
  <xsl:for-each select="$lines[position()]">
    <xsl:variable name="i" select="position()" />
    <xsl:analyze-string select="$lines[$i]" regex="RECORD COUNT = \d+">
      <xsl:matching-substring>
        <xsl:value-of select="replace($lines[$i], '[^0-9]', '')" />
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:for-each>
</xsl:variable>

I do the above before I start looping through the lines to get all the actual records, so my goal here is to remove the "RECORD COUNT" line from the $lines sequence when I find it. That way when I'm looping through grabbing records I don't have to do a check every time asking "Is this actually not a record, but in fact the RECORD COUNT line? You know, that thing I looked for and found already?"

Edit (2): Based on Martin Honnen's answer(s), my final XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- I want to produce an XML document. -->
  <xsl:output method="xml" indent="yes" />

  <!-- Path to input text file. -->
  <xsl:param name="filePath" select="'TestFile.txt'" />

  <!-- Regex in replace() removes leading and trailing blank space. -->
  <xsl:variable name="text" select="replace(unparsed-text($filePath), '(^[\r\n]*\s*[\r\n]+)|([\r\n]+\s*[\r\n]*$)', '')" />

  <!-- Regex in tokenize() sets the delimiter to be any blank space between record lines. -->
  <!-- This effectively removes any blank lines. -->
  <xsl:variable name="lines" select="tokenize($text, '[\r\n]+\s*[\r\n]*')" />

  <!-- This finds the "RECORD COUNT = ?" line. -->
  <xsl:variable name="recordCountIndex"
    select="for $pos in 1 to count($lines) return $pos[matches($lines[$pos], 'RECORD COUNT = \d+')]" />

  <!-- Regex in replace() strips everything that's not a number, leaving only the numeric count. -->
  <!-- Example: "RECORD COUNT = 25" -> "25" -->
  <xsl:variable name="recordCount" select="replace($lines[$recordCountIndex], '[^0-9]', '')" />

  <xsl:template name="main">
    <root>
      <recordCount>
        <!-- The record count value being inserted. -->
        <xsl:value-of select="$recordCount" />
      </recordCount>
      <records>
        <!-- Iterate over the $lines minus the line containing the record count. -->
        <xsl:for-each select="remove($lines, $recordCountIndex)">
          <!-- Items in each record, split by blank space. -->
          <!-- Example: "a b c" -> "[a, b, c]" -->
          <xsl:variable name="record" select="tokenize(., ' ')" />
          <record>
            <aThing>
              <xsl:value-of select="$record[1]" />
            </aThing>
            <aDifferentThing>
              <xsl:value-of select="$record[2]" />
            </aDifferentThing>
            <someStuff>
              <xsl:value-of select="$record[3]" />
            </someStuff>
          </record>
        </xsl:for-each>
      </records>
    </root>
  </xsl:template>
</xsl:stylesheet>
2 Answers

Well

<xsl:variable name="seq1" select="1, 2, 3, 4"/>
<xsl:variable name="seq2" select="remove($seq1, 2)"/>

makes the value of the variable seq2 a sequence of the three numbers 1, 3, 4.

[edit]

Here is an example based on your edited problem description:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">

  <xsl:output method="text"/>

  <xsl:param name="filePath" select="'test2012052301.txt'"/>

  <xsl:variable name="lines" select="tokenize(unparsed-text($filePath), '\r?\n')" />

  <xsl:variable name="index" as="xs:integer"
    select="for $pos in 1 to count($lines) return $pos[matches($lines[$pos], 'RECORD COUNT = [0-9]+')]"/>

  <xsl:variable name="recordCount" as="xs:integer"
    select="xs:integer(replace($lines[$index], '[^0-9]', ''))"/>

  <xsl:template name="main">
    <xsl:value-of select="remove($lines, $index)" separator="&#10;"/>
    <xsl:text>count is: </xsl:text>
    <xsl:value-of select="$recordCount"/>
  </xsl:template>

</xsl:stylesheet>

With the text file being for instance

foo
bar
RECORD COUNT = 3
baz

the stylesheet outputs

foo
bar
baz
count is: 3

[edit2] I think you can shorten the section

  <records>
    <!-- The $lines sequence trimmed down to only consist of valid records. -->
    <!-- (I have found no way around having this intermediate variable.) -->
    <xsl:variable name="records" select="remove($lines, $recordCountIndex)" />
    <xsl:for-each select="$records[position()]">
      <!-- Variable for iteration. Perhaps there's a more elegant way to do this. -->
      <xsl:variable name="i" select="position()" />
      <!-- Items in each record, split by blank space. -->
      <!-- Example: "a b c" -> "[a, b, c]" -->
      <xsl:variable name="recordItems" select="tokenize($records[$i], ' ')" />
      <record>
        <item1>
          <xsl:value-of select="$recordItems[1]" />
        </item1>
        <item2>
          <xsl:value-of select="$recordItems[2]" />
        </item2>
        <item3>
          <xsl:value-of select="$recordItems[3]" />
        </item3>
      </record>
    </xsl:for-each>
  </records>

of your stylesheet to

  <records>
    <xsl:for-each select="remove($lines, $recordCountIndex)">
      <record>
        <xsl:for-each select="tokenize(., ' ')[position() lt 4]">
          <xsl:element name="item{position()}">
            <xsl:value-of select="."/>
          </xsl:element>
        </xsl:for-each>
      </record>
    </xsl:for-each>
  </records>

Actually the predicate position() lt 4 is only needed if a line can contain more than three tokens.

As a side note, I have now seen a construct like for-each select="$records[position()]" twice in your post. That [position()] predicate is completely useless: a numeric predicate selects the item whose position equals the number, so it is true for every item. You can simply use for-each select="$records".

answered 2012-05-23T14:14:01.670

It's hard to work out exactly where your confusion lies.

Firstly, removing an item from a sequence will never remove a node from a tree. (I've fought against the way the spec talks about sequences "containing nodes"; I think it's better to think of them as containing references to nodes. So you're removing a reference to a node, which doesn't affect the node itself in any way.)

Secondly, you seem to be thinking of variables as they are sometimes described in procedural languages, as named boxes containing values, which can contain different values at different times. Don't think of XSLT and XQuery variables that way: think of them as named values. "Over-writing" just isn't a meaningful operation.
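
A minimal sketch of the "named values" point, using made-up variable names and numbers rather than anything from the question: remove() returns a new sequence, which gets bound to a second name, while the first variable keeps the value it always had.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <!-- Hypothetical sample data, for illustration only. -->
  <xsl:variable name="original" select="(10, 20, 30)"/>

  <!-- remove() builds a new sequence; $original is not modified. -->
  <xsl:variable name="shorter" select="remove($original, 2)"/>

  <xsl:template name="main">
    <!-- Still all three items. -->
    <xsl:value-of select="$original" separator=" "/>
    <xsl:text>&#10;</xsl:text>
    <!-- The new sequence without the second item. -->
    <xsl:value-of select="$shorter" separator=" "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Assuming an XSLT 2.0 processor such as Saxon, started with main as the initial template (for example with -it:main), this should print 10 20 30 on the first line and 10 30 on the second.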

Finally, use cases. The most common way I use remove() is to get the tail of a sequence: remove($seq, 1). You can also write that as subsequence($seq, 2) or as $seq[position() gt 1], but remove() is fewer keystrokes. To be honest, I can't think of a real-life example where I have used remove() any other way.
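
For what it's worth, the three ways of writing the tail can be compared side by side. This is only a sketch: the $seq value is made up for illustration, and it assumes an XSLT 2.0 processor invoked with main as the initial template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <!-- Hypothetical sample sequence, for illustration only. -->
  <xsl:variable name="seq" select="('a', 'b', 'c', 'd')"/>

  <xsl:template name="main">
    <!-- Each expression selects every item except the first. -->
    <xsl:value-of select="remove($seq, 1)" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="subsequence($seq, 2)" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="$seq[position() gt 1]" separator=","/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

All three lines of output should read b,c,d.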

Which leads me to an observation about your question. Asking "how do I use this feature" is a pretty strange sort of question. What we expect people to ask is "how do I solve this problem". Sometimes, when people ask how to use a feature, they are struggling to solve a particular problem, but they aren't telling us what the problem is. It would help if you told us: there's a good chance that remove() isn't part of the solution.

answered 2012-05-23T16:13:05.857