I'm having trouble understanding Net::HTTP and Nokogiri.

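I have a large number of jobs on my Jenkins server, and I have to update their branch names periodically. Doing this from the UI is tedious, so I decided to update each job's Jenkins config.xml instead.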

I parse the XML with Nokogiri, walk it with XPath, and update the node's value. However, when I try to POST the updated XML back to Jenkins, I get a 500 error with this message:

Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.

Here is what I'm doing:

require "net/http"
require "nokogiri"

uri = URI.parse("http://jenkins.my.domain.web:8080")
http = Net::HTTP.new(uri.host, uri.port)

getQueueRequest = Net::HTTP::Get.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
getQueue = http.request(getQueueRequest)

xml_doc = Nokogiri::HTML(getQueue.body)

# Get current branch name
branch_name=xml_doc.at_xpath('//hudson.plugins.git.branchspec/name')

# Get new branch name
print "Enter new branch name "
user_input = gets.chomp
new_branch_name = user_input.downcase

# Set branch name and create xml
branch_name.content=new_branch_name
new_config_xml=xml_doc.to_xml

puts "Logging into Jenkins"

update_branch = Net::HTTP::Post.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
update_branch.basic_auth 'username', 'password'
update_branch.body = new_config_xml

response = http.request(update_branch)

puts response.body

I know it probably has something to do with the XML I put in the request body, but I can't figure out how to fix it.

Original XML:

<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
  <actions/>
  <description></description>
  <keepDependencies>false</keepDependencies>
  <properties>
    <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
      <maxConcurrentPerNode>0</maxConcurrentPerNode>
      <maxConcurrentTotal>0</maxConcurrentTotal>
      <categories/>
      <throttleEnabled>false</throttleEnabled>
      <throttleOption>project</throttleOption>
      <configVersion>1</configVersion>
    </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
  </properties>
  <scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
    <configVersion>2</configVersion>
    <userRemoteConfigs>
      <hudson.plugins.git.UserRemoteConfig>
        <name></name>
        <refspec></refspec>
        <url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
      </hudson.plugins.git.UserRemoteConfig>
    </userRemoteConfigs>
    <branches>
      <hudson.plugins.git.BranchSpec>
        <name>release</name>
      </hudson.plugins.git.BranchSpec>
    </branches>
    <disableSubmodules>false</disableSubmodules>
    <recursiveSubmodules>false</recursiveSubmodules>
    <doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
    <authorOrCommitter>false</authorOrCommitter>
    <clean>false</clean>
    <wipeOutWorkspace>false</wipeOutWorkspace>
    <pruneBranches>false</pruneBranches>
    <remotePoll>false</remotePoll>
    <ignoreNotifyCommit>false</ignoreNotifyCommit>
    <useShallowClone>false</useShallowClone>
    <buildChooser class="hudson.plugins.git.util.DefaultBuildChooser"/>
    <gitTool>Default</gitTool>
    <submoduleCfg class="list"/>
    <relativeTargetDir></relativeTargetDir>
    <reference></reference>
    <excludedRegions></excludedRegions>
    <excludedUsers></excludedUsers>
    <gitConfigName></gitConfigName>
    <gitConfigEmail></gitConfigEmail>
    <skipTag>false</skipTag>
    <includedRegions></includedRegions>
    <scmName></scmName>
  </scm>
  <canRoam>true</canRoam>
  <disabled>false</disabled>
  <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
  <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
  <triggers class="vector">
    <hudson.triggers.TimerTrigger>
      <spec>0 22 * * 4</spec>
    </hudson.triggers.TimerTrigger>
  </triggers>
  <concurrentBuild>false</concurrentBuild>
  <rootModule>
    <groupId>com.org.project.test</groupId>
    <artifactId>functest</artifactId>
  </rootModule>
  <goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
  <mavenName>apache-maven-3.0.4</mavenName>
  <aggregatorStyleBuild>true</aggregatorStyleBuild>
  <incrementalBuild>false</incrementalBuild>
  <perModuleEmail>true</perModuleEmail>
  <ignoreUpstremChanges>false</ignoreUpstremChanges>
  <archivingDisabled>false</archivingDisabled>
  <resolveDependencies>false</resolveDependencies>
  <processPlugins>false</processPlugins>
  <mavenValidationLevel>-1</mavenValidationLevel>
  <runHeadless>false</runHeadless>
  <disableTriggerDownstreamProjects>false</disableTriggerDownstreamProjects>
  <settings class="jenkins.mvn.DefaultSettingsProvider"/>
  <globalSettings class="jenkins.mvn.DefaultGlobalSettingsProvider"/>
  <reporters/>
  <publishers/>
  <buildWrappers/>
  <prebuilders/>
  <postbuilders/>
  <runPostStepsIfResult>
    <name>FAILURE</name>
    <ordinal>2</ordinal>
    <color>RED</color>
  </runPostStepsIfResult>
</maven2-moduleset>

After editing and massaging:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <body>
    <maven2-moduleset plugin="maven-plugin@1.504">
      <actions />
      <description />
      <keepdependencies>false</keepdependencies>
      <properties>
        <hudson.plugins.throttleconcurrents.throttlejobproperty plugin="throttle-concurrents@1.7.2">
          <maxconcurrentpernode>0</maxconcurrentpernode>
          <maxconcurrenttotal>0</maxconcurrenttotal>
          <categories />
          <throttleenabled>false</throttleenabled>
          <throttleoption>project</throttleoption>
          <configversion>1</configversion>
        </hudson.plugins.throttleconcurrents.throttlejobproperty>
      </properties>
      <scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
        <configversion>2</configversion>
        <userremoteconfigs>
          <hudson.plugins.git.userremoteconfig>
            <name />
            <refspec />
            <url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
          </hudson.plugins.git.userremoteconfig>
        </userremoteconfigs>
        <branches>
          <hudson.plugins.git.branchspec>
            <name>master</name>
          </hudson.plugins.git.branchspec>
        </branches>
        <disablesubmodules>false</disablesubmodules>
        <recursivesubmodules>false</recursivesubmodules>
        <dogeneratesubmoduleconfigurations>false</dogeneratesubmoduleconfigurations>
        <authororcommitter>false</authororcommitter>
        <clean>false</clean>
        <wipeoutworkspace>false</wipeoutworkspace>
        <prunebranches>false</prunebranches>
        <remotepoll>false</remotepoll>
        <ignorenotifycommit>false</ignorenotifycommit>
        <useshallowclone>false</useshallowclone>
        <buildchooser class="hudson.plugins.git.util.DefaultBuildChooser" />
        <gittool>Default</gittool>
        <submodulecfg class="list" />
        <relativetargetdir />
        <reference />
        <excludedregions />
        <excludedusers />
        <gitconfigname />
        <gitconfigemail />
        <skiptag>false</skiptag>
        <includedregions />
        <scmname />
      </scm>
      <canroam>true</canroam>
      <disabled>false</disabled>
      <blockbuildwhendownstreambuilding>false</blockbuildwhendownstreambuilding>
      <blockbuildwhenupstreambuilding>false</blockbuildwhenupstreambuilding>
      <triggers class="vector">
        <hudson.triggers.timertrigger>
          <spec>0 22 * * 4</spec>
        </hudson.triggers.timertrigger>
      </triggers>
      <concurrentbuild>false</concurrentbuild>
      <rootmodule>
        <groupid>com.org.project.test</groupid>
        <artifactid>functest</artifactid>
      </rootmodule>
      <goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
      <mavenname>apache-maven-3.0.4</mavenname>
      <aggregatorstylebuild>true</aggregatorstylebuild>
      <incrementalbuild>false</incrementalbuild>
      <permoduleemail>true</permoduleemail>
      <ignoreupstremchanges>false</ignoreupstremchanges>
      <archivingdisabled>false</archivingdisabled>
      <resolvedependencies>false</resolvedependencies>
      <processplugins>false</processplugins>
      <mavenvalidationlevel>-1</mavenvalidationlevel>
      <runheadless>false</runheadless>
      <disabletriggerdownstreamprojects>false</disabletriggerdownstreamprojects>
      <settings class="jenkins.mvn.DefaultSettingsProvider" />
      <globalsettings class="jenkins.mvn.DefaultGlobalSettingsProvider" />
      <reporters />
      <publishers />
      <buildwrappers />
      <prebuilders />
      <postbuilders />
      <runpoststepsifresult>
        <name>FAILURE</name>
        <ordinal>2</ordinal>
        <color>RED</color>
      </runpoststepsifresult>
    </maven2-moduleset>
  </body>
</html>

2 Answers

When you use Nokogiri::HTML(some_html) or Nokogiri::XML(some_xml), Nokogiri checks whether the content is well-formed. If it isn't, it repairs the content to try to make it so. For example:

require 'nokogiri'

html_fragment = "<p>foo bar</p>"
Nokogiri::HTML(html_fragment).to_html 
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"

If the document is partially correct, Nokogiri still adds the DOCTYPE statement:

html = "<html><body><p>foo bar</p></body></html>"
Nokogiri::HTML(html).to_html 
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"

If you want Nokogiri to leave the document alone because it is supposed to be a fragment, tell it so:

Nokogiri::HTML::DocumentFragment.parse(html_fragment).to_html 
# => "<p>foo bar</p>"

Or:

xml_fragment = "<x>foo bar</x>"
Nokogiri::XML::DocumentFragment.parse(xml_fragment).to_xml 
# => "<x>foo bar</x>"

Nokogiri is pretty smart about handling XML and HTML. You can try to confuse it, and it will usually do the right thing:

xml_fragment = "<x>foo bar</x>"
Nokogiri::HTML::DocumentFragment.parse(xml_fragment).to_xml 
# => "<x>foo bar</x>"

That parses XML as an HTML fragment and tells Nokogiri to emit it as XML.

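Now, having said all that, it should be clear that Nokogiri isn't doing anything mysterious, so here is how to fix the problem. First, parse the content as XML so Nokogiri doesn't think it should add the HTML DOCTYPE declaration. Then, if the XML is syntactically correct, Nokogiri can parse it as a complete document: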

require 'nokogiri'

xml = %{<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
  <actions/>
  <description></description>
  <keepDependencies>false</keepDependencies>
  <properties>
    <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
    </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
  </properties>
</maven2-moduleset>
}
puts Nokogiri::XML.parse(xml).to_xml 

# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >>   <actions/>
# >>   <description/>
# >>   <keepDependencies>false</keepDependencies>
# >>   <properties>
# >>     <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >>     </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >>   </properties>
# >> </maven2-moduleset>

Or parse it as a fragment; because the XML is complete, the result is essentially the same:

puts Nokogiri::XML::DocumentFragment.parse(xml).to_xml 

# >> <?xml version='1.0' encoding='UTF-8'?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >>   <actions/>
# >>   <description/>
# >>   <keepDependencies>false</keepDependencies>
# >>   <properties>
# >>     <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >>     </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >>   </properties>
# >> </maven2-moduleset>

I'd recommend not using Net::HTTP, which is a basic building block for HTTP, and using something higher-level instead, such as HTTPClient. Here is code similar to yours:

require 'httpclient'
require 'nokogiri'

URL = 'http://jenkins.my.domain.web:8080/my/job/location/config.xml'

http_client = HTTPClient.new

# Parse as XML so element names keep their case and no HTML wrapper is added
xml_doc = Nokogiri::XML(
  http_client.get_content(URL)
)

# Get current branch name. XPath is used because the element name contains
# dots, which a CSS selector would read as class names.
branch_name = xml_doc.at_xpath('//hudson.plugins.git.BranchSpec/name')

# Get new branch name
print 'Enter new branch name '
new_branch_name = gets.chomp.downcase

# Set branch name and create xml
branch_name.content = new_branch_name

puts 'Logging into Jenkins'

# set_auth takes the URI the credentials apply to
http_client.set_auth(URL, 'user', 'password')

response = http_client.post(URL, :body => xml_doc.to_xml)

I can't test it, but it looks close.
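
If you would rather stay with the standard library, the request can still be assembled with Net::HTTP. This is only a sketch: `build_config_update` is a hypothetical helper name, and the `application/xml` content type is an assumption about what the server expects, not something confirmed in this thread. The request is built but not sent, so it runs without a Jenkins server.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the POST that would
# replace a job's config.xml.
def build_config_update(url, xml, user, password)
  request = Net::HTTP::Post.new(URI.parse(url))
  request.basic_auth(user, password)
  request['Content-Type'] = 'application/xml' # assumption, see lead-in
  request.body = xml
  request
end

request = build_config_update(
  'http://jenkins.my.domain.web:8080/my/job/location/config.xml',
  '<maven2-moduleset/>', # placeholder body; use xml_doc.to_xml in practice
  'username', 'password'
)
puts request.path
puts request['Content-Type']

# Sending it would look like this (not run here, since it needs a live server):
#   uri = URI.parse('http://jenkins.my.domain.web:8080')
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```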


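Now I find myself in another predicament. I see that the methods that let me move to an element and edit its value (such as at_xpath and at_css) are only available for Nokogiri::HTML or Nokogiri::HTML::DocumentFragment. They don't work when I use Nokogiri::XML. Using Nokogiri::HTML changes the case of the tags: False becomes false. Jenkins does not accept the XML with the changed tag case. The to_html and to_xml methods basically return a string, so I can't use the xpath or css methods on their result to navigate the XML tree. Is there a way around this?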

The at methods are available for both XML and HTML, and they accept both CSS and XPath selectors; everything in Nokogiri is based on XML.

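Nokogiri folds HTML tags to lowercase because HTML is case-insensitive, so at needs lowercase names when you are working with HTML. XML is case-sensitive, so Nokogiri leaves tag case alone, and at requires you to use the correct case in your selector.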

This is covered in the Nokogiri documentation:

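Note that the CSS query string is case-sensitive with regards to your document type. That is, if you're looking for "H1" in an HTML document, you'll never find anything, since any "H1" tag will match only lowercase CSS queries. However, "H1" might be found in an XML document, where tag names are case-sensitive (e.g., "H1" is distinct from "h1").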

answered 2013-10-08T20:55:07.270

When you parse the XML you receive from the service, you declare it as HTML:

xml_doc = Nokogiri::HTML(getQueue.body)

This appears to cause Nokogiri to add the HTML nodes.

Try parsing it as XML instead:

xml_doc = Nokogiri::XML(getQueue.body)

answered 2013-10-08T14:12:55.087