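I am trying to fetch the XML provided at http://www.ncbi.nlm.nih.gov/sra/ERX086768?report=FullXml, but it is a bit tricky because they do not offer any support for it. The goal is to get the XML into PHP so that I can process it.
Can anyone give me a hint?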
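It is not quite true that the XML rendered through the HTML on that page is not XML: the XML is in there, it is just wrapped in HTML markup.
What you are looking for is called textContent in DOMDocument. It gives you only the text inside that HTML, exactly as it is displayed as "text" in the browser.
So all you need to do is load the HTML document into a DOMDocument. Because the page contains markup errors, switch libxml to internal error handling while loading: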
$url = 'http://www.ncbi.nlm.nih.gov/sra/ERX086768?report=FullXml';
$doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
$doc->loadHTMLFile($url);
libxml_use_internal_errors(FALSE);
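If you want to see what the parser complained about instead of silently discarding it, the collected errors can be read back before the setting is restored. This is only a minimal sketch: the `$html` string is a made-up stand-in for the downloaded page, and how many errors you get depends on the libxml version.

```php
<?php
// Hypothetical malformed HTML standing in for the downloaded page.
$html = '<html><body><div id="ResultView"><p>unclosed<unknown-tag></div></body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
$doc->loadHTML($html);
$errors = libxml_get_errors();                // array of LibXMLError objects (possibly empty)
libxml_clear_errors();                        // free the internal error buffer
libxml_use_internal_errors($previous);        // restore the previous setting

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Restoring the previous value rather than hard-coding FALSE keeps the snippet from changing behaviour elsewhere in a larger script.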
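The next part relies on specific knowledge about the page being scraped. In your case, the XML is the text content of all div tags with the class attribute "xml-tag" that *follow* the element with the ID "ResultView".
Those tags can easily be obtained with an XPath query, and their text content is then stored in an array: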
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="ResultView"]/following-sibling::div[@class="xml-tag"]');
$buffer = array();
foreach ($nodes as $node) {
$buffer[] = $node->textContent;
}
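To see what that query selects without hitting the network, here is a small offline sketch. The HTML string is a simplified, hypothetical stand-in for the real page (the actual markup may differ); the XPath expression is the same one used above.

```php
<?php
// Hypothetical, simplified stand-in for the scraped page structure.
$html = '<html><body>'
      . '<div id="ResultView">header</div>'
      . '<div class="xml-tag">&lt;ROOT&gt;</div>'
      . '<div class="other">ignored</div>'
      . '<div class="xml-tag">&lt;ITEM&gt;1&lt;/ITEM&gt;</div>'
      . '<div class="xml-tag">&lt;/ROOT&gt;</div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Only the div siblings after #ResultView whose class is exactly "xml-tag".
$nodes = $xpath->query('//*[@id="ResultView"]/following-sibling::div[@class="xml-tag"]');

$buffer = array();
foreach ($nodes as $node) {
    $buffer[] = $node->textContent; // entities are already decoded here
}
$xml = implode('', $buffer);
echo $xml, "\n"; // <ROOT><ITEM>1</ITEM></ROOT>
```

Note that textContent returns the decoded text, so the escaped `&lt;` and `&gt;` in the HTML come back as real angle brackets.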
So all that is left now is to create a new DOMDocument, load the XML buffer into it, apply some nice formatting, and output it:
$new = new DOMDocument();
$new->preserveWhiteSpace = FALSE;
$new->formatOutput = TRUE;
$new->loadXML(implode('', $buffer));
$new->save('php://output');
These roughly 20 lines of code produce the following output:
<?xml version="1.0"?>
<EXPERIMENT_PACKAGE>
<EXPERIMENT alias="SC_EXP_7229_8#56" center_name="SC" accession="ERX086768">
<IDENTIFIERS>
<PRIMARY_ID>ERX086768</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">SC_EXP_7229_8#56</SUBMITTER_ID>
</IDENTIFIERS>
<TITLE/>
<STUDY_REF accession="ERP000913" refname="Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977" refcenter="SC">
<IDENTIFIERS>
<PRIMARY_ID>ERP000913</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977</SUBMITTER_ID>
</IDENTIFIERS>
</STUDY_REF>
<DESIGN>
<DESIGN_DESCRIPTION>Standard</DESIGN_DESCRIPTION>
<SAMPLE_DESCRIPTOR accession="ERS074283" refname="MR223754-sc-2011-11-18T11:31:44Z-1306470" refcenter="SC">
<IDENTIFIERS>
<PRIMARY_ID>ERS074283</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">MR223754-sc-2011-11-18T11:31:44Z-1306470</SUBMITTER_ID>
</IDENTIFIERS>
</SAMPLE_DESCRIPTOR>
<LIBRARY_DESCRIPTOR>
<LIBRARY_NAME>4008297</LIBRARY_NAME>
<LIBRARY_STRATEGY>WGS</LIBRARY_STRATEGY>
<LIBRARY_SOURCE>GENOMIC</LIBRARY_SOURCE>
<LIBRARY_SELECTION>RANDOM</LIBRARY_SELECTION>
<LIBRARY_LAYOUT>
<PAIRED NOMINAL_LENGTH="250"/>
</LIBRARY_LAYOUT>
</LIBRARY_DESCRIPTOR>
<SPOT_DESCRIPTOR>
<SPOT_DECODE_SPEC>
<READ_SPEC>
<READ_INDEX>0</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Forward</READ_TYPE>
<BASE_COORD>1</BASE_COORD>
</READ_SPEC>
<READ_SPEC>
<READ_INDEX>1</READ_INDEX>
<READ_CLASS>Application Read</READ_CLASS>
<READ_TYPE>Reverse</READ_TYPE>
<RELATIVE_ORDER follows_read_index="0"/>
</READ_SPEC>
</SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
</DESIGN>
<PLATFORM>
<ILLUMINA>
<INSTRUMENT_MODEL>Illumina HiSeq 2000</INSTRUMENT_MODEL>
</ILLUMINA>
</PLATFORM>
<PROCESSING/>
</EXPERIMENT>
<SUBMISSION accession="ERA119046" center_name="SC" submission_date="2012-04-17T09:29:50Z" alias="ERP000913-sc-20120417-2" lab_name="">
<IDENTIFIERS>
<PRIMARY_ID>ERA119046</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">ERP000913-sc-20120417-2</SUBMITTER_ID>
</IDENTIFIERS>
</SUBMISSION>
<STUDY alias="Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977" center_name="SC" accession="ERP000913">
<IDENTIFIERS>
<PRIMARY_ID>ERP000913</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977</SUBMITTER_ID>
</IDENTIFIERS>
<DESCRIPTOR>
<STUDY_TITLE>Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis</STUDY_TITLE>
<STUDY_TYPE existing_study_type="Whole Genome Sequencing"/>
<STUDY_ABSTRACT>http://www.sanger.ac.uk/resources/downloads/bacteria/</STUDY_ABSTRACT>
<CENTER_PROJECT_NAME>Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis</CENTER_PROJECT_NAME>
<STUDY_DESCRIPTION>http://www.sanger.ac.uk/resources/downloads/bacteria/
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/</STUDY_DESCRIPTION>
</DESCRIPTOR>
</STUDY>
<SAMPLE alias="MR223754-sc-2011-11-18T11:31:44Z-1306470" center_name="SC" accession="ERS074283">
<IDENTIFIERS>
<PRIMARY_ID>ERS074283</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">MR223754-sc-2011-11-18T11:31:44Z-1306470</SUBMITTER_ID>
</IDENTIFIERS>
<SAMPLE_NAME>
<COMMON_NAME>Streptococcus dysgalactiae subspecies equisimilis</COMMON_NAME>
<TAXON_ID>119602</TAXON_ID>
<SCIENTIFIC_NAME>Streptococcus dysgalactiae subsp. equisimilis</SCIENTIFIC_NAME>
</SAMPLE_NAME>
<SAMPLE_LINKS>
<SAMPLE_LINK>
<ENTREZ_LINK>
<DB>biosample</DB>
<ID>859730</ID>
</ENTREZ_LINK>
</SAMPLE_LINK>
</SAMPLE_LINKS>
<SAMPLE_ATTRIBUTES>
<SAMPLE_ATTRIBUTE>
<TAG>Strain</TAG>
<VALUE>MR223754</VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>Sample Description</TAG>
<VALUE/>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>ArrayExpress-StrainOrLine</TAG>
<VALUE>MR223754</VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>ArrayExpress-Sex</TAG>
<VALUE>not applicable</VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>ArrayExpress-Species</TAG>
<VALUE>Streptococcus dysgalactiae subspecies equisimilis</VALUE>
</SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES>
</SAMPLE>
<RUN_SET>
<RUN alias="SC_RUN_7229_8#56" center_name="SC" accession="ERR109334" total_spots="2708543" total_bases="406281450" size="334475592" load_done="true" published="2012-04-27 20:11:35" is_public="true" cluster_name="public" static_data_available="1">
<IDENTIFIERS>
<PRIMARY_ID>ERR109334</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">SC_RUN_7229_8#56</SUBMITTER_ID>
</IDENTIFIERS>
<EXPERIMENT_REF refname="SC_EXP_7229_8#56" refcenter="SC" accession="ERX086768">
<IDENTIFIERS>
<PRIMARY_ID>ERX086768</PRIMARY_ID>
<SUBMITTER_ID namespace="SC">SC_EXP_7229_8#56</SUBMITTER_ID>
</IDENTIFIERS>
</EXPERIMENT_REF>
<Pool>
<Member member_name="" accession="ERS074283" sample_name="MR223754-sc-2011-11-18T11:31:44Z-1306470" spots="2708543" bases="406281450"/>
</Pool>
</RUN>
</RUN_SET>
</EXPERIMENT_PACKAGE>
So don't reinvent the wheel; just get to know the existing tools. Sometimes it is easier than it looks at first sight.
http://php.net/manual/en/class.simplexmlelement.php
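It gives you a simple interface for working with XML as objects. You will probably need to set some options so that CDATA values and attributes are parsed, I think. To fetch the XML from the web server, use something like curl or file_get_contents; curl is the recommended choice.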
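As a rough illustration of the SimpleXML interface, the sketch below parses a small hand-trimmed excerpt of the SRA XML shown earlier in this thread. The variable names are my own, and LIBXML_NOCDATA is just one option you might want; it merges CDATA sections into plain text nodes.

```php
<?php
// Hand-trimmed excerpt of the experiment XML, used as inline sample data.
$xml = <<<'XML'
<EXPERIMENT_PACKAGE>
  <EXPERIMENT alias="SC_EXP_7229_8#56" center_name="SC" accession="ERX086768">
    <IDENTIFIERS><PRIMARY_ID>ERX086768</PRIMARY_ID></IDENTIFIERS>
    <PLATFORM><ILLUMINA><INSTRUMENT_MODEL>Illumina HiSeq 2000</INSTRUMENT_MODEL></ILLUMINA></PLATFORM>
  </EXPERIMENT>
</EXPERIMENT_PACKAGE>
XML;

$package = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($package === false) {
    exit("Could not parse the XML\n");
}

$experiment = $package->EXPERIMENT;
$accession  = (string) $experiment['accession'];            // attributes use array syntax
$primaryId  = (string) $experiment->IDENTIFIERS->PRIMARY_ID; // child elements use property syntax
$model      = (string) $experiment->PLATFORM->ILLUMINA->INSTRUMENT_MODEL;

echo "$accession / $primaryId / $model\n";
```

The `(string)` casts matter: without them you get SimpleXMLElement objects rather than plain strings.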
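Clicking Send data > Get takes you to another page with options to download in different formats. This URL: http://trace.ncbi.nlm.nih.gov/Traces/sra/?cmd=dload&run_list=ERR109334&format=fasta seems to provide the data in gzip format. Perhaps you could use a GET on this source instead of trying to parse the XML out of the HTML?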
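If the download really does come back as a gzip file, PHP's zlib extension can unpack it in memory. The sketch below is offline and assumption-laden: the payload is a made-up placeholder rather than real data from that URL, and I have not verified what the server actually sends.

```php
<?php
// Requires the zlib extension. The payload is a made-up placeholder,
// not real data from the download URL.
$compressed = gzencode(">placeholder\nACGT\n");

// With a real download, $compressed would instead come from something like
// file_get_contents($url) or a curl call.
$plain = gzdecode($compressed);
if ($plain === false) {
    exit("Not valid gzip data\n");
}
echo $plain;
```

If you fetch with curl and the server only uses gzip as a transfer encoding, setting CURLOPT_ENCODING lets curl decompress the body for you, and gzdecode is then unnecessary.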
You have to list all valid HTML tags and strip them from the web page. For example:
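$taglist = ['div', 'b', 'input']; // List the HTML tags here.
$xml = file_get_contents('http://www.ncbi.nlm.nih.gov/sra/ERX086768?report=FullXml'); // Read in the web page.
foreach ($taglist as $tag) {
    // Opening tag, with or without attributes. "~" is the delimiter that preg_replace() requires;
    // \b keeps "b" from also matching tags such as "body".
    $regex = '~<' . $tag . '\b[^>]*>~i';
    $xml = preg_replace($regex, '', $xml);
    // Repeat for the closing tag.
    $regex = '~</' . $tag . '\s*>~i';
    $xml = preg_replace($regex, '', $xml);
}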
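Once that is done, $xml will contain the XML as a string, and PHP should be able to process it.
This class, XmlRead, can do it. I have also set up a cURL class to go with it.
cURL: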
function HeaderProc($response,$Run="",$String=1/*[Is 1 IF Use for String Mode ]*/){
if($String==1){
$response=explode("\r\n",$response);
}
$PartHeader=0;
$out[$PartHeader]=array();
foreach($response as $key=>$val){ // each() was removed in PHP 8
$name='';
$value='';
$flag=false;
for($i=0;$i<strlen($val);$i++){
if($val[$i]==":"){
$flag=true;
for($j=$i+1;$j<strlen($val);$j++){
if($val[$j]=="\r" and isset($val[$j+1]) and $val[$j+1]=="\n"){ // test the inner index $j, not $i
break;
}
$value.=$val[$j];
}
break;
}
$name.=$val[$i];
}
if($flag){
if($name=='' and $value==''){
$PartHeader++;
}else{
if(isset($out[$PartHeader][$name])){
if(is_array($out[$PartHeader][$name])){
$out[$PartHeader][$name][]=$value;
}else{
$T=$out[$PartHeader][$name];
$out[$PartHeader][$name]=array();
$out[$PartHeader][$name][0]=$T;
$out[$PartHeader][$name][1]=$value;
}
}else{
$out[$PartHeader][$name]=$value;
}
}
}else{
if($name==''){
$PartHeader++;
}else{
if(isset($out[$PartHeader][$name])){
if(is_array($out[$PartHeader][$name])){
$out[$PartHeader][$name][]=$value;
}else{
$T=$out[$PartHeader][$name];
$out[$PartHeader][$name]=array();
$out[$PartHeader][$name][0]=$T;
$out[$PartHeader][$name][1]=$name;
}
}else{
$out[$PartHeader][$name]=$name;
}
}
}
if($Run!=""){
$Run($name,$value);
}
}
return $out;
}
class cURL {
var $headers;
var $user_agent;
var $compression;
var $cookie_file;
var $proxy;
var $Cookie;
var $cookies; // flag set in the constructor; declared to avoid creating a dynamic property
function CookieAnalysis($Cookie){//convert str cookie to array cookie
//echo $Cookie;
$this->Cookie=array();
preg_match("~(.*?)=(.*?);~si",' '.$Cookie.'; ',$M);
$this->Cookie[trim($M[1])]=trim($M[2]);
return $this->Cookie;
}
function __construct($cookies=false,$cookie='cookies.txt',$compression='gzip',$proxy='') { // a method named after the class is no longer a constructor in PHP 8
$this->headers[] = 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$this->headers[] = 'Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3';
$this->headers[] = 'Accept-Encoding:gzip,deflate,sdch';
$this->headers[] = 'Accept-Language:en-US,en;q=0.8';
$this->headers[] = 'Cache-Control:max-age=0';
$this->headers[] = 'Connection:keep-alive';
$this->user_agent = 'Mozilla/5.0 (SepidarSoft [Organic Search Engine Crawler] Linux Edition) AppleWebKit/536.5 (KHTML, like Gecko) SepidarBrowser/1.0.100.52 Safari/536.5'; // value only: CURLOPT_USERAGENT adds the "User-Agent:" header name itself
$this->compression=$compression;
$this->proxy=$proxy;
$this->cookies=$cookies;
if ($this->cookies == TRUE) $this->cookie($cookie);
}
function cookie($cookie_file) {
if (file_exists($cookie_file)) {
$this->cookie_file=$cookie_file;
} else {
$handle = fopen($cookie_file,'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
$this->cookie_file=$cookie_file;
fclose($handle); // close the file handle, not the file name
}
}
function GET($url) {
$process = curl_init($url);
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
curl_setopt($process, CURLOPT_HEADER, 1);
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
curl_setopt($process,CURLOPT_ENCODING , $this->compression);
curl_setopt($process, CURLOPT_TIMEOUT, 30);
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
$response = curl_exec($process);
$header_size = curl_getinfo($process,CURLINFO_HEADER_SIZE);
$result['Header'] = HeaderProc(substr($response, 0, $header_size),'',1);
foreach($result['Header'] as $HeaderK=>$HeaderP){
if(!isset($HeaderP['Set-Cookie']) || !is_array($HeaderP['Set-Cookie']))continue; // skip header blocks without multiple Set-Cookie entries
foreach($HeaderP['Set-Cookie'] as $key=>$val){
$result['Header'][$HeaderK]['Set-Cookie'][$key]=$this->CookieAnalysis($val);
}
}
$result['Body'] = substr( $response, $header_size );
$result['HTTP_State'] = curl_getinfo($process,CURLINFO_HTTP_CODE);
$result['URL'] = curl_getinfo($process,CURLINFO_EFFECTIVE_URL);
curl_close($process);
return $result;
}
function POST($url,$data) {
$process = curl_init($url);
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
curl_setopt($process, CURLOPT_HEADER, 1);
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
curl_setopt($process, CURLOPT_ENCODING , $this->compression);
curl_setopt($process, CURLOPT_TIMEOUT, 30);
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
curl_setopt($process, CURLOPT_POSTFIELDS, $data);
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($process, CURLOPT_POST, 1);
$response = curl_exec($process);
$header_size = curl_getinfo($process,CURLINFO_HEADER_SIZE);
$result['Header'] = HeaderProc(substr($response, 0, $header_size),'',1);
foreach($result['Header'] as $HeaderK=>$HeaderP){
if(!isset($HeaderP['Set-Cookie']) || !is_array($HeaderP['Set-Cookie']))continue; // skip header blocks without multiple Set-Cookie entries
foreach($HeaderP['Set-Cookie'] as $key=>$val){
$result['Header'][$HeaderK]['Set-Cookie'][$key]=$this->CookieAnalysis($val);
}
}
$result['Body'] = substr( $response, $header_size );
$result['HTTP_State'] = curl_getinfo($process,CURLINFO_HTTP_CODE);
$result['URL'] = curl_getinfo($process,CURLINFO_EFFECTIVE_URL);
curl_close($process);
return $result;
}
function error($error) {
echo "<center><div style='width:500px;border: 3px solid #FFEEFF; padding: 3px; background-color: #FFDDFF;font-family: verdana; font-size: 10px'><b>cURL Error</b><br>$error</div></center>";
die;
}
}
XmlRead
class XmlRead{
static function Clean($html){
$html=preg_replace_callback("~<script(.*?)>(.*?)</script>~si",function($m){
//print_r($m);
// $m[2]=preg_replace("/\/\*(.*?)\*\/|[\t\r\n]/s"," ", " ".$m[2]." ");
$m[2]=preg_replace("~//(.*?)\n~si"," ", " ".$m[2]." ");
//echo $m[2];
return "<script ".$m[1].">".$m[2]."</script>";
}, $html);
$search = array(
"/ +/" => " ",
"/<!--\{(.*?)\}-->|<!--(.*?)-->|[\t\r\n]|<!--|-->|\/\/ <!--|\/\/ -->|<!\[CDATA\[|\/\/ \]\]>|\]\]>|\/\/\]\]>|\/\/<!\[CDATA\[/" => "");
//$html = preg_replace(array_keys($search), array_values($search), $html);
$search = array(
"/\/\*(.*?)\*\/|[\t\r\n]/s" => "",
"/ +\{ +|\{ +| +\{/" => "{",
"/ +\} +|\} +| +\}/" => "}",
"/ +: +|: +| +:/" => ":",
"/ +; +|; +| +;/" => ";",
"/ +, +|, +| +,/" => ","
);
$html = preg_replace(array_keys($search), array_values($search), $html);
preg_match_all('!(<(?:code|pre|script).*>[^<]+</(?:code|pre|script)>)!',$html,$pre);
$html = preg_replace('!<(?:code|pre).*>[^<]+</(?:code|pre)>!', '#pre#', $html);
$html = preg_replace('#<!--[^\[].+-->#', '', $html);
$html = preg_replace('/[\r\n\t]+/', ' ', $html);
$html = preg_replace('/>[\s]+</', '><', $html);
$html = preg_replace('/\s+/', ' ', $html);
if (!empty($pre[0])) {
foreach ($pre[0] as $tag) {
$html = preg_replace('!#pre#!', $tag, $html,1);
}
}
return($html);
}
function loadNprepare($content,$encod='') {
$content=self::Clean($content);
//$content=html_entity_decode(html_entity_decode($content));
// $content=htmlspecialchars_decode($content,ENT_HTML5);
$this->DataPage='';
preg_match('~<body(.*?)>(.*?)</body>~si',$content,$M);
$this->DataPage=$M[2];
$HTML=$this->DataPage;
$HTML="<!doctype html><html><head><meta charset=\"utf-8\"><title>Untitled Document</title></head><body>".$HTML."</body></html>";
$dom= new DOMDocument;
$HTML = str_replace("&", "&amp;", $HTML); // disguise &s going IN to loadXML()
// $dom->substituteEntities = true; // collapse &s going OUT to transformToXML()
$dom->recover = TRUE;
@$dom->loadHTML('<?xml encoding="UTF-8">' .$HTML);
// dirty fix
foreach ($dom->childNodes as $item)
if ($item->nodeType == XML_PI_NODE)
$dom->removeChild($item); // remove hack
$dom->encoding = 'UTF-8'; // insert proper
return $dom;
}
function GetBYClass($Doc,$ClassName){
$finder = new DomXPath($Doc);
return($finder->query("//*[contains(@class, '$ClassName')]"));
}
function extractText($node) {
if($node==NULL)return false;
if (XML_TEXT_NODE === $node->nodeType || XML_CDATA_SECTION_NODE === $node->nodeType) {
return $node->nodeValue;
} else if (XML_ELEMENT_NODE === $node->nodeType || XML_DOCUMENT_NODE === $node->nodeType || XML_DOCUMENT_FRAG_NODE === $node->nodeType) {
if ('script' === $node->nodeName) return '';
$text = '';
foreach($node->childNodes as $childNode) {
$text .= $this->extractText($childNode);
}
return $text;
}
}
function DOMRemove(DOMNode $from) {
$from->parentNode->removeChild($from);
}
}
Call the classes and configure them for your page:
$cc = new cURL(); //
$XmlRead=new XmlRead();
$Data=$cc->get('http://www.ncbi.nlm.nih.gov/sra/ERX086768?report=FullXml');
//get page
$doc=$XmlRead->loadNprepare($Data['Body']);//load as html
//remove two part of page related to your page .
$productspec=$XmlRead->DOMRemove($XmlRead->GetBYClass($doc,'title')->item(0));
$productspec=$XmlRead->DOMRemove($XmlRead->GetBYClass($doc,'aux')->item(0));
//select xml part
$productspec=$XmlRead->GetBYClass($doc,'rprt');
foreach($productspec as $data)
{
$content=html_entity_decode(html_entity_decode($XmlRead->extractText($data)));//decode as entity html
print_r($content);
}
Output:
<EXPERIMENT_PACKAGE><EXPERIMENT alias="SC_EXP_7229_8#56"center_name="SC"accession="ERX086768"><IDENTIFIERS><PRIMARY_ID>ERX086768</PRIMARY_ID><SUBMITTER_ID namespace="SC">SC_EXP_7229_8#56</SUBMITTER_ID></IDENTIFIERS><TITLE></TITLE><STUDY_REF accession="ERP000913"refname="Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977"refcenter="SC"><IDENTIFIERS><PRIMARY_ID>ERP000913</PRIMARY_ID><SUBMITTER_ID namespace="SC">Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977</SUBMITTER_ID></IDENTIFIERS></STUDY_REF><DESIGN><DESIGN_DESCRIPTION>Standard</DESIGN_DESCRIPTION><SAMPLE_DESCRIPTOR accession="ERS074283"refname="MR223754-sc-2011-11-18T11:31:44Z-1306470"refcenter="SC"><IDENTIFIERS><PRIMARY_ID>ERS074283</PRIMARY_ID><SUBMITTER_ID namespace="SC">MR223754-sc-2011-11-18T11:31:44Z-1306470</SUBMITTER_ID></IDENTIFIERS></SAMPLE_DESCRIPTOR><LIBRARY_DESCRIPTOR><LIBRARY_NAME>4008297</LIBRARY_NAME><LIBRARY_STRATEGY>WGS</LIBRARY_STRATEGY><LIBRARY_SOURCE>GENOMIC</LIBRARY_SOURCE><LIBRARY_SELECTION>RANDOM</LIBRARY_SELECTION><LIBRARY_LAYOUT><PAIRED NOMINAL_LENGTH="250"></PAIRED></LIBRARY_LAYOUT></LIBRARY_DESCRIPTOR><SPOT_DESCRIPTOR><SPOT_DECODE_SPEC><READ_SPEC><READ_INDEX>0</READ_INDEX><READ_CLASS>Application Read</READ_CLASS><READ_TYPE>Forward</READ_TYPE><BASE_COORD>1</BASE_COORD></READ_SPEC><READ_SPEC><READ_INDEX>1</READ_INDEX><READ_CLASS>Application Read</READ_CLASS><READ_TYPE>Reverse</READ_TYPE><RELATIVE_ORDER follows_read_index="0"></RELATIVE_ORDER></READ_SPEC></SPOT_DECODE_SPEC></SPOT_DESCRIPTOR></DESIGN><PLATFORM><ILLUMINA><INSTRUMENT_MODEL>Illumina HiSeq 2000</INSTRUMENT_MODEL></ILLUMINA></PLATFORM><PROCESSING></PROCESSING></EXPERIMENT><SUBMISSION accession="ERA119046"center_name="SC"submission_date="2012-04-17T09:29:50Z"alias="ERP000913-sc-20120417-2"lab_name=""><IDENTIFIERS><PRIMARY_ID>ERA119046</PRIMARY_ID><SUBMITTER_ID 
namespace="SC">ERP000913-sc-20120417-2</SUBMITTER_ID></IDENTIFIERS></SUBMISSION><STUDY alias="Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977"center_name="SC"accession="ERP000913"><IDENTIFIERS><PRIMARY_ID>ERP000913</PRIMARY_ID><SUBMITTER_ID namespace="SC">Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis-sc-2011-09-22T08:43:17Z-1977</SUBMITTER_ID></IDENTIFIERS><DESCRIPTOR><STUDY_TITLE>Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis</STUDY_TITLE><STUDY_TYPE existing_study_type="Whole Genome Sequencing"></STUDY_TYPE><STUDY_ABSTRACT>http://www.sanger.ac.uk/resources/downloads/bacteria/</STUDY_ABSTRACT><CENTER_PROJECT_NAME>Genome_diversity_in_Streptococcus_dysgalactiae_subspecies_equisimilis</CENTER_PROJECT_NAME><STUDY_DESCRIPTION>http://www.sanger.ac.uk/resources/downloads/bacteria/This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria),please see http://www.sanger.ac.uk/datasharing/</STUDY_DESCRIPTION></DESCRIPTOR></STUDY><SAMPLE alias="MR223754-sc-2011-11-18T11:31:44Z-1306470"center_name="SC"accession="ERS074283"><IDENTIFIERS><PRIMARY_ID>ERS074283</PRIMARY_ID><SUBMITTER_ID namespace="SC">MR223754-sc-2011-11-18T11:31:44Z-1306470</SUBMITTER_ID></IDENTIFIERS><SAMPLE_NAME><COMMON_NAME>Streptococcus dysgalactiae subspecies equisimilis</COMMON_NAME><TAXON_ID>119602</TAXON_ID><SCIENTIFIC_NAME>Streptococcus dysgalactiae subsp. 
equisimilis</SCIENTIFIC_NAME></SAMPLE_NAME><SAMPLE_LINKS><SAMPLE_LINK><ENTREZ_LINK><DB>biosample</DB><ID>859730</ID></ENTREZ_LINK></SAMPLE_LINK></SAMPLE_LINKS><SAMPLE_ATTRIBUTES><SAMPLE_ATTRIBUTE><TAG>Strain</TAG><VALUE>MR223754</VALUE></SAMPLE_ATTRIBUTE><SAMPLE_ATTRIBUTE><TAG>Sample Description</TAG><VALUE></VALUE></SAMPLE_ATTRIBUTE><SAMPLE_ATTRIBUTE><TAG>ArrayExpress-StrainOrLine</TAG><VALUE>MR223754</VALUE></SAMPLE_ATTRIBUTE><SAMPLE_ATTRIBUTE><TAG>ArrayExpress-Sex</TAG><VALUE>not applicable</VALUE></SAMPLE_ATTRIBUTE><SAMPLE_ATTRIBUTE><TAG>ArrayExpress-Species</TAG><VALUE>Streptococcus dysgalactiae subspecies equisimilis</VALUE></SAMPLE_ATTRIBUTE></SAMPLE_ATTRIBUTES></SAMPLE><RUN_SET><RUN alias="SC_RUN_7229_8#56"center_name="SC"accession="ERR109334"total_spots="2708543"total_bases="406281450"size="334475592"load_done="true"published="2012-04-27 20:11:35"is_public="true"cluster_name="public"static_data_available="1"><IDENTIFIERS><PRIMARY_ID>ERR109334</PRIMARY_ID><SUBMITTER_ID namespace="SC">SC_RUN_7229_8#56</SUBMITTER_ID></IDENTIFIERS><EXPERIMENT_REF refname="SC_EXP_7229_8#56"refcenter="SC"accession="ERX086768"><IDENTIFIERS><PRIMARY_ID>ERX086768</PRIMARY_ID><SUBMITTER_ID namespace="SC">SC_EXP_7229_8#56</SUBMITTER_ID></IDENTIFIERS></EXPERIMENT_REF><Pool><Member member_name=""accession="ERS074283"sample_name="MR223754-sc-2011-11-18T11:31:44Z-1306470"spots="2708543"bases="406281450"></Member></Pool></RUN></RUN_SET></EXPERIMENT_PACKAGE>