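Questions tagged "google-genomics"

0 votes

2 answers

4341 views

r - How to intersect two data.frames in R?

I have two tables stored as data.frames. Table 1 contains a single column of 200 gene IDs (letters and numbers). Table 2 contains a list of 4,000 gene IDs (one per row) plus 20 additional columns. I would like to intersect the two tables and produce a new Table 3 containing the 200 gene IDs together with the related information from the 20 columns.

Table3 <- Table1 %n% Table2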
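
The operation described above amounts to keeping only the rows of the larger table whose gene ID also appears in the smaller one (in R this is commonly written with base `merge()` on the shared ID column). The following is a minimal sketch of that logic in Python rather than R; the column name `gene_id` and the sample values are hypothetical, not taken from the question.

```python
def intersect_by_gene_id(table1_ids, table2_rows, key="gene_id"):
    """Return the rows of table2 whose key appears in table1_ids, in table1 order."""
    # Index the larger table once so each lookup is O(1).
    by_id = {row[key]: row for row in table2_rows}
    # IDs missing from table2 are skipped, as an inner join would do.
    return [by_id[gene] for gene in table1_ids if gene in by_id]


# Hypothetical data standing in for the 200-ID and 4,000-row tables.
table1 = ["BRCA1", "TP53", "NOT_IN_TABLE2"]
table2 = [
    {"gene_id": "TP53", "chrom": "17", "score": 0.9},
    {"gene_id": "EGFR", "chrom": "7", "score": 0.4},
    {"gene_id": "BRCA1", "chrom": "17", "score": 0.7},
]

table3 = intersect_by_gene_id(table1, table2)
for row in table3:
    print(row)
```

The sketch assumes gene IDs are unique in Table 2; duplicated IDs would need a list per key instead of a single row.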

r google-genomics

2017-12-07T14:59:29.227

0 votes

2 answers

98 views

google-bigquery - How is it possible to export a Cloud Genomics variantset to BigQuery now that variantsets.export has been deprecated?

I have loaded a variantset into Cloud Genomics and am attempting to export it to BigQuery. The first approach I tried was to use a pipeline as detailed here:

https://cloud.google.com/genomics/docs/how-tos/load-variants

However, 20 minutes into the process, it failed. According to StackDriver error reporting, it appears to be a problem in the VCF file, though I am at a loss to explain how it might be fixed:

So I continued to search for other options. I turned to the API:

https://cloud.google.com/genomics/reference/rest/v1/variantsets/export

I made sure that my account was a BigQuery admin and an owner of the Genomics variantset. I used the following parameters:

Upon submitting, I receive the following error:

I have also tried this from the command line: gcloud alpha genomics variantsets export variantset_id bigquery_table --bigquery-dataset=my-dataset --bigquery-project=my-project.

But that gives me a 500 Unknown Error as well. I've been going back and forth on this for several hours, and the documentation is quite sparse.

Please, what could I be missing?

google-bigquery google-genomics

2018-05-13T04:33:46.423

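0 votes

0 answers

167 views

google-cloud-platform - dsub: Google Cloud error ("exit status 141")

I am trying to run some whole-genome sequencing samples on Google Cloud using dsub. The dsub command works for some samples but not for others. I have tried reducing the number of parallel threads and increasing memory and disk, but it still fails. Since each run takes about 2 days, trial and error is very expensive! Any help or hints would be greatly appreciated!

My command is:

The dstat command with the "--full" option shows the error as:

On Google Cloud, the last line of the log file simply states "(exit status 141)".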
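
As a hedged aid to reading that status: by shell convention, an exit status above 128 usually means the process was terminated by a signal whose number is the status minus 128. For 141 that is signal 13, which is SIGPIPE on Linux. The small helper below (the function name is illustrative) decodes a status this way; it does not explain why the pipe closed in this particular job.

```python
import signal


def describe_exit_status(status):
    """Return the signal name implied by a shell-style exit status, or None."""
    if status > 128:
        try:
            # Shells report "killed by signal N" as 128 + N.
            return signal.Signals(status - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return None
    return None


print(describe_exit_status(141))
```

A SIGPIPE exit often shows up when a command in a pipeline keeps writing after the reader has exited, especially under `set -o pipefail`, so the pipelines in the job script are a reasonable first place to look.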

Many thanks!

google-cloud-platform google-genomics

2018-05-29T12:36:31.820

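0 votes

1 answer

308 views

google-app-engine - How to convert fastq to uBAM using the Picard Docker image on Google Cloud

I have been trying to convert my fastq files on Google Cloud to uBAM files, so far without success. This is the code I used:

I can see that the image was pulled and ran successfully, but then I get an error message saying the command is incorrect and to check PicardCommandLine -h.

Does anyone have experience converting fastq to uBAM on Google Cloud? Please help. Much appreciated. Thank you.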

google-app-engine google-cloud-platform google-genomics picard

2018-06-10T12:01:06.547

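0 votes

1 answer

143 views

centos6 - Alternative to JBrowse on CentOS

Is there an alternative to JBrowse that runs on CentOS 6?

I need to integrate one into my web page, but JBrowse gives a zlib error while installing PerlIO::gzip, even though all the related modules are installed (libpng, libpng-devel, gd-devel, zlib-devel, perl-ExtUtils-MakeMaker, Development Tools, perl-Compress-Zlib).

Any suggestions would be helpful. An alternative that runs on Windows 8 would also work.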

centos6 perl-module genome google-genomics jbrowse

2018-06-22T06:31:21.517

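0 votes

1 answer

149 views

google-bigquery - How to do inheritance/transmission queries in the BigQuery Variant Schema

The Variant Schema used by the Google Genomics Variant Transform pipeline represents genotypes as nested records in BigQuery, for example:

(From: https://bigquery.cloud.google.com/table/genomics-public-data:1000_genomes.variants?pli=1&tab=preview)

I cannot work out how to write queries that involve relationships between samples, for example:

select all variants where sampleA.genotype=HET and sampleB.genotype=HET and sampleC.genotype=HOM-ALT

or similar queries where sampleA and sampleB are the parents of sampleC and you are looking for variants that follow a particular inheritance pattern.

How do people write these queries with the nested schema?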
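
To make the intended filter concrete, here is a minimal Python sketch of the logic over nested records, not a BigQuery query. It assumes each variant carries a repeated `call` record with a sample name and a list of allele indices; the field names `call_set_name` and `genotype`, the sample names, and the example rows are assumptions for illustration. In BigQuery the same per-sample conditions would typically be expressed as subqueries over `UNNEST(call)`.

```python
def zygosity(genotype):
    """Classify allele indices (0 = reference, negative = no-call)."""
    if any(allele < 0 for allele in genotype):
        return "NO-CALL"
    if all(allele == 0 for allele in genotype):
        return "HOM-REF"
    if len(set(genotype)) == 1:
        return "HOM-ALT"
    return "HET"


def matches_pattern(variant, pattern):
    """True if every sample in `pattern` has the required zygosity at this variant."""
    # Flatten the nested calls into {sample name: zygosity}.
    calls = {c["call_set_name"]: zygosity(c["genotype"]) for c in variant["call"]}
    return all(calls.get(sample) == wanted for sample, wanted in pattern.items())


# Hypothetical rows shaped like the nested schema.
variants = [
    {"start": 100, "call": [
        {"call_set_name": "sampleA", "genotype": [0, 1]},
        {"call_set_name": "sampleB", "genotype": [1, 0]},
        {"call_set_name": "sampleC", "genotype": [1, 1]},
    ]},
    {"start": 200, "call": [
        {"call_set_name": "sampleA", "genotype": [0, 0]},
        {"call_set_name": "sampleB", "genotype": [0, 1]},
        {"call_set_name": "sampleC", "genotype": [0, 1]},
    ]},
]

pattern = {"sampleA": "HET", "sampleB": "HET", "sampleC": "HOM-ALT"}
hits = [v["start"] for v in variants if matches_pattern(v, pattern)]
print(hits)
```

The key step is pivoting the repeated calls into one lookup per variant, so that conditions on different samples can be combined on the same row.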

google-bigquery google-genomics

2018-08-03T00:07:01.247

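0 votes

1 answer

864 views

google-cloud-platform - Accessing random lines in large files on Google Cloud Storage

I am trying to read random lines from large files stored in a public Cloud Storage bucket.

My understanding is that I cannot do this with gsutil. I have looked into FUSE but am not sure whether it fits my use case: https://cloud.google.com/storage/docs/gcs-fuse

There are many files, each about 50 GB, for a total of several TB. If possible, I would like to avoid downloading them. They are all plain-text files; you can see them here: https://console.cloud.google.com/storage/browser/genomics-public-data/linkage-disequilibrium/1000-genomes-phase-3/ldCutoff0.4_window1MB

It would be great if I could simply get a file system handle using FUSE so that I could feed the data directly into my other scripts, but if necessary I can rewrite them to read line by line. The key point is that in no case should the interface download the entire file.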
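
One way to avoid downloading whole files is an HTTP Range request, which Cloud Storage honours for object downloads. The sketch below, using only the Python standard library, builds such a request for a public object and extracts the complete lines from the returned chunk. The function names and the object path are hypothetical; only the bucket name comes from the question.

```python
import urllib.parse
import urllib.request


def build_range_request(bucket, object_name, start, length):
    """Build a GET request for `length` bytes starting at byte offset `start`."""
    url = "https://storage.googleapis.com/{}/{}".format(
        bucket, urllib.parse.quote(object_name))
    byte_range = "bytes={}-{}".format(start, start + length - 1)
    return urllib.request.Request(url, headers={"Range": byte_range})


def complete_lines(chunk, at_file_start=False):
    """Return only the lines fully contained in a byte chunk."""
    pieces = chunk.split(b"\n")
    if not at_file_start:
        pieces = pieces[1:]  # the first piece may be the tail of a cut line
    # The last piece is either a cut line or empty, so it is dropped too.
    return [p.decode("utf-8") for p in pieces[:-1]]


def read_lines_at(bucket, object_name, start, length=65536):
    """Fetch one byte range over the network and return its complete lines."""
    request = build_range_request(bucket, object_name, start, length)
    with urllib.request.urlopen(request) as response:
        return complete_lines(response.read(), at_file_start=(start == 0))


# Hypothetical object path; substitute a real file from the bucket.
req = build_range_request("genomics-public-data", "path/to/file.txt", 1000, 500)
print(req.full_url, req.get_header("Range"))
```

Picking a random byte offset and keeping the first complete line is a cheap way to sample, although it favours longer lines; uniform sampling by line number would need an index of line offsets.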

google-cloud-platform filesystems gcsfuse distributed-filesystem google-genomics

2018-10-24T14:50:32.430

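0 votes

1 answer

1060 views

google-cloud-platform - Error on Google Cloud - Genomics: "No API solution found for service name: genomics"

I am completely new to HPC and Google Cloud (I have just signed up for a trial account).

My plan is to run an RNA-seq analysis (9 paired-end samples, 18 FASTQ files). Mainly I want to run FastQC and try mapping with different aligners, then download the BAM files and continue working on my computer at home.

First, I created an instance with 8 vCPUs and the maximum memory they allow me, and I chose Ubuntu 18.04.

Then I went to the Genomics API, and the first error appeared:

No API solution found for service name: genomics

How can I proceed? Is it possible to do what I want during the trial period?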

Regards, Fer

google-cloud-platform google-genomics rna-seq

2018-10-24T15:13:37.453

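0 votes

1 answer

324 views

google-cloud-platform - Running DeepVariant on GRCh38 whole-exome sequences

I am trying to run DeepVariant on my BAM files to generate VCFs. I have the following questions:

1 - The alignments are in GRCh38; which model should I use? Can I use the standard whole-exome sequencing model? ('gs://deepvariant/models/DeepVariant/0.7.0/DeepVariant-inception_v3-0.7.0+data-wes_standard')

2 - Which BED file should I use to specify the exome regions? Is there a standard one? I found one here that I am using for now ("CDS-cannonical.bed"): https://github.com/AstraZeneca-NGS/reference_data/tree/master/hg38/bed

3 - I am using the Verily GRCh38 genome. Is there a standard GRCh38 reference on Google Genomics? This is what I have: --ref gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa \

My script is set up as follows; please let me know if it makes sense:

Edit:

I tried adding the .bam.bai file (the BAM index) generated with samtools.

I am still getting an error: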

google-cloud-platform bioinformatics vcf-variant-call-format google-genomics bam

2018-11-06T14:09:35.973

0 votes

1 answer

90 views

google-cloud-platform - How to calculate the cost (bill) of a Google Cloud Genomics Pipeline

I'm using the Cromwell engine on Google Cloud, which submits pipeline run requests: https://cloud.google.com/genomics/reference/rest/v1alpha2/pipelines/run.

Once the pipelines have finished, I am then able to find the Google Cloud operations associated with each pipeline via the labels. However, I can't determine their cost. The Google Cloud billing logs only list the compute engine bills, but they don't show a connection between the compute engine instances and the genomics operations, so I can't work out how to calculate the cost.

How can I calculate the cost of a Google Cloud Genomics Pipeline?

google-cloud-platform google-genomics

2018-11-21T01:22:28.837
