r - Deconstruct a DNAStringSet into plain strings

Question

This comes from an R library called "VariantAnnotation" and its dependency "Biostrings".

I have a DNAStringSetList that I would like to convert into a plain list or a character vector.

library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")

vcf <- readVcf(fl, "hg19")

tempo <- rowRanges(vcf)$ALT  # Here is the DNAStringSetList I mean.

print(tempo)

A DNAStringSet instance of length 10376
    width seq
[1]     1 G
[2]     1 T
[3]     1 A
[4]     1 T
[5]     1 T
...   ... ...
[10372]     1 G
[10373]     1 G
[10374]     1 G
[10375]     1 A
[10376]     1 C

tempo[[1]]
A DNAStringSet instance of length 1
    width seq
[1]     1 G

But I don't want this format. I just want the base strings, so that I can insert them as a column in a new data frame. I want this:

G
T
A
T
T

I got part of the way there by accessing the object's slot directly:

as.character(tempo@unlistData)

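However, it returns 10 more rows than tempo has! The head and tail of the result are exactly the same as those of tempo, so somewhere in the middle there are 10 extra rows that should not be there (they are not NAs).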

score 2 · Accepted Answer

You can call as.character on a DNAString or a DNAStringSet.

as.character(tempo[1 : 5])
# [1] "G" "T" "A" "T" "T"
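
If some elements hold more than one sequence, collapsing each element separately keeps the result the same length as the list. The following is only a minimal sketch under stated assumptions: `toy` is a small hand-built stand-in for tempo, and `alt_chr`, `df`, and the "," separator are names and choices made up for illustration.

```r
library(Biostrings)

# Hypothetical stand-in for tempo: the second record holds two alleles.
toy <- DNAStringSetList("G", c("A", "T"), "C")

# Build one string per list element; multiple alleles are joined with ",".
alt_chr <- vapply(
  seq_along(toy),
  function(i) paste(as.character(toy[[i]]), collapse = ","),
  character(1)
)

# alt_chr has one entry per record, so it can serve as a data frame column.
df <- data.frame(ALT = alt_chr, stringsAsFactors = FALSE)
```

Because the output length always equals length(toy), this form should line up row by row with other per-record columns.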

score 0 · Accepted Answer

A simple loop solved the problem, using the toString function from the same library:

# Preallocate a character vector with one slot per record.
ALT <- character(length(tempo))
for (i in seq_along(tempo)) { ALT[i] <- toString(tempo[[i]]) }

However, I don't know why tempo@unlistData retrieves too many rows. It is not trustworthy.
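
One possible reason for the extra rows, offered as an assumption rather than something verified against chr22.vcf.gz: a VCF record can list more than one ALT allele, and flattening a list of sets yields one entry per allele, not one per record. The sketch below shows the effect on a hand-built object (`toy`, `n_alt`, and `flat` are illustrative names) and uses unlist() in place of reading the slot directly.

```r
library(Biostrings)

# Hypothetical stand-in for tempo: the second record holds two alleles.
toy <- DNAStringSetList("G", c("A", "T"), "C")

# Number of records in the list.
n_records <- length(toy)

# Number of alleles stored in each record.
n_alt <- elementNROWS(toy)

# Flattening gives one string per allele, so it can be longer than the list.
flat <- as.character(unlist(toy))

# The flattened length equals the total allele count, not the record count.
length(flat) == sum(n_alt)

# Indices of records that contribute more than one allele.
multi <- which(n_alt > 1)
```

Running which(elementNROWS(tempo) > 1) on the real object would show whether any records have several alleles and, if so, which ones.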
