I'd like some help/advice on how to parse this Gene Ontology (.obo) file.
I'm working on a visualization in D3 and need to create a "tree" file in JSON format:
{
"name": "flare",
"description": "flare",
"children": [
{
"name": "analytic",
"description": "analytics",
"children": [
{
"name": "cluster",
"description": "cluster",
"children": [
{"name": "Agglomer", "description": "AgglomerativeCluster", "size": 3938},
{"name": "Communit", "description": "CommunityStructure", "size": 3812},
{"name": "Hierarch", "description": "HierarchicalCluster", "size": 6714},
{"name": "MergeEdg", "description": "MergeEdge", "size": 743}
]
}, etc..
This format seems easy to replicate with a Python dictionary, with 3 fields per entry: name, description, and children[].
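As a minimal sketch of that dictionary layout (the helper name make_node is hypothetical, and the sample values are just copied from the JSON above):

```python
import json

def make_node(name, description, children=None):
    """Build one tree node with the three fields used in the JSON above."""
    return {"name": name, "description": description, "children": children or []}

# Small nested example mirroring part of the "flare" structure.
leaf = make_node("Agglomer", "AgglomerativeCluster")
cluster = make_node("cluster", "cluster", [leaf])
root = make_node("flare", "flare", [cluster])

# Serialize the nested dictionaries to a JSON string for D3.
tree_json = json.dumps(root, indent=1)
```

Nesting the dictionaries and calling json.dumps on the root should be enough to produce the file, so the output side is not the hard part.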
My actual problem is how to extract the data. The "objects" in the file linked above are structured like this:
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
From each of these I need the id, is_a, and name fields. I have tried parsing this with Python, but I can't seem to find a way to locate each object.
Any ideas?
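For reference, here is a rough sketch of the kind of thing I am after, assuming every object starts with a bracketed header line such as [Term] and uses "key: value" lines as in the sample above. The function name parse_obo_terms is hypothetical, and this only looks at the three fields I need:

```python
def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza with id, name, and a list of is_a parent ids."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Any stanza header ends the previous stanza.
            if term is not None:
                yield term
            # Only [Term] stanzas are collected; other stanza types are skipped.
            term = {"id": None, "name": None, "is_a": []} if line == "[Term]" else None
        elif term is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "id":
                term["id"] = value
            elif key == "name":
                term["name"] = value
            elif key == "is_a":
                # Keep only the parent id, dropping the "! readable name" comment.
                term["is_a"].append(value.split("!")[0].strip())
    if term is not None:
        yield term

sample = """[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
"""
terms = list(parse_obo_terms(sample.splitlines()))
```

Note that a term can list more than one is_a parent, as in the sample, so the result is not strictly a tree; the same term may have to appear under several parents in the JSON.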