I have this part of an XML file downloaded from NIH:

<PubmedArticle>
  <MedlineCitation Status="In-Data-Review" Owner="NLM">
    <PMID Version="1">31202264</PMID>   
    <Article PubModel="Electronic"> 
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y">
          <LastName>Jg</LastName>
          <ForeName>Zg</ForeName>
          <Initials>Z</Initials>
          <AffiliationInfo>
            <Affiliation>State Key Laboratory of ...</Affiliation>
          </AffiliationInfo>
          <AffiliationInfo>
            <Affiliation>College of Physics, ...</Affiliation>
          </AffiliationInfo>
        </Author>
        <Author ValidYN="Y">
          <LastName>Tn</LastName>
          <ForeName>L</ForeName>
          <Initials>L</Initials>
          <AffiliationInfo>
            <Affiliation>State Key Laboratory of ...</Affiliation>
          </AffiliationInfo>
        </Author>      
      </AuthorList>    
    </Article>   
  </MedlineCitation> 
</PubmedArticle>

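If an author has more than one Affiliation node, I need to combine them.

In a stored procedure I pass in the PMID, and so far I have been using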

   Affiliation = COALESCE(nref.value('AffiliationInfo[1]/Affiliation[1]','varchar(max)'),
            nref.value('Affiliation[1]','varchar(max)')),
from [Publication.PubMed.AllXML] cross apply x.nodes('//AuthorList/Author') as R(nref)
        where pmid = @pmid

This extracts the first element just fine, but I want to combine every affiliation of each author into a single row, for example

  affiliation = "State Key Laboratory of ...;College of Physics, ..."

or just

  affiliation = "State Key Laboratory of ..."

if there is only one.

I tried

STUFF((SELECT ';' + R.nref.value('.', 'NVARCHAR(MAX)')
               FROM   x.nodes('./AffiliationInfo/Affiliation')  R(nref)
               FOR XML PATH(''),TYPE).value('.', 'NVARCHAR(MAX)'),
              1, 1, '')
                        FROM  [Publication.PubMed.Author]  au
                JOIN  [Publication.PubMed.AllXML] a ON a.pmid = au.pmid     
                     cross apply x.nodes('//AuthorList/Author') as R(nref)
                WHERE au.pmid = 31202264

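but I'm not sure my references are correct. I get a NULL value for each author.

FYI, [Publication.PubMed.AllXML] is where the XML file is stored.

[Publication.PubMed.Author] is where the extracted data is stored in the database.

Thanks for your help.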

1 Answer

Unfortunately, SQL Server still does not support the XPath 2.0 string-join() function, so the workaround is a two-step process, with a CTE to the rescue.

SQL

-- DDL and data population, start
DECLARE @tbl TABLE (ID INT IDENTITY(1,1) PRIMARY KEY, xmlData XML);
INSERT INTO @tbl(xmlData)
VALUES
(N'<PubmedArticle>
    <MedlineCitation Status="In-Data-Review" Owner="NLM">
        <PMID Version="1">31202264</PMID>
        <Article PubModel="Electronic">
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Jg</LastName>
                    <ForeName>Zg</ForeName>
                    <Initials>Z</Initials>
                    <AffiliationInfo>
                        <Affiliation>State Key Laboratory of ...</Affiliation>
                    </AffiliationInfo>
                    <AffiliationInfo>
                        <Affiliation>College of Physics, ...</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Tn</LastName>
                    <ForeName>L</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>State Key Laboratory of ...</Affiliation>
                    </AffiliationInfo>
                </Author>
            </AuthorList>
        </Article>
    </MedlineCitation>
</PubmedArticle>');
-- DDL and data population, end

-- Step 1: one row per author, keeping that author's Affiliation elements as an XML fragment
;WITH rs AS
(
    SELECT col.value('(LastName)[1]', 'VARCHAR(30)') AS LastName
        , col.query('AffiliationInfo/Affiliation') AS affiliationXML
    FROM @tbl AS tbl
        CROSS APPLY tbl.[xmlData].nodes('/PubmedArticle/MedlineCitation/Article/AuthorList/Author') tab(col)
)
-- Step 2: concatenate each author's affiliations with ';' and drop the leading separator
SELECT LastName
    , SUBSTRING((SELECT ';' + col.value('.', 'VARCHAR(1024)')
        FROM rs.affiliationXML.nodes('Affiliation') tab(col)
        FOR XML PATH('')), 2, 1024) AS Affiliation
FROM rs;
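
As a possible variation, the same result can probably be had in a single step by shredding the affiliations from the current Author node instead of going through a CTE. The attempt in the question appears to return NULL because its inner subquery calls nodes() on x (the whole document) with a relative path, so it finds no Affiliation elements; the sketch below uses the outer author alias instead. The variable @xml and the aliases au and af are made up for illustration, and the sample is a trimmed copy of the XML from the question. Adding TYPE and .value() keeps characters such as & from coming back entity-encoded.

```sql
-- Hypothetical sample data: a trimmed copy of the document from the question.
DECLARE @xml XML = N'<PubmedArticle>
  <MedlineCitation>
    <PMID Version="1">31202264</PMID>
    <Article>
      <AuthorList>
        <Author>
          <LastName>Jg</LastName>
          <AffiliationInfo><Affiliation>State Key Laboratory of ...</Affiliation></AffiliationInfo>
          <AffiliationInfo><Affiliation>College of Physics, ...</Affiliation></AffiliationInfo>
        </Author>
        <Author>
          <LastName>Tn</LastName>
          <AffiliationInfo><Affiliation>State Key Laboratory of ...</Affiliation></AffiliationInfo>
        </Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</PubmedArticle>';

-- One row per Author; the correlated subquery shreds only that author's affiliations.
SELECT au.col.value('(LastName/text())[1]', 'VARCHAR(30)') AS LastName
    , STUFF((SELECT ';' + af.col.value('.', 'NVARCHAR(MAX)')
             FROM au.col.nodes('AffiliationInfo/Affiliation') AS af(col)
             -- TYPE returns XML, and .value() turns it back into plain text
             FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
            , 1, 1, '') AS Affiliation   -- STUFF removes the leading ';'
FROM @xml.nodes('/PubmedArticle/MedlineCitation/Article/AuthorList/Author') AS au(col);
```

Against the real table, the FROM clause would presumably become [Publication.PubMed.AllXML] with CROSS APPLY x.nodes(...) AS au(col) and a WHERE pmid = @pmid filter, as in the question. On SQL Server 2017 or later, STRING_AGG may be a simpler alternative to the STUFF / FOR XML PATH idiom.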
answered 2019-08-08T14:48:50.583