sql-server - 查询使用内存表和/或聚集列存储索引返回错误数据

Question

这天我有一个非常奇怪的行为。我正在尝试使用 SQL Server (2017) 内存表和列存储索引。但是在创建了一个非常小的数据库（只有 2 个表），里面的数据非常少之后，我注意到我的一个查询返回了错误的数据。

一些背景：

第一个表 [Attribute]包含一些分层数据（Id、ParentId）。该表有一个触发器，它计算嵌套集模型的一些列。

基于此，我创建了一个视图 [vw_Attribute_Hierarchy]，它返回所有属性及其子节点的列表。我想用它来总结每个分层节点的数据（因此包括子值）。

第二个表 [FactImpacts] 仅包含一些要在列 [PlanValue] 和 [ActualValue] 上汇总的数据。

我试图创建一个仅包含所需列/数据的架构。但似乎查询结果取决于给定的数据集。

这是目前我最小的脚本（请在 SQL CMD 模式下执行）来重现我的问题，它应该返回两个结果，每行包含三行，但我只得到一个用于第一个查询和三个用于第二个查询。

用于创建数据库和一些内容的脚本：

SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

GO
:setvar DatabaseName "NEW-DB-823474692347"
:setvar DefaultFilePrefix "NEW-DB-823474692347"
:setvar DefaultDataPath "C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\"
:setvar DefaultLogPath "C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\"

GO
:on error exit
GO
/*
Detect SQLCMD mode and disable script execution if SQLCMD mode is not supported.
To re-enable the script after enabling SQLCMD mode, execute the following:
SET NOEXEC OFF; 
*/
:setvar __IsSqlCmdEnabled "True"
GO
IF N'$(__IsSqlCmdEnabled)' NOT LIKE N'True'
    BEGIN
        PRINT N'SQLCMD mode must be enabled to successfully execute this script.';
        SET NOEXEC ON;
    END


GO
USE [master];


GO

IF (DB_ID(N'$(DatabaseName)') IS NOT NULL) 
BEGIN
    ALTER DATABASE [$(DatabaseName)]
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [$(DatabaseName)];
END

GO
PRINT N'Creating $(DatabaseName)...'
GO
CREATE DATABASE [$(DatabaseName)]
    ON 
    PRIMARY(NAME = [$(DatabaseName)], FILENAME = N'$(DefaultDataPath)$(DefaultFilePrefix)_Primary.mdf')
    LOG ON (NAME = [$(DatabaseName)_log], FILENAME = N'$(DefaultLogPath)$(DefaultFilePrefix)_Primary.ldf') COLLATE Latin1_General_CI_AS
GO
IF EXISTS (SELECT 1
           FROM   [master].[dbo].[sysdatabases]
           WHERE  [name] = N'$(DatabaseName)')
    BEGIN
        ALTER DATABASE [$(DatabaseName)]
            SET AUTO_CLOSE OFF 
            WITH ROLLBACK IMMEDIATE;
    END


GO
PRINT N'Creating [FG_MemoryOptimizedData]...';


GO
ALTER DATABASE [$(DatabaseName)]
    ADD FILEGROUP [FG_MemoryOptimizedData] CONTAINS MEMORY_OPTIMIZED_DATA;


GO
ALTER DATABASE [$(DatabaseName)]
    ADD FILE (NAME = [FG_MemoryOptimizedData_76BAE8A0], FILENAME = N'$(DefaultDataPath)$(DefaultFilePrefix)_FG_MemoryOptimizedData_76BAE8A0.mdf') TO FILEGROUP [FG_MemoryOptimizedData];


GO
USE [$(DatabaseName)];

GO
PRINT N'Creating [dbo].[Attribute]...';


GO
CREATE TABLE [dbo].[Attribute] (
    [TenantId]          UNIQUEIDENTIFIER NOT NULL,
    [AttributeId]       UNIQUEIDENTIFIER NOT NULL,
    [AttributeParentId] UNIQUEIDENTIFIER NULL,
    [AttributeName]     NVARCHAR (128)   NOT NULL,
    [LeftExtent]        INT              NOT NULL,
    [RightExtent]       INT              NOT NULL,
    [HierarchyLevel]    INT              NOT NULL,
    [NodeCount]         INT              NOT NULL,
    CONSTRAINT [PK_Attribute] PRIMARY KEY NONCLUSTERED ([AttributeId] ASC)
);


GO
PRINT N'Creating [dbo].[Attribute].[IX_Attribute_Rn]...';


GO
CREATE UNIQUE CLUSTERED INDEX [IX_Attribute_Rn]
    ON [dbo].[Attribute]([TenantId] ASC, [AttributeName] ASC);


GO
PRINT N'Creating [dbo].[Attribute].[IX_Attribute_Name]...';


GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_Attribute_Name]
    ON [dbo].[Attribute]([AttributeName] ASC);


GO
PRINT N'Creating [dbo].[FactImpact]...';


GO
CREATE TABLE [dbo].[FactImpact] (
    [Rn]          BIGINT           IDENTITY (1, 1) NOT NULL,
    [TenantId]    UNIQUEIDENTIFIER CONSTRAINT [DF_FactImpact_TenantId] DEFAULT '10000000-0000-0000-0000-000000000000' NOT NULL,
    [ImpactId]    UNIQUEIDENTIFIER CONSTRAINT [DF_FactImpact_ImpactId] DEFAULT NEWID() NOT NULL,
    [AttributeId] UNIQUEIDENTIFIER NOT NULL,
    [Date]        DATE             NOT NULL,
    [PlanValue]   DECIMAL (38, 10) NULL,
    [ActualValue] DECIMAL (38, 10) NULL,
    CONSTRAINT [PK_FactImpact_Rn] PRIMARY KEY NONCLUSTERED HASH ([Rn]) WITH (BUCKET_COUNT = 100000),
    INDEX [COLIX_FactImpact] CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON);


GO
PRINT N'Creating [dbo].[DF_Attribute_TenantId]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [DF_Attribute_TenantId] DEFAULT '10000000-0000-0000-0000-000000000000' FOR [TenantId];


GO
PRINT N'Creating [dbo].[DF_Attribute_Id]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [DF_Attribute_Id] DEFAULT NEWSEQUENTIALID() FOR [AttributeId];


GO
PRINT N'Creating [dbo].[DF_Attribute_LeftExtend]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [DF_Attribute_LeftExtend] DEFAULT 0 FOR [LeftExtent];


GO
PRINT N'Creating [dbo].[DF_Attribute_RightExtend]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [DF_Attribute_RightExtend] DEFAULT 0 FOR [RightExtent];


GO
PRINT N'Creating [dbo].[DF_Attribute_HierarchyLevel]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [DF_Attribute_HierarchyLevel] DEFAULT 0 FOR [HierarchyLevel];


GO
PRINT N'Creating [dbo].[DF_Attribute_ChildCount]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [DF_Attribute_ChildCount] DEFAULT 0 FOR [NodeCount];


GO
PRINT N'Creating [dbo].[FK_Attribute_ParentId]...';


GO
ALTER TABLE [dbo].[Attribute]
    ADD CONSTRAINT [FK_Attribute_ParentId] FOREIGN KEY ([AttributeParentId]) REFERENCES [dbo].[Attribute] ([AttributeId]);


GO
PRINT N'Creating [dbo].[vw_Attribute_Hierarchy]...';


GO
CREATE VIEW [dbo].[vw_Attribute_Hierarchy]
AS

  SELECT
        [Root].[TenantId],

        [Root].[AttributeId] AS [AttributeRootId],
        [Root].[AttributeName] AS [AttributeRootName],

        [Node].[AttributeId] AS [AttributeNodeId],
        [Node].[AttributeName] AS [AttributeNodeName]

    FROM [dbo].[Attribute] [Root]
        JOIN [dbo].[Attribute] [Node] ON [Node].[LeftExtent] BETWEEN [Root].[LeftExtent] AND [Root].[RightExtent]
GO
PRINT N'Creating [dbo].[iusp_CalculateNestedSets]...';


GO
CREATE PROCEDURE [dbo].[iusp_CalculateNestedSets]
(
    @SchemaName SYSNAME,
    @TableName SYSNAME,
    @GroupColumnName SYSNAME,
    @IdColumnName SYSNAME = 'Id',
    @ParentIdColumnName SYSNAME = 'ParentId'
)
AS
    -- http://www.sqlservercentral.com/articles/Hierarchy/94040/

    SET NOCOUNT ON


    CREATE TABLE [#HTally]([N] INT NOT NULL PRIMARY KEY)

    INSERT INTO [#HTally]([N])
        SELECT TOP 500 --(16 * 500 = VARBINARY(84000) in length)
                N = ISNULL(CAST(
                        (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)*16+1
                    AS INT),0)
            FROM sys.all_columns ac1
                CROSS JOIN sys.all_columns ac2


    --===== Build the new table on-the-fly including some place holders
    ;WITH cteBuildPath AS
    (
        --=== This is the "anchor" part of the recursive CTE.
        -- The only thing it does is load the Root Node.
        SELECT
            [ANCHOR].[TenantId],
            [ANCHOR].[AttributeId],
            [ANCHOR].[AttributeParentId],
            HLevel   = 1,
            [SortPath] =  CAST(
                            CAST([ANCHOR].[AttributeId] AS BINARY(16))
                        AS VARBINARY(8000)) --Up to 1000 levels deep.
      FROM [dbo].[Attribute] AS anchor
      WHERE [AttributeParentId] IS NULL --Only the Root Node has a NULL [ParentId]
      UNION ALL
     --==== This is the "recursive" part of the CTE that adds 1 for each level
         -- and concatenates each level of [Id]'s to the [SortPath] column.
     SELECT
            [RECUR].[TenantId],
            [RECUR].[AttributeId],
            [RECUR].[AttributeParentId],
            HLevel   =  cte.HLevel + 1,
            [SortPath] =  CAST( --This does the concatenation to build [SortPath]
                            cte.[SortPath] + CAST([RECUR].[AttributeId] AS BINARY(16))
                        AS VARBINARY(8000))
        FROM [dbo].[Attribute] AS recur WITH (TABLOCK)
            INNER JOIN cteBuildPath AS cte ON cte.[AttributeId] = [RECUR].[AttributeParentId]
    )
        --=== This final INSERT/SELECT creates the Node # in the same order as a
        -- push-stack would. It also creates the final table with some
        -- "reserved" columns on the fly. We'll leave the [SortPath] column in
        -- place because we're still going to need it later.
        -- The ISNULLs make NOT NULL columns
    SELECT
            [AttributeId] = ISNULL([SORTED].[AttributeId], '00000000-0000-0000-0000-000000000000'),
            [SORTED].[AttributeParentId],
            [HLevel]        = ISNULL([SORTED].HLevel,0),
            [LeftExtent]    = ISNULL(CAST(0 AS INT),0), --Place holder
            [RightExtent]   = ISNULL(CAST(0 AS INT),0), --Place holder
            [NodeNumber]    = ROW_NUMBER() OVER (PARTITION BY [SORTED].[TenantId] ORDER BY [SORTED].[SortPath]),
            [NodeCount]     = ISNULL(CAST(0 AS INT),0), --Place holder
            [SortPath]      = ISNULL([SORTED].[SortPath],[SORTED].[SortPath])
        INTO [#Hierarchy]
        FROM cteBuildPath AS sorted
        OPTION (MAXRECURSION 500) --Change this IF necessary
    ;


    --===== Declare a working variable to hold the result of the calculation
    -- of the LeftExtent so that it may be easily used to create the
    -- RightExtent in a single scan of the final table.
    DECLARE @LeftExtent INT

    --===== Create the Nested Sets from the information available in the table
    -- and in the following CTE. This uses the proprietary form of UPDATE
    -- available in SQL Serrver for extra performance.
    ;WITH cteCountDownlines AS
    (
        --=== Count each occurance of [Id] in the sort path
        SELECT
                [AttributeId] = CAST(SUBSTRING(h.[SortPath],[T].N,16) AS UNIQUEIDENTIFIER),
                [NodeCount]  = COUNT(*) --Includes current node
            FROM [#Hierarchy] h, [#HTally] t
            WHERE [T].N BETWEEN 1 AND DATALENGTH([SortPath])
            GROUP BY SUBSTRING(h.[SortPath],[T].N, 16)
    ) --=== Update the NodeCount and calculate both Bowers
    UPDATE [SourceTable]
         SET @LeftExtent    = [SourceTable].[LeftExtent] = 2 * h.[NodeNumber] - h.[HLevel],
              [SourceTable].[RightExtent] = ([downline].[NodeCount] - 1) * 2 + @LeftExtent + 1,
              [SourceTable].[NodeCount] = [downline].[NodeCount],
              [SourceTable].[HierarchyLevel] = h.[HLevel]
        FROM [dbo].[Attribute] AS [SourceTable]
            JOIN [#Hierarchy] h ON h.[AttributeId] = [SourceTable].[AttributeId]
            JOIN cteCountDownlines downline ON h.[AttributeId] = [downline].[AttributeId]
        WHERE NOT (
                    [SourceTable].[LeftExtent] = 2 * h.[NodeNumber] - h.[HLevel]
                AND [SourceTable].[RightExtent] = ([downline].[NodeCount] - 1) * 2 + @LeftExtent + 1
                AND [SourceTable].[NodeCount] = [downline].[NodeCount]
                AND [SourceTable].[HierarchyLevel] = h.[HLevel]
            )
        ;


RETURN 0
GO
PRINT N'Creating [dbo].[tr_Attribute_IUD]...';


GO

CREATE TRIGGER [dbo].[tr_Attribute_IUD]
     ON [dbo].[Attribute]
     FOR INSERT, UPDATE, DELETE
     AS
     BEGIN
            SET NOCOUNT ON

            IF UPDATE([AttributeId]) OR UPDATE([AttributeParentId]) BEGIN
                EXEC [dbo].[iusp_CalculateNestedSets] @SchemaName = 'dbo',  @TableName = 'Attribute', @GroupColumnName = 'TenantId', @IdColumnName = 'AttributeId', @ParentIdColumnName = 'AttributeParentId'
            END
     END
GO
/*
Post-Deployment Script Template
--------------------------------------------------------------------------------------
 This file contains SQL statements that will be appended to the build script.
 Use SQLCMD syntax to include a file in the post-deployment script.
 Example:      :r .\myfile.sql
 Use SQLCMD syntax to reference a variable in the post-deployment script.
 Example:      :setvar TableName MyTable
               SELECT * FROM [$(TableName)]
--------------------------------------------------------------------------------------
*/
DECLARE @ItemsPerHierarchyLevel INT = 2
DECLARE @HierarchyLevels INT = 3

-- ATTRIBUTE DATA
;WITH [Source]
AS
(
    SELECT TOP (@ItemsPerHierarchyLevel)
            ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS [RN]
        FROM [sys].[syscolumns]
),
[CTE]
AS
(
    SELECT
            NEWID() AS [AttributeId],
            CAST(NULL AS UNIQUEIDENTIFIER) AS [AttributeParentId],
            CAST('Attribute-' + CAST([Source].[RN] AS NVARCHAR(128)) AS NVARCHAR(128)) AS [AttributeName],
            1 AS [Level]
        FROM [Source]
    UNION ALL
    SELECT
            NEWID() AS [AttributeId],
            [CTE].[AttributeId] AS [AttributeParentId],
            CAST([CTE].[AttributeName] + '.' + CAST([Source].[RN] AS NVARCHAR(128)) AS NVARCHAR(128)) AS [AttributeName],
            [CTE].[Level] + 1 AS [Level]
        FROM [Source]
            JOIN [CTE] ON [CTE].[Level] < @HierarchyLevels
)
INSERT INTO [dbo].[Attribute]([AttributeId], [AttributeParentId], [AttributeName])
    SELECT
            [CTE].[AttributeId],
            [CTE].[AttributeParentId],
            [CTE].[AttributeName]
        FROM [CTE]
        ORDER BY [AttributeName]
    ;


-- IMPACT DATA
;WITH [RN]
AS
(
    SELECT TOP (24)
            ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS [RN]
        FROM [sys].[syscolumns]
),
[Date]
AS
(
    SELECT DATEADD(MONTH, [RN].[RN] - 1, DATEFROMPARTS(2019, 1, 1)) AS [Date]
        FROM [RN]
)
INSERT INTO [dbo].[FactImpact]([AttributeId], [Date], [PlanValue], [ActualValue])
SELECT
        [Attribute].[AttributeId],
        [Date].[Date],
        1 AS [PlanValue],
        1 AS [ActualValue]
    FROM [dbo].[Attribute],
        [dbo].[Attribute] [Attribute2], -- ONLY TO DUPLICATE DATA
        [dbo].[Attribute] [Attribute3], -- ONLY TO DUPLICATE DATA
        [Date]
;

GO

PRINT N'Update complete.';


GO

USE [NEW-DB-823474692347]
GO

以下两个查询应始终返回相同的结果：

USE [NEW-DB-823474692347]
GO

SELECT 
        [vw_Attribute_Hierarchy].[AttributeRootId],
        [vw_Attribute_Hierarchy].[AttributeRootName],
        SUM([FactImpact].[PlanValue]) AS [SumFactImpacts]
    FROM 

        [dbo].[FactImpact] INNER JOIN [vw_Attribute_Hierarchy] [vw_Attribute_Hierarchy]  ON [vw_Attribute_Hierarchy].[AttributeNodeId] = [FactImpact].[AttributeId]

    WHERE [vw_Attribute_Hierarchy].[AttributeNodeName] = 'Attribute-1.1.1'
    GROUP BY [vw_Attribute_Hierarchy].[AttributeRootId]
        , [vw_Attribute_Hierarchy].[AttributeRootName]

GO


SELECT 
        [vw_Attribute_Hierarchy].[AttributeRootId],
        [vw_Attribute_Hierarchy].[AttributeRootName],
        SUM([FactImpact].[PlanValue]) AS [SumFactImpacts]
    FROM 

        -- SWITCH LEFT AND RIGHT TABLE ONLY
        [vw_Attribute_Hierarchy] [vw_Attribute_Hierarchy] INNER JOIN [dbo].[FactImpact]  ON [vw_Attribute_Hierarchy].[AttributeNodeId] = [FactImpact].[AttributeId]

    WHERE [vw_Attribute_Hierarchy].[AttributeNodeName] = 'Attribute-1.1.1'
    GROUP BY [vw_Attribute_Hierarchy].[AttributeRootId]
        , [vw_Attribute_Hierarchy].[AttributeRootName]


GO

奇怪的是，如果您进行以下更改之一，两个查询总是返回相同的正确数据：

1) 我为最后两个 SELECT 语句切换了 INNER JOIN 中的表。如果 [FactImpact] 在第二位，则结果是正确的。

2) 删除 [FactImpacts] 上的 In-Memory 选项

3) 删除 [FactImpacts] 上的聚集列存储索引

4）减少或增加[FactImpacts]中的数据量

DELETE [dbo].[FactImpact] WHERE [AttributeId] IN (SELECT [AttributeId] FROM [Attribute] WHERE [Attribute].[AttributeName] LIKE 'Attribute-2%')

5）在表[属性]上再添加一个索引

CREATE INDEX [IX_Attribute_Nested_Set] ON [dbo].[Attribute] ([LeftExtent], [RightExtent])

我希望你能听从我的解释！

我能够在 SQL Server 2016 / 2017 中本地重现这种奇怪的行为。

sql-server - 查询使用内存表和/或聚集列存储索引返回错误数据

0 回答 0

Related

Reference