
This is a purely organizational question about SSIS project best practices for medium-sized imports.

I have a source database which is continuously being enriched with new data, and a staging database into which I periodically load the data from the source so I can work on a copy of it while migrating the current system. I am currently using an SSIS Visual Studio project to import this data.

My issue is that I have realised the current design of my project is not optimal, and I would now like to move it to SQL Server so I can schedule the import instead of running the Visual Studio project manually. That means the project first needs to be cleaned up and optimized.

So basically, for each table, the process is simple: truncate the table, extract from the source, and load into the destination. There are about 200 tables. Extractions cannot be parallelized because the source database only accepts one connection at a time. How would you design such a project?

I read in Microsoft's documentation that they recommend one Data Flow per package, but managing 200 different packages seems practically impossible, especially since I will have to chain them for the scheduled import. On the other hand, a single package with 200 Data Flows seems unmanageable too...


Edit 21/11:

The first approach I wanted to use when starting this project was to extract the tables automatically by iterating over a list of table names. This could have worked well if my source and destination tables all had the same schema object names, but since the source and destination databases are from different vendors (Btrieve and Oracle), they also have different naming restrictions. For example, Btrieve has no reserved words and allows names longer than 30 characters, neither of which is true of Oracle. So that is how I ended up manually creating 200 data flows with semi-automatic column mapping (most mappings were automatic).

When generating the CREATE TABLE queries for the destination database, I created a reusable C# library containing the methods that generate the new schema object names, in case the methodology could be automated. If there were a custom package-generation tool that could call an external .NET library, that might do the trick.
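For illustration, such a name-mapping method might look like the sketch below. The `ToOracleName` method, the abbreviation rules, and the (deliberately tiny) reserved-word list are all hypothetical; the real library would encode the actual renaming conventions:

```csharp
using System;
using System.Collections.Generic;

public static class OracleNameMapper
{
    // Oracle identifiers are limited to 30 characters (before 12.2)
    // and must avoid reserved words; Btrieve has neither restriction.
    private const int MaxLength = 30;

    // Illustrative subset only; a real list would be much longer.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "DATE", "LEVEL", "SIZE" };

    public static string ToOracleName(string btrieveName)
    {
        // Normalize case and replace characters Oracle identifiers disallow.
        string name = btrieveName.Trim().ToUpperInvariant()
                                 .Replace(" ", "_")
                                 .Replace("-", "_");

        // Rename reserved words instead of failing at CREATE TABLE time.
        if (Reserved.Contains(name))
            name += "_COL";

        // Enforce Oracle's identifier length limit.
        if (name.Length > MaxLength)
            name = name.Substring(0, MaxLength);

        return name;
    }
}
```

Note that naive truncation can produce duplicate names within a table, so a real implementation would also need a collision check (e.g. appending a numeric suffix).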


3 Answers


Have you looked into BIDS Helper's Biml (Business Intelligence Markup Language) as a package-generation tool? I have used it to create multiple packages that all follow the same basic truncate-extract-load pattern. If you need something smarter than what plain Biml provides, BimlScript adds the ability to embed C# code into the processing.

Based on your problem description, I believe you could write a single Biml file and generate two hundred individual packages. You could probably also use it to generate one package with two hundred Data Flow Tasks, but I have never tried pushing SSIS that hard.
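A minimal BimlScript sketch of the idea might look like the following, under some assumptions: the connection names (`Source`, `Staging`), the connection strings, and the hardcoded table list are placeholders, and a real file would also emit the column mappings and the renamed Oracle identifiers. I've used OLE DB connections for brevity; a Btrieve source may need an ODBC source instead:

```xml
<#@ template language="C#" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <!-- Connection strings are placeholders for illustration only -->
    <OleDbConnection Name="Source" ConnectionString="Provider=...;Data Source=...;" />
    <OleDbConnection Name="Staging" ConnectionString="Provider=...;Data Source=...;" />
  </Connections>
  <Packages>
    <# foreach (var table in new[] { "CUSTOMERS", "ORDERS" }) { /* hypothetical table list */ #>
    <Package Name="Load_<#=table#>" ConstraintMode="Linear">
      <Tasks>
        <!-- Step 1: truncate the staging table -->
        <ExecuteSQL Name="Truncate <#=table#>" ConnectionName="Staging">
          <DirectInput>TRUNCATE TABLE <#=table#></DirectInput>
        </ExecuteSQL>
        <!-- Step 2: extract from the source and load into staging -->
        <Dataflow Name="Copy <#=table#>">
          <Transformations>
            <OleDbSource Name="Extract" ConnectionName="Source">
              <DirectInput>SELECT * FROM <#=table#></DirectInput>
            </OleDbSource>
            <OleDbDestination Name="Load" ConnectionName="Staging">
              <ExternalTableOutput Table="<#=table#>" />
            </OleDbDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
    <# } #>
  </Packages>
</Biml>
```

Since the table list is driven from C#, this is also where an external .NET library like the one you describe could be called to produce the destination names.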

answered 2012-11-20T22:33:18.127

You can create 10 child packages, each containing 20 Data Flow Tasks, and a master package that triggers these child packages. Using parent-to-child configuration, create a single XML configuration file. In the master package, define precedence constraints so the child packages execute serially. This way, maintainability will be better than with either 200 packages or a single package with 200 Data Flow Tasks.
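If you ever need to drive the same serial execution from outside SSIS instead of via Execute Package Tasks, a minimal sketch against the SSIS runtime API could look like this (the file paths and package names are hypothetical):

```csharp
using System;
using Microsoft.SqlServer.Dts.Runtime;

class MasterRunner
{
    static void Main()
    {
        var app = new Application();

        // Execute the child packages one at a time, mirroring the serial
        // precedence constraints of the master package described above.
        for (int i = 1; i <= 10; i++)
        {
            string path = string.Format(@"C:\ETL\Child{0:00}.dtsx", i); // hypothetical paths

            using (Package child = app.LoadPackage(path, null))
            {
                if (child.Execute() == DTSExecResult.Failure)
                    throw new InvalidOperationException(path + " failed.");
            }
        }
    }
}
```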

answered 2012-11-20T09:00:58.647

The following link may be useful to you:

Single SSIS Package for Staging Process

Hope this helps!

answered 2012-11-20T09:54:32.180