ssis - load TableC from TableB based on value of TableA in SSDT/SSIS

Question

I have 3 tables:

      --server 1
      CREATE TABLE TableA (GROUP_ID INT
                          ,STATUS VARCHAR(10))
      --server 2
      CREATE TABLE TableB (GROUP_ID INT 
                          ,NAME VARCHAR(10)
                          ,STATE VARCHAR(50)
                          ,COMPANY VARCHAR(50))
      -- server 1
      CREATE TABLE TableC (GROUP_ID INT
                          ,NAME VARCHAR(10)
                          ,STATE VARCHAR(50)
                          ,COMPANY VARCHAR(50))

Sample data

      INSERT INTO TableA VALUES (1, 'READY'), (2, 'NOT READY'), (3, 'READY'), (4, 'NOT READY')
      INSERT INTO TableB VALUES (1, 'Mike', 'NY', 'aaa'), (1, 'Rick', 'OK', 'bbb'), (2, 'Smith', 'TX', 'ccc'), (3, 'Nancy', 'MN', 'bbb'), (4, 'Roger', 'CA', 'aaa')

I am trying to build an SSDT (SSIS 2012) package to load data into TableC from TableB for only those GROUP_IDs that have STATUS = 'READY' in TableA, and then change the STATUS to 'LOADED'.

I need to accomplish this using project-level parameters or variables for the TableA GROUP_ID and STATUS, because I will be doing this for about 60 tables and those values might change.

I must build an SSIS package; it is a requirement. Using a linked server is not preferred unless this is impossible to achieve through SSIS.

Any help would be appreciated.

score 1 · Accepted Answer

As the two tables are on separate servers, you could create a Data Flow with two Sources. You'll need to set up Connection Managers to both databases, then point one Source to the database holding TableA, and the other to the database holding TableB. Once this is done, you can join the two with a Merge Join, and then discard the records which don't have the value or values you want using a Conditional Split. It would ultimately look a bit like this:

Example data flow

First you'll need to set up the Sources as already discussed. However, since you want to use a Merge Join, you'll need to sort the output from the sources. You can do this in SSIS with a Sort transform, but you're better off just building an ORDER BY clause into your SELECT statement that you have in the source, and then telling SSIS that the output is sorted:

Right click on each Source, and select Show Advanced Editor.
Go to the Input and Output Properties tab.
Select OLE DB Source Output, then set IsSorted on the right-hand side to True.
Expand OLE DB Source Output, then expand Output Columns.
Click on the column you're sorting by (presumably GROUP_ID), and set SortKeyPosition to 1.

Here's an image of that last bit in case you're at all lost - it can be a little fiddly getting around the properties in SSIS if you're not used to it:

SortKeyPosition picture
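
As a rough sketch of the ORDER BY approach described above, the SQL command in each OLE DB Source might look like the following. The column lists are taken from the table definitions in the question; which connection manager each query runs against is an assumption based on the "server 1" and "server 2" comments there.

```sql
-- Source 1 (connection manager for server 1): TableA, sorted on the join key.
SELECT GROUP_ID, STATUS
  FROM TableA
 ORDER BY GROUP_ID;

-- Source 2 (connection manager for server 2): TableB, sorted on the same key.
SELECT GROUP_ID, NAME, STATE, COMPANY
  FROM TableB
 ORDER BY GROUP_ID;
```

Both queries have to sort on the same column in the same direction, because the Merge Join trusts the IsSorted and SortKeyPosition settings and does not verify the order itself.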

Since the STATUS value you want to load might change, you could set this up in the Project Parameters. Just go to that page from the Solution Explorer, and click to add a new parameter. You should end up with something like this:

Project Parameters picture

As you're using 2012, you'll be able to configure this value after release in SSMS, avoiding the need to re-work this or create a configuration file.

When you set up the Conditional Split, you have a couple of options. If you might want to send rows with other STATUS values into other tables in future, then you should look for cases where the STATUS has a value of READY, but if you only care about the READY rows you can also do it the way I have here:

Conditional Split setup

When you drag the output of the Conditional Split to the destination, it'll ask which output you want to use. If you've set it up the same way I have, use Conditional Split Default Output, and it'll pass through all rows which don't meet one of the conditions you've stated.

If you need to update the values of the data while you're loading it, it depends where you want the updates to show. If you want to leave TableA and TableB alone, but change the value in TableC, then you could set up a Derived Column transform after the Conditional Split and before the Destination. You could then replace the value in the STATUS column with one you set (this can be parameterised, as above):

Derived Column with replace

If you want to update the STATUS field in TableA, then you should go back to the Control Flow, and after the Data Flow you've been working on, add an Execute SQL Task which is connected to the database holding TableA, and which runs a simple SQL update statement.
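
A minimal sketch of such an update statement, assuming the Execute SQL Task uses an OLE DB connection (where parameters are written as `?` and mapped by position on the Parameter Mapping page). The idea of mapping the two markers to project parameters holding the new and the old status is an assumption for illustration, not something stated in the question.

```sql
-- Execute SQL Task against the database holding TableA (server 1).
-- Parameter 0: the status to write, for example 'LOADED'.
-- Parameter 1: the status that was just loaded, for example 'READY'.
UPDATE TableA
   SET STATUS = ?
 WHERE STATUS = ?;
```

Keeping both values as parameters means the same statement can be reused if the status names differ between the roughly 60 tables mentioned in the question.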

If this is going to be running outside of business hours and you know there won't be any new rows during this time, you can simply update all rows which currently have a STATUS of READY. If you need to update the rows more precisely because the situation might be continuing to change while you work, then you might need to re-think this - one option would be to grab all of the GROUP_ID values you want to update at the beginning, store that in a variable, and use the variable as a parameter in the Source select statements and Execute SQL Task update statement. You could also choose to work in a loop instead, but that would obviously be a lot slower than operating on the rows in bulk.
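
One alternative to holding the GROUP_ID values in a variable, offered only as a sketch and not as the approach described above, is to claim the rows with a temporary marker status before the Data Flow runs. The value 'LOADING' is hypothetical; any value that fits in VARCHAR(10) and is not otherwise used would do.

```sql
-- Step 1: Execute SQL Task before the Data Flow (server 1).
-- Claim the rows that are READY right now with a temporary marker.
UPDATE TableA
   SET STATUS = 'LOADING'   -- hypothetical marker value, not in the original design
 WHERE STATUS = 'READY';

-- Step 2: TableA source query inside the Data Flow reads only the claimed rows.
SELECT GROUP_ID, STATUS
  FROM TableA
 WHERE STATUS = 'LOADING'
 ORDER BY GROUP_ID;

-- Step 3: Execute SQL Task after the Data Flow marks the claimed rows as done.
UPDATE TableA
   SET STATUS = 'LOADED'
 WHERE STATUS = 'LOADING';
```

Rows that become READY while the package is running keep that status and are picked up on the next run. The trade-off is that a failed run leaves rows in the marker state, so the package would need a way to reset or retry them.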

This part is from my original answer before the question was updated, but I'll leave it here in case it's useful to anyone else:

If the tables (A and B) are in the same database, instead of the Conditional Split you could set the source up to be a select statement which joins Table A to Table B, and has a WHERE clause that only selects the rows with a STATUS of READY:

select b.GROUP_ID, b.NAME, b.STATE, b.COMPANY
  from TableA a
inner join TableB b
    on a.GROUP_ID = b.GROUP_ID
 where a.STATUS = 'READY';
