insert or replace: identify

Showing posts with label identify. Show all posts

Friday, March 9, 2012

Insert row in table with Identity field, and get new Identity back

I want to insert a new record into a table with an Identity field and return the new Identify field value back to the data stream (for later insertion as a foreign key in another table).

What is the most direct way to do this in SSIS?

TIA,

barkingdog

P.S. Or should I pass the identity value back in a variable and not make it part of the data stream?

If you need to do this for every row, then using identities with an SSIS is not a good idea. You cannot get the new identity back until the row is committed, but that would mean committing one row at a time in SSIS. Even then, you only get back the last identity - which may not be what you expect if a parallel process has added a row between you commiting your row and asking for it's identity.

The best way is to use a script to generate a key in the data flow. In that way, you will know what the key value for each row is in advance and it can be inserted (thanks to multicast) into different tables at once, guaranteeing referential integrity.

Donald

|||

Donald,

When you wrote "You cannot get the new identity back until the row is committed, but that would mean committing one row at a time in SSIS."

When I run a normal SSIS package that reads from a file and writse to a database isn't one row being committed at a time? Or does SSIS save as many rows as possible in, say a memory buffer, and then commit then all at once?

TIA,

barkindog

|||

Strictly speaking it is the provider that handles commits, not SSIS.

The Fastload option on the OLEDB provider allows you to set batch sizes from 1 to "the entire data load in one batch."

If you do not use Fast Load, then one row at a time is sent.

The OLEDB command component also processes one row at a time.

However, in all these cases, the problem is not the performance of handling one row at a time (although that is a real factor) - it is also that you cannot get back the identity for the row you have just committed.

The pattern in SQL Server (and in most rdbms's) is that you can get the last identity issued. It is tempting to think that having just posted a row, the last identity issued must be for that row. Many a design has foundered on that assumption, as just the teensiest smidgin of parallelism soon throws that process out of synchronization.

I much prefer issuing keys in advance in the ETL process - you can do so much with them, with great performance and guaranteed integrity.

Donald

|||

Regarding "Many a design has foundered on that assumption, as just the teensiest smidgin of parallelism soon throws that process out of synchronization."

1. If my job is the only one updating the table with the Identity column , and I'm not running multiple copies of my job, then I presume that parallellism can't happen to me. Or does SSIS do things "in the background" that could cause a smidgin of parallelism, even for my particular case?

2. Later on I will need to re-run my job with new data. Then I have to read the current value of the Identity from the table, add 1 to it, and begin with that value. Your argument about parallelism makes me wonder if the only way to accurately read the identity value from a table is to make sure no other app updates that table. (That sure puts a dent in the possibility of scaling out horizontally with servers.)

TIA,

barkingdog

|||

1. The OLEDB command destination may send a command for the second row before the first has completed. Our buffer architecture is designed to maximise the potential for pipeline parallelism.

2. The only way to guarantee that the last identity you read is the last one you inserted, is to be able to guarantee that no process has written to the table since your process.

We do have a design pattern for highly parallel key generation that may (but may not) be in the next version . Either way there will be a paper on this at some point.

The best strategy is to know your keys in advance - by generating them in your data integration process. That way, you have complete control.

Donald

Sunday, February 19, 2012

INSERT performance

Hi guys,
Could someone please help me identify the botleneck in my case.
I have a table that includes one varchar(100) field and about 10 integer,
date, and boolean fields. The varchar column and two date columns are
indexed. The table includes about 1M of rows. No triggers, one foreign key
to a table with about 500 records.
When I insert about 50 records calling a stored procedure in a loop it
takes 30-60 second. During this time SQL takes 100% of CPU (AMD Athlone
2800) and makes lots of HDD reading requests. The database itself is quite
small and I guess is completely located in RAM, however I am not sure what
it reads from HDD in this case.
Is this performance normal? I guess I can improve it writing to those 50
records to a temporary table and then copying data to the main table. Am I
right? Are there any other ways to improve the insert performance?
Thanks,
Mike
Message posted via http://www.sqlmonster.com
Mike via SQLMonster.com wrote:
> Hi guys,
> Could someone please help me identify the botleneck in my case.
> I have a table that includes one varchar(100) field and about 10
> integer, date, and boolean fields. The varchar column and two date
> columns are indexed. The table includes about 1M of rows. No
> triggers, one foreign key to a table with about 500 records.
> When I insert about 50 records calling a stored procedure in a loop it
> takes 30-60 second. During this time SQL takes 100% of CPU (AMD
> Athlone 2800) and makes lots of HDD reading requests. The database
> itself is quite small and I guess is completely located in RAM,
> however I am not sure what it reads from HDD in this case.
> Is this performance normal? I guess I can improve it writing to those
> 50 records to a temporary table and then copying data to the main
> table. Am I right? Are there any other ways to improve the insert
> performance?
> Thanks,
> Mike
Take a look at the execution plan for the insert and see what's going
on. You can see a graphical plan in Query Analyzer, or use SET
STATISTICS PROFILE ON/OFF before and after the SP call.
David Gugick
Imceda Software
www.imceda.com
|||It seems like DBCC DBREINDEX solved the problem.
Message posted via http://www.sqlmonster.com

INSERT performance

Hi guys,
Could someone please help me identify the botleneck in my case.
I have a table that includes one varchar(100) field and about 10 integer,
date, and boolean fields. The varchar column and two date columns are
indexed. The table includes about 1M of rows. No triggers, one foreign key
to a table with about 500 records.
When I insert about 50 records calling a stored procedure in a loop it
takes 30-60 second. During this time SQL takes 100% of CPU (AMD Athlone
2800) and makes lots of HDD reading requests. The database itself is quite
small and I guess is completely located in RAM, however I am not sure what
it reads from HDD in this case.
Is this performance normal? I guess I can improve it writing to those 50
records to a temporary table and then copying data to the main table. Am I
right? Are there any other ways to improve the insert performance?
Thanks,
Mike
Message posted via http://www.droptable.comMike via droptable.com wrote:
> Hi guys,
> Could someone please help me identify the botleneck in my case.
> I have a table that includes one varchar(100) field and about 10
> integer, date, and boolean fields. The varchar column and two date
> columns are indexed. The table includes about 1M of rows. No
> triggers, one foreign key to a table with about 500 records.
> When I insert about 50 records calling a stored procedure in a loop it
> takes 30-60 second. During this time SQL takes 100% of CPU (AMD
> Athlone 2800) and makes lots of HDD reading requests. The database
> itself is quite small and I guess is completely located in RAM,
> however I am not sure what it reads from HDD in this case.
> Is this performance normal? I guess I can improve it writing to those
> 50 records to a temporary table and then copying data to the main
> table. Am I right? Are there any other ways to improve the insert
> performance?
> Thanks,
> Mike
Take a look at the execution plan for the insert and see what's going
on. You can see a graphical plan in Query Analyzer, or use SET
STATISTICS PROFILE ON/OFF before and after the SP call.
David Gugick
Imceda Software
www.imceda.com|||It seems like DBCC DBREINDEX solved the problem.
Message posted via http://www.droptable.com

INSERT performance

Hi guys,
Could someone please help me identify the botleneck in my case.
I have a table that includes one varchar(100) field and about 10 integer,
date, and boolean fields. The varchar column and two date columns are
indexed. The table includes about 1M of rows. No triggers, one foreign key
to a table with about 500 records.
When I insert about 50 records calling a stored procedure in a loop it
takes 30-60 second. During this time SQL takes 100% of CPU (AMD Athlone
2800) and makes lots of HDD reading requests. The database itself is quite
small and I guess is completely located in RAM, however I am not sure what
it reads from HDD in this case.
Is this performance normal? I guess I can improve it writing to those 50
records to a temporary table and then copying data to the main table. Am I
right? Are there any other ways to improve the insert performance?
Thanks,
Mike
--
Message posted via http://www.sqlmonster.comMike via SQLMonster.com wrote:
> Hi guys,
> Could someone please help me identify the botleneck in my case.
> I have a table that includes one varchar(100) field and about 10
> integer, date, and boolean fields. The varchar column and two date
> columns are indexed. The table includes about 1M of rows. No
> triggers, one foreign key to a table with about 500 records.
> When I insert about 50 records calling a stored procedure in a loop it
> takes 30-60 second. During this time SQL takes 100% of CPU (AMD
> Athlone 2800) and makes lots of HDD reading requests. The database
> itself is quite small and I guess is completely located in RAM,
> however I am not sure what it reads from HDD in this case.
> Is this performance normal? I guess I can improve it writing to those
> 50 records to a temporary table and then copying data to the main
> table. Am I right? Are there any other ways to improve the insert
> performance?
> Thanks,
> Mike
Take a look at the execution plan for the insert and see what's going
on. You can see a graphical plan in Query Analyzer, or use SET
STATISTICS PROFILE ON/OFF before and after the SP call.
--
David Gugick
Imceda Software
www.imceda.com|||It seems like DBCC DBREINDEX solved the problem.
--
Message posted via http://www.sqlmonster.com

Friday, March 9, 2012

Insert row in table with Identity field, and get new Identity back

Sunday, February 19, 2012

INSERT performance

INSERT performance

INSERT performance

insert or replace

Blog Archive

About Me