I have a table of Prices: ProductID, Date, Price and Source. My application needs to load new prices but if the price already exists and it is from a "better" source, it should update existing record.
The orignal app loaded all data in SQL temp table and did a bulk insert to the target table, followed by a bulk update. Kinda simple and effecient even if too SQL-centric. The first version of my app used to check every ProductID+Date combination for existance using regular stateful NHibernate session, then load new prices using SQL Bulk load, and update each existing price using regular NHibernate session. Needless to say, not only it was slow but the gap between the read and the insert allowed new prices from other sources (my app is not the only source) to sneak into the table causing duplicate/ambigous records to appear. I needed a solution that would allow me to Insert-or-Update records fast, without resorting to temptables+SQL mess. (I have a habit of trying to minimize the amount of SQL code in my application but if nothing else worked, I would have to fall back to the old ways.)
Here's the outline of my new solution:
- Stateless NHibernate session (IStatelessSession) for extra efficiency. No need to spend CPU cycles keeping that first-level cache growing.
- Single explicit transaction wrapping entire load operation for all records. This reduces stress on SQL Server by not having it commit every record. If my volume becomes too high I can always limit the transaction size to say 10000 rows.
- Custom sql-insert script defined in the entity's mapping file that implements the read-update-or-insert logic . The idea is to be able to completely process one record without going back-and-forth between SQL Server and the .NET
- NHibernate batching enabled, in order to push the records hundreds at a time to the SQL Server.
Combined, these techniques are designed to make the whole operation less chatty and, in a way, achieve the effect of having the processing done on the SQL Server side.
Here's what I discovered while implementing the above:
- IStatelessSession does not support .BeginTransaction(IsolationLevel) overload. Instead, do session.Transaction.Begin(IsolationLevel.Serializable) which does exactly same thing (I looked at NH source code)
- You can further optimize the logic by not doing session.Get for each many-to-one-property of your entity. In my example, instead of doing price.Product = session.Get<Product>(ProductID) I can create a dummy Product object once and simply set its Id to a different value for each price. Stateless session does not care that the object is transient.
- The custom sql-insert script is sensitive to the order of parameters (they're simply represented by "?"s). Turn on NH debug-logging and look at NH log to see the exact order in which object's properties are mapped to SQL params. Do this before you implement sql-insert.
- The custom sql-insert script is sent to the server once for every inserted record. You may want to move it to an SP to minimize the traffic, and execute the SP from sql-insert. (Note that prepare-sql setting has no effect on batched statements)
- In order for NH batching to work, the entity's Identity must be assigned before the insert. Can't use generator=native (aka SQL Server IDENTITY).
- Since your Insert is now in fact Insert-Update-Or-Do-Nothing you want your entities Identity be its natural key, in my case ProductID + Date. That requires implementing Equals and GetHashCode but other than that it's real easy.
- NH batching requires check="rowcount" setting on the sql-insert. This means your script must return number of records affected, whether it actually inserted, updated or did nothing.
- NH batching uses global settings for its CommandTimeout, not the one defined in your SessionFactory. This means you must add hibernate-configuration section to your App.Config (or Web.Config).
This summary turned out to be more elaborate than I expected. No details for you then. You can figure it out, I have trust in you.
Tags: