Performing Calculations on an Enterprise Database

Photo by Osman Köycü / Unsplash

What do you think of when you hear the term “Large Database”?

Did you conjure up the concept of an aggregation of static data for mining and searching? While this is correct, it doesn’t encompass all database types.

There are also databases full of scientific calculations and rules. These calculation-focused large databases pose a specific set of challenges when trying to achieve maximum throughput with multiple users.

Apex has engaged in several projects where we have been brought in to assist or evaluate procedural code. Our clients often sought help because the database did not meet the required or expected performance.

To learn a better way to conduct database development, check out my article on SQL with Tea.

The Price of Performance Test Procrastination

Most projects wait until the end of the development cycle to do heavy performance testing or stress testing of a system. This procrastination places a number of constraints on what remedies can be used to address the issues.

Late-stage testing guarantees that the time has passed for many key updates, such as:

  1. Changing the database design
  2. Using newer features
  3. In some cases, changing configuration parameters

With late-stage testing constraints in place, we are limited to isolated code changes or design changes. These changes must not affect data retrieval from the front-end.

We have consistently achieved optimal results after reviewing the procedural code and suggesting or making changes.

Database engines are as powerful and smart as we make and use them. They’ll perform well if you can provide the pertinent information or ask the right question.

Existing Query Examination

Before attacking any procedural code, look at the performance of the existing queries as written.

Two methods we use to examine performance of existing queries:

  1. In SQL Server, the use of the profiler can help identify poorly performing queries.
  2. Use the explain plan to see what the optimizer is doing with the queries. After reviewing the explain plan, we can look for ways to increase performance or decrease the data set size.

If you are not familiar with these tools, follow the links to familiarize yourself. You’ll be glad you did!

5 Small Changes for Big Performance Improvements

Below are the 5 top areas where small changes make a large difference. While performing your query plan analysis, keep these ideas in mind.

1. Limit the Data Reach

Stored procedures that do heavy processing may need to pull information from a lot of rows in the database before executing. There may even be the need to do interim calculations before a final calculation can be performed.

Make sure that you are not touching data that you are not going to need. Not using proper join statements or incomplete filtering (i.e. where clauses) will cause a query to return too much data, causing extra processing.

It is generally faster to issue two queries to the DB instead of one query and then exclude data in procedural code.

Key Callout:

With less data being returned, there is a lower chance of both locking and the dreaded deadlock.

2. Query Based Indexing

Many software developers know the value of an index on a database table, but don’t know how to set-up the indexes to get the maximum benefit from them.

Over-indexing is just as bad as under-indexing a table. To set up an index, review your queries to see how you are selecting your data and joining tables. Then use those fields in the proper order.

Three other indexing options are:

  1. Specifying uniqueness
  2. Sort order
  3. Creation of a covering index to eliminate the table access altogether

If you are not familiar with this concept or would like to read more about it, Microsoft has a good article about it here.

Key Callout:

Setting up an index is as easy as reviewing your queries to see how you are selecting your data and how you are joining tables.

3. Using Temporary Tables

Temporary tables can be a large performance boost for databases with a high quantity of users and large tables. Most databases have Global or Local temporary tables where the scope of the table differs.

Temporary tables act just like real tables but usually contain a limited and potentially a pre-modified set of data. Population of a temporary table can be done using a ‘Create Statement’ or using a ‘Select Into’ statement.

The temporary table will not have any indexes at first, but indexes can be added and specialized for how the temporary table will be used. This can eliminate the over-indexing mentioned above.

Local temporary tables are user specific which could help eliminate locking issues that might be seen if the real table was used.

Key Callout:

Make sure you have the needed space in your tempdb (SQL Server) to host the temporary tables.

4. Use of Common Table Expressions

The use of common table expressions (CTE’s) in a stored procedure can make the queries easier to write and read. CTE’s can also make debugging easier when evaluating the data in a data set.

There are many differences between CTE’s and temporary tables but we can save most of that for another post.

The top three differences between CTE’s and temporary tables:

  1. CTE’s are not materialized
  2. CTE’s cannot be indexed
  3. CTE’s are only good for the next query after they are defined

CTE’s can be nested and referenced multiple times in the primary query.

Key Callout:

Limit your CTE’s to the proper data set and only the columns that you need.

5. Dynamically Eliminate Steps

We often see stored procedures called that perform no processing due to lack of data. For example, think processing and totaling tax deductions when no deductions have been entered.

If there is a chance that the data does not exist, a quick check to see if it does can eliminate the overhead of calling a procedure. This overhead would also include any processing that might take place before a data collection query executes in the called procedures (i.e. variable, cursor, or CTE declaration, temp table creation, etc). The overhead of a data existence select could heavily outweigh the call to the procedure.

Key Callout:

If your calculations perform a lot of batch totaling into a parent table, a simple ‘isModified’ column set at the time of data entry can eliminate recalculating batch header information. This works only if the underlying detail records for the batch have not been modified or added to.

Other Performance Robbers

In addition to these top 5 performance robbers, many other reasons contribute to a lacking performance or database locking.

Each application differs, and has unique scaling and performance requirements.

The sooner you can identify that an issue exists, the sooner you can start to address them and more options will be available to you.

Test-driven development can help spot performance issues. Read more about it in my posting SQL with Tea.

Keith A Miller

Keith A Miller