This is a story about trying to keep the head above water in the middle of the ocean for a long time, just waiting for that ship to come by and pick you up. Or, in other words, it’s about what we in Confirmit have been struggling with while waiting for the long-missed feature of columnstore indexes in Microsoft SQL Server.
Our business is based on collecting survey data and providing reporting and analysis solutions for the data we collect through surveying or otherwise feeds into the system. A core component of the solution is of course the survey, and a challenge has always been to store the data we collect in a way that makes data collection and real-time reporting and analysis as efficient as possible. Keep in mind that every survey is unique with its own set of demographic variables and measures, and that the complexity of one survey can vary from just a few questions to having several thousand variables including a good mix of demographic variables and measures, to use data warehousing terms.
Short about how we store survey data in our databases:
- One database for each survey.
- One dynamically created table for holding the samples and on table for storing the response data. The columns in these tables depend on the questions in each particular survey. And when a survey gets more variables than you can have in one SQL Server table, we just add another one sharing the primary key.
- One record in those tables for each person taking the survey.
Simple as that.
Turning back the clock – 1998
In 1998 Confirmit was a startup with 4 developers and I just joined the company after spending a few days too many configuring SAP and writing ABAP/4 code for a consultancy company. The 3 other developers had their responsibilities within areas like survey design and survey engine runtime, so trying to implement some kind of reporting functionality on top of the data we collected ended up on my plate.
And where could I find the data? You guessed right, in Microsoft Access databases.
And did queries towards these databases perform well under high load in a multi-user environment? You guessed right again, they did not. Some people think that Microsoft Access is a single-user database, but that is not really true as you can read about here: https://www.linkedin.com/pulse/20140728120923-4665957-yes-microsoft-access-works-in-a-multi-user-environment. Still, if you plan to build a multi-tenant high-end survey and reporting platform, you should consider other options than Microsoft Access for storing the data.
Fortunately, our surveys weren’t that large or many, and neither where our client base, so we were able to go with MS Access for about a year after the beginning in 1997. Anyway, as survey complexity, respondent volume and number of reporting users increased, it got harder and harder to explain to our clients why nothing really scaled. Because it did not. Period.
Glory days – SQL Server 7 and 2000
We knew all the way that Microsoft Access where not going to take us to the moon or further. And we wanted further, so after working some late nights while trying to hold off some slightly irritated clients, we were able to migrate our solution to SQL Server 7 late 1998.
For the next few years, SQL Server version 7 and 2000 served us their purpose well. Our main selling point at that time was our highly scalable online data collection tool, and each SQL Server database instance where able to serve a significant number of concurrent projects, some of the projects also with a significant load during peak hours. To make this scale we added:
- Support for multiple database servers and distribution of projects across these servers.
- Mechanisms for automatically detaching and attaching databases depending on activity. Too many concurrently attached databases on one SQL Server instance will eventually get you into trouble.
When it comes to reporting, our tools at that time where not too sophisticated and the number of respondents in one survey where rarely above 100.000. So guess what, we did reporting directly on top of the data collection databases, and this actually worked more or less fine for a while!
All good things come to an end, though, and after the launch of our online reporting solution ReportalTM, we were able to sell reporting functionality to clients having larger number of concurrent reporting users. These users were also able to do dynamic drilldown and filtering, challenging the responsiveness of the databases. And they were drilling and filtering on surveys with more than a million responses. Yes, you read correctly, more than a million, can you imagine? Consequently, it did not take long before we got complaints about the sluggishness of the online reporting queries. And not only did the reporting tool perform sub-optimal, the effect of the reporting queries on data collection weren’t that great either. After all, reporting and data collection where fighting over the same resources.
Arghh – what to do next?
Separating storage for data collection and reporting seemed like the logical next step, right? After all, performance and scalability of our data collection had been a key selling point for a while and we definitely did not want to jeopardize that reputation. So what to do? Some of the options we had at that time:
- Periodically replicate the survey data to dedicated reporting database servers?
- Start using OLAP?
- Or something else?
Option 1 was rejected because regular row-based database tables had proven too slow for our purpose. Each query typically resulted in far more IO than you want when creating a pivot table with just 3-4 variables, particularly if the database table has 2-300 columns. You may think that heavy use of indexes or fewer columns in each table could solve this, but believe me, we tried it all with no success
Option 2 might be a good option if you had a limited set of dimensions and measures, but as I already mentioned, surveys can have huge amounts of variables. So even OLAP was kind of a hype at that time, it never was not a good fit for us.
Then we had to do something else, but what? Words were out there that other companies where storing their survey data somehow in columnar fashion for their analysis. We also knew about Sybase IQ, a commercial columnar database. Sybase IQ was too expensive for us to use as it would require our On-Premise clients (those with their own installations of the Confirmit platform) to buy another expensive piece of software before they could even start using Confirmit. And no, we did not want Sybase to have a big piece of our cake. So, the conclusion was: Let’s implement something ourselves. How hard can it be, right?
2004 – Confirmit BitStream Format
So that’s what we did. In 2004 we implemented our own proprietary updateable columnar database based on the file system in Windows Server:
- One “database table” in one folder.
- One “database column” in one file.
Here’s what a “database” for one survey may look like in what we first called Data Inversion Storage, but renamed to BitStream format after an internal naming contest:
We used all tricks in the book to make each file as small as possible using different compression techniques. Confirmit supports a wide range of question types like, single-choice, multi-choice, grids, numeric, text, numeric lists, ranking etc. Each question type required its own storage and compression technique.
The result was:
- A new reporting storage that could execute queries sometimes 10-50 times faster than traditional row-based SQL Server tables.
- A system that could do reporting on top of either our data collection database or on top of periodically updated BitStream files. Our query tool generated queries in a SQL-like XML format that could be translated into real SQL or BitStream queries.
- A new beast that we needed to maintain and extend as reporting requirements where added. And not only reporting, all our panel sampling products were implemented based on the BitStream format.
For slightly geeky developers, you can understand that implementing such a database system was kind of the coolest thing. At least for a while. But after a while demands and requirements are adding up and then you starts feeling just like knocking your head in the roof (or the wall) when:
- The queries still do not run fast enough! We need some indexes on this database. At least for the variables normally used in filtering!
- The database needs to handle updates well, because data is coming in all the time and the reporting need to be real time of course!
- The queries still do not run fast enough! And the server has many cores, right? Queries should of cource be multi-threaded!
- And scale out well on any number of servers!
- And shouldn’t we have some performance counters and tracing opportunities comparable to what’s in SQL Server, maybe?
- And why is joining not working so well on large data sets?
- And what about sub-queries?
- And windowing functions like rank, lag, lead and over?
- And all the other cool stuff that you can do in commercial database products?
- And performance counters and tracing functionality.
- Etc, etc
Some of these things where implemented, but the complexity of the beast grew, and it became obvious that something needed to be done. Should we re-write the BitStream engine with all these things in mind? That sounded like extreme sport to us, so we rather lifted our heads and started looking around.
Commercial Columnar databases
The year is now 2012, and the market for commercial columnar databases has matured since 2004. Here you see some of the products we looked into to see if they could help us out.
All of these were tested and all of them have some great features and characteristics that would make them a good option for our business A small team spent more than half a year looking into the different products before drawing a conclusion.
The bell – SQL Server Columnstore
By the end of 2012 we completed the product evaluation and landed once again on going all in with Microsoft. Some of the key differentiators leading us to that conclusion:
- SQL Server was already in our technology stack. Selecting another product would mean that many Operations teams around the world would need to do a significant competency lift.
- SQL Server came out strong in performance tests. Both for read and write.
- We got local support from competent Microsoft database engineers during evaluation.
- We already had several developers with many years of experience with SQL Server in our team.
Even though the first version of columnstore indexes had its clear limitations, we trusted this was a feature that Microsoft planned to invest heavily in for the upcoming releases. Since then, the history has proven that they have been able to deliver to our expectations.
I am not going to talk about what columnstore indexes are, how they works or go into details about issues and improvements that have happened over the last 5 years. There’s probably enough information about that out there already. For those interested in details, I highly recommend Niko Neugebauer’s blog: http://www.nikoport.com/columnstore/.
I’ll rather briefly go through the latest releases of SQL Server and what limitations we have seen and what improvements that have played to our advantage. You can also read more about the evolution of columnstore index features here: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-what-s-new.
Key to queries running on large data volumes stored in columnstore indexes is that they run in batch mode, not in row mode. Check out https://blog.sqlauthority.com/2012/06/21/sql-server-columnstore-index-batch-mode-vs-row-mode/ for differences between these modes. In SQL Server 2012, you could have good luck or bad luck when it came to query mode. By trying and failing and trying again, we somehow got used to the moody behavior of the database server and we were often able to work around the row mode issues we got into. One of the oddest of the oddities were that a query did not run in batch mode if it did not have a group by clause, meaning that a simple select count(*) from table would run e.g. in 10 seconds, while select x, count(*) from table group by x could run in 0.005 seconds. Hard to believe, right?
Another issue for us was that the indexes were not updateable, meaning that our typical use case of data constantly trickling into the system was hard to support. We solved this by partitioning tables by time, but then we always had to load full partitions for all partitions having at least one change, making the ETL implementation more complex than we wanted.
Version 2014 introduced updateable clustered columnstore indexes and many improvements to query plan generations, along with other general performance improvements. Particularly the updateable indexes where important to us, as trickling of data got a lot easier. Unfortunately some clients run batch update jobs on their survey databases with many records, leading to large delta stores in our indexes. It took some effort to come up with good ways of online defragmenting our indexes in these cases, but eventually we found a solution works well for us. If you’ll ever try the same yourself, be aware that sometimes the queries towards the SQL Server dictionary tables to figure out about delta size and number of records marked as deleted in each row group can get really slow once you have many tables and row groups.
When it comes to improvements to query plan generation, most of those played well with us, but as you should expect after significant re-write, some of our queries got slower in 2014 than in 2012. For a while we actually had to switch to the old cardinality estimator (by using trace flag 9481) to have some of our heavier queries to run well. This was improved in the first couple of service packs and we have now switch back to the 2014 cardinality estimator which now work well for us. The main challenge since then has been that the cardinality estimator estimates too low cardinalities and we may end up having tempdb-spill during query execution.
Some other issues with 2014 that was carried over from 2012:
- Queries need to get a parallel execution plan to be executed in batch mode. We solved this by setting a low (zero) cost of parallelism for the reporting SQL Server instances. This results in some queries that would have performed better in one thread to be parallelized, but that pain is by far out-weighted by the advantage of using batch mode on the columnstore scans.
- The row groups are automatically closed a little too late for us. The normal behavior is that they are closed when reaching 1 million records. We have seen that we have great advantage of having them closed earlier than that. A lot of the surveys on our system have below 100.000 records, and even for those surveys we have seen big advantages of having the indexes closed and compressed. Good for us that there are ways to force compression before the internal Tuple Mover process kicks in and does it for us.
We have not yet started using version 2016 in our production systems, but we have already played with this version for 2 years. We are closing in on start using 2016, but we did find a few issues during testing that have halted us for a while:
- SQL Server 2016 consumes more memory when you reorganize a columnstore index.
- Parallel redo causes high memory usage in SQL Server 2016 when it’s compared to SQL Server 2014 or earlier versions.
- Error when you add a NOT NULL column with default values to a non-empty clustered columnstore index in SQL Server 2016 Standard and Express edition
All of those are fixed in CU3, so we are now moving forward and planning to upgrade or first production environment to 2016 over the summer.
Some observations from our test environments that that are worth mentioning:
- String predicate pushdown is a great feature, but so far we have only got this working for comparing with constants. When using joins or subqueries, we get a table scan and filter in the execution plan instead of the pushdown within the table scan. Nothing in the documentation about this limitation, so we created a support ticket for this. The current status on that one is that the documentation will be updated and they may remove this limitation in an upcoming release.
- Dynamic memory grants in batch mode is just a must-have for us. This is currently hidden behind trace flag 9389, but our impression is that without this flag, we are more likely to get temp-db spill for batch mode queries than we were in 2014. Using trace flag 9389 looks like solving this issue in a really good way.
- 2016 SQL Server have started to pre-allocate log buffers for all attached databases. This is an optimiziation done with InMemory tables in mind, but it takes affect for all databases. This leads to an additional overhead of about 3-4GB of stolen memory (check the performance counter in perfmon) compared to 2014 if you have 1000 databases. This would probably be fine for most production environments because you do have lots of memory, but for our 70-80 test environments this is problematic. Due to this we will stay with 2014 for most test environments to see what Microsoft decides to do. We have got indications that they’ll add some heuristics to the pre-allocation to limit the effect on database without any InMemory tables.
One great new feature in 2016 SP1 is that columnstore indexes is now a part of Standard Edition, not only Enterprise Edition. This is good for us, even though some features are throttled in Standard Edition, as some of our On-Premise clients may be able to get lots of the performance benefits from columnstore indexes without paying the price of an Enterprise Edition license.
As the requirements and feature sets of our reporting and analysis solutions keep growing, we will keep pushing the limits of columnstore indexes queries. Good to know, then, that Microsoft continues investing in improving performance of batch mode queries. One example of that here from Niko Neugebauer: Batch Mode Adaptive Joins.
While still focusing on getting the best out of columnstore indexes, we also need to widen our horizon when it comes to storing data for analysis. Partly because other types of data than structured surveys might not fit that well into traditional database tables, but also because we are moving towards a more cloud-friendly deployment of our system. Hard to tell what the future will bring, but we have started looking into technologies like Apache Spark and Elasticsearch. Exciting days here at Confirmit R&D – even if you’re not into Backbone, React or Redux:-)