The Relational DB Headaches I No Longer Have
I’ve been working with columnar databases for several years. There are many advantages that I’ve internalized using Vertica and Redshift…
I’ve been working with columnar databases for several years. There are many advantages that I’ve internalized using Vertica and Redshift. I also use them in AWS, always. So I’ll just talk about the way I design from a practical standpoint.
I never shy away from using human readable field names.
I never shy away from using materialized views.
I never worry about complicated joins from a performance POV but only from a readability POV.
I almost always use denormalized fact tables. The only time I don’t is when the BI front-end can’t handle it.
I do 90% ELT and 10% ETL. Unless there is XML or JSON involved, I perform all transformations and cleansing in the database.
I do NOT depend on BI tools for complex metrics.
I love using very tall KVP tables into the billions of rows and never worry about table scans.
I never worry about select distinct queries on non-indexed fields.
I never worry about storage requirements for indices.
Since I mostly deal with data warehouses, I don’t spend much time concerning myself with commits and rollbacks. I have differences with my colleagues on that point, but in general I prefer dealing with transaction crap in in-memory databases like Redis and VoltDB. So my basic philosophy is to let designated source systems do overwrites for some fixed period of time and let volatility be what it will be. Especially when I’m working with a data lake, I don’t worry. I’m generous with space, so I will parse message queues to get the latest updates as they come, and store the transactions in compressed flat files. That way if push comes to shove I can recreate the input stream and literally know everything that is slowly changing. So, commit everything and keep all versions. Like git.