Big Data Analytics vs Data Warehousing
Question: What is the difference between big data analytics and data warehousing?
Problem: How many people are at the beach on the 4th of July?
Solution 1: Big Data Analytics Approach
Take a fleet of quadcopter drones and have them take thousands of snapshots of the people on the beach. Download many gigabytes of JPEGs. Use machine-learning facial recognition to identify individuals, deduplicate the result sets, and make a best-guess estimate. Report the number with a confidence interval.
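To make "best guess within a confidence interval" concrete, here is a minimal sketch of one way the deduplicated drone results could be turned into an estimate. It swaps in a standard technique the post does not name: the Chapman (bias-corrected Lincoln-Petersen) capture-recapture estimator, which compares two drone passes and uses the overlap of recognized faces to infer how many people neither pass saw. It assumes facial recognition has already produced a set of face IDs per pass; the function name `chapman_estimate` and the `face…` IDs are hypothetical.

```python
import math
from typing import Set, Tuple


def chapman_estimate(pass1: Set[str], pass2: Set[str], z: float = 1.96) -> Tuple[float, float, float]:
    """Estimate total beach population from two drone passes (hypothetical helper).

    pass1, pass2: face IDs recognized in each pass (assumed output of facial recognition).
    Returns (estimate, lower, upper) using a normal-approximation interval.
    """
    n1 = len(pass1)
    n2 = len(pass2)
    m = len(pass1 & pass2)  # people recognized in both passes: the deduplicated overlap

    # Chapman's bias-corrected Lincoln-Petersen estimator.
    estimate = (n1 + 1) * (n2 + 1) / (m + 1) - 1

    # Variance of the Chapman estimator (Seber's formula).
    variance = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    half_width = z * math.sqrt(variance)

    # We directly observed this many distinct people, so the lower bound cannot go below it.
    seen = len(pass1 | pass2)
    return estimate, max(seen, estimate - half_width), estimate + half_width


# Hypothetical data: pass 1 sees faces 0-59, pass 2 sees faces 30-99.
first_pass = {f"face{i}" for i in range(0, 60)}
second_pass = {f"face{i}" for i in range(30, 100)}
est, low, high = chapman_estimate(first_pass, second_pass)
print(f"about {est:.0f} people (95% CI {low:.0f} to {high:.0f})")
```

The answer is only as good as the recognition step: a misidentified face inflates or deflates the overlap `m`, which is exactly the kind of guessing the post attributes to big data analytics.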
Solution 2: Data Warehouse Approach
Block off all access to the beach. Don’t let anyone in until they swipe their driver’s license or photo ID. Query the database, get an exact count.
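By contrast, the warehouse approach needs no estimation at all. Below is a minimal sketch, assuming a hypothetical `beach_entries` table with one row per ID swipe at the gate, stored in an in-memory SQLite database; the table, columns, dates, and ID values are illustrative, not from the original post. `COUNT(DISTINCT ...)` handles people who leave and swipe back in.

```python
import sqlite3

# Hypothetical schema: one row per ID swipe at the beach gate.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE beach_entries (id_number TEXT NOT NULL, entered_at TEXT NOT NULL)"
)

# Illustrative swipe log (made-up IDs and timestamps).
swipes = [
    ("D1234567", "2024-07-04T09:15"),
    ("D7654321", "2024-07-04T09:20"),
    ("P0001111", "2024-07-04T10:05"),
    ("D1234567", "2024-07-04T13:40"),  # same person re-entering after lunch
]
conn.executemany("INSERT INTO beach_entries VALUES (?, ?)", swipes)


def count_visitors(conn: sqlite3.Connection, day: str) -> int:
    """Exact number of distinct people who entered on the given day (YYYY-MM-DD)."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT id_number) FROM beach_entries WHERE entered_at LIKE ?",
        (day + "%",),
    ).fetchone()
    return row[0]


print(count_visitors(conn, "2024-07-04"))  # → 3
```

No confidence interval is needed because the collection step was designed around the question being asked.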
Big data analytics is what it is because it makes guesses from data that was not structured to answer specific questions. Data warehousing is what it is because you deliberately structure the domain to be queried and set up data collection for that purpose.
So ‘big data analytics’ essentially means inefficient, unstructured data plus smart guessing. The difference is not size: all of the credit card transactions in the world are structured for data warehousing and always have been, and that is hardly ‘small data’.
In an ideal world, all big data analysis guessing evolves to data warehouse structure.
Big data analytics is pervasive today because it is mostly used to analyze markets through social media and other web sources. It relies on imputation because this analysis is often done without the direct knowledge or explicit permission of the people surveyed. And, of course, there is a lot of data to analyze. At some point, imputation will give way to explicit permission; that is the direction privacy regulations like GDPR are pushing.
What we want to know actually doesn’t require big data so much as it requires structure and permission.