
I'm newly calling myself a data engineer. I'm a grumpy old man who loves data and hates sloppy. I used to call myself a 'cloud data guy' because that pretty much explained it. But then people started making distinctions between 'data scientist' and 'data architect' and 'data modeler' and 'devops', and it made me more grumpy. I've been doing this stuff for more than 25 years and this is my fourth generation of tools and techniques. I expect to do all of that. I stay relatively tool-agnostic for about 2 years and then I pick one and become a tool bigot.
Mostly I have been building managed services in AWS for industrial clients, and that means multi-tier data architectures. That means streaming AND data warehouse AND data lake AND business intelligence AND machine learning AND NLP. Whatever the workload, I have to make all of that work, so I'm almost more like a data infrastructure plumber. I make sure the pipes work and the water is clean.
But it's also important for me to represent myself not as some geek, but as an engineer. I'm one of the STEM guys before they called it STEM. And I have a very hard time with the way people monkey with facts and logic these days. So 'data engineer' is distinguished from 'cloud data guy' in that I'm not making my money in the realms of social media Mordor. Sorry if that offends, but not sorry. I don't like the fact that people make money selling metadata without express permission. There is so much other vital data that needs tending in this world. That's what I came into this business to do.
I chose this profession because it finally became clear that you could do well without going to Wall Street. That was around 1991. I was inches from going to work for Bloomberg when some friends showed me how they hacked a Bloomberg terminal and got 15-minute quotes on a Mac. I realized at that moment that client-server computing would eventually win. Plus, I had learned earlier that every business will change its business plans. That means they will always have new requirements for data systems. It's an infinite stream of work.
I don't have any typical days. That's because I work for a small firm and I need to know a little bit of everything. The interesting thing I am doing now is learning the new VerticaPy. On occasion I am also responsible for assisting in the migration of old apps that I built into our new and improved CI/CD standards. I'm also on the security team with audit responsibilities for our ISO 27K certification and maintenance processes. I also write most of the marketing copy and product spec stuff for public consumption. I also do strategic brainstorming for all of the crap my boss, the CTO, throws over into the Slack channel.
The future of data engineering will revolve around Apache Beam, if they can get it to work with Go. Ultimately Python will be too slow and Haskell and Scala too obscure. Rust has a fighting chance. There will be a continuing shakeout in containerization standards for multi-cloud migrations and transparency. I would not bet against HashiCorp. MPP is king and the first vendor to kill ZooKeeper for good wins. Personal medical data is the industry of the future. It starts with the Apple Watch and Health apps. Blockchain may or may not change the future of auditing and provenance. I think that will require entities that are already wealthy to de-Ponzify the moral hazard of pump and dump that defines the sector. It will be 10 years before a healthy shakeout.
Data is the oil of the future. It will need to be refined well and distributed well. It is neither today. There is plenty of mind-numbing work for everyone. Oh, and here's something that nobody does right now. Nobody puts secure hashes on video and news. Fake news will get people killed until that changes.
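To make the hashing idea concrete, here is a minimal sketch of what it could look like in Python. It assumes SHA-256 is the kind of secure hash meant here, and the function names `sha256_of_file` and `matches_published_hash` are made up for illustration. A bare hash only shows that a file has not changed since the hash was published; proving who published it would also need a digital signature, which this sketch does not cover.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks.

    Chunked reads keep memory use flat even for large video files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Check a local copy against a hash the publisher made available.

    published_hex is assumed to be a hex string obtained from a channel
    the reader already trusts (hypothetical; not defined in this post).
    """
    local = sha256_of_file(path)
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(local, published_hex.strip().lower())
```

A reader who downloads a clip would call `matches_published_hash("clip.mp4", published_hex)` and treat a `False` result as a sign that the file was altered somewhere between the publisher and them.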