Drive has 700+ articles for digital transformation leaders written by StarCIO Digital Trailblazer, Isaac Sacolick.

I was surprised this morning to see an article about the “janitorial” work data scientists have to perform to find “nuggets” in big data. Actually, my only surprise is that the story is in the NY Times and that they are covering the least glamorous side of the “sexiest” job.

Why is data so messy?

Let’s start with the past. The history of data science starts with complicated data warehouses, expensive BI tools, hundreds if not thousands of ETLs moving data all over the place, and bloated expectations. At the other extreme, many organizations have siloed databases, DBAs largely skilled at keeping the lights on (future post?), and spreadsheet jockeys performing analytics. The janitorial work data scientists are performing exists partly because of the mess of databases and derivative data sources that previous generations left behind.
And I’m not sure this generation will do any better. As I reported just a couple of months ago, with great power comes even greater responsibility. All the technologies and tools data scientists have at their fingertips also have the power to create a new set of data stashes – informal places where data is aggregated – or buried data mines – places where analytics are performed, but not automated or transparent to future scientists.

If data scientists, DBAs, and CIOs are not careful, the data stashes and buried data mines can slowly transform into full-blown data landfills.

DBAs know what I’m talking about. It’s a combination of data warehouses, reports, dashboards, and ETLs that no one wants to touch. No one understands who is using which reports or dashboards, in which business process, or for what purpose or benefit. ETLs look like a maze of buried, unlabeled pipes developed using a myriad of materials (programming approaches) and with no standards to help future workers separate out plumbing from filters and valves.

Build Foundations, Not Landfills!

Data scientists and their partners (data stewards, DBAs, business analysts, developers, and testers) need to instill some discipline – dare I say data governance – and balance their time mining for nuggets with practices that establish data and analytics foundations. For an upcoming post… Remember, big data is a journey. Until then, here are a few things one can learn about data science from a fourth grade class. And think twice about creating another data source!
