Shadow data management happens whenever data scientists or analysts copy data from the primary, IT-approved database to run analysis with their own tools in their own environments — many of which have not gone through the rigor of IT approvals.
This practice creates many headaches, ranging from security vulnerabilities to inadequate return on investment (ROI) for AI projects, and it amplifies a company’s liability if sensitive user information is included in the “shadow” dataset.
To understand how to address the challenge of shadow data management, it’s important to first explore its root causes. After all, data scientists and machine learning (ML) engineers aren’t devious rule-breakers devoted to causing chaos. Rather, they’re driven to shadow data management practices because their technology needs aren’t being met by their IT department’s approaches to data governance, management and risk.
In this blog, Peter Wang explains how data scientists and their IT counterparts must work together to develop data management strategies that work for both sides.