Thank you. For our next tech stage session, we have Monica Miller, Developer Advocate at Starburst, presenting on "Solving the Data Divide: The Role of Data Products in Bridging the Gap Between Producers and Consumers." Monica, welcome. The floor is yours.

Hi everyone, I'm Monica Miller, and thank you for joining me as I dive into one of the biggest challenges currently affecting the data space: the great divide between data producers and data consumers. In our time together, I want to evaluate the current state of data ownership, and then I want to talk about data
products and the components of data products that can help address the challenges we see in the data space today.

Before I even get started, I think it's important to set the stage. As a data engineer, I acted as both a data producer and a data consumer, and I've definitely lived through my fair share of the struggles associated with either side. It's kind of like that controversial dress, where half of us saw the dress as blue and black and the other half saw it as white and gold. In reality, both groups are right, and
in my opinion, that's what's happening in the data space right now. Data handoffs are resulting in untrustworthy and unreliable data, and both sides are raising valid concerns.

The biggest challenge in the data space today is a misalignment between data producers and data consumers. It's difficult to identify because it's not necessarily quantifiable, but it's definitely tangible, and if you've spent time in either role, you've probably experienced it firsthand. The major contributing factors come from both sides. Data consumers fail to accurately convey their needs to data producers: requests change, updates are
made, and the initial ask snowballs before requests can even be fulfilled, and the changes aren't properly communicated. On the other hand, data producers struggle to understand the business value attached to various requests. No priority is given to the most critical business flows, everything is half maintained, and there's no motivation to assign ownership. The data becomes untrustworthy, the service level agreements become unreliable, and the problem only gets worse the more the data changes hands, especially in larger organizations, where it usually passes through a couple of teams before it reaches its end destination.

The reason for
this misalignment is a lack of ownership and accountability, and I'd argue that it does not actually trace back to the data practitioners; rather, it's a product of the current data environment. We haven't had the tools or the resources to make this a priority within existing architectures, and many data engineers are just stuck putting out one emergency fire after another.

But it's not all bleak, because the current data climate has recently shifted its focus towards this issue. Data ownership, data accountability, and data governance are all extremely important components of the data journey that are now
gaining traction and value within the community. One of the most common points of discussion recently has centered on data products and how they can be used to strengthen accountability and encourage collaboration.

So let's talk about data products and discuss their importance as described by community leaders. Teresa Tung defines a data product as a data set that creates value for downstream consumers. Sanjeev Mohan frames data products as directly solving a business problem. And last but not least, Zhamak Dehghani defines the components of a data product as the code, the data, the metadata, and the infrastructure. Essentially, all
three industry experts allude to the value that data products bring to the organization, whether in a data mesh architecture or in the architecture you already have today. Data products are an innovative, modern way of creating curated data sets, which can be saved, published, searched for, and consumed to provide business value.

At Starburst, we define the components of a data product as the data, the metadata, and the access patterns associated with that data set. So we aren't just talking about the target output here; we're also referring to the business context, the data lineage, the SLAs,
the data ownership, who has access to what data, and how that data is accessed. There are lots of components that can contribute to the design of your specific data product, and in a perfect world, a data product would have every single element on this graph. However, if it doesn't, that's okay. It's a marathon, not a sprint.

If you're just starting your data products journey, here are the imperatives you should focus on first. Data products should be demand-driven: designed and built to serve a clear need, which will
reinforce collaboration between data producers and data consumers. Data products should be reusable and scalable, designed to promote easy reuse across multiple use cases. Data products should be discoverable and accessible, organized in a way that helps teams quickly find and access the information they need, and shareable to maximize their value. And last but not least, data products need a committed owner: a formal owner who's trained on their responsibilities throughout the entire life cycle.

Starburst data products are unique because they allow you to discover, publish, manage, and
share data products based on multiple data sources. Instead of siloing you into creating data products from only one location, Starburst gives you the power to create and maintain trusted data products for your entire organization, no matter how many data sources you use. This enables users to confidently find the right data to answer important business questions, regardless of the source.

Here's an example of the details within a data product, such as the relevant links, the tags, the owners, the descriptions, and the identified data sets. There are also usage metrics associated with each data product, which can be
further analyzed for even more insight, and commenting capabilities, which encourage efficient communication and collaboration.

Tackling the lack of ownership in the data space is no small feat. It will require a culture change alongside a new technical approach. If organizations agree to shift that mindset within both their architecture and their processes, this enables improved data stewardship, iterative feedback, reusability, and trustworthiness, and essentially empowers collaboration. I'm confident that data practitioners on both sides will quickly experience the benefits. Thank you.
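
The components and starter imperatives described in the talk (data sets, business context, tags, SLAs, and a committed owner) can be pictured as a small descriptor with a readiness check. The following is only a minimal sketch under stated assumptions: the `DataProduct` class, its field names, and `missing_imperatives` are hypothetical illustrations, not Starburst's API or data model, and reusability is left out because it cannot be judged from metadata alone.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor; field names are illustrative only."""
    name: str
    description: str = ""                                 # business context
    datasets: List[str] = field(default_factory=list)     # the data itself
    tags: List[str] = field(default_factory=list)         # aids discovery
    owner: Optional[str] = None                           # committed owner
    sla: Optional[str] = None                             # e.g. "refreshed daily"
    use_cases: List[str] = field(default_factory=list)    # consumer needs served


def missing_imperatives(product: DataProduct) -> List[str]:
    """Return which of the starter imperatives the descriptor does not yet show."""
    gaps = []
    if not product.use_cases:
        gaps.append("demand-driven: no consumer need recorded")
    if not (product.description and product.tags):
        gaps.append("discoverable: description or tags missing")
    if product.owner is None:
        gaps.append("owned: no committed owner")
    return gaps


if __name__ == "__main__":
    draft = DataProduct(name="orders", datasets=["sales.orders"])
    for gap in missing_imperatives(draft):
        print(gap)
```

A check like this reflects the "marathon, not a sprint" point: a product can be published with gaps, as long as the gaps are visible to both producers and consumers.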