
Data Governance in a Cloud-First World (Cloud Next '19)

How's everyone doing? Nice. It's two o'clock, we're ready, right? My name is Aaron. I'm the director of product management in the data analytics organization at Google Cloud, and today I have my guest, Michelle. She will join us in a few seconds. If you want, you can come up here too. Yeah.

Today's topic is data governance in this world. We got a lot of questions yesterday, even during my presentation on streaming analytics. Whoever asked, I think you're there: thank you, I love that topic. This is something we're going to talk about, probably at length, over the years. But I want to start with what we're going to discuss today. Michelle is going to add a lot of insight for you guys, and I'm really excited to have her here with us. We're going to talk about what data governance is and what companies are doing. At the end I'm going to cover a little of what we're doing on Google Cloud to help you deal with your world, and how these things can actually be enabling capabilities to grow your businesses.

So let's talk about data. Data is getting bigger and bigger. I think I mentioned this yesterday too: by 2020, every one of us in the world will produce about two megabits of data every second. That's a lot of data. Sum it up: there are about 8 billion of us, so that's quite a bit. Look at all the things we're seeing today. Businesses are being built around this, but still we're only scratching the surface of what we can do with data. Either we're not ready, or we're afraid of it, or we don't have the right choice for it.

Who remembers the V's of data, of big data? Remember: velocity, variety, volume. Those three things are still running really, really high. Take a look at the volume and variety: it's tripling. That's fantastic for us, because you can take this data and generate some great insights for your businesses. But if you're not prepared to deal with it, that won't help you. Look at the velocity: it is ramping up immensely. Yesterday we talked about streaming, and I had a
partner on stage with me, AB Tasty. They were talking about how they use this real-time data to generate revenue for their customers. You can do the same if you can harness how you use the data, if you can turn governance into a revenue-generating mechanism for yourself. We are seeing businesses build data-driven and event-driven structures for themselves. You can do the same. We have the platform for you; we have the means for you.

There are a lot of things that we're learning, but as I said, we're not really using all the data available. Take a look at this: only 1% of unstructured data is in use. Only 1%. And 50% of structured data is still sitting there, not being used for decision-making. We're leaving value behind us. What will it take for us to really get there? I think all of us know why we can't do this, right? It's complex. The data is distributed everywhere. There are lots of legacy applications and silos. We may not have the talent. But there's still a way. What do you think?

Well, we have to address people.

I know, we have to address people.

What do we really want? We want to be able to use it. We want to be able to interact with it. We want to be able to play with our data. But our experience with our data is really that we struggle with it. We struggle with it at small-data scale, and if we struggle with it there, we're going to struggle with it at big-data scale.

Who owns the data? All right, let's have some guesses; it has to be interactive. Who owns the data? All right, that's one way. Collective, okay. Do you own your data? Yeah. The company owns the data. These are way better answers than what I usually get. We have smart Googlers in the room.

A few years ago we started asking the question "who owns the data?", looking at it through the lens of: where does the data strategy sit? Where does governance sit? Where does master data management sit? All things that help us better understand and take advantage of our information to make decisions, to grow our
revenues, to create great customer experiences. And the data decision-makers, what did they say? IT owns it. 50% of them. IT takes care of it. Those of you in IT here: you own the data. Do you want to own the data? No. So what do we do?

That was back in 2015. In 2018 we asked the question again, and what did the data decision-makers say? They said IT still owns it, and more of them said that IT owns it. So what does that actually mean? You said: who created it, where does it come from, who's taking care of it. It really boils down to this: I want my data to work. Stop frustrating me every time I go into my application, every time I go into my business intelligence, every time I'm trying to run analytics, every time I'm just trying to figure out who my customer is so I can have a grand old party. Right? Your data doesn't work for you. And when our respondents said that IT owns the data, it's because the complexity of it is a nightmare. It's what we deal with every day. We need to simplify it. So what we're going to do now is take this to data governance, because data governance really is the accelerator to make it happen.

Do you believe that? She will convince you guys how data governance is an accelerator for the business. So let's introduce you to data governance first. There we go.

If you look at data governance, here is a little history. Some of you come from very highly regulated industries. Back in 2007 it was specific to departments, very specific. In 2008, if you remember, there was the crisis, and lots of regulation around financial data: what can be done, who can do it. Very prescriptive. Remember those days? Things have changed. If you look at where we are today, GDPR happened in Europe, and here CCPA is happening. Don't be afraid of it. It's actually going to help you make more sense of your data.

What's driving the data governance initiatives? We want to connect to
our customers, see who they are, get a customer 360. We want to control our data, protect ourselves, and stay in compliance. We know that our regulators look at how we use data and the type of data that we're collecting to judge how we're doing. And we just don't feel like we're in control. That's that "make the data work" thing. We don't have great experiences. How can we trust our information to make decisions?

Every one of them is asking about these things: the complexity of the data, the lack of control (or maybe the perception of it), how we can drive operational efficiency around it, and risk management. These are all items that are driving governance needs. Now, it is about policy, it's about people, it's about technology. You might say, "Hey, I've heard the same thing for other things too." That's absolutely true, but there are specific cases that we can leverage and build together to really make this work for you. It is going to drive success. How does it come to the rescue?

Well, data governance is like any other competency and practice in our business: people, process, technology. It sounds really boring on the surface, but it's the brain trust that comes together. Not only are you trying to drive down complexity and mitigate risk, but on the other hand we do have strategies for getting value from our data, and we need policies that enable us to democratize it and find those great insights. So when we talk about data governance as the starting point for that value and for accelerating our business forward, that's how we should really be thinking about our governance program. It's a strategic competency. It's not just...

Have you seen examples of people who operationalize this, who do this well these days?

So I think John Deere has done some really interesting things. Is anyone in here from a world like John Deere's, something similar to manufacturing operations? Well, maybe they will take it, though.
All right, let me tell a little bit of a story about this. Just like for every other company here, all of you here, governance is a nightmare. And when we start on big data and predictive and prescriptive analytics, we don't want anything getting in our way. So what do we do? We grab the data we need, we start analyzing it, and we build insights. This is what John Deere was really focused on. How do we help farming communities drive better crop yields? How do we better manage the finances around crop insurance, for example? So they were grabbing all of this IoT data off their tractors, watching how those tractors were operating in the fields, how the seeds and the fertilizer were going down, and how the harvesting was happening, and relating this all back to their customers and the finances that support this, as well as the dealerships, who are helping with a lot of the fixes on the equipment.

What happened, though? They ran into a problem. They weren't thinking about governance. They weren't thinking about their own org structure in order to relate back to their customers and dealerships. They didn't have the proper master data around their customers and their dealerships. So you can think about: how do you create all these relationships and really understand what information can I share, or should I share, with whom? There are things that agronomists don't want to share with the dealerships, or that they don't want to share with their banks. It's very personal.

And when you move equipment around in big farming areas, what happens? They don't just drive the tractor down the street. You're not always stuck behind the International Harvester; sometimes they're put on flatbeds and moved. What happens when you move that tractor 50 miles and you're trying to figure out how your harvest is going? That's a problem. So a lot of data quality issues were starting to come up.

So what they really did is take a look at how they were moving forward. They were moving very quickly; they were running a DevOps, agile development practice,
not thinking about governance as requirements. Now they start to bring in those business requirements that really drive the policies: the understanding of who their customers are, who their dealerships are, who they are as an org, and how their equipment runs. It's not just this control thing that drives you away; it actually brings the value back in. So rather than struggling for another year with these practices, they're able to drive forward very, very quickly, because the information is trusted and the value of the insights really comes together.

This is why we wanted to share some of these examples with you early on. If you're on this path and only start looking at it down the road, you'll start seeing, "Hey, we didn't really think about this." It's best to consider what we need to do collectively as you're building your operations around data, around events, and around what you want to do with big data. Governance is going to be your guide. Governance is going to be your help. It's going to be there to help you get past some of the things that are coming your way that you may not be concerned about today.

So how do you start to look at this? Well, you want to have a vision for the future. You want a success story. You want to know what success looks like. So let's connect the dots between the information that we have.

Does it matter which industry?

No difference. No difference. There may not be farmers in here. You don't have to be a farmer.

Okay, good.

You don't have to be a mechanic. You don't have to be an engineer. You could be all of those things. But they have supply chains. They do. They've got financial systems. It's your business; it doesn't matter. And everybody has a vision for where their business should go and what those objectives are. So build your success story around that. Start with: where are my business opportunities? How is data going to shape that for me? Where do I want to take my
business forward? John Deere's example there is to push the envelope on not just being a farming equipment company: they want to help you do a better job of bringing products to market yourself. Then recognize, and be real about, the challenges that you face: not just that you're missing information, but really holistically looking at how you interact and engage with that data and how you're going to put it to use. Look at the people and processes, the experiences around that data, from the point of bringing it in, to the point where it's being presented, to the point where you're making decisions and taking actions off of it. Now you can start looking at a much more holistic solution for your data strategy overall, for the management of that data, and for the governance policies that are going to come from it.

And then it boils down to: how do I know I did a great job? How do I sell myself? How do I tell that story? How do I get that next promotion? It's: what did I do great from a data management perspective, what did I do great from a data governance perspective, and what did I do great from a business perspective? So you have three different points of view for every stakeholder that is part of delivering value. And 69 percent of companies who do this, who think from a business-led perspective, have data governance programs in place. So that doesn't look like putting the brakes on your business, does it? They have grown.

And the way that you start to enable and drive culture change around that is: how do you know the data is good? Where do you put in these cues as you bring data in, as you're going to use it, as you distribute it out to your customers? You're looking at social mechanisms that say "I like my data": you see the thumbs up, you see a star that says it's certified, it's managed, and it's governed well. So I know somebody's taking a look at it and has a vested interest in my personal ups and my personal success with the data. And
there are also cues from a governance perspective that say: are we hitting up against any regulations? Are there risks from our data? So it's really thinking holistically about those success factors, metrics, and KPIs, and bringing that back into the experience to drive an appropriate culture around it.

No more reactive; a much more proactive approach.

Exactly. Now, roles and responsibilities. Oftentimes we think of governance as the big bureaucracy at the top: we have our executive councils, there's a lot of conversation, and then you say, what happens? "Well, now I'm going to go fix the data." But really what we're doing is driving a hub-and-spoke culture around this, having centers of gravity where expertise about the data and the value from the data is going to come in, and even looking at that directly at the endpoints of your organization. When you said who's creating the data: they're stewards and custodians, not just creators. They have a vested responsibility in ensuring that the information they're collecting and bringing in not only works for them but will also help drive things and continue to pay it forward to the rest of the organization.

So when we talk about the experience of data and making data work, don't forget your end users. Don't forget the fact that nobody likes to jump through 17 different applications to figure out how to provide customer support when somebody's got a problem with their insurance policy and needs questions answered. And by the way, I've been through that process, which is why I explain it that way. And when you're a data scientist and you want to go in and grab data and manipulate it and prepare it, that's great information and insight for the data teams coming back, where they now have to interpret and deploy those models and rationalize them against policies. So building a mechanism for tribal knowledge to come through is what we talk about when we say a change in culture and bringing data governance to the edge of your
business. Now what happens? You need an awesome delivery team to do that. Who's going to instrument across infrastructure, your data sources, and your endpoints, from applications to devices to automation? We need engineering and DataOps right within our chief data office. They are the masters of delivery, building the services and the pipelines that are going to deliver, taking (just like John Deere did) those data governance requirements and data requirements, mashing them together, and really thinking succinctly about where that drives into the solutions.

The chief data officer, as you saw, is a trend. If you look at the last 18 months, there's a huge uptick, and you're going to see that continuing, because everybody is trying to figure out what their data strategy is and trying to build their organization around it. Each of those chief data officers is coming in with the mindset of, "Okay, what do I do to get more value out of my data?" That's exactly the organization they're trying to establish. I am willing to bet some of you are already in those organizations today.

Exactly. And you need that strategic leader because of the complexity around this, not only in terms of what you want to aspire to with your data but in how you operationalize that, not just from a team and delivery perspective but holistically across your program. Look at all these different processes that are driving toward outcomes, managing both offensive and defensive responsibilities. Today, more and more, especially as GDPR and CCPA have come on and security breaches are at record levels, security is brought into the fold. It doesn't just sit over here anymore. When you're moving into more complex data science and artificial intelligence capabilities, you are now tapping into historical archives of your data, turning cold storage into hot storage. What is the life cycle of the data sitting there?

Do you actually know the lineage of your data? Do you know if you can
actually use that data? Do you have the process to identify all of that?

Exactly. And so it's creating those policies, recognizing what they are, communicating them, and training the organization to follow them. All these different policies revolve around the outcomes and objectives that you're looking at, from a data governance point of view as well as from your business policies and from regulatory and risk management.

Have you seen some instances of good, dynamic policy investments, where companies are really integrating them into their business practices?

I think the oil and gas industry is amazing in the way that they think about this. It's very complicated to draw oil out of the ground. Think about the complexity of their partner ecosystem just to build the drill, put down the well, and then run all of the management and maintenance off of that, handling inventory and parts. They really operationalize the data down to the tasks. So those policies have to balance against things like a trusted network to share information, and ensuring that parts aren't coming from the black market and really are to spec and from whom they've purchased, so they don't have a drill go down and lose a million dollars an hour. So you can really see where business policies and risk policies are all brought together just to bring oil to market.

Or energy. I mean, energy is the same thing, if you're on electric cars.

Exactly. So policies are really coming together in that manner. So how do you deal with the complexity? Oh my goodness. Wow. Tech's got to take care of it somehow, some way. Without giving you too much credit, you kind of need a Google at the center of your environment. Why do I say that? We think of data and how you manage it around this notion of metadata.

Yeah.

How do we describe our information? How do we set policies around that information? How do we see ownership for that data, and where is it located? What's the lineage of
your data assets? The first thing we do is profile our information and capture all of that metadata. What that metadata means for us, though, is the bigger question. So then our governance processes come in, and what do we do? We slap a glossary on top of it so that we put it into business context. Now we've got an awesome data governance program, where data governance and data management are all coming together, and we'll figure out how to connect all of our systems and make our services and pipelines flow, and we should be off to the races. We're all set.

But we're in a world where we have to change the experience, where democratizing data is much more important than it was in the past, where data engineers don't stop at bringing data into a data lake or into another type of repository. They have to think about delivery; they have to think about presentation of information. You need a marketplace. What does that marketplace look like? Catalogs today are starting to become that center of the universe: not just describing the physical, logical, and semantic aspects of the data, not just housing all of those policies, ownerships, and practices, but also becoming the access point, both through services and through direct access to information. So you have (and I hate to say it) a knowledge center again. Guys, don't twitch if you remember those. It's so much smarter now, because machine learning is at the heart of it. All the social cues that we talked about earlier: you can see what people are sharing, what people like, what people are changing, and what people are saying about information. And that's not just on the consumption side; that's your governance teams and your data management and engineering teams as well. So it's really important to think about how you bring together the metadata facet, the business glossary facet, and the marketplace and engagement facet, and build services that are not just
about an ETL job or a streaming job, but that look at the queries that are running against your information and at the Python and R scripts that are coming out of your data science notebooks. All of that information really helps you understand not only what data you have and how it can be used, but where it's providing value.

And this is my English muffin slide. I like talking about the nooks and crannies of where data lives. Don't think that metadata is only in your databases. It's everywhere. Let's go back to the John Deere situation: how do you manage master data? Is that in an MDM environment? It could be a Kafka topic. Now you're distributing all of these definitions and models about your data across every piece of technology. Your catalogs have to be able to profile, understand, interpret, and synchronize that information. It's really critical. This is how you can navigate your fabrics; this is how you have the appropriate oversight. Don't just think about your apps and what is flowing through them. Think about the assets, think about the containers that you're managing, think about all the servers that your data might be running through.

We really have to consider all of that in your system. Get started with it on a smaller basis. Think about your solution; think about the pipeline first. Begin there, so that you're continuously accumulating this information. I don't know if there are any gamers in the room. You know when you've got your game map open, and the section of the map that you're in is lit up, and then as you move around the map starts to open up? Okay, not a lot of gamers here.

No, no, no. Sorry.

But everything becomes illuminated as you move through, and that's what your catalogs should be able to do and help you with: they should start illuminating iteratively through your data ecosystems.

Lessons learned. One size does not fit all. Your data has to fit the
experience and the policies and procedures and culture of your organization. It can drive massive benefits when you tie it to those business outcomes, hopefully driving income. To deliver, there is a certain amount of centralization in the organization that is required. Somebody needs to have purview over the entire landscape. That's really where the chief data officer comes in and drives some of the top-down priorities and objectives for your data, as well as making sure that the bottom-up is orchestrated appropriately. That's a tough job. It's complex from an operations perspective as well as from the data perspective. It requires a different approach to empower the enterprise: that notion of tribal knowledge and tribal contribution. Get that governance and culture out to the edge and bring that back in.

And there will be multiple tools. Don't get locked into "I've got to have one tool that does everything." There are always multiple services that you're going to put together. Think about the flexibility. Try to solve it around the problem that you have in hand. Pull one thread all the way out and see what you can actually do.

Exactly. And measure it based on the IT objectives, the governance objectives, and the business objectives. That ensures you're striking the right balance between the value your data is bringing and protecting you on the back end from risk.

And don't think of governance as compliance only. It's an enabler for what you're going to be doing with those capabilities for your business.

A few things we're doing today at Google. We actually announced just about all of these things, so I can talk about them now. Some of you may have been using our Cloud DLP service. No matter where your data is (you heard about Anthos yesterday), you can use it in your own on-prem or hybrid facilities or in GCP. This is your data loss prevention. This is a fantastic tool for those of you who are
concerned about masking data that should not be seen by others. You can actually train it. You've possibly been using it already.

Today we announced our metadata management service, Cloud Data Catalog. This is a capability that helps you discover your assets' metadata wherever your data might be, and it's a capability you can integrate with other tools outside, perhaps on-prem tools that you're using, through the metadata exchange. These are all there for you to discover. On top of that, you can build policy management systems and access control systems. You have integrated identity and access management, which I am sure many of you have already been using, that helps you create those policies and the types of personas: who can have what type of data access, where, when, and for how long. These are all things that you can do today.

We have a very aggressive roadmap for what we can help you with. There's a lot we can provide around the principle of who can access what data. Many of you have come to us and said, "Hey, I'm dealing with this thing. Here is my PII data. I don't want just anyone in my organization to have access, but I want my analysts to be able to do their work. Just mask that data, because it has PII. I don't want to lose that side of it, but only those people who have access can actually control it." So we're providing those capabilities. There are of course table levels, there are of course row levels, and there are a lot of things in here. There are retention policies that you want to run, or deletion policies that you want to run. Those are all part of the roadmap that we're talking about here. And we will be able to provide you administrative tools that give you knobs so you can provide solutions within your organizations, because you need those to be able to drive your policies within your
organization, around your business, just as is best for you.
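
The PII request described at the end of the talk (analysts keep working with the data, but PII columns are masked unless a persona is allowed to see them) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use the Cloud DLP or IAM APIs, and the names `PII_COLUMNS`, `Persona`, `mask_value`, and `apply_policy` are hypothetical, not anything shown in the session.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical column tags; in practice these would come from a data catalog.
PII_COLUMNS = {"email", "phone"}


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona: a named role plus whether it may see raw PII."""
    name: str
    can_view_pii: bool


def mask_value(value: str) -> str:
    """Replace a value with a stable token so joins and counts still work.

    Assumption: an unsalted hash is enough for illustration. Real masking
    would use a keyed or salted transformation.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "masked_" + digest[:12]


def apply_policy(row: dict, persona: Persona) -> dict:
    """Return a copy of the row, masking PII columns unless the persona may see them."""
    if persona.can_view_pii:
        return dict(row)
    return {
        key: mask_value(str(value)) if key in PII_COLUMNS else value
        for key, value in row.items()
    }


analyst = Persona("analyst", can_view_pii=False)
steward = Persona("data_steward", can_view_pii=True)

row = {"customer_id": 42, "email": "pat@example.com", "region": "EMEA"}
masked = apply_policy(row, analyst)
```

Because the token is deterministic, an analyst can still group or join on the masked column without ever seeing the underlying value, which is roughly the balance the request above is asking for.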
