All right, let's get started. First of all, this is a blanket statement for all the presentations: if you claim that I told you something, I will deny it. That's what it says: product and product features can change, and we're not making any commitments here. I'm Lon bit, the Unity Catalog product specialist. I help customers like you migrate to and adopt Unity Catalog, and I have been doing that since the inception of Unity Catalog. A little bit of history: we GA'd Unity Catalog exactly two years ago at Data + AI Summit. It started as a data governance solution, primarily around securing data, catalogs, and tables, and it was an optional component for Databricks, a kind of add-on for governance. Fast forward to today: Unity Catalog is enabled by default. If you're a new customer, you have to go out of your way to not use Unity Catalog. It was extended to machine learning models, feature stores, and others, as well as general-purpose files using volumes, and it is used by nearly 100%
of the customers; there are very few customers that haven't enabled Unity Catalog at all. The future state is, as I said, that Unity Catalog will be the default for any new features and functionality Databricks offers you, and we expect 100% of you to use it for 100% of the use cases. This is just a quick preview of everything that is in the pipeline, either announced at this conference or coming in the coming weeks, that will be exclusive to Unity Catalog. Roughly, you can divide these features into data governance features; machine learning governance features, where everything that has to do with LLMs and vector search will be exclusive to Unity Catalog; and monitoring and intelligence, so Lakehouse IQ and all the features that feed back to you the data we collect from your data access are exclusive to Unity Catalog. This is to say that all of you really have to want to get on Unity Catalog. How many of you are in the process of adopting Unity Catalog? Raise your hands. How many of you are using
UCX? OK, by the time you're back in the office on Monday, I want it to be 100%, so hopefully we can get there. I think we proved that right here: Unity Catalog challenges are a real thing. Upgrades are complicated. This is a chart of customer adoption, divided by industry; we removed the numbers and the labels, but the story it tells is that adoption is going very gradually and fairly slowly. Upgrades are very complicated. There is perceived and actual risk; by that I mean that you have no way to tell your management what's going to break by adopting Unity Catalog. It requires a lot of planning, it's a multi-team effort, and finally it's expansive and expensive. What we're seeing for large customers is that it takes up to 18 months, and this is what we are trying to address today. When you do a migration, this is kind of the blueprint, and it's very rough. You start by assessing: you create an inventory of all the assets you're going to migrate. You start by
migrating your workspace groups to account groups. You attach the metastore to enable the workspace with Unity Catalog. You migrate your external tables, which are the easier ones to migrate. You migrate your SQL warehouses to access these external tables. Then you start migrating the jobs, the notebooks, all your assets; you migrate managed tables; and you finish by migrating all the remaining code. So this is how we would do it today. UCX is a Unity Catalog upgrade framework; you can scan this slide here. UCX is part of the Databricks Labs project. I have here with me Serge, who is the technical lead for Databricks Labs, so if you have any questions you can reach out to him. Databricks Labs is our effort to expand the product with resources outside of our core development team. There are daily PRs, so UCX is updated on a daily basis: if you file a bug or request a feature, you may see it addressed overnight. We are attempting to be on a weekly
release schedule, so new features are introduced weekly. This is the GitHub repository; it comes up from the link. The functionality we're trying to address, and we will dive into each item individually, is: assessment, which assesses your current resources; group migration; cloud infrastructure, where we automate the creation of all the cloud infrastructure elements for you; table migration; and finally code migration. So these are the tasks, and this is the blueprint of how you would work with UCX. UCX is accessed through the Databricks CLI. You start by installing it onto a workspace; currently you install it onto one workspace, though we will allow you to install onto more than one workspace at once. You run an assessment workflow, which we will touch on; the assessment workflow essentially assesses your current state and what will have to be upgraded. The next thing you do is group migration: you migrate all your workspace groups into account groups, again automated by UCX. Next is table migration, then code migration. Code migration touches notebooks, touches jobs, and does all that, and
we are attempting to automate all that for you. We create an inventory on your workspace that keeps all the elements, specifically from the assessment, and we provide you with a bunch of dashboards to monitor the migration process. This is a case study of a real customer, a big manufacturer in EMEA. They have 400 workspaces; the typical workspace has around 350 databases with 100 tables per database, just to give you some scale, and millions and millions of workspace objects. Why do you care that you have so many millions of workspace objects? Because these workspace objects will have to be migrated, or at the very least access to them will have to be reassigned to a group. So, needless to say, it's a monumental task to do manually. They have now completed all their group migrations, thousands of groups with tens of thousands of users, and they are now migrating workspace by workspace. We expect them to be done in three months. Without UCX the timeline would have been forever; I can't think of how long it would take before they would be done.
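To put the case-study scale in numbers, here is a back-of-the-envelope sketch. It assumes every one of the 400 workspaces looks like the "typical" workspace described in the talk, which is a simplification, and the variable names are purely illustrative.

```python
# Figures quoted in the case study.
workspaces = 400
databases_per_workspace = 350  # the "typical" workspace
tables_per_database = 100

# Assumption: every workspace matches the typical one.
tables_per_workspace = databases_per_workspace * tables_per_database
total_tables = workspaces * tables_per_workspace

print(f"{tables_per_workspace:,} tables per workspace")
print(f"{total_tables:,} tables across all workspaces")
```

Under that assumption the estate holds about 35,000 tables per workspace and roughly 14 million tables overall, before counting any other workspace objects.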
We will dive now into the demo, and I will show you all the components we talked about. Before we go into the demo, maybe I will take a question real quick. Anybody? All right, let's dive in. The first thing you do is install it. Here we see a workspace that is not enabled with Unity Catalog; this is our sandbox workspace. The workspace has a bunch of tables that we will migrate. I'm just showing you
that one of these tables is an external table, and obviously it points to an S3 bucket. We will use the CLI to install, so a lot of the operations run from the CLI. The CLI is running on a machine, not on the cluster; you can run it on the cluster itself or on the workspace itself, but here I'm running it from my laptop. So the first thing I'm going to do is validate that I have a connection to the workspace and that I'm authenticated with the workspace, and then I can get started. The next thing I will do is trigger databricks labs install ucx. I'm specifying a profile, but if your default profile points to your workspace, you're all set and don't have to specify a profile. I run the command; it will download UCX and its dependencies and deploy it onto the workspace. You have to specify the name of the database that will be created. By the way, we have a new option: by default you would need the
cluster to be able to access the internet to download packages, but we now have an option where this is not required: all the downloads happen from the client, from the CLI, and the workspace doesn't have to have access to the internet. Once it's installed, the first thing we will do is the assessment. You run the assessment by running a workflow; you can trigger it with a CLI command, or the installation will ask you if you want to run the assessment. We will run the workflow now. This workflow essentially scans all your assets and reports on them. It can take anywhere from minutes to hours to even more than a day if you have a lot of assets, in particular a lot of tables in your HMS, and we will wait for it to be done. OK, you had a question: can I zoom in? Not really, but all
these commands are in the documentation, so you don't have to take them from here; I'm just showing you. So the next thing I will do, once the assessment workflow has completed, is examine the dashboards. As you can see, a few dashboards have been created by the workflow; we're trying to estimate cost and other things, but this is the default dashboard. It shows you your readiness, and it shows you how many tables, views, and storage locations were identified. We're concentrating on table migration in this demonstration, so you don't see any of the other assets. It gives you an assessment summary, it gives you counts, it gives you the lay of the land. The next thing we will do is assign a metastore; again, it can be done from the UI or by a CLI command. Here we go to the account console, and I show you that this particular workspace is not assigned to a metastore. You can obviously do it manually here
or I will show you the command. The command is assign-metastore. I'm specifying the workspace ID (yes, it is too small, sorry), and if you have more than one metastore you also specify the metastore ID. You click the button and it does what it should do to assign the metastore: you have to select an account, and it will assign the metastore to the workspace. Then I will go back to the workspace and show you that this workspace was enabled with UC. First we go to the account console, where you can see that a metastore was specified, so I know that this workspace is set with a metastore. Then I go to the workspace, and I can see that it has the system catalog, which means that the workspace is using UC. We're going to skip group migration, but I will tell you in two words what happens here. Group migration
takes all the workspace groups and copies them to the account. It then scans all the objects within the workspace and repoints them to the groups that were created at the account level. This is a tedious task to do manually but very simple to do using group migration; there are not a lot of decisions and it's very deterministic, so it's fairly simple to do with UCX. The next thing I will show you is table migration. The first thing you have to do is create a table mapping. Table mapping essentially takes each one of the HMS tables that you have and creates a pointer, or a target, to where it will go in Unity Catalog. By default all the tables are mapped to a single catalog, and the name of the table and the name of the database remain the same. You can then edit it: you can edit it with the built-in editor, which is not really recommended because it doesn't have any checks or validation, or you can
export it to Excel, but when you import it back just make sure to convert it back to CSV. I'm showing you here that I updated it: I changed all the catalog names to business-meaningful names, and I changed a couple of the target databases, so not all the tables are going into the same place. The next thing that will happen is that we have to create a bunch of cloud assets, not for the migration but for UC to operate. We will demonstrate on AWS, but all the commands, other than one that I will point out, are available for Azure as well. So we will dive right in. The first thing I will do is install the AWS client and validate that I can make a connection; I run the SSO to make sure I can make the connection to the AWS account. The first thing that we do in AWS is run the principal-prefix-access command. Essentially we scan all the roles and all the instance profiles, and we create
a list of the instance profiles and the S3 prefixes that they have access to, and we do it for all the locations we identified that have tables in them. So we're showing the command run: it scans all the roles and generates a couple of CSVs. One is the UC roles access, which covers the UC-enabled roles, and one is the AWS instance profile access; each essentially gives you the role and the S3 prefix it has access to, and we can continue from there. The next thing we do in AWS is create missing principals. The command parameters allow you to override the default role name and policy name. It prompts the user: it suggests the creation of one role with access to these two locations. Then, when it creates the role, it creates it properly, and that's not an easy task: it creates all the trust relationships, as you will see. I will show you the role that was created. The role that is created is created
with the proper trust relationship and with access to the locations. Here's the policy: you see the access to these two locations and all the proper actions (I don't know if you can see it from afar), and we are creating the proper trust relationship to the account itself, for self-assuming, and to the Databricks UC account. That step was AWS only, but the next one works on Azure as well. Migrate credentials creates the credentials you need in Databricks. It just goes ahead and creates a credential. So you see we created the credential for this one role; we check the configuration, and it works, luckily. The other thing it does in AWS is go back to the role and set the trust relationship to include this role ID with this external ID, which is the credential ID. This is a step that is very confusing and prone to mistakes if an administrator has to do it by hand. The next thing we do is
we run migrate-locations. It creates the external locations needed for Unity Catalog; we see that two external locations were created, and they point to the S3 locations. The next thing is create-catalogs-schemas, which creates all the catalogs and schemas that are required for this migration. For each schema you can specify a location, which is the location for the managed tables, or you can default to the metastore location, which is not recommended and not best practice; we do expect you to provide a managed location for each one of these schemas. So here I'm specifying a managed location for each of the schemas, and the two schemas are created. I will show you that they were created: one is called client info and another is called lookup. Finally, before we can do the upgrade, we create an uber principal. An uber principal is a service principal, or an instance profile in the case of AWS, that has access to all the locations. The reason we need it
is that the cluster that does the migration needs access to the legacy data as well as to the UC data, so we call it the uber principal. We create this principal and assign it in the cluster policy of the cluster that will perform the migration, and now we are ready for table migration. The table migration is somewhat anticlimactic: we run migrate-tables, which runs the workflow, or we can run the workflow itself, and now we are migrating. The built-in table migration handles multiple types of tables: it migrates DBFS root tables by deep cloning them; it migrates DBFS non-Delta tables by doing a CREATE TABLE AS to Delta; and, as you see in these steps, each one handling a different type of table, we migrate all the external tables by syncing them. We have other options, and you can augment it to add other options as well. When the migration is done, there's a dashboard that shows you the migration state
of each individual table. We will take a look at a migrated table: you will see in the table properties that it is flagged with "upgraded from", a pointer to the Hive metastore table it was upgraded from, and the source is flagged with "upgraded to", so that's how we bind the two together. We will take a look at one of the views; we migrate views as well. What I will show you is that this view points to a table, obviously, so with the view we change the table reference to point from the original table to the migrated table. We can do it with view-to-view relationships and nested relationships; we're pretty smart about it. This is a hint of what's coming: it uses the same mechanism that will then be used to upgrade your notebooks, your SQL, and so on. Finally, and I think this is the highlight, code migration. Code migration will eventually migrate your legacy code to UC-compatible code. As the first step we're doing what we call linting, meaning highlighting all the lines of code in your legacy code, or your current workspace code, that have to be touched for a proper migration to UC. Initially we start with migrating local code, so you will have to use Git or something similar to get all the code locally. Then I read it inside my editor, and it shows me, for each table, which line of code is not compatible with UC and why, and you can go in and fix it. In future iterations we will try to fix it ourselves.
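The linting idea can be pictured with a minimal sketch. This is not how UCX implements it; it is a plain regular-expression scan written for illustration, and the mapping contents, the file name `etl.py`, and the function name `lint_source` are all hypothetical. It only handles plain two-part `schema.table` names and reports the file, the line number, the legacy name, and the Unity Catalog target suggested by the table mapping.

```python
import re

# Hypothetical table mapping: legacy Hive metastore name (schema.table)
# to Unity Catalog target (catalog.schema.table). All values are made up.
TABLE_MAPPING = {
    "sales.orders": "business.sales.orders",
    "ref.lookup": "business.reference.lookup",
}


def lint_source(path, source, mapping=TABLE_MAPPING):
    """Return (path, line_number, legacy_name, suggested_target) findings."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for legacy, target in mapping.items():
            # Skip matches embedded in a longer dotted name, for example
            # an already migrated business.sales.orders reference.
            pattern = r"(?<![\w.])" + re.escape(legacy) + r"(?![\w.])"
            if re.search(pattern, line):
                findings.append((path, line_number, legacy, target))
    return findings


# Illustrative input: two legacy references and one migrated reference.
example = """df = spark.table("sales.orders")
lookup = spark.sql("SELECT * FROM ref.lookup")
done = spark.table("business.sales.orders")
"""

findings = lint_source("etl.py", example)
for finding in findings:
    print(finding)
```

The same lookup, replacing the legacy name with the mapped target instead of only reporting it, is one way to picture the view rewrite described earlier; a real tool would parse the SQL or Python rather than rely on regular expressions.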
A few resources before we open for questions. Serge is running a session, Automating Unity Catalog Migration with UCX: Building Robust Python Applications, which will take you behind the scenes of how this application was developed. One reason you may be interested in this talk is if you want to develop an application in-house and borrow any of the techniques we use in UCX. UCX is open source; you can download it and use a lot of its elements, such as the testing elements, in your own application. Another reason is if you have a case that is not supported by UCX and you want to extend UCX to support it; you can extend UCX using your own code. A few other resources: this is where you can get UCX. You can go to the Databricks Labs GitHub and go to UCX, or you can search the documentation; it is included in the documentation. A few other talks you may want to attend if you're interested in UC: UC will be big in the keynotes, so make sure to catch that. We are introducing attribute-based access control, so there will be a talk about that. There's a talk about federation and managed tables, What's New in UC is another talk, and we have a lot of customer testimonials if you want to see how other customers are using UC. Thank you all for coming. I will stay here a little longer; you can reach out to me and ask any questions. Thank you very much.