the stampede to cloud and massive vc investments have led to the emergence of a new generation of object store based data lakes and with them two important trends actually three important trends first a new category that combines data lakes and data warehouses aka the lake house has emerged as a leading contender to be the data platform of the future and this novelty touts the ability to address data engineering data science and data warehouse workloads on a single shared data platform the other major trend we've seen is query engines and broader data fabric virtualization platforms have
embraced next-gen data lakes as platforms for sql-centric business intelligence workloads reducing or you know some would even claim eliminating the need for separate data warehouses pretty bold however cloud data warehouses have added complementary technologies to bridge the gaps with lake houses and the third is many if not most customers that are embracing the so-called data fabric or data mesh architectures they're looking at data lakes as a fundamental component of their strategies and they're trying to evolve them to be more capable hence the interest in lake house but at the same time they
don't want to or can't abandon their data warehouse estates as such we see a battle royale brewing between cloud data warehouses and cloud lake houses is it possible to do it all with one cloud-centric analytical data platform well we're going to find out my name is dave vellante and welcome to the data platforms power panel on the cube our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics data in today's session we'll discuss trends emerging options and the trade-offs of
various approaches and we'll name names joining us today are sanjeev mohan who's the principal at sanjmo tony baer principal at dbinsight and doug henschen who is the vice president and principal analyst at constellation research guys welcome back to the cube great to see you again thank you thank you it's early june and we're gearing up for two major conferences several database conferences but two in particular that we're very interested in snowflake summit and databricks data and ai summit doug let's start off with you and then tony and sanjeev if you could kindly weigh
in you know where did this all start doug the notion of lake house and let's talk about what exactly we mean by lake house go ahead yeah well you nailed it in your intro you know one platform to address bi data science data engineering fewer platforms less cost less complexity very compelling you can credit databricks for coining the term lake house back in 2020 but it's really a much older idea you can go back to cloudera introducing their impala database in 2012 that was a database on top of hadoop
and indeed by the middle of that last decade there were several sql on hadoop products open standards like apache drill and at the same time the database vendors were trying to respond to this interest in machine learning and data science so they were adding sql extensions you know the likes of teradata and vertica were adding sql extensions to support data science but then later in that decade with the shift to cloud and object storage you saw the vendors shift to this whole cloud and object storage idea so
you have in the database camp snowflake introduced snowpark to try to address the data science needs they introduced that in 2020 and last year they announced support for python you also had oracle and sap jump on this lake house idea last year supporting both the lake and warehouse a single vendor not necessarily quite a single platform google very recently also jumped on the bandwagon and then you also mentioned the sql engine camp the dremios the ahanas the starbursts really doing two things a fabric for distributed
access to many data sources but also very firmly planting that idea that you can just have the lake and we'll help you do the bi workloads on that and then of course the data lake camp with the databricks and clouderas providing warehouse style deployments on top of their lake platforms okay thanks doug and i'd be remiss those of you who know me know that i typically write my own intros this time my colleagues fed me a lot of that material so thank you you guys make it easy but tony give us
your thoughts on this intro right well i very much agree with both of you which may not make for the most exciting television in that it has been an evolution just like doug said i mean for instance just to give an example when teradata bought aster data it was initially seen as a hardware platform play now in the end it was all those aster functions that made a lot of big data analytics accessible
to sql and so what i really see is that in a simpler or a functional definition the data lake house is really an attempt by the data lake folks to make the data lake friendlier territory to the sql folks and also friendlier territory to all the data stewards who are basically concerned about the sprawl and the lack of control and governance in the data lake and so it's really kind of a continuation of an ongoing trend that being said there's no action without counteraction and
of course at the other end of the spectrum we also see a lot of the data warehouses starting to add things like in-database machine learning so they're certainly not surrendering without a fight again as doug was mentioning this has been part of a continual blending of platforms that we've seen over the years that we first saw in the hadoop years with sql on hadoop and data warehouses starting to reach out to cloud storage or i should say hdfs and then with the
cloud then going cloud native and therefore trying to break the silos down even further yeah thank you and sanjeev you know data lakes when we first heard about them it was such a compelling name and then we realized all the problems associated with them so pick it up from there what would you add to doug and tony i would say you know these are excellent points that doug and tony have brought to light the concept of lake house was going on to your point dave a long time ago long before the
term was invented for example uber was trying to do a mix of hadoop and vertica because what they really needed were transactional capabilities that hadoop did not have so they weren't calling it the lake house they were using multiple technologies but now they're able to collapse it into a single data store that we call lake house data lakes are excellent at batch processing large volumes of data but they don't have the real-time capabilities such as change data capture or doing certain updates so this is why lake houses have become so important because they give us
these transactional capabilities great okay so i'm interested you know the name is great lake house the concept is powerful but i get concerned that there's a lot of marketing hype behind it so i want to examine that a bit deeper how mature is the concept of lake house are there practical examples that really exist in the real world that are driving business results for practitioners tony maybe you could kick that off well put it this way i think what's interesting is that both data lakes and data warehouses have each
had to extend themselves so it's not like i mean to believe the databricks hype it's that this was just a natural extension of the data lake in point of fact databricks had to go outside its core technology of spark to make the lake house possible and it's a very similar type of thing on the part of the data warehouse folks in that they've had to go beyond sql in the case of databricks there have been
a number of incremental improvements to delta lake basically to make the table format more performant for instance but i think the most dramatic change in all that is in their sql engine and they had to essentially pretty much abandon spark sql because in itself spark sql is essentially a stopgap solution and if they wanted to really address that crowd they had to totally reinvent sql or at least you
know their sql engine and so databricks sql is not spark sql it is not spark it's basically sql that is adapted to run in a spark environment but the underlying engine is c++ it's not scala or anything like that so databricks had to take a major detour outside of its core platform to do this so to answer your question this is not mature even though the idea of blending platforms has been going on for well over a decade
i would say that the current iteration is still fairly immature and in the cloud i could actually see a further evolution of this because if you think through cloud native architecture where you're essentially abstracting compute from data there is no reason why if let's say you're dealing with the same data targets say cloud object storage you might not apportion the tasks to different compute engines and so therefore you could have for
instance let's say you're google you could have bigquery perform the types of sql analytics that would be associated with the data warehouse and you could have bigquery ml that does some in-database machine learning but at the same time for another part of the query which might involve let's say some deep learning just for example you might go out to let's
say the serverless spark service or to dataproc there's no reason why google could not blend all those into a coherent offering that's basically all triggered through microservices and i just gave google as an example you could easily generalize that to all the other cloud or all the other third-party vendors so i think we're still very early in the game in terms of the maturity of data lake houses thanks tony so sanjeev is this all hype what are
your thoughts it's not hype but i completely agree it's not mature yet lake houses still have a lot of work to do so what i'm now starting to see is that the world is dividing into two camps on one hand there are people who don't want to deal with the operational aspects of vast amounts of data they are the ones who are going for bigquery redshift snowflake synapse and so on because they want the platform to handle all the data modeling access control performance enhancements but there's a trade-off if you go with these
platforms then you are giving up on vendor neutrality on the other side are those who have engineering skills they want the independence in other words they don't want vendor lock-in they want to transform their data for any number of use cases especially data science and machine learning use cases what they want is agility via open file formats using any compute engine so why do i say lake houses are not mature well cloud data warehouses are they provide you an excellent user experience that is the main reason why snowflake took off if you have thousands of tables
it takes minutes to get them uploaded into your warehouse and start experimenting table formats resonate far more with the community than file formats but once the cost of the cloud data warehouse goes up then organizations start exploring lake houses but the problem is lake houses still need to do a lot of work on metadata apache hive was a fantastic first attempt at it even today apache hive is still very strong but it's all technical metadata and it has so many different restrictions that's why we see databricks is investing in something called unity
catalog hopefully we'll hear more about unity catalog at the end of the month but there's a second problem i just want to mention and that is lack of standards all these open source vendors they're running what i call ego projects you see on linkedin they're constantly battling with each other but the end user doesn't care the end user wants a problem to be solved they want to use trino dremio spark from emr databricks ahana dask flink athena but the problem is that we don't have common standards
right thanks so doug i worry sometimes i mean i look at the space and we've already debated for years best of breed versus the full suite you see aws with whatever 12 plus different data stores and different apis and primitives you've got oracle putting everything into its database it's actually done some interesting things with mysql heatwave so maybe there are proof points there but snowflake is really good at simplifying the data warehouse databricks is really good at making lake houses actually more functional can one platform do it all
well in a word you can't be best of breed at all things i think that's the upshot and a cogent analysis from sanjeev there the vendors coming out of the database tradition they excel at sql they're extending it into data science but when it comes to unstructured data data science ml ai it's often a compromise the data lake crowd the databricks and such they've struggled to completely displace the data warehouse when it really gets to the tough slas they acknowledge that there's still a role for the warehouse maybe
you can size down the warehouse and offload some of the bi workloads and maybe some of these sql engines are good for ad hoc and minimizing data movement but really when you get to the deep service level requirements the high concurrency the high query workloads you end up creating something that's warehouse-like where do you guys think this market is headed you know what's going to take hold which projects are going to fade away you've got some things in apache projects like hudi and iceberg where do they fit sanjeev do you have any
thoughts on that so thank you dave so i feel that table formats are starting to mature there is a lot of work that's being done we will not have a single product a single platform we'll have a mixture so i see a lot of apache iceberg in the news apache iceberg is really innovating their focus is on a table format but then delta and apache hudi are doing a lot of deep engineering work for example how do you handle high concurrency when there are multiple writes going on do you version your
parquet files or how do you do your upserts basically so different focus at the end of the day the end user will decide what is the right platform but we are going to have multiple formats living with us for a long time doug is iceberg in your view something that's going to address some of those gaps in standards that sanjeev was talking about earlier yeah delta lake hudi iceberg they all address this need for consistency and scalability delta lake is open technically but open for access i don't
hear about delta lake in any world but databricks i'm hearing a lot of buzz about apache iceberg end users want an open performant standard and most recently google embraced iceberg for its recent biglake their stab at supporting both lakes and warehouses on one conjoined platform and tony of course you remember the early days of the sort of big data movement you had mapr was the most closed you had hortonworks the most open you had cloudera in between there was always this kind of you know contest as to
who's the most open does that matter are we gonna see a repeat of that here i think it's spheres of influence and doug very much was kind of referring to this i would call it kind of like the mongodb syndrome which is that you have an and i'm talking about mongodb before they changed their license open source project but very much associated with mongodb which pretty much controlled most of the contributions and made the decisions i think databricks has
the same ironclad hold on delta lake but still the market pretty much associates delta lake as the databricks open source project i mean iceberg is probably further advanced than hudi in terms of mind share and so what i see this breaking down to is essentially the databricks open source versus the everything else open source the community open source so it's a very similar type of breakdown that i see repeating
itself here so mongodb by the way has a conference next week another data platform it's kind of not really relevant to this discussion totally but in a sense it is because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed obviously people are concerned about snowflake's consumption model mongodb is maybe less exposed because atlas is prominent in the portfolio blah blah blah but i wanted to bring up a little bit of controversy that we saw come out of the snowflake
earnings call where the evercore analyst asked frank slootman about discretionary spend and frank basically said look we are not discretionary you know we are deeply operationalized whereas he kind of pooh-poohed the lake house or the data lake etc saying oh yeah data scientists will pull files out and you know play with them that's really not our business do any of you have comments on that help us through that controversy who wants to take that one let's put it this way the sql folks are from venus and the data
scientists are from mars it really comes down to that type of perception the fact is that traditionally analytics was very sql oriented and the quants were kind of off in their corner where they're using sas or where they're using teradata i see there's really a great leveler today which is that python has become arguably one of the most popular programming
languages depending on what month you're looking at the tiobe index and of course sql is as i tell the mongodb folks not going away you have a large skills base out there and so i see this breaking down to essentially each group is going to have its own natural preferences for its home turf and the fact that let's say the python and
scala folks are using databricks does not make them any less operational or mission critical than the sql folks anybody else want to chime in on that one yeah i totally agree with that python support in snowflake is very nascent with all of snowpark all of the things outside of sql they're very much relying on partners to make things possible make data science possible and it's very early days i think the bottom line is what we're going to see is each of these camps is going
to keep working on doing better at the thing that they don't do today or they're new to but they're not going to nail it they're not going to be best of breed on both sides so the sql-centric companies and shops are going to do more data science on their database-centric platform the data science-driven companies might be doing more bi on their lakes with those vendors and the companies that have highly distributed data they're going to add fabrics and maybe offload more of their bi onto those engines like dremio and starburst so i've
asked you this before but i'll ask you sanjeev because snowflake and databricks are such great examples you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the sort of lake territory snowflake has five billion dollars on the balance sheet and i've asked you before i'll ask you again doesn't there have to be a semantic layer between these two worlds does snowflake go out and do m&a and maybe buy an atscale or a datameer or is
that just sort of a band-aid what are your thoughts on that sanjeev well yeah i think the semantic layer is the metadata the business metadata is extremely important at the end of the day the business folks they'd rather go to the business metadata than have to figure out where the data lives for example let's say i want to update somebody's email address and we have a lot of overhead with data residency laws and all that i want my platform to give me the business metadata
so i can write my business logic without having to worry about which database which location so having that semantic layer is extremely important in fact nowadays we are taking it to the next level now we are saying that it's not just a semantic layer it's all my kpis all my calculations so how can i make those calculations independent of the compute engine independent of the bi tool and make them fungible so more disaggregation of the stack but it gives us more best of breed products that the customers have to worry about so
i want to ask you about the stack you know the modern data stack if you will and we always talk about injecting machine intelligence ai into applications making them more data driven but when you look at the application development stack it's separate you know the database tends to be separate from the data and analytics stack do those two worlds have to come together in the modern data world and what does that look like organizationally i think it is so organizationally even technically i think it is starting to
happen microservices architecture was the first attempt to bring the application and the data world together but they are fundamentally different things like for example if an application crashes that's horrible but kubernetes will self-heal and it'll bring the application back up but if a database crashes and corrupts your data we have a huge problem so that's why they have traditionally been two different stacks they are starting to come together especially with dataops for instance versioning of the way we write business logic you know it used to be our business
logic was highly embedded into our database of choice but now we are disaggregating that using github ci cd the whole devops tool chain so data is catching up to the way applications are we also have translytical databases that's a little bit of what the story is with mongodb next week with adding more analytical capabilities but i think companies that talk about that are always careful to couch it as operational analytics not the warehouse level workloads so we're making progress but i think there's always going to be
or there will long be a separate analytical data platform and yeah until data mesh takes over now opening that can of worms well but wait i know it's out of scope here but wouldn't data mesh say hey take your best of breed to doug's earlier point you can't be best of breed at everything wouldn't data mesh advocate data lakes do your data lake thing data warehouse do your data warehouse thing then you're just a node in the mesh now you need separate data stores and you need separate
you know teams but well i think put it this way data mesh itself is a logical view of the world the data mesh is not necessarily on the lake or on the warehouse and for me the fear there is more in terms of the silos of governance that could happen and the siloed views of the world and that's why i want to go back to what sanjeev said which is that
it's going to be raising the importance of the semantic layer now for snowflake that opens a couple of pandora's boxes here which is one does snowflake dare go into that space or do they risk alienating their partner ecosystem which is a key part of their whole appeal which is best of breed they're kind of in the same situation that informatica was in during the early 2000s when informatica briefly flirted with analytic applications and realized that was not a good idea and needed to double down on their core which is data
integration the other thing though that raises the importance of this and this is where the best of breed comes in is the data fabric my contention is that whether you employ data mesh practice or not if you do employ a data mesh you need a data fabric if you deploy a data fabric you don't necessarily need to practice data mesh but data fabric at its core and admittedly it's a category that's still very poorly defined and evolving but
at its core we're talking about a common metadata backplane something that we used to talk about with master data management this would be something that would be more i would say mutable that would be more evolving basically using let's say machine learning so that we don't have to predefine rules or predefine what the world looks like so i think in the long run what this really means is that whichever way we
implement on whichever physical platform we implement we need to all be speaking the same metadata language and i think at the end of the day whether it's a lake warehouse or lake house we need common metadata doug can i come back to something you pointed out about those talking about bringing analytic and transaction databases together you talked about operationalizing those and the caution there educate me on mysql heatwave i was surprised when oracle put so much effort in that and you may or may not be familiar with it but
a lot of folks have talked about that now it's got nowhere in the market no market share but you've seen these benchmarks from oracle how real is that bringing together those two worlds and eliminating etls yeah i have to defer on that one that's my colleague holger mueller he wrote the report on that he's way deep on it and i've not gone okay i just wonder how real that is or if it's just oracle marketing
anybody have it's essentially oracle doing i mean there's kind of a parallel with what google is doing with alloydb it's an operational database that will have some embedded analytics and it's also something which i expect to start seeing with mongodb and i think doug and sanjeev were kind of referring to this before about the operational analytics that are embedded within an operational database the idea here is that the last thing you want
to do with an operational database is slow it down so you're not going to be doing very complex deep learning or anything like that but you might be doing things like classification you might be doing some predictives in other words like you know we've just concluded a transaction with this customer but was it less than what we were expecting what does that mean in terms of is this customer likely to churn i think we're going to be seeing a lot of that and i think that's a lot of
what mysql heatwave is all about as to whether oracle has any presence in the market now it's still a pretty new announcement but the other thing that kind of goes against oracle that they have to battle against is that even though they own mysql and run the open source project in terms of the actual commercial implementation it's associated with everybody else and the popular perception has been that mysql has been kind of like a sidelight for oracle and so it's on oracle's
shoulders to prove that they're damn serious about it yeah there's no coincidence that mariadb was launched the day that oracle acquired sun sanjeev i wonder if we could come back to a topic that we discussed earlier which is this notion of consumption obviously wall street's very concerned about it snowflake dropped prices last week i've always felt like hey the consumption model is the right model i can dial it down when i need to of course the street freaks out what are your thoughts on just pricing the consumption model what's
the right model for companies for customers consumption model is here to stay what i would like to see and i think it is an ideal situation and actually plays into the lake house concept is that i have my data in some open format maybe it's parquet or csv or json or avro and i can bring whatever engine is the best engine for my workloads bring it on pay for consumption and then shut it down and by the way that could be cloudera you know we don't talk about cloudera very much but you
know it could be one business unit wants to use athena another business unit wants to use some other you know trino let's say dremio so every business unit is working on the same data set see that's critical but that data set is maybe in their vpc and they bring any compute engine you pay for the use shut it down then you're getting value and you're only paying for consumption it's not like you know i left a cluster running by mistake so there have to
be guardrails the reason finops is so big is because it's very easy for me to run a cartesian join in the cloud and get a ten thousand dollar bill well snowflake has been sort of a victim of its own success in some ways they made it so easy to spin up single node instances multi-node instances and back in the day when compute was scarce and costly those database engines optimized every last bit so they could get as much workload as possible out of every instance today it's
really easy to spin up a new node a new multi-node cluster so that freedom has meant many more nodes that aren't necessarily getting that utilization so snowflake has been doing a lot to add reporting monitoring dashboards around the utilization of all the nodes and multi-node instances that have been spun up and meanwhile we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that freedom and i think they're going to have that same discovery that the cost surprises are going to follow as they make it easy to spin
up new instances yeah a lot of money went into this market over the last decade separating compute from storage moving to the cloud i'm glad you mentioned cloudera sanjeev because they got it all started you know that kind of big data movement we don't talk about them that much sometimes i wonder if it's because when they merged hortonworks and cloudera they dead-ended both platforms but then they did invest in a more modern platform but what's the future of cloudera what are you seeing out there cloudera has a
has a good product i i have to say the problem in our space is that there are way too many companies there's way too much noise we are expecting the end users to parse it out uh you know or we're expecting analyst firms to uh boil it down so so the i think marketing becomes a big problem uh as far as technology is concerned i think cloudera did turn themselves around and uh tony i know you you talk to them quite frequently i i think they have a quite a comprehensive uh offering uh for a
long time actually you know they've created kudu so they got operational they have hadoop they have uh an operational data warehouse they migrated to the cloud they are in hybrid multi-cloud uh environment a lot of uh cloud data warehouses are not hybrid they're only in the cloud right right i think i think what cloudera has done the most has been most successful has been in the transition to the cloud um and the fact that they're giving their customers more on-ramps to it more hybrid on-ramps so i give them a lot of
credit there they've also been trying to position themselves as being the most price friendly in terms of that we will put more you know guardrails and governors on it i mean part of that could be part of that could be spin but on the other hand they don't have the same uh vested interest in compute cycles as say aws would have you know you know with emr um that being said yes you know cloudera you know i think its most powerful appeal is it almost sounds in a way um i don't want
to you know you know cast them as a legacy system but the fact is they do have a huge landed legacy on prem and a huge and and still significant potential to land and expand that you know to you know to the cloud um that being said you know even though cloudera is multifunction there are i think it certainly has its strengths and weaknesses and the fact is that yes cloudera has you know an operational database uh or you know an operational data store with it kind of like the outgrowth of
hbase but cloudera is still you know primarily known for the deep analytics you know you know the the operational database nobody's going to buy cloudera or cloudera data platform strictly for the operational database they may use it as an as an add-on um just in the same way that a lot of you know customers have used let's say teradata to basically to to do you know to do some machine learning um or you know or any of the uh or in the other basic or let's say you know snowflake
to you know to parse through json so again it's not an indictment or anything like that but the fact is obviously they do have their strengths and their weaknesses i think their greatest opportunity is with their existing base because that base has a lot invested and vested and the fact is they do have a hybrid path that a lot of the others lack yeah and and of course uh being on the quarterly shot clock was not a good place to be under the microscope for cloudera and now they at least
can you know refactor the business accordingly i'm glad you mentioned hybrid too we saw we saw snowflake last month did a deal with dell whereby non-native snowflake data could access on-prem uh object store from dell they announced a similar thing with pure storage what do you guys make of that is that just is that how significant will that be will customers actually do that i think they're using either materialized views or or external tables there are data residency requirements there are there are desires to uh to have uh you know these platforms in
uh your own data center and uh finally they capitulated i mean frank slootman is uh famous for saying to be very focused and uh earlier not many months ago they called the uh going on-prem uh as as a distraction but clearly there's enough demand and certainly government uh uh contracts uh any uh any company that has data residency requirements uh it's a real need so they finally addressed it yeah i'll bet i'll bet dollars to donuts there was an ebc session and some big customer said if you don't do this we ain't doing
business with you and that was like okay we'll do it you know so david i have to say you know earlier on you had brought this point how frank slootman was poo-pooing data science workloads on your show about a year or so ago he said we are never going to go on-prem we burnt that bridge that was on your show i think i just i remember i was i remember exactly the statement because it was interesting he said we're never going to do the halfway house and i think what he meant is we're not going
to bring the snowflake architecture to to run on-prem because it defeats the the elasticity of the cloud so this was kind of a capitulation in a way but i think it still preserves his original intent sort of i don't know the point the point here is that every vendor will poo-poo you know whatever they don't have uh until they do have it yeah yes and then it'll be like oh we are all in you know we've always been doing this we have always supported this and this is and now we are doing it better than
others it was the same type of shock wave that we got that we felt basically uh when when aws at the last moment at one of their re:invents said oh by the way we're going to introduce outposts and you know the analyst group is typically pre-briefed about a week or two ahead under nda and that was not part of it and when and when they dropped they just casually dropped that in the analyst session it's like you could have you could have heard the sound of lots of analysts changing their diapers at that point i remember
that and the props to andy jassy who who once many times actually told us never say never when it comes to aws so guys look we got to i know we got to run we got some hard stops maybe you could each give us your final thoughts doug start us off and then sure well you know we've got uh the snowflake summit coming up i'll be looking for uh customers that are really doing data science that are really employing python through snowflake through snowpark and then a couple weeks later we've got databricks with
their data and ai summit in san francisco i'll be looking for customers that are really doing considerable bi workloads last year i did a market overview of this analytical data platform space 14 vendors eight of them uh claimed to support uh lake house both sides of the camp uh databricks uh customer had uh 32 their their top customer that they could cite was unnamed it had 32 concurrent users doing 15,000 uh queries per hour that's good but it's not up to the most demanding bi sql workloads and uh they acknowledged that and said they
need to keep working on that snowflake asked for their biggest data science customer they cited kabula 400 terabytes 8,500 users 400,000 data engineering jobs per day i took the data engineering job to be probably sql centric etl style transformation work so i want to see the real use of the python how how much snowflake has grown as snowpark has grown as a way to support data science great tony yeah actually of all things and certainly you know i'll still be looking for similar things to what doug is saying but i think sort
of like kind of kind of out of left field i'm interested to see what mongodb is going to start to say about operational analytics because i mean they're basically you know they're into this conquer the world strategy we can be all things to all people okay if that's the case what's you know what's going to be you know you know what's going to be the case with you know basically you know putting in some inline analytics what are you going to be doing with your query engine so that's actually kind of an interesting and it's
going to be the thing i'm going to be looking for next week great sanjeev yeah thank you so much so i i'll be at mongodb world snowflake and databricks and uh very interested in seeing since tony brought up mongodb i see that even the databases are shifting tremendously they are you know addressing both uh the htap use case uh online uh transactional and analytical i'm also seeing that these databases started in let's say in case of mysql heatwave as as relational or in mongodb as document but now they've added graph they've added time series
they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional so yeah gets back to our discussion of best of breed you know versus all-in-one and and you know it's likely mongo's path or part of their strategy of course is through developers they're very developer focused so we'll be looking for that and guys you know i'll be there as well i'm hoping that we maybe have some extra time on the cube so please stop by and uh we can maybe chat a little bit guys as always
fantastic thank you so much doug tony sanjeev and let's do this again it's been a pleasure all right and thank you for watching this is dave vellante for the cube and the excellent analysts we'll see you next time