VIJAY ASKI: Alright. We'll get started. Good morning, everybody.
Thank you for making it bright and early. Just a quick show of hands, how many of you are using GenAI in your organization? Okay, that's almost all of you.
How many of you guys feel you're getting the maximum ROI you believe you can get out of AI? That's not many hands. How many of you feel that you're unsure if you're getting all the value?
I guess that answers the question for the rest of you. Hopefully all of you will have something to take away from this talk. So, we'll get right in.
How do you customize AI to meet your organization's needs? We'll go over the different customization methods we have, when to use each technique, what's involved in each, and some real-life examples from three specific domains.
We have a couple friends coming in from Bayer and Sight Machine, so you'll hear from them as well. So, what is model customization? It's a way to maximize the performance of your model for a specific use case.
Multiple techniques can be used independently or in conjunction with each other. So, the first thing that we've got to do is be clear on what the problem is that we are trying to solve. What is it you're trying to optimize for?
Are we optimizing for cost, performance, domain adaptation, task adaptation? Do we understand the data and the scenario that we are trying to address? And then do we know the data set and the metrics that we are using to measure the model's performance?
That clarity gets us on the right path to do what we need when we adapt the models. So, in terms of the techniques, prompt engineering is the most well-known technique, where you're basically enhancing your output token quality with better input prompts. Some examples of doing that are zero-shot and few-shot prompting.
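As a quick illustration, a few-shot prompt is just worked examples prepended to the query. This is a minimal sketch using a chat-style message list (the common shape for chat-completion APIs); the classification task, labels, and examples are invented:

```python
# Hypothetical few-shot sketch: the labels and examples below are
# invented for illustration, not from any real dataset.
few_shot_examples = [
    ("The invoice total does not match the PO.", "billing"),
    ("I can't reset my password.", "account"),
]

def build_few_shot_messages(query: str) -> list[dict]:
    """Assemble a chat-style prompt: a system instruction, then one
    user/assistant pair per worked example, then the real query."""
    messages = [{"role": "system",
                 "content": "Classify the support ticket as 'billing' or 'account'."}]
    for text, label in few_shot_examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

msgs = build_few_shot_messages("My card was charged twice.")
```

The model sees the worked examples as prior turns and imitates the pattern, which is all few-shot prompting is.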
And then RAG, where you ingest external data to ground the answers that the GenAI model gives. Next up is fine-tuning, where you, kind of, create a new model. That new model is customized to your task and the data that you have at hand.
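A toy sketch of that RAG pattern: retrieve the most relevant document, then prepend it so the model answers from that context. A real system would use an embedding model and a vector index; here a word-overlap score stands in for retrieval, and the documents are invented:

```python
# Toy RAG sketch. The documents and their content are hypothetical;
# a word-overlap score stands in for a real embedding retriever.
DOCS = [
    "Delaro is a fungicide for corn and soybeans.",
    "Store the product away from direct sunlight.",
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_grounded_prompt(query: str) -> str:
    """Retrieve the best-matching document and prepend it so the
    model answers from that context rather than from memory."""
    best = max(DOCS, key=lambda d: score(query, d))
    return f"Answer using only this context:\n{best}\n\nQuestion: {query}"

prompt = build_grounded_prompt("What crops is Delaro used for?")
```

Swap the scorer for embedding similarity and the list for a vector store, and the shape of the pipeline stays the same.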
A more advanced technique is RLHF, where you, kind of, are doing fine-tuning but you're including human feedback via a reward model that helps guide the process. Distillation is a very interesting technique that we all should know. Distillation is a way of teaching a smaller model the tricks that the bigger model might know, but at a fraction of the cost.
And then custom model is where you're actually customizing the full weights of the model, not just the delta LoRA weights, where you're able to go back with a much bigger amount of data and much better control over your instruction tuning, your RLHF, your safety layers, and so on and so forth. It's a very powerful technique. You may not need it all the time, but there are exceptional cases where it might be done.
So, in terms of the choice of techniques, the way to think about it is, going back to understanding the scenario and the data that we have: if we are trying to bring in more context to the model, then you go towards RAG and prompt tuning. But if you want to change your model's behavior, then you start thinking about fine-tuning. And the more advanced techniques, as we talked about, are RLHF and custom models, where you're actually changing how the model behaves and responds, like the instruction following and so on, so forth.
But more often than not, you end up using more than one technique. For example, if you're doing a Q&A system where you actually have a lookup, then RAG is a good technique to use. But say you're doing legal documentation, where every county, country, or region might have different legal documents and different rules, and you have to pick those up and manage that. That's not easily done with RAG alone; that's when you do fine-tuning. And say you want to build something like a travel agent, where you need both data that is current (if you don't have the latest flight availability, you can't book a flight) and good instruction following as well.
So, you use a combination of those techniques. Once you know the choice of techniques, the next important thing is the choice of model, starting with embedding models. Embedding models are basically much smaller models that transform high-dimensional, real-world data into a very dense numerical vector that your model can use.
That vector basically retains the semantic information and can help you do better search, better ranking, and better similarity comparisons. And then obviously we all know the large models. The large models keep getting better, faster.
So, every six months or so, seemingly they are doubling in their capacity. They keep getting bigger because the scaling laws apply, and as the models get increasingly larger they get better in capabilities. But at the same time, the small models are really picking up, like the Phi models that we have, and they are more cost-efficient, performant, and low latency.
There are scenarios where you cannot use the bigger models, but you still have smaller models that are just as good, or almost as good. And then real-world examples are not generally just text.
You have audio, you have video, you have voice. You want to be able to deal with those, so you have multimodal models. And then more recently, announced last month, we have o1, a very, very powerful reasoning model that we have shipped.
This basically has, like, PhD-level thinking almost. For example, it's doing incredibly well on all the reasoning tasks where even GPT-4o before was not as good. But the thing is, there might be some more latency, because this model can think before it responds. So, it's not the best for low-latency scenarios, but it's extremely good for reasoning scenarios.
And then you have your domain-specific models, which are customized for particular domains. If those are available for your domain, then you want to use them. Because no matter how good the models get, there is value in domain expertise and in the data that you have in your domain that the base model may not know.
So, those are the domain-specific models. If you want to find all these models, you come to the Azure model catalog and you find all of these things. It's probably the only place where you have all these categories: the big models, small models, the domain-specific, the reasoning-specific.
All of those things are available, and it's pretty safe to use in the sense that we have given safety absolute consideration, both in terms of privacy and all the protection that you might want for your AI usage. So, let's get to fine-tuning, which is what we'll spend a little bit more time on compared to the other techniques that we talked about. Fine-tuning is a continuation of a model's training process using domain-specific data to optimize the model for specific tasks.
It enables you to achieve domain adaptation, task adaptation, or both. Unlike prompting or RAG, fine-tuning creates a brand new model. And most importantly, and this might not be very intuitive, people might feel that fine-tuning is an extra step, they have to get the data in and then train the model, but it can pay off significantly if you have a high volume of inferencing with large input and output tokens.
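To make that cost point concrete, here is a back-of-envelope sketch. All prices, token counts, and request volumes below are made-up placeholders, not real Azure OpenAI pricing; the point is only that shrinking the prompt can outweigh a higher per-token rate:

```python
# Back-of-envelope cost comparison; every number here is an
# illustrative placeholder, not real pricing.
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Cost = requests * (input + output tokens) at per-1K-token prices."""
    return requests * (in_tokens / 1000 * in_price + out_tokens / 1000 * out_price)

# Base model: a long few-shot prompt (3,000 input tokens) on every call.
base = monthly_cost(1_000_000, 3_000, 300, in_price=0.01, out_price=0.03)

# Fine-tuned model: the behavior is baked in, so the prompt shrinks to
# 300 tokens, even though per-token prices are assumed 20% higher.
tuned = monthly_cost(1_000_000, 300, 300, in_price=0.012, out_price=0.036)
```

With these placeholder numbers, the fine-tuned model comes out well under half the base model's monthly cost, despite the higher per-token rate.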
You don't have to send that large volume of input and output tokens, because the model is already adapted to the task or domain. So, your inferencing cost is significantly reduced on a fine-tuned model, which it may not be on a base model that you have to prompt excessively with all the different information. So, in terms of scenarios, we talked about it.
Domain adaptation, task adaptation, and some less-known factors. Some of you might be dealing with languages that are not as common, whose vocabularies are not as well known; those are good ones to fine-tune for. And let's say you want a model that is well-tuned for your brand, specifically for handling your customers, or specifically for generating images; that's a good way of using it.
And then, if you want to just improve accuracy by bringing in a little bit more data, you can actually improve the quality of the model that you have. And as we go deeper into the topic, there are various techniques; we've just covered some of them.
For example, continuous fine-tuning is where you take an already fine-tuned model and you continually fine-tune it. So, it's not a one-time process. If your data is incrementally being added to, and there are more things that can be picked up, then you basically do continuous fine-tuning.
Function calling is very powerful. We'll talk about that. Function calling basically enables the models to interact with external tools and APIs.
That enables you to have much deeper integration in your application, especially if you're doing some agentic workflows. It really helps, because your model now knows how to interact with external tools and APIs. And if you combine that with a feature called structured outputs, then you can have exact control over the parameters that the model generates, which the APIs and the tools might need.
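As an illustration, a tool definition in the JSON-schema style used by OpenAI-compatible chat APIs might look like the sketch below. The flight-search function and all of its fields are hypothetical; the travel-agent example earlier in the talk is just the excuse:

```python
# Hypothetical tool definition in the JSON-schema style used by
# OpenAI-compatible chat APIs. The function itself is invented.
flight_search_tool = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Find flights between two airports on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "IATA code, e.g. SEA"},
                "destination": {"type": "string", "description": "IATA code"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["origin", "destination", "date"],
            # Structured-outputs style: the model may not invent extra keys.
            "additionalProperties": False,
        },
    },
}
```

The model is shown this schema and emits arguments matching it; structured outputs is what lets you insist the arguments conform exactly, so the call into your real API never gets malformed parameters.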
So, combining these things, fine-tuning with function calling, can get you pretty good accuracy and behavior in how you use your models for agentic systems and calling external applications. Here's a good example from a paper that we wrote and published, where we used BLEU as a metric to evaluate a fine-tuning process.
We talk about this extensively in the paper. Here we used the metric to figure out what the right model to choose is, the model trade-off, big or small; error analysis, okay, where is the model failing; what part of the data do we go back and enhance; what hyper-parameters do we have to change; the data size, etc.
So, all of these things we did using evaluation techniques. It's very important when you're fine-tuning a model that you pay attention to how much time you spend on evaluation, knowing what data set you evaluate on and then basically how you can (inaudible) your models. Similarly, if you want similarity checks for your model, or string checks to make sure the right string is there or not, or that the sentiment is aligned, you can use multiple of these criteria to fine-tune and evaluate your model along the way.
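A minimal sketch of that string-check style of evaluation, with an invented two-item test set and a stand-in function in place of a real model call:

```python
# Minimal evaluation sketch: contains-string checks over a
# hypothetical labeled test set. fake_model stands in for a real
# model call so the sketch is self-contained.
test_set = [
    {"question": "Capital of France?", "expected": "Paris"},
    {"question": "2 + 2?", "expected": "4"},
]

def fake_model(question: str) -> str:
    """Stand-in for an actual model endpoint."""
    canned = {"Capital of France?": "The capital is Paris.", "2 + 2?": "4"}
    return canned[question]

def evaluate(model) -> float:
    """Fraction of answers that contain the expected string."""
    hits = sum(case["expected"] in model(case["question"]) for case in test_set)
    return hits / len(test_set)

accuracy = evaluate(fake_model)
```

In practice you would mix check types per the talk: exact match for short answers, contains-string for longer ones, and embedding similarity where wording can vary.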
So, we talked about distillation in the beginning. Distillation, as we said, is a way of training a smaller student model to try to mimic and learn from a general-purpose teacher model, which is much bigger. The obvious advantage is cost, and we have made it easier now to do this on Azure OpenAI, and on open-source models as well: you log your GPT-4o completion data, you look at the data, you evaluate it, you filter it, and you export it to either the fine-tuning process or the evaluation process, so you can go back and iterate.
And then you can finally fine-tune on that data and then eval. This process allows you to create a much more capable, let's say, GPT-4o mini that, kind of, mimics the behavior of GPT-4o, which is a much larger model. Or similarly, you can do that for Llama 405B and then the smaller versions of Llama and other open-source models.
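The log, filter, export loop just described can be sketched like this. The chat-style JSONL shape is the common format for chat fine-tuning data, but the logged records and the quality-rating filter here are invented placeholders:

```python
import json

# Sketch of distillation data prep: keep good teacher completions and
# write them as chat-style fine-tuning JSONL for the student model.
# The records and the "rating" quality filter are hypothetical.
logged = [
    {"prompt": "Summarize: soil pH affects nutrient uptake.",
     "completion": "Soil pH controls how available nutrients are to plants.",
     "rating": 5},
    {"prompt": "Summarize: crop rotation basics.",
     "completion": "idk",
     "rating": 1},  # low quality: should be filtered out
]

def to_training_lines(records, min_rating=4):
    """Filter by a quality rating, then emit one JSONL line per example."""
    lines = []
    for r in records:
        if r["rating"] >= min_rating:
            lines.append(json.dumps({"messages": [
                {"role": "user", "content": r["prompt"]},
                {"role": "assistant", "content": r["completion"]},
            ]}))
    return lines

train = to_training_lines(logged)
```

The filtering step is where the value is: distilling unfiltered teacher output also distills the teacher's mistakes.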
So, this is a really important technique when you want to get high-level model performance but with lower cost. Now, let me talk about multimodality. Language models solve many of the text tasks that we know.
Similarly, vision models enable us to solve many of the vision tasks that we know. But the real world is multimodal, and you have a lot of rich data coming in, as we talked about. So, multimodal models are a very important part of this puzzle.
And fine-tuning of the multimodal models hence becomes important. We recently announced fine-tuning of GPT-4o vision, which will allow you to do multimodal fine-tuning. That's pretty powerful for you to use for your real-world scenarios, and we'll walk through one of those.
So, in essence, what we're doing in multimodal fine-tuning is taking your text encoder and your image and audio encoders and getting them into a single embedding space, where you basically have all the embeddings in one vector space, and then using that to have semantic understanding of the data across all the modalities that you have. Now, look at the big models and the capabilities that we talked about, where they're getting bigger, better, faster. The recent o1 model basically scores 96% on the USMLE, the US Medical Licensing Examination.
So, those of you who know it, that's very good and compelling. That's a very high mark, and that's the out-of-the-box base model. And yet, I don't know how many of you folks have medical knowledge or can detect if something's wrong on this X-ray image, but the model may not be able to do that well out of the box.
There is a right-sided pneumothorax in this X-ray that the base model cannot get. So, here's a demo where we actually show that. SPEAKER 1: I'm going to begin by uploading the chest X-ray with GPT-4o. Leveraging 4o, I can speak to the model and ask it questions.
Can you write the findings section of a radiology report? SPEAKER 2: The lungs appear clear and well-aerated. The heart and mediastinum are within normal limits.
No evidence of pulmonary edema, pneumothorax, or infiltrates. The bony structures are intact. SPEAKER 1: Do you see the right pneumothorax?
SPEAKER 2: I can't make it out. I do not see a right pneumothorax in this image. It is possible that there may be a small pneumothorax that is not visible on this image, but based on what I can see, I do not see evidence of a pneumothorax.
VIJAY ASKI: So, as you can see, the multimodal models are extremely powerful. You can interact with prompt and image and voice all at the same time. And as the models get better, they'll get better at these things.
But here we are talking about fine-tuning. So, we actually did fine-tune the GPT-4o model for better understanding of multi-turn, multitask scenarios, and we found pretty good results as a result of that. And then there are a whole bunch of customized medical models that we've published in our catalog, more domain-specific models that allow you to do all of the things that we talked about, like having customized models, embedding models, and so on, so forth.
So, we have a whole bunch of embedding models, segmentation models, and report generation models published in the Azure catalog. The embedding fine-tuning that we did does pretty darn well on about nine or so medical tasks that it may not have done well on otherwise. So, that's a good example of how you get better performance out of fine-tuning.
And here is a good example of segmentation models trained to recognize your medical images and getting better at segmenting and detecting abnormalities on them. Then you have the report gen model that we talked about, which basically allows you to generate a report on your X-ray diagnostics; that's also available in the model catalog.
And these models can be easily deployed, and then you can try them in the playground. You can test them, make sure you're good, and then deploy them to production, and then you have the same scenario that we talked about: you give the inputs and the observations that the patient has, the doctor can just put those into the prompt, and the model finds things it otherwise may not have found.
So, this is a good example of how you use fine-tuning to get better performance out of large models that might otherwise be very capable but not as tuned to a particular task or domain. With that, I want to call a good friend, Dan, on stage. Dan comes from Bayer and they have some pretty good work that they have done on agronomist scenarios for -- SPEAKER 3: Imagine you go out to a farm and you're caught by a question the farmer has about a topic that you have not had much experience with.
You don't really have detailed information to respond. The farmer asks, "What is the difference between Delaro and Delaro Complete?" You pause, and of course, with integrity, say, "I don't know the answer to that. Let me get back to you." You go to the truck and you have a decision to make. How do I get to that answer quickly and respond to the farmer? You go onto Google and it responds with a mix of links, and you try to piece the answer together from all of the different links, with the frustration of having to bounce from site to site to find the right answer to give to the farmer.
You try texting some experts, your TA, some other fellow FSRs to try to get someone to answer. Of course, then you're waiting for a response while the farmer is also waiting for their answer. DANIEL KURDYS: Alright.
My name is Dan Kurdys and I am here representing the E.L.Y. team. So, E.L.Y. stands for Expert Language Model For You, and we are on a mission to use GenAI to improve access to applied agronomy expertise. There are longstanding challenges in agriculture: farmers don't have access to the expertise they need.
And unlocking those can impact global poverty as well as issues around hunger. So, today, I would like to invite you to build with us. VIJAY ASKI: So, what is the problem that you're trying to solve here, Dan?
DANIEL KURDYS: Yeah, there was some jargon in there, and for those that may not be familiar with agriculture, I'll just state it really plainly. So, farmers need timely access to expertise and sometimes it takes too long because of the scarcity of experts. VIJAY ASKI: And tell me more about your journey.
How did you get on this journey and all about it? DANIEL KURDYS: Yeah, yeah. So, it's a great story and I'm happy to share on behalf of the team.
And our journey in customizing AI for agriculture began with the realization that while GPT-3.5 was making impressive strides across domains, including general agronomy, it had significant gaps in knowledge regarding commercial agricultural products, as well as real-world agronomy out in the field, right? Not just academic principles. And this specialized knowledge is often proprietary to major agricultural companies like Bayer, but also many others. And so, to address this, we conducted an experiment by integrating RAG on top of GPT-3.5, and our goal was to enhance the model's accuracy in answering questions about Bayer commercial products.
And so, you see the results on the left here really demonstrate it: we had a 42% improvement in accuracy with our test set on Bayer product knowledge. On the top, you can see an answer from GPT-3.5 on one of our DEKALB sorghum products. It's erroneous, it's incomplete, it's not specific. And our E.L.Y. answer at the bottom is very specific, giving the product details as well as being accurate and correct. And so, that's the type of return we saw from RAG.
And we also started to see signs of ROI as we launched to our frontline salespeople. We were seeing time savings and improved accuracy, and that improves handling of customer agronomy and product questions, which intuitively, right, leads to better experiences with our products. They get more value from those products.
So, as new frontier models emerged, we developed MLflow methods to evaluate the marginal performance gains of different models while gathering user insights and performance criteria. And initially, new frontier LLMs were providing increased accuracy. However, there started to be some trade-offs.
As we expanded the RAG, it was taking more time to respond, and in user feedback they were getting frustrated with that. As well, we were seeing that the cost wasn't really scalable when we went to globalize this solution.
And so, what we did is we went from there and shared that information with Microsoft. So, and I'll talk about the next steps here, but first I'll hand it over to you, Vijay. VIJAY ASKI: It's very fascinating to see you guys use all the techniques that we've just talked about.
So, tell me more. What do you find with RAG? What do you find with fine-tuning?
And how did they work in combination? DANIEL KURDYS: Yeah. So, let's get into the details here.
Like I said, on that RAG piece, speeds were eroding and we didn't see the cost elements really playing out like we wanted. We had 1,500 active users asking 200 to 1,200 questions a week. But we really wanted to globalize, and to do that, we had to resolve those issues.
And so, there was a pivotal meeting with Microsoft where we shared these challenges and you introduced us to your Phi-3 models. And we explored the potential of fine-tuning them to overcome our constraints. And I have to say, Sachi is listening virtually and he was really the champion for this on the team.
And he came to me and our modeling person, Gira here, who's in the crowd, and we were skeptical at first. Like, "I don't really know about this. It might distract us from what we need to do, expanding the RAG and stuff." But he kept pressing, he kept pressing. And so, finally, we were like, "Oh, yeah, let's do it. Let's see what we learn." It's a good one to learn. And we narrowed the use case down to crop protection label understanding. And when we committed, then he's like, "Oh, yeah, and we're going to do it in four weeks."
VIJAY ASKI: That's awesome. DANIEL KURDYS: So, then we launched into it and I'll give you our approach here, but I've got to say, do you know the first step in fine-tuning a model for agriculture? VIJAY ASKI: Tell me.
DANIEL KURDYS: You've got to teach it the difference between a bug in the code and bug in the field. VIJAY ASKI: Ah (laughter). DANIEL KURDYS: Sorry, I do dad jokes and I sound like Ray Romano, that's all I got, people.
So, you've got to hang with me here. So, Bayer contributed domain expertise to crop protection labels. We've developed real-world prompt templates.
So, things people actually ask in the field about crop protection labels as well as training sets. So, something separate, right? We implemented a document ingestion process utilizing a document intelligence parser, an AI search index and table summarization.
Quality assurance was ensured through a proprietary question-and-answer pair set developed by our company experts, where they give us the right answers the model should produce. And then model adaptation was achieved through continual pre-training, supervised fine-tuning, and direct preference optimization. From an evaluation standpoint, we created a topic-based benchmark and use-case-specific metrics assessing general language, reasoning, and math.
In crop protection, you need to do a lot of unit conversions. You have to do some math for application rates, and so these were all really important in terms of the model capabilities. And then there's our commitment to responsible AI.
We employed advanced methods to ensure our AI systems are ethical, fair, and safe. And so, I'm going to show you a demo here of the SLM in action. [ MUSIC ] That last question there, it's really interesting.
So, you could see faster response time, but LLMs in general are very agreeable. They have a high bias to being agreeable. And in using crop protection products, there are certain situations where you should not and cannot legally use those products.
An LLM -- LLMs like to tell you you can. That is a big problem. And we have fixed that.
You can see that in our SLM: we give answers, and our model tells you when you can use a product and when you can't. And so, this project was able to be completed in just four weeks, which is an amazing accomplishment and a case of innovation with speed, and I think it speaks to both our team's commitment to deliver products and ship them, as well as the Microsoft team that collaborated with us. And there are some remarkable outcomes that honestly, at the beginning, we didn't know we would see.
It was a good experiment to run; some of the outcomes we anticipated and some we did not. So: improved understanding of crop protection labels, proficiency in Ag jargon. Yeah, there's a lot of funny Ag words, right?
We're learning about a lot of domain lexicon this week. There's reduced latency, edge access with low- or no-connectivity use, the ability to fine-tune when RAG used to be our only lever for performance, and now we can get into harder, more difficult questions, which is really exciting. And an upgrade, as you said, from college kid to PhD.
But what we really want to get to is that 30-year veteran in the field capability, right? So, we're moving that way. Cost-effective solutions for unlocking new use cases and geographic expansion.
When we go to India, and you have some experience there, we really need to get under a dollar or a euro per query to make it work for farmers there. And so, we see a pathway now to work on that. And an ability to train and refresh models faster due to lower cost.
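As an aside on the application-rate math Dan mentioned a moment ago: the kind of unit conversion a crop-protection model has to get right can be sketched as below. The 8 fl oz/acre label rate is hypothetical; the conversion constants are the standard US/metric factors:

```python
# Sketch of an application-rate conversion an agronomy model must get
# right. The product rate (8 fl oz/acre) is hypothetical.
FL_OZ_PER_GALLON = 128
LITERS_PER_GALLON = 3.78541
ACRES_PER_HECTARE = 2.47105

def floz_per_acre_to_l_per_ha(rate_floz_acre: float) -> float:
    """Convert a US label rate (fl oz/acre) to metric (L/ha)."""
    liters_per_acre = rate_floz_acre / FL_OZ_PER_GALLON * LITERS_PER_GALLON
    return liters_per_acre * ACRES_PER_HECTARE

metric_rate = floz_per_acre_to_l_per_ha(8.0)  # roughly 0.58 L/ha
```

Getting this arithmetic reliably right is exactly the "reasoning and math" capability Dan's topic-based benchmark was testing for.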
VIJAY ASKI: Talking about cost, it's super fascinating that you got SLMs to perform for your scenarios. Tell me more about the accuracy that you found, the response time, the latency. What was the experience with the SLM?
DANIEL KURDYS: Yeah. So, our current LLM is GPT-4o Mini with our RAG, right? And it averages around 30 seconds for response time, which our users are frustrated with. And we plan to deploy this as an API to integrate into other front ends.
And a lot of them have thresholds for latency where they cut it off at 30 seconds, so it was a big problem, right? And now it's exciting. This SLM averages a five-second response time, so a major improvement in latency (applause).
And finally, the results on accuracy, what everybody wants to see, right? To make it short: we used real-world test sets, right? These are questions that our experts get asked all the time. We used them to evaluate this model, and we were able to show that we are 3x better where it counts.
I don't give partial credit; I'm not worried about that column. I'm worried about the fully complete answers. And so we're 3x better where it counts versus frontier models.
And so, by customizing AI for the agriculture industry, we've created an improved out-of-the-box building block for GenAI agents and chatbots. And this tool gives other companies a head start in building a custom GenAI solution, whatever they want to do, right? Instead of just starting with an LLM or a RAG.
And I'll end by making a call to you again. GenAI is for agriculture, but GenAI is for your domains too. And so, let's go build something together.
VIJAY ASKI: Awesome. [ APPLAUSE ] Thank you, Dan. That was super powerful.
Anybody who has questions, please come to the mic; we'll have at least five minutes at the end. Next up, I want to invite our friend Kurt to talk about the great work they're doing on the factory floor at Sight Machine.
KURT DEMAAGD: Thank you, Vijay. Thank you. So, my name is Kurt Demaagd from Sight Machine, and we are a manufacturing analytics platform.
And our goal is to take the wide, almost chaotic variety of data sources that you get from the manufacturing plant floor and turn those into AI-ready, analytics-ready data models. VIJAY ASKI: It's an interesting name, Sight Machine. Tell me more about Sight Machine.
KURT DEMAAGD: So, Sight Machine, we actually started off doing computer vision. But over time, as we were collecting the data that goes along with that computer vision, customers were asking us to start merging in that data so they could do more analytics on top of the basic vision. And that got us to the point where we want AI-ready data, not just a single application.
VIJAY ASKI: So, what does it mean to be AI-ready? How is it different from source data? KURT DEMAAGD: Yeah.
So, for any of you who may have a manufacturing background, you may know these plant floor systems. You get a lot of chaos in terms of what the data actually looks like. The format can be wrong, where you get long data formats versus the nice wide record sets that you traditionally want for panel data and data science.
If you're trying to do streaming analysis, you've got data coming in out of order, and you've got real-time data streams that you've got to blend with samples from quality tests and bills of material, all that sort of stuff. So, it turns into a really challenging problem. And one of the biggest things we're trying to solve with this factory namespace manager is making sure that similar fields are always named the same.
So, if you want to do some sort of analysis on, for example, the temperature at the input of your annealing oven, you want your model to be able to line that up across all your different machines so you can analyze it. But if different machines have different names for the same measurement, it's hard to do that analysis.
KURT DEMAAGD: And that is, you know, the big problem: actually building these data dictionaries. Because you have so many data sources, and they come from different machine vendors, different machine line builders, different software packages, etc. And typically, data governance and data sharing were not critical factors when these systems were designed.
And what you can see from these systems is variables that you look at and just can't make any sense of. I have my hall of shame of variable names up here on the screen, and a lot of these look more like your cat fell asleep on the keyboard than variables you could actually interpret.
And even the ones that look kind of readable, like that first one, Jetter_Temp_Bool_Axi2: actually, we talked to end users and they were, sort of, "What are you even talking about? I have no clue what this variable is. Why did your AI algorithm bubble that up?" And so, we get these really crazy challenges here. And just to decode these names, the other big quiz is -- oh, I guess it ended up bolded already. I hit the button too many times.
But can you guess which two of these variables are actually measuring exactly the same thing, just at two different facilities? The answer is the second one, which starts with DET.INVTR, and the very last one, the CED_DUC and so on and so forth; those are actually the same measurement.
You would never guess that just by looking at them. And that's the sort of problem we're trying to solve when we're creating these standards. VIJAY ASKI: So, how do you decode these names?
Do you just look it up in the documents? KURT DEMAAGD: Yeah. So, I mean, you get lovely documents like my somewhat anonymized blurred version that you have up here.
Lots of engineering specifications that you have to look at. Now, as the old quip goes, the great thing about standards is there are so many to choose from. And that holds true on the manufacturing plant floor too, where different generations of equipment have different standards.
Maybe you've got the standards document that was written at one point, but you can't find the document. Equipment has evolved over time, the document's no longer relevant, and so you spend a lot of time trying to go through and piece together exactly what these garbled messes of variable names mean. VIJAY ASKI: That sounds like a lot of manual work.
Seems ripe for automation and AI. KURT DEMAAGD: It is. It is a perfect opportunity then -- oops, here we go.
It is a perfect opportunity then to try to apply some more automation and AI. Because when we look at how we did this in the past, you know, Sight Machine as a company, we're about 14 years old now. Some of our first projects, we did this all very manually.
And by very manually, I mean we were sitting in Excel, you'd take one variable name from one machine, you'd go look at another machine for all of those, line it up. Go to the next variable name, line it up. Very slow, very painful process.
I still have nightmares. And so, it was painful, but over time, you know, you start to find the patterns, you get a little bit smarter. You write some Jupyter Notebooks with some regular expressions and stuff like that.
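A first pass of the kind Kurt describes, a notebook full of regular expressions, might look something like this sketch; the abbreviation table and its mappings are hypothetical, not Sight Machine's actual rules:

```python
import re

# A sketch of the early notebook-era approach: split a plant-floor
# variable name on common delimiters and expand a small, hand-maintained
# abbreviation table. The table entries here are invented examples.
ABBREVIATIONS = {
    "temp": "temperature",
    "tmp": "temperature",
    "t": "temperature",
    "press": "pressure",
    "bool": "boolean",
    "invtr": "inverter",
}

def expand_tokens(name: str) -> list[str]:
    """Tokenize a raw variable name and expand known abbreviations."""
    tokens = re.split(r"[_.\s]+", name.lower())
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens if tok]

print(expand_tokens("Jetter_Temp_Bool_Axi2"))
# ['jetter', 'temperature', 'boolean', 'axi2']
```

The pain Kurt describes shows up immediately: every new vendor or facility needs fresh entries in that table, which is why this stays a very manual process.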
You get a little bit better, but still, it's a very manual process to try to get this all lined up together. VIJAY ASKI: So, this probably sounds very familiar to everybody else out there trying to figure out whether their task is suited for AI. How do you determine this is an AI problem?
KURT DEMAAGD: Yeah. So, our AI breakthrough here was to say, hey, you know what, when you're naming variables, this is actually kind of a language problem. You don't necessarily think about it right away, but your variable names, they have this grammar, you know.
Is it going to be machinename_sensorname_unitofmeasure or something like that? There's a grammar to it. There's a special vocabulary for us, you know.
It's a very manufacturing-oriented vocabulary, but even then, the way you abbreviate things. Do you say temperature? Do you say, temp?
Do you just say T? Things like that. So, this whole grammar behind it, we were sort of, "Oh, wait, you know what? This is actually a natural language processing problem." And so, starting about four years ago, we started doing old-school NLP. I feel a little old when I say that myself, but this is the string tokenization and basic pattern matching and n-grams and all that, sort of, fun stuff.
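The old-school NLP Kurt mentions, tokenization and n-gram matching, can be sketched roughly like this; the choice of character trigrams and Jaccard similarity is one plausible approach, not necessarily what Sight Machine actually used:

```python
# Compare variable names by the character n-grams they share, a classic
# pre-LLM technique for spotting names that likely mean the same thing.
def char_ngrams(s: str, n: int = 3) -> set[str]:
    """Set of lowercase character n-grams in a string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def name_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity over character n-grams of two variable names."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if (ga | gb) else 0.0

# Names sharing substrings score higher than unrelated ones.
print(name_similarity("Jetter_Temp", "Heater_Temp"))
```

A measure like this catches shared substrings, but, as the DET.INVTR vs. CED_DUC quiz shows, it says nothing about names that mean the same thing while sharing no characters, which is exactly where the language-model approach comes in.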
So, we started doing some of that, started applying some slightly more modern approaches, doing BERT language models of our own. But our big breakthrough was actually when we said, "Hey, you know what, I wonder what happens if I just stuff this all into GPT-4o? See if anything works." And it took us a little bit of time to work with that, but it was surprisingly effective. But the big challenge there was, sort of, the speed, the economics, even just the control of the model wasn't great. And so, that's when we were actually talking with some Microsoft folks and said, "Hey, you know what, you might actually want to think about doing a fine-tuned Phi-3 model in order to make this work." VIJAY ASKI: Wait, like fine-tuning? I thought you were telling people that it's a colossal waste of time that only benefits GPT vendors. KURT DEMAAGD: Yeah.
I was definitely a naysayer on the fine-tuning category. But, you know, I will admit I have become a convert to the wonders of fine-tuning. And really, one of the challenges was, well, I didn't have your presentation two years ago when we first started playing with fine-tuning to really identify are we doing the right use cases?
Are we tackling the problem correctly? And the biggest challenge really was we were actually going in and doing what I'm going to just call a very traditional RAG use case. This was during the era when all language models were ask-me-anything chatbot omniscient engines.
And so, we, kind of, had the wrong use case, we were doing it wrong, frankly. Just taking what we would have put into a system prompt, stuffing it into a fine-tuning data set and, "Hey, it didn't learn any of my facts. What's going on?" So, it was definitely the wrong use case for us. VIJAY ASKI: So, how did the data schema and the (inaudible) problem change your mind on it? KURT DEMAAGD: Yeah.
So, what really got interesting for us is when we started thinking about, "Hey, what we want to do is shift from this ask-me-anything bot to we're creating an application which we happen to want to plug AI language models into." And so, that control that you get over the model, tuning how it responds, that was mind-blowing to me. That ability to get much greater accuracy than you get when you're doing prompt engineering and trying to control the output.
For anyone who's tried to create an application around a traditional LLM, you know, you get 99% correct responses, but sometimes it just goes off and does some jazz and, "What on earth are you doing?" It gives you the wrong answer. And that control is great.
We were able to partner with some advisors from Microsoft to learn a little bit more about how we put the right information into the model to get it well-trained. VIJAY ASKI: That sounds fantastic. I can see it.
KURT DEMAAGD: Absolutely. So, what we have up on the screen here is our factory namespace manager. You actually can directly interact with the AI model if you are an advanced user, but this is also designed so that non-experts who don't want to do something in a Python notebook can interact.
But the core part of this is you need to give it some idea of the patterns that you want it to learn for renaming your variables. So, you're going to have your old variable names, optionally some keywords, and the new variable names. Oh, my video did not start as I was talking.
So, I will keep going here a second. And so, the basic idea is that you're going to give it those patterns so that it can learn how you actually define this. And so, instead of having to give these very detailed specifications of how this part of the variable name turns into that part, it's going to learn those for you from a small set of examples.
And so, there we go, pasting in an example. Old variable name, you'll see that very first name there. It's a little small, I apologize, but one of these rather difficult to read, everything smashed together.
We have some keywords here optionally if you did have that documented in your previous system so that you can give a little bit of insight if it's helpful. And then we give it a new name which is hopefully a little bit more readable than the old version. You don't have to use keywords.
There's a lot of flexibility here. So, in this case, we can also just say what is the old name? What is the new name?
And it'll be able to work with that for the patterns. And then what you do is you give it about 20-ish examples. Obviously, if you're doing an entire enterprise-wide deployment, you'll need a few more examples.
But for something like a traditional, typical production line, one or two production lines, that will do. And you can see in this example that there are different naming conventions even going on in here, where there are different ways of indicating what is the line number, different ways of indicating what is the facility name, things like that. So, we've got a bit of variety going on in this particular example here.
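Purely for illustration, a handful of (old name, optional keywords, new name) examples like the ones on screen might be packed into chat-style fine-tuning JSONL along these lines; the variable names, target convention, and system prompt here are all invented, not Sight Machine's real training data:

```python
import json

# Hypothetical training pairs: an old plant-floor name, optional keyword
# hints, and the standardized new name the fine-tuned model should emit.
examples = [
    {"old": "CED_DUC_INVTMP", "keywords": ["inverter", "temperature"],
     "new": "FacilityCED_Line1_Inverter_Temperature"},
    {"old": "DET.INVTR.TMP", "keywords": [],
     "new": "FacilityDET_Line1_Inverter_Temperature"},
]

# Write one chat-format record per example, the shape commonly used for
# fine-tuning chat models.
with open("rename_train.jsonl", "w") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "system",
             "content": "Rename plant-floor variables to the standard convention."},
            {"role": "user",
             "content": f"Old name: {ex['old']}\n"
                        f"Keywords: {', '.join(ex['keywords']) or 'none'}"},
            {"role": "assistant", "content": ex["new"]},
        ]}
        f.write(json.dumps(record) + "\n")
```

Note the target is a behavior, a renaming pattern, rather than a set of facts, which is exactly the shift Kurt describes from the failed "teach it my facts" attempt.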
And so, now we're going to see what we can do with it. You can optionally add some additional instructions for the language model, and this is really helpful because oftentimes we find that when you're getting an AI model to learn these patterns, it starts to pick up on all the ambiguities and cases that you yourself hadn't thought about in the first place. And so, sometimes you have to give it a little bit more guidance, a little bit more prompting how to get things to work.
So, in this case I just said T is for temperature, as opposed to, say, T is for time or something like that. And then I've got six examples here, and it's going to go through and try to automatically rename those. And we give it just a quick second here and it will pop up the new naming scheme, gives you this output, which takes those other names which were ugly and were from different schemas and gives you a consistent way of naming those based on the patterns that it found before.
And then, from there, we can output it either directly to Sight Machine, our data platform, but you don't have to be using Sight Machine's data platform to use this software. We can also export it to a CSV file and you can bring it to your other systems. So, with that, just a quick look at what's ahead for Sight Machine.
You know, we are expanding this. This is our first beta, so other folks who might be interested in working with us on this, we would be excited to partner with more organizations. Next big thing for the technical side is we're trying to add, sort of, confidence behind the different predictions.
No AI model is perfect, and so we want to make sure that we are signaling to our users where they should be putting their time, their effort, their focus, so that they can continue the tuning after we have done this renaming. And so, you know, if you are interested, there's a contact form there, or an email. And so, we certainly look forward to having the chance to talk with you more.
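As a sketch of the per-prediction confidence Kurt says is coming: one common heuristic is to aggregate the token log-probabilities a model API reports for each generated name. This is an illustrative approach, not Sight Machine's actual scoring method:

```python
import math

# Geometric-mean token probability as a simple confidence score: close
# to 1.0 when every generated token was high probability, lower when the
# model had to emit unlikely tokens along the way.
def renaming_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability in [0, 1]; higher is more confident."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

confident = renaming_confidence([-0.01, -0.02, -0.05])  # tokens near p = 1.0
shaky = renaming_confidence([-0.01, -2.30, -1.60])      # some unlikely tokens
print(confident, shaky)
```

A score like this lets the UI flag the shaky renames for human review while letting the confident ones through, which is the triage Kurt describes wanting to give users.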
VIJAY ASKI: Awesome. Alright. Thank you, Kurt.
Fantastic. [ Applause ] So, as you saw, basically, even when the models' base-level reasoning is getting really, really good, and they will keep getting better, there is no substitute for domain knowledge and expertise and customization for your scenario. So, you should be encouraged by all these examples in critical fields like healthcare, agriculture, and even manufacturing, that you can actually employ these large models confidently, with customization, to get really good ROI for your scenarios. And with that, we'll end the talk. And we have more related sessions coming up.
And if you have any other questions, we can take a question. We have a minute or two. Awesome.