Hello and welcome to the CHEST Journal podcast, where each month we host a discussion with the authors of important articles from the current issue of the journal, adding context and commentary to the challenges facing clinicians in the fields of pulmonary, critical care, and sleep medicine. To introduce today's topic, here's your host, Dr Matt Siuba. >> Welcome, everyone, to another edition of the CHEST Journal podcast.

My name is Matt Siuba. I'm a medical intensivist at the Cleveland Clinic in Cleveland, Ohio. Today I have the distinct pleasure of talking about an article that will be in the journal this month called "Artificial Intelligence-Based Echocardiography in Pulmonary Arterial Hypertension."

I'm joined here by the first author, Dr Bettia Celestin, and the senior author, Dr Francois Haddad. First of all, we'll get to know each of our authors who are with us today. Thank you both for being here.

Bettia, why don't you start us off? Tell us a little bit more about yourself. >> Hi.

First, thank you for having us. I'm Bettia, and I'm an MD.

I'm a cardiologist from France, and I also have a master's degree in biostatistics. I work at Stanford in research, both in general and in pulmonary hypertension studies, with Dr Francois Haddad, who is my mentor, and it's a pleasure to be here today.

So thank you for that. >> Thank you. Francois? >> Yes.

No, thank you for having us. It's a pleasure to be here. I'm a cardiologist at Stanford, and I direct the biomarker core lab and the imaging core lab, mainly specialized in pulmonary hypertension and imaging. In the last few years it has really been a passion of mine to focus on early detection of disease and on finding ways to improve our diagnosis and minimize our diagnostic errors.

>> Fantastic. Thank you both for being here. I'm also a pulmonary hypertension and right heart enthusiast, so we're all kindred spirits today.

So I'm really excited to get into this conversation. To start, Bettia, tell us what inspired this study question. >> I think there's a lot we can gain from artificial intelligence. It has really improved over the last couple of years, even the last decade, and in medicine there are many applications we can use in clinical routine: to help physicians with early diagnosis, but also in everyday practice and in the follow-up of patients.

What brings us here is that we were really excited to do this study, because we wanted to see how artificial intelligence can be applied to the detection of PH in practice, how it works, and how we can have enough confidence when we use this type of new technology. That's why we found it so interesting and really wanted to run this study with Francois. >> Fantastic.

So, to get into some of the details of this study and help the audience understand: Francois, there are a lot of technologies that fall under the umbrella of artificial intelligence, and in this study in particular we're talking about deep learning. Could you describe in general what deep learning is for our audience?

Yes, that will be a pleasure. Maybe I'll first add a small comment about how the study got started. >> Absolutely.

>> I also want to give credit: we were at the discussion forum, which was really interesting, and Johnson & Johnson was actually there. It started with a general discussion of how technology is evolving, and people wanted to address one important problem, which was misdiagnosis. Unfortunately, we see that a lot of people with pulmonary hypertension get diagnosed anywhere from 3 to 18 months after their first symptoms, so diagnosis is often delayed.

So it got people thinking: can we improve this using better automation or better analysis of either electrocardiography or echocardiography? That really built the momentum for the study. It came out of a forum with Johnson & Johnson, which sponsored collaborative agreements to help move these studies forward. The Us2.ai collaboration actually started through a hub with Johnson & Johnson, and they moved really fast in the development, with a very efficient system.

So that's how it got started, out of enthusiasm for the collaboration. >> That's great. Thank you for giving us that background.

That's really helpful to hear, and it may inspire other people to stimulate studies in a similar way. Moving now to the question about deep learning: could you describe it for our audience?

>> Absolutely. We often hear the words artificial intelligence, machine learning, and deep learning, and they sometimes sound alike, but they refer to different things. Artificial intelligence is usually the big umbrella term, and classically it is based on rule-based reasoning, applying rules to solve a problem.

Machine learning usually refers to technologies or algorithms that learn from data. At the simplest level, classical statistics, like the logistic regression we use for diagnosis, is an example of machine learning. More complex tree-based learning algorithms are also examples of machine learning.

Deep learning learns from more complex data. It often learns at different layers, from a basic layer that picks up the most important features to more complex layers, and that's why it's called deep learning: it often involves several layers, it can learn complex information, and it's able to reach a conclusion. Compared with other machine learning algorithms, that conclusion is not always easily explainable, but it can be verified.
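A toy sketch may help separate the two ideas. It assumes nothing about how the Us2.ai network is actually built (its architecture isn't described here): logistic regression is a single weighted sum of the inputs passed through a sigmoid, while a deep network stacks several layers, each re-combining the previous layer's outputs, before a similar final step. All weights below are made up for illustration; real models learn them from data, and image models typically use convolutional layers rather than these plain dense ones.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """Classical machine learning: one weighted sum of the inputs, then a sigmoid."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

def dense_layer(x, weight_rows, biases):
    """One layer: each row of weights builds one new feature (ReLU keeps positives)."""
    return [max(0.0, sum(xi * wi for xi, wi in zip(x, row)) + b)
            for row, b in zip(weight_rows, biases)]

def deep_network(x, hidden_layers, out_weights, out_bias):
    """Deep learning in miniature: stacked layers feed a final logistic output."""
    h = x
    for weight_rows, biases in hidden_layers:
        h = dense_layer(h, weight_rows, biases)
    return logistic_regression(h, out_weights, out_bias)

# Hypothetical inputs and weights, for illustration only.
x = [0.5, -1.2]
print(logistic_regression(x, [0.8, -0.4], 0.1))
layers = [
    ([[0.3, -0.7], [1.1, 0.2]], [0.0, 0.1]),                   # 2 inputs -> 2 features
    ([[0.5, 0.9], [-0.6, 0.4], [0.2, 0.2]], [0.0, 0.0, 0.1]),  # 2 features -> 3 features
]
print(deep_network(x, layers, [0.7, -0.3, 0.5], -0.2))
```

Both functions end in the same sigmoid; the difference is that the deep version first transforms the inputs through layers whose intermediate features nobody specified by hand, which is also why its conclusions are harder to explain.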

So for this study, it was mainly the application of a deep learning algorithm that learned how to analyze the images. >> Thank you for that overview; that's really helpful. To delve more into the specifics, particularly the software used for this study, it's called Us2.ai.

Bettia, how does this software work?

And could you give us some idea of how long it takes to analyze the images for an echo, compared with, let's say, how long it would take a cardiologist to read it? >> Yes, sure. It's a platform.

When you acquire the images, you need to upload them to the platform. But in a routine lab it would be much easier to install this kind of platform, because it would be directly connected to the images and to the patient's different studies. Once the study is on the platform, the deep learning reads all the images really fast. This kind of deep learning first does what we call view recognition, identifying the different views, and then the different measurements. This can be really fast, maybe less than 5 minutes for a normal study, and a little longer for a more complicated study with more images. You get the different measurements, like the EF, but this study focuses more on the right heart.

So you get the FAC and the TAPSE in M-mode, the different measurements that we do in a routine lab, and you get a report that gives you the different measurements but also a conclusion about the patient's study. So it's really fast and easy to use. And then, of course, you can modify it.
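How the platform stores a tracing isn't described here. As a hypothetical sketch of why editing a tracing changes the result, suppose an area (for example, the right atrial area) is stored as the traced contour's points in pixel coordinates: the area then follows from the shoelace formula and the image's pixel spacing, so moving a point and recomputing updates the measurement. The function names, the point format, and the square pixel spacing are all assumptions.

```python
def contour_area_px(points):
    """Shoelace formula: area enclosed by a closed, non-self-intersecting polygon."""
    if len(points) < 3:
        raise ValueError("a contour needs at least 3 points")
    twice_area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]  # wrap around to close the contour
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def contour_area_cm2(points, mm_per_px):
    """Convert pixel area to cm^2, assuming square pixels (1 cm^2 = 100 mm^2)."""
    return contour_area_px(points) * mm_per_px ** 2 / 100.0

# Hypothetical tracing: a 100 x 100 pixel square at 0.2 mm per pixel.
tracing = [(0, 0), (100, 0), (100, 100), (0, 100)]
print(contour_area_cm2(tracing, 0.2))  # about 4 cm^2

# A reader disagrees with one corner and drags it outward; the area is recomputed.
tracing[2] = (120, 110)
print(contour_area_cm2(tracing, 0.2))
```

Because the area is derived from the points rather than stored separately, any manual correction to the contour automatically flows through to the reported measurement.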

That's the really important point: if you don't agree with the tracing of, for example, your area, you can modify it, and the measurement will be adjusted. >> Thank you for that overview. Francois, anything to add?

>> Yes. What's amazing about Us2.ai, just historically: I remember when they came to visit Stanford. Carolyn Lam is a cardiologist focusing on heart failure.

Her husband, James Hare, also did his master's at Stanford. They came in, and it actually got started in, I think, 2017, but they progressed really fast with the technology. Some aspects to add: for the right heart, for example, it was trained with approximately 5,000 images, and that's how it developed its algorithm. When we talk about the deep learning algorithm, that's what led to the development of the segmentation algorithms. One very important point is that the algorithm we used is segmentation based. As Bettia said, the images are uploaded.

The system can then analyze them and segment every image: it first recognizes the view, then segments the image to delineate the right ventricle. The measurements are usually based on areas, linear dimensions, or point estimates.

And a very important part of any algorithm that people develop is internal quality control, because you can segment any image, but if there's no internal quality control, the users, whether physicians or sonographers, will not trust it. So that's a very important part of any software that any company develops.
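The platform's actual quality-control rules aren't described here, so the following is only an illustration of the idea under stated assumptions: a segmentation mask yields an area by counting foreground pixels, and a few simple checks (an empty mask, an implausible area, low model confidence) route the measurement to a human reader instead of reporting it. The `SegmentationResult` type, the confidence score, and both thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SegmentationResult:
    mask: list            # rows of 0/1 values; 1 = pixel inside the segmented chamber
    mm_per_px: float      # pixel spacing, assumed square
    confidence: float     # hypothetical model confidence score in [0, 1]

def mask_area_cm2(seg):
    """Area = number of foreground pixels x area of one pixel, converted to cm^2."""
    n_pixels = sum(sum(row) for row in seg.mask)
    return n_pixels * seg.mm_per_px ** 2 / 100.0

def quality_check(seg, min_confidence=0.8, plausible_cm2=(2.0, 60.0)):
    """Return (area, issues); a non-empty issue list means 'send to a human reader'.
    Both thresholds are illustrative placeholders, not validated cut-offs."""
    area = mask_area_cm2(seg)
    issues = []
    if area == 0:
        issues.append("empty segmentation")
    elif not plausible_cm2[0] <= area <= plausible_cm2[1]:
        issues.append(f"implausible area: {area:.1f} cm^2")
    if seg.confidence < min_confidence:
        issues.append(f"low confidence: {seg.confidence:.2f}")
    return area, issues

square = [[1] * 10 for _ in range(10)]            # 100 foreground pixels
good = SegmentationResult(square, mm_per_px=2.0, confidence=0.93)
print(quality_check(good))                         # (4.0, [])
shaky = SegmentationResult(square, mm_per_px=2.0, confidence=0.41)
print(quality_check(shaky))
```

The design choice worth noting is that the check never silently "fixes" a measurement: it only decides whether the number is reported or handed back to a person, which is what lets users keep trusting the numbers that do get reported.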

>> Thank you. As you both are talking about this, it's really interesting to think about how this could make routine echo lab acquisition a lot easier, particularly for people like us who are interested in pulmonary hypertension, by making sure those right-sided measurements are appropriately emphasized and captured. So now that we have some motivation for why we might want to learn more about this: Francois, could you give us a basic overview of the study design for this paper?

>> Yes, absolutely, with great pleasure.

One thing: when we did this study, the platform wasn't connected to our server. The question we wanted to answer with Bettia was whether Us2.ai, an example of an automated deep learning platform, is reliable for clinical use. That was really the main question, and, as you said, we knew it was important because, first, we had a lot of misdiagnosis. As physicians, we often estimate pulmonary hypertension from the tricuspid regurgitation velocity: in the European Society of Cardiology and European Respiratory Society guidelines and at the World Symposium, the algorithm for detecting PH relies heavily on the peak tricuspid regurgitation velocity. But a lot of other information is available in the echo and is often overlooked because we're busy.

For example, the pulmonary flow, even when it is acquired, can be overlooked and not measured. The shape of the right ventricle, the right atrial area, and the shape of the septum, which can give us a lot of insight into whether there's pressure overload of the heart, are often overlooked. This is where I think an automated algorithm can shine: it can measure all of these automatically and provide them to the user.
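To illustrate why those overlooked signs matter, here is a minimal sketch of the echocardiographic probability step as we read it in the 2022 ESC/ERS pulmonary hypertension guidelines: peak TRV sets the starting level, and other PH signs from at least two of three categories (ventricles, pulmonary artery, inferior vena cava and right atrium) move it up. This is our paraphrase, not the study's method or the Us2.ai report logic; the function name and inputs are hypothetical, and the thresholds should be checked against the guideline itself.

```python
def echo_ph_probability(peak_trv, sign_categories):
    """
    peak_trv: peak tricuspid regurgitation velocity in m/s, or None if not measurable.
    sign_categories: categories in which at least one other PH sign is present:
      "A" ventricles (e.g. RV/LV basal diameter ratio > 1.0, septal flattening)
      "B" pulmonary artery (e.g. RVOT acceleration time < 105 ms)
      "C" inferior vena cava and right atrium (e.g. RA area > 18 cm^2)
    """
    # Other signs only count when they come from at least two different categories.
    other_signs = len(set(sign_categories) & {"A", "B", "C"}) >= 2
    if peak_trv is not None and peak_trv > 3.4:
        return "high"                      # other signs not required
    if peak_trv is None or peak_trv <= 2.8:
        return "intermediate" if other_signs else "low"
    # Remaining band: above 2.8 and up to 3.4 m/s (2.9-3.4 in the guideline table).
    return "high" if other_signs else "intermediate"

print(echo_ph_probability(2.6, {"A"}))        # low: signs from only one category
print(echo_ph_probability(2.6, {"A", "C"}))   # intermediate
print(echo_ph_probability(3.1, {"B", "C"}))   # high
```

Only the peak TRV input is needed for the first branch; everything else depends on measurements such as the RA area, septal shape, and pulmonary flow, which is exactly the information that an automated read would make available on every study.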

As for the study design you asked about, we actually spent several weeks to months developing the study and making sure we were asking the right question. We worked through it with Bettia, and we had many questions before designing it. First of all, did we want to do a prospective or a retrospective study? With any technology like deep learning, we sometimes want to jump to clinical application right away, but one important question is the variability of the measures. That's where the study started: can we really quantify the variability of the measures, and how do we design the study properly? I don't know if Bettia wants to comment. We did something special in this study, a triangular way of looking at variability; we didn't just leave it to the deep learning and one reader. She probably has some very good insight, because she played a key role in designing it.

So uh she probably has some very good insight because she plays a key role in designing this. >> Uh yes, thank you France. So yeah, I think the the most important question is what France really said previously like what how we can rely on this kind of new technology and how it can help but how we can rely on this.

We started by making sure we could assess how accurate it is and how much variability there is between readers, because the platform, the AI here, is just another reader. So the goal is to compare the different readers: reader one, reader two, and reader three, which is the AI. We did an internal reading with Francois and the lab, the whole team, because of course there is a whole team behind this, and we compared it with the AI platform to assess the variability and how accurate it is. That's how we designed the study.

We used a case-control cohort but also a referral cohort to compare the different measurements. We designed it this way to measure how robust the measurements are and whether the AI platform's measurements show any bias compared with clinical or sonographer reading. >> And the first question we asked, apart from the design, was choosing the cohort. As Bettia said, there's a difference between a case-control cohort and a referral cohort.

Here we didn't select the cases; we took all comers. We wanted to make sure we had real variability in study quality: not only good-quality studies, but studies ranging from poor to good.

We also wanted to make sure we had variability in the severity of pulmonary hypertension, and that was an important part of selecting the right population. Another central aspect was making sure it was truly an external validation. Any technology has to be tested in an external validation.

Then we divided the questions into three. The first was: what's the measurement yield of the technology? How often does it produce a measurement?

The second was: what's its variability? Does it have bias? What's its precision?

Is it precise, or is it biased? We really wanted to answer this analytic variability question. And the third one, which is very important, is: what's the clinical difference?

Is it able to discriminate cases from controls, not only in the case-control study but also in the referral cohort? That's a very important distinction. >> Absolutely. Thank you for taking us through that overview; it's interesting to hear about the three-way design to assess the quality and precision of the measurements.
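The first and third questions can be made concrete with two small calculations. The sketch below assumes one value per study, with `None` marking a study where the measurement could not be obtained: yield is the fraction of studies with a value, and discrimination is summarized here as the area under the ROC curve, computed as the probability that a randomly chosen case scores higher than a randomly chosen control. The numbers are made up, and the paper's actual statistical methods may differ.

```python
def measurement_yield(values):
    """Fraction of studies in which the measurement was obtained (None = not obtained)."""
    return sum(v is not None for v in values) / len(values)

def roc_auc(cases, controls):
    """AUC as P(case > control), counting ties as half (the Mann-Whitney formulation).
    Studies without a measurement are dropped first."""
    cases = [v for v in cases if v is not None]
    controls = [v for v in controls if v is not None]
    score = 0.0
    for c in cases:
        for k in controls:
            score += 1.0 if c > k else 0.5 if c == k else 0.0
    return score / (len(cases) * len(controls))

# Hypothetical peak TRV values (m/s); None = not measurable in that study.
cases = [3.5, 3.0, None, 2.9]
controls = [2.4, None, 2.6, 3.0]
print(measurement_yield(cases + controls))   # 0.75
print(roc_auc(cases, controls))
```

In a case-control cohort the two groups tend to sit at the extremes, so an AUC computed this way usually looks better than it would in a referral cohort with comorbidities and borderline values, which is the distinction drawn above.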

So now that we understand what you set out to do: Bettia, what were the key takeaways from this study? >> I think the key takeaway is that it's a promising technology we can use, but of course we need to check it as physicians in the echo lab. We saw that for simpler measurements, like the tricuspid regurgitation velocity, the right atrial area, and TAPSE in M-mode, we had really good agreement between the different readings, that is, between the AI reading and the clinical reading.

So we can rely on these measurements, of course with the usual quality control we apply to any assessment. For the RV area measurements, like the RV FAC, the bias was higher, because they're harder to measure, even for an experienced clinician. So for measurements that need more clinical judgment, we can rely less on AI, but they still help us, and they discriminated PH patients with a really good, significant result.

So the main thing we can say is that there's real hope for expanding how we use this technology, and we can already rely on it for measurements that are easier to do in the clinical setting, for example the peak TRV, the right atrial area, and TAPSE. Maybe, Francois, you have more to say about that? >> No, I think you said it well. The first question is really which measurements. When we start something like this kind of study, we don't know where it will take us. To do it, we developed new software to look at laboratory variability, which we hope to continue working on with the echo society to implement. But the first key takeaway is that some measurements are very reliable and have less variability, not necessarily because of deep learning: even between two readers, like Bettia and me, we had some variability. That's important to pause on, because we often think we don't have a lot of variability, but even between us we do. Measurements like the peak TRV, RV basal diameter, TAPSE, and the right atrial area were more reproducible, by both the bias and the precision measures.
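Bias and precision, in the sense used here (the central tendency of the disagreement versus the spread around it), map onto a standard Bland-Altman analysis. As a hedged sketch, not the software the team developed (which isn't described here), bias is the mean of the paired differences between two readers, and precision can be expressed as the 95% limits of agreement, bias ± 1.96 standard deviations of those differences. Comparing every pair among two human readers and the AI mirrors the three-reader design; the readings are hypothetical.

```python
import itertools
import statistics

def bias_and_limits(a, b):
    """Bland-Altman agreement between two readers measuring the same studies.
    bias = mean of (a - b); 95% limits of agreement = bias +/- 1.96 * SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # needs at least two paired measurements
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical peak TRV readings (m/s) of the same five studies by three readers.
readings = {
    "reader 1": [2.5, 3.1, 3.8, 4.2, 2.9],
    "reader 2": [2.6, 3.0, 3.9, 4.1, 3.0],
    "AI":       [2.5, 3.2, 3.7, 4.3, 2.8],
}

# Every pair of readers: 1 vs 2, 1 vs AI, 2 vs AI.
for (name_a, a), (name_b, b) in itertools.combinations(readings.items(), 2):
    bias, (low, high) = bias_and_limits(a, b)
    print(f"{name_a} vs {name_b}: bias {bias:+.2f} m/s, "
          f"95% limits of agreement {low:+.2f} to {high:+.2f} m/s")
```

Including the human-versus-human pair matters: it shows how much disagreement exists before any AI is involved, which is the baseline the AI comparisons should be judged against.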

Bias describes the central tendency, and precision describes how much we vary around that central tendency. Both metrics are important for quality, and those four measurements performed really well on both. The areas had good precision, but there were real differences between readers. After the study, we started an RV network group with many colleagues across the world, in Canada, the US, and Europe, including Harry and David Oxborough, and what's important is that we came up with an acronym that we hope will be used in guidelines, called VIEWPOINT. How do we measure it?

It depends on your viewpoint: how can we measure reliably if we don't specify the best view? What's the best phase to choose? What's the best interface?

Do you really need to average measures? And how do you index or scale measures? So this study also led to that, because we stepped back and asked how we can really make sure that all the different readers standardize themselves.
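Two of those questions, averaging and indexing, are easy to show in code. As an illustrative sketch only: average a measurement over several beats, then divide by body surface area, here computed with the Mosteller formula, sqrt(height in cm × weight in kg / 3600). Which measures to average over how many beats, and whether simple ratio indexing to BSA is the right way to scale a given measure, are exactly what the acronym aims to standardize, so treat these choices and the values below as assumptions.

```python
import math
import statistics

def mosteller_bsa(height_cm, weight_kg):
    """Body surface area in m^2 by the Mosteller formula."""
    return math.sqrt(height_cm * weight_kg / 3600.0)

def averaged_and_indexed(per_beat_values, height_cm, weight_kg):
    """Average a measurement across beats, then index it to body surface area."""
    mean_value = statistics.mean(per_beat_values)
    return mean_value, mean_value / mosteller_bsa(height_cm, weight_kg)

# Hypothetical right atrial area (cm^2) traced on three beats, 180 cm, 80 kg patient.
mean_ra, indexed_ra = averaged_and_indexed([18.0, 19.5, 18.9], 180, 80)
print(f"RA area {mean_ra:.1f} cm^2, indexed {indexed_ra:.1f} cm^2/m^2")
```

Two readers who agree on these conventions (which beats, what to average, how to scale) can still disagree on a tracing, but they can no longer disagree simply because they followed different rules.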

>> Yes, that's really interesting, and it's interesting how we started with a question specifically about artificial intelligence and then extended it to how we standardize the process of echocardiography in the first place. I think we've had similar conversations about right heart catheterization and pulmonary hypertension. So there's a lot of ongoing and future work to think about. Before we get to future directions: Francois, what words of caution do you have for readers interpreting these results, in terms of the limitations that may be present?

>> I think the first thing to remember is that when we design a study to test a technology, having an external validation is essential. The second essential part is that when you want to test the clinical difference and the clinical implications, it's important not to test it only in a case-control setting, where we have normal people and patients with pulmonary hypertension; in that setting it's almost always going to look good. It's important to test it in a referral-based cohort, with true cases where people can have comorbidities or other conditions that may influence image quality. A word of caution, as with everything: when we rely on multiple metrics instead of just one, we improve quality, including for clinical use. Here, if different measures of size are not concordant, we need to be cautious about whether to rely on these measures. I think that's the most important thing: an ensemble method, or global assessment, becomes very important for estimating the probability of disease.

>> Absolutely. Bettia, anything to add? >> No, I think Francois summed it up really well.

That's exactly what we need to be careful about: we have a case-control cohort here, but also a referral cohort, and that's why it was important to add the referral cohort. Another limitation is image quality. We know that with patients in the hospital and in the routine clinical setting, images are often of lower quality than those we sometimes get in research studies. We addressed this in the study as well: we measured the variability between the AI and the clinical reader across the different levels of image quality.

>> I think what Bettia just said is a very important point, and it's something that's missing in a lot of echo reports: we often don't indicate the quality of the signal we're referring to. Not many laboratories will say whether the peak TRV, or any given measure, is reliable. This hasn't yet made it into routine reporting in many imaging labs.

>> Yes, that's a great point, and it's something we don't see enough; unless you look at your own images, it's very difficult to tell. Bettia, now that we've seen some promise here, particularly for some of these longitudinal right ventricular function measures and TR velocity, what do you think are the next steps to assess the viability of this technology on a broader scale? >> I think the next step will be more prospective cohort studies.

For example, patients come in, we do the ultrasound, and then we run the deep learning platform; in parallel, of course, we do the clinical reading as we currently do, and once we've finished, we compare. Prospective studies are always the way to go when we want to take things further. Another step for this kind of technology is to apply it in different settings: not just PH, and of course across different severities of PH, but also in other conditions, such as complex heart diseases. I'm thinking, for example, about pediatrics, where imaging can be difficult. So there's a lot to do, and it's promising.

I think it's really promising. >> Great, thank you.

Francois, any last words? >> No, I completely agree. I think testing the use cases is the most important thing, whether for detection, prognosis, or longitudinal monitoring.

I think that's going to be one very important aspect. The other exciting aspect is combining different technologies: how electrocardiography can be combined with echo, and with simple biomarkers, to really improve our yield in practice. I think that's important.

One last comment, on what Bettia was saying earlier: we tested an early version of the software, and we recently retested a newer version, and there's an improvement. For example, analysis of all the images can be much faster, around 30 seconds to a minute, and can often be done at the time of acquisition. That part has improved and is much more efficient than in the earlier version, which is great for practice. Thank you.

Yes, this is a really exciting area of inquiry, and I think there are major implications for the way we detect pulmonary hypertension and take care of these patients. Thanks to everyone for joining us today to discuss the article "Artificial Intelligence-Based Echocardiography in Pulmonary Arterial Hypertension" with Dr Bettia Celestin and Dr Francois Haddad.

And we will see you all next time.

Artificial Intelligence-Based Echocardiography in Pulmonary Arterial Hypertension