In my last video, I discussed phylogenetic trees. Phylogenetic trees are very useful and help us understand the evolutionary history of sequences or species. To construct trees, we need a method that is easy and programmable. There are many methods for constructing trees, and they fall into two general approaches: the cluster approach and the objective-based approach. In this video, I will discuss UPGMA, which belongs to the cluster approach. It stands for unweighted pair group method using the arithmetic mean; instead of the long name, we can use the acronym, UPGMA. It also gives us a rooted tree.

We start with the two sequences that have the shortest distance between them. These sequences are the most closely related to each other, which means they were the last to diverge from each other, and the node that joins them represents their common ancestor. But how can we tell that two sequences are closely related? For that, we use pairwise sequence alignment.

The steps for constructing an evolutionary tree using the UPGMA method are as follows. First, we take the sequences, align them against one another, and name them from A to E. In the second step, compare sequence A to all other sequences using pairwise sequence alignment, count the mismatches, and record them in the table. Complete the table by comparing all the sequences. Now look for the lowest value to find the first group. Then, by finding the arithmetic means step by step, construct the tree.

Let's build an evolutionary tree by going through all of these steps. First, label five sequences with the letters A, B, C, D, and E, and align them against each other. Next, perform pairwise sequence alignment to identify mismatches. For example, align sequence A against sequence B and count the number of mismatches. There are five mismatches in this case; record the number in the matrix. Next, count the number of mismatches against C. The number of mismatches in this case is 3.
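Counting mismatches between two aligned sequences is simple to program. Here is a minimal Python sketch, assuming the sequences are already aligned to the same length. The function names `count_mismatches` and `mismatch_matrix` and the three short sequences are invented for illustration; they are not the sequences shown in the video.

```python
from itertools import combinations


def count_mismatches(seq1, seq2):
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


def mismatch_matrix(sequences):
    """Return {(name1, name2): mismatches} for every unordered pair of names."""
    return {
        (n1, n2): count_mismatches(sequences[n1], sequences[n2])
        for n1, n2 in combinations(sorted(sequences), 2)
    }


# Hypothetical aligned sequences, made up for this example only.
example = {"A": "ATCGTA", "B": "ATGGTT", "C": "ATCGTT"}

for pair, mismatches in mismatch_matrix(example).items():
    print(pair, mismatches)
```

Each pair is stored only once, because the mismatch count from A to B is the same as from B to A. That is why only one half of the matrix needs to be filled in.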
Repeat the process with D and E. Next, find the number of mismatches between sequence B and all the others in the same way, and do the same for sequences C, D, and E. Once you have determined the numbers of mismatches, enter them into the matrix. The sequence names are written along the top and left sides of the matrix, and the mismatch counts are recorded in the boxes. The number of mismatches between two sequences indicates how far apart they have drifted over time.

Once the table is finished, choose the lowest value to determine the first group. The lowest number indicates that the two intersecting sequences, in this case A and C, were the last to diverge and are the most closely related to one another.

Once the first group has been chosen, the matrix must be modified. We now remove C from both sides and treat A and C as a single entity, AC. Then, using the average, we determine the next group. We have AC, B, D, and E on both sides; fill out this matrix with the averages. First, we compute the average of A and C against B. It can be found as (A-B + C-B) divided by 2, where A-B means the number of mismatches between A and B. A-B equals 5 and C-B equals 7.
We get 12 by adding these two values, which yields 6 when divided by 2. Insert the value into the table. After that, average A and C against D. It can be found as (A-D + C-D) divided by 2. A-D equals 6 and C-D equals 4.
We get 10 by adding these two values; divided by 2, the answer is 5. Insert the value into the table. Find the average of A and C against E in the same way. The first row is finished; copy the last two rows, which are unchanged from the previous matrix.

Once the matrix is completed, choose the lowest value, which is 4. This gives the second group, which is E and D. Consider E and D to be a single entity, ED, and take E off both sides. Modify the matrix once more: AC, ED, and B are now on both sides. Fill the matrix once more with the averages. First, we compute the average between AC and ED. It can be calculated as (AC-D + AC-E) divided by 2, yielding 5.
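The averaging used each time a group is formed follows the same pattern, so it can be written as one small function. This is a minimal Python sketch using only the mismatch counts quoted for A and C against B and D; the function name `average_to_group` is an assumption made for this example.

```python
def average_to_group(d_first, d_second):
    """Arithmetic mean of two group members' distances to the same outside sequence."""
    return (d_first + d_second) / 2


# Mismatch counts quoted in the walkthrough.
a_b, c_b = 5, 7
a_d, c_d = 6, 4

ac_b = average_to_group(a_b, c_b)  # (5 + 7) / 2
ac_d = average_to_group(a_d, c_d)  # (6 + 4) / 2

print(ac_b, ac_d)  # 6.0 5.0
```

The simple mean of two values is enough here because the two members being averaged always represent the same number of sequences. If the groups had different sizes, a size-weighted mean would be needed.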
Next, write the distance between AC and B, which is 6. Then compute the average between ED and B. It can be calculated as (D-B + E-B) divided by 2, which yields 5.5. Pick the lowest value again, which is 5.
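The whole procedure of picking the lowest value, merging, and re-averaging can be sketched as a loop. The Python below is a rough sketch, not a definitive implementation. The function name `upgma` and the data layout are assumptions. Only six of the ten pairwise counts are quoted in the walkthrough, so the other four (A-E, C-E, B-D, B-E) are assumed values, chosen only so that the averages come out as stated (AC-ED = 5 and ED-B = 5.5). The sketch weights each average by group size, which gives the same result as the simple mean whenever the two groups are the same size.

```python
def upgma(names, dist):
    """Repeatedly merge the closest pair of groups, averaging their distances.

    names: list of sequence labels
    dist:  {frozenset({x, y}): distance} for every pair of labels
    Returns the merges in order as (group1, group2, distance) tuples.
    """
    size = {n: 1 for n in names}  # number of sequences in each group
    d = dict(dist)                # working copy of the distance table
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)  # lowest value in the current table
        x, y = sorted(pair)
        merges.append((x, y, d[pair]))
        new = x + y               # e.g. "A" and "C" become "AC"
        for other in size:
            if other in (x, y):
                continue
            dx = d.pop(frozenset({x, other}))
            dy = d.pop(frozenset({y, other}))
            # Size-weighted arithmetic mean of the two old distances.
            d[frozenset({new, other})] = (
                size[x] * dx + size[y] * dy
            ) / (size[x] + size[y])
        del d[pair]
        size[new] = size.pop(x) + size.pop(y)
    return merges


pairs = {
    # Counts quoted in the walkthrough.
    ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
    ("B", "C"): 7, ("C", "D"): 4, ("D", "E"): 4,
    # ASSUMED counts, not from the video; chosen to match the stated averages.
    ("A", "E"): 5, ("C", "E"): 5, ("B", "D"): 5, ("B", "E"): 6,
}
dist = {frozenset(p): v for p, v in pairs.items()}

merges = upgma(["A", "B", "C", "D", "E"], dist)
for group1, group2, distance in merges:
    print(group1, group2, distance)
```

With these inputs, the first three merges are A with C at 3, D with E at 4, and AC with DE at 5, matching the groups found step by step above. The sketch names the second group DE rather than ED because it sorts the labels.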
This means that AC is most closely related to ED. The final sequence is B. The placement of sequence B is determined by its values: it has a value of 6 against AC and a 5.