Welcome back. We're talking about data science. This is an intro overview lecture series on what data science is, how you can use it, and what its aspects are. One thing I think is really important to emphasize is that data science is not new: we've been doing data science as humans for hundreds, even thousands, of years, collecting data and modeling the world through that data. I think data science as a term means different things to different people. There's what I like to think of as data-intensive science, data-intensive engineering, or data-driven
inquiry, and that's science that you do based on data. Astronomy is a great example of data-intensive science. The phrase "data science," as I think of it, refers to an emerging scientific discipline that is motivated by data-intensive science, but it's really the science of how you handle data: how you collect, clean, store, visualize, and model with data. So it's a little confusing: you have data-driven science, and that motivates this whole new field of science and engineering called data science. I'm going to use them interchangeably,
but I just want to deconflict those two terms early on. Astronomy is a great example, and I want to walk you through a very interesting piece of history that I love, about Tycho Brahe, Kepler, and Newton, to give some idea of what data science looks like in a historical context. This is Tycho Brahe, the great Danish astronomer who collected the rich data set of the motion of planets and stars that was critical in Kepler's discovery of the elliptical motion of the planets. Tycho Brahe noticed
inconsistencies between the models of the time, the old laws of how the planets would move, and what he observed. There was a predicted conjunction of planets, and it didn't agree with the models to his satisfaction. So he realized, I think as a teenager, that he needed to collect rigorous, clean data, store it in a systematic format, and make a science out of the data collection of planets and stars. He dedicated his life to this. He had an island
between Copenhagen and Sweden. I don't know if you can see it here, but this is his science island of Hven, where he collected all of this rich data. And he guarded this data: this was his life's work, and he knew how much value it had. It turns out Kepler didn't really have full access to the data until Tycho Brahe passed away. So both of them knew the value of the data in moving the theory of planetary motion forward, and this was a critical piece in Kepler's famous law of the
elliptic motion of planets. Fun fact about Tycho, a very interesting character (I encourage you to read more about him): he lost the tip of his nose in a duel when he was a young man, arguing about who was the better mathematician. On his science island he had a pet moose, which was apparently very fond of beer and would entertain his guests by drinking a tremendous amount of it. So Tycho Brahe is a really interesting guy; you can only imagine what his personality would be like. He made his life's work
of very, very careful observations, which changed the world forever through those who came after. I think this data-intensive inquiry also laid the foundation for what Newton would go on to do. Kepler described the elliptic motion of the planets, and Newton explained why the planets move in these ellipses. There's a great quote by Isaac Newton: when he was explaining one of his theories, he said that it was because of a preponderance of the evidence, and that's another way of saying the data
supported his hypothesis or his theory. Something else I think is really fascinating, which we should think about as data scientists, modelers, and machine learning people today, and which I talk a lot about with my colleague Nathan Kutz, is the difference between Kepler and Newton. Kepler built a model of how things work, the way the planets move on these elliptical orbits. I think of this as an attractor: how the solar system works in these elliptical orbits. That theory was useful, but
it wouldn't have allowed us to develop the Apollo program and put people on the moon. What Newton did was a generalization: he distilled the abstract physical principle that gave rise to elliptic orbits, but in a way that could tell you what would happen if you left your elliptical orbit, that is, if you pushed the system away from the way it has always behaved and we have always observed it. His theory truly generalized: F = ma generalized in a way that allowed us to land
people on the moon, which is really a huge achievement. We talk about this a lot: most machine learning algorithms today, I would say, do what Kepler did. They describe the world as we observe it, as the data describes it, and it takes an epiphany, a great leap, to get a model that truly generalizes like what Newton did. So we should be aspiring to make our algorithms go from Kepler to Newton. That's a worthwhile goal; it's also very, very challenging. Okay, so data
science has been around for a long time. There's a really interesting modern book called The Fourth Paradigm: Data-Intensive Scientific Discovery, which describes this progression from theory and analytic mathematics, to experiments (collecting data from running experiments to test hypotheses), to simulations, numerics, and computation (the digital, silicon age), and now this fourth paradigm of data-driven inquiry and scientific discovery. It's really interesting how this complements the others: it doesn't displace theory, numerics, or experiments. These generate massive amounts of data, and
we need a science that ties these together, just like simulations didn't displace experiments; they complement each other. Okay, so that's just a very high-level overview. I will point out, for those of you who are more interested in the nuts and bolts of machine learning and modeling, and the linear algebra and optimization underlying these data science algorithms, that I recommend a book my colleague Nathan Kutz and I just wrote, Data-Driven Science and Engineering, with Cambridge University Press, and we have a website, databookuw.com, where we filmed all of our lectures
for all of the chapters and sections. So, for example, you can go to our website, find the different topics you're interested in, and see our YouTube videos. If you're interested, hopefully that's a resource for getting into the more nitty-gritty mathematical aspects. Okay, thank you.