Vector data, raster data, GeoJSON, shapefile, GeoParquet, Zarr, cloud-optimized GeoTIFF, GeoTIFF, CSV, GeoPandas, Rasterio, Shapely, PySAL, lidar, point cloud, GDAL, ogr2ogr, Fiona, spatial SQL, PostGIS, BigQuery, Redshift, Snowflake, Databricks, Tippecanoe, map tiles, geobeam, H3, S2, quadkey, geographically weighted regression, exploratory spatial data analysis, machine learning, location allocation, network analytics, QGIS, kepler.gl, Leaflet, deck.gl, OpenLayers, CARTO, APIs, geocoding, routing, JavaScript, Python, React, command line, R, DEMs, Jupyter notebooks, spatial feature engineering, spatial data science, geospatial data engineering. That was overwhelming, and yes, modern GIS can seem very overwhelming. It's a brand new
territory, with different types of tools, technologies, libraries, and languages that may seem very confusing. But today, what I hope to do is break down the most important things to learn and get started with if you want to take a journey into modern GIS in 2022. So with that, let's jump right in.

Hi everyone, my name is Matt Forrest, and today we're going to learn the most important tools that I think you need to get started with modern GIS in 2022. This is not a comprehensive list, but I think these are the things that
if you start with, you can build a really good foundation and grow your skill set in multiple areas, depending on where you want to go.

So the first area to focus on is some fundamental GIS concepts. If you've gone through a GIS or geospatial education, you've probably studied these things before: spatial joins, how spatial features touch each other and interact, measuring distances, nearest neighbors, and so on. This happens to be one of the things I do most in my job: using spatial data and combining it using
spatial interactions and spatial relationships. So this is probably the number one topic I'd focus on and take away from a traditional GIS education. How you do this with different technologies we'll talk about a little later in the video. A really important example of this is exploratory spatial data analysis, or ESDA. There's a Python package from PySAL that you can use to implement it. Effectively, what this does is perform an analysis of spatial autocorrelation, or Moran's I, and it looks at a specific feature and the features
surrounding it, the ones that touch it or have other spatial relationships to it. There are lots of different ways to define this; in the package they call this spatial weights, and you can take a look at that in this link over here. Using this, you can evaluate spatial clusters or spatial relationships with statistical methods, and that same analysis can apply to lots of different areas. So really understanding how to use spatial relationships is a very important concept in modern GIS.

Other concepts I would definitely take a look at are using other data to
create geospatial features. Things like geocoding addresses are really important, and can get really complex if the address data is not clean, clear, or consistent. Also important is understanding the concept of trade areas: where people might go to certain places or interact with certain things. You can do this multiple ways: you can use a simple radius, you can use an isochrone or drive time, or you can use human mobility data to calculate these. But having an understanding of how you define a trade area and how the features interact with that data is
very important as well. The last piece I would work on understanding is network analysis. Maybe it's a road network, maybe it's rivers, or other features like utility lines; these all fall under network analytics. There are great tools for this as well, and we'll talk about that a little more later, but understanding the basic concepts of network analytics is also really important.

Now, you'll notice that I didn't cover some concepts from a traditional geography education, things like map projections and some basic cartographic concepts. As it relates to map projections, I would
say the majority of analysis really happens in a handful of projections, so having some basic engineering skills to get data into those projections is going to be important, but understanding all the dynamics of projections is not as critical in modern GIS. As it relates to cartographic design, you might want to know how to use Jenks breaks or equal intervals or some other cartographic design concepts, but it's not as important in modern GIS. There are great tools to do this, and we'll talk about that in the visualization section later in this video.

Now, if you've worked
with geospatial data before, you know it comes in lots of different shapes, sizes, and, most importantly, formats. And I'm not just talking about the difference between raster and vector data; I'm really talking about all the different file formats that you can use within those two data types. We all know the common ones like shapefiles, GeoJSON, CSVs, even KMLs or other file types, but there's a whole list of almost 200 different file types across raster and vector data sets that you can store geospatial data in. There are a number of reasons for this, and we
can maybe talk about that a different time, but you need to be able to move that data into the file type that works best for you. What's the best tool to do that? GDAL. GDAL is a base library, mostly written in C++, that's incredibly fast and has bindings for languages like Python and others so you can use it in different settings. It's embedded in many of the tools you use, such as QGIS, GeoPandas, Fiona, and many more. It is one of the most critical pieces of geospatial infrastructure that exists today.

Now, how
do you make use of this? There are multiple ways: you can use it as a Python library, you can use it in one of those other tools that I mentioned, or, as I use it quite often, you can use it directly on the command line of your computer. There are some installation instructions, depending on your operating system, which you can check out, but I highly recommend putting some work into this. Why? It can solve a wide array of data engineering problems for geospatial data. Let's say you need to change a projection: use GDAL. Do you need
to rasterize or vectorize some data? Use GDAL. Do you need to manipulate geometries on a very basic level? You can use GDAL. Do you need to extract some data and add a different geometry type column, maybe as GeoJSON, to import into a different tool? Guess what, you can use GDAL. GDAL covers such a wide range of use cases for moving and manipulating geospatial data that it is just worth investing some good time into practicing with that library and using it for these basic geospatial data engineering problems.

Now, there are some other basic geospatial data engineering concepts
that I think are important: things like using unions and aggregating data, and also manipulating data to move it into different systems like a database or data warehouse. GDAL can help with a lot of this, so keep that in mind as you're going along. We'll also talk about the importance of spatial SQL in geospatial data engineering in a later section, but for the most part GDAL is going to be your core toolkit for solving a lot of these problems.

The last piece I wanted to touch on is spatial indices. A spatial index is
effectively a geometry, but one that is stored in a string or numeric format. Some examples of this are H3 cells, S2 cells, and quadbins or quad integers. These are really important for storing large point data sets and making it very easy to quickly aggregate and then visualize that data in other tools. As you see in this map, as I zoom in and out, the cells and the aggregations change. The data underlying this map is actually point data, so what I'm doing is effectively joining that
and using the spatial indexing system, in this case H3, to aggregate this data. But when I'm visualizing the map, I'm not actually using a geometry; I'm just using that H3 string index to join the data on the front end and visualize it that way. It's incredibly efficient for storing and querying data, and we'll talk more about that in a future video, but this is one concept I want to make sure you know about. You can use spatial indices in lots of different ways. There are libraries written in Python, JavaScript, and other languages that you
can use to create spatial indices. You can also do this through spatial SQL, directly on a database or a data warehouse.

The next toolkit that I highly recommend using is QGIS. QGIS is one of the most indispensable and most important pieces of geospatial infrastructure for modern GIS. Now yes, it is a desktop tool, and you might be expecting me to talk about programming and computer languages and all these different things, but QGIS embeds so much of that and makes it very easy to start using and introducing
yourself to these different components. On top of that, it has a rich ecosystem of plugins and other tools that you can get started with right away. If you come from more of a traditional GIS background, this is a very easy introduction to using modern GIS tools and concepts, and we'll talk a little more about why that is in a minute. I've been using QGIS since around 2010, and it was a much different experience back then in terms of downloading the software and using it. Fortunately, now in 2022, it's very easy to use and get
started. There are releases, communities, and support infrastructure for using and learning this tool. One way I've talked about learning modern GIS in the past is through a crawl, walk, run, and then sprint process. QGIS is the first step of that process, and it introduces you to a lot of these different concepts. One thing I love about QGIS is that when you download it, it installs GDAL for you to use, and it embeds GDAL in a lot of the data abstraction processes and data engineering tools within the platform itself. There's so much you can do with QGIS
that it's hard to mention it all here. You can create models, visualize data, transform data, perform spatial analyses, and understand spatial relationships. You can even create web maps right from QGIS. Like I said, this helps you get started and build some of that basic knowledge in a very familiar and approachable interface. I love the fact that in QGIS you can connect to a spatial database, PostGIS or others, and start visualizing it in the tool. One of the toughest parts about getting started with spatial SQL is that sometimes, if you're just using a pure database or data
warehouse, you can't actually see the results of what you're producing with the SQL you're writing. QGIS solves that problem by creating a direct connection to the database or data warehouse, which allows you to visualize the data as you write the queries. This is really important, and it's a great toolkit to get started with at a really low barrier to entry.

It seems like we almost can't talk about modern GIS or geospatial data science without talking about Python. Python is one of the fastest growing languages and one of the most in-demand job skills, whether you
look inside GIS or outside of it, in areas like data science or data analytics. There are a few core libraries that I think are critically important to modern GIS. The first is GeoPandas. GeoPandas has your common geospatial or GIS toolkit to perform things like spatial joins and measure areas, distances, intersections, and so on. It has a lot of the core functionality for reading files, and it uses a common data format, the GeoDataFrame. The comparable structure in the non-geospatial toolkit, pandas, is the DataFrame, so you'll be
picking up skills in a generic Python data analytics toolkit, pandas, and that DataFrame knowledge is transferable across lots of different libraries. GeoPandas is actually very easy to learn. Their documentation is really fantastic and has easy-to-access tools: you can click to open any of the documentation pages directly in a Binder notebook and get started right now, just opening it up, trying it, and doing some basic analytics and visualization right from GeoPandas.

The second toolkit is one that's really starting to gain some traction, and I highly recommend checking
it out. It's called leafmap, and it's developed by a fantastic professor from the University of Tennessee, Professor Qiusheng Wu. He has built, I think, one of the most comprehensive geospatial toolkits in Python that exists today. leafmap has lots of different tools and incorporates common visualization libraries. You can use data coming from GeoDataFrames, from PostGIS databases, from just about anywhere. You can perform vector analytics, publish maps, and perform raster analytics using WhiteboxTools and many other toolkits. It's really a full, comprehensive modern GIS toolkit, all written in Python,
ready to use in notebooks. My favorite part about all of this, though, has to be the examples that he's put together. There's a comprehensive examples library to do just about anything you might want to do: you can pull the example and start using it today. Like I said, with a combination of GeoPandas and leafmap you can replicate, in a Python environment, just about any geospatial analytics that you might want to perform in a tool like QGIS or any other desktop GIS. Going back to the crawl, walk, run, sprint workflow that I described
earlier: starting with QGIS, you can really get a base in modern GIS and some of the tools you can use there, and then make that next leap into Python with a combination of GeoPandas, leafmap, and other toolkits. You can really start to build that flow: learn what you're doing in QGIS, translate that to Python, get some of the Python tools under your belt, and then make the next leap from there, which we'll talk about in the next section.

There are two other components within Python for geospatial that I would recommend. The first is
PySAL. If you're doing anything related to spatial data science and really want to get into statistical modeling and moving from the where to the why things happen, PySAL is absolutely the best toolkit to do this. There are plenty of examples, and you can check those out in this video that I posted earlier. The second, if you're looking at doing any sort of geospatial data engineering or want to get under the hood of what's happening with these libraries, is a set of lower-level libraries. I would recommend checking out Rasterio for reading and manipulating raster data,
Fiona for reading and manipulating vector data, and then the last one is Shapely, for editing, changing, and manipulating geometries. These are really core structural components of geospatial Python, and just about every library out there uses them in one way or another. One extra note: one layer underneath many of these libraries is our friend GDAL. GDAL, like I said, is one of those core pieces of modern GIS infrastructure that finds itself in just about any tool currently used within the modern GIS stack.

Why is Python really
important, and why do I push it so much as the second step in the process? One reason is that you can do so much more with Python, including geospatial data engineering to build data pipelines. You can use it to read data from APIs, create your own APIs, and build your own scripts to manipulate data and make processes really repeatable. It has so many uses beyond the core of data analytics, geospatial data analysis, or even spatial data science that it's just a good tool to have in your tool belt. There are lots of other libraries
that I love and recommend in geospatial, libraries like OSMnx and other network analytics tools. We'll talk about those more, but keep in mind that I think a lot of these actually feed into leafmap, which makes it one of those accessible toolkits for reading data from lots of different sources.

If you know me at all, you know I'm a very big proponent of spatial SQL. SQL is one of the first tools that I really started to use to scale up my modern GIS toolkit, and I think it's one of the fundamental pieces that helps
you make that next leap. Whether you're using more data, need to organize your data in a consistent way, or want to add a lot of scale and speed to your workflows, spatial SQL is the place to do that. I recommend adding spatial SQL as the third step in the process. If you think back to the second step, Python, you're translating those fundamentals into a programming or coding environment. Spatial SQL is a different type of programming, and you need some skills in computer engineering to stand up a database, or if you
want to use an existing product, you can do that as well. But SQL adds a few special pieces to your workflow that are really going to help you scale and grow. When do you need to use spatial SQL? It's probably around the time when your data is getting a little unorganized and you have too many files floating around, or the data is getting so big that processes are taking too long to run in a local environment or a notebook. At that point in time, you should start thinking about looking at a spatial database
or data warehouse. PostGIS is the most common one to set up; you can set it up on your local machine, and there are a couple of different ways to do that. In addition, if your data is getting very large, in terms of millions or even billions of rows or complex geometries, I would start evaluating a data warehouse, something like BigQuery, Snowflake, Amazon Redshift, or even Databricks. These are great cloud-hosted tools, and you can get started with those cloud services at a very low barrier to entry.

So
why spatial SQL? What does it actually help you do? Well, there are a few different things, and you can check out this video on exactly what the use cases are and how it helps you scale. But if your data is getting much larger, if you're querying more data, and you need to create new features or do really large spatial joins, SQL is going to be the best place to do that. Having your data living right alongside the code that's running on it is very effective for creating scalable workflows, and you can decrease the time you're actually using
to perform these spatial analyses. Going back to the beginning of the video, spatial relationship analysis is very, very fast in a spatial SQL database or data warehouse. This is one of the number one reasons I recommend going to a database or data warehouse. Outside of that, what does spatial SQL help you do? You can do spatial feature engineering to create features using multiple data sets in your spatial database or data warehouse. You can also create tile sets and do visualization from the database or data warehouse. If you have lots of data
and it's better to visualize that in tiles rather than raw files, that's a great place to do that. Data engineering is also much faster, and there are so many supporting functions for geospatial data engineering in a spatial SQL environment as well. I think this is one of the most important things I see people using spatial SQL for in a common workflow.

So that wraps up the third step of the process: using that database to help you scale and move into that run workflow. Now we're talking about moving into the sprint workflow. Any of these
first three steps you can run on your computer or laptop today. If you're evolving into much larger data sets, more complex data, or data that's being added to your system on a frequent basis, it might be time to think about using the cloud. The cloud is the last piece of a modern GIS workflow. It is not something that you must use; you can do all these pieces locally on your machine. That's one of the core principles of modern GIS: anything that you can do in the cloud, you
can do locally, and they're interoperable. But when you're ready, the cloud is there for you to use. What I love about the cloud is that it helps you scale. You can do things on machines that are more powerful than your local laptop or computer, and you can share that with people around the world as well. But I think the most important thing about the cloud is using cloud-native workflows. These might be known as serverless workflows, and they are basically tools that let you spin up and use massive computing power just in the
scope of the operation you want to perform. So what are the key cloud components that you need for modern GIS? My top three would be, first, databases or data warehouses: places to store, manage, and query your data really efficiently, hosted in the cloud. The second would be cloud storage systems, where you can host and keep your files in an organized fashion. This is commonly known as a data lake, but effectively it's a giant storage system to keep all your files organized. The next would be ETL
or ELT tools to load data in. These could be products that batch or stream data into a database or data warehouse, but provide that serverless layer I was talking about earlier to make that really efficient and effective. There are lots of tools to do this, but there's definitely not time to cover them in this video today. Some of the other tools that I would take a look at are notebook services, where you can run hosted Jupyter notebooks, APIs for maps or mapping applications, and several other components that are helpful within
the cloud stack as well. The last thing I would note is hosting and using earth observation or satellite imagery data. You can't mention that without mentioning Google Earth Engine, which is a really great tool to access, analyze, and run models on top of historic and current imagery within the Google Earth system.

So now that I've covered those four key areas for the crawl, walk, run, and sprint, I want to cover a few other areas that might be important if you're studying different topics in modern GIS. Using a command line
for the first time can seem quite overwhelming, and you might think you need a background in computer science, but a complete computer science degree or background isn't necessary for modern GIS. That said, those concepts don't hurt, and there are some great courses out there. I've taken the first part of the Harvard CS50 course and really enjoyed it. It gives you a fundamental understanding of how computers and programming work without going too deep or too intense into the topic. The command line is obviously a great tool to know and use, and one I definitely recommend
investing some time in; there are plenty of great tutorials out there to get you started. The second is basic data structures. Knowing data types, things like integers, booleans, strings, dates, and of course geometries, is really helpful, but so is understanding the basic structure of other data types: things like JSON, dictionaries, lists, and arrays. All of these are really helpful as you start to use more advanced data structures within your modern GIS work. I really recommend spending some time on this, but being an expert is not necessary either.

Speaking of data, there's been a
surge in creating cloud-native data formats. This includes vector data, with file formats like GeoParquet, and raster data, with file formats like cloud-optimized GeoTIFFs and Zarr, among others. I would definitely recommend checking this out. This is a very new and advancing topic, but this video by the Open Geospatial Consortium, which is actually a recording of a complete conference, shows some of the advances in the space as well as some of the use cases.

Now, while it pains me, as I have a traditional geography and cartography background,
to say that visualization might not be at the top of the list for modern GIS skills, the field has advanced quite a bit, and with a lot of out-of-the-box tools you can build some really great cartography and styling into your maps. Tools like kepler.gl or CARTO have really great styling patterns that you can use to create choropleth or categorical maps, and it's not always necessary to know what a Jenks break is versus an equal interval versus others. So while cartography is of course really important for communicating what you've done, you
have a lot of tools that help you do this today. If you're doing geospatial application development, of course you're going to need some JavaScript, and I also recommend spending some time learning React and Redux, as they give you a basic framework to build repeatable components if you're building large-scale applications. To advance your JavaScript skills, if you have some basic understanding of JavaScript, I really recommend this course on Udemy. This is the one I used to really advance my JavaScript skills and leapfrog into really understanding what I was
writing, not just following some tutorial. The second is React. I love React, and I think it's one of the most valuable things I've picked up over the last couple of years. It's really helped me build scalable applications and make my work way more repeatable, and it's well worth the investment. This is another great course from Udemy that I recommend, which actually helped me learn React and Redux, so definitely take a look at this one as well.

The last topic, in terms of big data visualization: I have to mention tiling. Map tiles
are the services that we use to render data on the web very efficiently. Effectively every basemap service, Apple Maps, Google Maps, uses a tiling service to render that data. If you don't know anything about map tiles, I wrote a blog post that talks about the history of mapping and mapping online, so you can take a look at that to learn more on this topic. The good news is that you can create tiles locally, you can create tiles in QGIS, and you can create tiles in a spatial database or a spatial
warehouse. There are lots of different ways to do this today. So if you're going to be doing visualization and application development with really large data, I definitely recommend that you invest some time in understanding how tiles work and how to use them.

So that's it for today. These are my base recommendations on how to learn modern GIS in 2022. I hope to update this video in the future to add new recommendations based on the advances taking place all the time in the modern GIS and geospatial spaces, but for now we'll leave it there. I
really appreciate it thanks for taking the time and we will see you on another video vector data raster data geojson shapefile geoparkette zar cloud optimized geotiff geotiff csv geopandas rasterio shapley pysal lidar point cloud gdal ogre to ogre fiona facial sql post gis bigquery redshift snowflake databricks tippecanoe map tiles geo beam h3 s2 quad key geographic weighted regression exploratory spatial data Analysis machine learning location allocation network analytics qgis kepler gl leaflet deck gl open layers cardo apis geocoding routing javascript python react command line r dems jupiter notebooks spatial feature engineering spatial data science geospatial data
engineering that was overwhelming and yes modern gis can seem very overwhelming it's a brand new territory with different types of tools technologies libraries and Languages that may seem very confusing but today what i hope to do is break down what are the most important things to learn and get started with if you want to take a journey into modern gis in 2022 so with that let's jump right in hi everyone my name is matt forrest and today we're going to learn the most important tools that i think you need to get started with modern gis
in 2022 this is not a comprehensive list but i think these are the things that if you start With you can start to build a really good foundation and grow your skill set in multiple different areas depending on where you want to go so the first area to focus on is some fundamental gis concepts now if you've gone through a gis or geospatial education you've probably studied these things before things such as spatial joins how spatial features touch each other and interact measuring distances nearest neighbors so on and so forth this happens to be one
Of the things i do most of my job is using spatial data combining it using spatial interactions and spatial relationships so this is probably the number one topic i'd focus on and i pull away from a traditional gis education now how you do this with different technologies we'll talk a little bit more about later in the video but that's one place just a really important example of this is exploratory spatial data analysis or esda there's a python Package from pycell where you can actually use and implement that effectively what this does is perform an analysis
called spatial autocorrelation or more enzyme and it looks at a specific feature and the features surrounding it the ones that touch or have different spatial relationships there's lots of different ways you can actually do this in the package they call this spatial weights and you can take a look at that in this link over Here now using this you can actually evaluate the spatial clusters or spatial relationships using some statistical methods and that same analysis can apply to lots of different areas so really understanding how to use spatial relationships is a very important concept in
modern gis other concepts i would definitely take a look at are using other data to create geospatial Features things like geocoding addresses is really important and can get really complex if the address state is not clean clear or consistent also understanding the concept of trade areas where people might go to certain places or interact with certain things now you can do this multiple different ways you can use a simple radius you could use a concept of an isochrone or a drive time or you can use human mobility data to calculate these but having an Understanding
of how you define a trade area and how the features interact with that data is very important as well the last piece i would also work on understanding is network analysis so understanding maybe it's a road network maybe it's things like rivers other features like utility lines all encompass network analytics there's great tools for this as well we'll talk about that a little bit more later but understanding the basic concepts of Network analytics is also really important now you'll notice that i didn't cover some concepts from a traditional geography education things like map projections some
basic cartographic concepts and all these different pieces as it relates to map projections i would say the majority of analysis really happens in a handful of projections so having some basic engineering to get into those projections is going to be important but Understanding all the dynamics of projections is not as critical in modern gis as it relates to cartographic design you might want to know how to use a jenks break or equal intervals or some other geographic or cartographic design concepts but it's not as important in modern gis there's great tools to do this and
we'll talk about that in the visualization section later in this video now if you've worked with geospatial data before you know it comes In lots of different shapes sizes and most importantly formats and i'm not just talking about the difference between raster or vector data i'm really talking about all the different file formats that you can use in those two data types we all know the common ones like shape files geojson csvs even into kml's or other file types but there's a whole list almost 200 different file types across raster and vector data sets that
you can actually store geospatial data in there's a number of reasons for this and we can maybe talk about that a different time but you need to be able to move that into the file type that works best for you what's the best tool to do that gdal gdal is a base library that's mostly written in c++ that's incredibly fast and has bindings for different languages like python or others to actually use in different settings it's embedded in many of the tools you use such as qgis geopandas fiona and many many more
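To make "moving data into the file type that works best for you" concrete, here is a minimal hand-rolled sketch of one such conversion: CSV rows with coordinate columns turned into a GeoJSON FeatureCollection using only the Python standard library. It is not GDAL and not the workflow from the video; in practice GDAL's ogr2ogr utility covers this and many other formats. The function name, the `lon`/`lat` column names, and the sample row are assumptions made up for illustration.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_text, lon_field="lon", lat_field="lat"):
    """Convert CSV rows with lon/lat columns into a GeoJSON FeatureCollection dict.

    Hypothetical helper: the column names are assumptions, not a standard.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # GeoJSON positions are ordered [longitude, latitude]
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # every remaining column becomes an attribute of the feature
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


# made-up sample data
sample = "name,lon,lat\ncity hall,-74.006,40.7128\n"
collection = csv_points_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

The same idea (read features from one format, write them to another) is what a conversion tool does for you across far more formats, geometry types, and edge cases than this sketch handles.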
gdal is one of the most critical pieces of geospatial infrastructure that exists today now how do you make use of this there's multiple ways you can use it through a python library you can use it in one of those other tools that i mentioned or as i use it quite often you can use it directly on the command line of your computer there's some specific installation instructions depending on your operating system which you can check out but i highly recommend putting some work into this why it can solve a wide array of data engineering problems
for geospatial data let's say you need to change a projection use gdal do you need to rasterize or vectorize some data use gdal do you need to manipulate geometries on a very base level you can use gdal do you need to abstract some data to add a different geometry type column maybe in geojson to import into a different tool guess what you can use gdal gdal covers so many wide ranges of use cases for moving and manipulating geospatial data that it's just worth investing some good time into practicing with that library and using it for
these base geospatial data engineering problems now there are some other basic geospatial data engineering concepts that i think are important things like using unions and aggregating data also using and manipulating data to move into different systems like a database or data warehouse gdal can help with a lot of this so keep that in mind as you're going along and we'll also talk about the importance of spatial sql in geospatial data engineering in a later section but for the most part gdal is going to be your core toolkit for doing and solving a lot of these problems
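Changing a projection is the first GDAL use case named above. As a rough illustration of what such a change does to the numbers, the sketch below applies the standard spherical Web Mercator formulas to go from longitude/latitude in degrees (EPSG:4326) to meters (EPSG:3857) and back. This is only the math for one common pair of projections, written by hand; it is not GDAL's API, and the function names are hypothetical.

```python
import math

# radius of the sphere used by Web Mercator, in meters (WGS84 semi-major axis)
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Forward transform: degrees (EPSG:4326) to meters (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse transform: meters (EPSG:3857) back to degrees (EPSG:4326)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# made-up sample point, roughly lower manhattan
x, y = lonlat_to_web_mercator(-74.006, 40.7128)
lon, lat = web_mercator_to_lonlat(x, y)
print(x, y)
print(lon, lat)
```

A library such as GDAL handles this for thousands of coordinate systems, including datum shifts that a two-line formula like this one ignores, which is why it is worth learning the tool rather than writing transforms by hand.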
the last piece i wanted to touch on was spatial indices now a spatial index is effectively a geometry that is stored in a string or a numeric format now some examples of this are h3 cells s2 cells and quadbins or quad integers these are really important for storing large point data sets and making it very easy to quickly aggregate and then visualize that data in other different tools now as you see in this map as i zoom in and out you can see the cells and the aggregations are changing the data
that's underlying this map is actually point data so what i'm doing is effectively joining that and using the spatial indexing systems in this case h3 to actually aggregate this data but when i'm visualizing the map i actually am not using a geometry i'm just using that h3 string index to join that data to the front end and visualize it that way it's incredibly efficient for storing and querying data and we'll talk more about that in a future video but this is one concept i want to make sure you know about now you can use spatial
indices lots of different ways there's libraries written in python and javascript and others that you can actually use and leverage to create spatial indices you can also use this through spatial sql and can do that directly on a database or a data warehouse my next toolkit that i highly recommend using is qgis qgis is one of the most indispensable and most important pieces of geospatial infrastructure for modern gis now yes it is a desktop tool and you might be expecting me to talk about programming and computer languages and all these different things but qgis
embeds and has so much of that and is very easy to start using and introducing yourself to these different components on top of that it has a rich ecosystem of plugins and other tools that you can get started with right away if you come from more of a traditional gis background this is a very easy introduction into using modern gis tools and concepts and we'll talk a little bit more about why that is in a minute i've been using qgis since around 2010 and it was a much different experience back then in terms of downloading
the software and using it fortunately now in 2022 it's very easy to get started there's releases and communities and support infrastructure for actually using and learning this tool one way i've talked about learning modern gis in the past is through a crawl walk run and then sprint process qgis is the first step of that process that introduces you to a lot of these different concepts now one thing i love about qgis is when you download it it installs gdal for you to use and embeds gdal in a lot of the data abstraction processes
and data engineering tools within the platform itself there's so much you can do with qgis that it's hard to mention it all here you can create models visualize data transform data perform spatial analyses understand spatial relationships you can even create web maps right from qgis like i said this helps you get started and build some of that basic knowledge in a very familiar and approachable interface i love the fact that in qgis you can connect a spatial database using postgis or others and start visualizing it in the tool one of the toughest parts about
getting started with spatial sql is sometimes if you're just using a pure database or data warehouse you can't actually see the results of what you're producing in the sql you're writing qgis solves that problem by creating a direct connection to the database or the data warehouse and allows you to visualize that data as you write the queries this is really important and a great toolkit to get started at a really low barrier to entry so it seems like we almost can't talk about modern gis or geospatial data science without talking about python python is one
of the fastest growing languages and is one of the most in-demand job skills whether you look inside gis or outside of it into other areas like data science or data analytics so there's a few core libraries that i think are critically important to modern gis one of the first is geopandas geopandas has your common geospatial or gis toolkit to perform different things like spatial joins measure areas distances intersections so on and so forth it has a lot of the core functionality for reading files and uses a common data format which is the geodataframe the comparable
version of that in the non-geospatial toolkit of pandas is the dataframe so you'll be picking up skills to use a generic data analytics toolkit in python with pandas and that dataframe is transferable across lots of different libraries geopandas is actually very easy to learn their documentation is really really fantastic it has easy to access tools you can actually click to open up any of their documentation pages directly with a binder notebook and you can get started right now just opening it up trying it doing some basic analytics and visualization right from geopandas so the
second toolkit is one that's really starting to gain some traction and i highly recommend checking it out it's called leafmap and it's developed by a fantastic professor from the university of tennessee professor qiusheng wu and he has built i think one of the most comprehensive geospatial toolkits in python that exists today leafmap has lots of different tools and incorporates common visualization libraries you can use data coming from geodataframes from postgis databases from just about anywhere you can perform vector analytics publish maps and you can perform raster analytics using whiteboxtools
and many other toolkits it's really a full comprehensive modern gis toolkit all written in python ready to use in notebooks my favorite part about all of this though has to be the examples that he's put together there's a comprehensive examples library to do just about anything you might want to do you can pull the example get started and start using it today like i said with a combination of geopandas plus leafmap you can basically replicate just about any geospatial analytics that you might want to perform in a tool like qgis or any other desktop
in a python environment going back to the crawl walk run sprint workflow that i described earlier starting with qgis you can really get a base in modern gis and some of the tools you can use there and then making that next leap into python with a combination of geopandas leafmap and other toolkits you can really start to build that flow learn what you're doing in qgis translate that to python get some of the python tools under your belt and then make the next leap from there which we'll talk about in the next section the
two other components within python for geospatial that i would recommend the first is pysal so if you're doing anything related to spatial data science and really want to get into that statistical modeling and looking at moving from the where to the why things happen pysal is absolutely the best toolkit to do this there's plenty of examples and you can check those out in this video that i posted earlier if you're looking at doing any sort of geospatial data engineering or want to get under the hood of what's happening with these libraries
definitely check them out i would recommend checking out rasterio for reading and manipulating raster data fiona for looking at you know vector data and manipulating that type of data set and then the last one is shapely for editing changing manipulating geometries these are really core functional structural components of geospatial python and just about every library out there uses these in one way or another and one extra note one layer underneath all these libraries is our friend gdal gdal like i said is one of those core pieces of modern gis infrastructure that finds itself in just
about any tool that is currently used within the modern gis stack why is python really important and why do i push it so much as this second step in the process one is that you can do so much more with python including geospatial data engineering to build data pipelines you can use it to read data from apis and actually create your own apis and build your own scripts to manipulate data and make processes really repeatable it has so many uses beyond just the core of data analytics or geospatial data analysis or even spatial data science
that it's just a good tool to have in your tool belt there's lots of other libraries that i love and recommend in geospatial libraries like osmnx and other network analytics tools we'll talk about those more but keep in mind i think a lot of these actually read into leafmap and it makes it one of those accessible toolkits to read data from lots of different sources if you know me at all you know i'm a very big proponent of spatial sql sql is one of the first tools that i really started to use to scale
up my modern gis toolkit and i think it's one of the fundamental pieces that helps you make that next leap whether you're using more data need to organize your data in a consistent way or you want to have a lot of scale and speed to add to your workflows spatial sql is the place to do that now i recommend adding spatial sql as the third step in the process if you think back to the second step of python you're translating those fundamentals into a programming or coding environment spatial sql is a different type of programming
and you need some skills in computer engineering to stand up a database or if you want to use an existing product do that as well but sql adds a few special pieces to your workflow that are really going to help you scale and grow when you need to use spatial sql it's probably around the time when your data is getting a little bit unorganized you have too many files floating around or the data is getting too big that it's really hard or the processes are taking too long to run in a local environment or a
notebook at that point in time you should start thinking about looking at a spatial database or data warehouse something like postgis is the most common for setting up and you can set that up on your local machine and there's a couple different ways to do that in addition if your data is getting very large in terms of the millions or even billions of rows or complex geometries i would start evaluating a data warehouse something like bigquery snowflake amazon redshift or even databricks these are great tools that are cloud hosted and you can start
at a very low barrier to entry to get started and use those cloud services so why spatial sql what does it actually help you do well there's a few different things and you can check out this video on exactly what the use cases are and how that helps you scale but if your data is getting much larger if you're querying more data and you need to maybe create new features or do really large spatial joins sql is going to be the best place to do that having your data living right alongside the code that's running it
is very effective for creating scalable workflows and you can decrease the time you're actually using to perform these spatial analyses going back to the beginning of the video spatial relationship analysis is very very fast in a spatial sql database or data warehouse this is one of the number one reasons i recommend going to a database or data warehouse outside of that what does spatial sql help you do you can do spatial feature engineering to create different features using multiple different data sets in your spatial database or data warehouse you can actually create tile sets and
do visualization from the database or data warehouse too if you have lots of different data and it's better to visualize that in tiles rather than raw files that's a great place to do that data engineering is also much faster and there's so many supporting functions to do geospatial data engineering in a spatial sql environment as well i think this is one of the most important pieces in a common workflow i see people using spatial sql for so that wraps up the third step of the process using that database to help you scale and move
into that run workflow now we're talking about moving into the sprint workflow any of these first three steps you can run on your computer on your laptop today now if you're evolving into much larger data sets larger quantities of data complexities of data or data that's being added to your data system on a frequent basis it might be time to think about using the cloud now cloud is the last piece of a modern gis workflow it is not something that you must use you can do all these pieces locally on your machine and it's one
of the core principles of modern gis anything that you can do in the cloud you can do locally and they're interoperable but when you're ready the cloud is there for you to use now what i love about the cloud is that it helps you scale you can do things on machines that are more powerful than what's contained within your local laptop or computer and you can share that with people around the world as well but i think the most important thing about the cloud is using cloud native workflows these might be known as serverless workflows
and are basically tools that you can spin up and use massive computing power just in the scope of the operation you want to perform so what are the key cloud components that you need to use for modern gis well my top three would be databases or data warehouses places to store manage and query your data and do that really efficiently hosted on the cloud the second would be cloud storage systems where you can actually host and keep your files in an organized fashion this really helps and is commonly known as a data lake but effectively
it's a giant storage system to keep all your files organized the next would be etl or elt tools to actually load data in this could be products that stream or batch data into a database or data warehouse but provide that serverless layer that i was talking about earlier to make that really efficient and effective lots of tools to do this but definitely not time to cover that in this video today some of the other tools that i would take a look at would be notebook services where you can actually run hosted jupyter notebooks
things like apis for maps or mapping applications and several other components that are helpful within the cloud stack as well the last thing that i would have to note would be hosting and using earth observation or satellite imagery data now you can't mention that without mentioning google earth engine which is a really great tool to actually access analyze and run models on top of historic and current imagery within the google earth system so now that i've covered those four key areas for the crawl walk run and sprint i want to cover a few other areas
that might be important if you're studying different topics in modern gis using a command line for the first time can seem quite overwhelming and you might think you need a background in computer science but having a complete computer science degree or background isn't necessary for modern gis that said those concepts don't hurt and there are some great courses out there i've actually taken the first part of the harvard cs50 course i really enjoyed this it gave you some fundamental understanding of how computers and programming work without going too deep or too intense into the topic
command line is obviously a great tool to know and use and one i definitely recommend investing some time in and there's plenty of great tutorials out there to get you started the second is basic data structures knowing not only data types things like integers booleans strings dates geometries of course are really really helpful but having that basic structure of other data types things like json dictionaries lists arrays all these different topics are really really helpful as you start to use and advance data structures within your modern gis i really recommend spending some time doing this
but being an expert on it is not necessary either speaking of data there's been a surge in creating cloud native data formats this includes vector data with file formats like geoparquet and raster data with file formats like cloud optimized geotiffs and zarr among others now i would definitely recommend checking this out this is a very new and advancing topic but this video by the open geospatial consortium which is actually a video of a complete conference shows some of these different advances in the space as well as some of the use cases you can use
with it now while it pains me as i have a traditional geography and cartography background to say that visualization might not be the top of the list for modern gis skills the field has advanced quite a bit and with a lot of out of the box tools you can build some really great cartography styling and different components into your maps so tools like kepler gl or carto have really great styling patterns that you can use to create choropleth or categorical maps and it's not always necessary to know what a jenks break is versus an equal interval
versus others so while cartography is of course really important to communicating what you've done you have a lot of tools that help you do this today if you're doing geospatial application development of course you're going to need some javascript and i also recommend spending some time learning react and redux as well as it gives you a really base framework to build and create repeatable components if you're building large-scale applications now to advance your javascript skills if you have some basic understanding of javascript i really recommend this course on udemy this is the one i used
to really advance my javascript skills and really leapfrog into really understanding what i was writing not just kind of following some tutorial the second is react i love react and i think it's one of the pieces that i've picked up over the last couple of years it's really helped me build really scalable applications make my work way more repeatable and it's well worth the investment this is another great course from udemy that i recommend that actually helped me learn react and redux so definitely take a look at this one as well so the
last topic in terms of big data visualization i have to mention tiling now map tiles are the services that we use to render data on the web very efficiently effectively every base map service apple maps google maps uses a tiling service to render that data now if you don't know anything about map tiles i wrote a blog post that talks about the history of mapping and mapping online so you can take a look at this to check out more on this topic as well now the good news is that you can create tiles locally you
can create tiles in qgis you can create tiles in a spatial database or a data warehouse there's lots of different ways to do this today so i definitely recommend if you're going to be doing visualization and application development with really large data that you spend and invest some time in understanding how tiles work and how to use them so that's it for today these are my base recommendations on how to learn modern gis in 2022 i hope to update this video in the future to add new recommendations based on the advances taking place all the
time in the modern gis and geospatial spaces but for now we'll leave it there i really appreciate it thanks for taking the time and we will see you on another video