Leveraging Geospatial Data in Python with GeoPandas – KDnuggets #Imaginations Hub

Image source - Pexels.com

Spatial data consists of data related to locations. This data can come from GPS tracks, earth observation imagery, and maps. Every spatial data point can be precisely positioned on a map using a coordinate reference system, such as latitude/longitude pairs, which allows us to analyze the relationships among them.

The true potential of spatial data lies in its ability to connect data points with their respective locations, creating endless possibilities for advanced analysis. Geospatial data science is an emerging field within data science that seeks to harness geospatial information and extract useful insights through spatial algorithms and advanced techniques such as machine learning or deep learning, in order to draw meaningful conclusions about what events have taken place and their causes. Geospatial data science gives us insight into where events happen as well as why they happen.

GeoPandas is an open-source Python package specifically tailored for working with geospatial data. It extends the datatypes used by pandas to allow spatial operations on geometric objects, which makes it possible to run spatial analyses in Python with the familiar pandas data-manipulation tools. Since GeoPandas is built on top of pandas, it offers an easy path for professionals already familiar with pandas syntax to pick up GeoPandas syntax quickly.




We need to install the GeoPandas package before we can use it. However, it is important to note that GeoPandas depends on other libraries that must be installed for it to work without problems. These dependencies are shapely, Fiona, pyproj, and rtree.

There are two ways to install the GeoPandas package. First, you can use conda to install the GeoPandas conda package. This method is recommended because it installs the dependencies of GeoPandas for you. You can run the following command to install GeoPandas:

conda install geopandas


The second method is to use pip, the standard package installer in Python. However, with this method you may need to install the remaining dependencies mentioned above yourself.

pip install geopandas


Once the GeoPandas package is installed, you can import it into your Python code using the following command:

import geopandas as gpd



GeoPandas is used to read spatial data and convert it into a GeoDataFrame. However, it is important to note that there are two main types of spatial data:

  • Vector data: Vector data describes geographic features of the earth using discrete geometries, namely points, lines, and polygons.
  • Raster data: Raster data encodes the world as a surface represented by a grid. Each pixel of this grid holds a continuous value or a categorical class.
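
The three vector geometry types can be sketched directly with shapely, the geometry library GeoPandas builds on. The coordinates below are made up purely for illustration and do not come from the Barcelona dataset.

```python
from shapely.geometry import LineString, Point, Polygon

# Hypothetical coordinates, chosen only to illustrate the three vector types.
stop = Point(2.0, 1.0)                              # a single location
route = LineString([(0, 0), (3, 4)])                # a path between two vertices
block = Polygon([(0, 0), (4, 0), (4, 2), (0, 2)])   # a closed area

for geom in (stop, route, block):
    print(geom.geom_type)

print(route.length)  # 5.0 (hypotenuse of a 3-4-5 triangle)
print(block.area)    # 8.0 (a 4 x 2 rectangle)
```

Each row of a GeoDataFrame holds one such geometry object alongside its ordinary attribute columns.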

GeoPandas mainly works with vector data. However, it can be used in conjunction with other Python packages, such as rasterio, to handle raster data. You can use the powerful geopandas.read_file() function to read most vector-based spatial data. There are two main file formats for vector-based spatial data:

  • Shapefile: Shapefile is the most common data format and is considered the industry-standard data type. It consists of three files that are compressed and usually provided as a zip file:

    The .shp file: This file contains the shape geometry.

    The .dbf file: This file holds the attributes for each geometry.

    The .shx file: This is the shape index file that helps link the attributes to the shapes.

  • GeoJSON: This is a newer file format for geospatial data, released in 2016. Since it consists of only a single file, it is easier to use than the Shapefile.
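
To make the single-file nature of GeoJSON concrete, here is a minimal sketch of what such a file contains, written as a Python dictionary and serialized with the standard json module. The feature name and coordinates are placeholders, not values from the Barcelona dataset.

```python
import json

# A hand-written FeatureCollection with one made-up feature.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            # Attribute columns live under "properties".
            "properties": {"name": "Example district"},
            "geometry": {
                "type": "Polygon",
                # One ring; its first and last positions must be identical.
                "coordinates": [
                    [[2.0, 41.0], [2.1, 41.0], [2.1, 41.1], [2.0, 41.1], [2.0, 41.0]]
                ],
            },
        }
    ],
}

text = json.dumps(feature_collection)  # the text a .geojson file would hold
parsed = json.loads(text)
first = parsed["features"][0]
print(first["properties"]["name"], first["geometry"]["type"])
```

Geometry and attributes travel together in one text document, whereas a Shapefile splits them across several files.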

In this article, we will use the geopandas.read_file() function to read a GeoJSON file hosted on GitHub containing geospatial data about the different districts of the city of Barcelona.

Let's start by loading the data and printing its first 5 rows using the code below:

districts = gpd.read_file(url)  # url: link to the districts GeoJSON file
districts.head()



Next, to write the data to a file we can use the GeoDataFrame.to_file() function. It writes a Shapefile by default, but you can write GeoJSON instead using the driver parameter.

districts.to_file("districts.geojson", driver="GeoJSON")



Since GeoDataFrame is a subclass of the pandas DataFrame, it inherits many of its properties. However, there are some differences. The main one is that it can store geometry columns (also known as GeoSeries) and perform spatial operations. The geometry column in a GeoDataFrame can contain various types of vector data, including points, lines, and polygons. However, only one column is considered the active geometry, and all spatial operations are based on that column.
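
The idea of an active geometry column can be illustrated with a small hand-built GeoDataFrame. This is a minimal sketch: the names, coordinates, and the label_point column are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two made-up square "districts".
demo = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
)

# A second geometry column holding one point per row.
demo["label_point"] = gpd.GeoSeries(
    [Point(0.5, 0.5), Point(1.5, 0.5)], index=demo.index
)

print(demo.geometry.name)   # geometry
print(list(demo.area))      # [1.0, 1.0], the areas of the polygons

# Switch the active geometry; spatial attributes now use the points.
by_point = demo.set_geometry("label_point")
print(by_point.geometry.name)   # label_point
print(list(by_point.area))      # [0.0, 0.0], since points have no area
```

The same frame gives different answers for area depending on which column is active, which is why it helps to know which one is in use before running spatial operations.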

Another important feature is that every geometry column comes with its associated CRS (coordinate reference system) information, which tells us where the coordinates are located on Earth. This matters because if you need to combine two spatial datasets, you must make sure they are expressed in the same CRS, otherwise you will get wrong results. The CRS information is stored in the crs attribute in GeoPandas:

districts.crs
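
As a rough sketch of inspecting and changing a CRS, the example below tags a single longitude/latitude point with EPSG:4326 and reprojects it with to_crs(). It uses the point and the EPSG:2062 code that appear in this article's later examples; whether that projection suits your own data is something to check separately.

```python
import geopandas as gpd
from shapely.geometry import Point

# One point given as longitude/latitude in EPSG:4326.
pts = gpd.GeoSeries(
    [Point(2.1743680500855005, 41.403656946781304)], crs="EPSG:4326"
)
print(pts.crs.to_epsg())       # 4326
print(pts.crs.is_geographic)   # True: coordinates are in degrees

# to_crs() returns a reprojected copy; the original series is unchanged.
projected = pts.to_crs(epsg=2062)
print(projected.crs.is_projected)   # True: coordinates are planar
print(projected.iloc[0])
```

Applied to the districts data, the equivalent call would presumably be districts.to_crs(epsg=2062, inplace=True), mirroring the call used for the bike-lane layer later on. Areas and distances are only meaningful in map units once the data is in a projected CRS.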




Now that we have set the right projected CRS (EPSG:2062 in the examples that follow), we are ready to explore the attributes of GeoDataFrames.



GeoPandas has four useful methods and attributes that can be used to explore the data. We will explore these four:

  • Area
  • Centroid
  • Boundary
  • Distance


4.1. Area


The area attribute returns the calculated area of a geometry. In the example below we will calculate the area of every district in km².

districts['area'] = districts.area / 1_000_000  # m² to km²
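
The unit conversion is easy to check on a shape whose area is known in advance. The sketch below uses a made-up 1000 m by 1000 m square and assumes the coordinates are meters in a projected CRS (EPSG:2062 is used only to match the rest of the article).

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A made-up 1000 m x 1000 m square, coordinates treated as meters.
square = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])
demo = gpd.GeoDataFrame({"name": ["square"]}, geometry=[square], crs="EPSG:2062")

demo["area_km2"] = demo.area / 1_000_000   # m² to km²
print(demo["area_km2"].iloc[0])            # 1.0
```

If the frame were still in a geographic CRS such as EPSG:4326, area would be computed in squared degrees, which is not a useful unit for this purpose.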




4.2. Centroid


The second attribute is centroid, which returns the center point of the geometry. In the code snippet below we will add a new column and save the centroid of each district:

districts['centroid'] = districts.centroid
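
For intuition, here is a small sketch of centroid on two made-up shapes whose centers are easy to work out by hand.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Made-up shapes: a 4 x 2 rectangle and a right triangle.
shapes = gpd.GeoSeries([
    Polygon([(0, 0), (4, 0), (4, 2), (0, 2)]),
    Polygon([(0, 0), (3, 0), (0, 3)]),
])

centroids = shapes.centroid   # a GeoSeries of Point objects
for point in centroids:
    print(point.geom_type, round(point.x, 6), round(point.y, 6))
```

The rectangle's centroid lies at about (2, 1) and the triangle's at about (1, 1), the mean of its three vertices. The result is itself a GeoSeries, so it can be stored as an extra geometry column.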





4.3. Boundary


The third attribute is boundary, which returns the boundary of the polygon for every district. The code below computes it and saves it in a separate column:

districts['boundary'] = districts.boundary
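
A quick sketch on a made-up rectangle shows what boundary returns: the outline as a line geometry, which has a length but no area.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A made-up 4 x 2 rectangle with no holes.
shapes = gpd.GeoSeries([Polygon([(0, 0), (4, 0), (4, 2), (0, 2)])])

outline = shapes.boundary
print(outline.iloc[0].geom_type)   # LineString
print(outline.length.iloc[0])      # 12.0, the perimeter
print(shapes.area.iloc[0])         # 8.0
print(outline.area.iloc[0])        # 0.0, since a line encloses no area
```

For polygons with holes, the boundary would instead be a multi-part line made of the outer ring plus one ring per hole.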





4.4. Distance


The distance method calculates the minimum distance from a given geometry to a specific location. For example, in the code below we will calculate the distance from the Sagrada Familia church to the centroid of every district in Barcelona. After that, we will convert the distance to km and save it in a new column.

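from shapely.geometry import Point

# Longitude/latitude of the Sagrada Familia (EPSG:4326)
sagrada_fam = Point(2.1743680500855005, 41.403656946781304)
sagrada_fam = gpd.GeoSeries([sagrada_fam], crs=4326)
# Reproject to the same projected CRS as the districts so distances are in meters
sagrada_fam = sagrada_fam.to_crs(epsg=2062)
# Distance from each district centroid to the church, converted from m to km
districts['sagrada_fam_dist'] = districts.centroid.distance(sagrada_fam.iloc[0]) / 1000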
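
The distance calculation can be sanity-checked on made-up points whose separations are known. The sketch below assumes coordinates in meters in a projected CRS and uses a hypothetical landmark at the origin.

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up centroids in a projected CRS, so coordinates are in meters.
centroids = gpd.GeoSeries(
    [Point(0, 0), Point(3000, 4000), Point(6000, 8000)], crs="EPSG:2062"
)
landmark = Point(0, 0)   # hypothetical reference location in the same CRS

# distance() accepts a single geometry and is applied to every row.
dist_km = centroids.distance(landmark) / 1000
print(dist_km.tolist())   # [0.0, 5.0, 10.0]
```

Passing a single geometry lets GeoPandas compute all distances in one vectorized call rather than looping over rows in Python.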





Plotting and visualizing your data is an important step toward understanding it better. Plotting with GeoPandas works the same way as plotting with pandas and is quite straightforward. It is done through the GeoDataFrame.plot() function, which is built on the matplotlib Python package.

Let's start exploring Barcelona with a basic plot of its districts:

ax = districts.plot(figsize=(10, 6))




This is a very basic plot that does not tell us much. However, we can make it more informative by coloring each district with a different color.

ax = districts.plot(column='DISTRICTE', figsize=(10, 6), edgecolor="black", legend=True)
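
The same column-based coloring can be tried without the Barcelona file. The sketch below builds three made-up rectangular zones and colors them by a hypothetical zone column; the Agg backend is selected only so that it also runs where no display is available.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import geopandas as gpd
from shapely.geometry import box

# Three made-up rectangular zones; box(minx, miny, maxx, maxy).
demo = gpd.GeoDataFrame(
    {"zone": ["north", "south", "east"]},
    geometry=[box(0, 1, 1, 2), box(0, 0, 1, 1), box(1, 0, 2, 2)],
)

# A text column is treated as categorical: one color and legend entry per value.
ax = demo.plot(column="zone", figsize=(6, 4), edgecolor="black", legend=True)
ax.set_axis_off()
legend_labels = sorted(text.get_text() for text in ax.get_legend().get_texts())
print(legend_labels)
```

Calling ax.figure.savefig() with a file name would write the resulting map to disk.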




Finally, we can add more information to our plot by adding the centroids of the districts.

import contextily
import matplotlib.pyplot as plt

# Semi-transparent districts so the basemap shows through
ax = districts.plot(column='DISTRICTE', figsize=(12, 6), alpha=0.5, legend=True)
# Overlay the centroid column as green points on the same axes
districts["centroid"].plot(ax=ax, color="green")
# Add background map tiles in the districts' CRS
contextily.add_basemap(ax, crs=districts.crs.to_string())
plt.title('A Coloured Map with the centroid of Barcelona')




Next, we will explore a very important feature of GeoPandas: spatial relationships and how geometries can relate to each other.



Geospatial data relate to each other in space. GeoPandas uses the pandas and shapely packages for spatial relationships. This section covers common operations. There are two main ways to merge GeoPandas DataFrames: attribute joins and spatial joins. In this section, we will explore both of them.


6.1. Attribute Joins


Attribute joins allow you to join two GeoPandas DataFrames using non-geometry variables, which makes them similar to regular join operations in pandas. The join is performed using the pandas.merge() method as shown in the example below. In this example, we will join the Barcelona population data to our geospatial data to add more information to it.

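import pandas as pd

# Load district names and resident counts from the population file
pop = pd.read_csv('2022_padro_sexe.csv', usecols=['Nom_Districte', 'Nombre'])
# Total population per district
pop = pop.groupby('Nom_Districte')['Nombre'].sum().reset_index()
# merge() without `on` joins on the column names the two frames share
districts = districts.merge(pop)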
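
When the key columns have different names in the two tables, the join keys can be given explicitly. The sketch below uses made-up data; the NOM column name and the population figures are assumptions for illustration, not values from the article's datasets.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Made-up districts; "NOM" is a hypothetical name column.
districts_demo = gpd.GeoDataFrame(
    {"NOM": ["A", "B"]}, geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)]
)
# Made-up population rows, several per district.
pop_demo = pd.DataFrame(
    {"Nom_Districte": ["A", "A", "B"], "Nombre": [10, 15, 7]}
)

totals = pop_demo.groupby("Nom_Districte", as_index=False)["Nombre"].sum()
merged = districts_demo.merge(totals, left_on="NOM", right_on="Nom_Districte")

print(type(merged).__name__)        # GeoDataFrame
print(merged["Nombre"].tolist())    # [25, 7]
```

Calling merge() on the GeoDataFrame, rather than on the plain DataFrame, keeps the result a GeoDataFrame with its geometry column intact.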




6.2. Spatial Joins


Spatial joins, on the other hand, merge dataframes based on spatial relationships. In the example below we will identify the districts that have bicycle lanes. We will first load the data as shown in the code below:

# url: link to the bike-lane dataset
bike_lane = gpd.read_file(url)
# Keep only the identifier and geometry columns
bike_lane = bike_lane.loc[:, ['ID', 'geometry']]
# Reproject to the same CRS as the districts before joining
bike_lane.to_crs(epsg=2062, inplace=True)




To spatially join two dataframes we can use the sjoin() function. The sjoin() function takes four main arguments: the first is the left GeoDataFrame, the second is the GeoDataFrame that will be joined to the first, the third is the type of join, and the last is the predicate, which defines the spatial relation used to match the two GeoDataFrames. The most common spatial relationships are intersects, contains, and within. In this example, we will use the intersects predicate.

lanes_districts = gpd.sjoin(districts, bike_lane, how='inner', predicate="intersects")
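
To see what an inner join with the intersects predicate returns, here is a minimal sketch with made-up zones and lanes. All names and coordinates are hypothetical; both layers share a CRS, as a spatial join expects.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Three made-up zones; "far" is not touched by any lane.
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east", "far"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
    crs="EPSG:2062",
)
# Lane 101 crosses west and east; lane 102 lies entirely inside west.
lanes = gpd.GeoDataFrame(
    {"ID": [101, 102]},
    geometry=[
        LineString([(0.2, 0.5), (1.8, 0.5)]),
        LineString([(0.1, 0.1), (0.4, 0.1)]),
    ],
    crs="EPSG:2062",
)

joined = gpd.sjoin(zones, lanes, how="inner", predicate="intersects")
pairs = sorted((z, int(i)) for z, i in zip(joined["zone"], joined["ID"]))
print(pairs)   # [('east', 101), ('west', 101), ('west', 102)]
print(sorted(joined["zone"].unique()))   # ['east', 'west']
```

An inner join produces one row per matching pair, so a zone crossed by several lanes appears several times and a zone with no lanes is dropped. Taking the unique zone names gives the list of zones that have at least one lane.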




In this article, I introduced you to geospatial data analysis using the open-source GeoPandas library. We started by installing the GeoPandas package, and then discussed the different types of geospatial data and how to load them. Finally, we explored basic operations to get hands-on with a geospatial dataset. Although there is still much more to explore in geospatial data analysis, this blog acts as a starting point in your learning journey.
Youssef Rafaat is a computer vision researcher & data scientist. His research focuses on developing real-time computer vision algorithms for healthcare applications. He also worked as a data scientist for more than 3 years in the marketing, finance, and healthcare domain.
