There are 2 basic types of geospatial data:
Vector data
Raster data
For vector data, there are 3 basic data types:
a. Point
b. LineString
c. Polygon
from shapely.geometry import Point
p1 = Point(13, 27.21)
p1
The code is importing the Point class from the shapely.geometry module. The Point class is a geometric object that represents a point in 2D space with x and y coordinates.
Then, the code creates a new Point object named p1 by calling the Point constructor with the arguments (13, 27.21). This creates a point at the coordinates (13, 27.21) in 2D space.
Similarly, we can create a second or multiple points.
p2 = Point(61, 56.21)
p2
p3 = Point(30, -6.21)
p3
p4 = Point(2, -30.21)
p4
from shapely.geometry import LineString
line1 = LineString([p1, p2])
line1
The code creates a LineString object named line1 using the LineString class from the shapely.geometry module.
The LineString object represents a line in 2D space as a sequence of points. The LineString constructor takes a list of Point objects as its argument, so the code is passing a list containing two Point objects p1 and p2 as the argument to create a line segment connecting these two points.
line2 = LineString([p1, p2, p3])
line2
from shapely.geometry import Polygon
Polygon([p1, p2, p3])
Polygon([p1, p2, p3, p4])
We can have as many points as we need or want for polygons.
The MultiLineString
class in Python's Shapely library is used to represent collections of line objects in two-dimensional space. A MultiLineString object is a collection of one or more LineString objects that are considered as a single entity.
Example: dataset of the locations of weather stations in Bangladesh.
In geospatial data analysis, the MultiPoint class in Python's Shapely library is used to represent collections of point objects in two-dimensional space. A MultiPoint object is a collection of one or more Point objects that are considered as a single entity.
from shapely.geometry import MultiPoint, MultiLineString, MultiPolygon
multip1 = MultiPoint([p1, p2, p3, p4])
multip1
The MultiLineString
class in Python's Shapely library is used to represent collections of line objects in two-dimensional space. A MultiLineString
object is a collection of one or more LineString
objects that are considered as a single entity.
Example: dataset of roads in Dhaka.
The MultiPolygon class in Python's Shapely library is used to represent collections of polygon objects in two-dimensional space. A MultiPolygon object is a collection of one or more Polygon objects that are considered as a single entity.
Example: dataset of buildings footprints in New York City.
Let's create two polygons poly1
and poly2
from the four points we defined above and consider these polygons as a single entity -- a MultiPolygon.
# imports the MultiPolygon class from the shapely.geometry module
from shapely.geometry import MultiPolygon
poly1 = Polygon([p1, p2, p3])
poly2= Polygon([p2, p3, p4])
multi_poly = MultiPolygon([poly1, poly2])
multi_poly
poly1
poly2
type(multi_poly)
With all the above objects, we can use the function type
to see or check their data type.
p1.coords
coordinates = list(p1.coords)
coordinates
We can see that the coordinates are given as a list of tuple. The first value is the x-cordinate and the second value is the y-cordiante. Latitude is the x-cordinate and longitude is the y-coordinate.
coords
# latitude
coordinates = list(p1.coords)
coordinates[0][0]
# longitude
coordinates[0][1]
x
and y
attribute of the Point
object# latitude
p1.x
# longitude
p1.y
coords
line2.coords
coordinates = list(line2.coords)
coordinates
It shows the coordinates of the points (here, three points) used to create this line.
To get the latitude of the third point, we can access it using the coordinates
list in the following way:
coordinates[2][0]
xy
The xy
attribute of a LineString
object in Python's Shapely library returns a tuple containing two arrays representing the x and y coordinates of the line vertices.
line2.xy
# Getting longitude of the third point
line2.xy[1][2]
poly1.exterior
list(poly1.exterior.coords)
The bounds
attribute of a Polygon
or MultiPolygon
object in Python's Shapely library returns a tuple containing the minimum and maximum values of the x and y coordinates that define the bounding box of the polygon(s).
line1.length
poly1.length
poly1.area
The centroid of LineString
or Polygon
is a point representing the "center of mass" or "average position" of the object. The centroid is a useful measure of the location and shape of the object.
In Shapely, the centroid
property of a LineString
or Polygon
object returns a Point
object that represents the centroid of the object. This Point
object has an x and y coordinate.
line1.centroid
print(line1.centroid)
poly1.centroid
print(poly1.centroid)
Distance from point 1 (p1
) to point 2 (p2
).
p1.distance(p2)
Distance from point 3 (p3
) to point 1 (p1
).
p3.distance(p1)
Point
object or geometryp1.buffer(33)
MultiPoint
object or geometrymultip1.buffer(14)
LineString
object or geometryline1.buffer(12)
Polygon
object or geometrypoly1.buffer(8)
# just to see poly1 without buffer
poly1
The contains
method is a Boolean function from the Shapely library that returns True
if the point is inside the polygon, and False
otherwise.
poly1.contains(p3) # p3 is one of the corners of this polygon
new_point = Point(32, 28)
# plot the Polygon and Point objects together
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.plot(*poly1.exterior.xy)
ax.plot(new_point.x, new_point.y, 'o', color='red')
plt.show()
poly1.contains(new_point)
Create a new point that is outside the polygon boundary.
new_point2 = Point(25, 45)
# plot the Polygon and Point objects together
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.plot(*poly1.exterior.xy)
ax.plot(new_point2.x, new_point2.y, 'o', color='red')
plt.show()
poly1.contains(new_point2)
As expected, it says that this is out of the boundary of this polygon.
This is similar to contains
with the difference that it checks whether a object is within the boundary of another object.
new_point.within(poly1)
new_point2.within(poly1)
As the name suggests, crosses
allows us to see whether a geometry object (here, LineString) crosses another geometry object (here, a Polygon).
from shapely.geometry import Point, Polygon, LineString
import matplotlib.pyplot as plt
# create multiple Point objects to be utilized for creating a Polygon object
p1 = Point(13, 27.21)
p2 = Point(61, 56.21)
p3 = Point(30, -6.21)
# use these points to create Polygon object
poly1 = Polygon([p1, p2, p3])
# create a Point object
point = Point(50, 6)
# create two LineString objects
line1 = LineString([(32, 28), (25, 45)])
line2 = LineString([(40, -5), (50, 20)])
# plot the Polygon, Point, and LineString objects together
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.plot(*poly1.exterior.xy)
ax.plot(point.x, point.y, 'o', color='red')
ax.plot(*line1.xy, color='blue')
ax.plot(*line2.xy, color='orange')
plt.show()
We see that line1
or blue line crosses the polygon. How can we check this without checking visually? The answer is crosses
.
line1.crosses(poly1)
The result is as expected. Now, does the line line2
with orange color crosses the polygon? We know it doesn't, let's check this by using crosses
again.
line2.crosses(poly1)
The difference()
method is used to compute the difference between two geometric objects. When this method is called on a Polygon object, it returns a new Polygon object that represents the difference between the original Polygon object and another Polygon object passed as an argument to the difference()
method.
p1 = Point([20, 40])
p2 = Point([40, 80])
from shapely.geometry import Point
import matplotlib.pyplot as plt
# create two Point objects
point1 = Point(0.5, 0.5)
point2 = Point(1, 1)
# plot the Point objects together
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.plot(point1.x, point1.y, 'o', color='red')
ax.plot(point2.x, point2.y, 'o', color='blue')
plt.show()
from shapely.geometry import Point, Polygon
import matplotlib.pyplot as plt
# create two Point objects
point1 = Point(0.5, 0.5)
point2 = Point(1, 1)
# create Polygon objects with buffer zone around each point
buffer1 = point1.buffer(0.6)
buffer2 = point2.buffer(0.6)
# plot the Polygon objects with buffer zone
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.fill(*buffer1.exterior.xy, alpha=0.5, color='red')
ax.fill(*buffer2.exterior.xy, alpha=0.5, color='blue')
plt.show()
result = buffer1.difference(buffer2)
import matplotlib.pyplot as plt
from shapely.geometry import Point
# plot the difference between buffer1 and buffer2
fig, ax = plt.subplots()
ax.set_aspect('equal')
# ax.axis('off')
ax.fill(*result.exterior.xy, alpha=0.5, color='red')
plt.show()
result2 = buffer2.difference(buffer1)
import matplotlib.pyplot as plt
from shapely.geometry import Point
# plot the difference between buffer2 and buffer1
fig, ax = plt.subplots()
ax.set_aspect('equal')
# ax.axis('off')
ax.fill(*result2.exterior.xy, alpha=0.5, color='blue')
plt.show()
intersection
is used to obtain the region where two or more geometries overlap. When we apply the intersection
method to two geometries, the resulting geometry is a new geometry that represents the region where the two original geometries overlap. The intersection method can be applied to any type of geometry, including Points, Lines, and Polygons.
intersection = buffer1.intersection(buffer2)
# plot the intersection between buffer1 and buffer2
fig, ax = plt.subplots()
ax.set_aspect('equal')
# ax.axis('off')
ax.fill(*intersection.exterior.xy, alpha=0.5, color='red')
plt.show()
We use union
to combine two or more geometries into a single geometry. When we apply the union
method to two geometries, it results into new geometry that encompasses the area covered by both of the original geometries. We can use the union
method with Points
, Lines
, and Polygons
.
union = buffer1.union(buffer2)
# plot the union of buffer1 and buffer2
fig, ax = plt.subplots()
ax.set_aspect('equal')
# ax.axis('off')
ax.fill(*union.exterior.xy, alpha=0.5, color='red')
plt.show()
The symmetric_difference()
method in the Shapely library computes the symmetric difference between two geometric objects. It returns a new geometric object that represents the area of the geometries that do not overlap.
union = buffer1.symmetric_difference(buffer2)
# plot the union of buffer1 and buffer2
fig, ax = plt.subplots()
ax.set_aspect('equal')
# ax.axis('off')
for polygon in union.geoms:
ax.fill(*polygon.boundary.xy, alpha=0.5, color='red')
plt.show()
We will be using GeoPandas which is basically a Pandas dataframe plus GeoSeries which is a column that contains vector geometry. These geometries are handled as Shapely object and thus the operations that we have already done for different Shapely object can also be applied to GeoPandas geometries.
First, download the shapefile from this link and upload it to your Google Drive. Dowload link: https://drive.google.com/drive/folders/1Z4XFVorvwiLVJ90Ph9fgbov2QHn4LJ09?usp=share_link
from google.colab import drive
drive.mount('/content/drive')
# Put the path to the shapefile in your Google Drive here
path = "/content/drive/MyDrive/Geospatial_Python/Data/"
!pip install geopandas
import geopandas as gpd
import matplotlib.pyplot as plt
# Read the shapefile
bd = gpd.read_file(path+'BGD_adm3.shp')
# Plot the shapefile
bd.plot()
plt.title("Upazilas in Bangladesh")
# Show the plot
plt.show()
We can also zoom in to the northeast part of Bangladesh by using cx
property of the GeoPandas dataframe. Here, we are specifying the range for latitude and longitude.
bd.cx[91:92.5, 24:25.2].plot()
Let's have a look at the content of this GeoPandas dataframe.
bd.head()
In the field of GIS, the columns are called attributes. Column NAME_1
represents the name of the divisions, column NAME_2
represents the name of the districts, and column NAME_3
represents the name of the upazilas.
Now, we can get the data (and map) of any division, district or upazila by using dataframe subsetting.
# Get the data for Dhaka division and map it
dhaka = bd[bd['NAME_1'] == 'Dhaka']
dhaka.head()
We can see that now we have data for Dhaka division only (note the values of NAME_1
column or attribute).
dhaka.plot(color='green')
So, can we make a map of Rajshahi district? We can, by considering the field 'NAME_2' field corresponding to district name.
rajshahi = bd[bd['NAME_2'] == 'Rajshahi']
rajshahi.plot()
Now, we will find out how many upazilas are in Sylhet division.
sylhet_div = bd[bd['NAME_1'] == 'Sylhet'] # NAME_1 corresponds to division name
sylhet_div.head()
# Names of the upazilas
sylhet_div['NAME_3'].unique() # NAME_3 corresponds to upazila name
# Number of upazila
len(sylhet_div['NAME_3'].unique())
# Let's plot Bishwanath upazila
bishwanath = sylhet_div[sylhet_div['NAME_3'] == 'Bishwanath'] # subset the data to have data for Bishwanath upazila only
bishwanath.plot()
# Let's plot Chhatak upazila under Sylhet division
sylhet_div[sylhet_div['NAME_3'] == 'Chhatak'].plot()
We can use contextily
package for adding basemaps.
!pip install contextily
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
# Set the figure size
fig, ax = plt.subplots(figsize=(10, 10))
# Plot the shapefile
bishwanath.plot(ax=ax, alpha=0.5)
# Add a basemap using contextily
ctx.add_basemap(ax=ax, source=ctx.providers.OpenStreetMap.Mapnik)
plt.title("The map of Bishwanath Upazila")
# Show the plot
plt.show()
We see that the OpenStreetMap is not displayed. This is because the CRS (Coordinate Reference System) of bishwanath GeoPandas dataframe and Openstreet Map are not the same. So, we need to put them to the same CRS and then plot.
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
# Set the figure size
fig, ax = plt.subplots(figsize=(10, 10))
bishwanath = bishwanath.to_crs(epsg=3857)
# Plot the shapefile
bishwanath.plot(ax=ax, alpha=0.5)
# Add a basemap using contextily
ctx.add_basemap(ax=ax, source=ctx.providers.OpenStreetMap.Mapnik)
plt.title("The map of Bishwanath Upazila")
# Show the plot
plt.show()
# Set the figure size
fig, ax = plt.subplots(figsize=(10, 10))
The above line creates a matplotlib figure and axis with a specified size of 10 inches by 10 inches.
bishwanath = bishwanath.to_crs(epsg=3857)
This line converts the coordinate reference system (CRS) of the bishwanath
GeoPandas dataframe to EPSG 3857, which is the CRS used by many online basemaps including OpenStreetMap. This ensures that the shapefile and basemap will be properly aligned.
# Plot the shapefile
bishwanath.plot(ax=ax, alpha=0.5)
This line plots the bishwanath
shapefile on the ax
axis with an alpha value of 0.5, making the shapefile semi-transparent.
# Add a basemap using contextily
ctx.add_basemap(ax=ax, source=ctx.providers.OpenStreetMap.Mapnik)
This line adds a basemap to the plot using contextily
. We pass the ax
axis as an argument and specify the OpenStreetMap.Mapnik
provider as the source of the basemap.
%matplotlib inline
# Set the figure size
fig, ax = plt.subplots(figsize=(10, 10))
# Plot the shapefile
sylhet_div.plot(ax=ax, alpha=0.5)
# Add a basemap using contextily
ctx.add_basemap(ax=ax, crs=sylhet_div.crs.to_string(), source=ctx.providers.Stamen.TonerLite)
# Show the plot
plt.show()
Here, we will now read different bands of Landsat and plot these.
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
# Open the TIF file
with rasterio.open(path+'LC08_L1TP_137043_20180410_20180417_01_T1_B1.tif') as src:
# Read the data
data = src.read()
# Visualize the data
show(data)
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
# Open the TIF files for the 5 bands
band1 = rasterio.open(path+'LC08_L1TP_137043_20180410_20180417_01_T1_B1.tif')
band2 = rasterio.open(path+'LC08_L1TP_137043_20180410_20180417_01_T1_B2.tif')
band3 = rasterio.open(path+'LC08_L1TP_137043_20180410_20180417_01_T1_B3.tif')
band4 = rasterio.open(path+'LC08_L1TP_137043_20180410_20180417_01_T1_B4.tif')
band5 = rasterio.open(path+'LC08_L1TP_137043_20180410_20180417_01_T1_B5.tif')
# Read the data for each band
data1 = band1.read(1)
data2 = band2.read(1)
data3 = band3.read(1)
data4 = band4.read(1)
data5 = band5.read(1)
# Create a figure and axis object
fig, ax = plt.subplots(figsize=(10,10))
# Plot the 5 bands in the same plot
im1 = ax.imshow(data1, cmap='Reds', alpha=0.5)
im2 = ax.imshow(data2, cmap='Greens', alpha=0.5)
im3 = ax.imshow(data3, cmap='Blues', alpha=0.5)
im4 = ax.imshow(data4, cmap='Oranges', alpha=0.5)
im5 = ax.imshow(data5, cmap='Purples', alpha=0.5)
# Add a colorbar and title
fig.colorbar(im1, ax=ax, fraction=0.03, pad=0.04)
ax.set_title('Landsat Image')
# Show the plot
plt.show()
First, the code imports the necessary libraries: rasterio for reading the Landsat data and matplotlib for plotting the data.
import rasterio
import numpy as np
import matplotlib.pyplot as plt
Next, the code opens the TIF files for each band of the Landsat data using the rasterio.open()
function:
band1 = rasterio.open('path/to/band1.tif')
band2 = rasterio.open('path/to/band2.tif')
band3 = rasterio.open('path/to/band3.tif')
band4 = rasterio.open('path/to/band4.tif')
band5 = rasterio.open('path/to/band5.tif')
The open()
function returns a rasterio dataset object that can be used to read the data from the file.
To read the data for each band, the code uses the read()
function of the rasterio dataset object:
data1 = band1.read(1)
data2 = band2.read(1)
data3 = band3.read(1)
data4 = band4.read(1)
data5 = band5.read(1)
The read()
function returns a numpy array that contains the data for the band. The 1 argument specifies that we want to read the first band of the dataset.
Next, the code creates a figure and axis object using the subplots()
function of the matplotlib library:
fig, ax = plt.subplots(figsize=(10,10))
The figsize
parameter specifies the size of the figure in inches.
To plot the 5 bands in the same plot, the code uses the imshow()
function of the matplotlib library:
im1 = ax.imshow(data1, cmap='Reds', alpha=0.5)
im2 = ax.imshow(data2, cmap='Greens', alpha=0.5)
im3 = ax.imshow(data3, cmap='Blues', alpha=0.5)
im4 = ax.imshow(data4, cmap='Oranges', alpha=0.5)
im5 = ax.imshow(data5, cmap='Purples', alpha=0.5)
The imshow()
function displays an image on the axis. The cmap
parameter specifies the colormap to use, and the alpha
parameter specifies the transparency of the image.
To add a colorbar to the plot, the code uses the colorbar()
function of the matplotlib library:
fig.colorbar(im1, ax=ax, fraction=0.03, pad=0.04)
The colorbar()
function adds a colorbar to the figure. The im1
argument specifies the image object that the colorbar should be based on. The ax
parameter specifies the axis object that the colorbar should be added to. The fraction
parameter specifies the fraction of the axis height that the colorbar should occupy, and the pad
parameter specifies the space between the axis and the colorbar.
Finally, the code sets the title of the plot using the set_title()
function of the axis object:
ax.set_title('Landsat Image')
And it shows the plot using the show()
function of the matplotlib library:
plt.show()
The formula for calculating NDVI is:
NDVI = (NIR - Red) / (NIR + Red)
where NIR is the near-infrared band and Red is the red band of the Landsat data.
In case of Landsat, band5 is NIR and band4 is Red or red band.
# Calculate NDVI
ndvi = (data5 - data4) / (data5 + data4)
# Create a figure and axis object
fig, ax = plt.subplots(figsize=(10, 10))
# Plot the NDVI data
im = ax.imshow(ndvi, cmap='RdYlGn')
# Add a colorbar to the plot
fig.colorbar(im, ax=ax, fraction=0.03, pad=0.04)
# Set the title of the plot
ax.set_title('NDVI')
# Show the plot
plt.show()
GeoPandas https://geopandas.org/en/stable/
Shapely https://shapely.readthedocs.io/en/stable/manual.html