Reading a NetCDF file

Authors:

Abdullah A. Fahad (a.fahad@nasa.gov) \ Tahmidul Azom Sany (tsany@gmu.edu) \ Torikul Islam Sanjid (torikul.sanjid@gmail.com) \

NetCDF is a file format used in climate science to store and exchange multidimensional scientific data. Unlike CSV/Excel files, it can store metadata and n-dimensional arrays of data, making it easier to handle large and complex datasets with many dimensions and variables. In this tutorial we will explore:

  • NetCDF data structure
  • How to read/write NetCDF files with xarray
  • Handling the NetCDF file format
In [1]:
# Loading Modules
import xarray as xr # reads and handles netcdf files with metadata
import numpy as np # module for numerical computing
import pandas as pd # module for data manipulation and analysis
In [2]:
# Download the NetCDF dataset we will be using
# this file will be saved on your left sidebar in /content/MERRA2_100.instM_2d_asm_Nx.198001.SUB.nc
# it will be deleted if the runtime restarts

!gdown --id 1PGKIhAHVN-LiR0FDwam7nbus3DgVBvT6
/usr/local/lib/python3.10/dist-packages/gdown/cli.py:121: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  warnings.warn(
Downloading...
From: https://drive.google.com/uc?id=1PGKIhAHVN-LiR0FDwam7nbus3DgVBvT6
To: /content/MERRA2_100.instM_2d_asm_Nx.198001.SUB.nc
100% 477k/477k [00:00<00:00, 6.18MB/s]

Loading NetCDF Data from Google Drive

To work with large datasets, you may need to load the data directly from your Google Drive instead of downloading it each time. Here's how it's done!

  • Step 01: Download and store the data in your Google Drive. Data link: https://drive.google.com/file/d/1PGKIhAHVN-LiR0FDwam7nbus3DgVBvT6/view?usp=share_link

  • Step 02: Now run this codeblock

    from google.colab import drive
    drive.mount('/content/drive')
  • Step 03: Load the MERRA2 Dataset with xarray (replace the path with your own path)
    path = 'drive/MyDrive/project/python_tutorial/files/MERRA2_100.instM_2d_asm_Nx.198001.SUB.nc'
    data = xr.open_dataset(path)
In [3]:
# Load the dataset
data = xr.open_dataset('/content/MERRA2_100.instM_2d_asm_Nx.198001.SUB.nc')

Now, we will print the dataset information to see which variables and dimensions it contains. It shows a 2-meter air temperature variable (T2M) with a single time step (1980-01-01), 576 longitude values, and 361 latitude values.

In [4]:
# print the dataset
data
Out[4]:
<xarray.Dataset>
Dimensions:  (time: 1, lon: 576, lat: 361)
Coordinates:
  * time     (time) datetime64[ns] 1980-01-01
  * lon      (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * lat      (lat) float64 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0
Data variables:
    T2M      (time, lat, lon) float32 ...
Attributes: (12/32)
    CDI:                               Climate Data Interface version 1.9.8 (...
    Conventions:                       CF-1
    History:                           Original file generated: Thu May  7 21...
    Filename:                          MERRA2_100.instM_2d_asm_Nx.198001.nc4
    Comment:                           GMAO filename: d5124_m2_jan79.inst1_2d...
    Institution:                       NASA Global Modeling and Assimilation ...
    ...                                ...
    Contact:                           http://gmao.gsfc.nasa.gov
    identifier_product_doi:            10.5067/5ESKGQTZG7FO
    RangeBeginningTime:                00:00:00.000000
    RangeEndingTime:                   23:00:00.000000
    history_L34RS:                     'Created by L34RS v1.4.3 @ NASA GES DI...
    CDO:                               Climate Data Operators version 1.9.8 (...

Explaining this Output

Printing the data object will display a summary of the Xarray dataset, including its dimensions, variables, and attributes.

Dimensions

Dimensions:  (time: 1, lon: 576, lat: 361)

This line shows the dimensions of the dataset. In this case, there are three dimensions: time, lon, and lat. The numbers in parentheses indicate the size of each dimension, so there are 361 latitudes, 576 longitudes, and 1 time step in the dataset.

Coordinates

Coordinates:
   time             (time) datetime64[ns] 1980-01-01
   lon              (lon) float64 -180.0 -179.4 ... 178.8 179.4
   lat              (lat) float64 -90.0 -89.5 -89.0 ... 89.5 90.0

This section shows the coordinate variables that define the dimensions of the dataset. Each coordinate variable lists its dimension in parentheses and a data type, e.g. float64 for latitude and longitude, and datetime64 for time.

Data Variables

Data variables:     T2M         (time, lat, lon) float32 ...

This section shows the data variables contained in the dataset. Each variable has a name, its dimensions listed in parentheses, and a data type, which is float32 in this case.

Indexes

Indexes:
time        PandasIndex
lon         PandasIndex
lat         PandasIndex

The Indexes section (shown by newer versions of xarray) lists the index objects for each of the dataset's dimensions. In this case, the dataset has three dimensions: time, lat, and lon, and each of them has a PandasIndex object associated with it.

For example, the time index contains the dates and times corresponding to the dataset's time dimension, while the lat and lon indexes contain the latitude and longitude values corresponding to the dataset's latitude and longitude dimensions, respectively.

Attributes

This section shows any global attributes present in the dataset. Global attributes provide information about the whole dataset, such as its title, creator, and version. Each attribute has a name and a value. In this example, the Conventions attribute indicates that the dataset follows the Climate and Forecast (CF) metadata conventions, and a title attribute, where present, provides a brief description of the dataset.

NetCDF Data Model

The netCDF data model is based on three major components:

  • Variables
  • Dimensions
  • Attributes

Dimensions

Dimensions define the shape of the data stored in a NetCDF file. Each dimension has a name, a length, and a flag indicating whether it is an unlimited dimension, which can grow to accommodate new data without rewriting the entire file.

Variables

Variables are used to store multidimensional arrays of data in a NetCDF file. Each variable is given a name, a data type, a shape defined by one or more dimensions, and a set of attributes that describe the data.

Attributes

Attributes store metadata about the data in a NetCDF file. Each attribute has a name and a value, which can be a scalar or a one-dimensional array. Attributes can be attached to individual variables or to the file itself, in which case they are called global attributes.

Example

Let's suppose we have defined three dimensions, time, latitude, and longitude, to store data.

Now, we can create data variables, e.g. temperature, to store temperature data with a data type and dimensions corresponding to time, lat, and lon.

temperature     (time, lat, lon)      float32

Here temperature depends on three dimensions, so we get a different temperature value for each combination of time, lat, and lon. We can add more data variables if we want, e.g.

precipitation   (time, lat, lon)      float32

wind_speed      (time, lat, lon)      float32

pressure        (time, lat, lon)      float32

Lastly, we might include an attribute named units for the temperature variable to specify that the temperature values are in degrees Celsius. We could also include attributes for the time, lat, and lon coordinate variables to specify their names and units.

Together, dimensions, variables, and attributes form the basic building blocks of the NetCDF data model. By using these components, NetCDF files can store and organize complex scientific data sets in a way that is platform-independent, flexible, and efficient.
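The components above can be sketched in memory with xarray (a minimal sketch with hypothetical toy sizes and values, not the MERRA2 data):

```python
import numpy as np
import xarray as xr

# Hypothetical toy coordinates for the three dimensions
time = np.arange(2)
lat = np.linspace(-90, 90, 3)
lon = np.linspace(-180, 180, 4)

# A data variable shaped by (time, lat, lon), with a 'units' attribute
temperature = xr.DataArray(
    np.zeros((2, 3, 4), dtype=np.float32),
    dims=('time', 'lat', 'lon'),
    coords={'time': time, 'lat': lat, 'lon': lon},
    attrs={'units': 'degrees Celsius'},
)

# Dimensions, variables, and (global) attributes together form a Dataset
ds = xr.Dataset({'temperature': temperature},
                attrs={'title': 'Toy dataset illustrating the NetCDF data model'})
print(ds['temperature'].dims)  # ('time', 'lat', 'lon')
```

Printing `ds` shows the same Dimensions/Coordinates/Data variables/Attributes layout we examined above.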

Data types supported by NetCDF

A NetCDF file supports several data types for storing scientific data, including:

  • Byte - 8-bit signed integer
  • Short - 16-bit signed integer
  • Integer - 32-bit signed integer
  • Float - 32-bit floating point
  • Double - 64-bit floating point
  • Character - 8-bit character
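When creating variables from Python (as in the next section), these NetCDF types are selected via the corresponding NumPy dtypes. A rough mapping (a sketch; names follow the classic netCDF-3 model):

```python
import numpy as np

# Approximate mapping from classic NetCDF data types to NumPy dtypes
NETCDF_TO_NUMPY = {
    'byte':   np.int8,         # 8-bit signed integer
    'short':  np.int16,        # 16-bit signed integer
    'int':    np.int32,        # 32-bit signed integer
    'float':  np.float32,      # 32-bit floating point
    'double': np.float64,      # 64-bit floating point
    'char':   np.dtype('S1'),  # 8-bit character
}

for name, dtype in NETCDF_TO_NUMPY.items():
    print(f'{name:6s} -> {np.dtype(dtype)} ({np.dtype(dtype).itemsize * 8} bits)')
```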

Creating a NetCDF file

First, we need to install the NetCDF libraries and Python bindings.

!apt-get install libnetcdf-dev netcdf-bin
!pip install netcdf4

Step-1

Importing the required packages, netCDF4 and numpy.

import netCDF4 as nc
import numpy as np

Step-2

Suppressing any warnings that may appear during the execution of the script.

import warnings
warnings.filterwarnings('ignore')

Step-3

Creating a new NetCDF file named temperature_pressure.nc for writing using the nc.Dataset method.

ncfile = nc.Dataset('temperature_pressure.nc','w')

Step-4

Defining the dimensions of the data using the createDimension method. In this example, we define the dimensions of time, lat, and lon.

time = ncfile.createDimension('time', None)
lat = ncfile.createDimension('lat', 10)
lon = ncfile.createDimension('lon', 10)

Step-5

Creating variables for the data using the createVariable method. In this example, we create variables for time, lat, lon, temperature, and pressure. We also set their attributes using the .units and .long_name properties.

times = ncfile.createVariable('time', np.float64, ('time',))
times.units = 'days since 2022-01-01 00:00:00'
times.long_name = 'Time'

lats = ncfile.createVariable('lat', np.float32, ('lat',))
lats.units = 'degrees_north'
lats.long_name = 'Latitude'

lons = ncfile.createVariable('lon', np.float32, ('lon',))
lons.units = 'degrees_east'
lons.long_name = 'Longitude'

temp = ncfile.createVariable('temperature', np.float32, ('time', 'lat', 'lon',))
temp.units = 'Celsius'
temp.long_name = 'Temperature'

pres = ncfile.createVariable('pressure', np.float32, ('time', 'lat', 'lon',))
pres.units = 'hPa'
pres.long_name = 'Pressure'

Step-6

Creating some fake temperature and pressure data to write to the file. In this example, we create 5 time values, 10 latitude values, and 10 longitude values. We also create temperature and pressure data using the np.random.randint method.

num_times = 5
num_lats = 10
num_lons = 10

times_data = np.arange(num_times)
lats_data = np.linspace(-90, 90, num_lats)
lons_data = np.linspace(-180, 180, num_lons)

temp_data = np.random.randint(-30, 40, (num_times, num_lats, num_lons))
pres_data = np.random.randint(900, 1100, (num_times, num_lats, num_lons))

Step-7

Writing the data to the variables in the NetCDF file using the [:] notation.

times[:] = times_data
lats[:] = lats_data
lons[:] = lons_data
temp[:] = temp_data
pres[:] = pres_data

Step-8

Setting metadata for the NetCDF file using the .title, .history, and .source properties.

ncfile.title = 'Temperature and Pressure Data'
ncfile.history = 'Created on February 15, 2023'
ncfile.source = 'Generated by Python script'

Step-9

Finally, closing the NetCDF file using the .close() method.

ncfile.close()

Here's the complete code

In [5]:
# Installation of netcdf
!apt-get install libnetcdf-dev netcdf-bin
!pip install netcdf4
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnetcdf-dev is already the newest version (1:4.8.1-1).
libnetcdf-dev set to manually installed.
The following NEW packages will be installed:
  netcdf-bin
0 upgraded, 1 newly installed, 0 to remove and 9 not upgraded.
Need to get 204 kB of archives.
After this operation, 557 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 netcdf-bin amd64 1:4.8.1-1 [204 kB]
Fetched 204 kB in 1s (201 kB/s)
Selecting previously unselected package netcdf-bin.
(Reading database ... 120493 files and directories currently installed.)
Preparing to unpack .../netcdf-bin_1%3a4.8.1-1_amd64.deb ...
Unpacking netcdf-bin (1:4.8.1-1) ...
Setting up netcdf-bin (1:4.8.1-1) ...
Processing triggers for man-db (2.10.2-1) ...
Collecting netcdf4
  Downloading netCDF4-1.6.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 11.9 MB/s eta 0:00:00
Collecting cftime (from netcdf4)
  Downloading cftime-1.6.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 27.9 MB/s eta 0:00:00
Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from netcdf4) (2023.7.22)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from netcdf4) (1.22.4)
Installing collected packages: cftime, netcdf4
Successfully installed cftime-1.6.2 netcdf4-1.6.4
In [6]:
import netCDF4 as nc
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# creating a new NetCDF file
ncfile = nc.Dataset('temperature_pressure.nc','w')

# defining the dimensions of the data
time = ncfile.createDimension('time', None)
lat = ncfile.createDimension('lat', 10)
lon = ncfile.createDimension('lon', 10)

# creating variables for the data and set their attributes
times = ncfile.createVariable('time', np.float64, ('time',))
times.units = 'days since 2022-01-01 00:00:00'
times.long_name = 'Time'
lats = ncfile.createVariable('lat', np.float32, ('lat',))
lats.units = 'degrees_north'
lats.long_name = 'Latitude'
lons = ncfile.createVariable('lon', np.float32, ('lon',))
lons.units = 'degrees_east'
lons.long_name = 'Longitude'
temp = ncfile.createVariable('temperature', np.float32, ('time', 'lat', 'lon',))
temp.units = 'Celsius'
temp.long_name = 'Temperature'
pres = ncfile.createVariable('pressure', np.float32, ('time', 'lat', 'lon',))
pres.units = 'hPa'
pres.long_name = 'Pressure'

# creating some fake temperature and pressure data to write to the file
num_times = 5
num_lats = 10
num_lons = 10
times_data = np.arange(num_times)
lats_data = np.linspace(-90, 90, num_lats)
lons_data = np.linspace(-180, 180, num_lons)
temp_data = np.random.randint(-30, 40, (num_times, num_lats, num_lons))
pres_data = np.random.randint(900, 1100, (num_times, num_lats, num_lons))

# writing the data to the variables in the NetCDF file
times[:] = times_data
lats[:] = lats_data
lons[:] = lons_data
temp[:] = temp_data
pres[:] = pres_data

# setting metadata for the NetCDF file
ncfile.title = 'Temperature and Pressure Data'
ncfile.history = 'Created on February 15, 2023'
ncfile.source = 'Generated by Python script'

# closing the NetCDF file
ncfile.close()

How to read/write data from NetCDF files.

ncdump

ncdump is a command-line utility that displays the metadata and data of a netCDF file in a human-readable form. In short, it is used to inspect the contents of netCDF files.

ncdump supports many options:

ncdump [-c|-h] [-v ...] [-k] [-t] [-s] [-b lang] [-f lang]
       [-l len] [-n name] [-p fdig[,ddig]] [-x] file.nc

  [-c]               Coordinate variable data and header information
  [-h]               Header information only, no data
  [-v var1[,...]]    Data for variable(s) var1,... only
  [-k]               Output kind of netCDF file
  [-t]               Output time data as ISO date-time strings

  [-s]               Output special (virtual) attributes
  [-b [c|f]]         Brief annotations for C or Fortran indices in data
  [-f [c|f]]         Full annotations for C or Fortran indices in data
  [-l len]           Line length maximum in data section (default 80)
  [-n name]          Name for netCDF (default derived from file name)
  [-p n[,n]]         Display floating-point values with less precision
  [-x]               Output NcML instead of CDL (netCDF-3 files only)

  file.nc            Name of netCDF file

Let's try it on a real netCDF file obtained from observations and published by an organization.

Step-1

We need to download a netCDF file from an organization; for example, from Unidata we download this file:

https://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.nc

To download the file we use the code in the cell below.

In [7]:
#downloading the file
!wget -nc https://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.nc -O ECMWF_ERA-40_subset.nc
--2023-07-28 19:51:22--  https://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.nc
Resolving www.unidata.ucar.edu (www.unidata.ucar.edu)... 128.117.149.20
Connecting to www.unidata.ucar.edu (www.unidata.ucar.edu)|128.117.149.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22165040 (21M) [application/x-netcdf]
Saving to: ‘ECMWF_ERA-40_subset.nc’

ECMWF_ERA-40_subset 100%[===================>]  21.14M  10.2MB/s    in 2.1s    

2023-07-28 19:51:25 (10.2 MB/s) - ‘ECMWF_ERA-40_subset.nc’ saved [22165040/22165040]

Step-2

Then we can simply run ncdump:

In [8]:
#Applying ncdump
!ncdump -ct ECMWF_ERA-40_subset.nc
netcdf ECMWF_ERA-40_subset {
dimensions:
	longitude = 144 ;
	latitude = 73 ;
	time = UNLIMITED ; // (62 currently)
variables:
	float longitude(longitude) ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int time(time) ;
		time:units = "hours since 1900-01-01 00:00:0.0" ;
		time:long_name = "time" ;
	short tcw(time, latitude, longitude) ;
		tcw:scale_factor = 0.0013500981745481 ;
		tcw:add_offset = 44.3250482744756 ;
		tcw:_FillValue = -32767s ;
		tcw:missing_value = -32767s ;
		tcw:units = "kg m**-2" ;
		tcw:long_name = "Total column water" ;
	short tcwv(time, latitude, longitude) ;
		tcwv:scale_factor = 0.001327110772669 ;
		tcwv:add_offset = 43.5704635546154 ;
		tcwv:_FillValue = -32767s ;
		tcwv:missing_value = -32767s ;
		tcwv:units = "kg m**-2" ;
		tcwv:long_name = "Total column water vapour" ;
	short lsp(time, latitude, longitude) ;
		lsp:scale_factor = 8.03329303850659e-07 ;
		lsp:add_offset = 0.0263210846406669 ;
		lsp:_FillValue = -32767s ;
		lsp:missing_value = -32767s ;
		lsp:units = "m" ;
		lsp:long_name = "Stratiform precipitation (Large-scale precipitation)" ;
	short cp(time, latitude, longitude) ;
		cp:scale_factor = 4.82483645945993e-07 ;
		cp:add_offset = 0.0158085766594205 ;
		cp:_FillValue = -32767s ;
		cp:missing_value = -32767s ;
		cp:units = "m" ;
		cp:long_name = "Convective precipitation" ;
	short msl(time, latitude, longitude) ;
		msl:scale_factor = 0.1721754257462 ;
		msl:add_offset = 99424.2653245743 ;
		msl:_FillValue = -32767s ;
		msl:missing_value = -32767s ;
		msl:units = "Pa" ;
		msl:long_name = "Mean sea level pressure" ;
	short blh(time, latitude, longitude) ;
		blh:scale_factor = 0.108739383344517 ;
		blh:add_offset = 3570.14367055165 ;
		blh:_FillValue = -32767s ;
		blh:missing_value = -32767s ;
		blh:units = "m" ;
		blh:long_name = "Boundary layer height" ;
	short tcc(time, latitude, longitude) ;
		tcc:scale_factor = 1.52597204419215e-05 ;
		tcc:add_offset = 0.499984740280558 ;
		tcc:_FillValue = -32767s ;
		tcc:missing_value = -32767s ;
		tcc:units = "(0 - 1)" ;
		tcc:long_name = "Total cloud cover" ;
	short p10u(time, latitude, longitude) ;
		p10u:scale_factor = 0.0007584155104299 ;
		p10u:add_offset = -0.440509086897149 ;
		p10u:_FillValue = -32767s ;
		p10u:missing_value = -32767s ;
		p10u:units = "m s**-1" ;
		p10u:long_name = "10 metre U wind component" ;
	short p10v(time, latitude, longitude) ;
		p10v:scale_factor = 0.000664359461014752 ;
		p10v:add_offset = -0.745888358484452 ;
		p10v:_FillValue = -32767s ;
		p10v:missing_value = -32767s ;
		p10v:units = "m s**-1" ;
		p10v:long_name = "10 metre V wind component" ;
	short p2t(time, latitude, longitude) ;
		p2t:scale_factor = 0.00183558351993706 ;
		p2t:add_offset = 262.398478747535 ;
		p2t:_FillValue = -32767s ;
		p2t:missing_value = -32767s ;
		p2t:units = "K" ;
		p2t:long_name = "2 metre temperature" ;
	short p2d(time, latitude, longitude) ;
		p2d:scale_factor = 0.00161126451178551 ;
		p2d:add_offset = 251.887106386855 ;
		p2d:_FillValue = -32767s ;
		p2d:missing_value = -32767s ;
		p2d:units = "K" ;
		p2d:long_name = "2 metre dewpoint temperature" ;
	short e(time, latitude, longitude) ;
		e:scale_factor = 1.16702451907916e-07 ;
		e:add_offset = -0.00232199712964108 ;
		e:_FillValue = -32767s ;
		e:missing_value = -32767s ;
		e:units = "m of water" ;
		e:long_name = "Evaporation" ;
	short lcc(time, latitude, longitude) ;
		lcc:scale_factor = 1.52597204419215e-05 ;
		lcc:add_offset = 0.499984740279558 ;
		lcc:_FillValue = -32767s ;
		lcc:missing_value = -32767s ;
		lcc:units = "(0 - 1)" ;
		lcc:long_name = "Low cloud cover" ;
	short mcc(time, latitude, longitude) ;
		mcc:scale_factor = 1.52597204419215e-05 ;
		mcc:add_offset = 0.499984740279558 ;
		mcc:_FillValue = -32767s ;
		mcc:missing_value = -32767s ;
		mcc:units = "(0 - 1)" ;
		mcc:long_name = "Medium cloud cover" ;
	short hcc(time, latitude, longitude) ;
		hcc:scale_factor = 1.52597204419215e-05 ;
		hcc:add_offset = 0.499984740280558 ;
		hcc:_FillValue = -32767s ;
		hcc:missing_value = -32767s ;
		hcc:units = "(0 - 1)" ;
		hcc:long_name = "High cloud cover" ;
	short tco3(time, latitude, longitude) ;
		tco3:scale_factor = 7.69770539069593e-08 ;
		tco3:add_offset = 0.00736908367510674 ;
		tco3:_FillValue = -32767s ;
		tco3:missing_value = -32767s ;
		tco3:units = "kg m**-2" ;
		tco3:long_name = "Total column ozone" ;
	short tp(time, latitude, longitude) ;
		tp:scale_factor = 1.05226955985452e-06 ;
		tp:add_offset = 0.0344776121286335 ;
		tp:_FillValue = -32767s ;
		tp:missing_value = -32767s ;
		tp:units = "m" ;
		tp:long_name = "Total precipitation" ;

// global attributes:
		:Conventions = "CF-1.0" ;
		:history = "2004-09-15 17:04:29 GMT by mars2netcdf-0.92" ;
data:

 longitude = 0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30, 
    32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 62.5, 65, 
    67.5, 70, 72.5, 75, 77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, 
    102.5, 105, 107.5, 110, 112.5, 115, 117.5, 120, 122.5, 125, 127.5, 130, 
    132.5, 135, 137.5, 140, 142.5, 145, 147.5, 150, 152.5, 155, 157.5, 160, 
    162.5, 165, 167.5, 170, 172.5, 175, 177.5, 180, 182.5, 185, 187.5, 190, 
    192.5, 195, 197.5, 200, 202.5, 205, 207.5, 210, 212.5, 215, 217.5, 220, 
    222.5, 225, 227.5, 230, 232.5, 235, 237.5, 240, 242.5, 245, 247.5, 250, 
    252.5, 255, 257.5, 260, 262.5, 265, 267.5, 270, 272.5, 275, 277.5, 280, 
    282.5, 285, 287.5, 290, 292.5, 295, 297.5, 300, 302.5, 305, 307.5, 310, 
    312.5, 315, 317.5, 320, 322.5, 325, 327.5, 330, 332.5, 335, 337.5, 340, 
    342.5, 345, 347.5, 350, 352.5, 355, 357.5 ;

 latitude = 90, 87.5, 85, 82.5, 80, 77.5, 75, 72.5, 70, 67.5, 65, 62.5, 60, 
    57.5, 55, 52.5, 50, 47.5, 45, 42.5, 40, 37.5, 35, 32.5, 30, 27.5, 25, 
    22.5, 20, 17.5, 15, 12.5, 10, 7.5, 5, 2.5, 0, -2.5, -5, -7.5, -10, -12.5, 
    -15, -17.5, -20, -22.5, -25, -27.5, -30, -32.5, -35, -37.5, -40, -42.5, 
    -45, -47.5, -50, -52.5, -55, -57.5, -60, -62.5, -65, -67.5, -70, -72.5, 
    -75, -77.5, -80, -82.5, -85, -87.5, -90 ;

 time = "2002-07-01 12", "2002-07-01 18", "2002-07-02 12", "2002-07-02 18", 
    "2002-07-03 12", "2002-07-03 18", "2002-07-04 12", "2002-07-04 18", 
    "2002-07-05 12", "2002-07-05 18", "2002-07-06 12", "2002-07-06 18", 
    "2002-07-07 12", "2002-07-07 18", "2002-07-08 12", "2002-07-08 18", 
    "2002-07-09 12", "2002-07-09 18", "2002-07-10 12", "2002-07-10 18", 
    "2002-07-11 12", "2002-07-11 18", "2002-07-12 12", "2002-07-12 18", 
    "2002-07-13 12", "2002-07-13 18", "2002-07-14 12", "2002-07-14 18", 
    "2002-07-15 12", "2002-07-15 18", "2002-07-16 12", "2002-07-16 18", 
    "2002-07-17 12", "2002-07-17 18", "2002-07-18 12", "2002-07-18 18", 
    "2002-07-19 12", "2002-07-19 18", "2002-07-20 12", "2002-07-20 18", 
    "2002-07-21 12", "2002-07-21 18", "2002-07-22 12", "2002-07-22 18", 
    "2002-07-23 12", "2002-07-23 18", "2002-07-24 12", "2002-07-24 18", 
    "2002-07-25 12", "2002-07-25 18", "2002-07-26 12", "2002-07-26 18", 
    "2002-07-27 12", "2002-07-27 18", "2002-07-28 12", "2002-07-28 18", 
    "2002-07-29 12", "2002-07-29 18", "2002-07-30 12", "2002-07-30 18", 
    "2002-07-31 12", "2002-07-31 18" ;
}

Using Xarray for NetCDF

Xarray is a powerful Python library for working with NetCDF files. It offers a wide range of functions similar to those in pandas and numpy, making it easy to manipulate and analyze multidimensional arrays of data. The two primary data structures in Xarray are the DataArray, which is similar to a labeled numpy.ndarray, and the Dataset, a dict-like collection of DataArrays. Xarray also supports groupby operations, similar to those in pandas, which can be used for more complex data manipulation. Additionally, Xarray provides simple and intuitive visualization capabilities.
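As a quick illustration of the pandas-style groupby mentioned above (a sketch with hypothetical values, not data from this tutorial's files):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Four daily values spanning a July/August month boundary (hypothetical data)
times = pd.date_range('2002-07-30', periods=4, freq='D')
da = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims='time', coords={'time': times})

# Group by calendar month and average, exactly as in pandas
monthly = da.groupby('time.month').mean()
print(monthly.values)  # [1.5 3.5]
```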

First, we can use the xr.open_dataset(file) function to open a NetCDF file.

In [9]:
# Importing xarray
import xarray as xr
# Opening the netcdf file
data_set=xr.open_dataset('/content/ECMWF_ERA-40_subset.nc')
In [10]:
data_set
Out[10]:
<xarray.Dataset>
Dimensions:    (longitude: 144, latitude: 73, time: 62)
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00
Data variables: (12/17)
    tcw        (time, latitude, longitude) float32 ...
    tcwv       (time, latitude, longitude) float32 ...
    lsp        (time, latitude, longitude) float32 ...
    cp         (time, latitude, longitude) float32 ...
    msl        (time, latitude, longitude) float32 ...
    blh        (time, latitude, longitude) float32 ...
    ...         ...
    e          (time, latitude, longitude) float32 ...
    lcc        (time, latitude, longitude) float32 ...
    mcc        (time, latitude, longitude) float32 ...
    hcc        (time, latitude, longitude) float32 ...
    tco3       (time, latitude, longitude) float32 ...
    tp         (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.0
    history:      2004-09-15 17:04:29 GMT by mars2netcdf-0.92

This output gives us a well-organized view of the dimensions, coordinates, data variables, and attributes.

We can select a single variable using the cell below.

In [11]:
# data_set['msl']

# or,

data_set.msl
Out[11]:
<xarray.DataArray 'msl' (time: 62, latitude: 73, longitude: 144)>
[651744 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00
Attributes:
    units:      Pa
    long_name:  Mean sea level pressure

Here we can see all the information about the variable. The long name of the msl variable is Mean sea level pressure, which we can see in the Attributes section.

In [12]:
msl=data_set.msl
print(msl)
<xarray.DataArray 'msl' (time: 62, latitude: 73, longitude: 144)>
[651744 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00
Attributes:
    units:      Pa
    long_name:  Mean sea level pressure

Cutting a subset

In [13]:
data_set.variables['msl'][0:2,0:4,0:3]
Out[13]:
<xarray.Variable (time: 2, latitude: 4, longitude: 3)>
[24 values with dtype=float32]
Attributes:
    units:      Pa
    long_name:  Mean sea level pressure

This code cuts a subset of the msl variable from the dataset data_set. The subset contains the first two time steps, the first four latitudes, and the first three longitudes. The result is a 3-dimensional xarray Variable with shape (2, 4, 3) containing the values of msl at the selected indices.

In [14]:
data_set.msl.isel(time=slice(0,2), latitude=slice(0,4), longitude=slice(0,3))
Out[14]:
<xarray.DataArray 'msl' (time: 2, latitude: 4, longitude: 3)>
[24 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 2002-07-01T18:00:00
Attributes:
    units:      Pa
    long_name:  Mean sea level pressure

This code cuts the same subset of the msl variable, using the isel() function to index along the time, latitude, and longitude dimensions by integer position. The resulting subset includes only the selected indices along each dimension; unlike the .variables[...] approach above, it also keeps the corresponding coordinate values.
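Positional isel() has a label-based counterpart, sel(), which selects the same data by coordinate value. A minimal sketch with a toy DataArray (hypothetical values):

```python
import numpy as np
import xarray as xr

# Toy 2-D DataArray standing in for msl (hypothetical values)
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=('latitude', 'longitude'),
    coords={'latitude': [90.0, 87.5, 85.0],
            'longitude': [0.0, 2.5, 5.0, 7.5]},
)

# isel() slices by integer position...
by_position = da.isel(latitude=slice(0, 2), longitude=slice(0, 3))
# ...while sel() selects the same points by coordinate label
by_label = da.sel(latitude=[90.0, 87.5], longitude=[0.0, 2.5, 5.0])

print(by_position.shape)  # (2, 3)
print(bool((by_position == by_label).all()))  # True
```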

Dimensions of a variable

In [15]:
data_set.msl.dims
Out[15]:
('time', 'latitude', 'longitude')

Here we display the dimensions of the data_set.msl variable using the dims attribute.

Coordinates of a variable

In [16]:
data_set.msl.coords
Out[16]:
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00

The coords attribute gives us a dictionary-like object containing the coordinates along each dimension of the data_set.msl variable.

Attributes of a variable

In [17]:
data_set.msl.attrs
Out[17]:
{'units': 'Pa', 'long_name': 'Mean sea level pressure'}

data_set.msl.attrs returns the attributes of the msl variable in the data_set dataset.

In [18]:
# data_set.msl.attrs['who_is_awesome']='xarray'
# data_set.msl.attrs

Values of a variable

In [19]:
data_set.msl.values
Out[19]:
array([[[100476.77 , 100476.77 , 100476.77 , ..., 100476.77 ,
         100476.77 , 100476.77 ],
        [100618.65 , 100607.28 , 100596.09 , ..., 100647.57 ,
         100637.76 , 100628.29 ],
        [100623.984, 100599.016, 100574.055, ..., 100711.1  ,
         100682.01 , 100653.08 ],
        ...,
        [ 99444.06 ,  99535.15 ,  99626.055, ...,  99149.82 ,
          99247.79 ,  99346.1  ],
        [ 99650.85 ,  99689.586,  99727.984, ...,  99499.51 ,
          99550.125,  99600.57 ],
        [ 99822.85 ,  99822.85 ,  99822.85 , ...,  99822.85 ,
          99822.85 ,  99822.85 ]],

       [[       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        ...,
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan]],

       [[100364.69 , 100364.69 , 100364.69 , ..., 100364.69 ,
         100364.69 , 100364.69 ],
        [100635.34 , 100638.1  , 100640.86 , ..., 100618.13 ,
         100623.81 , 100629.664],
        [100772.4  , 100763.79 , 100755.695, ..., 100775.16 ,
         100774.12 , 100773.086],
        ...,
        [ 99937.87 , 100068.55 , 100199.4  , ...,  99526.88 ,
          99663.94 ,  99800.81 ],
        [100378.805, 100423.914, 100468.85 , ..., 100214.2  ,
         100268.96 , 100323.88 ],
        [100384.31 , 100384.31 , 100384.31 , ..., 100384.31 ,
         100384.31 , 100384.31 ]],

       ...,

       [[       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        ...,
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan]],

       [[101339.375, 101339.375, 101339.375, ..., 101339.375,
         101339.375, 101339.375],
        [101341.95 , 101315.266, 101288.75 , ..., 101416.85 ,
         101391.88 , 101366.92 ],
        [101494.16 , 101432.695, 101371.23 , ..., 101656.52 ,
         101602.45 , 101548.39 ],
        ...,
        [100403.26 , 100481.94 , 100560.625, ..., 100222.125,
         100282.734, 100342.99 ],
        [100692.85 , 100717.48 , 100741.92 , ..., 100617.44 ,
         100642.75 , 100667.89 ],
        [100670.984, 100670.984, 100670.984, ..., 100670.984,
         100670.984, 100670.984]],

       [[       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        ...,
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan]]], dtype=float32)

In the cell above, data_set.msl.values returns the values of the msl variable as a NumPy array.

Pulling out the data for 3rd July, 2002

In [20]:
data_set.sel(time='2002-07-03')
Out[20]:
<xarray.Dataset>
Dimensions:    (longitude: 144, latitude: 73, time: 2)
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * time       (time) datetime64[ns] 2002-07-03T12:00:00 2002-07-03T18:00:00
Data variables: (12/17)
    tcw        (time, latitude, longitude) float32 ...
    tcwv       (time, latitude, longitude) float32 ...
    lsp        (time, latitude, longitude) float32 ...
    cp         (time, latitude, longitude) float32 ...
    msl        (time, latitude, longitude) float32 1.002e+05 1.002e+05 ... nan
    blh        (time, latitude, longitude) float32 ...
    ...         ...
    e          (time, latitude, longitude) float32 ...
    lcc        (time, latitude, longitude) float32 ...
    mcc        (time, latitude, longitude) float32 ...
    hcc        (time, latitude, longitude) float32 ...
    tco3       (time, latitude, longitude) float32 ...
    tp         (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.0
    history:      2004-09-15 17:04:29 GMT by mars2netcdf-0.92

This code selects the data for a specific date, 2002-07-03, using xarray's sel() method to pick out that slice along the time dimension. Since two time steps (12:00 and 18:00) fall on that date, both are returned. Other dimensions can be selected in the same sel() call.
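sel() also accepts label slices, so an entire date range can be pulled in one call. A minimal sketch on a small synthetic dataset (the variable and coordinate names here are illustrative stand-ins, not taken from the MERRA2 file):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic dataset standing in for data_set (names are illustrative)
times = pd.date_range("2002-07-01", periods=10, freq="D")
ds = xr.Dataset(
    {"msl": (("time", "latitude"), np.random.rand(10, 3))},
    coords={"time": times, "latitude": [90.0, 87.5, 85.0]},
)

# A label slice selects an inclusive date range along the time dimension
week = ds.sel(time=slice("2002-07-03", "2002-07-07"))
print(week.sizes["time"])  # 5 daily steps
```

Note that, unlike positional slicing, label slices in xarray include both endpoints.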

Nearest indexing

In [21]:
data_set.sel(longitude=240.2,method='nearest')
Out[21]:
<xarray.Dataset>
Dimensions:    (latitude: 73, time: 62)
Coordinates:
    longitude  float32 240.0
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00
Data variables: (12/17)
    tcw        (time, latitude) float32 ...
    tcwv       (time, latitude) float32 ...
    lsp        (time, latitude) float32 ...
    cp         (time, latitude) float32 ...
    msl        (time, latitude) float32 1.005e+05 1.004e+05 ... nan nan
    blh        (time, latitude) float32 ...
    ...         ...
    e          (time, latitude) float32 ...
    lcc        (time, latitude) float32 ...
    mcc        (time, latitude) float32 ...
    hcc        (time, latitude) float32 ...
    tco3       (time, latitude) float32 ...
    tp         (time, latitude) float32 ...
Attributes:
    Conventions:  CF-1.0
    history:      2004-09-15 17:04:29 GMT by mars2netcdf-0.92

This code returns the data at the longitude closest to 240.2 degrees. The sel() method selects data along a dimension; here we select along the longitude dimension with method='nearest', which finds the grid point nearest to the requested coordinate (240.0 in this case).
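A small sketch of method='nearest' on a synthetic 1-D coordinate, including the optional tolerance argument, which fails loudly instead of silently matching a far-away grid point (the coordinate values here are made up):

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D grid with 2.5-degree spacing
da = xr.DataArray(
    np.arange(5.0),
    coords={"longitude": [0.0, 2.5, 5.0, 7.5, 10.0]},
    dims="longitude",
)

# 'nearest' snaps the requested label to the closest grid point
nearest = da.sel(longitude=3.4, method="nearest")
print(float(nearest.longitude))  # 2.5

# tolerance= limits how far the match may be; out-of-range lookups raise
try:
    da.sel(longitude=3.4, method="nearest", tolerance=0.5)
except KeyError:
    print("no grid point within 0.5 degrees of 3.4")
```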

In [22]:
data_set.sel(longitude=[240,125,234],latitude=[40.3,50.3],method='nearest')
Out[22]:
<xarray.Dataset>
Dimensions:    (longitude: 3, latitude: 2, time: 62)
Coordinates:
  * longitude  (longitude) float32 240.0 125.0 235.0
  * latitude   (latitude) float32 40.0 50.0
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00
Data variables: (12/17)
    tcw        (time, latitude, longitude) float32 ...
    tcwv       (time, latitude, longitude) float32 ...
    lsp        (time, latitude, longitude) float32 ...
    cp         (time, latitude, longitude) float32 ...
    msl        (time, latitude, longitude) float32 1.016e+05 1.009e+05 ... nan
    blh        (time, latitude, longitude) float32 ...
    ...         ...
    e          (time, latitude, longitude) float32 ...
    lcc        (time, latitude, longitude) float32 ...
    mcc        (time, latitude, longitude) float32 ...
    hcc        (time, latitude, longitude) float32 ...
    tco3       (time, latitude, longitude) float32 ...
    tp         (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.0
    history:      2004-09-15 17:04:29 GMT by mars2netcdf-0.92

This works the same way as the previous cell: the grid points nearest to the longitude values [240, 125, 234] and latitude values [40.3, 50.3] are returned, here 240.0, 125.0, 235.0 and 40.0, 50.0.

Pulling out data for time index 0, lat index 2, and lon index 3

In [23]:
data_set.msl.isel(time=0,latitude=2,longitude=3)
Out[23]:
<xarray.DataArray 'msl' ()>
array(100549.086, dtype=float32)
Coordinates:
    longitude  float32 7.5
    latitude   float32 85.0
    time       datetime64[ns] 2002-07-01T12:00:00
Attributes:
    units:      Pa
    long_name:  Mean sea level pressure

Here, the .isel() method selects data by integer position along the dimensions named in the call. We extract the value of msl at the first time step, latitude index 2, and longitude index 3; the result is a zero-dimensional DataArray holding a single value.
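Positional selection has a label-based twin: isel() takes integer indices while sel() takes coordinate values, and both can reach the same element. A sketch on a tiny synthetic grid (coordinate values are illustrative):

```python
import numpy as np
import xarray as xr

# Tiny synthetic grid (coordinate values are illustrative)
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    coords={"latitude": [90.0, 87.5, 85.0], "longitude": [0.0, 2.5, 5.0, 7.5]},
    dims=("latitude", "longitude"),
)

# isel() selects by position, sel() by label; both hit the same element here
by_index = da.isel(latitude=2, longitude=3)
by_label = da.sel(latitude=85.0, longitude=7.5)
print(float(by_index), float(by_label))  # 11.0 11.0
```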

Slicing

In [24]:
data_set.msl.isel(latitude=slice(10))
Out[24]:
<xarray.DataArray 'msl' (time: 62, latitude: 10, longitude: 144)>
array([[[100476.77 , 100476.77 , ..., 100476.77 , 100476.77 ],
        [100618.65 , 100607.28 , ..., 100637.76 , 100628.29 ],
        ...,
        [100714.89 , 100574.74 , ..., 100938.89 , 100836.79 ],
        [100373.984, 100223.33 , ..., 100625.875, 100502.26 ]],

       [[       nan,        nan, ...,        nan,        nan],
        [       nan,        nan, ...,        nan,        nan],
        ...,
        [       nan,        nan, ...,        nan,        nan],
        [       nan,        nan, ...,        nan,        nan]],

       ...,

       [[101339.375, 101339.375, ..., 101339.375, 101339.375],
        [101341.95 , 101315.266, ..., 101391.88 , 101366.92 ],
        ...,
        [102766.19 , 102712.984, ..., 102826.97 , 102811.65 ],
        [102814.74 , 102754.14 , ..., 102918.39 , 102868.46 ]],

       [[       nan,        nan, ...,        nan,        nan],
        [       nan,        nan, ...,        nan,        nan],
        ...,
        [       nan,        nan, ...,        nan,        nan],
        [       nan,        nan, ...,        nan,        nan]]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * latitude   (latitude) float32 90.0 87.5 85.0 82.5 ... 75.0 72.5 70.0 67.5
  * time       (time) datetime64[ns] 2002-07-01T12:00:00 ... 2002-07-31T18:00:00
Attributes:
    units:      Pa
    long_name:  Mean sea level pressure

Here, isel is used to select along the latitude dimension with a slice object, slice(10), which keeps all indices less than 10. The resulting DataArray contains the msl values over the full time and longitude dimensions, but only the first ten latitudes (90.0 down to 67.5).
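The same first-ten-latitudes selection can be written with labels instead of positions. Because latitude decreases from 90 in this file, the label slice must also run high-to-low. A sketch with a synthetic latitude axis matching that layout:

```python
import numpy as np
import xarray as xr

# Synthetic latitude axis running 90 down to -90 in 2.5-degree steps
lat = np.arange(90.0, -90.1, -2.5)
da = xr.DataArray(np.zeros(lat.size), coords={"latitude": lat}, dims="latitude")

# Label slices follow the coordinate's own (descending) order;
# slice(90, 67.5) keeps the same ten points as isel(latitude=slice(10))
subset = da.sel(latitude=slice(90, 67.5))
print(subset.sizes["latitude"])  # 10
```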

In [25]:
data_set.groupby('time.season')
Out[25]:
DatasetGroupBy, grouped over 'season'
1 groups with labels 'JJA'.

The first dataset only spans July 2002, so grouping by season produces a single group, 'JJA'. Let's bring in another NetCDF file to explore more.

In [26]:
ds = xr.tutorial.load_dataset('air_temperature')
ds
Out[26]:
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Seasonal groups

In [27]:
ds.groupby('time.season')
Out[27]:
DatasetGroupBy, grouped over 'season'
4 groups with labels 'DJF', 'JJA', 'MAM', 'SON'.

The groupby method in xarray allows us to group data along a dimension and apply an aggregation function such as mean, sum, or max to each group. Here, ds.groupby('time.season') groups the data in ds along the time dimension using its season component, producing four groups, one per season (DJF, MAM, JJA, SON).
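Any reduction can follow the groupby, not only mean(). A minimal sketch on one synthetic year of monthly data (the variable name is illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

# One synthetic year of monthly values (names are illustrative)
times = pd.date_range("2000-01-01", periods=12, freq="MS")
ds = xr.Dataset({"air": ("time", np.arange(12.0))}, coords={"time": times})

groups = ds.groupby("time.season")
print(sorted(groups.groups))  # ['DJF', 'JJA', 'MAM', 'SON']

# mean(), max(), sum(), std(), ... each reduce every group separately
seasonal_max = groups.max()
print(seasonal_max.sizes["season"])  # 4
```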

Seasonal mean

We learned how to group data by season in the previous code cell.

Now we group the data in ds by the season component of the time coordinate and take the mean of each group.

In [28]:
seasonal_mean=ds.groupby('time.season').mean()
seasonal_mean
Out[28]:
<xarray.Dataset>
Dimensions:  (lat: 25, season: 4, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Data variables:
    air      (season, lat, lon) float32 247.0 247.0 246.7 ... 299.4 299.4 299.5
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
In [29]:
seasonal_mean.air.plot(col='season',col_wrap=2)
Out[29]:
<xarray.plot.facetgrid.FacetGrid at 0x7b5e95d1a470>

We have created a facet plot of the seasonal mean values of the air variable in the ds dataset. The col parameter is set to 'season', so each season is plotted in a separate column. With col_wrap=2 there are at most two columns of plots; additional seasons wrap onto a new row.

Plotting

In [30]:
!wget -nc https://downloads.psl.noaa.gov//Datasets/noaa.oisst.v2/sst.wkmean.1990-present.nc
--2023-07-28 19:51:32--  https://downloads.psl.noaa.gov//Datasets/noaa.oisst.v2/sst.wkmean.1990-present.nc
Resolving downloads.psl.noaa.gov (downloads.psl.noaa.gov)... 140.172.38.86
Connecting to downloads.psl.noaa.gov (downloads.psl.noaa.gov)|140.172.38.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 223865152 (213M) [application/x-netcdf]
Saving to: ‘sst.wkmean.1990-present.nc’

sst.wkmean.1990-pre 100%[===================>] 213.49M  58.0MB/s    in 4.1s    

2023-07-28 19:51:36 (51.8 MB/s) - ‘sst.wkmean.1990-present.nc’ saved [223865152/223865152]

In [31]:
data = xr.open_dataset('sst.wkmean.1990-present.nc')
print(data)
<xarray.Dataset>
Dimensions:    (lat: 180, lon: 360, time: 1727, nbnds: 2)
Coordinates:
  * lat        (lat) float32 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
  * lon        (lon) float32 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) datetime64[ns] 1989-12-31 1990-01-07 ... 2023-01-29
Dimensions without coordinates: nbnds
Data variables:
    sst        (time, lat, lon) float32 ...
    time_bnds  (time, nbnds) datetime64[ns] ...
Attributes:
    title:          NOAA Optimum Interpolation (OI) SST V2
    Conventions:    CF-1.0
    history:        Created 10/2002 by RHS
    comments:       Data described in  Reynolds, R.W., N.A. Rayner, T.M.\nSmi...
    platform:       Model
    source:         NCEP Climate Modeling Branch
    institution:    National Centers for Environmental Prediction
    References:     https://www.psl.noaa.gov/data/gridded/data.noaa.oisst.v2....
    NCO:            4.0.0
    dataset_title:  NOAA Optimum Interpolation (OI) SST V2
    source_url:     http://www.emc.ncep.noaa.gov/research/cmb/sst_analysis/
In [32]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# # Read the NetCDF file
# data = xr.open_dataset('sst.wkmean.1990-present.nc')

# Extract the necessary data variables
sst = data['sst'].data
time = pd.to_datetime(data['time'].data)
lat = data['lat'].data
lon = data['lon'].data

# Calculate the mean sea surface temperature over all longitudes
sst_zm = np.mean(sst, axis=2)

# Extract the tropical region: latitudes run from 89.5°N down to 89.5°S,
# so indices 70:110 correspond to roughly 19.5°N to 19.5°S
sst_tropics = sst_zm[:, 70:110]

# Calculate the mean sea surface temperature over the tropical region
sst_final = np.mean(sst_tropics, axis=1)

# Create the plot
plt.figure(figsize=(9, 4))
plt.plot(time, sst_final, color='r', linewidth=2., linestyle='-', alpha=1., label='SST')
plt.title('Sea Surface Temperature ($^\circ$C) in the Tropics')
plt.xlabel('Time')
plt.ylabel('SST ($^\circ$C)')
plt.legend(fontsize=10)
plt.show()
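The index-based tropics extraction above can also be written with coordinate-aware xarray operations, which avoids hand-counting latitude indices. A sketch on a synthetic array with the same descending-latitude layout as the SST file (the values are dummies):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with the SST file's descending-latitude layout
lat = np.arange(89.5, -90.0, -1.0)
lon = np.arange(0.5, 360.0, 1.0)
sst = xr.DataArray(
    np.ones((3, lat.size, lon.size)),
    coords={"time": range(3), "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

# Label-based version of sst_zm[:, 70:110]: select ~20N-20S by coordinate,
# then average over the named lat/lon dimensions
tropics = sst.sel(lat=slice(19.5, -19.5)).mean(dim=("lat", "lon"))
print(tropics.sizes["time"])  # 3
```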
In [33]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Load the netCDF file
data = xr.open_dataset('sst.wkmean.1990-present.nc')

# Extract the necessary variables (the plot below needs lat as well)
sst = data.sst.data
lat = data.lat.data
lon = data.lon.data

# Create the contour plot
plt.figure(figsize=(9, 4))
plt.contourf(lon, lat,sst[0, :, :],cmap='magma')
plt.colorbar(label='SST ($^\circ$C)')
plt.title('Sea Surface Temperature')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
In [34]:
!wget -nc https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2/icec.mnmean.nc
--2023-07-28 19:51:44--  https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2/icec.mnmean.nc
Resolving downloads.psl.noaa.gov (downloads.psl.noaa.gov)... 140.172.38.86
Connecting to downloads.psl.noaa.gov (downloads.psl.noaa.gov)|140.172.38.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 64039040 (61M) [application/x-netcdf]
Saving to: ‘icec.mnmean.nc’

icec.mnmean.nc      100%[===================>]  61.07M  39.5MB/s    in 1.5s    

2023-07-28 19:51:46 (39.5 MB/s) - ‘icec.mnmean.nc’ saved [64039040/64039040]

In [35]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Load the netCDF file
data = xr.open_dataset('icec.mnmean.nc')

# Extract the necessary variables
lon = data.lon.data
lat = data.lat.data
icec = data.icec.data

# Create a contour plot of sea ice concentration
plt.figure(figsize=(10, 6))
plt.contourf(lon, lat, icec[0, :, :])
plt.colorbar(label='Sea Ice Concentration')
plt.title('Sea Ice Concentration')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

Handling NetCDF file format

nccopy

The 'nccopy' command is a tool provided by the NetCDF library for copying, converting, and compressing NetCDF files. Options control the kind of output to generate, the compression level, and how the output is chunked.

nccopy [-k output_kind] [-d level] [-s] [-c chunkspec]
       [-u] [-m n] [-h n] [-e n] input output

  [-k output_kind] kind of output netCDF file
              omitted => same as input
              '1' or 'classic' => classic file format
              '2' or '64-bit-offset' => 64-bit offset format
              '3' or 'netCDF-4' =>  netcdf-4 format
              '4' or 'netCDF-4 classic  model' => netCDF-4 classic model
  [-d level]  deflation level, from 1 (faster but lower compression)
              to 9 (slower but more compression)
  [-s]        shuffling option, sometimes improves compression
  [-c chunkspec] specify chunking for dimensions, e.g. "dim1/N1,dim2/N2,..."
  [-u]        convert unlimited dimensions to fixed size in output
  [-m n]      memory buffer size (default 5 Mbytes)
  [-h n]      set size in bytes of chunk_cache for chunked variables
  [-e n]      set number of elements that chunk_cache can hold
  input       name of input file or Link
  output      name of output file or Link

Example

In [36]:
!nccopy -u -s -d6 ECMWF_ERA-40_subset.nc file_compressed.nc

This creates a compressed copy of the ECMWF_ERA-40_subset.nc with the name file_compressed.nc.

Note that the '!' at the beginning of the command is used to execute shell commands within the Colab environment.

CDO(Climate Data Operators)

CDO is a collection of command-line tools specially designed for working with climate and weather data. It provides a wide range of operations to manipulate, analyze, and process climate data.

  • File information (info, sinfo, diff, ...)
  • File operations (copy, cat, merge, split*, ...)
  • Selection (selcode, selvar, sellevel, seltimestep, ...)
  • Missing values (setctomiss, setmisstoc, setrtomiss)
  • Arithmetic (add, sub, mul, div, ...)
  • Mathematical functions (sqrt, exp, log, sin, cos, ...)
  • Comparison (eq, ne, le, lt, ge, gt, ...)
  • Conditions (ifthen, ifnotthen, ifthenc, ifnotthenc)
  • Field statistics (fldsum, fldavg, fldstd, fldmin, fldmax, ...)
  • Vertical statistics (vertsum, vertavg, vertstd, vertmin, ...)
  • Time range statistics (timavg, yearavg, monavg, dayavg, ...)
  • Field interpolation (remapbil, remapcon, remapdis, ...)
  • Vertical interpolation (ml2pl, ml2hl)
  • Time interpolation (inttime, intyear)

To install CDO in Colab, we can use the following snippet:

!apt-get install -y cdo

▶ To merge files (use cdo cat instead to concatenate along time):

!cdo merge file1.nc file2.nc file3.nc merged.nc

Here, file1.nc, file2.nc, and file3.nc are merged into the file merged.nc

▶ To convert a GRIB file to a NetCDF file:

!cdo -f nc copy file.grb file.nc

▶ To shift the longitude of data from 0:360 to -180:180:

!cdo sellonlatbox,-180,180,-90,90 input.nc output.nc

▶ To calculate the mean over all time steps:

!cdo timmean input.nc timmean.nc

▶ To apply mathematical and statistical operations:

!cdo add input1.nc input2.nc sum.nc
!cdo mul input1.nc input2.nc product.nc

▶ To perform time series analysis and temporal aggregations:

!cdo yearmean input.nc yearly_mean.nc
!cdo monmean input.nc monthly_mean.nc

Note that the '!' at the beginning of the command is used to execute shell commands within the Colab environment.

NCO(netCDF Operators)

NCO (netCDF Operators) is typically used when we need to perform advanced operations on netCDF files.

Some NCO command line operators to apply on netCDF files:

ncap2      - arithmetic processor
ncatted    - attribute editor
ncbo       - binary operator
ncdiff     - differencer
ncea       - ensemble averager
ncecat     - ensemble concatenator
ncflint    - file interpolator
ncks       - kitchen sink (extract, cut, paste, print data)
ncpdq      - permute dimensions quickly
ncra       - running averager
ncrcat     - record concatenator
ncrename   - renamer
ncwa       - weighted averager

In colab we have to install NCO using the command:

!apt-get install -y nco

▶ To multiply a variable by a constant:

!ncap2 -s 'variable = variable * 2.0' input.nc output.nc

▶ To add or modify an attribute (the arguments are attribute,variable,mode,type,value; mode 'o' overwrites, creating the attribute if needed):

!ncatted -a units,variable,o,c,'new_units' input.nc output.nc

▶ To add the variables of two files (ncbo takes the operation via --op_typ rather than a script):

!ncbo --op_typ=add input1.nc input2.nc output.nc

▶ To calculate the difference between two variables:

!ncdiff -O -o diff.nc input1.nc input2.nc

Note that the '!' at the beginning of the command is used to execute shell commands within the Colab environment.
