Sentinel-2 Processing Workflow

This notebook demonstrates the complete workflow for downloading and processing Sentinel-2 satellite imagery using the disasters-product-algorithms package.
Authors

Aaron Serre (Editor, UAH)

Kyle Lesinger (Editor, UAH)

Published

January 12, 2025

Run This Notebook

🚀 Launch in Disasters-Hub JupyterHub (requires access)

To obtain credentials to VEDA Hub, follow this link for more information.

Disclaimer: it is highly recommended to run this tutorial within the NASA VEDA JupyterHub, which already includes functions for processing and visualizing data specific to VEDA stories. Running the tutorial outside of the VEDA JupyterHub may lead to errors, particularly with EarthData authentication. It is also recommended to use the Pangeo workspace within the VEDA JupyterHub, since certain packages relevant to this tutorial are already installed there.

If you do not have a VEDA JupyterHub account, you can launch this notebook in your local environment using MyBinder by clicking the icon below.


Binder


Workflow Steps

  1. Configure Environment Variables - Set processing parameters
  2. Download Sentinel-2 Data - Download imagery from Copernicus
  3. Process Sentinel-2 Data - Generate products with COG conversion and event naming
  4. View Results - Examine the generated outputs

Features Demonstrated

  • Cloud Optimized GeoTIFF (COG) conversion
  • Event-based file naming for disaster response
  • Multiple product generation (true color, NDVI, NDWI, MNDWI, NBR, water extent)
  • Cloud masking (L2A only)
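The index products listed above (NDVI, NDWI, MNDWI, NBR) all follow the standard normalized-difference band-ratio formula. A minimal sketch of that formula, with the usual Sentinel-2 band pairings noted in comments (the `normalized_difference` helper and the toy arrays are illustrative, not part of the package):

```python
import numpy as np

# Standard normalized-difference formulas (Sentinel-2 band conventions):
#   NDVI  = (NIR - Red)   / (NIR + Red)    -> B08, B04
#   NDWI  = (Green - NIR) / (Green + NIR)  -> B03, B08
#   MNDWI = (Green - SWIR) / (Green + SWIR) -> B03, B11
#   NBR   = (NIR - SWIR2) / (NIR + SWIR2)  -> B08, B12

def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning NaN where the denominator is zero."""
    a = a.astype("float64")
    b = b.astype("float64")
    denom = a + b
    return np.where(denom == 0, np.nan, (a - b) / denom)

# Toy reflectance values for illustration
nir = np.array([[0.5, 0.6]])
red = np.array([[0.1, 0.2]])
ndvi = normalized_difference(nir, red)
```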

Environment Setup

Configure all processing parameters in one place for easy modification.

import os
import subprocess
from pathlib import Path

# ==================== CONFIGURATION ====================
# Modify these variables according to your requirements

# --- Credentials for the Copernicus Data Space Ecosystem ---
# Read credentials from environment variables rather than hardcoding them;
# never commit account passwords to a notebook.
COP_USER = os.environ.get("COPERNICUS_USER", "")  # your account email
COP_PASS = os.environ.get("COPERNICUS_PASS", "")  # your account password

# --- Download parameters ---
TILE_ID = "T17RLN"              # Sentinel-2 tile ID
# Alternatively, supply an AOI shapefile instead of a tile ID:
# POLYGON = os.path.expanduser("~/shared-readwrite/process_sentinel2/AOI/Jan2026_WinterStorm_LS_S2_AOI.shp")
# One or more acquisition dates (YYYYMMDD); add entries to search a wider window
DOWNLOAD_DATE = ["20251230"]
PROCESSING_LEVEL = "2"          # 1 = L1C, 2 = L2A

# Processing parameters
PRODUCTS = ["true", "swir"]
EVENT_NAME = "202601_WinterWx_US"
# Acquisition date of the scene to process (YYYYMMDD)
PROCESS_DATE = "20251230"

# COG options
COMPRESSION = "ZSTD"
COMPRESSION_LEVEL = 22
NODATA = 0

# Processing flags
ENABLE_MERGE = True
ENABLE_MASK = False
FORCE_OVERWRITE = True
DST_CRS = 'EPSG:4326'

# Output directory for downloaded and processed data
OUTPUT_DIR = os.path.expanduser(f"/home/jovyan/disasters-docs/Jupyterhub/process_sentinel2/{EVENT_NAME}_s2")

# Create output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

print("Configuration Updated:")
print(f"  Output Directory: {OUTPUT_DIR}")
print(f"  Search Range: {DOWNLOAD_DATE[0]} to {DOWNLOAD_DATE[-1]}")
print(f"  Processing Level: {'L1C' if PROCESSING_LEVEL == '1' else 'L2A'}")
print(f"  Products: {', '.join(PRODUCTS)}")
print(f"  Event Name: {EVENT_NAME}")
Configuration Updated:
  Output Directory: /home/jovyan/disasters-docs/Jupyterhub/process_sentinel2/202601_WinterWx_US_s2
  Search Range: 20251230 to 20251230
  Processing Level: L2A
  Products: true, swir
  Event Name: 202601_WinterWx_US
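If you prefer not to store credentials anywhere in the notebook, they can be pulled from the environment with an interactive fallback. A minimal sketch (the `COPERNICUS_USER` / `COPERNICUS_PASS` variable names and the `resolve_credentials` helper are illustrative, not part of the package):

```python
import os
from getpass import getpass

def resolve_credentials():
    """Fetch Copernicus credentials from the environment, prompting only if absent."""
    user = os.environ.get("COPERNICUS_USER") or input("Copernicus email: ")
    password = os.environ.get("COPERNICUS_PASS") or getpass("Copernicus password: ")
    return user, password
```

Set the variables once per session (e.g. `export COPERNICUS_USER=...` before launching Jupyter) and the prompt is skipped entirely.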

Download Sentinel-2 Data

Download Sentinel-2 imagery from the Copernicus Data Space Ecosystem.

Note: Your Copernicus credentials are passed to the download command via the -u and -p flags. If you do not have an account, register at: https://dataspace.copernicus.eu/
# Use the * operator to unpack DOWNLOAD_DATE if it is a list
# This turns ["date1", "date2"] into "date1", "date2"
download_cmd = [
    "download_sentinel2",
    OUTPUT_DIR,
    "-tile", TILE_ID,
    # "-polygon", POLYGON,
    "-date", *DOWNLOAD_DATE,        
    "-level", PROCESSING_LEVEL,
    "-u", COP_USER, 
    "-p", COP_PASS, 
    "-y"
]

print("--- DOWNLOADING DATA ---")

# Execute download with real-time output
process = subprocess.Popen(
    download_cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
    bufsize=1
)

# Print output in real-time
for line in process.stdout:
    print(line, end='')

# Wait for completion
return_code = process.wait()

if return_code != 0:
    print("--- ERROR ---")
    print("Download failed. Check output above for details.")
else:
    print("✓ Download completed successfully.")
--- DOWNLOADING DATA ---
FileNotFoundError: [Errno 2] No such file or directory: 'download_sentinel2'

The FileNotFoundError above means the download_sentinel2 executable was not found on the PATH. Install the disasters-product-algorithms package into the active environment (the Pangeo workspace noted above already includes the relevant packages) before re-running this cell.
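This failure can also be caught before launching the subprocess by checking the PATH first; a small sketch using only the standard library:

```python
import shutil

# download_sentinel2 is installed by the disasters-product-algorithms package;
# shutil.which returns None when the executable is not on the PATH.
exe = shutil.which("download_sentinel2")
if exe is None:
    print("download_sentinel2 not found -- install disasters-product-algorithms "
          "into the active environment before running the download cell")
else:
    print(f"Found CLI at {exe}")
```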

Process Sentinel-2 Data

Process the downloaded imagery to generate various products with COG conversion and event naming.

Note: The processing script has been configured with unbuffered output to display progress in real-time within JupyterHub. You'll see:

  • Progress bars for scene processing
  • Detailed product generation steps
  • COG conversion progress
  • Error messages if any products fail
  • Final processing summary with success/failure counts
  • Log file location for detailed error tracking

print("--- HELPER FLAG ARGUMENTS ---")
# Build processing command
process_cmd = [
    "process_sentinel2",
    "-h"
]

# Execute processing
help_flags = subprocess.run(process_cmd, cwd=os.getcwd())
print(help_flags)
# Build processing command
process_cmd = [
    "process_sentinel2",
    OUTPUT_DIR,
    "-p", *PRODUCTS,
    "-event", EVENT_NAME,
    "-compression", COMPRESSION,
    "-compression_level", str(COMPRESSION_LEVEL),
    "-dst_crs", DST_CRS
]

# Add optional parameters
if PROCESS_DATE:
    process_cmd.extend(["-date", PROCESS_DATE])

if ENABLE_MERGE:
    process_cmd.append("-merge")

if ENABLE_MASK:
    process_cmd.append("-mask")

if FORCE_OVERWRITE:
    process_cmd.append("-force")

if NODATA is not None:
    process_cmd.extend(["-nodata", str(NODATA)])

print("Processing Sentinel-2 data...")
print(f"Command: {' '.join(process_cmd)}")
print()

# Execute processing with real-time output
# The scripts have built-in unbuffered output for JupyterHub compatibility
process = subprocess.Popen(
    process_cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
    bufsize=1  # Line buffered
)

# Print output in real-time
for line in process.stdout:
    print(line, end='')  # end='' because line already has newline

# Wait for completion
return_code = process.wait()

if return_code == 0:
    print("\n✓ Processing completed successfully!")
else:
    print(f"\n✗ Processing failed with return code {return_code}")

View Results

Examine the generated output files and directory structure.

import glob
import os
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
import numpy as np

# We'll check both "output" and the EVENT_NAME folder to be safe
potential_paths = [
    os.path.join(OUTPUT_DIR, "output"),
    os.path.join(OUTPUT_DIR, EVENT_NAME)
]

output_dir = None
for path in potential_paths:
    if os.path.exists(path):
        output_dir = path
        break

if output_dir:
    print(f"✓ Found output directory: {output_dir}\n")
    
    # Find all finished GeoTIFF files (ignoring temp files)
    tif_files = sorted([f for f in glob.glob(os.path.join(output_dir, "**/*.tif"), recursive=True) 
                        if not f.endswith('.tmp.tif')])
    
    if tif_files:
        product_types = {}
        for tif_file in tif_files:
            product_type = os.path.basename(os.path.dirname(tif_file))
            if product_type not in product_types:
                product_types[product_type] = []
            product_types[product_type].append(tif_file)

        # --- PLOTTING SECTION ---
        print(f"Displaying {len(product_types)} product types...")
        
        # Create a grid for plotting (1 row per product type)
        fig, axes = plt.subplots(len(product_types), 1, figsize=(12, 8 * len(product_types)))
        if len(product_types) == 1: axes = [axes] # Handle single product case

        for i, (p_type, files) in enumerate(sorted(product_types.items())):
            ax = axes[i]
            sample_file = files[0] # Plot the first file available for this type
            
            with rasterio.open(sample_file) as src:
                # rasterio.plot.show handles both single-band and RGB rasters directly
                show(src, ax=ax, title=f"Product: {p_type}\n{os.path.basename(sample_file)}")
                print(f"✓ Generated plot for {p_type}")

        plt.tight_layout()
        plt.show()

        # List file sizes below the images
        print("\nFile Details:")
        print("=" * 50)
        for p_type, files in sorted(product_types.items()):
            print(f"\n{p_type}:")
            for f in files:
                size = os.path.getsize(f) / (1024*1024)
                print(f"  - {os.path.basename(f)} ({size:.1f} MB)")
    else:
        print("No finished GeoTIFF files found. Check if the processing script is still running.")
else:
    print(f"Output directory not found in: {potential_paths}")

Output File Naming Convention

Files are named with the event prefix and formatted date:

Original: S2B_MSIL2A_colorInfrared_20251111_161419_T17RLN.tif

With Event Naming: 202311_Example_Event_S2B_MSIL2A_colorInfrared_161419_T17RLN_2025-11-11_day.tif

This naming convention:

  • Adds event prefix for organization
  • Removes date from middle position
  • Adds formatted date (YYYY-MM-DD) at the end
  • Includes _day suffix for AWS/cloud compatibility
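The transformation above can be sketched as a small helper (`apply_event_naming` is illustrative, not the package's actual function; it assumes the acquisition date is the fourth underscore-separated field, as in the example filename):

```python
def apply_event_naming(filename: str, event_name: str) -> str:
    """Rename a Sentinel-2 product file: prefix the event name, move the
    acquisition date from the middle to the end as YYYY-MM-DD, append _day."""
    stem, ext = filename.rsplit(".", 1)
    parts = stem.split("_")
    # parts: [S2B, MSIL2A, colorInfrared, 20251111, 161419, T17RLN]
    date = parts[3]
    rest = parts[:3] + parts[4:]          # drop the date from the middle
    iso_date = f"{date[:4]}-{date[4:6]}-{date[6:]}"
    return f"{event_name}_{'_'.join(rest)}_{iso_date}_day.{ext}"
```

Applied to the example above, `apply_event_naming("S2B_MSIL2A_colorInfrared_20251111_161419_T17RLN.tif", "202311_Example_Event")` reproduces the event-named form.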

Next Steps

You can now:

  1. Load and visualize the GeoTIFF files using libraries like rasterio or GDAL
  2. Upload the COG files to cloud storage (S3, GCS, etc.)
  3. Process additional dates or tiles by modifying the configuration variables
  4. Generate additional products by updating the PRODUCTS list
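Step 3 (processing additional dates) can be scripted as a loop over the same command used earlier; a minimal sketch, with placeholder paths and hypothetical extra dates (the run line is left commented out so the cell is safe to execute without the CLI installed):

```python
import subprocess  # used when the run line below is enabled

OUTPUT_DIR = "/tmp/202601_WinterWx_US_s2"   # placeholder path for illustration
EVENT_NAME = "202601_WinterWx_US"
dates = ["20251230", "20260101"]            # hypothetical additional dates

for date in dates:
    cmd = ["process_sentinel2", OUTPUT_DIR,
           "-p", "true", "-event", EVENT_NAME, "-date", date]
    print("Would run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)       # uncomment once the CLI is on PATH
```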

Back to top