import os
import subprocess

# Check if disasters-aws-conversion exists; if not, clone it
repo_name = "disasters-aws-conversion"
repo_url = "https://github.com/Disasters-Learning-Portal/disasters-aws-conversion.git"

if not os.path.exists(repo_name):
    print(f"📥 Cloning {repo_name} repository...")
    try:
        result = subprocess.run(
            ["git", "clone", repo_url, repo_name],
            capture_output=True,
            text=True,
            check=True
        )
        print(f"✅ Successfully cloned {repo_name}")
    except subprocess.CalledProcessError as e:
        print(f"❌ Error cloning repository: {e.stderr}")
else:
    print(f"✅ {repo_name} repository already exists")

🌍 Simple Disaster COG Processing
This simplified notebook converts disaster satellite imagery to Cloud Optimized GeoTIFFs (COGs) with just a few cells.
✨ Features
- See files first - List S3 files before configuring
- Smart configuration - Define filename functions after seeing actual files
- Auto-discovery - Automatically categorizes your files
- Simple processing - Just run the cells in order
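For context on what the conversion itself involves: a COG is an ordinary GeoTIFF rewritten with internal tiling and overviews so it can be read efficiently over HTTP range requests. The heavy lifting here is done by functions inside disasters-aws-conversion; purely as an illustrative sketch (this notebook does not call rio-cogeo directly), a single-file conversion with the standalone rio-cogeo package looks roughly like this:

from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles

def to_cog(src_path, dst_path):
    """Rewrite one GeoTIFF as a deflate-compressed Cloud Optimized GeoTIFF."""
    profile = cog_profiles.get("deflate")  # tiled layout + DEFLATE compression
    cog_translate(src_path, dst_path, profile, in_memory=False)

# to_cog("input_mosaic.tif", "output_mosaic_cog.tif")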
🚀 Launch in Disasters-Hub JupyterHub (requires access)
To obtain credentials for VEDA Hub, follow this link for more information.
If you do not have a VEDA JupyterHub account, you can launch this notebook in your local environment using MyBinder by clicking the icon below.
🔧 Step 0: Setup - Clone Required Repository
Run the setup cell at the top of this notebook first! This notebook requires the disasters-aws-conversion repository for its processing functions.
📋 Step 1: Basic Configuration
Set your event details and S3 paths:
# ========================================
# INPUTS
# ========================================
# S3 Paths
BUCKET = 'nasa-disasters' # S3 bucket (DO NOT CHANGE)
DESTINATION_BASE = 'drcs_activations_new' # Where to save COGs in S3 bucket (DO NOT CHANGE)
GEOTIFF_DIR = 'drcs_activations' # This is where all non-converted files should be placed
# Event Details
EVENT_NAME = '202510_Flood_AK'   # Event name in YYYYMM_EventType_Location format
SUB_PRODUCT_NAME = 'aria'        # Sub-directory within the event folder (e.g., RGB, trueColor, SWIR). Can be left blank to read directly from the event folder.
SOURCE_PATH = f'{GEOTIFF_DIR}/{EVENT_NAME}/{SUB_PRODUCT_NAME}'  # Where your files are

# Processing Options
OVERWRITE = False      # Set to True to replace existing files
VERIFY = True          # Set to True to verify results after processing
SAVE_RESULTS = True    # Set to False to skip saving the results CSV to the /output directory

print(f"Event: {EVENT_NAME}")
print(f"Source: s3://{BUCKET}/{SOURCE_PATH}")
print(f"Destination: s3://{BUCKET}/{DESTINATION_BASE}/")

Event: 202510_Flood_AK
Source: s3://nasa-disasters/drcs_activations/202510_Flood_AK/aria
Destination: s3://nasa-disasters/drcs_activations_new/
📂 Step 2: Connect to S3 and List Files
Let's see what files are available before configuring filename transformations:
# Import necessary modules
import sys
import os
from pathlib import Path
# Add the repo root to the path for importing functions: the parent
# directory when running inside the repo, or the directory cloned in Step 0
sys.path.insert(0, str(Path('..').resolve()))
if os.path.exists("disasters-aws-conversion"):
    sys.path.insert(0, str(Path("disasters-aws-conversion").resolve()))
# Import S3 operations
from core.s3_operations import (
    initialize_s3_client,
    list_s3_files,
    get_file_size_from_s3
)

# Initialize S3 client
print("🔌 Connecting to S3...")
s3_client, _ = initialize_s3_client(bucket_name=BUCKET, verbose=False)

if s3_client:
    print("✅ Connected to S3\n")

    # List all TIF files
    print(f"📁 Files in s3://{BUCKET}/{SOURCE_PATH}:")
    print("="*60)
    files = list_s3_files(s3_client, BUCKET, SOURCE_PATH, suffix='.tif')

    if files:
        print(f"Found {len(files)} .tif files:\n")
        for i, file_path in enumerate(files[:10], 1):  # Show first 10
            filename = os.path.basename(file_path)
            try:
                size_gb = get_file_size_from_s3(s3_client, BUCKET, file_path)
                print(f"{i:2}. {filename:<60} ({size_gb:.2f} GB)")
            except Exception:
                print(f"{i:2}. {filename}")
        if len(files) > 10:
            print(f"\n... and {len(files) - 10} more files")
        print("\n" + "="*60)
        print("\n💡 Use this information to create filename functions in Step 3")
    else:
        print("⚠️ No .tif files found in the specified path.")
        print("   Check your SOURCE_PATH configuration.")
else:
    print("❌ Could not connect to S3. Check your AWS credentials.")
    files = []

🔌 Connecting to S3...
✅ Connected to S3

📁 Files in s3://nasa-disasters/drcs_activations/202510_Flood_AK/aria:
============================================================
Found 6 .tif files:

 1. OPERA_L3_DSWX-S1_V1_WTR_2025-10-08_mosaic.tif (0.01 GB)
 2. OPERA_L3_DSWX-S1_V1_WTR_2025-10-10_mosaic.tif (0.03 GB)
 3. OPERA_L3_DSWX-S1_V1_WTR_2025-10-12_mosaic.tif (0.03 GB)
 4. OPERA_L3_DSWX-S1_V1_WTR_2025-10-15_mosaic.tif (0.02 GB)
 5. OPERA_L3_DSWX-HLS_V1_WTR_2025-10-08_mosaic.tif (0.01 GB)
 6. OPERA_L3_DSWX-HLS_V1_WTR_2025-10-13_mosaic.tif (0.01 GB)
============================================================

💡 Use this information to create filename functions in Step 3
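If you want to inspect the bucket outside of the repo's helper, a rough boto3 equivalent of what list_s3_files appears to do (a paginated listing filtered by suffix; the helper's exact behavior may differ) is:

import boto3

def list_tifs(bucket, prefix, suffix=".tif"):
    """List object keys under `prefix` that end with `suffix`."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                keys.append(obj["Key"])
    return keys

# Example: list_tifs(BUCKET, SOURCE_PATH)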
🏷️ Step 3a: Define Categorization and Filename Transformations
Based on the files you see above, configure:
1. Categorization patterns - Regex patterns to identify file types
2. Filename functions - How to transform filenames
3. Output directories - Where each category should be saved
# ========================================
# CATEGORIZATION AND OUTPUT CONFIGURATION
# ========================================
import re
# STEP 1: Define how to extract dates from filenames
def extract_date_from_filename(filename):
    """Extract the first compact YYYYMMDD date from a filename as YYYY-MM-DD.

    Note: this matches eight consecutive digits (e.g. 20240715); hyphenated
    dates such as 2025-10-08 in the OPERA filenames above will not match,
    and None is returned.
    """
    dates = re.findall(r'\d{8}', filename)
    if dates:
        date_str = dates[0]
        return f"{date_str[0:4]}-{date_str[4:6]}-{date_str[6:8]}"
    return None
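# Quick sanity check with a hypothetical Landsat-style name:
#   extract_date_from_filename("LC08_trueColor_20240715_155319_016036.tif")
#   returns "2024-07-15"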
# STEP 2: Define filename transformation functions for each category
def create_truecolor_filename(original_path, event_name):
    """Create filename for trueColor products."""
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    date = extract_date_from_filename(stem)
    if date:
        stem_clean = re.sub(r'_\d{8}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"

def create_colorinfrared_filename(original_path, event_name):
    """Create filename for colorInfrared products."""
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    date = extract_date_from_filename(stem)
    if date:
        stem_clean = re.sub(r'_\d{8}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"

def create_naturalcolor_filename(original_path, event_name):
    """Create filename for naturalColor products."""
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    date = extract_date_from_filename(stem)
    if date:
        stem_clean = re.sub(r'_\d{8}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"
# STEP 3: Configure categorization patterns (REQUIRED)
# These regex patterns determine which files belong to which category
CATEGORIZATION_PATTERNS = {
    'trueColor': r'trueColor|truecolor|true_color',
    'colorInfrared': r'colorInfrared|colorIR|color_infrared',
    'naturalColor': r'naturalColor|natural_color',
    # Add patterns for ALL file types you want to process
    # Files not matching any pattern will be skipped with a warning
}

# STEP 4: Map categories to filename transformation functions
FILENAME_CREATORS = {
    'trueColor': create_truecolor_filename,
    'colorInfrared': create_colorinfrared_filename,
    'naturalColor': create_naturalcolor_filename,
    # Must have an entry for each category in CATEGORIZATION_PATTERNS
}

# STEP 5: Specify output directories for each category
OUTPUT_DIRS = {
    'trueColor': 'Landsat/trueColor',
    'colorInfrared': 'Landsat/colorIR',
    'naturalColor': 'Landsat/naturalColor',
    # Must have an entry for each category in CATEGORIZATION_PATTERNS
}

# OPTIONAL: Specify no-data values (None = auto-detect)
NODATA_VALUES = {
    'trueColor': 0,
    'colorInfrared': 0,
    'naturalColor': 0
    # Leave empty or set to None for auto-detection
}
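Note that the patterns above target Landsat-style RGB products, while the OPERA DSWx files listed in Step 2 carry hyphenated dates (e.g. 2025-10-08) and would not match any category, so they would be skipped. A minimal sketch of how those files could be wired in, assuming a hypothetical 'water' category name, 'OPERA/water' output directory, and no-data value (verify all three against your products):

# Hypothetical extra category for the OPERA DSWx water-extent files from Step 2.
def create_water_filename(original_path, event_name):
    """Create filename for OPERA DSWx water-extent products."""
    stem = os.path.splitext(os.path.basename(original_path))[0]
    # These files carry hyphenated dates (e.g. 2025-10-08), which
    # extract_date_from_filename() does not handle, so match them here.
    match = re.search(r'\d{4}-\d{2}-\d{2}', stem)
    if match:
        date = match.group(0)
        stem_clean = re.sub(r'_?\d{4}-\d{2}-\d{2}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"

CATEGORIZATION_PATTERNS['water'] = r'DSWX|WTR'
FILENAME_CREATORS['water'] = create_water_filename
OUTPUT_DIRS['water'] = 'OPERA/water'      # illustrative output directory
NODATA_VALUES['water'] = 255              # assumed no-data value; verify against the source data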
🏷️ Step 3b: Test the new functions to verify what the inputs and outputs will be.
print("β
Configuration defined")
print(f"\nπ Categories and output paths:")
for category, path in OUTPUT_DIRS.items():
pattern = CATEGORIZATION_PATTERNS.get(category, 'No pattern defined')
print(f" β’ {category}:")
print(f" Pattern: {pattern}")
print(f" Output: {DESTINATION_BASE}/{path}")
# Test with sample filename if files exist
if files:
sample_file = files[0]
sample_name = os.path.basename(sample_file)
# Check which category it would match
matched_category = None
for cat, pattern in CATEGORIZATION_PATTERNS.items():
if re.search(pattern, sample_name, re.IGNORECASE):
matched_category = cat
break
if matched_category:
new_name = FILENAME_CREATORS[matched_category](sample_file, EVENT_NAME)
print(f"\nπ Example transformation:")
print(f" Original: {sample_name}")
print(f" Category: {matched_category}")
print(f" β New: {new_name}")
print(f" β Output: {DESTINATION_BASE}/{OUTPUT_DIRS[matched_category]}/{new_name}")
else:
print(f"\nβ οΈ Warning: Sample file doesn't match any category pattern:")
print(f" File: {sample_name}")
print(f" Add a pattern to CATEGORIZATION_PATTERNS to process this file type")β
Configuration defined
📋 Categories and output paths:
  • trueColor:
    Pattern: trueColor|truecolor|true_color
    Output: drcs_activations_new/Landsat/trueColor
  • colorInfrared:
    Pattern: colorInfrared|colorIR|color_infrared
    Output: drcs_activations_new/Landsat/colorIR
  • naturalColor:
    Pattern: naturalColor|natural_color
    Output: drcs_activations_new/Landsat/naturalColor
🚀 Step 4: Initialize Processor and Preview
Now let's set up the processor and preview all transformations:
# Import our simplified helper
from notebooks.notebook_helpers import SimpleProcessor

# Create the full configuration, including the categorization patterns
config = {
    'event_name': EVENT_NAME,
    'bucket': BUCKET,
    'source_path': SOURCE_PATH,
    'destination_base': DESTINATION_BASE,
    'overwrite': OVERWRITE,
    'verify': VERIFY,
    'save_results': SAVE_RESULTS,                        # Save-results flag
    'categorization_patterns': CATEGORIZATION_PATTERNS,  # IMPORTANT: include the patterns
    'filename_creators': FILENAME_CREATORS,
    'output_dirs': OUTPUT_DIRS,
    'nodata_values': NODATA_VALUES
}

# Initialize the processor
processor = SimpleProcessor(config)

# Connect to S3 (already connected, but needed for the processor)
if processor.connect_to_s3():
    print("✅ Processor ready\n")

    # Discover and categorize files
    num_files = processor.discover_files()

    if num_files > 0:
        # Show a preview of the transformations
        processor.preview_processing()
        print("\n👀 Review the transformations above.")
        print("   • Files will be saved to the directories specified in OUTPUT_DIRS")
        print("   • If files appear as 'uncategorized', add patterns to CATEGORIZATION_PATTERNS")
        print("   • When ready, proceed to Step 5 to process the files.")
    else:
        print("⚠️ No files found to process.")
else:
    print("❌ Could not initialize processor.")

✅ All modules loaded successfully
🔌 Connecting to S3...
✅ Connected to S3 successfully
✅ Processor ready

🔍 Searching for files in: geotiffs_to_convert/202408_TropicalStorm_Debby/landsat8
⚠️ No .tif files found
⚠️ No files found to process.
⚙️ Step 5: Process Files
Run this cell to start processing all files:
# Process all files
if 'num_files' in locals() and num_files > 0:
    print("🚀 Starting processing...")
    print("This may take several minutes depending on file sizes.\n")

    # Process everything
    results = processor.process_all()

    # Display results
    if not results.empty:
        print("\n🎉 Processing Complete!")
        try:
            display(results)  # Rich table when running in Jupyter
        except NameError:
            print(results)
else:
    print("⚠️ No files to process. Complete Steps 1-4 first.")

🚀 Starting processing...
This may take several minutes depending on file sizes.

🚀 Starting processing...
📦 Processing colorInfrared (3 files)
  ⏭️ Skipped: LC08_colorInfrared_20240715_155319_016036.tif (exists)
  ⏭️ Skipped: LC08_colorInfrared_20240715_155343_016037.tif (exists)
  ⏭️ Skipped: LC08_colorInfrared_20240715_15547_016038.tif (exists)
📦 Processing naturalColor (3 files)
  ⏭️ Skipped: LC08_naturalColor_20240715_155319_016036.tif (exists)
  ⏭️ Skipped: LC08_naturalColor_20240715_155343_016037.tif (exists)
  ⏭️ Skipped: LC08_naturalColor_20240715_15547_016038.tif (exists)
📦 Processing trueColor (3 files)
  ⏭️ Skipped: LC08_trueColor_20240715_155319_016036.tif (exists)
  ⏭️ Skipped: LC08_trueColor_20240715_155343_016037.tif (exists)
  ⏭️ Skipped: LC08_trueColor_20240715_15547_016038.tif (exists)
============================================================
✅ PROCESSING COMPLETE
============================================================
Results:
  ⏭️ Skipped: 9
  Processing time: 0.0 minutes
📄 Results saved to: output/202408_TropicalStorm_Debby/results_20250929_191143.csv
============================================================

🎉 Processing Complete!
source_file category status \
0 LC08_colorInfrared_20240715_155319_016036.tif colorInfrared skipped
1 LC08_colorInfrared_20240715_155343_016037.tif colorInfrared skipped
2 LC08_colorInfrared_20240715_15547_016038.tif colorInfrared skipped
3 LC08_naturalColor_20240715_155319_016036.tif naturalColor skipped
4 LC08_naturalColor_20240715_155343_016037.tif naturalColor skipped
5 LC08_naturalColor_20240715_15547_016038.tif naturalColor skipped
6 LC08_trueColor_20240715_155319_016036.tif trueColor skipped
7 LC08_trueColor_20240715_155343_016037.tif trueColor skipped
8 LC08_trueColor_20240715_15547_016038.tif trueColor skipped
reason output_path \
0 already exists s3://nasa-disasters/drcs_activations_new/Lands...
1 already exists s3://nasa-disasters/drcs_activations_new/Lands...
2 already exists s3://nasa-disasters/drcs_activations_new/Lands...
3 already exists s3://nasa-disasters/drcs_activations_new/Lands...
4 already exists s3://nasa-disasters/drcs_activations_new/Lands...
5 already exists s3://nasa-disasters/drcs_activations_new/Lands...
6 already exists s3://nasa-disasters/drcs_activations_new/Lands...
7 already exists s3://nasa-disasters/drcs_activations_new/Lands...
8 already exists s3://nasa-disasters/drcs_activations_new/Lands...
time_seconds
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
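Because SAVE_RESULTS writes the run summary to a CSV (path printed above), you can reload it later without re-running the pipeline; a small sketch, substituting your own output path:

import pandas as pd

# Reload a saved results CSV from a previous run (adjust the path to the
# file printed by your own run)
results = pd.read_csv("output/202408_TropicalStorm_Debby/results_20250929_191143.csv")
print(results["status"].value_counts())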
📊 Step 6: Review Results
# Analyze results
if 'results' in locals() and not results.empty:
    print("📊 PROCESSING STATISTICS")
    print("="*40)

    # Success rate
    total = len(results)
    success = len(results[results['status'] == 'success'])
    failed = len(results[results['status'] == 'failed'])
    skipped = len(results[results['status'] == 'skipped'])

    print(f"Total files: {total}")
    print(f"✅ Success: {success}")
    print(f"❌ Failed: {failed}")
    print(f"⏭️ Skipped: {skipped}")
    print(f"\nSuccess rate: {(success/total*100):.1f}%")

    # Failed files
    if failed > 0:
        print("\n❌ Failed files:")
        failed_df = results[results['status'] == 'failed']
        for idx, row in failed_df.iterrows():
            print(f"  - {row['source_file']}: {row.get('error', 'Unknown error')}")

    # Processing times
    if 'time_seconds' in results.columns:
        success_df = results[results['status'] == 'success']
        if not success_df.empty:
            avg_time = success_df['time_seconds'].mean()
            max_time = success_df['time_seconds'].max()
            print("\n⏱️ Timing:")
            print(f"Average: {avg_time:.1f} seconds per file")
            print(f"Slowest: {max_time:.1f} seconds")
else:
    print("No results to analyze. Run Step 5 first.")

💡 Tips & Troubleshooting
Workflow Summary:
- Setup - Clone disasters-aws-conversion repository (Step 0)
- Configure basic settings (Step 1)
- List files from S3 to see naming patterns (Step 2)
- Define functions to transform filenames (Step 3)
- Preview transformations (Step 4)
- Process all files (Step 5)
- Review results (Step 6)
Common Issues:
- "ModuleNotFoundError: No module named 'core'" or other import errors
  - Run Step 0 first to clone the disasters-aws-conversion repository
  - Restart the kernel and run all cells from the beginning
- "No files found"
  - Check SOURCE_PATH in Step 1
  - Verify bucket permissions
  - Ensure files have a .tif extension
- Wrong filename transformations
  - Review the actual filenames in Step 2
  - Adjust the functions in Step 3
  - Re-run Step 4 to preview
- Files being skipped
  - Files already exist in the destination
  - Set OVERWRITE = True in Step 1
- Processing errors
  - Check AWS credentials (see the sketch after this list)
  - Verify S3 write permissions
  - Check available disk space for temp files
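For the credential and permission checks, a quick standalone sanity test can save a re-run; a minimal sketch, assuming the standard AWS credential chain (environment variables, ~/.aws/credentials, or an attached role):

import boto3
from botocore.exceptions import ClientError, NoCredentialsError

def check_aws_access(bucket="nasa-disasters"):
    """Verify that credentials resolve and the bucket is reachable."""
    try:
        identity = boto3.client("sts").get_caller_identity()
        print(f"Authenticated as: {identity['Arn']}")
        boto3.client("s3").head_bucket(Bucket=bucket)
        print(f"Bucket reachable: s3://{bucket}")
    except NoCredentialsError:
        print("No AWS credentials found in the environment.")
    except ClientError as e:
        print(f"AWS call failed: {e}")

# check_aws_access()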
Need More Control?
Use the full template at disaster_processing_template.ipynb for:
- Manual chunk configuration
- Custom compression settings
- Detailed memory management
- Advanced processing options