JupyterHub Training Guide - Disasters Hub
Table of Contents
- Introduction
- Getting Started
- JupyterHub Interface Overview
- Working with Jupyter Notebooks
- Data Management
- Environment and Package Management
- Terminal and Command Line Access
- Collaboration and Sharing
- Resource Management
- Best Practices
- Troubleshooting
- Keyboard Shortcuts
- Resources and Links
Introduction
What is JupyterHub?
JupyterHub is a multi-user server that manages and provides web-based Jupyter notebook environments for multiple users. It allows you to:
- Access powerful computing resources through your web browser
- Write and execute code in Python, R, Julia, and other languages
- Visualize data with interactive plots and charts
- Collaborate with team members on shared projects
- Work from anywhere without local setup requirements
The Disasters Hub
The Disasters Hub (https://hub.disasters.2i2c.cloud/) is a specialized JupyterHub instance designed for disaster response and analysis work. It provides:
- Pre-configured environments for geospatial analysis
- Access to disaster-related datasets
- Collaboration tools for response teams
- Integration with cloud storage services
- Scalable computing resources
Key Benefits
- ✅ No Installation Required - Everything runs in your browser
- ✅ Pre-configured Environments - Common packages already installed
- ✅ Persistent Storage - Your work is saved between sessions
- ✅ Collaboration Ready - Share notebooks with team members
- ✅ Scalable Resources - Access to GPU and high-memory instances when needed
Getting Started
Accessing the Disasters Hub
1. Navigate to the Hub
   - Open your web browser (Chrome, Firefox, Safari, or Edge recommended)
   - Go to: https://hub.disasters.2i2c.cloud/
   - Bookmark this URL for easy access
2. Authentication
   - You'll see a login screen with authentication options
   - Common authentication methods:
     - GitHub: Use your GitHub credentials
     - Google: Use your Google account
     - Institutional Login: Use your organization's credentials
   - Select your authentication method and follow the prompts
3. First-Time Login
   - Accept terms of service if prompted
   - Your home directory will be created automatically
   - Initial setup may take 30-60 seconds
Server Selection
After login, you may be presented with server options:
Server Options:
- Small (2 CPU, 4 GB RAM)
- Medium (4 CPU, 8 GB RAM)
- Large (8 CPU, 16 GB RAM)
- GPU Instance (if available)

Tips for Server Selection:
- Start with Small for basic notebook work
- Use Medium for data processing tasks
- Choose Large for machine learning or big data
- Select GPU only when needed (limited availability)
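Once your server is running, you can sanity-check what you were allocated from a notebook. A standard-library-only sketch (note: on shared hubs these calls report the host node, which may show more CPUs and RAM than your container is actually allowed to use):

```python
import os

# Logical CPUs visible to the process; on a hub this can exceed your quota
print(f"CPU cores visible: {os.cpu_count()}")

# Total physical memory of the node in GB (Linux-specific sysconf names)
ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
print(f"RAM visible: {ram_bytes / 1e9:.1f} GB")
```

If the numbers look much larger than the server size you picked, you are seeing the node rather than your own limit; the psutil-based checks later in this guide have the same caveat.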
JupyterHub Interface Overview
The JupyterLab Interface
Once logged in, you'll see the JupyterLab interface:
┌────────────────────────────────────────────────────────────┐
│ [File] [Edit] [View] [Run] [Kernel] [Tabs] [Settings]      │
├────────────────────────────────────────────────────────────┤
│ 📁 File Browser    │  Main Work Area                       │
│ ├── 📁 data        │                                       │
│ ├── 📁 notebooks   │  [Launcher Tab]                       │
│ ├── 📁 scripts     │   • Notebook (Python 3)               │
│ └── 📄 README      │   • Console                           │
│                    │   • Terminal                          │
│ [+] New            │   • Text File                         │
└────────────────────────────────────────────────────────────┘
Key Interface Components
- Top Menu Bar
  - File operations, editing, running code
  - Kernel management
  - View options and settings
- Left Sidebar
  - File Browser: Navigate and manage files
  - Running Terminals and Kernels: Monitor active sessions
  - Command Palette: Access all commands
  - Extension Manager: Add functionality
- Main Work Area
  - Multiple tabs for notebooks, terminals, and files
  - Drag tabs to rearrange or create split views
  - Right-click tabs for additional options
- Status Bar
  - Current kernel status
  - Line/column position
  - File encoding and type
Creating Your First Notebook
- Click the Python 3 icon in the Launcher
- Or: File → New → Notebook
- Select kernel (usually Python 3)
- Rename your notebook: Right-click on "Untitled.ipynb" → Rename
Working with Jupyter Notebooks
Notebook Basics
A Jupyter notebook consists of cells that can contain:
- Code: Executable Python (or other language) code
- Markdown: Formatted text, equations, and images
- Raw: Unformatted text
Cell Operations
Running Cells
- Run current cell: Shift + Enter (run and move to next)
- Run current cell in place: Ctrl + Enter (stay in cell)
- Run all cells: Menu → Run → Run All Cells
Cell Types
# Code Cell Example
import pandas as pd
import numpy as np
data = pd.read_csv('data.csv')
data.head()

# Markdown Cell Example
## Analysis Results
- **Finding 1**: Data shows increasing trend
- **Finding 2**: Correlation coefficient: 0.85
$$E = mc^2$$  # LaTeX equation

Cell Management
- Insert cell above: A (in command mode)
- Insert cell below: B (in command mode)
- Delete cell: D, D (press D twice in command mode)
- Copy cell: C (in command mode)
- Paste cell: V (in command mode)
- Undo deletion: Z (in command mode)
Working with Kernels
The kernel is the computational engine that executes your code.
Kernel Operations
- Restart kernel: Kernel → Restart
- Restart and clear output: Kernel → Restart & Clear Output
- Restart and run all: Kernel → Restart & Run All
- Interrupt execution: Kernel → Interrupt (or I, I in command mode)
- Change kernel: Kernel → Change Kernel
Kernel Status Indicators
- ○ (hollow circle): Kernel idle
- ● (filled circle): Kernel busy
- [*]: Cell currently executing
- [1]: Cell execution number
Notebook Best Practices
Use meaningful cell divisions
- One concept or operation per cell
- Separate imports, data loading, processing, visualization
Document your work
# Good practice: Add comments and markdown cells

# Load disaster response data
df = pd.read_csv('disaster_data.csv')

# Data preprocessing
df['date'] = pd.to_datetime(df['date'])
df = df.dropna()

Clear output before sharing
- Kernel → Restart & Clear Output
- Reduces file size and removes sensitive output
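Under the hood a notebook is plain JSON, so "clearing output" just empties each code cell's outputs list and execution count. A standard-library sketch of what the menu action does (the inline dict stands in for a real .ipynb you would read with json.load):

```python
import json

# Minimal stand-in for a notebook file (illustration only)
nb = {
    "cells": [
        {"cell_type": "code", "source": ["1 + 1"], "metadata": {},
         "outputs": [{"output_type": "execute_result",
                      "data": {"text/plain": ["2"]}}],
         "execution_count": 1},
        {"cell_type": "markdown", "source": ["# Notes"], "metadata": {}},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}

# Clearing output = resetting outputs and execution_count on code cells
for cell in nb["cells"]:
    if cell["cell_type"] == "code":
        cell["outputs"] = []
        cell["execution_count"] = None

print(json.dumps(nb["cells"][0]["outputs"]))  # → []
```

For real files, prefer `jupyter nbconvert --clear-output --inplace notebook.ipynb` or the Kernel menu; this sketch just shows why cleared notebooks diff and share so much more cleanly.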
Data Management
File Upload/Download
Uploading Files
- Drag and drop files directly into the file browser
- Upload button: Click the upload (⬆) button in the file browser toolbar
- Terminal upload: Use wget or curl in a terminal:

wget https://example.com/data.csv
curl -O https://example.com/data.zip
Downloading Files
- Right-click file in browser → Download
- From notebook:

from IPython.display import FileLink
FileLink('results.csv')  # Creates a downloadable link
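The Download action works on single files; to grab a whole folder, one option is to pack it into a zip first. A standard-library sketch (the results/ folder and its contents here are illustrative):

```python
import shutil
from pathlib import Path

# Create a sample results/ folder to archive (illustration only)
Path("results").mkdir(exist_ok=True)
Path("results/summary.txt").write_text("example output\n")

# Pack the folder into results.zip; the zip then downloads from the
# file browser like any other single file
archive = shutil.make_archive("results", "zip", "results")
print(archive)
```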
Working with Cloud Storage
AWS S3 Integration

import boto3
import pandas as pd

# Note: reading s3:// paths directly with pandas requires the s3fs package
# Read from S3
df = pd.read_csv('s3://bucket-name/path/to/file.csv')

# Write to S3
df.to_csv('s3://bucket-name/output/results.csv', index=False)

Google Cloud Storage

# Note: reading gs:// paths directly with pandas requires the gcsfs package
# Read from GCS
df = pd.read_csv('gs://bucket-name/path/to/file.csv')

# Using gsutil in terminal
!gsutil cp gs://bucket/file.csv ./data/

Data Organization
Recommended directory structure:
home/
├── data/
│   ├── raw/            # Original, immutable data
│   ├── processed/      # Cleaned, transformed data
│   └── external/       # Data from external sources
├── notebooks/
│   ├── exploratory/    # Initial explorations
│   ├── analysis/       # Detailed analysis
│   └── reports/        # Final reports
├── scripts/            # Reusable Python scripts
├── results/            # Output files, figures
└── requirements.txt    # Package dependencies
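If you adopt this layout, it can be created in one go. A short standard-library sketch (the base path `project` is illustrative; on the hub you would more likely build under your home directory):

```python
from pathlib import Path

# Hypothetical base path; on the hub this would typically be Path.home()
base = Path("project")

subdirs = [
    "data/raw", "data/processed", "data/external",
    "notebooks/exploratory", "notebooks/analysis", "notebooks/reports",
    "scripts", "results",
]
for sub in subdirs:
    # parents=True creates intermediate dirs; exist_ok=True makes reruns safe
    (base / sub).mkdir(parents=True, exist_ok=True)

# Start an empty dependency list
(base / "requirements.txt").touch()
```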
Data Persistence
⚠️ Important: Your home directory is persistent, but understand the storage limits:
- Home directory: Usually 10-100 GB (persistent)
- Shared data: Read-only datasets available to all users
- Temporary storage: /tmp is cleared on restart
- Best practice: Store large datasets in cloud storage, not your home directory
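To see how full a filesystem is before you hit a limit, the standard library can report usage for any path. Caveat: this measures the filesystem backing the path, which is not necessarily the same as your personal quota on a shared hub:

```python
import shutil
from pathlib import Path

def report(path: str) -> None:
    """Print total/used/free space for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    gb = 1e9
    print(f"{path}: total {usage.total / gb:.1f} GB, "
          f"used {usage.used / gb:.1f} GB, free {usage.free / gb:.1f} GB")

report(str(Path.home()))  # persistent home directory
report("/tmp")            # wiped when the server restarts
```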
Environment and Package Management
Installing Packages
Using pip (Python packages)
# In a notebook cell
!pip install package_name
# Install specific version
!pip install pandas==1.3.0
# Install from requirements file
!pip install -r requirements.txt
# Install in user directory (if no write permissions)
!pip install --user package_name

Using conda
# In a notebook cell
!conda install -c conda-forge package_name -y
# Install multiple packages
!conda install numpy pandas matplotlib -y
# Create new environment
!conda create -n myenv python=3.9 -y
!conda activate myenv  # Note: Activation in notebooks is tricky

Managing Python Environments
Check current environment
import sys
print(sys.executable) # Python interpreter path
print(sys.version) # Python version
# List installed packages
!pip list
!conda list

Creating isolated environments
# In terminal
python -m venv myproject
source myproject/bin/activate # Linux/Mac
pip install -r requirements.txt

Using Different Kernels
Install IPython kernel:
python -m ipykernel install --user --name mykernel --display-name "My Kernel"

List available kernels:
jupyter kernelspec list

Remove a kernel:
jupyter kernelspec uninstall mykernel
Terminal and Command Line Access
Opening Terminal
- From Launcher: Click the "Terminal" icon
- From menu: File → New → Terminal
- Keyboard shortcut: (varies by setup)
Common Terminal Commands
# Navigation
pwd # Print working directory
ls -la # List files with details
cd ~/notebooks # Change directory
# File operations
mkdir project # Create directory
cp file1.txt file2.txt # Copy file
mv oldname newname # Move/rename
rm file.txt # Delete file (careful!)
# File viewing
cat file.txt # Display file contents
head -n 10 data.csv # First 10 lines
tail -n 10 log.txt # Last 10 lines
less large_file.txt # Page through file
# Process management
ps aux # List processes
top # Monitor resources
kill -9 PID # Kill process
# Git operations
git status
git add .
git commit -m "message"
git push

Working with Data Files
# Count lines in file
wc -l data.csv
# View CSV structure
head -1 data.csv | tr ',' '\n' | nl
# Search in files
grep "pattern" file.txt
grep -r "pattern" ./directory
# Compress/decompress
zip archive.zip file1 file2
unzip archive.zip
tar -czf archive.tar.gz directory/
tar -xzf archive.tar.gz

Collaboration and Sharing
Real-time Collaboration
Some JupyterHub deployments support real-time collaboration:
- Share workspace link: Get shareable link from hub admin
- Collaborative editing: Multiple users can edit simultaneously
- See collaborator cursors: Real-time cursor positions
- Chat integration: Built-in chat for discussion
Version Control Best Practices
Clear outputs before committing:
jupyter nbconvert --clear-output --inplace notebook.ipynb

Use .gitignore:
.ipynb_checkpoints/
__pycache__/
*.pyc
.DS_Store
data/              # Don't commit large data files

Notebook diff tools:
# Install nbdime for better notebook diffs
pip install nbdime
nbdime config-git --enable
Resource Management
Understanding Resource Limits
Your JupyterHub instance has resource limits:
# Check available resources
import psutil
# Memory
memory = psutil.virtual_memory()
print(f"Total RAM: {memory.total / 1e9:.2f} GB")
print(f"Available: {memory.available / 1e9:.2f} GB")
print(f"Used: {memory.percent}%")
# CPU
print(f"CPU cores: {psutil.cpu_count()}")
print(f"CPU usage: {psutil.cpu_percent()}%")
# Disk
disk = psutil.disk_usage('/')
print(f"Disk space: {disk.total / 1e9:.2f} GB")
print(f"Disk used: {disk.percent}%")Monitoring Resource Usage
JupyterLab Extension
- Install the Resource Usage extension (e.g. pip install jupyter-resource-usage)
- Shows real-time memory and CPU usage in status bar
Command line monitoring
# Real-time resource monitoring
top
htop # If installed
# Memory usage
free -h
# Disk usage
df -h
du -sh *  # Directory sizes

Optimizing Resource Usage
Clear variables when done:
# Clear a specific variable
del large_dataframe

# Clear all variables (IPython magic)
%reset -f

# Force garbage collection
import gc
gc.collect()

Use efficient data types:
# Use categories for strings with few unique values
df['category'] = df['category'].astype('category')

# Use smaller numeric types when possible
df['count'] = df['count'].astype('int32')  # Instead of int64

Process data in chunks:
# Read a large CSV in chunks
chunk_size = 10000
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    process_chunk(chunk)
Shutting Down Properly
Always shut down kernels and terminals when done:
- Shutdown kernel: Kernel → Shutdown
- Close terminals: Type exit or press Ctrl+D
- Hub Control Panel: File → Hub Control Panel → Stop My Server
- Logout: File → Log Out
⚠️ Important: Idle servers may be automatically culled after a period of inactivity (usually 1-2 hours).
Best Practices
Project Organization
Use consistent naming:
2024-01-15_earthquake_analysis.ipynb   # Good
untitled1.ipynb                        # Bad

Create project templates:
# notebook_template.ipynb
# 1. Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# 2. Configuration
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8')  # formerly 'seaborn'; renamed in matplotlib 3.6+

# 3. Data Loading
# 4. Data Exploration
# 5. Analysis
# 6. Results

Document dependencies:
# Generate requirements.txt
!pip freeze > requirements.txt
Security Considerations
Never commit credentials:
# Bad
api_key = "sk-abc123def456"

# Good - use environment variables
import os
api_key = os.environ.get('API_KEY')

Use secrets management:
# Store secrets in a .env file (requires the python-dotenv package)
import os
from dotenv import load_dotenv
load_dotenv()

# Access secrets
secret = os.getenv('SECRET_KEY')

Be careful with outputs:
- Clear cells containing sensitive information
- Review notebooks before sharing
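os.environ.get returns None for a missing variable, which can surface as a confusing error much later. One pattern is to fail loudly up front (names here are illustrative, not a real API):

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return value

# Illustration only: seed a fake value so the call below succeeds
os.environ["DEMO_API_KEY"] = "example-not-a-real-secret"
print(require_env("DEMO_API_KEY"))
```

This way a missing credential stops the notebook at the configuration cell rather than deep inside an API call.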
Performance Tips
Vectorize operations:
# Slow: Python-level loop
results = []
for i in range(len(df)):
    results.append(df.iloc[i]['column'] * 2)

# Fast: vectorized
results = df['column'] * 2

Use built-in functions:
# Use pandas/numpy operations instead of loops
df['new_col'] = df['col1'] + df['col2']  # Vectorized

Profile your code:
%%time               # Time an entire cell
%timeit function()   # Time a single line

# Detailed profiling (requires the line_profiler package)
%load_ext line_profiler
%lprun -f function_to_profile function_to_profile()
Troubleshooting
Common Issues and Solutions
Kernel Won't Start
- Check resources: Server might be full
- Try different kernel: Some kernels may be broken
- Restart server: Hub Control Panel → Stop → Start
Package Import Errors
# Check if a package is installed
import importlib.util

if importlib.util.find_spec("package_name") is None:
    !pip install package_name

# Restart the kernel after installation so the new package is importable
from IPython import get_ipython
get_ipython().kernel.do_shutdown(True)

Out of Memory Errors
- Clear unnecessary variables: del variable_name
- Use smaller data samples for testing
- Request a larger server instance
- Process data in chunks
Notebook Won't Save
- Check disk space: run df -h in a terminal
- Check file permissions: ls -la notebook.ipynb
- Save with new name: File → Save As
- Download backup: File → Download
Connection Issues
- Check internet connection
- Try different browser
- Clear browser cache
- Check if hub is under maintenance
Getting Help
Built-in help:
help(function_name)
function_name?    # Quick help
function_name??   # Source code

Documentation:
- JupyterHub docs: https://jupyterhub.readthedocs.io
- JupyterLab docs: https://jupyterlab.readthedocs.io
- 2i2c docs: https://docs.2i2c.org
Community support:
- Discourse forum
- GitHub issues
- Stack Overflow with tags: jupyter, jupyterhub
Keyboard Shortcuts
Command Mode (Blue cell border)
Press Esc to enter command mode
| Shortcut | Action |
|---|---|
| Enter | Enter edit mode |
| A | Insert cell above |
| B | Insert cell below |
| D, D | Delete cell |
| Y | Change to code cell |
| M | Change to markdown cell |
| Shift+Up/Down | Select multiple cells |
| Shift+M | Merge selected cells |
| C | Copy cell |
| X | Cut cell |
| V | Paste cell below |
| Shift+V | Paste cell above |
| Z | Undo cell deletion |
| 0, 0 | Restart kernel |
| I, I | Interrupt kernel |
Edit Mode (Green cell border)
Press Enter to enter edit mode
| Shortcut | Action |
|---|---|
| Esc | Enter command mode |
| Ctrl+Enter | Run cell |
| Shift+Enter | Run cell, select below |
| Alt+Enter | Run cell, insert below |
| Ctrl+S | Save notebook |
| Tab | Code completion |
| Shift+Tab | Tooltip |
| Ctrl+] | Indent |
| Ctrl+[ | Dedent |
| Ctrl+A | Select all |
| Ctrl+Z | Undo |
| Ctrl+Y | Redo |
JupyterLab Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+C | Command palette |
| Ctrl+B | Toggle left sidebar |
| Ctrl+Shift+D | Toggle file browser |
| Ctrl+Shift+F | Find and replace |
| Ctrl+Shift+[ | Previous tab |
| Ctrl+Shift+] | Next tab |
| Alt+W | Close tab |
Resources and Links
Official Documentation
- JupyterHub Documentation: https://jupyterhub.readthedocs.io
- JupyterLab Documentation: https://jupyterlab.readthedocs.io
- Jupyter Notebook Documentation: https://jupyter-notebook.readthedocs.io
- 2i2c Infrastructure Guide: https://docs.2i2c.org
Tutorials and Learning Resources
- Jupyter Tutorial: https://jupyter.org/try
- Real Python Jupyter Guide: https://realpython.com/jupyter-notebook-introduction/
- DataCamp Jupyter Tutorial: https://www.datacamp.com/tutorial/tutorial-jupyter-notebook
- Official Jupyter Examples: https://github.com/jupyter/jupyter/wiki/Gallery-of-Jupyter-Notebooks
Disaster Response Specific Resources
- NASA Disasters Program: https://disasters.nasa.gov
- USGS Hazards Data: https://www.usgs.gov/natural-hazards
- NOAA Disaster Data: https://www.ncdc.noaa.gov/billions/
- Copernicus Emergency Management: https://emergency.copernicus.eu
Python Libraries for Disaster Analysis
# Geospatial analysis
import geopandas as gpd
import rasterio
import xarray as xr
import folium
# Data processing
import pandas as pd
import numpy as np
import dask.dataframe as dd
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Machine learning
import sklearn  # avoid wildcard imports; import the submodules you need
import tensorflow as tf
import torch
# Earth observation
import ee # Google Earth Engine
import planetary_computer as pc
import pystac_client

Helpful Extensions
Install JupyterLab extensions for enhanced functionality. Note: on JupyterLab 3 and later, a table of contents panel is built in and most extensions ship as pip-installable packages, so the labextension install commands below apply mainly to older versions:
# Variable inspector
jupyter labextension install @lckr/jupyterlab_variableinspector
# Table of contents
jupyter labextension install @jupyterlab/toc
# Git integration
pip install jupyterlab-git
# Code formatter
pip install jupyterlab-code-formatter

Community and Support
- Jupyter Discourse Forum: https://discourse.jupyter.org
- Stack Overflow: https://stackoverflow.com/questions/tagged/jupyter
- GitHub Issues: https://github.com/jupyterhub/jupyterhub/issues
- 2i2c Support: https://2i2c.org/support
- Gitter Chat: https://gitter.im/jupyterhub/jupyterhub
Quick Reference PDFs
- JupyterLab Cheat Sheet: https://www.datacamp.com/cheat-sheet/jupyterlab-cheat-sheet
- Jupyter Shortcuts PDF: https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/
- Markdown Guide: https://www.markdownguide.org/cheat-sheet/
Appendix: Sample Workflow
Here's a complete example workflow for disaster analysis:
# 1. Setup and Imports
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import folium
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
# 2. Load Data
# Earthquake data
earthquakes = pd.read_csv('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_month.csv')
earthquakes['time'] = pd.to_datetime(earthquakes['time'])
# 3. Data Processing
# Filter recent events (USGS 'time' values parse as timezone-aware UTC,
# so compare against a timezone-aware timestamp)
recent = earthquakes[earthquakes['time'] > pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=7)]
# Convert to GeoDataFrame
geometry = gpd.points_from_xy(recent.longitude, recent.latitude)
geo_df = gpd.GeoDataFrame(recent, geometry=geometry, crs='EPSG:4326')
# 4. Analysis
print(f"Total earthquakes in last 7 days: {len(recent)}")
print(f"Average magnitude: {recent['mag'].mean():.2f}")
print(f"Largest earthquake: {recent['mag'].max():.2f}")
# 5. Visualization
# Static plot
fig, ax = plt.subplots(figsize=(12, 8))
# Note: gpd.datasets was removed in GeoPandas 1.0; if this call fails,
# load Natural Earth boundaries from a local file or the geodatasets package
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.plot(ax=ax, color='lightgray', edgecolor='black')
geo_df.plot(ax=ax, color='red', markersize=geo_df['mag']**2, alpha=0.6)
plt.title('Recent Earthquakes (M4.5+)')
plt.show()
# Interactive map
m = folium.Map(location=[0, 0], zoom_start=2)
for idx, row in geo_df.iterrows():
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=row['mag']*2,
popup=f"M{row['mag']} - {row['place']}",
color='red',
fill=True
).add_to(m)
m.save('earthquake_map.html')
# 6. Export Results
geo_df.to_csv('processed_earthquakes.csv', index=False)
print("Analysis complete! Results saved.")

Last Updated: 2024
Version: 1.0
Disasters Hub Training Guide
For additional assistance, contact your hub administrator or visit the 2i2c support portal.