Arctic Sea Ice Classification (TSViT)
Building a planetary-scale perception system that uses Vision Transformers to see through clouds and darkness, providing mission-critical intelligence on Arctic sea ice.
The Challenge
Massive Datasets
The AI4ARCTIC dataset exceeds 500GB, making standard download methods impractical and prone to failure. This required building robust, programmatic tooling for reliable data acquisition.
Data Imbalance
A severe shortage of high-density sea ice examples in the training data hurt early model performance and required architectural changes to overcome.
Multi-Modal Complexity
Fusing optical, SAR, and meteorological data streams while maintaining temporal consistency presented significant technical challenges.
The Process & My Contribution
Framework Modernization
Upgraded the entire deep learning framework to a modern stack (Python 3.11, PyTorch 2.1+), ensuring compatibility and performance. This involved rewriting legacy code and establishing new development standards.
Robust Data Pipeline
Developed a custom API client to programmatically download and manage the massive AI4ARCTIC dataset. Implemented retry logic, checkpointing, and parallel downloads to ensure reliability.
HPC Environment Setup
Configured the software environment and data pipelines for large-scale training on the 224-core university HPC cluster. Wrote SLURM scripts for efficient job scheduling and resource allocation.
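A representative SLURM batch script for a single-GPU training job. Partition, module, and config names are placeholders that vary by cluster, not the project's actual scripts:

```shell
#!/bin/bash
#SBATCH --job-name=tsvit-train
#SBATCH --partition=gpu          # partition name is cluster-specific
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=logs/%x-%j.out  # %x = job name, %j = job ID

module load cuda/12.1            # module names vary by cluster
source activate tsvit            # conda environment from the project stack

srun python train.py --config configs/v2_temporal.yaml
```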
Model Evolution
Researched and implemented the state-of-the-art TSViT model, evolving from a single-modal baseline (v1) to temporal analysis (v2), with plans for full multi-modal fusion (v3).
Architecture & Technical Deep Dive
Why Temporal-Spatial Vision Transformer (TSViT)?
The choice of Vision Transformer architecture was inspired by its success in remote sensing domains. The project aims to replicate that success while incorporating advanced explainability features from Swin Transformer variants.
V1 - Single Modality Baseline
The initial implementation focused on a single data modality to establish a performance baseline.
V2 - Temporal Analysis
Evolved the model to leverage the temporal dimension, moving from purely spatial to spatio-temporal analysis. This was key to overcoming the data imbalance.
V3 - Multi-Modal Fusion (Future)
Next stage: full multi-modal approach fusing optical, SAR, and meteorological data for comprehensive classification.
Tech Stack
Core Framework
Python 3.11, PyTorch 2.1+, CUDA 12.1
ML Libraries
einops, transformers, timm
Data Handling
Pandas, Xarray, NetCDF/Zarr
HPC & Orchestration
SLURM, Bash, Conda
Data Pipeline Architecture
1. Programmatic Download
Custom Python script interfacing with the data provider's API for reliable, resumable download of hundreds of gigabytes.
2. Data Staging & Preprocessing
Raw satellite data (SAR, optical, meteorological) staged on HPC cluster. Preprocessing includes normalization, temporal alignment, and spatio-temporal patch extraction.
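The normalization and patch-extraction steps can be sketched as follows. The channel statistics and patch size are illustrative placeholders, not the project's actual values:

```python
import numpy as np

def normalize(scene: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Per-channel standardization of a (C, H, W) scene."""
    return (scene - mean[:, None, None]) / std[:, None, None]

def extract_patches(scene: np.ndarray, size: int = 256) -> np.ndarray:
    """Cut a (C, H, W) scene into non-overlapping (C, size, size) patches."""
    c, h, w = scene.shape
    scene = scene[:, : h - h % size, : w - w % size]  # drop ragged edges
    patches = scene.reshape(c, h // size, size, w // size, size)
    return patches.transpose(1, 3, 0, 2, 4).reshape(-1, c, size, size)
```

Temporal alignment (matching SAR, optical, and weather timestamps) happens before this stage, so every patch index maps to the same location across modalities.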
3. Efficient Loading
Custom PyTorch Dataset and DataLoader classes for efficient patch loading, data augmentation, and GPU feeding to ensure no idle time.
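A minimal sketch of the loading pattern. The in-memory patch array, label format, and flip augmentation are assumptions for illustration, not the project's code:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SeaIcePatchDataset(Dataset):
    def __init__(self, patches: np.ndarray, labels: np.ndarray, augment: bool = True):
        self.patches, self.labels, self.augment = patches, labels, augment

    def __len__(self) -> int:
        return len(self.patches)

    def __getitem__(self, i: int):
        x = torch.from_numpy(self.patches[i]).float()
        if self.augment and torch.rand(1).item() < 0.5:
            x = torch.flip(x, dims=[-1])  # random horizontal flip
        return x, int(self.labels[i])

ds = SeaIcePatchDataset(np.zeros((8, 3, 64, 64), np.float32), np.zeros(8, np.int64))
# num_workers > 0 on the cluster keeps the GPU fed by prefetching in workers.
loader = DataLoader(ds, batch_size=4, shuffle=True, num_workers=0)
```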
Outcomes & Impact
Evolution from single-modal to temporal approach was critical in overcoming data imbalance issues
Foundation for upcoming research paper on multi-modal sea ice classification
End-to-end case study in building AI systems for critical environmental monitoring
Prepared for distributed training using PyTorch DDP for even larger experiments
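The DDP preparation follows the standard PyTorch pattern, sketched below with the CPU `gloo` backend and a single process so it runs anywhere. On the cluster this would use `backend="nccl"` with one process per GPU launched by `torchrun`; the port number here is arbitrary:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(rank: int = 0, world_size: int = 1) -> None:
    """Initialize the default process group (gloo backend for CPU demo)."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

setup_ddp()
model = DDP(torch.nn.Linear(8, 2))  # gradients are all-reduced across ranks
out = model(torch.randn(4, 8))
dist.destroy_process_group()
```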
Future Work
- Implement full multi-modal fusion architecture (V3)
- Scale to distributed training across multiple nodes
- Incorporate explainability features from Swin Transformer
- Submit paper to top-tier conference (Target: August 31, 2025)
- Deploy edge-optimized version for real-time maritime applications