Arctic Sea Ice Classification (TSViT)
Building a planetary-scale perception system that uses Vision Transformers to see through clouds and darkness, providing mission-critical intelligence on Arctic sea ice.
The Challenge
Massive Datasets
The AI4ARCTIC dataset exceeds 500GB, making standard download methods impractical and prone to failure. This required building robust, programmatic tooling for reliable data acquisition.
Data Imbalance
A severe shortage of high-density sea ice examples in the training data hurt early model performance and required architectural changes to overcome.
Multi-Modal Complexity
Fusing optical, SAR, and meteorological data streams while maintaining temporal consistency presented significant technical challenges.
The Process & My Contribution
Framework Modernization
Upgraded the entire deep learning framework to a modern stack (Python 3.11, PyTorch 2.1+), ensuring compatibility and performance. This involved rewriting legacy code and establishing new development standards.
Robust Data Pipeline
Developed a custom API client to programmatically download and manage the massive AI4ARCTIC dataset. Implemented retry logic, checkpointing, and parallel downloads to ensure reliability.
HPC Environment Setup
Configured the software environment and data pipelines for large-scale training on the 224-core university HPC cluster. Wrote SLURM scripts for efficient job scheduling and resource allocation.
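A representative SLURM batch script for a single-GPU training job. Partition, module, and config names are placeholders that vary by cluster, not the project's actual scripts:

```shell
#!/bin/bash
#SBATCH --job-name=tsvit-train
#SBATCH --partition=gpu          # partition name is cluster-specific
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=logs/%x-%j.out  # %x = job name, %j = job ID

module load cuda/12.1            # module names vary by cluster
source activate tsvit            # conda environment from the project stack

srun python train.py --config configs/v2_temporal.yaml
```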
Model Evolution
Researched and implemented the state-of-the-art TSViT model, evolving from a single-modal baseline (v1) to temporal analysis (v2), with plans for full multi-modal fusion (v3).
Architecture & Technical Deep Dive
Why Temporal-Spatial Vision Transformer (TSViT)?
The choice of Vision Transformer architecture was inspired by its success in remote sensing domains. The project aims to replicate that success while incorporating advanced explainability features from Swin Transformer variants.
V1 - Single Modality Baseline
The initial implementation focused on a single data modality to establish a performance baseline.
V2 - Temporal Analysis
Evolved the model to leverage the temporal dimension, moving from purely spatial to spatio-temporal analysis. This was key to overcoming the data imbalance.
V3 - Multi-Modal Fusion (Future)
Next stage: full multi-modal approach fusing optical, SAR, and meteorological data for comprehensive classification.
Tech Stack
Core Framework
Python 3.11, PyTorch 2.1+, CUDA 12.1
ML Libraries
einops, transformers, timm
Data Handling
Pandas, Xarray, NetCDF/Zarr
HPC & Orchestration
SLURM, Bash, Conda
Data Pipeline Architecture
1. Programmatic Download
Custom Python script interfacing with the data provider's API for reliable, resumable download of hundreds of gigabytes.
2. Data Staging & Preprocessing
Raw satellite data (SAR, optical, meteorological) staged on HPC cluster. Preprocessing includes normalization, temporal alignment, and spatio-temporal patch extraction.
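The normalization and patch-extraction steps can be sketched as follows. The channel statistics and patch size are illustrative placeholders, not the project's actual values:

```python
import numpy as np

def normalize(scene: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Per-channel standardization of a (C, H, W) scene."""
    return (scene - mean[:, None, None]) / std[:, None, None]

def extract_patches(scene: np.ndarray, size: int = 256) -> np.ndarray:
    """Cut a (C, H, W) scene into non-overlapping (C, size, size) patches."""
    c, h, w = scene.shape
    scene = scene[:, : h - h % size, : w - w % size]  # drop ragged edges
    patches = scene.reshape(c, h // size, size, w // size, size)
    return patches.transpose(1, 3, 0, 2, 4).reshape(-1, c, size, size)
```

Temporal alignment (matching SAR, optical, and weather timestamps) happens before this stage, so every patch index maps to the same location across modalities.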
3. Efficient Loading
Custom PyTorch Dataset and DataLoader classes for efficient patch loading, data augmentation, and GPU feeding to ensure no idle time.
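A minimal sketch of the loading pattern. The in-memory patch array, label format, and flip augmentation are assumptions for illustration, not the project's code:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SeaIcePatchDataset(Dataset):
    def __init__(self, patches: np.ndarray, labels: np.ndarray, augment: bool = True):
        self.patches, self.labels, self.augment = patches, labels, augment

    def __len__(self) -> int:
        return len(self.patches)

    def __getitem__(self, i: int):
        x = torch.from_numpy(self.patches[i]).float()
        if self.augment and torch.rand(1).item() < 0.5:
            x = torch.flip(x, dims=[-1])  # random horizontal flip
        return x, int(self.labels[i])

ds = SeaIcePatchDataset(np.zeros((8, 3, 64, 64), np.float32), np.zeros(8, np.int64))
# num_workers > 0 on the cluster keeps the GPU fed by prefetching in workers.
loader = DataLoader(ds, batch_size=4, shuffle=True, num_workers=0)
```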
Outcomes & Impact
Evolution from single-modal to temporal approach was critical in overcoming data imbalance issues
Foundation for upcoming research paper on multi-modal sea ice classification
End-to-end case study in building AI systems for critical environmental monitoring
Prepared for distributed training using PyTorch DDP for even larger experiments
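The DDP preparation follows the standard PyTorch pattern, sketched below with the CPU `gloo` backend and a single process so it runs anywhere. On the cluster this would use `backend="nccl"` with one process per GPU launched by `torchrun`; the port number here is arbitrary:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(rank: int = 0, world_size: int = 1) -> None:
    """Initialize the default process group (gloo backend for CPU demo)."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

setup_ddp()
model = DDP(torch.nn.Linear(8, 2))  # gradients are all-reduced across ranks
out = model(torch.randn(4, 8))
dist.destroy_process_group()
```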
Future Work
- Implement full multi-modal fusion architecture (V3)
- Scale to distributed training across multiple nodes
- Incorporate explainability features from Swin Transformer
- Submit paper to top-tier conference (Target: August 31, 2025)
- Deploy edge-optimized version for real-time maritime applications