LibraryIQ

Unlocking the hidden intelligence of a university library by building its 'digital nervous system': an end-to-end platform that manages 400,000+ volumes through a high-throughput data pipeline and pairs it with real-time sensor networks for holistic operational intelligence.

Innovation Lab | 2024 - Present | Lead Architect & Developer

The Challenge

Data Blindness

University libraries often operate on anecdotal evidence rather than data, which leads to inefficient resource allocation, underutilized spaces, and a disconnect from actual student needs.

Massive Catalog Scale

Managing and enriching metadata for 400,000+ volumes requires 10M+ API calls, all within tight memory constraints and API rate limits.

Privacy vs Intelligence

The platform must gather operational intelligence about space usage and patterns while maintaining strict privacy standards: no personally identifiable information is collected.

The Process & My Contribution

This project evolved from a simple data-enrichment script into a full-scale intelligence platform through systematic thinking and iterative development.

1. System Architecture Design

Designed entire multi-stage architecture from data ingestion pipeline to CV-powered sensor network, ensuring each component feeds into a holistic system for actionable intelligence.

2. Data Pipeline Engineering

Built robust Python pipeline handling 10M+ API calls. When initial processing crashed at 110,000 ISBNs due to memory constraints, re-engineered to flush results in 350-record batches, ensuring stability.
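
The batch-flush fix can be illustrated with a minimal sketch. It is not the project's actual code: the `enrich` callable, the JSON Lines output format, and the plain-text checkpoint file are all assumptions made for illustration. Only the 350-record batch size comes from the description above.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 350  # batch size quoted in the text


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: list[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def enrich_all(
    isbns: list[str],
    enrich: Callable[[str], dict],  # hypothetical per-ISBN lookup
    out_path: Path,                 # assumed JSON Lines output file
    checkpoint_path: Path,          # assumed plain-text progress marker
    batch_size: int = BATCH_SIZE,
) -> int:
    """Enrich ISBNs batch by batch, appending each batch to disk.

    Returns the number of records written during this run.
    """
    # Resume from the last completed position, if a checkpoint exists.
    done = int(checkpoint_path.read_text()) if checkpoint_path.exists() else 0
    written = 0
    for batch in batched(isbns[done:], batch_size):
        records = [enrich(isbn) for isbn in batch]
        # Append as JSON Lines so memory use is bounded by one batch.
        with out_path.open("a", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
        done += len(batch)
        written += len(records)
        # Record progress only after the batch is safely on disk.
        checkpoint_path.write_text(str(done))
    return written
```

Because results never accumulate beyond one batch, peak memory stays roughly constant regardless of catalog size, and a crash costs at most one batch of rework.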

3. Computer Vision Integration

Expanded scope to include YOLOv8 for anonymous foot traffic analysis and MLX90640 thermal sensors for privacy-preserving occupancy tracking, aligning with thesis requirements.

4. Research & Publication Planning

Structured project as foundation for master's thesis and roadmap for 7-8 academic papers covering data engineering, privacy-preserving AI, and library science applications.

Architecture & Technical Implementation

System Components

Data Backbone

High-throughput Python pipeline utilizing ISBNdb API and web scraping, engineered to handle 10M+ API calls for 400,000+ volume catalog enrichment.
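
One common way to reach high throughput without overrunning an API quota is bounded concurrency. The sketch below assumes an asyncio-based design (Asyncio appears in the tech stack, but the actual concurrency model is not described); `fetch_all`, `fake_fetch`, and the `max_in_flight` limit are hypothetical names and values, and no real ISBNdb call is made.

```python
import asyncio
from typing import Awaitable, Callable


async def fetch_all(
    isbns: list[str],
    fetch: Callable[[str], Awaitable[dict]],  # hypothetical async lookup
    max_in_flight: int = 10,                  # assumed limit, tune to the API quota
) -> list[dict]:
    """Run `fetch` for every ISBN with at most `max_in_flight` concurrent calls."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(isbn: str) -> dict:
        # The semaphore blocks here once max_in_flight calls are active.
        async with semaphore:
            return await fetch(isbn)

    # gather returns results in the same order as the input ISBNs.
    return await asyncio.gather(*(guarded(isbn) for isbn in isbns))


async def fake_fetch(isbn: str) -> dict:
    """Stand-in for a real HTTP request; sleeps briefly instead."""
    await asyncio.sleep(0.001)
    return {"isbn": isbn, "title": f"Title for {isbn}"}
```

A run would look like `asyncio.run(fetch_all(isbns, fake_fetch))`, with `fake_fetch` swapped for a real client call.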

Gate Counting System

YOLOv8-based computer vision using existing security cameras for real-time, anonymous foot traffic analysis without storing any personal data.
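
The counting step can be sketched independently of the detector. The example below assumes an upstream detector and tracker already supply, for each frame, a track id and the vertical centroid of each person; the `GateCounter` class, the horizontal virtual line, and the choice of which direction counts as an entry are illustrative assumptions. Only running totals are kept, so no image or identity data is stored.

```python
from dataclasses import dataclass, field


@dataclass
class GateCounter:
    """Count crossings of a horizontal virtual line at y = line_y (pixels)."""

    line_y: float
    entries: int = 0
    exits: int = 0
    _last_y: dict[int, float] = field(default_factory=dict)

    def update(self, centroids: dict[int, float]) -> None:
        """Process one frame: `centroids` maps track id -> centroid y."""
        for track_id, y in centroids.items():
            prev = self._last_y.get(track_id)
            if prev is not None:
                if prev < self.line_y <= y:
                    self.entries += 1  # moved downward across the line
                elif prev >= self.line_y > y:
                    self.exits += 1    # moved upward across the line
            self._last_y[track_id] = y
        # Forget tracks that left the frame so no history accumulates.
        for track_id in list(self._last_y):
            if track_id not in centroids:
                del self._last_y[track_id]
```

Calling `update` once per frame keeps the counter's state limited to the last known position of currently visible tracks.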

Occupancy Tracking

MLX90640 thermal sensor network providing privacy-preserving analysis of furniture usage and study zone patterns - sees heat, not faces.
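
As a rough illustration of how heat alone can indicate occupancy, the sketch below counts warm regions in a single thermal frame. The MLX90640 produces a 32x24 grid of temperatures; everything else here is assumed: the `count_warm_blobs` name, the 4 °C margin over ambient, and the minimum blob size are placeholder values, not calibrated settings from the project.

```python
ROWS, COLS = 24, 32  # MLX90640 array size


def count_warm_blobs(
    frame: list[list[float]],  # temperatures in °C, frame[row][col]
    ambient_c: float,
    delta_c: float = 4.0,      # assumed margin above ambient
    min_pixels: int = 4,       # assumed minimum size, filters stray hot pixels
) -> int:
    """Count 4-connected regions at least `delta_c` above ambient."""
    threshold = ambient_c + delta_c
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            # Flood-fill this warm region and measure its size.
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if size >= min_pixels:
                blobs += 1
    return blobs
```

At this resolution a person appears as a small warm patch with no facial detail, which is what makes the approach privacy-preserving.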

Service Analytics

Enhanced Desk Tracker system providing granular data on service demand, wait times, and staff allocation optimization.

Golden Database

PostgreSQL implementation centralizing all operational data, creating single source of truth for library intelligence.

API Layer

FastAPI backend exposing insights to dashboards and enabling integration with other library services.

Data Pipeline Architecture

Batch Processing Strategy

Engineered resilient batch processing system that:

  • Processes ISBNs in configurable batch sizes (optimized at 350 records)
  • Implements exponential backoff for API rate limiting
  • Maintains checkpoint system for crash recovery
  • Flushes results to disk periodically to prevent memory overflow
  • Handles malformed data gracefully with comprehensive error logging
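
Two of the items above, exponential backoff and graceful handling of malformed data, can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's real code: the `RateLimited` exception, the `fetch` callable, the delay base and cap, and the use of random jitter are all hypothetical. The validator handles ISBN-13 only; ISBN-10 values would need converting first.

```python
import logging
import random
import time
from typing import Callable, Optional

log = logging.getLogger("enrichment")


class RateLimited(Exception):
    """Hypothetical error raised by `fetch` on a rate-limit response."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential delay for a 0-based attempt, capped at `cap` seconds."""
    return min(cap, base * 2 ** attempt)


def fetch_with_backoff(
    fetch: Callable[[str], dict],
    isbn: str,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch`, retrying with jittered exponential backoff when rate limited."""
    for attempt in range(max_attempts):
        try:
            return fetch(isbn)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            # Jitter: wait a random fraction of the exponential delay.
            sleep(random.uniform(0, backoff_delay(attempt)))
    raise RuntimeError("max_attempts must be at least 1")


def normalize_isbn13(raw: str) -> Optional[str]:
    """Return a clean ISBN-13, or None (with a log entry) if it is malformed."""
    digits = raw.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        log.warning("skipping malformed ISBN: %r", raw)
        return None
    # ISBN-13 check: weights alternate 1, 3 and the total is a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    if total % 10 != 0:
        log.warning("skipping ISBN with bad check digit: %r", raw)
        return None
    return digits
```

Validating before fetching means a malformed record costs a log line instead of an API call, and the injectable `sleep` parameter keeps the retry logic testable without real waiting.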

Tech Stack

Data Pipeline

Python, Pandas, BeautifulSoup, Asyncio

Computer Vision

YOLOv8, OpenCV, MLX90640, TensorFlow

Backend & Storage

FastAPI, PostgreSQL, Redis, SQLAlchemy

Analytics

Plotly, Dash, Scikit-learn, Prophet

Empowering Staff: Data Superpowers

The system augments library staff, automating tedious tasks and providing clear, actionable insights.

Collections Manager

X-Ray Vision into Collection Health

Dashboards showing trending subjects, collection gaps, and data-driven purchasing recommendations. Builds more relevant collections while eliminating wasteful spending.

Access Services Staff

Real-Time Building Omniscience

Live map showing floor activity, study room availability, and service desk wait times. Enables proactive student guidance and higher service levels.

Library Director

Precognition for Strategic Planning

Predictive analytics forecasting budget and staffing needs based on historical patterns. Enables evidence-based arguments for funding and strategic adjustments.

Social Media Manager

Never-Ending Content Fountain

Automatic feed of interesting, real-time data points to share. Makes outreach timely, relevant, and genuinely helpful to the student community.

Outcomes & Impact

400K+ Volumes Cataloged
110K+ ISBNs Processed
10M+ API Calls Handled
7-8 Papers Planned

Foundation Dataset: Successfully processed 110,000+ ISBNs forming core dataset for the platform

Academic Impact: Foundation for master's thesis and portfolio of research papers on privacy-preserving AI

Operational Intelligence: Will provide UTPB library with unprecedented real-time insights for data-driven decisions

Open Source Contribution: Being built as reference implementation for other academic libraries worldwide

Privacy-First Design: Demonstrates how to gather intelligence without compromising individual privacy

Key Innovations

  • Memory-Efficient Batch Processing: Novel approach to handling massive API operations within constrained environments
  • Privacy-Preserving Analytics: Thermal sensors and anonymous CV providing insights without personal data
  • Holistic System Design: Integration of disparate data sources into unified intelligence platform
  • Staff Augmentation Focus: Technology designed to empower humans, not replace them
  • Academic-Commercial Bridge: Production-quality system serving as research platform

Future Development

The platform continues to evolve with ambitious plans for expansion:

Planned Features

  • Predictive Maintenance: ML models predicting equipment failures before they occur
  • Personalized Recommendations: Privacy-preserving recommendation engine for students
  • Energy Optimization: Smart HVAC control based on occupancy patterns
  • Virtual Assistant: AI-powered chat interface for library services
  • Cross-Library Federation: Network effect by connecting multiple library systems

Research Papers Pipeline

  • "Privacy-Preserving Occupancy Analytics Using Thermal Imaging"
  • "High-Throughput Data Pipeline Architecture for Library Systems"
  • "Computer Vision Applications in Academic Library Management"
  • "Predictive Analytics for Collection Development"
  • "Open-Source Intelligence Platform for Academic Libraries"