

Data Science Projects
This section showcases end-to-end projects in Data Science (DS), Data Analysis (DA), and Computer Science, spanning domains including aviation safety, finance, and full-stack development. Projects are hosted on their respective platforms — GitHub, Hugging Face (HF), and Kaggle — where notebooks, datasets, and live demos are available in full.

Project: Aviation Human Factor Incidents and Preventive Measures
Narrative: NASA ASRS (Aviation Safety Reporting System) dataset, filtered to ~40,000 human-factor incidents (2005–2025). An end-to-end machine learning pipeline was developed to identify human-factor risks and generate data-driven preventive insights.
Dataset: 111K ASRS reports | 40K filtered (human factors)
Models: MiniLM embeddings | XGBoost (Binary Relevance, 69 labels)
Features: Text embeddings + 39 structured variables
Deployment: Hugging Face Spaces | Gradio app | REST API inference
Tools: Python | Pandas | scikit-learn | SentenceTransformers | XGBoost | Gradio
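
The Binary Relevance setup listed above trains one independent binary classifier per label. The following is a minimal sketch of that decomposition, not the project's code: to stay runnable with the standard library only, a toy nearest-centroid classifier (hypothetical `CentroidClassifier`) stands in for XGBoost, and short 2-D vectors stand in for the embedding-plus-structured feature rows.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


class CentroidClassifier:
    """Toy binary classifier: predicts the class whose centroid is nearer.

    Assumption: this is only a stand-in for XGBoost so the sketch has no
    third-party dependencies.
    """

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, label in zip(X, y) if label == cls]
            if rows:  # a label may have no positives (or no negatives)
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X: List[Vector]) -> List[int]:
        def sq_dist(a: Vector, b: Vector) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
            for x in X
        ]


class BinaryRelevance:
    """Trains one independent binary model per label column."""

    def __init__(self, make_model: Callable[[], CentroidClassifier]):
        self.make_model = make_model
        self.models: List[CentroidClassifier] = []

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "BinaryRelevance":
        n_labels = len(Y[0])
        # Column j of Y becomes the binary target of model j.
        self.models = [
            self.make_model().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict(self, X: List[Vector]) -> List[List[int]]:
        per_label = [m.predict(X) for m in self.models]  # n_labels x n_samples
        return [list(col) for col in zip(*per_label)]    # n_samples x n_labels


# Toy data: label 0 follows feature 0, label 1 follows feature 1.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y_demo = [[0, 0], [0, 1], [1, 0], [1, 1]]
demo_model = BinaryRelevance(CentroidClassifier).fit(X_demo, Y_demo)
print(demo_model.predict([[0.9, 0.1], [0.1, 0.8]]))
```

With 69 labels the same loop would simply build 69 models; any estimator exposing `fit` and `predict` could be passed as `make_model`.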

Project: Multi-Task NLP for Aviation Incident Risk Estimation
Narrative: NASA ASRS dataset (~38,000 incident reports, 2012–2022) combining structured metadata and free-text narratives. A multi-task NLP model was developed to classify key incident dimensions and support data-driven risk analysis.
Dataset: 38K ASRS reports (2012–2022)
Models: DistilBERT (shared encoder + 3 heads)
Pipeline: Text consolidation | Tokenization | 512-token constraint (~21.5% truncation)
Training: AdamW | Stratified split | Class imbalance handling
Tools: Python | PyTorch | Hugging Face Transformers | scikit-learn
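
The truncation share quoted in the pipeline line can be measured by counting how many reports exceed the token budget. The sketch below illustrates that calculation under stated assumptions: whitespace splitting (hypothetical `whitespace_tokenize`) stands in for the real subword tokenizer, and two positions are assumed to be reserved for the special start and end tokens.

```python
from typing import Iterable, List

MAX_LEN = 512        # model input limit cited above
SPECIAL_TOKENS = 2   # assumption: one start and one end token per sequence


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for the real subword tokenizer."""
    return text.split()


def truncate(tokens: List[str], max_len: int = MAX_LEN,
             special: int = SPECIAL_TOKENS) -> List[str]:
    """Keep only as many content tokens as fit next to the special tokens."""
    return tokens[: max_len - special]


def truncation_rate(texts: Iterable[str], max_len: int = MAX_LEN,
                    special: int = SPECIAL_TOKENS) -> float:
    """Share of texts whose token count exceeds the content budget."""
    texts = list(texts)
    if not texts:
        return 0.0
    budget = max_len - special
    n_truncated = sum(len(whitespace_tokenize(t)) > budget for t in texts)
    return n_truncated / len(texts)


# Toy corpus: two of the four texts are longer than 510 content tokens.
demo_texts = ["w " * 600, "w " * 510, "short text", "w " * 511]
print(truncation_rate(demo_texts))
```

A subword tokenizer usually produces more tokens than whitespace splitting, so a rate measured this way would likely understate the real figure.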

Project: Finance Prediction
Narrative: Yahoo Finance dataset (~13,800 records, 2015–2025) covering aviation-sector companies. Machine learning models were developed to predict 30-day stock performance and support data-driven financial analysis.
Dataset: 13.8K rows (Yahoo Finance, 2015–2025)
Features: Rolling stats | Volatility | Momentum | Daily returns
Target: stock_growth (binary, ~54/46 balance)
Models: Logistic Regression | Random Forest
Results: Evaluated via ROC-AUC across 5 aviation-sector stocks
Tools: Python | Pandas | scikit-learn | yfinance
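
The feature and target lines above can be illustrated with a short pure-Python sketch. It is not the project's code: the exact definition of `stock_growth` is not given here, so the label below assumes "closing price after the horizon is higher than today's", and all function names are hypothetical. Features look only backward and the label looks only forward, which keeps future information out of the inputs.

```python
from statistics import mean, stdev
from typing import Callable, List, Optional, Sequence

Series = List[Optional[float]]


def daily_returns(prices: Sequence[float]) -> Series:
    """Simple return p_t / p_(t-1) - 1; the first day has no previous price."""
    if not prices:
        return []
    return [None] + [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


def rolling(values: Sequence[Optional[float]], window: int,
            fn: Callable[[Sequence[float]], float]) -> Series:
    """Apply fn over a trailing window; None until the window is complete."""
    out: Series = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(fn(chunk))
    return out


def momentum(prices: Sequence[float], window: int) -> Series:
    """Price change relative to the price `window` days earlier."""
    return [
        None if i < window else prices[i] / prices[i - window] - 1
        for i in range(len(prices))
    ]


def forward_growth_label(prices: Sequence[float], horizon: int = 30) -> List[Optional[int]]:
    """Assumed target: 1 if the price `horizon` days ahead is higher, else 0.

    The last `horizon` rows have no future price and get None.
    """
    n = len(prices)
    return [
        int(prices[i + horizon] > prices[i]) if i + horizon < n else None
        for i in range(n)
    ]


demo_prices = [100.0, 110.0, 121.0, 115.0, 130.0]
demo_returns = daily_returns(demo_prices)
print(rolling(demo_returns, 2, stdev))        # rolling volatility
print(rolling(demo_prices, 3, mean))          # rolling mean price
print(forward_growth_label(demo_prices, 2))   # short horizon for the toy data
```

Rows whose features or label are None would be dropped before fitting a classifier.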

Project: Netflix Movies Rating Prediction
Narrative: Netflix movies dataset (~16,000 titles, 2010–2025) enriched with engagement-based features. A regression model was developed to predict ratings and uncover patterns influencing audience perception.
Dataset: 16K Netflix movies (2010–2025)
Features: Director/Cast engagement (leave-one-out, LOO) | Log transformations
Models: XGBoost Regressor
Results: vote_count_log correlation with the rating target ~0.59 (vs ~0.19 for raw vote_count)
Tools: Python | Pandas | NumPy | scikit-learn | XGBoost
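
Leave-one-out (LOO) encoding, named in the feature line above, replaces a category such as a director with the mean of some value over that director's other titles, so a row never sees its own value. The following is a minimal sketch under assumptions: the function name `loo_mean_encode` and the choice of the encoded value are hypothetical, and categories that appear only once fall back to the global mean as a simplification.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def loo_mean_encode(keys: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """For each row, average `values` over the OTHER rows sharing its key."""
    if not values:
        return []

    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
        counts[key] += 1

    # Simplification: singletons get the global mean (which includes them).
    global_mean = sum(values) / len(values)

    encoded = []
    for key, value in zip(keys, values):
        if counts[key] > 1:
            # Remove the row's own contribution before averaging.
            encoded.append((totals[key] - value) / (counts[key] - 1))
        else:
            encoded.append(global_mean)
    return encoded


# Toy data: director "A" has three titles, director "B" has one.
demo_directors = ["A", "A", "B", "A"]
demo_engagement = [10.0, 20.0, 30.0, 60.0]
print(loo_mean_encode(demo_directors, demo_engagement))
```

Excluding the row's own value matters most when the encoded quantity is related to the prediction target, since it prevents the feature from leaking the answer.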

Project: Safety in Aviation Industry — EDA (Exploratory Data Analysis)
Narrative: Aviation accident dataset (1908–2023) combining historical records for safety analysis. Exploratory data analysis was conducted to identify long-term trends and fatality patterns and to support data-driven insights into how aviation safety has evolved.
Dataset: 100K+ aviation accident records (1908–2023)
Features: Temporal trends | Fatality rates | Aircraft & operation attributes
Results: Clear long-term safety improvement trends | Risk pattern identification | Power BI dashboard
Tools: Python | Pandas | NumPy | Matplotlib | Seaborn | Power BI
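
One way to look at the temporal trends and fatality rates listed above is to aggregate by decade. The sketch below is an illustration only: the record keys `year`, `aboard`, and `fatalities` are assumed names, not the dataset's actual schema, and the rate is defined here as total fatalities divided by total occupants within each decade.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def fatality_rate_by_decade(records: Iterable[Mapping]) -> Dict[int, float]:
    """Return {decade: fatalities / people aboard}, sorted by decade.

    Assumed keys per record: "year", "aboard", "fatalities".
    """
    aboard = defaultdict(int)
    deaths = defaultdict(int)
    for rec in records:
        if not rec.get("aboard"):
            continue  # skip rows with missing or zero occupancy
        decade = (rec["year"] // 10) * 10
        aboard[decade] += rec["aboard"]
        deaths[decade] += rec.get("fatalities") or 0
    return {decade: deaths[decade] / aboard[decade] for decade in sorted(aboard)}


# Toy records, invented for illustration only.
demo_records = [
    {"year": 1955, "aboard": 50, "fatalities": 50},
    {"year": 1958, "aboard": 50, "fatalities": 25},
    {"year": 2015, "aboard": 200, "fatalities": 20},
    {"year": 2019, "aboard": 0, "fatalities": 0},
]
print(fatality_rate_by_decade(demo_records))
```

Pooling counts within a decade, rather than averaging per-accident rates, keeps small accidents from dominating the figure.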

Project: Movies & Shows Suggester Website (Harvard CS50 Final Project)
Narrative: Movie and TV datasets integrated into a web-based recommendation system. A Flask application was developed to suggest content using similarity logic and natural language queries.
Dataset: Movies + Shows databases (Netflix-focused)
Features: Metadata filtering | Similarity-based recommendations | NLP query interface
Results: Interactive recommendation system | Natural language search (OpenAI API)
Tools: Python | Flask | SQL | OpenAI API | HTML/CSS

Project: ASRS Dataset Creation
Narrative: NASA ASRS dataset (~111,000 aviation incident reports, 2005–2025) collected and processed from raw sources. A data pipeline was developed to clean, standardize, and structure the dataset for scalable analysis and machine learning applications.
Dataset: 111K ASRS reports (2005–2025)
Features: Data cleaning | Deduplication by ACN (report accession number) | Schema standardization | Data validation
Results: Public dataset published (Hugging Face & Kaggle) | Foundation for NLP projects
Tools: Python | Pandas | NumPy | glob | os
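
The collection and deduplication steps listed above can be sketched with the standard library alone. This is an illustrative outline, not the published pipeline: the function names, the `*.csv` file pattern, and the `ACN` column header are assumptions, and rows without an ACN are dropped here as one possible validation rule.

```python
import csv
import glob
import os
from typing import Dict, Iterable, Iterator, List


def read_reports(folder: str, pattern: str = "*.csv") -> Iterator[Dict[str, str]]:
    """Yield rows from every matching CSV export in `folder`, in name order."""
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


def dedupe_by_acn(rows: Iterable[Dict[str, str]], key: str = "ACN") -> List[Dict[str, str]]:
    """Keep the first row seen for each ACN; drop repeats and empty ACNs."""
    seen = set()
    unique = []
    for row in rows:
        acn = (row.get(key) or "").strip()
        if not acn or acn in seen:
            continue
        seen.add(acn)
        unique.append(row)
    return unique


# In-memory demo: the second "1002" row is a repeat and is dropped.
demo_rows = [
    {"ACN": "1001", "Narrative": "first"},
    {"ACN": "1002", "Narrative": "second"},
    {"ACN": "1002", "Narrative": "repeat"},
    {"ACN": "", "Narrative": "no id"},
]
print([row["ACN"] for row in dedupe_by_acn(demo_rows)])
```

Typical use would be `dedupe_by_acn(read_reports("raw_exports"))`, where `raw_exports` is a hypothetical folder of downloaded files; overlapping export batches are a common source of repeated reports.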