Data Science Projects

This section showcases end-to-end projects in Data Science (DS), Data Analysis (DA), and Computer Science, spanning domains including aviation safety, finance, and full-stack development. Projects are hosted on their respective platforms — GitHub, Hugging Face (HF), and Kaggle — where notebooks, datasets, and live demos are available in full.

Project: Aviation Human Factor Incidents and Preventive Measures

Narrative: Built on ~111,000 NASA ASRS (Aviation Safety Reporting System) incident reports (2005–2025), filtered to ~40,000 human-factor incidents. The dataset was collected, cleaned, and preprocessed by the author. An end-to-end ML (Machine Learning) pipeline combines sentence embeddings (MiniLM) with structured features and feeds them to XGBoost (eXtreme Gradient Boosting) multi-label classifiers covering 69 labels across four target groups. A preventive recommendations layer sits on top of the model outputs. The project is deployed as a live Gradio app on Hugging Face Spaces.
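
As a rough illustration of how a recommendations layer can sit on top of multi-label outputs, here is a minimal sketch. It assumes the classifiers emit one probability per label. The label names, the 0.5 threshold, and the recommendation texts are hypothetical and are not taken from the project.

```python
# Hypothetical label names and recommendation texts, for illustration only.
RECOMMENDATIONS = {
    "fatigue": "Review crew rest and duty-time scheduling.",
    "communication_breakdown": "Reinforce read-back and standard phraseology.",
    "distraction": "Apply sterile-cockpit discipline during critical phases.",
}


def recommend(probabilities, threshold=0.5):
    """Return (label, recommendation) pairs for labels at or above the threshold.

    `probabilities` maps label name -> predicted probability in [0, 1].
    Results are sorted from most to least probable.
    """
    flagged = [
        (label, p)
        for label, p in probabilities.items()
        if p >= threshold and label in RECOMMENDATIONS
    ]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [(label, RECOMMENDATIONS[label]) for label, _ in flagged]


# Made-up model output for a single report.
example = {"fatigue": 0.81, "communication_breakdown": 0.34, "distraction": 0.62}
for label, advice in recommend(example):
    print(f"{label}: {advice}")
```

Keeping the mapping in a plain dictionary means the advice texts can be revised without retraining any model.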

Links:

GitHub

Hugging Face

Kaggle

Project: Multi-Task NLP for Aviation Incident Risk Estimation

Narrative: Uses ~38,000 aviation safety incident reports from NASA's ASRS, covering January 2012 to March 2022, with both structured metadata and unstructured narrative descriptions. The project applies NLP (Natural Language Processing) techniques to classify incident risk factors from free-text reports, combining text embeddings with structured tabular features. It is deployed as a live Gradio app on Hugging Face Spaces.

Links:

GitHub

Hugging Face

Kaggle

Project: Finance Prediction

Narrative: Uses daily OHLCV (Open, High, Low, Close, Volume) stock data for major aircraft manufacturers and airlines (2015–2025), sourced from Yahoo Finance. The project applies time-series analysis and regression models to explore price prediction in the aviation financial sector.
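
Two features commonly derived from closing prices before fitting a regression model are daily returns and a trailing moving average. The sketch below shows both under simple assumptions. The function names and the sample prices are hypothetical, and the project itself may use different features.

```python
def daily_returns(closes):
    """Simple day-over-day returns: (close_t - close_{t-1}) / close_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def moving_average(values, window):
    """Trailing moving average; emits one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up closing prices, not real market data.
closes = [100.0, 102.0, 101.0, 105.0, 104.0]
returns = daily_returns(closes)
ma3 = moving_average(closes, 3)
print(returns)
print(ma3)
```

Both functions return lists shorter than the input, so they need to be aligned by date before being joined into one feature table.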

Links:

GitHub

Hugging Face

Kaggle

Project: Netflix Movies Rating Prediction

Narrative: An independent regression project analysing Netflix movie metadata (2010–2025). It evaluates whether engineered metadata features can predict movie ratings using regression models.

Links:

GitHub

Hugging Face

Kaggle

Project: Safety in Aviation Industry — EDA (Exploratory Data Analysis)

Narrative: A Data Analytics project exploring historical aviation accident data (1908–2023). It merges two datasets and addresses two questions: Is it safer to fly today than in the past? Will it be even safer in the future? The project includes trend analysis, fatality rate modelling, and a Power BI (Business Intelligence) dashboard.
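
One simple way to compare safety across eras is a fatality rate aggregated per decade. The sketch below assumes the rate is defined as fatalities divided by people aboard, which is an assumption and not necessarily the definition used in the project. The records are made up.

```python
from collections import defaultdict


def fatality_rate_by_decade(records):
    """Aggregate fatalities / people aboard per decade.

    `records` is an iterable of (year, aboard, fatalities) tuples.
    Decades with nobody aboard are skipped to avoid division by zero.
    """
    aboard = defaultdict(int)
    deaths = defaultdict(int)
    for year, n_aboard, n_fatal in records:
        decade = year // 10 * 10
        aboard[decade] += n_aboard
        deaths[decade] += n_fatal
    return {d: deaths[d] / aboard[d] for d in sorted(aboard) if aboard[d] > 0}


# Made-up records, not taken from the project's datasets.
records = [(1972, 100, 80), (1978, 50, 40), (2015, 200, 20), (2019, 100, 10)]
rates = fatality_rate_by_decade(records)
for decade, rate in rates.items():
    print(f"{decade}s: {rate:.2f}")
```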

Links:

GitHub

Hugging Face

Kaggle

Project: Movies & Shows Suggester Website (Harvard CS50 Final Project)

Narrative: A Flask web application that queries two movie databases (shows.db, movies.db) to recommend the best movies and shows from Netflix. It integrates the OpenAI API (Application Programming Interface) to handle natural language queries such as "best shows in 2021" or "movies with actors X, Y, Z."

Project: ASRS Dataset Creation

Narrative: Documents the full pipeline used to collect and prepare the NASA ASRS aviation safety incident dataset — downloaded in batches of up to 5,000 records per query, cleaned, and published as a standalone open dataset on Hugging Face and Kaggle.
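
A per-query cap of 5,000 records means the full date range has to be split into smaller requests. The sketch below shows one possible way to plan such batches by grouping consecutive months. Grouping by month, the function name, and the record counts are all assumptions for illustration; the source does not say how the batches were defined.

```python
MAX_RECORDS_PER_QUERY = 5000  # per-query limit stated in the project description


def plan_batches(monthly_counts, limit=MAX_RECORDS_PER_QUERY):
    """Group consecutive months into batches whose record total stays within `limit`.

    `monthly_counts` is a list of (month, record_count) pairs in chronological order.
    A single month that exceeds the limit still gets its own batch and would
    need a finer split, which this sketch does not handle.
    """
    batches, current, total = [], [], 0
    for month, count in monthly_counts:
        if current and total + count > limit:
            batches.append(current)
            current, total = [], 0
        current.append(month)
        total += count
    if current:
        batches.append(current)
    return batches


# Made-up counts for illustration.
counts = [("2020-01", 2100), ("2020-02", 1900), ("2020-03", 1500), ("2020-04", 3600)]
print(plan_batches(counts))
```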

Links:

GitHub

Hugging Face

Kaggle

Project: Currency Transfer Tracker (Harvard CS50SQL Final Project)

Narrative: A relational database designed to track personal financial movements — investments, transfers, and expenditures — built as the final project of the Harvard CS50SQL course. Demonstrates schema design, normalization, and query writing.
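
To give a feel for the kind of schema and query involved, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, columns, and sample rows are hypothetical and do not reproduce the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES constraint below

# Hypothetical two-table schema: currencies are stored once and referenced by code.
conn.executescript("""
CREATE TABLE currencies (
    code TEXT PRIMARY KEY
);
CREATE TABLE movements (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('investment', 'transfer', 'expenditure')),
    currency_code TEXT NOT NULL REFERENCES currencies(code),
    amount REAL NOT NULL,
    moved_on TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO currencies (code) VALUES (?)", [("EUR",), ("USD",)])
conn.executemany(
    "INSERT INTO movements (kind, currency_code, amount, moved_on) VALUES (?, ?, ?, ?)",
    [
        ("transfer", "EUR", 500.0, "2024-01-05"),
        ("expenditure", "EUR", 120.0, "2024-01-09"),
        ("investment", "USD", 300.0, "2024-02-01"),
        ("expenditure", "EUR", 80.0, "2024-02-03"),
    ],
)

# Total amount per currency and movement kind.
totals = conn.execute(
    "SELECT currency_code, kind, SUM(amount) FROM movements "
    "GROUP BY currency_code, kind ORDER BY currency_code, kind"
).fetchall()
for row in totals:
    print(row)
```

The CHECK constraint restricts `kind` to the three movement types named in the description, and the foreign key rejects movements in an unknown currency.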

Project: Space Collision Game (Harvard CS50P Final Project)

Narrative: A real-time terminal game built in Python as the final project of the Harvard CS50P course. The player controls a meteor using keyboard inputs to avoid collisions with randomly generated planets. Demonstrates Python fundamentals, loop control, and library integration (blessed, random).
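
The core loop of such a game comes down to three steps: spawn obstacles, move the player, and check for a collision. The sketch below covers that logic with the standard library only and leaves out terminal rendering with blessed. The grid model, the w/a/s/d key mapping, and the function names are assumptions for illustration.

```python
import random


def spawn_planets(rng, width, height, count):
    """Place `count` planets on distinct random cells of a width x height grid."""
    cells = [(x, y) for x in range(width) for y in range(height)]
    return set(rng.sample(cells, count))


def move(position, key, width, height):
    """Apply one key press (w/a/s/d) and clamp the result to the grid."""
    deltas = {"w": (0, -1), "s": (0, 1), "a": (-1, 0), "d": (1, 0)}
    dx, dy = deltas.get(key, (0, 0))  # unknown keys leave the meteor in place
    x = min(max(position[0] + dx, 0), width - 1)
    y = min(max(position[1] + dy, 0), height - 1)
    return (x, y)


def collided(position, planets):
    """True when the meteor occupies the same cell as a planet."""
    return position in planets


rng = random.Random(0)  # seeded so the layout is repeatable
planets = spawn_planets(rng, width=10, height=5, count=8)
meteor = move((0, 0), "d", width=10, height=5)
print(meteor, collided(meteor, planets))
```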

© 2017-2026 - Matheus Hagemann
