My Data Science Projects

These projects represent my most recent work, completed during TripleTen's
10-month Data Science program (graduated October '25).
Through this program, I gained more in-depth experience with Python and SQL.
I also deepened my knowledge of ML, time series analysis, computer vision, statistics, probability and NLP.

Interconnect Co

Our goal here was to help the telecom operator Interconnect forecast client churn. Using contract, personal, internet and phone data, we developed a model that identifies clients who are likely to churn, so the company can focus targeted marketing strategies and promotions on those customers and reduce its overall churn rate.

---------------------------------
MODULE | Final Project
Concepts
ML | OHE | Scaling | Pipelines | LR | Accuracy Score | AUC-ROC
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn
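A minimal sketch of this kind of workflow, assuming a small synthetic dataset with hypothetical column names (contract_type, internet_service, monthly_charges, tenure_months) rather than the actual Interconnect data: categorical columns are one-hot encoded, numeric columns are scaled, and both steps feed a logistic regression inside a single scikit-learn Pipeline that is scored with AUC-ROC and accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic customers; column names are illustrative only.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "contract_type": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "internet_service": rng.choice(["DSL", "Fiber optic", "None"], size=n),
    "monthly_charges": rng.uniform(20, 120, size=n),
    "tenure_months": rng.integers(1, 72, size=n),
})
# Synthetic label: short-tenure, month-to-month customers churn more often.
logit = (
    1.5 * (df["contract_type"] == "Month-to-month")
    - 0.05 * df["tenure_months"]
    + 0.01 * df["monthly_charges"]
)
df["churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = df.drop(columns="churn")
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One-hot encode categorical columns, scale numeric ones.
preprocess = ColumnTransformer([
    ("ohe", OneHotEncoder(handle_unknown="ignore"), ["contract_type", "internet_service"]),
    ("scale", StandardScaler(), ["monthly_charges", "tenure_months"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# AUC-ROC uses predicted probabilities; accuracy uses hard class labels.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"AUC-ROC: {auc:.3f}  Accuracy: {acc:.3f}")
```

Keeping the encoder and scaler inside the pipeline means they are fit only on the training split, which avoids leaking test-set statistics into preprocessing.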

Green Seed Supermarket

The Green Seed Supermarket company wanted to develop a system that could detect whether an underage customer was attempting to purchase alcohol. We used computer vision to build a model that verifies a person’s age from the cameras at the checkout counters, and we used the TensorFlow library to create it.

---------------------------------
MODULE | Computer Vision
Concepts
Computer Vision | Scripts
Libs & Packs
pandas | numpy | matplotlib | os | math | tensorflow | keras

Film Junky Union

The Film Junky Union, a new edgy community for classic movie enthusiasts, was developing a system for filtering and categorizing movie reviews. Our goal was to train a model that automatically detects negative reviews, as a first step toward this new system. We used the F1 score to measure the model's effectiveness. NLTK and spaCy for text processing, TF-IDF vectorization and LR models were used to produce the result the company was looking for.

---------------------------------
MODULE | ML for Texts
Concepts
ML | DummyClassifier | LR | F1 Score
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn | math | re | spaCy | tqdm | LightGBM | nltk
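A toy sketch of TF-IDF features plus logistic regression, compared against a DummyClassifier baseline using the F1 score. The eight reviews below are made up for illustration; the project's real dataset and its NLTK/spaCy preprocessing are not reproduced here.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Tiny hypothetical corpus; 1 = negative review, 0 = positive review.
train_texts = [
    "a dull, boring film with terrible acting",
    "awful plot and a waste of time",
    "the worst movie I have seen this year",
    "boring and predictable, terrible ending",
    "a brilliant classic with wonderful performances",
    "great story and beautiful cinematography",
    "one of the best films ever made",
    "wonderful, moving and brilliant",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["terrible and boring", "a wonderful, brilliant classic"]
test_labels = [1, 0]

# TF-IDF turns each review into a sparse vector of weighted term frequencies.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# A constant "always negative" baseline gives a floor a real model should beat.
dummy = DummyClassifier(strategy="constant", constant=1).fit(X_train, train_labels)
clf = LogisticRegression().fit(X_train, train_labels)

dummy_f1 = f1_score(test_labels, dummy.predict(X_test))
model_f1 = f1_score(test_labels, clf.predict(X_test))
print(f"Dummy F1: {dummy_f1:.2f}  LR F1: {model_f1:.2f}")
```

F1 is a natural choice when the goal is detecting one class (negative reviews), because it balances precision and recall for that class instead of rewarding a model that simply predicts the majority label.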

Sweet Lift Taxi Co

The company Sweet Lift Taxi Co asked us to build a model to predict the number of taxi orders for any given hour, with the goal of attracting more drivers during peak hours. The RMSE metric was used to evaluate the models' predictions. Trends and seasonality in the time series were analyzed before training the models, and Autoregression, Moving Average, ARMA and AutoARIMA models were then fit to find the best fit for the data provided.

---------------------------------
MODULE | Time Series
Concepts
ML | Seasonality | Time Series Analysis | MSE
Libs & Packs
pandas | numpy | matplotlib | sklearn | statsmodels: seasonal_decompose, AutoReg, ARIMA | pmdarima
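A bare-bones sketch of the autoregression idea, written with plain NumPy least squares instead of statsmodels' AutoReg so the mechanics are visible: an AR(p) model predicts each hour from the previous p hours. The hourly order series is synthetic (a 24-hour cycle plus noise), and the lag order p=24 is an assumption chosen to span one daily cycle.

```python
import numpy as np


def fit_ar(series: np.ndarray, p: int) -> np.ndarray:
    """Fit AR(p) coefficients [intercept, phi_1, ..., phi_p] by ordinary least squares."""
    # Row for target y[t] holds y[t-1], ..., y[t-p] (most recent lag first).
    rows = [series[t - p:t][::-1] for t in range(p, len(series))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_ar(history: np.ndarray, coef: np.ndarray, steps: int) -> np.ndarray:
    """Recursive multi-step forecast: each prediction is fed back in as a lag."""
    p = len(coef) - 1
    buf = list(history[-p:])
    preds = []
    for _ in range(steps):
        lags = np.array(buf[-p:][::-1])  # most recent value first, matching fit_ar
        nxt = coef[0] + coef[1:] @ lags
        preds.append(nxt)
        buf.append(nxt)
    return np.array(preds)


# Hypothetical hourly order counts: a daily (24-hour) cycle plus noise.
rng = np.random.default_rng(42)
hours = np.arange(24 * 30)
orders = 80 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)

# Hold out the last 48 hours as a test set.
train, test = orders[:-48], orders[-48:]
coef = fit_ar(train, p=24)
pred = forecast_ar(train, coef, steps=len(test))

rmse = np.sqrt(np.mean((test - pred) ** 2))
naive = np.sqrt(np.mean((test - train[-1]) ** 2))  # repeat-last-value baseline
print(f"AR(24) RMSE: {rmse:.2f}  naive RMSE: {naive:.2f}")
```

In practice, statsmodels' AutoReg and ARIMA (or pmdarima's auto_arima) handle lag selection, diagnostics and forecasting; the naive baseline here only gives the RMSE a point of comparison.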

Rusty Bargain Car Co

For this project, a model was built to help the Rusty Bargain Car Co develop an app to attract new customers. The app would allow potential customers to quickly find out the market value of their car. The model was built using historical data such as technical specifications, trim versions and car prices. LinearRegression, RandomForest and LightGBM models were used for this project.

---------------------------------
MODULE | Numerical Methods
Concepts
ML | OHE | Pipelines | RF Model | LightGBM
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn | LightGBM | time | warnings | IPython.core.magic

Sure Tomorrow Insurance

In this project, the Sure Tomorrow Insurance Company asked us to create a model that finds customers similar to a given customer, to support the company's marketing efforts. We also built models to predict whether a new customer is likely to receive an insurance benefit and how many benefits that customer is likely to receive.

---------------------------------
MODULE | Linear Algebra
Concepts
ML | NearestNeighbors | KNeighborsClassifier | Scaling | F1 Score | MSE | R2 Score
Libs & Packs
pandas | numpy | seaborn | sklearn | math | IPython.display
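A minimal sketch of the "find similar customers" task with scikit-learn's NearestNeighbors. The six customers and their features are hypothetical, and MaxAbsScaler is an assumed choice of scaling; the point is that unscaled features such as salary would otherwise dominate the distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical customer features: gender (0/1), age, salary, family members.
customers = np.array([
    [1, 41, 49600, 1],
    [0, 46, 38000, 1],
    [0, 29, 21000, 0],
    [0, 21, 41700, 2],
    [1, 28, 26100, 0],
    [1, 43, 41000, 2],
], dtype=float)

# Without scaling, salary would dominate every Euclidean distance.
scaler = MaxAbsScaler()
X = scaler.fit_transform(customers)

# Ask for 3 neighbours: the first is the query customer itself (distance ~0).
nn = NearestNeighbors(n_neighbors=3, metric="euclidean").fit(X)
distances, indices = nn.kneighbors(X[[0]])
print("Most similar to customer 0:", indices[0][1:], distances[0][1:].round(3))
```

Running the same query on unscaled data is a quick way to see how much the choice of scaling changes which customers count as "similar."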

Zyfra Gold

In this project, our goal was to build a machine learning model that predicts the amount of gold recovered from gold ore. The model was developed to optimize the mine's production and eliminate unprofitable parameters. First, the recovery values provided by the company were validated to ensure the data was accurate. After data preprocessing, the concentrations of gold, silver and lead were analyzed and compared. Models were built using RandomForest and XGBoost.

---------------------------------
MODULE | Integrated Project 2
Concepts
ML | MAE | KFolds | RF Model | XGB Model
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn | XGBoost
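The recovery validation step can be sketched with the standard two-product recovery formula from mineral processing, assuming that is the calculation being checked: recovery = C * (F - T) / (F * (C - T)) * 100, where C, F and T are the gold shares in the concentrate, the feed and the tails. The measurements and "reported" values below are hypothetical; the check is the mean absolute error (MAE) between the recalculated and reported recovery.

```python
import numpy as np


def recovery(c: np.ndarray, f: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Two-product recovery formula, in percent.

    c: gold share in the concentrate after the stage
    f: gold share in the feed before the stage
    t: gold share in the tails after the stage
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        rec = c * (f - t) / (f * (c - t)) * 100
    # Values outside 0-100% (or from zero denominators) are not physically meaningful.
    return np.where((rec >= 0) & (rec <= 100), rec, np.nan)


c = np.array([19.8, 20.5, 18.9])  # hypothetical gold share in concentrate, %
f = np.array([6.5, 6.8, 6.1])     # hypothetical gold share in feed, %
t = np.array([2.1, 2.4, 1.9])     # hypothetical gold share in tails, %
reported = np.array([75.7, 73.3, 76.5])  # hypothetical recovery values from the dataset

calculated = recovery(c, f, t)
mae = np.nanmean(np.abs(calculated - reported))
print("Calculated:", calculated.round(2))
print(f"MAE vs reported: {mae:.4f}")
```

A small MAE suggests the reported recovery column is consistent with the concentration measurements; a large one would point to data errors worth investigating before modeling.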

Oily Giant Co

This project focused on finding the best place to construct a new well. Three regions were explored, and the profitability and risk of each were calculated to determine the best region in which to break ground for a new well.

---------------------------------
MODULE | ML in Business
Concepts
ML | LR Model | MSE | TrainTestSplit
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn

Beta Bank Churn

For Beta Bank, we analyzed the client behaviors that lead to contract termination. Because the company's data was imbalanced between clients who stayed and clients who left, three models were trained and tested on upsampled, downsampled and class-balanced data to find the best results. Final test results were scored with the F1 score and the AUC-ROC score. The resulting model helps anticipate a client’s intention to terminate their contract, which can help the company lower its overall churn rate.

---------------------------------
MODULE | Supervised Learning
Concepts
ML | LR Model | Recall & Precision | F1 Score | AUC-ROC | DecisionTree | RF Model
Libs & Packs
pandas | numpy | scipy.stats | matplotlib | sklearn
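A sketch of upsampling with scikit-learn's resample utility, assuming a pandas feature table and a binary target. The column names (credit_score, balance, exited) and the 80/20 class split are hypothetical. Minority-class rows are sampled with replacement until both classes are the same size, and the result is then shuffled.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample, shuffle


def upsample(features: pd.DataFrame, target: pd.Series, random_state: int = 12345):
    """Duplicate minority-class rows (sampling with replacement) until classes are equal."""
    minority_label = target.value_counts().idxmin()
    is_minority = target == minority_label
    X_min, y_min = features[is_minority], target[is_minority]
    X_maj, y_maj = features[~is_minority], target[~is_minority]

    # Draw as many minority rows as there are majority rows.
    X_min_up, y_min_up = resample(
        X_min, y_min, replace=True, n_samples=len(y_maj), random_state=random_state
    )
    X_up = pd.concat([X_maj, X_min_up])
    y_up = pd.concat([y_maj, y_min_up])
    # Shuffle features and target together so rows stay aligned.
    return shuffle(X_up, y_up, random_state=random_state)


# Hypothetical imbalanced data: 20 clients who left (1) vs 80 who stayed (0).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "credit_score": rng.integers(350, 850, 100),
    "balance": rng.uniform(0, 200_000, 100),
})
y = pd.Series([1] * 20 + [0] * 80, name="exited")

X_up, y_up = upsample(X, y)
print(y_up.value_counts())
```

Resampling should be applied only to the training split; the validation and test sets keep their original class balance so that F1 and AUC-ROC reflect real-world conditions.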

ML with Megaline

In this project, we used machine learning to help Megaline analyze subscriber behavior and recommend one of its newer plans: Smart or Ultra. Three models were trained and compared on accuracy to find the best model for the job.

---------------------------------
MODULE | Intro to ML
Concepts
ML | TrainTestSplit | Accuracy | DecisionTree | RF Model | LR Model
Libs & Packs
pandas | sklearn

Chicago Taxi Ride Analysis

The goal of this project was to provide insight into ride-share customer preferences and the impact of external factors on rides. In the analysis, we looked at the top 10 ride-share companies used by customers in Chicago and the top 10 drop-off locations. We compared ride data from days with inclement weather in the city to see how the weather affected ride-share activity. Hypothesis testing was also done to determine the best recommendations to give regarding ride-share companies in Chicago.

---------------------------------
MODULE | SQL
Concepts
ML | Hypothesis Testing
Libs & Packs
pandas | matplotlib | IPython.display | scipy.stats: ttest_ind
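A minimal sketch of the kind of hypothesis test listed above, using scipy.stats.ttest_ind to compare average ride durations on days with and without inclement weather. The durations are randomly generated for illustration, and the 0.05 significance level and Welch's variant (equal_var=False) are assumptions, not necessarily the project's settings.

```python
import numpy as np
from scipy import stats

# Hypothetical ride durations (seconds), split by weather conditions.
rng = np.random.default_rng(7)
good_weather = rng.normal(loc=2000, scale=400, size=120)
bad_weather = rng.normal(loc=2400, scale=450, size=60)

# H0: mean ride duration is the same in good and bad weather.
# equal_var=False runs Welch's t-test, which does not assume equal variances.
alpha = 0.05
result = stats.ttest_ind(good_weather, bad_weather, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

Welch's version is a conservative default when the two groups differ in size and spread, as weather-split samples usually do.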

IceCube Video Games

The goal of this project was to provide accurate insights from sales data for “IceCube,” an online store that sells video games all over the world. The objective was to identify potential big winners to help plan advertising campaigns. After initial data preprocessing, we explored the number of games released by year, sales by genre and platform, and game performance across platforms, among other analyses. We also explored sales and genre data across regions to help inform our business decisions. Hypothesis testing was performed at the end of the project to support our conclusions and help the company plan effective advertising campaigns.

---------------------------------
MODULE | Integrated Project 1
Concepts
ML | Hypothesis Testing | Data Visualization
Libs & Packs
pandas | numpy | seaborn | matplotlib | plotly.express | scipy.stats: pearsonr, ttest_ind

Car Price App

The goal of this project was to create a web application and deploy it to a publicly accessible cloud service. The application was initially built on my local computer, with changes pushed to a GitHub repo, and then deployed through Render. It compares two car models by price and by odometer reading.

(Link to Web App)

---------------------------------
MODULE | Software Dev Tools
Concepts
Command Line | Development Environments | Git & GitHub
Libs & Packs
pandas | streamlit | plotly.express | altair

Megaline Telecom Co
Statistical Analysis

The goal of this project was to determine which prepaid plan from the telecom operator Megaline brings in the most revenue, so the company can adjust its advertising budget. After data preprocessing, the focus was on parsing out the relevant datetime values for better analysis. The plans were compared using the mean, variance and standard deviation, and hypotheses were tested to determine the best plan for the company.

---------------------------------
MODULE | SDA
Concepts
Manipulating Data | Statistical Analysis | Hypothesis Testing | Data Visualization
Libs & Packs
pandas | numpy | seaborn | matplotlib | scipy

Instacart Insights

The objective was to provide insight into customers' ordering habits to help inform Instacart's business decisions. After preprocessing, EDA focused on questions such as which 20 items customers purchase most often, which days of the week have the most orders, and which items are reordered most overall.

---------------------------------
MODULE | EDA
Concepts
EDA | Data Manipulation | Data Visualization
Libs & Packs
pandas | matplotlib