My Data Science Projects
These projects are my most recent work in completing the
10-month Data Science program with TripleTen
(Graduated October '25)
Through this program, I was able to gain more in-depth experience using Python and SQL.
I also deepened my knowledge of ML, Time Series Analysis, Computer Vision, Statistics,
Probability and NLP.
Our goal here was to help the telecom operator Interconnect forecast client churn.
Using contract, personal, internet and phone data, we developed a model that could
accurately determine when clients were about to churn. This helped the company focus more
targeted marketing strategies and promotions on those customers, reducing the overall churn
rate.
---------------------------------
MODULE | Final Project
Concepts
ML | OHE | Scaling | Pipelines | LR | Accuracy Score | AUC-ROC
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn
The Green Seed Supermarket company wanted to develop a system that can detect if an underage
customer is attempting to purchase alcohol. We used Computer Vision to build a model that
would verify a person’s age using cameras at the checkout counters. The model was created with
the TensorFlow library.
---------------------------------
MODULE | Computer Vision
Concepts
Computer Vision | Scripts
Libs & Packs
pandas | numpy | matplotlib | os | math | tensorflow | keras
The Film Junky Union, a new edgy community for classic movie enthusiasts, was developing a system
for filtering and categorizing movie reviews. Our goal here was to train a model to
automatically detect negative reviews to begin work on this new system. We used the F1 Score
metric to measure effectiveness for this project. NLTK, TF-IDF, spaCy and LR models were used to
develop the result the company was looking for.
---------------------------------
MODULE | ML for Texts
Concepts
ML | DummyClassifier | LR | F1 Score
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn | math | re | spaCy | tqdm | LightGBM |
nltk
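As a rough illustration of the F1 Score metric used in the Film Junky Union project, here is a minimal sketch in plain Python. It is not the project's code: the function name `f1_score_binary` and the sample labels are assumptions made for the example, and in practice a library implementation such as the one in sklearn would normally be used.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = negative review, 0 = positive review
true_labels = [1, 1, 0, 0]
predicted_labels = [1, 0, 1, 0]
score = f1_score_binary(true_labels, predicted_labels)
```

F1 balances precision and recall, which makes it a reasonable choice when catching negative reviews and avoiding false alarms both matter.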
The company Sweet Lift Taxi Co asked us to build a model to predict the number of taxi orders for
any given hour. Their goal was to attract more drivers during peak hours. The RMSE metric was used
to evaluate the models’ predictions. Time Series methods such as trend and seasonality analysis
were used to analyze the data and later train the models. Autoregression, Moving Average, ARMA and
AutoARIMA models were fitted to find the best fit for the data provided.
---------------------------------
MODULE | Time Series
Concepts
ML | Seasonality | Time Series Analysis | MSE
Libs & Packs
pandas | numpy | matplotlib | sklearn | statsmodels: seasonal_decompose, AutoReg, ARIMA |
pmdarima
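For reference, the RMSE metric from the Sweet Lift Taxi project can be sketched in a few lines of plain Python. This is illustrative only, not the project's code: the function name `rmse` and the hourly order counts are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical hourly taxi orders versus a model's forecast
actual_orders = [120, 95, 140, 160]
forecast_orders = [110, 100, 150, 155]
error = rmse(actual_orders, forecast_orders)
```

Because the errors are squared before averaging, RMSE penalizes large misses more heavily than small ones and is expressed in the same unit as the target (orders per hour).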
For this project, a model was built to help the Rusty Bargain Car Co develop an app to attract
new customers. The app would allow potential customers to quickly find out the market value of
their car. The model was built using historical data such as technical specifications, trim
versions and car prices. LinearRegression, RandomForest and LightGBM models were used for this
project.
---------------------------------
MODULE | Numerical Methods
Concepts
ML | OHE | Pipelines | RF Model | LightGBM
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn | LightGBM | time | warnings |
IPython.core.magic
In this project, the Sure Tomorrow Insurance Company asked us to create a model that would
find customers similar to a given customer to inform the company's marketing. We
also predicted whether a new customer is likely to receive an insurance benefit from
the company, as well as the number of insurance benefits that new customer is likely to
receive.
---------------------------------
MODULE | Linear Algebra
Concepts
ML | NearestNeighbors | KNeighborsClassifier | Scaling | F1 Score | MSE | R2 Score
Libs & Packs
pandas | numpy | seaborn | sklearn | math | IPython.display
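The "similar customers" task in the Sure Tomorrow project can be pictured as a nearest-neighbor search. Below is a minimal sketch using Euclidean distance in plain Python. It is an assumption-laden illustration, not the project's implementation: the function name `nearest_neighbors` and the feature vectors are hypothetical.

```python
import math


def nearest_neighbors(customers, query, k):
    """Return indices of the k customers closest to `query` (Euclidean distance)."""
    ranked = sorted(
        range(len(customers)),
        key=lambda i: math.dist(customers[i], query),
    )
    return ranked[:k]


# Hypothetical, already-scaled feature vectors (for example age and income)
customers = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.5]]
closest = nearest_neighbors(customers, query=[0.0, 0.0], k=2)
```

Distance-based methods are sensitive to feature magnitude, which is why scaling appears among the concepts for this module: an unscaled income column would otherwise dominate the distance.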
In this project, our goal was to create a machine learning model that would predict the
amount of gold recovered from gold ore. The model was developed to optimize production of a gold
mine and eliminate unprofitable parameters. First, recovery values were validated to ensure
accuracy of the data provided by the company. After data preprocessing, concentrations of gold,
silver and lead were measured and compared. Models were built using RandomForest and XGBoost.
---------------------------------
MODULE | Integrated Project 2
Concepts
ML | MAE | KFolds | RF Model | XGB Model
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn | XGBoost
This project was focused on finding the best place for a new well to be constructed. Three
regions were explored, and their profitability and risk were calculated to determine the best
possible region to break ground for a new well.
---------------------------------
MODULE | ML in Business
Concepts
ML | LR Model | MSE | TrainTestSplit
Libs & Packs
pandas | numpy | seaborn | matplotlib | sklearn
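One common way to estimate profitability and risk for a region, as in the new-well project, is bootstrapping. The text above does not state which method was used, so treat the sketch below as an assumption: the function name `bootstrap_profit`, its parameters and the profit figures are all hypothetical.

```python
import random


def bootstrap_profit(values, n_samples=1000, sample_size=None, seed=0):
    """Resample per-well profits with replacement.

    Returns (mean total profit, share of resamples with a negative total).
    """
    rng = random.Random(seed)
    size = len(values) if sample_size is None else sample_size
    totals = []
    for _ in range(n_samples):
        sample = [rng.choice(values) for _ in range(size)]
        totals.append(sum(sample))
    mean_profit = sum(totals) / len(totals)
    risk_of_loss = sum(1 for t in totals if t < 0) / len(totals)
    return mean_profit, risk_of_loss


# Hypothetical per-well profits for one region (negative = loss)
region_profits = [1.2, -0.4, 0.8, 2.1, -1.0, 0.3]
mean_profit, risk = bootstrap_profit(region_profits, n_samples=500, seed=42)
```

Comparing the mean profit and the risk of loss across regions gives a simple basis for ranking them.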
For BetaBank, we analyzed the client behaviors that lead to contract
termination. Given the data provided by the company, three models were trained and tested
on data that was upsampled, downsampled and balanced to find the best results. Final testing was
scored with the F1 Score metric and the AUC-ROC Score. We were able to train a model to help
anticipate a client’s intention to terminate their contract and lower churn rates for the
company overall.
---------------------------------
MODULE | Supervised Learning
Concepts
ML | LR Model | Recall & Precision | F1 Score | AUC-ROC | DecisionTree | RF Model
Libs & Packs
pandas | numpy | scipy.stats | matplotlib | sklearn
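To show what upsampling means in the BetaBank project, here is a minimal sketch that repeats the minority (positive) class before training. It is not the project's code: the function name `upsample`, the `repeat` factor and the toy data are assumptions for illustration.

```python
import random


def upsample(features, target, repeat, seed=0):
    """Repeat rows of the positive class `repeat` times, then shuffle."""
    positives = [(x, y) for x, y in zip(features, target) if y == 1]
    negatives = [(x, y) for x, y in zip(features, target) if y == 0]
    combined = negatives + positives * repeat
    random.Random(seed).shuffle(combined)
    new_features = [x for x, _ in combined]
    new_target = [y for _, y in combined]
    return new_features, new_target


# Hypothetical data: one client who left (1) among three who stayed (0)
features = [[1], [2], [3], [4]]
target = [0, 0, 0, 1]
up_features, up_target = upsample(features, target, repeat=3)
```

Upsampling should be applied to the training split only, so that validation and test scores still reflect the original class balance.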
In this project, we used machine learning to help Megaline analyze subscriber behavior and
recommend one of its newer plans: Smart or Ultra. Three models were trained and tested for accuracy
to find the best model for the job.
---------------------------------
MODULE | Intro to ML
Concepts
ML | TrainTestSplit | Accuracy | DecisionTree | RF Model | LR Model
Libs & Packs
pandas | sklearn
The goal for this project was to provide insight into ride share company customer preferences and the
impact of external factors on rides. In the analysis, we looked at the Top 10 Companies used by
customers in Chicago, and then the Top 10 Drop Off Locations. We compared data against days that
had inclement weather in the city and saw how that affected ride share activity. Hypothesis
testing was also done to determine the best recommendations to give regarding ride share
companies in Chicago.
---------------------------------
MODULE | SQL
Concepts
ML | Hypothesis Testing
Libs & Packs
pandas | matplotlib | IPython.display | scipy.stats: ttest_ind
The goal for this project was to provide accurate information regarding sales data for an online
store called “IceCube” that sells video games all over the world. The objective was to provide
insight into potential big winners to help plan advertising campaigns. After initial data
preprocessing, focus was on exploring the number of games released by year, sales by genre and
platform, and game performance across platforms among other analyses. We also explored sales and
genre data across regions to help inform our business decisions. Hypothesis testing was
performed at the end of this project to help solidify accurate conclusions to help the company
make effective advertising campaigns.
---------------------------------
MODULE | Integrated Project 1
Concepts
ML | Hypothesis Testing | Data Visualization
Libs & Packs
pandas | numpy | seaborn | matplotlib | plotly.express | scipy.stats: pearsonr, ttest_ind
The intention with this project was to create, and deploy, a web application to a cloud service
that is accessible to the public. The application was initially created on my local computer,
with changes pushed to a GitHub Repo, and was deployed through Render. The application shows the
comparison between two car models by price and also by odometer reading.
(Link to Web App)
---------------------------------
MODULE | Software Dev Tools
Concepts
Command Line | Development Environments | Git & GitHub
Libs & Packs
pandas | streamlit | plotly.express | altair
The goal for this project was to determine which of the telecom operator Megaline's prepaid plans
brings in the most revenue, so that the company can adjust its advertising budget. After data
preprocessing, focus was on parsing out important datetime values for better analysis. The plans
were compared using Mean, Variance and Standard Deviation. Hypotheses were also tested to
determine the best plan for the company.
---------------------------------
MODULE | SDA
Concepts
Manipulating Data | Statistical Analysis | Hypothesis Testing | Data Visualization
Libs & Packs
pandas | numpy | seaborn | matplotlib | scipy
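The plan comparison in the Megaline project relies on three summary statistics. The sketch below computes them with Python's standard `statistics` module. It is illustrative only: the function name `summarize` and the revenue figures are assumptions, and the sample (n - 1) variance is used here, which may differ from what the project used.

```python
import statistics


def summarize(values):
    """Return mean, sample variance and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "std": statistics.stdev(values),
    }


# Hypothetical monthly revenue per user for two prepaid plans
plan_a_revenue = [20, 20, 35, 20, 50]
plan_b_revenue = [70, 70, 70, 85, 70]
plan_a_stats = summarize(plan_a_revenue)
plan_b_stats = summarize(plan_b_revenue)
```

Looking at variance alongside the mean shows whether a plan's higher average revenue comes from most users or from a few heavy spenders.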
The objective was to provide insight into the ordering habits of customers for Instacart to help
inform business decisions. After preprocessing, EDA was focused on exploring things such as the
top 20 items that customers purchase, what days of the week had the most orders placed, and
which items were reordered the most overall.
---------------------------------
MODULE | EDA
Concepts
EDA | Data Manipulation | Data Visualization
Libs & Packs
pandas | matplotlib