Soumya Ogoti - Project Portfolio

Large Language Models (LLMs)

Applied Research Project on Using Large Language Models for MND Assistive Devices

Python

Alpaca

LangChain

Streamlit

This research introduces a novel communication interface using Large Language Models (LLMs) for Motor Neurone Disease (MND) patients, allowing selection of responses in multiple tones. A lightweight browser-based chat interface facilitates seamless interaction. Evaluation of LLMs considered parameters, training data, and hardware requirements, with prompt engineering for MND-specific criteria. A user study assessed satisfaction and effectiveness, with quantitative evaluation against BERT. Among models evaluated, GPT-3.5 Turbo with memory is identified as superior.

Find Out More

Prediction Analytics

Attrition Forecasting for Merger and Acquisition

Python

Excel

Pandas

Sklearn

Matplotlib

Identified the factors that influence severance acceptance and estimated each employee's probability of accepting using regression models. Objectively categorised employees into groups for severance offers, without discrimination. Used linear programming to choose the groups to offer severance to, with the objective of minimizing company costs while preventing a mass exodus and maintaining stable employee proportions.

Find Out More

Data Visualisation: Designing a Deck in Tableau

Exploring Profitable Property Investment in London: A Data-driven Analysis

Python

Pandas

Tableau

Identified profitable boroughs in London for property investment and Airbnb rental, using data from three sources: Airbnb listings, historical housing prices, and council tax records. Tableau visualizations, including bar charts, heatmaps, time series plots, and lollipop charts, were created to explore various aspects of the data. Key insights revealed the City of London, Westminster, and Greenwich as the most profitable boroughs for investment.

Find Out More

Data Visualisation: Designing Impactful Charts

Exploring Pay Equity in an Organization: A Visual Analysis

Python

Pandas

Matplotlib

Examined pay equity within an organization using self-designed custom lollipop charts. Design considerations, including color-coding and scaled tips, were implemented to visualize multiple interacting parameters together in a single chart, such as salary comparisons based on gender, race, and ethnicity. The visualization offered insights into unbiased pay practices across job positions, supported by regression analysis confirming equitable salary distributions.

Find Out More

Database Design, Descriptive Statistics, and Insights using PostgreSQL

Bug Data Analysis for the Mozilla Project

Python

Matplotlib

PostgreSQL

Psycopg2

SciPy

Using PostgreSQL and Python, bug data was structured into a database, including tables for bug reports, users, change history, customer fields, flags, and comments. Descriptive statistical analysis was conducted through psycopg2 to gain insights into bug metrics such as bug distribution by severity and priority, resolution time, user engagement, and bug dependencies. The findings were used to provide recommendations for bug management and customer support strategies.
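As a rough illustration of the kind of descriptive query described above, the sketch below counts bugs by severity and averages their resolution time. It is not the project's actual schema or code: the `bug_reports` table, its columns, and the sample rows are hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL with psycopg2 so that the example runs without a database server.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified bug report table; days are integer offsets.
conn.execute(
    """
    CREATE TABLE bug_reports (
        bug_id       INTEGER PRIMARY KEY,
        severity     TEXT NOT NULL,
        opened_day   INTEGER NOT NULL,
        resolved_day INTEGER          -- NULL while the bug is still open
    )
    """
)

# Made-up sample rows. psycopg2 uses %s placeholders where sqlite3 uses ?.
sample_rows = [
    (1, "critical", 0, 2),
    (2, "major", 1, 6),
    (3, "major", 2, 5),
    (4, "minor", 3, None),
    (5, "minor", 4, 14),
]
conn.executemany("INSERT INTO bug_reports VALUES (?, ?, ?, ?)", sample_rows)

# Bug count and mean resolution time per severity.
# AVG skips NULLs, so unresolved bugs count as bugs but not towards the mean.
query = """
    SELECT severity,
           COUNT(*) AS bug_count,
           AVG(resolved_day - opened_day) AS avg_days_to_resolve
    FROM bug_reports
    GROUP BY severity
    ORDER BY bug_count DESC, severity
"""
summary = conn.execute(query).fetchall()

for severity, bug_count, avg_days in summary:
    print(f"{severity:<10} bugs={bug_count} avg_days_to_resolve={avg_days:.1f}")
```

The same `GROUP BY` pattern extends to priority, component, or reporter by swapping the grouping column.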

Find Out More

Deep Learning for Sequence Prediction

Predicting Wind Turbine Operating Modes

Python

Pandas

Matplotlib

TensorFlow

Predicted wind turbine operating modes from time series sensor data. Sequences of sensor data were analysed using dense, Conv1D, simple RNN, and GRU networks. An alternative approach, in which the data was transformed into images and fed into 2D CNNs following Rahimilarki et al. (2022), was also explored. The best-performing model, derived from Rahimilarki et al. and enhanced with additional CNN layers, fine-tuning, batch normalization, dropout, and learning rate scheduling, achieved the highest accuracy, 87.3%, on the test dataset.
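One simple way to turn multichannel sensor sequences into image-like inputs for a 2D CNN is sketched below. It may differ from the transformation used in Rahimilarki et al. (2022) and in this project: the function name `windows_to_images`, the sliding-window scheme, and the min-max scaling are all assumptions made for illustration.

```python
def windows_to_images(readings, window, stride):
    """Slice a multichannel time series into 2D arrays (time x sensor).

    readings: list of time steps, each a list with one value per sensor.
    Each value is min-max scaled per sensor to [0, 1], like a pixel intensity.
    """
    n_sensors = len(readings[0])
    # Per-sensor range over the whole series. In practice the range would be
    # computed on the training split only, to avoid leaking test information.
    lows = [min(step[s] for step in readings) for s in range(n_sensors)]
    highs = [max(step[s] for step in readings) for s in range(n_sensors)]

    def scale(value, sensor):
        span = highs[sensor] - lows[sensor]
        return 0.0 if span == 0 else (value - lows[sensor]) / span

    images = []
    for start in range(0, len(readings) - window + 1, stride):
        chunk = readings[start:start + window]
        images.append([[scale(step[s], s) for s in range(n_sensors)] for step in chunk])
    return images


# Made-up readings from two sensors over five time steps.
readings = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0], [4.0, 50.0]]
images = windows_to_images(readings, window=3, stride=2)
print(len(images), "images of shape", len(images[0]), "x", len(images[0][0]))
```

Each resulting array has shape window x sensors and would need a trailing channel dimension added before being passed to a 2D convolutional layer.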

Find Out More

Natural Language Processing: Text Classification

Analyzing BeerAdvocate Reviews

Python

Pandas

Matplotlib

Seaborn

Transformers

Classified user reviews on BeerAdvocate using natural language processing (NLP) techniques. A comprehensive analysis was performed of several domain-specific features, such as TF-IDF, LDA, and Doc2Vec, in combination with classifiers such as Multinomial Naive Bayes, Random Forests, OneVsOne, and SVMs. Deep learning models such as Bidirectional LSTM and BERT with learnt tokenization and embeddings were also analysed. The BERT model with smart padding emerged as the top performer, showing that it generalizes well to a specialised domain such as beer reviews.
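The description does not define "smart padding". One common reading, assumed here, is dynamic padding with length-sorted batches: reviews of similar length are batched together and each batch is padded only to its own longest sequence rather than to a global maximum. The sketch below illustrates that idea in plain Python; the function name `smart_batches` and the token IDs are hypothetical, and this is not the project's code.

```python
def smart_batches(sequences, batch_size, pad_id=0):
    """Group token-ID sequences by length and pad each batch to its own maximum."""
    # Indices of the sequences, shortest first, so neighbours have similar length.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        indices = order[start:start + batch_size]
        longest = max(len(sequences[i]) for i in indices)
        input_ids, attention_mask = [], []
        for i in indices:
            padding = longest - len(sequences[i])
            input_ids.append(sequences[i] + [pad_id] * padding)
            # 1 marks a real token, 0 marks padding.
            attention_mask.append([1] * len(sequences[i]) + [0] * padding)
        batches.append(
            {"indices": indices, "input_ids": input_ids, "attention_mask": attention_mask}
        )
    return batches


# Made-up token IDs for four reviews of different lengths.
sequences = [
    [101, 7, 8, 9, 102],
    [101, 5, 102],
    [101, 102],
    [101, 4, 6, 2, 3, 1, 102],
]
batches = smart_batches(sequences, batch_size=2)
total_padding = sum(row.count(0) for batch in batches for row in batch["attention_mask"])
print("padding tokens used:", total_padding)
```

On this toy input, padding every sequence to the global maximum length of 7 would add 11 padding tokens, while per-batch padding adds 3. The `indices` field lets predictions be mapped back to the original review order.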

Find Out More

Exploratory Data Analysis and Visualisation

Wine Market Competitor Analysis

Python

Selenium

Pandas

Matplotlib

Seaborn

Explored the wine market through competitor websites, centering on wine attributes (type, origin, vintage, ABV), pricing (per 75 cl bottle), and reviews (quantity and scores), aiming to uncover popular products and price ranges. Data collection was done using BeautifulSoup and Selenium, while exploratory data analysis and visualisation were performed using Matplotlib and Seaborn.

Find Out More

Predictive Analytics for Risk Management

Forecasting Credit Card Default

Python

Pandas

Matplotlib

Seaborn

Predicted credit card default likelihood for a bank's customers and determined key drivers for credit approval decisions. Developed an MVP with logistic regression to establish a baseline. Addressed data complexities using extensive EDA, feature engineering, and class-balanced sampling. Optimized model performance using hyperparameter tuning and Youden's J statistic, selecting the best model based on ROC-AUC.
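Youden's J statistic is defined as J = sensitivity + specificity - 1, which equals the true positive rate minus the false positive rate. A common use, assumed here, is picking the classification threshold that maximises J along the ROC curve. The sketch below shows that calculation in plain Python on made-up labels and scores; the function name `youden_threshold` is hypothetical and this is not the project's code.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) for the score cut-off that maximises J = TPR - FPR."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    # Every distinct score is a candidate cut-off: predict 1 when score >= cut-off.
    for threshold in sorted(set(y_score)):
        true_pos = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        false_pos = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
        j = true_pos / positives - false_pos / negatives
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Made-up labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.90]
threshold, j = youden_threshold(y_true, y_score)
print(f"threshold={threshold:.2f}, J={j:.2f}")
```

Because J weights sensitivity and specificity equally, it suits cases where a missed default and a false alarm are treated as equally costly; a cost-weighted criterion would be needed otherwise.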

Find Out More

Spatio-Econometric Methods and Machine Learning Models

Predictive Analysis of Airbnb Prices in European Cities

Python

Pandas

TensorFlow

Sklearn

Conducted a comprehensive examination of Airbnb prices in popular European cities, using spatio-econometric methods to predict listing prices from various attributes. After data cleaning and initial EDA, multiple machine learning models, including Decision Trees, Random Forest, and XGBoost, were used to predict pricing. Neural networks such as Multi-Layer Perceptrons and Autoencoders were also employed. In a comparative analysis, the XGBoost model with feature selection emerged as the top performer, offering valuable insights for pricing strategies and investment decisions for Airbnb hosts.

Find Out More

Strategic Business Analytics: Data-Driven Decision Making

Consulting Report: Addressing VFS Global's Key Challenges

This consulting report outlines data-driven solutions to tackle VFS Global's key challenges: visa application processing delays, data security, and customer service issues. Through strategic analysis and predictive modeling, solutions were proposed, including optimization of processing times, implementation of anomaly detection for security, and AI-driven chatbots for enhanced customer support. A comprehensive innovation roadmap was provided to guide VFS Global in implementing these solutions.

Find Out More

Contact me on LinkedIn