Projects | Sonalisingh

project highlights

Please see GitHub for detailed code.

social media network plot

Generated an artificial social network and created a community membership matrix, connected probability matrix, adjacency matrix, and network plot to model connections. For the network plot, analyzed expected and observed number of members for each sub-community.

Language: Python

Libraries: seaborn, matplotlib, numpy, networkx

relating meat consumption and CO2 emissions

Designed a linear regression for predicting a country's CO2 emissions given the percentage of meat consumption. Implemented k-means clustering to group countries based on interpretable ranges of CO2 emissions. Incorporated interactive functionality within ggplot to highlight top-GDP countries for added dimensionality.

Language: R

Libraries: ggplot

cost + schedule risk dashboard

Aggregated 42 discrete multi-million-dollar cost models into a cost summary database to track and visualize summaries for internal and client use. Data was used as the foundation for 3+ long term client deliverables (12+ months) and used to support a cost summary dashboard for client and stakeholder use.

Language: PowerBI, Excel

who knows who?

Created an adjacency matrix for a sample social network in order to identify closest neighbors via Floyd Warshall algorithm. Further analyzed common neighbors and similarity among members via Jaccard similarity scores and Adamic Adar indices. Visualized findings via heatmaps.

Language: Python

Libraries: pandas, matplotlib, numpy, scipy, sklearn, networkx

salinity tolerance in common biofuels

Analyzed 30,000+ genes in P. hallii, a lab-friendly alternative to the common biofuel Switchgrass, to test for resilience to extreme conditions, specifically high salinity tolerance. Separated primer sequences within P. hallii DNA and filtered low quality data based on visualizations for each sample. Findings used to support a graduate thesis.

Language: Python, R

Libraries: ggplot

beating wikiracer

Designed an algorithm to find the shortest path between two pages in the Wikipedia using DFS, BFS, Dijkstra's, and A*.

Language: Python

Libraries: pandas, urllib, collections

boggle

Designed a randomly generated boggle game board with an interactive user interface and a correct backend answer key. Implemented BFS and DFS search methods, hash mapping, and trie dictionaries to find all valid words on each unique boggle game board to support the correct backend answer key.

Language: Python

Libraries: collections, numpy

how does a city's size affect poverty?

Performed a PCA of an external dataset across 2 components. Plotted PC1 and PC2 and checked for outliers. Incorporated interactive functionality with ggplot to highlight counties and enhance interpretability within clustered data.

Language: R

Libraries: ggplot, colorspace

movie review sentiment classifier

Designed a neural sentiment classifier that trained a model of 6000+ movie reviews in order to predict positive or negative sentiment for an unlabeled test set. Used both perceptron and logistic regression for training, as well as unigram, bigram, and n-gram for bag of words vectors.

Language: Python

Libraries: torch, numpy, nltk

mario kart image classifier

Designed an convolution neural net to classify Mario Kart screenshots into 6 classes based on a training set of 21,000 images. Logged training loss at every iteration and the training and validation accuracy at each epoch in order to optimize hyperparameters (e.g., type of loss function, learning rate, number of layers, d_model size).

Language: Python

Libraries: torch, numpy

transformer language model

Designed a transformer, both from scratch and using the torch.nn built in transformer, that predicted the next letter of a phrase given some/no context. Optimizations to enhance performance included adding batching and fine tuning hyperparameters (e.g., type of loss function, learning rate, number of heads in multi-head attention, number of layers, d_model size).

Language: Python

Libraries: torch, numpy

who wrote this blog?

Classified a blog’s political orientation based on its connection within a blog citation network. Compared various clustering techniques, including k-means, hierarchical, mean shifting, and spectral (ultimate choice).

Language: Python

Libraries: pandas, urllib, collections