LUCIENDMENSAH

Projects

Data Science

Airbnb Data Analysis Project

This project is a refinement of a final exam from a programming class. The program reads Airbnb CSV files and processes host information, room information, prices, and room types. Through this project I gained experience with Pandas for data processing and Matplotlib for data visualization. A future goal is to add interactive visualizations.
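As an illustration of the kind of processing described above, here is a minimal standard-library sketch that averages listing prices per room type. It assumes hypothetical column names `room_type` and `price`, and prices stored as strings such as "$1,250.00"; the project's actual CSV layout is not shown here. In Pandas, the equivalent step would be `df.groupby("room_type")["price"].mean()` once the price column has been converted to numbers.

```python
import csv
import io
from collections import defaultdict


def parse_price(raw: str) -> float:
    """Convert a price string such as "$1,250.00" to a float (assumed format)."""
    return float(raw.replace("$", "").replace(",", ""))


def mean_price_by_room_type(csv_text: str) -> dict[str, float]:
    """Average price per room type from listings CSV text.

    Assumes hypothetical columns named 'room_type' and 'price'.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["price"]:
            continue  # skip listings with no price
        room_type = row["room_type"]
        totals[room_type] += parse_price(row["price"])
        counts[room_type] += 1
    return {rt: totals[rt] / counts[rt] for rt in totals}


# Tiny made-up sample, not real Airbnb data.
sample = """host_id,room_type,price
1,Entire home/apt,"$150.00"
2,Private room,"$60.00"
1,Entire home/apt,"$1,250.00"
3,Private room,
"""
print(mean_price_by_room_type(sample))
# → {'Entire home/apt': 700.0, 'Private room': 60.0}
```

The resulting dictionary maps directly onto a bar chart of average price by room type.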

Natural Language Processing

Trans* Twitter Analysis

This repository is a computational linguistics project exploring how Twitter users use language and neologisms to express gender freedom. The code for this project will support a research paper exploring these themes. I use regular expressions, Pandas, and Matplotlib to analyze the tweets and visualize the results. As the work continues, I hope to explore n-grams within the tweets and possibly use machine learning to gauge user sentiment.
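The sketch below shows one way regex term-matching and the planned n-gram exploration could look. The term list, the tokenizer, and the function names are hypothetical stand-ins, not the project's actual patterns. Lookarounds are used instead of `\b` so that a term ending in `*` (such as "trans*") is matched whole.

```python
import re
from collections import Counter

# Hypothetical terms to track; the project's real term list is not shown here.
TERM_PATTERN = re.compile(
    r"(?<!\w)(?:trans\*?|transgender|nonbinary|enby|genderfluid|agender)(?!\w)",
    re.IGNORECASE,
)
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9*']+")


def tokenize(tweet):
    """Lowercase, drop URLs and @mentions, and split into simple word tokens."""
    cleaned = URL_OR_MENTION.sub(" ", tweet.lower())
    return TOKEN.findall(cleaned)


def count_terms(tweets):
    """Count occurrences of the tracked terms across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(match.lower() for match in TERM_PATTERN.findall(tweet))
    return counts


def count_bigrams(tweets):
    """Count adjacent token pairs (bigrams) within each tweet."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up example tweets.
tweets = [
    "Proud to be trans* and nonbinary! https://example.com",
    "@friend enby joy is real, enby joy forever",
    "Transgender rights are human rights",
]
print(count_terms(tweets).most_common())
print(count_bigrams(tweets).most_common(3))
```

Counters like these can be loaded into a Pandas DataFrame for plotting term frequencies with Matplotlib.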

Machine Learning

Perceptron Learning Algorithm

This repo is my from-scratch implementation of the Perceptron Learning Algorithm (PLA). PLA is a classification algorithm: the project generates random data points, labels them with an established target function, and uses PLA to learn a predicted function that, ideally, converges with the target function. If the predicted function agrees with the target function on every point, it is correctly classifying each point as positive or negative. My next goal is to apply PLA to a real data science problem.
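The setup described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions (2-D points drawn uniformly from [-1, 1]², a random target line, a fixed random seed), not the code in the repo. Each update picks a misclassified point and applies w ← w + y·x; on linearly separable data this is guaranteed to stop once every training point is classified correctly.

```python
import random


def sign(value):
    """Classify as +1 or -1 (a score of exactly 0 is treated as -1)."""
    return 1 if value > 0 else -1


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def make_target(rng):
    """Random target line through two points in [-1, 1]^2, as weights (w0, w1, w2)."""
    x1, y1, x2, y2 = (rng.uniform(-1, 1) for _ in range(4))
    # (y2 - y1) * x - (x2 - x1) * y + (x2 * y1 - x1 * y2) = 0 passes through both points
    return (x2 * y1 - x1 * y2, y2 - y1, -(x2 - x1))


def make_data(n, target, rng):
    """n random points with a leading 1.0 bias coordinate, labeled by the target."""
    points = [(1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    labels = [sign(dot(target, p)) for p in points]
    return points, labels


def run_pla(points, labels, rng, max_updates=100_000):
    """Repeatedly pick a misclassified point and nudge the weights toward it."""
    w = [0.0, 0.0, 0.0]
    for updates in range(max_updates):
        misclassified = [i for i, (x, y) in enumerate(zip(points, labels))
                         if sign(dot(w, x)) != y]
        if not misclassified:
            return w, updates  # every training point now agrees with the target
        i = rng.choice(misclassified)
        w = [wi + labels[i] * xi for wi, xi in zip(w, points[i])]
    return w, max_updates  # stopped without converging


rng = random.Random(42)
target = make_target(rng)
points, labels = make_data(100, target, rng)
w, updates = run_pla(points, labels, rng)
print(f"stopped after {updates} updates; learned weights {w}")
```

Agreement on the training points does not mean the learned line equals the target line; estimating disagreement on fresh random points is a natural next check.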

Text Generation

This project uses an LSTM (long short-term memory) network to try to generate text in the style of Akan folktales.
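The LSTM itself depends on a deep learning framework, but the steps around it can be sketched with the standard library. The following sketch assumes a character-level model, which the description does not specify: it builds a character vocabulary, slices the text into (input window, next character) training pairs, and samples the next character from the model's output scores with a temperature. The sample sentence is a placeholder, not text from the project's corpus.

```python
import math
import random


def build_vocab(text):
    """Map each distinct character to an integer id (and back)."""
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}
    id_to_char = {i: c for c, i in char_to_id.items()}
    return char_to_id, id_to_char


def make_training_pairs(text, char_to_id, seq_len):
    """Slide a window over the text: each input is seq_len character ids,
    and the target is the id of the character that follows the window."""
    ids = [char_to_id[c] for c in text]
    inputs, targets = [], []
    for start in range(len(ids) - seq_len):
        inputs.append(ids[start:start + seq_len])
        targets.append(ids[start + seq_len])
    return inputs, targets


def sample_next_id(logits, temperature, rng):
    """Pick the next character id from the model's output scores.
    Lower temperature gives more conservative text; higher gives more variety."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


text = "Ananse went to the sky god."  # placeholder text
char_to_id, id_to_char = build_vocab(text)
inputs, targets = make_training_pairs(text, char_to_id, seq_len=10)
print(len(inputs), "training pairs")
```

During generation, the LSTM's scores for the last window would be passed to `sample_next_id`, the chosen character appended, and the window shifted forward by one.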

Florida Education Analysis

This project examines women's education rates in Pinellas, Pasco, and Hillsborough counties and practices machine learning by training a model to predict, for a woman in a given county and age range, the likelihood that she obtains a certain degree.
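The project's model type is not stated, so the sketch below swaps in logistic regression, one common choice for predicting a probability from categorical inputs. County and age range are one-hot encoded and the weights are fit by batch gradient descent on log loss. The age bins, the binary "has this degree" label, and the training rows are all hypothetical.

```python
import math

COUNTIES = ["Pinellas", "Pasco", "Hillsborough"]
AGE_RANGES = ["18-24", "25-34", "35-44", "45+"]  # hypothetical bins


def encode(county, age_range):
    """One-hot encode county and age range, with a leading bias term."""
    features = [1.0]
    features += [1.0 if county == c else 0.0 for c in COUNTIES]
    features += [1.0 if age_range == a else 0.0 for a in AGE_RANGES]
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, epochs=2000, lr=0.1):
    """Fit logistic regression by batch gradient descent on log loss.
    rows: list of (county, age_range, has_degree) with has_degree 0 or 1."""
    X = [encode(c, a) for c, a, _ in rows]
    y = [label for _, _, label in rows]
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += error * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, county, age_range):
    """Estimated probability that a woman in this group holds the degree."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, encode(county, age_range))))


# Made-up training rows, not real census data.
rows = (
    [("Pinellas", "25-34", 1)] * 7 + [("Pinellas", "25-34", 0)] * 3
    + [("Pasco", "18-24", 1)] * 2 + [("Pasco", "18-24", 0)] * 8
)
w = train_logistic(rows)
print(f"Pinellas, 25-34: {predict_proba(w, 'Pinellas', '25-34'):.2f}")
print(f"Pasco, 18-24: {predict_proba(w, 'Pasco', '18-24'):.2f}")
```

Predicting several degree levels at once would call for a multi-class model (for example, softmax regression) instead of this binary setup.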

Digital Humanities Projects

Newcomb Technology Website

As part of the Digital Research Internship (DRI) team's effort to improve its technical knowledge, I created a GitHub website showcasing all of our projects and team members to increase our online presence.

ViaNolaVie Article

This curated collection of articles recognizes the deep ties of African American Vernacular English (AAVE) and Black culture to the mainstream, and highlights and brings recognition to their linguistic influence within New Orleans.

Maternal Child Health

Birth Outcomes Mapping

A Tableau map visualization that aims to uncover the disparities in birth outcomes faced by Black birthing people in New Orleans.

Changing Representations

Mapping Project

A Tableau data visualization for the Changing Representations project that maps the rise of Latina politicians in the US over time.

Project Website

A website created as part of the DRI team's work for a digital project that maps the history of Latinas who have run for public office, served in government roles, or founded women-centered political organizations. The project aims to raise awareness of the history of Latinas in American politics.

Publications

Twi Research Papers

Contextual Text Embeddings for Twi

Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese, and Russian. However, this technology does not yet exist for any Ghanaian language. In this paper, we introduce the first such models for Twi, or Akan, the most widely spoken Ghanaian language. The specific contribution of this research work is the development of several pretrained transformer language models for the Akuapem and Asante dialects of Twi, paving the way for advances in application areas such as Named Entity Recognition (NER), Neural Machine Translation (NMT), Sentiment Analysis (SA), and Part-of-Speech (POS) tagging. Specifically, we introduce four different flavours of ABENA (A BERT model Now in Akan), which is fine-tuned on a set of Akan corpora, and BAKO (BERT with Akan Knowledge only), which is trained from scratch. We open-source the models through the Hugging Face model hub and demonstrate their use via a simple sentiment classification example.

English-Twi Parallel Corpus for Machine Translation

We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher-quality crowd-sourced sentences are provided for use as an evaluation set for downstream Natural Language Processing (NLP) tasks. The typical use case for the larger human-verified dataset is further training of machine translation models in Akuapem Twi. The higher-quality set of 697 crowd-sourced sentences is recommended as a test set for English-to-Twi and Twi-to-English machine translation models. Furthermore, the Twi part of the crowd-sourced data may also be used for other tasks, such as representation learning and classification. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowd-sourced test set.
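Parallel corpora are commonly distributed as two line-aligned files, one sentence per line. Assuming that layout (the paper's exact file format and file names are not given here), a minimal loader might pair the lines and fail loudly on misalignment. UTF-8 decoding matters because Twi uses characters such as ɛ and ɔ.

```python
from pathlib import Path


def read_lines(path):
    """Read a UTF-8 text file into a list of lines."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def load_parallel(english_lines, twi_lines):
    """Zip two line-aligned lists into (English, Twi) pairs, skipping empty pairs."""
    if len(english_lines) != len(twi_lines):
        raise ValueError(
            f"misaligned corpus: {len(english_lines)} English vs "
            f"{len(twi_lines)} Twi lines"
        )
    pairs = []
    for en, tw in zip(english_lines, twi_lines):
        en, tw = en.strip(), tw.strip()
        if en and tw:
            pairs.append((en, tw))
    return pairs


# Hypothetical file names; replace with the corpus's actual files:
# train_pairs = load_parallel(read_lines("train.en"), read_lines("train.tw"))
print(load_parallel(["Good morning.", "Thank you."], ["Maakye.", "Medaase."]))
```

Keeping the training and crowd-sourced test files separate in this way avoids leaking evaluation sentences into fine-tuning.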

NLP for Ghanaian Languages

NLP Ghana is an open-source non-profit organization that aims to advance the development and adoption of state-of-the-art NLP techniques and digital language tools for Ghanaian languages and problems. In this paper, we first present the motivation for and necessity of the organization's efforts by introducing some popular Ghanaian languages and presenting the state of NLP in Ghana. We then present the NLP Ghana organization and outline its aims, scope of work, some of the methods employed, and the contributions made thus far to the NLP community in Ghana.

Queer Linguistics

Stay tuned! These are coming soon!!