Student Projects
CMDA Student Capstone Projects
In the Computational Modeling and Data Analytics (CMDA) Capstone Project course at Virginia Tech, teams of three or four students spend the semester tackling an open-ended, client-driven project. In addition to the technical aspects of the project, students are mentored by CMDA Faculty in teamwork, project management, and technical leadership. Through the lens of their particular projects, the teams also consider the ethical aspects of data science and mathematical modeling.
Read about the teams' investigations below!
- What are the genes responsible for the defense against the Bacterial Leaf Streak pathogen?
- Analytic methods: RNA-sequencing, normalization, pairwise comparisons
- Visual analytics: heat maps
Team's Story
Our project aimed to examine the specific genes responsible for the defense of a tobacco plant with an R (defense) gene when signaled by an effector gene in the pathogen. Our clients and their team set up an experiment with three groups (Baseline, T, and NT) to single out specific genes expressed in the T group. We used RNA sequencing to process and clean the mRNA samples, align them to the tobacco plant reference genome, and count the occurrences of every gene in each sample. Then, through statistical analyses (including normalization and pairwise comparisons), we found the significantly differentially expressed genes and created heat maps and volcano plots to visualize differences in gene expression levels between treatments.
At first, our team felt fairly overwhelmed by the amount of new information we would have to learn. None of us had any experience in biology or had ever taken a class in it, so we went in expecting this project to be constant reading and researching. Luckily, our sponsor Dr. Zhang helped us a lot by teaching us the basics we would need for our project, giving us presentations, talking us through our many questions, and pointing us to other resources, such as articles, to build a better understanding.
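As a rough illustration of the normalization and pairwise comparison steps described above, the following Python sketch scales raw gene counts to counts per million (CPM) and computes a log2 fold change between two samples. It is not the team's pipeline: CPM, the pseudocount, the gene names, and the counts are all assumptions chosen for illustration, and a real analysis would use replicates and a statistical test to decide which genes are significantly differentially expressed.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw gene counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2 ratio of normalized expression (treated vs. control).

    The pseudocount (an assumption here) avoids division by zero for
    genes with no reads in one of the samples.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
    }


# Hypothetical raw counts for three genes in one T and one NT sample.
t_counts = {"geneA": 800, "geneB": 100, "geneC": 100}
nt_counts = {"geneA": 200, "geneB": 900, "geneC": 900}

t_cpm = counts_per_million(t_counts)
nt_cpm = counts_per_million(nt_counts)
lfc = log2_fold_change(t_cpm, nt_cpm)

for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2 fold change = {value:.2f}")
```

Normalizing first matters because the two samples have different total read counts; comparing raw counts would understate how much more strongly geneA is expressed in the T sample.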
Project Team Members:
- Alex Vidal, B.S. in Computational Modeling and Data Analytics, May 2024
- Yelebe Desta, B.S. in Computational Modeling and Data Analytics, May 2024
- Ian Sekelesky, B.S. in Computational Modeling and Data Analytics, May 2024
CBHDS Sponsors:
Learning the tools was also difficult. Getting accustomed to the Linux OS in the VT ARC system, learning how to write bash scripts, and understanding how high-performance computing works was hard but fun. We gained a lot of knowledge and experience in the world of bioinformatics, and even though we probably won’t be going into the field directly, it gave us a lot of confidence to know that we were able to learn, understand, and produce results in a field we previously thought was so alien and confusing. Again, we are super thankful to Dr. Zhang and Tanner Barbour for helping us throughout the project. We couldn’t have done it without y'all.
- Is there a pattern to the spread of Salmonella?
- Analytic methods: comparative genomic analysis
- Visual analytics: robust phylogenetic trees
Team's Story
The main objectives of our project were to retrieve genomic data for over 100 strains of Salmonella enterica and use bioinformatic tools to extract and compare their gene families. We performed comparative genomic analysis on these genomes to identify genetic similarities and differences to determine evolutionary relationships. All of this work led to the creation of a robust phylogenetic tree visualization that was able to show common ancestors of the genomes as well as geographic transmission patterns. This tree gives insight into evolutionary trends of Salmonella and gives a clear visualization of how our collected genomes were related to one another.
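To make the idea of comparing gene families across genomes concrete, here is a minimal Python sketch that computes pairwise Jaccard distances between strains from the sets of gene families each one carries, then finds the closest pair. Distance-based tree-building methods start from a matrix like this. The strain names, gene family identifiers, and the choice of Jaccard distance are all assumptions for illustration; the source does not say which tools or distance measures the team used.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A intersect B| / |A union B| for two sets of gene families."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical gene-family content of three strains.
gene_families = {
    "strain1": {"fam1", "fam2", "fam3", "fam4"},
    "strain2": {"fam1", "fam2", "fam3", "fam5"},
    "strain3": {"fam1", "fam6", "fam7", "fam8"},
}

# Pairwise distance for every unordered pair of strains.
distances = {
    (s, t): jaccard_distance(gene_families[s], gene_families[t])
    for s, t in combinations(sorted(gene_families), 2)
}

# The closest pair is what an agglomerative tree-building method
# would join first.
closest_pair = min(distances, key=distances.get)

for pair, d in distances.items():
    print(pair, round(d, 3))
print("closest pair:", closest_pair)
```

Strains that share more gene families end up closer together, which is the intuition behind reading relatedness off the final tree.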
Our team was initially hesitant to approach this project after reading its description and seeing research methods that we had never heard of. However, in our first Zoom meeting with the CBHDS, we were walked through an overview of the project and learned many of the relevant biology terms. We immediately became interested in the work ahead of us because we knew the process and the finish line.
Project Team Members:
- Judson Powers, B.S. in Computational Modeling and Data Analytics, May 2024
- Nicholas Emig, B.S. in Computational Modeling and Data Analytics, May 2024
- Siddarth Ravikanti, B.S. in Computational Modeling and Data Analytics, May 2024
CBHDS Sponsors:
It was frustrating when we failed at using the bioinformatic tools, but it was rewarding when we were able to run them correctly with help from Dr. Zhang. Finally, seeing our results and the final phylogenetic tree gave us a unique insight into the concept of evolution, as we were able to see specifically how all our collected strains of Salmonella were related to each other, and how certain strains evolved over time to create new, unique strains.
Working on a bioinformatics project was a first for the CMDA Capstone program, and we are very glad we had the chance to be the trailblazers for this unique project and experience.
- Can we better understand the dominating preference for ultra-processed foods over minimally processed foods in America?
- Analytic methods: Linear mixed effects modeling
Team's Story
Working as a group was tough at times, but with help from one another and the support of Dr. Ahrens and Ms. Lozano, we were able to work through it. The experience was something that we had never had in a normal classroom setting. The project allowed us to get a taste of what large-scale projects are like in the real world. In the end, although there was uncertainty at times, we were proud of our final product, and the hard work made it that much sweeter.
Project Team Members:
- Jacob Parker, B.S. in Computational Modeling and Data Analytics
- Sabrina Hart, B.S. in Computational Modeling and Data Analytics
- Aaron Ni, B.S. in Computer Science and B.S. in Computational Modeling and Data Analytics
- Anish Monokonda, B.S. in Computational Modeling and Data Analytics
CBHDS Sponsors:
- Can we find demographic and food perception factors that predict overall food preference across respondents of a national survey? Additionally, do people prefer ultra-processed foods over minimally processed foods overall?
- Analytic methods: Linear mixed effects modeling
- This team was recognized by Mr. Brian Sanchez and Ms. Nancy Schuessler with an honorary sponsorship of their project.
Team's Story
Being a part of a new study, Big Byte Analytics had the opportunity to tackle the poor diet issue caused by ultra-processed foods (UPFs) in the United States. We were interested in finding what factors lead to increased UPF intake using survey data and nutrition information data. Over the course of the project, we have faced significant delays in data collection and we unfortunately were unable to conduct analysis on the true nutrition information of the foods featured in the survey. We hope that future teams can use this data along with our cleaned data and results to answer our original research question.
Regardless of our setbacks, we have found some meaningful results that we believe will serve as a great first step towards finding a solution to the poor diet problem in the United States. We have found that an individual’s age and perception of food (such as how healthy they think the food is and its perceived calorie count) have a significant influence on food preference. We have also found that most people do not have a particular preference between UPFs and minimally processed foods, and concluded that high UPF intake could also be strongly influenced by factors such as low cost, accessibility, advertising, and lack of health and food knowledge.
Project Team Members:
- Laura Nury, B.S. in Computational Modeling and Data Analytics (CryptoCyber Option, Minor in Mathematics), May 2023
- Rithvik Guntor, B.S. in Computational Modeling and Data Analytics (Minor: Mathematics and Computer Science), December 2022
- Renny Adjei, B.S. in Computational Modeling and Data Analytics (Minor: Statistics and Mathematics), May 2023
CBHDS Sponsors:
- Can we build on the work done by team Diet Code and expand the predictive modeling of cleaned NHANES data to additional metabolic outcomes?
- Analytic methods: multiple imputation through chained equations, stepwise linear and logistic regression, random forests, neural networks.
- This team was recognized by Mr. & Mrs. Mark and Nancy Scheffel with an honorary sponsorship of their project.
Team's Story
Due to the scope of organizing and cleaning the NHANES data, team Diet Code from Spring 2021 was limited in their ability to analyze their data and model diabetes. Building on that work, team Diet Science used the cleaned NHANES data to predict the presence of hypertension and obesity, and to model the amount of LDL cholesterol, a known risk factor for a number of metabolic diseases.
The team first resolved the problem of missing data by assuming that the data were missing at random and then using multiple imputation by chained equations. After imputation, they modeled their outcomes using a combination of stepwise regression, random forests, and neural networks. Of note, the team was able to greatly improve the performance of their initial neural networks by iteratively expanding and refining those models, a process requiring much determination and skill.
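The chained-equations idea mentioned above can be sketched in a few lines of Python. The example below is a simplified, deterministic illustration on two variables: missing values start at the column mean, and each variable is then repeatedly re-imputed from a linear regression on the other. It is an assumption-laden sketch, not the team's implementation. True multiple imputation would add random draws to each prediction and repeat the whole procedure to produce several completed data sets, and the data values here are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a + b * xs; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def chained_imputation(rows, n_iter=10):
    """Fill None entries in two-column rows by alternating regressions."""
    n_cols = 2
    missing = [[row[j] is None for j in range(n_cols)] for row in rows]
    filled = [list(row) for row in rows]

    # Step 1: initialize every missing entry with its column's observed mean.
    for j in range(n_cols):
        col_mean = mean(row[j] for row in rows if row[j] is not None)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = col_mean

    # Step 2: cycle through the columns, regressing each one on the other
    # using rows where the target was observed, then re-predicting the
    # entries that were originally missing.
    for _ in range(n_iter):
        for target in range(n_cols):
            predictor = 1 - target
            observed = [r for i, r in enumerate(filled) if not missing[i][target]]
            a, b = fit_line(
                [r[predictor] for r in observed],
                [r[target] for r in observed],
            )
            for i, r in enumerate(filled):
                if missing[i][target]:
                    r[target] = a + b * r[predictor]
    return filled


# Hypothetical data with one missing value in each column.
data = [(1.0, 3.1), (2.0, None), (3.0, 7.2), (None, 8.9), (5.0, 11.0)]
completed = chained_imputation(data)
for row in completed:
    print([round(v, 2) for v in row])
```

Observed values are never overwritten; only the entries that were originally missing are updated on each pass.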
Project Team Members:
- Colin Brant, B.S. in Computational Modeling and Data Analytics, May 2022
- Thomas Stapor, B.S. in Computational Modeling and Data Analytics, May 2022
- Visvas Kaja, B.S. in Computational Modeling and Data Analytics, May 2022
CBHDS Sponsors:
- Ian Crandell
- Alexandra Hanlon
- Can we harmonize and compile several years of NHANES data and use it to predict diabetes?
- Approaches: in-depth study of multiple years of messy survey data, manual harmonization and data manipulation
- Analytic methods: stepwise logistic regression
Team's Story
This project was somewhat different from traditional CMDA projects as its main objective was to tackle a problem which, while common, is not often discussed in class. Typically, a student begins their analytic process with a clean data set to which they can immediately apply whatever summary or analytic method they choose. This is rare in practice, which more commonly begins with a very disorganized collection of data that needs to be transformed into a usable form. For instance, some variable names correspond to different questions in different years, and the only way to resolve these discrepancies is with meticulous study.
After this impressive step, the data was still plagued with issues such as missing data, multicollinearity, and potential sampling bias. The team resolved these via complete case analysis and stepwise logistic regression, and were able to predict diabetes in the sample with a high level of accuracy. This project taught the team the value of careful study and exposed them to an often underappreciated dimension of data analysis.
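The two data-preparation ideas in this story, harmonizing variable names across survey years and complete case analysis, can be illustrated with a short Python sketch. The variable names, year labels, and records below are hypothetical and are not taken from NHANES; the sketch only shows the general pattern of a per-year name mapping followed by dropping records with missing values.

```python
# Hypothetical per-year mapping from raw variable names to harmonized names.
# Note that "Q2" refers to a different question in each year.
NAME_MAP = {
    2015: {"Q1": "age", "Q2": "diabetes"},
    2017: {"Q1": "age", "Q2": "smoker", "Q3": "diabetes"},
}


def harmonize(record, year):
    """Rename a raw record's variables using that year's mapping.

    Variables without an entry in the mapping are dropped.
    """
    mapping = NAME_MAP[year]
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def complete_cases(records, variables):
    """Keep only records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical raw records tagged with their survey year.
raw = [
    (2015, {"Q1": 54, "Q2": 1}),
    (2017, {"Q1": 61, "Q2": 0, "Q3": 1}),
    (2017, {"Q1": None, "Q2": 1, "Q3": 0}),  # age is missing
]

harmonized = [harmonize(record, year) for year, record in raw]
analysis_set = complete_cases(harmonized, ["age", "diabetes"])

print(harmonized)
print(len(analysis_set), "complete cases")
```

Keeping the mapping in one explicit table makes each year-specific decision easy to review, which matters when the same raw name means different things in different years.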
Project Team Members:
- Lauren Bradley, B.S. in Computational Modeling and Data Analytics, May 2022
- Evan Briscoe, B.S. in Computational Modeling and Data Analytics, December 2021
- Jake Lavitt, B.S. in Computational Modeling and Data Analytics, December 2021
CBHDS Sponsors:
- Ian Crandell
- Xin Xing
- Alexandra Hanlon
- Our project objectives were to study the impact of COVID-19 on the Virginia real estate market and explore inter- and intra-county mobility trends. We were able to answer the following research question: How did COVID-19 impact home values and mobility trends throughout Virginia?
- Analytic methods: Linear Regression and Spatial Regression
- Visual Analytics: An interactive application with Choropleth Maps and Time Series Plots
Team's Story
This project started off very open ended, and we were responsible for coming up with our own research question. With the help of our client and coach, we decided to focus on home values, COVID-19 cases, and mobility trends in Virginia at the county level. From there, our group worked together to produce a final product, while meeting with our client and coach on a weekly basis to brainstorm ideas and go over our progress.
Although we had experience working with data in the past, we ran into some novel concepts such as spatial regression and creating dashboards that took some time to get accustomed to. This project was one of our first experiences working on a team for an extended period of time, and it helped us better understand the importance of team dynamics and time management. Overall, working with CBHDS was a great experience, and we learned a lot throughout the process.
Project Team Members:
- Yohannes Afework, B.S. in Computational Modeling and Data Analytics, May 2021
- Devon Lee, B.S. in Computational Modeling and Data Analytics, May 2021
- Jaffar Shaik, B.S. in Computational Modeling and Data Analytics, May 2021
- Naod Teklie, B.S. in Computational Modeling and Data Analytics, May 2021
CBHDS Sponsors:
- Ian Crandell
- Alicia Lozano
- Alexandra Hanlon
- How has COVID-mandated social distancing affected United States mental health outcomes at the State level?
- Visual analytics: an interactive application to demonstrate the change in Mental Health Scores week by week during the pandemic
Team's Story
Over the course of the semester, Team New Horizons has been working with CBHDS to study the effects of social distancing on mental health for their Computational Modeling and Data Analytics (CMDA) capstone project. The team consists of Jeff Straw, Demory Williamson, Xumanning Luo, and Bella Marku, all of whom are seniors in CMDA. Their primary motivation for choosing the topic was the opportunity to learn more about how the pandemic has affected people emotionally, not just physically.
The project incorporated both mobility data from Google and mental health data from the CDC to determine how mental health symptoms changed in relation to the amount of time spent at home, as compared to a baseline from before the pandemic began. To display their results, the team developed an interactive web dashboard where users can click on states to obtain that state’s specific mental health results since the start of the pandemic. The biggest challenge the team faced was finding publicly available data, especially since the pandemic is still ongoing, but the datasets they incorporated allowed them to create a comprehensive dashboard that provides significant information for its users.
Project Team Members:
- Xumanning Luo, B.S. in Computational Modeling and Data Analytics, May 2021
- Bella Marku, B.S. in Computational Modeling and Data Analytics, May 2021
- Jeff Straw, B.S. in Computational Modeling and Data Analytics, May 2021
- Demory Williamson, B.S. in Computational Modeling and Data Analytics, May 2021
CBHDS Sponsors:
- Ian Crandell
- Kevin McKee
- Alexandra Hanlon
- How do remediation measures affect species spread?
- Analytic methods: generalized linear models and classification trees
- Visual analytics: an interactive application to demonstrate the effect of various parameters and conditions
Team's Story
Our CMDA capstone project began with the unexpected challenge of working around COVID-19, but ultimately the project provided us with an entirely unique and informative experience. Extensive collaboration on such a large project was difficult to carry out solely over the internet. However, with the guidance of Dr. Alexandra Hanlon and Jennifer West, we became familiar with working efficiently online. Dr. Hanlon and Jennifer were extremely helpful in advising us and providing us with the resources to succeed. We couldn’t have done this without them!
As unusual as this semester has been, it went by extremely fast. Our entire team feels like we only just became acquainted with the CBHDS team. This seemingly short experience has given us valuable skills not only in data science but, even more so, in teamwork and communication. It was exciting to be able to apply what we have learned in the classroom to a real-world problem, and a privilege to do so under the guidance of our mentors.
Project Team Members:
- Evan Mitchell, B.S. in Computational Modeling and Data Analytics, May 2021
- Colton Mumley, B.S. in Computational Modeling and Data Analytics, December 2020
- Akshay Patel, B.S. in Computational Modeling and Data Analytics, May 2021
CBHDS Sponsors:
- Jennifer West
- Alexandra Hanlon