CMDA Student Capstone Projects
In the Computational Modeling and Data Analytics (CMDA) Capstone Project course at Virginia Tech, teams of three or four students spend the semester tackling an open-ended, Client-driven project. CBHDS has been honored to serve as the Client for CMDA project teams in Fall 2020, Spring 2021, Fall 2021, and Spring 2022. In addition to the technical aspects of the project, students are mentored by CMDA faculty in teamwork, project management, and technical leadership. Through the lens of their particular projects, the teams also consider the ethical aspects of data science and mathematical modeling.
- Our project objectives were to study the impact of COVID-19 on the Virginia real estate market and explore inter- and intra-county mobility trends. We were able to answer the following research question: How did COVID-19 impact home values and mobility trends throughout Virginia?
- Analytic methods: linear regression and spatial regression
- Visual analytics: an interactive application with choropleth maps and time series plots
This project started off very open-ended, and we were responsible for developing our own research question. With the help of our client and coach, we decided to focus on home values, COVID-19 cases, and mobility trends in Virginia at the county level. From there, our group worked together to produce a final product, meeting weekly with our client and coach to brainstorm ideas and review our progress.
Although we had worked with data before, we encountered some new concepts, such as spatial regression and dashboard development, that took time to get accustomed to. This project was one of our first experiences working on a team for an extended period, and it helped us better understand the importance of team dynamics and time management. Overall, working with CBHDS was a great experience, and we learned a lot throughout the process.
Project Team Members:
- Yohannes Afework, B.S. in Computational Modeling and Data Analytics, May 2021
- Devon Lee, B.S. in Computational Modeling and Data Analytics, May 2021
- Jaffar Shaik, B.S. in Computational Modeling and Data Analytics, May 2021
- Naod Teklie, B.S. in Computational Modeling and Data Analytics, May 2021
- How do remediation measures affect species spread?
- Analytic methods: generalized linear models and classification trees
- Visual analytics: an interactive application to demonstrate the effect of various parameters and conditions
Our CMDA capstone project began with the unexpected challenge of working around COVID-19, but ultimately it gave us a unique and informative experience. Collaborating extensively on such a large project entirely online was difficult. However, with the guidance of Dr. Alexandra Hanlon and Jennifer West, we learned to work efficiently online. Dr. Hanlon and Jennifer were extremely helpful in advising us and providing us with the resources to succeed. We couldn’t have done this without them!
As unusual as this semester has been, it went by extremely fast. Our entire team feels as though we only just got to know the CBHDS team. This seemingly short experience has given us valuable skills not only in data science but, even more, in teamwork and communication. It was exciting to apply what we have learned in the classroom to a real-world problem, and a privilege to do so under the guidance of our mentors.
Photo credit: Ian Trueman, University of Wolverhampton, Bugwood.org
- How has COVID-mandated social distancing affected United States mental health outcomes at the state level?
- Visual analytics: an interactive application to demonstrate the change in mental health scores week by week during the pandemic
Over the course of the semester, Team New Horizons has been working with CBHDS to study the effects of social distancing on mental health for their Computational Modeling and Data Analytics (CMDA) capstone project. The team consists of Jeff Straw, Demory Williamson, Xumanning Luo, and Bella Marku, all of whom are seniors in CMDA. Their primary motivation for choosing the topic was the opportunity to learn more about how the pandemic has affected people emotionally, not just physically.
The project incorporated both mobility data from Google and mental health data from the CDC to determine how mental health symptoms changed in relation to the amount of time spent at home, as compared to a baseline from before the pandemic began. To display their results, the team developed an interactive web dashboard where users can click on a state to obtain that state’s specific mental health results since the start of the pandemic. The biggest challenge the team faced was finding publicly available data, especially since the pandemic is still ongoing. Nevertheless, the datasets they incorporated allowed them to build a comprehensive dashboard that provides useful state-level information to its users.
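The write-up does not say which statistic the team used to relate time at home to symptoms. As a minimal sketch, assuming weekly state-level series and using Pearson correlation (our choice, not necessarily the team's), comparing against a pre-pandemic baseline might look like the Python below. The function names and all numbers are hypothetical.

```python
from math import sqrt


def percent_change(value: float, baseline: float) -> float:
    """Percent change of a value relative to a pre-pandemic baseline."""
    return 100.0 * (value - baseline) / baseline


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical weekly values for one state (illustrative only, not real data).
baseline_hours_home = 14.0  # pre-pandemic reference level
weekly_hours_home = [14.7, 16.1, 16.8, 15.9, 15.2]
weekly_symptom_pct = [30.1, 35.4, 37.0, 34.2, 32.5]  # % reporting symptoms

# Express time at home as change from baseline, then correlate with symptoms.
home_change = [percent_change(h, baseline_hours_home) for h in weekly_hours_home]
r = pearson(home_change, weekly_symptom_pct)
print(f"correlation between time at home and symptoms: {r:.2f}")
```

A correlation only describes association; it does not by itself show that time at home caused the change in symptoms.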
- Can we harmonize and compile several years of NHANES data and use it to predict diabetes?
- Approaches: in-depth study of multiple years of messy survey data, manual harmonization and data manipulation
- Analytic methods: stepwise logistic regression
Team's Story (FALL 2021)
This project was somewhat different from traditional CMDA projects because its main objective was to tackle a problem that, while common, is rarely discussed in class. Typically, students begin the analytic process with a clean data set to which they can immediately apply whatever summary or analytic method they choose. In practice this is rare: analysis more commonly begins with a disorganized collection of data that must be transformed into a usable form. In the NHANES data, for instance, some variable names correspond to different questions in different years, and the only way to resolve these discrepancies is through meticulous study.
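Once such discrepancies have been identified, one way to record the resolution is a crosswalk that maps each survey cycle's variable names onto a single harmonized name. The sketch below assumes records stored as Python dictionaries; the cycle labels, the codes `Q1`, `Q2`, `Q3`, and the harmonized names are hypothetical placeholders, not actual NHANES codes. Note that `Q1` deliberately refers to a different question in each cycle.

```python
# Hypothetical crosswalk: for each survey cycle, map that cycle's variable
# codes to one harmonized name. Codes and names are illustrative only.
# "Q1" intentionally refers to a different question in each cycle.
CROSSWALK: dict[str, dict[str, str]] = {
    "2013-2014": {"Q1": "ever_smoked", "Q2": "bmi"},
    "2015-2016": {"Q1": "bmi", "Q3": "ever_smoked"},
}


def harmonize(cycle: str, record: dict) -> dict:
    """Rename one respondent's record using its cycle's crosswalk.

    Variables missing from the record become None; variables not listed in
    the crosswalk are dropped, so every cycle yields the same columns.
    """
    mapping = CROSSWALK[cycle]
    harmonized = {new: record.get(old) for old, new in mapping.items()}
    harmonized["cycle"] = cycle
    return harmonized


# Hypothetical raw records from two cycles, stacked into one table.
raw = [
    ("2013-2014", {"Q1": 1, "Q2": 27.5}),
    ("2015-2016", {"Q1": 31.2, "Q3": 2, "EXTRA": 0}),
]
table = [harmonize(cycle, rec) for cycle, rec in raw]
print(table)
```

Keeping the crosswalk as data, separate from the code, makes each harmonization decision easy to review and revise as the survey documentation is studied.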
After this impressive step, the data were still plagued with issues such as missing data, multicollinearity, and potential sampling bias. The team addressed these through complete case analysis and stepwise logistic regression, and was able to predict diabetes in the sample with a high level of accuracy. This project taught the team the value of careful study and exposed them to an often underappreciated dimension of data analysis.
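Complete case analysis followed by stepwise logistic regression can be sketched in Python with NumPy. This is an illustration, not the team's code: it assumes a numeric predictor matrix with `NaN` marking missing values, fits each logistic model by Newton-Raphson, and uses forward selection by AIC, which is only one of several stepwise variants (the source does not state the direction or criterion the team used). The data are simulated, and the helper and predictor names are hypothetical.

```python
import numpy as np


def complete_cases(X: np.ndarray, y: np.ndarray):
    """Complete case analysis: drop every row with a missing value in X or y."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[keep], y[keep]


def fit_logistic(design: np.ndarray, y: np.ndarray, n_iter: int = 25) -> float:
    """Fit a logistic regression by Newton-Raphson and return its log-likelihood.

    `design` must already contain an intercept column.
    """
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (y - p)
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    p = np.clip(1.0 / (1.0 + np.exp(-design @ beta)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def forward_stepwise(X: np.ndarray, y: np.ndarray, names: list[str]) -> list[str]:
    """Forward selection: repeatedly add the predictor that lowers AIC most."""
    intercept = np.ones((X.shape[0], 1))
    selected: list[int] = []
    best_aic = -2.0 * fit_logistic(intercept, y) + 2.0  # intercept-only model
    while len(selected) < X.shape[1]:
        scores = []
        for j in range(X.shape[1]):
            if j in selected:
                continue
            design = np.hstack([intercept, X[:, selected + [j]]])
            aic = -2.0 * fit_logistic(design, y) + 2.0 * design.shape[1]
            scores.append((aic, j))
        aic, j = min(scores)
        if aic >= best_aic:  # no remaining candidate improves AIC: stop
            break
        best_aic = aic
        selected.append(j)
    return [names[j] for j in selected]


# Simulated data: only the first predictor truly affects the outcome.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
true_logit = -0.5 + 2.0 * X[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
X[rng.choice(n, size=20, replace=False), 1] = np.nan  # 20 rows with a missing value

Xc, yc = complete_cases(X, y)
selected = forward_stepwise(Xc, yc, ["signal", "noise_1", "noise_2"])
print(Xc.shape, selected)
```

Complete case analysis is simple, but it discards every row with any missing value and can bias estimates when missingness depends on the data.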
- Can we build on the work done by team Diet Code and expand the predictive modeling of cleaned NHANES data to additional metabolic outcomes?
- Analytic methods: multiple imputation by chained equations, stepwise linear and logistic regression, random forests, and neural networks.
- This team was recognized by Mr. & Mrs. Mark and Nancy Scheffel with an honorary sponsorship of their project.
Team's Story (SPRING 2022)
Due to the scope of organizing and cleaning the NHANES data, team Diet Code from Fall 2021 was limited in its ability to analyze the data and model diabetes. Building on that work, team Diet Science used the cleaned NHANES data to predict the presence of hypertension and obesity, and to model the level of LDL cholesterol, a known risk factor for a number of metabolic diseases.
The team first addressed missing data by determining that the missingness could reasonably be treated as missing at random and then applying multiple imputation by chained equations. After imputation, they modeled their outcomes using a combination of stepwise regression, random forests, and neural networks. Notably, the team greatly improved the performance of their initial neural networks by iteratively expanding and refining those models, a process that required considerable determination and skill.
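For readers unfamiliar with multiple imputation by chained equations, the NumPy sketch below shows the core idea under simplifying assumptions: all variables are numeric, each incomplete variable is imputed by ordinary least squares on the other variables plus random residual noise, and this cycle repeats several times to produce each completed dataset. Full implementations (for example, the R package `mice`) also draw the regression parameters themselves and support other model types. This is not the team's code; the function name `mice`, its parameters, and the data are hypothetical and simulated.

```python
import numpy as np


def mice(data: np.ndarray, n_imputations: int = 5, n_cycles: int = 10,
         seed: int = 0) -> list[np.ndarray]:
    """Simplified multiple imputation by chained equations (numeric columns only).

    Returns `n_imputations` completed copies of `data`; NaN marks missing values.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(data)
    completed = []
    for _ in range(n_imputations):
        # Start each chain by filling gaps with column means.
        Z = np.where(missing, np.nanmean(data, axis=0), data)
        for _ in range(n_cycles):
            for j in range(data.shape[1]):
                miss_j = missing[:, j]
                if not miss_j.any():
                    continue  # nothing to impute in this column
                # Regress column j on an intercept plus every other column.
                design = np.column_stack([np.ones(len(Z)), np.delete(Z, j, axis=1)])
                obs = ~miss_j
                coef, *_ = np.linalg.lstsq(design[obs], Z[obs, j], rcond=None)
                resid = Z[obs, j] - design[obs] @ coef
                sigma = resid.std(ddof=design.shape[1])
                # Prediction plus random residual noise, so imputations differ.
                Z[miss_j, j] = design[miss_j] @ coef + rng.normal(0.0, sigma, miss_j.sum())
        completed.append(Z)
    return completed


# Simulated data: column 1 depends on column 0 and has 30 missing entries.
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.5, size=200)
data = np.column_stack([x0, x1])
data[rng.choice(200, size=30, replace=False), 1] = np.nan

imputed = mice(data)
# Rubin's rules: the pooled point estimate is the average across imputations.
pooled_mean = np.mean([Z[:, 1].mean() for Z in imputed])
print(round(float(pooled_mean), 3))
```

Each completed dataset is then analyzed separately and the results are pooled; for a point estimate such as a mean, the pooled value under Rubin's rules is simply the average across imputations.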