Big Data and Health Projects and Collaborators

Current Projects

1940 Census Linkage Project: Examining the long-term effects of childhood social conditions on health

The overarching goal of the “Alcoa Study” is to explain the disparate health and economic trajectories of a cohort of 40,000+ men and women who work at a multisite manufacturing company as they progress from working life into retirement. Outcomes of focus are incident chronic disease, work disability, retirement assets, and mortality. The aim of the linkage project is to link a subset of the Alcoa cohort to the publicly available 1940 census, creating a multigenerational database for examining social determinants of health and health disparities across generations.


Mark Cullen, Stanford

David Rehkopf, Stanford


Human Capital and Political Unrest in Egypt

Although discussions of the economic and social causes of the Arab Spring often focus on the role of young people, little research has examined the effects of exposure to unrest on other outcomes, particularly on human capital investments that are critically important at this stage of the life course. We examine the effects of exposure to political unrest on human capital investments for a nationally representative panel of young people in Egypt surveyed before and after the January 25th Revolution. We exploit exogenous temporal and geospatial variation in exposure to unrest by curating and linking a database of newspaper accounts of all political unrest events occurring throughout the transition period in Egypt.


Jenny Liu, UCSF

Maia Sieverding, American University of Beirut


Impact of State and Federal Nutrition Policies on Childhood Obesity

Despite nationwide plateaus and declines in childhood obesity among some groups, obesity rates remain high among others, including Pacific Islander and Native American children. Policy interventions to improve the food environment inside schools (those that regulate the nutritional content of “competitive” foods and beverages and of school meals) are promising approaches that could exert long-lasting impacts on childhood obesity. This project will employ a quasi-experimental design and administrative FitnessGram data to examine the extent to which changes in school meal regulations affected obesity rates among Pacific Islander and Native American children.


Emma Sanchez, SFSU


Evolution of Alternative Splicing

For decades, it's been hypothesized that changes in gene expression support rapid adaptation and phenotypic divergence.  With empirical expression data increasingly available, it's an exciting time to investigate these classic ideas.  While studies of evolutionary gene expression have confirmed that changes in expression levels underlie some adaptation, there are still many adaptive processes that are not explained.  We hypothesize that changes in alternative splicing may support some processes of rapid adaptation.  We are developing methods to quantify the role of alternative splicing in adaptation. 

Collaborators: Manuel Irimia (Centre for Genomic Regulation), Scott Roy (SFSU), SFSU undergraduates Kasey Arzumanova and Jamie Moon


Acknowledged Programmer Project

Some foundational papers in population genetics and evolution acknowledge non-author programmers or analysts (for example, the acknowledgements include “I thank X for programming and analyzing all of the computational simulations”). We are quantifying how often this sort of acknowledgement was given over time (changes in which may indicate a shift in authorship practices), and the extent to which women and under-represented minorities served as these “acknowledged programmers.”

Collaborators: Emilia Huerta Sanchez (UC Merced), SFSU undergraduates Edgar Castellanos, Fran Catalan, Samantha Dung, Andrea Lopez, Ezequiel Lopez, Rochelle Reyes, and Win Thu


Estimating False Positive Rates in Forensic Identification

Forensic DNA analysis is a powerful quantitative tool for assessing genetic identity. For cold cases, forensic genetic identification relies on the largest genetic databases in the world (by number of entrants). These technologies, and their application in the context of very large databases, raise complex questions about both technical population genetics and social impact. We use statistical genetic and simulation approaches to address the former. For example, we are currently using a simulation approach to estimate false identification error rates in low-template analysis (identification of contributors to low-copy DNA mixtures).

Collaborators: SFSU undergraduate Matt Paunovich