Project Portfolio
Bram Stone
Data Exploration
Mapping, Visualization, and Data Exploration: US Coal Production
Repository, Project Narrative (Rmarkdown)
This project uses the ggplot2, maps, and choroplethr R packages.
Visualization and mapping of trends in US coal mining from 2000 to 2016. Raw data were obtained as txt files from the Mine Safety and Health Administration (MSHA), part of the US Department of Labor. This project shows how coal production and employment changed over this period and identifies patterns in both. These data are intended to be integrated into a larger EPA dataset linking environmental and economic variables, and they are used in another project to build a predictive model of the frequency of mining-related accidents.
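The maps and trend plots rest on a simple aggregation: totals of production and employment per state and year. The project itself is written in R; the pandas sketch below only illustrates that step, assuming the MSHA records have already been loaded into a data frame. The column names (`state`, `year`, `tons`, `employees`), the function names, and the toy values are hypothetical, not MSHA's schema.

```python
import pandas as pd


def production_by_state_year(mines: pd.DataFrame) -> pd.DataFrame:
    """Total production and employment per state and year, plus tons per employee."""
    summary = mines.groupby(["state", "year"], as_index=False)[["tons", "employees"]].sum()
    summary["tons_per_employee"] = summary["tons"] / summary["employees"]
    return summary


def national_trend(mines: pd.DataFrame) -> pd.Series:
    """Total US production per year, for a simple time-series plot."""
    return mines.groupby("year")["tons"].sum()


# Hypothetical mine-level records (illustrative columns and values, not MSHA data).
mines = pd.DataFrame({
    "state": ["WV", "WV", "WV", "KY", "WY"],
    "year": [2015, 2015, 2016, 2016, 2016],
    "tons": [100.0, 50.0, 120.0, 80.0, 400.0],
    "employees": [10, 5, 12, 16, 20],
})

summary = production_by_state_year(mines)
# Wide table: one row per state, one column per year (NaN where a state has no record).
by_state = summary.pivot(index="state", columns="year", values="tons")
print(by_state)
print(national_trend(mines))
```

Tons per employee is included because production and employment can move independently, which is one of the patterns the maps are meant to expose.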
Predictive Modeling
Predicting US Mining Accidents
Repository, Project Narrative (Jupyter Notebook)
This project uses the numpy, pandas, and matplotlib Python libraries.
The goal of this project is to predict the frequency of mining-related accidents in the United States. Raw data were obtained as txt files from the Mine Safety and Health Administration (MSHA), part of the US Department of Labor. Accident records span 2000 to 2017.
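The narrative does not say which model the project uses. One common choice for accident frequencies is Poisson regression with exposure (for example, employee hours) as an offset, so that covariates act on the accident rate rather than the raw count. The NumPy sketch below fits such a model by Newton-Raphson; the function names, the surface/underground covariate, and the use of hours as exposure are assumptions for illustration only.

```python
import numpy as np


def _design(covariates, n):
    """Prepend an intercept column to an (n, p) covariate matrix."""
    Z = np.asarray(covariates, dtype=float).reshape(n, -1)
    return np.column_stack([np.ones(n), Z])


def fit_poisson_rate(covariates, counts, exposure, max_iter=100, tol=1e-10):
    """Fit log E[count] = log(exposure) + b0 + covariates @ b by Newton-Raphson.

    Returns [b0, b1, ...]; exp(b0) is the baseline rate per unit of exposure.
    """
    y = np.asarray(counts, dtype=float)
    E = np.asarray(exposure, dtype=float)
    offset = np.log(E)
    X = _design(covariates, len(y))

    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / E.sum())  # start at the pooled rate
    for _ in range(max_iter):
        mu = np.exp(offset + X @ beta)      # expected counts
        grad = X.T @ (y - mu)               # score of the Poisson log-likelihood
        hess = X.T @ (X * mu[:, None])      # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_counts(beta, covariates, exposure):
    """Expected accident counts for new rows with the given exposure."""
    E = np.asarray(exposure, dtype=float)
    return E * np.exp(_design(covariates, len(E)) @ beta)


# Toy data: 0 = surface mine, 1 = underground mine (hypothetical).
underground = [0, 0, 1, 1]
accidents = [4, 6, 9, 11]
hours = [100.0, 100.0, 100.0, 100.0]  # exposure in hypothetical units
beta = fit_poisson_rate(underground, accidents, hours)
print(beta, predict_counts(beta, [1], [50.0]))
```

With a single binary covariate the fitted rates equal the pooled group rates (here 0.05 and 0.1 per unit of exposure), which makes the sketch easy to sanity-check.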
Statistical Programming
Creating Ecological Networks from Microbial Community Datasets
Repository, Project Vignette (pdf)
This project uses the vegan package and is meant to be used to create network graphs with the igraph package in R.
The function in this project implements the algorithm of Lallich et al. (2006) for controlling type I error, specifically the false discovery rate, when selecting the subset of correlations whose values the researcher deems suitable or interesting. The function returns its output as a data frame suitable for graphing with the igraph package. Work is underway to incorporate other methods for controlling false discovery.
Lallich, S., Teytaud, O., & Prudhomme, E. (2006). Association rule interestingness: measure and statistical validation. In F. Guillet & H. J. Hamilton (Eds.), Quality Measures in Data Mining. Springer.
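Lallich et al.'s procedure is not reproduced here. To illustrate the same workflow with a different, widely used method for controlling false discovery, the Python sketch below applies the Benjamini-Hochberg procedure to the p-values of all pairwise correlations, keeps only edges that also meet a researcher-chosen minimum |r|, and returns an edge list. The function names and the `min_abs_r` parameter are hypothetical, and the correlation and p-value matrices are assumed to have been computed beforehand.

```python
import numpy as np
import pandas as pd


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passes = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()   # largest rank i with p_(i) <= i*q/m
        reject[order[: k + 1]] = True     # reject every hypothesis up to that rank
    return reject


def correlation_edge_list(corr, pvals, names, q=0.05, min_abs_r=0.0):
    """Edge list of taxon pairs whose correlation survives BH and has |r| >= min_abs_r."""
    corr = np.asarray(corr, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    names = np.asarray(names)
    i, j = np.triu_indices(len(names), k=1)   # each unordered pair once
    r, p = corr[i, j], pvals[i, j]
    # Correct across all pairs first, then apply the researcher's |r| filter.
    keep = benjamini_hochberg(p, q) & (np.abs(r) >= min_abs_r)
    return pd.DataFrame({
        "from": names[i][keep],
        "to": names[j][keep],
        "weight": r[keep],
        "p_value": p[keep],
    })


names = ["taxonA", "taxonB", "taxonC"]
corr = [[1.0, 0.9, 0.1], [0.9, 1.0, -0.8], [0.1, -0.8, 1.0]]
pvals = [[0.0, 0.001, 0.5], [0.001, 0.0, 0.01], [0.5, 0.01, 0.0]]
edges = correlation_edge_list(corr, pvals, names, q=0.05)
print(edges)
```

Because the first two columns name the connected vertices, a table in this shape can be passed to igraph's `graph_from_data_frame()` in R, with the remaining columns becoming edge attributes.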
Quantifying Species Shared Between Microbial Communities
Repository
This project uses the wrswoR package for weighted resampling without replacement, and the parallel and foreach packages for parallel computation in R.
The function in this project is a re-implementation of the method from Chase's 2011 paper for calculating probabilistic patterns of species sharing between communities using Raup-Crick associations. The original code provided by Chase used several nested for loops; these have been vectorized, and a framework for parallel computation has been added (although it currently works only on Windows machines). In addition, this function performs weighted resampling without replacement using the wrswoR package, which is faster than base R's sample() for the same task. Together, these changes give a significant speed-up on the large datasets often encountered in microbial ecology. A similar function, raupcrick, exists in the vegan package, but it often fails on species-rich microbial datasets.
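The core of the Raup-Crick null model can be sketched in Python with NumPy. This is a minimal presence/absence version, not the project's R code, and it assumes the conventions of Chase's 2011 metric: each simulated community keeps its observed richness, species are drawn with probability proportional to the number of sites they occupy, ties are split in half, and the result is rescaled from [0, 1] to [-1, 1], where -1 means more species shared than the null expects and 1 means fewer. The helper `weighted_sample_without_replacement` is hypothetical and stands in for wrswoR; it uses Efraimidis-Spirakis exponential keys, which draw all null replicates at once instead of looping over them.

```python
import numpy as np


def weighted_sample_without_replacement(weights, k, reps, rng):
    """Boolean (reps, n) matrix; each row marks k items drawn without replacement.

    Item i gets key E_i / w_i with E_i ~ Exp(1); keeping the k smallest keys
    matches drawing items one at a time with probability proportional to weight.
    """
    w = np.asarray(weights, dtype=float)
    out = np.zeros((reps, w.size), dtype=bool)
    if k == 0:
        return out
    with np.errstate(divide="ignore", invalid="ignore"):
        keys = rng.exponential(size=(reps, w.size)) / w  # zero weight -> inf, never drawn
    chosen = np.argpartition(keys, k - 1, axis=1)[:, :k]
    np.put_along_axis(out, chosen, True, axis=1)
    return out


def raup_crick(incidence, reps=999, seed=None):
    """Pairwise Raup-Crick values in [-1, 1] for a sites x species 0/1 matrix."""
    X = np.asarray(incidence).astype(bool)
    X = X[:, X.any(axis=0)]           # drop species absent from every site
    n_sites = X.shape[0]
    occupancy = X.sum(axis=0)         # sampling weight: number of sites occupied
    richness = X.sum(axis=1)          # species per site, held fixed in the null
    rng = np.random.default_rng(seed)
    result = np.zeros((n_sites, n_sites))
    for a in range(n_sites):
        for b in range(a + 1, n_sites):
            observed = np.sum(X[a] & X[b])
            null_a = weighted_sample_without_replacement(occupancy, richness[a], reps, rng)
            null_b = weighted_sample_without_replacement(occupancy, richness[b], reps, rng)
            null_shared = np.sum(null_a & null_b, axis=1)
            # Share of null draws sharing more species than observed, ties split in half.
            rc = (np.sum(null_shared > observed) + 0.5 * np.sum(null_shared == observed)) / reps
            result[a, b] = result[b, a] = 2 * (rc - 0.5)  # rescale [0, 1] -> [-1, 1]
    return result


sites = np.random.default_rng(2).integers(0, 2, size=(4, 12))
print(np.round(raup_crick(sites, reps=200, seed=3), 2))
```

The remaining loop over site pairs is independent from pair to pair, so it is the natural place to distribute work across cores, as foreach does in R.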