RCDS AD

More Modifications to the Website! Hello Friends! We made some final changes to the Hugo website, updated some of our pages, and added the new ad below. We look forward to adding workflows, applications, and datasets over the next few weeks! Until next time…

Aroma Project

Hello Friends, I’ve decided to post materials from our recent paper published at http://www.jopan.org/article/S1089-9472(16)30334-3/fulltext. The manuscript, code, and other materials are combined in an ‘RMarkdown’ document (check the code out), in which all elements of the project live in one central ‘.Rmd’ file. I actually did this project while in Afghanistan and Iraq last year. Having the text, bibliography, raw data, code, and everything else integrated into one document provides an unprecedented way to track and manage the project, especially when you’re doing it in between war zones!
Read More…

Example FAERS Project Setup

Downloading data and working with large data sets can be a challenge, especially when the data require aggregation. Over time, column names or file naming conventions may change, which can lead to errors in the aggregation process. Other data sets involve so many files that it is impractical to forgo automation. In the biomedical sciences, there are many publicly available Databases, Resources & APIs at the U.S. National Library of Medicine and U.
Read More…

Probability Distribution

Wow, more than a month has passed since my last blog post! I’ve been busy working on text mining and mapping free text to medical ontologies. The text mining uses heuristic and probabilistic approaches, so I decided to take a step back and review probability theory and probability distributions. Below, I provide a list of discrete and continuous probability functions with links to R documentation, and look at a few probability distributions (normal, geometric, and binomial).
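As a quick illustration of the three distributions named above, here is a minimal sketch in Python using only the standard library. It is not the code from the full post; the function names `binomial_pmf` and `geometric_pmf` are my own, and the geometric version assumes the "number of failures before the first success" convention, which is the one R's `dgeom` uses.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """P(X = k): k failures before the first success (k = 0, 1, 2, ...)."""
    return (1 - p) ** k * p


# Standard normal distribution (mean 0, standard deviation 1)
standard_normal = NormalDist(mu=0.0, sigma=1.0)

print(binomial_pmf(3, 10, 0.5))   # 0.1171875  (120 / 1024)
print(geometric_pmf(2, 0.25))     # 0.140625   (0.75 * 0.75 * 0.25)
print(standard_normal.cdf(0.0))   # 0.5        (half the mass lies below the mean)
```

The rough R counterparts are `dbinom`, `dgeom`, and `pnorm`.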
Read More…

Linear Regression

Linear regression is a very simple supervised learning approach for finding the relationship between a single continuous variable, called the dependent (or target) variable (i.e., numeric values, not categories or groups), and one or more other variables (continuous or factor/group), called independent variables. Thus, it can be used to predict a quantitative response. At least 5 cases per independent variable are required in the analysis. Many of the more sophisticated statistical learning approaches are in fact generalizations or extensions of linear regression.
Read More…

My Library

Here are the books I’ve acquired over the last two years. After dabbling from 2012 to 2013, I started buying books. You may find many of the ones toward the end of the list hardly worth your time. The web design, Java, and other similar books were typically obtained because they were free or inexpensive and I needed some quick solutions. I plan to update this library and the web content, and I’ll add key manuscripts as time goes by.
Read More…

The Data Table Package

Data Frames: First, let’s look at the ‘data.frame’. The data.frame is the bread and butter of R. It is an intuitive way to organize data into rows and columns, and subsetting is very straightforward. Organization is critical to a data.frame. Data.frames require that all rows are the same length and all columns are the same length; however, they may contain differing types of data. Thus, a data.frame is similar to both a matrix and a list.
Read More…

My Adventures in Data Science

Welcome to my blog! My name is Jonathan. Since 2007, I’ve worked as a Biochemistry Officer in the Army. My graduate training was in cellular and molecular pathology, but I’ve conducted research in many areas to meet the Army mission. I’ve mentored graduate medical education residents and fellows in basic and clinical research in the areas of trauma, maternal fetal medicine, reproductive biology, and general medicine. I’ve also served as a Program Director and a Deputy Commander at a military research lab.
Read More…