Workflow for statistical analysis and report writing

Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this:

  1. Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.

  2. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).

  3. The analyst analyzes the data created in (2), gets close to her goal, but sees that needs more data and so goes back to (1).

  4. Rinse repeat until the tables and graphics meet QA/QC and satisfy the client.

  5. Write report incorporating tables and graphics.

  6. Next year, the happy client comes back and wants an update. This should be as simple as updating the upstream data by a new download (e.g. get the building permits from the last year), and pressing a “RECALCULATE” button, unless specifications change.

At the moment, I just start a directory and ad-hoc it the best I can. I would like a more systematic approach, so I am hoping someone has figured this out… I use a mix of spreadsheets, SQL, ARCGIS, R, and Unix tools.

Thanks!

PS:

Below is a basic Makefile that checks for dependencies on various intermediate datasets (w/ .RData suffix) and scripts (.R suffix). Make uses timestamps to check dependencies, so if you touch ss07por.csv, it will see that this file is newer than all the files / targets that depend on it, and execute the given scripts in order to update them accordingly. This is still a work in progress, including a step for putting into SQL database, and a step for a templating language like sweave. Note that Make relies on tabs in its syntax, so read the manual before cutting and pasting. Enjoy and give feedback!

http://www.gnu.org/software/make/manual/html_node/index.html#Top

R=/home/wsprague/R-2.9.2/bin/R

persondata.RData : ImportData.R ../../DATA/ss07por.csv Functions.R
   $R --slave -f ImportData.R

persondata.Munged.RData : MungeData.R persondata.RData Functions.R
      $R --slave -f MungeData.R

report.txt:  TabulateAndGraph.R persondata.Munged.RData Functions.R
      $R --slave -f TabulateAndGraph.R > report.txt

14 Answers
14

Leave a Comment