Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this:
1. Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.
2. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).
3. The analyst analyzes the data created in (2), gets close to her goal, but sees that she needs more data and so goes back to (1).
4. Rinse and repeat until the tables and graphics meet QA/QC and satisfy the client.
5. Write the report, incorporating the tables and graphics.
6. Next year, the happy client comes back and wants an update. This should be as simple as updating the upstream data with a new download (e.g. get the building permits from the last year) and pressing a “RECALCULATE” button, unless specifications change.
At the moment, I just start a directory and ad-hoc it the best I can. I would like a more systematic approach, so I am hoping someone has figured this out… I use a mix of spreadsheets, SQL, ArcGIS, R, and Unix tools.
Thanks!
PS:
Below is a basic Makefile that checks for dependencies on various intermediate datasets (w/ .RData suffix) and scripts (.R suffix). Make uses timestamps to check dependencies, so if you touch ss07por.csv, it will see that this file is newer than all the files / targets that depend on it, and execute the given scripts in order to update them accordingly. This is still a work in progress: I still want to add a step for loading the results into a SQL database and a step for a templating language like Sweave. Note that Make relies on tabs in its syntax, so read the manual before cutting and pasting. Enjoy and give feedback!
http://www.gnu.org/software/make/manual/html_node/index.html#Top
R=/home/wsprague/R-2.9.2/bin/R

persondata.RData : ImportData.R ../../DATA/ss07por.csv Functions.R
	$R --slave -f ImportData.R

persondata.Munged.RData : MungeData.R persondata.RData Functions.R
	$R --slave -f MungeData.R

report.txt: TabulateAndGraph.R persondata.Munged.RData Functions.R
	$R --slave -f TabulateAndGraph.R > report.txt
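
For anyone unsure what "Make uses timestamps" means in practice, here is a rough Python sketch of the check Make applies to a single rule. It is only an illustration of the idea, not how Make is implemented, and it ignores chained rules, phony targets and the rest of Make's features. The function name `needs_rebuild` is made up for this example; the file names are borrowed from the Makefile above.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    Simplified approximation of Make's timestamp rule for one target.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild only when some prerequisite is strictly newer than the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        csv = os.path.join(d, "ss07por.csv")
        rdata = os.path.join(d, "persondata.RData")
        for path in (csv, rdata):
            open(path, "w").close()

        # Pretend the dataset was built after the CSV was downloaded.
        os.utime(csv, (1000, 1000))
        os.utime(rdata, (2000, 2000))
        print(needs_rebuild(rdata, [csv]))  # up to date

        # Equivalent of `touch ss07por.csv`: the CSV is now newer.
        os.utime(csv, (3000, 3000))
        print(needs_rebuild(rdata, [csv]))  # stale, the script would rerun
```

Make applies this check to each target in the dependency chain, so touching the CSV causes persondata.RData, then persondata.Munged.RData, then report.txt to be regenerated when report.txt is requested.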