In theory, the 'filehash' package lets R handle a large dataset by keeping it on the hard disk instead of loading it into RAM. I tested the package with a 1 GB Stata-format dataset, and it didn't work well. Anyway, here is how to use it:
(1) Install 'filehash'
> install.packages('filehash')
> library(filehash)
(2) Dump the large dataset to a disk-backed database and set up an environment for it
> dumpDF(read.csv("largedata.csv"), dbName="dbname")
> envname <- db2env(db="dbname")
(3) Analyze with the environment
> with(envname, lm(y~x))
* 'envname' and 'dbname' can be any names you like.
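To try the three steps end to end without a real large file, here is a minimal sketch that builds a small stand-in CSV first. The toy data, the temporary file paths, and the names 'csv_path', 'db_path', 'env' and 'fit' are my own assumptions for illustration; only dumpDF(), db2env() and with() come from the steps above.

```r
library(filehash)

# Hypothetical stand-in for "largedata.csv": a small toy dataset with y roughly 2 + 3x
set.seed(1)
toy <- data.frame(x = rnorm(100))
toy$y <- 2 + 3 * toy$x + rnorm(100, sd = 0.1)

csv_path <- tempfile(fileext = ".csv")   # assumed temporary location for the CSV
db_path  <- tempfile("exampledb")        # assumed temporary location for the database
write.csv(toy, csv_path, row.names = FALSE)

# Step 2: write each column of the data frame to a filehash database on disk,
# then expose that database as an environment
dumpDF(read.csv(csv_path), dbName = db_path)
env <- db2env(db = db_path)

# Step 3: fit the model using the variables found in the environment
fit <- with(env, lm(y ~ x))
print(coef(fit))
```

Note that read.csv() still reads the whole file into memory before dumpDF() writes it to disk, so the saving only applies to the analysis step afterwards.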
filehash manual; howto by Yu-Sung Su