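This is because write_dta() does not compress. That is, write_dta() often chooses storage types that are far larger than the data needs. Below is an extreme but real example from my work. (File names and varnames have been edited.)
Note the data size. It drops from about 1 MB (1,032,900 bytes) to about 6 KB (6,200 bytes), a 99.4% reduction. The real dataset has millions of observations, so I have a hard time converting it to dta using write_dta(). This probably needs to be addressed at the ReadStat level.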
. desc, size
Contains data from v1.dta
  obs:           100
 vars:            22                          04 Sep 2019 10:19
 size:     1,032,900
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
var1            double  %10.0g
var2            str1    %-9s
var3            double  %td
var4            double  %td
var5            str4    %-9s
var6            str1    %-9s
var7            str2045 %-9s
var8            str2045 %-9s
var9            str2045 %-9s
var10           str2045 %-9s
var11           str2045 %-9s
var12           str5    %-9s
var13           double  %10.0g
var14           double  %td
var15           double  %10.0g
var16           str3    %-9s
var17           double  %10.0g
var18           double  %10.0g
var19           double  %10.0g
var20           double  %10.0g
var21           double  %10.0g
var22           str2    %-9s
-------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
r; t=0.00 10:27:24
. compress
variable var1 was double now long
variable var3 was double now int
variable var4 was double now int
variable var14 was double now int
variable var17 was double now byte
variable var18 was double now long
variable var19 was double now byte
variable var20 was double now byte
variable var7 was str2045 now str1
variable var8 was str2045 now str1
variable var9 was str2045 now str1
variable var10 was str2045 now str1
variable var11 was str2045 now str1
(1,026,700 bytes saved)
r; t=0.00 10:27:34
. desc, size
Contains data from v2.dta
  obs:           100
 vars:            22                          04 Sep 2019 10:19
 size:         6,200
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
var1            long    %10.0g
var2            str1    %-9s
var3            int     %td
var4            int     %td
var5            str4    %-9s
var6            str1    %-9s
var7            str1    %-9s
var8            str1    %-9s
var9            str1    %-9s
var10           str1    %-9s
var11           str1    %-9s
var12           str5    %-9s
var13           double  %10.0g
var14           int     %td
var15           double  %10.0g
var16           str3    %-9s
var17           byte    %10.0g
var18           long    %10.0g
var19           byte    %10.0g
var20           byte    %10.0g
var21           double  %10.0g
var22           str2    %-9s
-------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
r; t=0.00 10:27:37
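
As a sanity check, the sizes in the log above can be reproduced from the storage types alone: Stata stores byte, int, long, float and double in 1, 2, 4, 4 and 8 bytes, and a strN variable in N bytes per observation. The sketch below is only an illustration in Python, not part of haven or ReadStat. The names width, data_size, BEFORE and AFTER are made up for this example, and it assumes the dataset has no strL variables.

```python
import re

# Bytes per value for Stata's fixed-width numeric storage types.
NUMERIC_WIDTH = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def width(storage_type):
    """Return the bytes one value of the given Stata storage type occupies."""
    if storage_type in NUMERIC_WIDTH:
        return NUMERIC_WIDTH[storage_type]
    match = re.fullmatch(r"str(\d+)", storage_type)
    if match is None:
        raise ValueError(f"unsupported storage type: {storage_type}")
    return int(match.group(1))  # strN takes N bytes per observation


def data_size(types, n_obs):
    """Data size in bytes: observations times the summed row width."""
    return n_obs * sum(width(t) for t in types)


# Storage types of var1..var22 as written by write_dta() (first listing).
BEFORE = ["double", "str1", "double", "double", "str4", "str1",
          "str2045", "str2045", "str2045", "str2045", "str2045",
          "str5", "double", "double", "double", "str3",
          "double", "double", "double", "double", "double", "str2"]

# Storage types of var1..var22 after compress (second listing).
AFTER = ["long", "str1", "int", "int", "str4", "str1",
         "str1", "str1", "str1", "str1", "str1",
         "str5", "double", "int", "double", "str3",
         "byte", "long", "byte", "byte", "double", "str2"]

before = data_size(BEFORE, 100)
after = data_size(AFTER, 100)
print(before)                                     # 1032900
print(after)                                      # 6200
print(before - after)                             # 1026700
print(round(100 * (before - after) / before, 1))  # 99.4
```

The five str2045 columns alone account for 10,225 of the 10,329 bytes in each uncompressed row, which is why the string widths matter far more here than the double-to-integer changes.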