
I'm trying to wrap my head around density: what units is it producing?

Some real data is included below. Suppose log_sample$TS is in seconds. We can plot the density by event:

    library(ggplot2)
    # One density curve per event type, TS in seconds
    ggplot(log_sample, aes(x = TS)) + geom_density(aes(colour = event))

[Figure: Initial Density (TS in seconds)]
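Note that with `colour = event`, geom_density estimates a separate density for each event type, and each curve is normalized on its own. A minimal base-R sketch of that, using hypothetical uniform timestamps instead of the question's data and group sizes chosen only for illustration, shows that a rare event gets a curve with the same total area as a common one:

```r
# Hypothetical data (not the question's log_sample): 100 timestamps in seconds,
# split into three event types of very different sizes.
set.seed(1)
ts_sec <- runif(100, min = 0, max = 7.5e9)
event  <- rep(c("b", "c", "r"), times = c(45, 47, 8))

# Area under a density() curve, approximated by a Riemann sum over its
# equally spaced evaluation grid.
density_area <- function(d) sum(d$y) * diff(d$x[1:2])

# One density estimate per event type, as geom_density does per group.
per_group <- lapply(split(ts_sec, event), density)
areas <- sapply(per_group, density_area)
print(round(areas, 3))  # each area is close to 1, whatever the group size
```

So the relative heights of the coloured curves say where each event type is concentrated, not how often it occurs.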

My intuition about the density (an estimate of events per second) gets shaken when I convert the timestamps to hours:

    # Same plot with TS converted from seconds to hours
    ggplot(log_sample, aes(x = TS / (60 * 60))) + geom_density(aes(colour = event))

[Figure: Hours Density (TS in hours)]

I figured the density should have gone up (events per hour rather than per second), but the y-axis values got smaller.

What is the density that comes out of R's density() (which is what geom_density uses under the covers)?
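One way to see what density() returns: its `y` values are a probability density over `x`, so they integrate to 1 and carry units of 1/(units of x), here "per second". Multiplying by the number of observations turns that into a smoothed count per second, which is roughly the "events per second" quantity the intuition above is after; ggplot2 exposes the same scaling as the computed variable `count` (e.g. `geom_density(aes(y = after_stat(count)))`). The sketch below uses hypothetical uniform timestamps, not the question's data:

```r
# Hypothetical timestamps in seconds (illustration only, not log_sample$TS).
set.seed(1)
ts_sec <- runif(100, min = 0, max = 7.5e9)

d  <- density(ts_sec)   # Gaussian kernel, bandwidth bw.nrd0 by default
dx <- diff(d$x[1:2])    # spacing of the evaluation grid, in seconds

# d$y is "per second": tiny, because the data span billions of seconds.
print(max(d$y))

# The curve integrates to (approximately) 1.
print(sum(d$y) * dx)

# Scaling by the sample size gives a smoothed count per second,
# whose area is (approximately) the number of events.
rate_per_sec <- d$y * length(ts_sec)
print(sum(rate_per_sec) * dx)
```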


Sample Data

    log_sample = structure(list(TS = c(5936781453, 5424106429, 3051226836, 3780602571, 
    4836845109, 5718264549, 879774681, 3693468059, 2007748562, 2504334226, 
    624948758, 5712390144, 3169326817, 2716096605, 1108085248, 5668904375, 
    6559186646, 21095572, 3875508209, 4315196759, 5253007933, 4702915059, 
    6498649004, 5606316102, 3886402298, 2552276252, 6055089961, 87782977, 
    1792383661, 1525444570, 2423674627, 2698516549, 770431980, 2249099432, 
    5560812828, 5140968169, 4938716355, 7446015137, 3697083581, 5000572471, 
    2748254652, 6697149589, 3718191398, 6123529413, 2459883463, 2521530177, 
    5570098130, 4360374786, 311727922, 6026773996, 4889601125, 3358303391, 
    1822623672, 7514080648, 2892349471, 6832359196, 5011293787, 443364160, 
    5220940964, 5254117874, 5337279943, 5208529127, 4180004131, 4053678140, 
    5911956363, 380893281, 2018033389, 842548954, 7497672544, 2724869215, 
    1958679125, 4069038129, 3397592985, 2328548539, 5049321404, 6783632939, 
    1657654904, 2707346266, 892475725, 5327372333, 1037573029, 3319817079, 
    5009282140, 7265205425, 108382115, 5125317279, 2767672973, 158006399, 
    3973921838, 1529684154, 2631744541, 2343000246, 584037151, 2811442843, 
    224371846, 6117606277, 6495065662, 4023007200, 3664433941, 5606111439
    ), event = c("c", "c", "b", "b", "b", "b", "c", "b", "c", "c", 
    "c", "c", "b", "b", "b", "c", "c", "c", "c", "b", "b", "b", "c", 
    "r", "c", "c", "c", "b", "c", "c", "c", "c", "b", "c", "b", "c", 
    "r", "c", "c", "c", "c", "b", "c", "b", "b", "b", "b", "c", "b", 
    "r", "b", "b", "b", "b", "c", "b", "c", "b", "r", "c", "c", "c", 
    "b", "b", "b", "b", "c", "c", "b", "c", "c", "c", "b", "c", "b", 
    "r", "b", "b", "c", "c", "c", "c", "c", "r", "c", "b", "b", "c", 
    "b", "c", "c", "b", "b", "c", "c", "c", "c", "b", "b", "b")), .Names = c("TS", 
    "event"), row.names = c(943411L, 610939L, 1419805L, 794230L, 
    5117419L, 5198213L, 4312722L, 1443299L, 3360370L, 3703742L, 1989592L, 
    2882113L, 2082613L, 2725174L, 39266L, 2553302L, 2920469L, 4938431L, 
    4093867L, 3444703L, 2521564L, 2465041L, 2918392L, 4854160L, 3429030L, 
    3380282L, 953508L, 1639160L, 4017713L, 2022520L, 4369194L, 2391770L, 
    26864L, 1390462L, 4523739L, 4820972L, 3478285L, 332872L, 791177L, 
    4805164L, 1408718L, 5232955L, 1771935L, 2259467L, 3376903L, 2385297L, 
    4852010L, 3771602L, 4619512L, 4221952L, 3472587L, 3734953L, 4018822L, 
    1308366L, 4057947L, 4573824L, 545463L, 3303167L, 4502527L, 3837677L, 
    4184887L, 4174426L, 1461097L, 147448L, 2566731L, 3300883L, 72689L, 
    3317772L, 4935292L, 4380180L, 4352184L, 148011L, 2750094L, 421915L, 
    4157011L, 2929336L, 4341175L, 1081379L, 2992396L, 4183930L, 4646073L, 
    120493L, 2166828L, 3609199L, 986390L, 2181468L, 1737202L, 342543L, 
    4425869L, 1691913L, 3056016L, 4366245L, 3633507L, 4710969L, 3295152L, 
    232484L, 1623483L, 803098L, 3420917L, 5192365L), class = "data.frame")

1 Answer


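The y axis in the plots is the kernel density, which by default is normalized so that the area under the curve is 1. Its units are therefore 1/(units of x): per second in the first plot and per hour in the second. If the same data are spread over a larger numeric range (more units, as with seconds instead of hours), the values go down, and vice versa. As has already been pointed out, the y values of the first plot (seconds) are much smaller than those of the second plot (hours).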
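The effect of the unit change can be checked directly. Because the default bandwidth rule (bw.nrd0) scales with the data, dividing the timestamps by 3600 shrinks the x grid by 3600 and multiplies every density value by 3600, while both curves keep an area of 1. A sketch, again with hypothetical timestamps rather than the question's data:

```r
# Hypothetical timestamps in seconds (illustration only, not log_sample$TS).
set.seed(1)
ts_sec <- runif(100, min = 0, max = 7.5e9)

d_sec  <- density(ts_sec)          # y in "per second"
d_hour <- density(ts_sec / 3600)   # same data in hours: y in "per hour"

# Same curve shape: x divided by 3600, y multiplied by 3600.
print(isTRUE(all.equal(d_hour$x, d_sec$x / 3600)))
print(isTRUE(all.equal(d_hour$y, d_sec$y * 3600)))

# Both still integrate to about 1 over their own x units.
area <- function(d) sum(d$y) * diff(d$x[1:2])
print(c(seconds = area(d_sec), hours = area(d_hour)))
```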

answered 2014-02-06T18:53:26.590