
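I have high-frequency limit order book data in Stata. The timestamps are not evenly spaced, and some observations are simultaneous (to the millisecond). For each observation, I need the midpoint 5 minutes later in a separate column. For example, for observation 1 the value should be 10.49, because the last midpoint at or before 09:05:02.579 is 10.49.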

How can I do this in Stata?

datetime                         midpoint
12/02/2012 09:00:02.579          10.5125
12/02/2012 09:00:03.471          10.5125
12/02/2012 09:00:03.471          10.5125
12/02/2012 09:00:03.471          10.51
12/02/2012 09:00:03.471          10.51
12/02/2012 09:00:03.549          10.505
12/02/2012 09:00:03.549          10.5075
   ......
12/02/2012 09:04:59.785          10.495
12/02/2012 09:05:00.829          10.4925
12/02/2012 09:05:01.209          10.49
12/02/2012 09:05:03.057          10.4875
12/02/2012 09:05:05.055          10.485
 .....
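To make the rule in the question concrete, here is a small Python sketch (not Stata). It assumes the rows are sorted by time and, for each row at time t, takes the midpoint of the last row whose time is at or before t + 5 minutes. Only the rows shown above are used, so any result whose target falls after the last shown row (09:05:05.055) is not meaningful. `midpoint_after` is a hypothetical helper name.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Rows shown in the question (the elided "......" rows are left out).
rows = [
    ("12/02/2012 09:00:02.579", 10.5125),
    ("12/02/2012 09:00:03.471", 10.5125),
    ("12/02/2012 09:00:03.471", 10.5125),
    ("12/02/2012 09:00:03.471", 10.51),
    ("12/02/2012 09:00:03.471", 10.51),
    ("12/02/2012 09:00:03.549", 10.505),
    ("12/02/2012 09:00:03.549", 10.5075),
    ("12/02/2012 09:04:59.785", 10.495),
    ("12/02/2012 09:05:00.829", 10.4925),
    ("12/02/2012 09:05:01.209", 10.49),
    ("12/02/2012 09:05:03.057", 10.4875),
    ("12/02/2012 09:05:05.055", 10.485),
]

FMT = "%m/%d/%Y %H:%M:%S.%f"  # %f accepts the 3-digit milliseconds
times = [datetime.strptime(t, FMT) for t, _ in rows]
mids = [m for _, m in rows]

def midpoint_after(t, horizon=timedelta(minutes=5)):
    """Midpoint of the last row at or before t + horizon (times must be sorted)."""
    k = bisect_right(times, t + horizon)  # number of rows with time <= target
    return mids[k - 1] if k > 0 else None

mid5 = [midpoint_after(t) for t in times]
print(mid5[0])  # 10.49 (last row at or before 09:05:02.579 is 09:05:01.209)
```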

2 Answers


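My approach is to

  1. generate a new dataset shifted by five minutes,
  2. append this shifted dataset,
  3. find the observations just before and just after each five-minutes-ahead time, and
  4. use some criterion to choose the better of these two values.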
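Steps 3 and 4 can be sketched in Python to show the logic (this is an illustration, not Stata). Times are integer milliseconds after 09:00:00.000 taken from the question's sample, and `closest_future` is a hypothetical name. With the nearest-value criterion, the first observation gets 10.4875, because 09:05:03.057 is 478 ms from the target while 09:05:01.209 is 1370 ms away. The question's expected 10.49 corresponds to always taking the value before the target.

```python
from bisect import bisect_left, bisect_right

FIVE_MIN_MS = 5 * 60 * 1000

def closest_future(times_ms, mids, horizon_ms=FIVE_MIN_MS):
    """For each time t, the midpoint of the row nearest to t + horizon_ms.

    times_ms must be sorted. As with choose_b in this answer's Stata code,
    the "before" value is used only if it is strictly closer.
    """
    out = []
    for t in times_ms:
        target = t + horizon_ms
        # Step 3: last row at or before the target, first row at or after it.
        i_b = bisect_right(times_ms, target) - 1  # >= 0, since t <= target
        i_a = bisect_left(times_ms, target)
        # Step 4: pick the closer of the two (use "before" if nothing comes after).
        if i_a == len(times_ms) or target - times_ms[i_b] < times_ms[i_a] - target:
            out.append(mids[i_b])
        else:
            out.append(mids[i_a])
    return out

times_ms = [2579, 3471, 3471, 3471, 3471, 3549, 3549,
            299785, 300829, 301209, 303057, 305055]
mids = [10.5125, 10.5125, 10.5125, 10.51, 10.51, 10.505, 10.5075,
        10.495, 10.4925, 10.49, 10.4875, 10.485]
result = closest_future(times_ms, mids)
print(result[0])  # 10.4875
```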

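You asked for the closest value, but depending on your order book you may want to add other criteria. Also, you mention multiple values at the same millisecond, but without more information I'm not sure how to handle them. Do you want to combine those midpoints first, or are they different stocks?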
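Suppose the rows sharing a millisecond are successive updates of the same book, in arrival order. The question does not say so, so this is an assumption. One way to combine them is to keep only the last midpoint at each timestamp before looking ahead; averaging them would be another choice. The Python sketch below uses the first rows of the question's sample, with times in milliseconds after 09:00:00.000. `last_per_timestamp` is a hypothetical name. If the rows are different stocks, the grouping key would also need to include the stock.

```python
from itertools import groupby

# (milliseconds after 09:00:00.000, midpoint), sorted by time
rows = [(2579, 10.5125), (3471, 10.5125), (3471, 10.5125), (3471, 10.51),
        (3471, 10.51), (3549, 10.505), (3549, 10.5075)]

def last_per_timestamp(rows):
    """Collapse consecutive rows with the same timestamp, keeping the last one."""
    return [list(group)[-1] for _, group in groupby(rows, key=lambda r: r[0])]

collapsed = last_per_timestamp(rows)
print(collapsed)  # [(2579, 10.5125), (3471, 10.51), (3549, 10.5075)]
```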

Below is some code that implements the basics of this approach.

clear
version 11.2
set seed 2001

* generate some data
set obs 100000
generate double dt = ///
    tc(02dec2012 09:00:00.000) + 1000*_n + int(100*rnormal())
format dt %tcDDmonCCYY_HH:MM:SS.sss
sort dt
generate midpt = 100
replace midpt = ///
    round(midpt[_n - 1] + 0.1*rnormal(), 0.005) if (_n != 1)

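* build a copy of the data with each time shifted back 5 minutes
* (so a row at dt carries the midpoint observed at dt + 5 minutes), then append it
local five_min = 5*60*1000    // 5 minutes in milliseconds
preserve
tempfile future
rename midpt fmidpt
rename dt fdt
generate double dt = fdt - `five_min'
generate byte shifted = 1     // marks the appended rows
save `future'
restore
append using `future'

* carry the last future value forward (_b: at or before dt + 5 minutes)
sort dt
foreach v of varlist fdt fmidpt {
    clonevar `v'_b = `v'
    replace `v'_b = `v'_b[_n - 1] if missing(`v'_b)
}

* carry the next future value backward (_a: at or after dt + 5 minutes)
gsort -dt
foreach v of varlist fdt fmidpt {
    clonevar `v'_a = `v'
    replace `v'_a = `v'_a[_n - 1] if missing(`v'_a)
}

format fdt* %tcDDmonCCYY_HH:MM:SS.sss

* pick whichever of _b and _a is closer to dt + 5 minutes
* (replace this rule with your own criterion, e.g. always use _b)
sort dt
generate choose_b = ///
    ((dt + `five_min') - fdt_b) < (fdt_a - (dt + `five_min'))
generate fdt_c = cond(choose_b, fdt_b, fdt_a)
generate fmidpt_c = cond(choose_b, fmidpt_b, fmidpt_a)
format fdt_c %tcDDmonCCYY_HH:MM:SS.sss

* keep only the original observations
drop if shifted == 1
drop shifted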
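For readers who do not use Stata, the same append-and-fill idea can be sketched in Python. It interleaves each row with a copy of the data shifted back five minutes, then carries the shifted midpoints down (`fmidpt_b`) and up (`fmidpt_a`). This is a hedged illustration, not a line-by-line translation. `append_and_fill` is a hypothetical name, and times are integer milliseconds after 09:00:00.000 from the question's sample. Ties on dt are broken explicitly here (shifted rows first), whereas Stata's `sort` without `stable` leaves the order of ties unspecified.

```python
FIVE_MIN_MS = 5 * 60 * 1000

def append_and_fill(times_ms, mids, horizon_ms=FIVE_MIN_MS):
    """Mirror of the append / fill-down / fill-up steps, for sorted integer ms times.

    Returns (fmidpt_b, fmidpt_a) per original row: the midpoint of the last row
    at or before t + horizon_ms, and of the first row strictly after it.
    """
    # Original rows: (dt, rank, row index, None).
    # Shifted copies: (fdt - horizon, rank, None, fmidpt), like the `future' file.
    # rank 0 sorts a shifted row ahead of an original row with the same dt.
    rows = [(t, 1, i, None) for i, t in enumerate(times_ms)]
    rows += [(t - horizon_ms, 0, None, m) for t, m in zip(times_ms, mids)]
    rows.sort(key=lambda r: (r[0], r[1]))  # stable: other ties keep input order

    before = [None] * len(times_ms)
    after = [None] * len(times_ms)

    carry = None  # fill down in dt order (the `sort dt' loop)
    for _, _, i, m in rows:
        if i is None:
            carry = m
        else:
            before[i] = carry

    carry = None  # fill up in reverse dt order (the `gsort -dt' loop)
    for _, _, i, m in reversed(rows):
        if i is None:
            carry = m
        else:
            after[i] = carry

    return list(zip(before, after))

times_ms = [2579, 3471, 3471, 3471, 3471, 3549, 3549,
            299785, 300829, 301209, 303057, 305055]
mids = [10.5125, 10.5125, 10.5125, 10.51, 10.51, 10.505, 10.5075,
        10.495, 10.4925, 10.49, 10.4875, 10.485]
pairs = append_and_fill(times_ms, mids)
print(pairs[0])  # (10.49, 10.4875)
```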
answered 2013-01-16T15:14:02.987
// Assumes the data are already sorted by datetime and contain the variables
// datetime, midpoint, type and minutes_filter (a per-minute identifier);
// type and minutes_filter are not shown in the question's sample.

// Target time to look up: 5 minutes (in milliseconds) after each observation
gen double midpoint_5 = (datetime + 5*60000)
format midpoint_5 %tcNN/DD/CCYY_HH:MM:SS.sss

// Will contain the observation number and midpoint found 5 minutes ahead
gen long _t = .
gen double midpoint_at5 = .

// How many observations in the sample?
local N = _N

// Used to limit the search window for each observation
egen obs_in_minute = count(minutes_filter), by(minutes_filter)
egen max_obs_in_minute = max(obs_in_minute)
set more off

// For each observation
forvalues i = 1/`N' {

    // Only look ahead for trades
    if type[`i'] == "Trade" {

        // The time to look up in the data
        local lookup = midpoint_5[`i']

        // Search window: skip ahead by this minute's count and stop after
        // 5 times the busiest minute's count, capped at the last observation
        // (if nothing in the window is later than the target, midpoint_at5 stays missing)
        local min = `i' + obs_in_minute[`i']
        local max = min(`N', `i' + max_obs_in_minute[`i']*5)

        // For each of these observations
        forvalues j = `min'/`max' {

            // Is this the first observation later than the target time?
            if `lookup' < datetime[`j'] {

                // Then the observation just before it is the last one at or before the target
                quietly replace _t = `j'-1 in `i'
                quietly replace midpoint_at5 = midpoint[`j'-1] in `i'

                // Found it; stop searching and move on to the next observation
                continue, break
            }
        }

        // Progress indicator
        display "`i'/`N'"
    }
}
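The loop above looks, for each trade, for the first observation later than t + 5 minutes and takes the one before it. Because the data are sorted by time, the target times never decrease. A single pointer that only moves forward therefore finds the same observation for every row in one pass, without the `obs_in_minute` search window. The Python sketch below shows this two-pointer variant, which is an alternative and not the answer's code. `is_trade` is a hypothetical list standing in for `type == "Trade"`, and times are milliseconds after 09:00:00.000 from the question's sample.

```python
FIVE_MIN_MS = 5 * 60 * 1000

def scan_last_at_or_before(times_ms, mids, is_trade, horizon_ms=FIVE_MIN_MS):
    """For each trade row, the midpoint of the last row at or before t + horizon_ms.

    times_ms must be sorted. j only ever moves forward, so the whole pass
    is O(N) and needs no search window.
    """
    out = [None] * len(times_ms)
    j = 0  # index of the first row later than the current target
    for i, t in enumerate(times_ms):
        target = t + horizon_ms
        while j < len(times_ms) and times_ms[j] <= target:
            j += 1
        if is_trade[i]:
            out[i] = mids[j - 1]  # j >= i + 1 because row i itself is <= target
    return out

times_ms = [2579, 3471, 3471, 3471, 3471, 3549, 3549,
            299785, 300829, 301209, 303057, 305055]
mids = [10.5125, 10.5125, 10.5125, 10.51, 10.51, 10.505, 10.5075,
        10.495, 10.4925, 10.49, 10.4875, 10.485]
is_trade = [True] * len(times_ms)  # hypothetical: treat every row as a trade
print(scan_last_at_or_before(times_ms, mids, is_trade)[0])  # 10.49
```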
answered 2013-01-17T07:20:02.940