r - TraMineR: extract events between equal states from SPELL-based sequence data

Question

Context

This question concerns sequence analysis using the TraMineR package. The package offers automatic transformation of temporal sequences (statuses in time) to event sequences (changes between statuses in time). A recurrent issue in my analyses is how to distinguish change events between equal statuses.

Question-specific example

Suppose we have sequences of employment statuses, e.g. work, unemployment, inactivity, retirement. The analysis is focused on career transitions, distinguishing between stable and transitional careers. All kinds of transitions are relevant, from work to unemployment, inactivity to work, but also (and most importantly) from work to work!

Question

For TraMineR, an event takes place when the status in a sequence changes. For instance, the respondent had 3 years of work and then 1 year of unemployment: Work-Work-Work-Unemployment (assuming an annual interval). This is the STS format, representing statuses in time. However, in the SPELL format we have additional information, e.g.:

Status         Time1 Time2

Work           1     2
Work           2     3
Work           3     3
Unemployment   3     4

From the table above we can clearly see that two work-to-work transition events have occurred (otherwise there would be just one line: Work from 1 to 3). The question is whether there is any convenient way to extract an event object from the sequence object based on these data.

Data

My data contains work-related respondent statuses in the SPELL format (status, begin & end time), like this:

to.SO <- structure(list(ID = c(10, 11, 11, 12, 13, 13, 13, 13, 14, 14,     
         14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15), status = c(1, 
         1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3,         
         1, 3), time1 = c(1, 1, 104, 1, 1, 60, 109, 121, 1, 42, 47, 54,         
         64, 72, 78, 85, 116, 1, 29, 39, 69, 74, 78, 88), time2 = c(125,        
         104, 125, 125, 60, 109, 121, 125, 42, 47, 54, 64, 72, 78, 85,          
         116, 125, 29, 39, 69, 74, 78, 88, 125)), .Names = c("ID", "status",    
         "time1", "time2"), row.names = 10:33, class = "data.frame")

What I have tried

As per this post I must convert SPELL to STS first, then define sequences:

sts.data <- seqformat(data=to.SO,from="SPELL",to="STS",
                 id="ID",begin="time1",end="time2",status="status",
                 limit=125,process=FALSE)

sts.seq <- seqdef(sts.data,right="DEL")
alphabed <- c("Work","Study","Unemployed")
alphabet(sts.seq) <- alphabed

The information I require is already lost at this step, but until the bug (see link) is resolved there is no other way. It still shows what I want to achieve:

sts.seqe <- seqecreate(sts.seq) # creating events
sts.seqe

My results

Here, the first four event sequences are identical. If you look at the SPELL data (to.SO), it is apparent that there are multiple work-to-work transitions for the respondents with IDs 11 and 13. In my other article I solved this by assigning different statuses to job-1, job-2 and so forth. This is a less desirable strategy, however, since (1) it explodes the number of statuses, making subsequent dissimilarity analysis difficult, and (2) it is not theoretically important which job in the career it is; the employment status alone should cover it.

Thanks

I imagine this goes beyond the existing package capabilities, but perhaps I am missing something. Thanks in advance for reading this long post and for any suggestions.

score 1 · Accepted Answer

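seqecreate accepts different kinds of input. One of them is a state sequence object (as built by seqdef). But you can also build an event sequence object by providing data in TSE format. To do so, you should specify three vectors: id, timestamp and event.

The SPELL format can be regarded as TSE-format data (if you ignore the end of each spell). The begin column gives the time at which the event in the status column occurs.

Hence, we can use the following code: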

## Start by giving some labels to the status vector
to.SO$event <- factor(to.SO$status, levels=1:3, labels=c("Work","Study","Unemployed"))
## Now, we can build the event sequences using seqecreate
## You may want to use timestamp=(to.SO$time1-1) instead. Events sequences start at time=0
seqe <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1, event=to.SO$event)
seqe

Now the fourth individual has the correct event sequence.

If you want to analyze the "Work>Work" transition, you need to recode your data.

## New vector holding our recoded events
event2 <- as.character(to.SO$event)
## For each row in the TSE data
for(i in 2:nrow(to.SO)){
    if(to.SO[i-1, "ID"]==to.SO[i, "ID"]) {## If we have the same ID (individual)
        if(to.SO[i-1, "event"]=="Work" && to.SO[i, "event"]=="Work"){ ## Two consecutive Work spells
           event2[i] <- "Work>Work"
        }
    }
}
## More general case
event3 <- as.character(to.SO$event)
## For each row in the TSE data
for(i in 2:nrow(to.SO)){
    if(to.SO[i-1, "ID"]==to.SO[i, "ID"]) {## If we have the same ID (individual)
        event3[i] <- paste(to.SO[i-1, "event"], to.SO[i, "event"], sep=">")
    }
}

By adapting this code, you can specify which transitions you are interested in.

seqe2 <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1-1, event=event2)
seqe2

Or

seqe3 <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1-1, event=event3)
seqe3
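The row-by-row loops above can also be written without a loop. The following is only a sketch on a small made-up data frame (the name spells and its values are hypothetical, not the asker's data), and it assumes the rows are already sorted by ID and then by start time:

```r
## Hypothetical toy SPELL data, sorted by ID and start time
spells <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  event = c("Work", "Work", "Unemployed", "Work", "Work"),
  time1 = c(1, 13, 25, 1, 40),
  stringsAsFactors = FALSE
)

n <- nrow(spells)
## Event of the previous row (NA for the very first row)
prev_event <- c(NA, spells$event[-n])
## TRUE when the previous row belongs to the same individual
same_id <- c(FALSE, spells$ID[-1] == spells$ID[-n])
## Label every spell after the first one as "previous>current"
recoded <- ifelse(same_id,
                  paste(prev_event, spells$event, sep = ">"),
                  spells$event)
recoded
```

The resulting vector plays the same role as event3 above and could be passed as the event argument of seqecreate in the same way.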

score 1 · Accepted Answer

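We can indeed imagine a solution that creates event sequences from spell data as you suggest. TraMineR does not currently offer this feature (but see Matthias's solution).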

The workaround you already gave in your question is to distinguish successive jobs as job1, job2, ...
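As a rough illustration of that recoding, the sketch below numbers the Work spells within each individual using base R only. The data frame spells and the column name status2 are made up for the example, and every Work row is assumed to be a separate job:

```r
## Hypothetical toy SPELL data, sorted by ID and start time
spells <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2),
  status = c("Work", "Work", "Unemployed", "Work", "Unemployed", "Work"),
  stringsAsFactors = FALSE
)

is_work <- spells$status == "Work"
## Running count of Work spells within each individual
job_no <- ave(as.integer(is_work), spells$ID, FUN = cumsum)
## Work spells become job1, job2, ...; other statuses are kept as they are
spells$status2 <- ifelse(is_work, paste0("job", job_no), spells$status)
spells
```

The recoded column can then replace the status column when converting from SPELL to STS.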

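I know this is less desirable, but you can use this strategy to define event sequences that assign the same event, e.g. "start new job", to each transition from job i to job i+1. To do so, you need to specify a matrix (tmat) of size a x a, where a is the size of your state alphabet, that lists in each cell (i, j) the events that occur when switching from state i to state j. For example, at the intersection of row job1 and column job2 you would give "start new job", and because switching from job2 to job1 should not be possible, you simply leave the corresponding cell empty. The cells on the diagonal, tmat(i, i), define the starting event when a state sequence starts in the corresponding state i. Once the matrix (tmat) is defined, assigning events to each possible transition, you can create the event sequence object as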

seqe <- seqecreate(sts2.seq, tevent=tmat)

And you can still use your original sts.seq state sequence for analyses with a single work state.
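For illustration, such a matrix could be built by hand along the following lines. This is a sketch under assumptions: the three-state alphabet and the event labels are invented for the example, and NA is used for empty cells on the assumption that this matches what seqetm() produces, so check ?seqetm before relying on it.

```r
## Hypothetical alphabet of the recoded state sequences
states <- c("job1", "job2", "Unemployed")

## Start with an all-empty a x a matrix, rows = origin, columns = destination
tmat <- matrix(NA_character_, nrow = length(states), ncol = length(states),
               dimnames = list(states, states))

## Diagonal: event emitted when a sequence starts in that state
diag(tmat) <- c("Work", "Work", "Unemployed")

## Off-diagonal: event emitted when switching from the row state to the column state
tmat["job1", "job2"]       <- "start new job"
tmat["job1", "Unemployed"] <- "Unemployed"
tmat["job2", "Unemployed"] <- "Unemployed"
tmat["Unemployed", "job1"] <- "Work"
tmat["Unemployed", "job2"] <- "Work"
## job2 -> job1 should not occur, so that cell stays empty
tmat
```

The matrix would then be passed as the tevent argument in the seqecreate call shown above, with a state sequence object whose alphabet has the same states in the same order.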

Hope this helps.
