0

I have a dataset containing some dates listed in the CSV as dd/mm/yy and my final aim is to separate out the year column into a separate column, however it seems to be giving me some very odd results.

When the CSV is input into R, the dataset (dframe1) shows the dates as factors automatically e.g. 8/3/05 but they aren't all in this format, some are 8/3/2005 and some are 08/03/2005 (as they should be) which means that when I convert them, some of the dates are coming up like they should (i.e., 2004-11-5) and some are coming out really strange (i.e., 0004-11-5).. I checked on the excel spreadsheet and they are all formatted exactly the same, written in the same etc., and the code is exactly the same for converting, but it just doesn't seem to work.

Here is an example of what it looks like what I import into R, even though in excel they are all the same:

2   11/11/04
3   11/11/04
4   11/11/04
5   11/11/04
6   11/11/04
7   11/11/04
8   11/11/04
9   11/11/04
10  5/3/05
11  5/3/05
12  5/3/05
13  5/3/05
14  5/3/05
15  5/3/05
16  5/3/05
17  3/11/05
18  3/11/05
19  3/11/05
20  3/11/05
21  3/11/05
22  3/11/05
23  3/11/05
24  3/11/05
25  3/11/05
26  3/11/05
27  9/6/06
28  9/6/06
29  9/6/06
30  9/6/06
31  9/6/06
32  9/6/06
33  9/6/06
34  9/6/06
35  11/8/06
36  11/8/06
37  11/8/06
38  11/8/06
39  11/8/06
40  11/8/06
41  11/8/06
42  11/8/06
43  22/02/2007
44  22/02/2007
45  22/02/2007
46  22/02/2007
47  22/02/2007
48  22/02/2007
49  22/02/2007
50  7/2/08
51  7/2/08
52  7/2/08
53  7/2/08
54  7/2/08
55  7/2/08
56  8/5/08
57  8/5/08
58  8/5/08
59  8/5/08
60  25/03/2012
61  25/03/2012
62  25/03/2012
63  25/03/2012
64  25/03/2012
65  25/03/2012

The code I am wanting to use to convert is as below:

data = dframe1

Set.date, Haul.date, Date.depart and Date.return are all columns within dframe1

First I make sure R knows that the date columns are actually dates (rather than factors, the automatic assumption)

dframe1$Set.date <- as.Date(dframe1$Set.date,"%d/%m/%Y")
dframe1$Haul.date <- as.Date(dframe1$Haul.date, format ="%d/%m/%Y")
dframe1$Date.depart <- as.Date(dframe1$Date.depart, format ="%d/%m/%Y")
dframe1$Date.return <- as.Date(dframe1$Date.return, format ="%d/%m/%Y")

Next I want to separate and add a column for year (can also do with day and month but won't bother this time)

dframe1$Set.year <- format(dframe1$Set.date[1], "%Y")
dframe1$Haul.year <- format(dframe1$Haul.date[1], "%Y")
dframe1$Year.depart <- format(dframe1$Date.depart[1], "%Y") 
dframe1$Year.return <- format(dframe1$Date.return[1], "%Y")
4

2 回答 2

2

First, Excel doesn't show you the actual text format of a date/time string as it exists in the CSV file -- it converts them to Excel format for display. You should open your CSV in a text editor and look at the dates there to verify the format.

Second, you can use the lubridate package to make working with dates/times much easier. For your example, you can do something like this:

library(lubridate)

dframe1$Set.date <- dmy(dframe1$Set.date)

dframe1$Set.year <- year(dframe1$Set.date)

In lubridate, if the general date format is known, you can parse character dates with functions like dmy(), ymd_hms(), and so on. You can also use the parse_date_time() function and specify orders as you did in as.Date(). Lubridate also has year(), month(), and other functions for extracting specific elements of date/timestamps.

Lubridate stores dates as POSIXct objects by default, not Date objects, but that shouldn't matter for your purposes.

于 2014-12-09T22:58:21.327 回答
1

If all your dates are in the last 14 years, then this should work:

dat$dat2 <- gsub("/([0-9]{2})$", "/20\\1", dat$date)
library(lubridate)

dat$dat3 <- dmy(dat$dat2)

The gsub function will ignore teh items that do not have exactly 2 digits between a forward-slash and the end of the string. If you have some values in the 1900's then there should be ways of converting those. Perhaps:

dat$dat2 <- gsub("/([7-9][0-9])$", "/19\\1", dat$date)
于 2014-12-10T05:05:04.403 回答