
How can I import data from a .xlsx file into R so that numbers are represented as numbers, when their original decimal separator is a comma rather than a dot?

The only package I know of for dealing with Excel files is readxl from the tidyverse.

I'm looking for a solution that doesn't require opening and editing the Excel files in any other software (and that can handle hundreds of columns). If that were an option, I'd export all the Excel files to .csv and import them with tools I know that accept a dec= argument.

So far my best working solution is to import the numbers as character and then convert them:

library(dplyr)
library(stringr)

var1 <- c("2,1", "3,2", "4,5")
var2 <- c("1,2", "3,33", "5,55")
var3 <- c("3,44", "2,2", "8,88")
# Keep the columns as character (the default only since R 4.0)
df <- data.frame(var1, var2, var3, stringsAsFactors = FALSE)

# Replace the decimal comma with a dot, then convert to numeric
df %>%
  mutate(across(contains("var"),
                ~ as.numeric(str_replace(.x, ",", "."))))
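To handle hundreds of columns without naming them, one option is to read every column as text, for example with readxl::read_excel(path, col_types = "text"), and then convert whichever columns parse cleanly. The sketch below uses base R only. `comma_cols_to_numeric` is a hypothetical helper name, and it assumes a column should become numeric only when swapping the comma for a dot introduces no new NA values.

```r
# Hypothetical helper: convert every character column whose values all parse
# as numbers once the decimal comma is replaced by a dot.
comma_cols_to_numeric <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)          # leave non-text columns alone
    converted <- suppressWarnings(
      as.numeric(gsub(",", ".", col, fixed = TRUE))
    )
    # Accept the conversion only if it introduced no new NAs
    if (all(is.na(converted) == is.na(col))) converted else col
  })
  df
}

df <- data.frame(
  var1 = c("2,1", "3,2", "4,5"),
  var2 = c("1,2", "3,33", "5,55"),
  id   = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
str(comma_cols_to_numeric(df))
```

Because gsub() leaves values that already use a period unchanged, the same helper should also cope with numeric cells that readxl has rendered as text with a period decimal.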

3 Answers


I strongly suspect that there is another reason these columns are being read as character: most likely they are the dreaded "Number Stored as Text".

For ordinary numbers (stored as numbers), readxl::read_excel reads them in as numeric correctly, even after switching the decimal separator to a comma, either for an individual file or in the overall system settings. (This is on my Windows system.) Even when I add a character to one of the cells in that column or set col_types = "text", the numbers are read in with a period as the decimal separator, not a comma, which is further evidence that readxl uses the internally stored data type.

The only way I have gotten R to read in a comma as a decimal separator is when the data is stored in Excel as text instead of as a number. (You can enter such a value by prefixing the number with a single quote, like '1,7.) Excel then shows a small green triangle in the corner of the cell, with the popup warning "Number Stored as Text". While exploring, I was surprised to discover that Excel will still do calculations on numbers stored as text, so whether formulas work is not a reliable way to check for this.
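To check which imported columns look like "Number Stored as Text" with a decimal comma, one could scan the character columns with a regular expression. This is a hypothetical helper (`find_comma_text_cols`), and the pattern is an assumption: it accepts an optional minus sign, digits, and at most one decimal comma, so it will not match values with thousands separators, a period decimal, or surrounding spaces.

```r
# Hypothetical diagnostic: list character columns in which every non-missing
# value looks like a number written with a decimal comma, e.g. "1,7".
find_comma_text_cols <- function(df, pattern = "^-?[0-9]+(,[0-9]+)?$") {
  looks_numeric <- vapply(df, function(col) {
    if (!is.character(col)) return(FALSE)        # already numeric, date, etc.
    vals <- col[!is.na(col)]
    length(vals) > 0 && all(grepl(pattern, vals))
  }, logical(1))
  names(df)[looks_numeric]
}

df <- data.frame(
  a = c("1,7", "2", NA),
  b = c(1.5, 2, 3),
  c = c("x", "1,2", "3"),
  stringsAsFactors = FALSE
)
find_comma_text_cols(df)
```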

answered 2018-01-11T17:57:36.267

It's pretty easy to replace the "," with a "." and recast the column as numeric. Example:

> x <- c('1,00','2,00','3,00')
> df <- data.frame(x)
> df
     x
1 1,00
2 2,00
3 3,00
> df$x <- gsub(',','.',df$x)
> df$x <- as.numeric(df$x)
> df
  x
1 1
2 2
3 3
> class(df$x)
[1] "numeric"
> 

Just using base R and gsub.

answered 2018-01-05T20:20:54.490

I just had the same problem with an Excel spreadsheet I had received from a colleague. After importing the file with readxl failed, I converted it to a .csv file, hoping to solve the problem with read_delim by adjusting the locale and decimal mark options. But the problem persisted no matter which options I used.

Here is the solution that worked for me: I found out that the character used in the cells containing missing values (. in my case) was causing the trouble. I went back to the Excel file and replaced . with blanks in all cells with missing values, keeping the default decimal separator (,). After that, all columns were imported correctly as numeric using readxl.

If you face this problem with your decimal separator set to ".", make sure to tick the box "Match entire cell contents" in Excel before replacing all instances of the missing-value marker ".".
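As an alternative to editing the workbook, readxl::read_excel() has an `na` argument listing strings to treat as missing, so something like read_excel(path, na = ".") may achieve the same result inside R. If the column has already been read as text, the marker can be blanked out before conversion. `parse_with_na_marker` below is a hypothetical base-R helper that assumes "." is the only missing-value marker and "," the decimal separator.

```r
# Hypothetical helper: treat a missing-value marker (here ".") as NA,
# then replace decimal commas with dots and parse as numeric.
parse_with_na_marker <- function(x, na_marker = ".") {
  x <- trimws(x)
  x[x %in% na_marker] <- NA_character_            # marker cells become NA
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

parse_with_na_marker(c("1,5", ".", "2,25"))
```

Replacing the marker before calling gsub() matters: otherwise a lone "." would simply fail to parse and as.numeric() would emit a coercion warning for every such cell.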

answered 2019-09-24T17:45:03.407