
Is there a way to determine the type of an MS Office Excel file in Apache POI? I need to know which format the Excel file is in: the binary Excel 97-2003 format (.xls) or the Excel 2007 OOXML format (.xlsx).

I suppose I could do something like this:

int type = PoiTypeHelper.getType(file);
switch (type) {
case PoiType.EXCEL_1997_2007:
   ...
   break;
case PoiType.EXCEL_2007:
   ...
   break;
default:
   ...
}

Thanks.


Promoting a comment to an answer...

If you're going to be doing something special with the files, then rjokelai's answer is the way to do it.

However, if you're just going to be using the HSSF / XSSF / Common SS usermodel, it's much simpler to let POI do the work: use WorkbookFactory, which detects the type and opens the file for you. You'd do something like:

 Workbook wb = WorkbookFactory.create(new File("something.xls"));

or

 Workbook wb = WorkbookFactory.create(request.getInputStream());

Then, if you need to do something special, test whether it's an HSSFWorkbook or an XSSFWorkbook. When opening the file, use a File rather than an InputStream if possible, to speed things up and save memory.

If you don't know what your file is at all, use Apache Tika to do the detection - it can detect a huge number of different file formats for you.

answered 2013-01-25T14:58:42.160

You can use:

try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
    // For .xlsx
    boolean isXlsx = POIXMLDocument.hasOOXMLHeader(in);
    // For .xls (the previous check reset the stream, so it can be reused)
    boolean isXls = POIFSFileSystem.hasPOIFSHeader(in);
}

These are essentially the methods that WorkbookFactory#create(InputStream) uses to determine the type.

Note that both methods only work with streams that support "mark" (or with a PushbackInputStream), so a plain FileInputStream is not supported; wrap it in a BufferedInputStream. Because both methods reset the stream to its starting point, you can simply reuse the stream after detection.
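To see why the mark/reset requirement matters, here is a minimal stdlib-only sketch of the peek-then-rewind pattern these helpers rely on. It does not use POI at all; `ensureMarkSupported` and `peekHeader` are hypothetical names chosen for illustration, and Java 9+ is assumed for `readNBytes`/`readAllBytes`.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.Arrays;

public class HeaderPeek {

    // A plain FileInputStream returns false from markSupported(), so wrap it.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to n bytes, then rewinds to where the stream was before the call.
    static byte[] peekHeader(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(n);
        try {
            byte[] buf = new byte[n];
            int read = in.readNBytes(buf, 0, n);
            return Arrays.copyOf(buf, read);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("peek", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00});

        try (InputStream in = ensureMarkSupported(new FileInputStream(tmp))) {
            byte[] header = peekHeader(in, 4);
            byte[] all = in.readAllBytes(); // starts again from byte 0
            System.out.println(Arrays.toString(header)); // [80, 75, 3, 4]
            System.out.println(all.length);              // 6
        }
    }
}
```

The same wrapped stream can be handed on to the real parser after the peek, which is what makes the BufferedInputStream wrapper worthwhile.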

answered 2013-01-25T13:12:15.940

This can be done using the FileMagic class. See the JavaDoc: https://poi.apache.org/apidocs/org/apache/poi/poifs/filesystem/FileMagic.html

Sample code snippet:

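// valueOf(InputStream) needs a stream that supports mark/reset, so wrap it first
FileMagic.valueOf(FileMagic.prepareToCheckMagic(inputStream)) == FileMagic.OOXML // XLSX (or any other OOXML package)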
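For intuition, here is a stdlib-only sketch of the kind of check FileMagic performs: compare the first bytes against the OLE2 compound-file signature (used by .xls) and the ZIP local-file-header signature (used by .xlsx). The `Container` enum and `detect` method are hypothetical; POI's real FileMagic recognizes many more formats, and note that any ZIP file, not only an .xlsx, starts with "PK\3\4".

```java
import java.util.Arrays;

public class MagicSniffer {

    // Hypothetical result type; POI's FileMagic enum has many more constants.
    enum Container { OLE2, OOXML, UNKNOWN }

    // OLE2 / Compound File Binary signature used by .xls (and .doc, .ppt, ...)
    static final byte[] OLE2_SIG = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; .xlsx is a ZIP package, but so is any .zip
    static final byte[] ZIP_SIG = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static Container detect(byte[] header) {
        if (startsWith(header, OLE2_SIG)) return Container.OLE2;
        if (startsWith(header, ZIP_SIG)) return Container.OOXML;
        return Container.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] xlsHeader = Arrays.copyOf(OLE2_SIG, 8);
        byte[] xlsxHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(detect(xlsHeader));            // OLE2
        System.out.println(detect(xlsxHeader));           // OOXML
        System.out.println(detect("a,b,c\n".getBytes())); // UNKNOWN
    }
}
```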

answered 2018-04-29T15:17:51.547

This is based on the library's implementation of org.apache.poi.ss.usermodel.WorkbookFactory#create(java.io.InputStream).

We can mimic WorkbookFactory's logic, strip out the irrelevant parts, and return the file type instead.

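public static TYPE fileType(File file) {
    try (
            InputStream inp = new FileInputStream(file)
    ) {
        // FileInputStream never supports mark(), so in practice this branch is always taken
        if (!inp.markSupported()) {
            return getNotMarkSupportFileType(file);
        }
        return getType(inp);
    } catch (IOException e) {
        // LOGGER: a logger field declared elsewhere in the enclosing class (not shown)
        LOGGER.error("Analyse FileType Problem.", e);
        return TYPE.INVALID;
    }
}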

private static TYPE getNotMarkSupportFileType(File file) throws IOException {
    try (
            InputStream inp = new PushbackInputStream(new FileInputStream(file), 8)
    ) {
        return getType(inp);
    }
}

private static TYPE getType(InputStream inp) throws IOException {
    byte[] header8 = IOUtils.peekFirst8Bytes(inp);
    if (NPOIFSFileSystem.hasPOIFSHeader(header8)) {
        // OLE2 container: either a .xls or an encrypted .xlsx; close it when done
        try (NPOIFSFileSystem fs = new NPOIFSFileSystem(inp)) {
            return fileType(fs);
        }
    } else if (DocumentFactoryHelper.hasOOXMLHeader(inp)) {
        return TYPE.XSSF_WORKBOOK;
    }
    return TYPE.INVALID;
}

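private static TYPE fileType(NPOIFSFileSystem fs) {
    DirectoryNode root = fs.getRoot();
    // A password-protected .xlsx is stored inside an OLE2 container
    // as an "EncryptedPackage" stream
    if (root.hasEntry("EncryptedPackage")) {
        return TYPE.XSSF_WORKBOOK;
    }
    return TYPE.HSSF_WORKBOOK;
}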

public enum TYPE {
    HSSF_WORKBOOK, XSSF_WORKBOOK, INVALID
}
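The `PushbackInputStream(..., 8)` wrapper in `getNotMarkSupportFileType` works because bytes can be pushed back after they are read, so the header is "peeked" without being consumed. Below is a stdlib-only sketch of that peek-by-unread technique; `peekAndUnread` is a hypothetical helper for illustration, not POI's `IOUtils.peekFirst8Bytes`, and Java 9+ is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

public class PushbackPeek {

    // Reads up to n bytes and pushes them back, so the next read() sees them again.
    // The PushbackInputStream must have been created with a pushback buffer of at least n bytes.
    static byte[] peekAndUnread(PushbackInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int read = in.readNBytes(buf, 0, n);
        in.unread(buf, 0, read);
        return Arrays.copyOf(buf, read);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0x01, 0x02, 0x03, 0x04, 0x05};
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 8)) {
            byte[] header = peekAndUnread(in, 8);
            System.out.println(header.length);            // 8
            System.out.println(in.readAllBytes().length); // 9: nothing was consumed
        }
    }
}
```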
answered 2017-06-17T10:16:41.720

This is how I check whether the requested file is an Office document:

public static boolean isOfficeDoc(String filePath) {
    FileMagic fileMagic = getFileMagicObj(filePath);
    // OLE2 covers .xls/.doc/.ppt; OOXML covers .xlsx/.docx/.pptx (null compares as false)
    return fileMagic == FileMagic.OLE2 || fileMagic == FileMagic.OOXML;
}

private static FileMagic getFileMagicObj(String filePath) {
    // prepareToCheckMagic wraps the stream so that it supports mark/reset
    try (InputStream is = new FileInputStream(filePath);
         InputStream magicIS = FileMagic.prepareToCheckMagic(is)) {
        return FileMagic.valueOf(magicIS);
    } catch (IOException e) { // also covers FileNotFoundException
        e.printStackTrace();
        return null;
    }
}
answered 2019-10-03T11:12:16.020