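End goal: efficiently (in a single pass) read all CellRecords on a huge (30,000+ rows), protected Worksheet.

Question: Using HSSF.EventUserModel, how can I read all Records (including CellRecords) from an XLS file that has both Workbook and Worksheet protection?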

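Creating the input spreadsheet (in Excel 2010):

  1. Create a new blank workbook.
  2. Set the value of A1 to the number: 50
  3. Set the value of A2 to the string: fifty
  4. Set the value of A3 to the formula: =25*2
  5. Review (ribbon) -> Protect Sheet -> Password: pass1
  6. Review (ribbon) -> Protect Workbook -> Password: pass1
  7. File (ribbon) -> Save As... -> Save as type: Excel 97-2003 Workbook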

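Progress so far:

  • The XLS file opens in Excel without a password, so you should not need a password to open it in POI.
  • The XLS file opens successfully with new HSSFWorkbook(Stream fs). However, I need the efficiency of the EventUserModel for my actual spreadsheet.
  • Setting NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1"; did not work.
  • The ProcessRecord( ) function catches a PasswordRecord, but I cannot find any documentation on how to handle it properly.
  • Perhaps the EncryptionInfo or Decryptor classes may be of some use.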

Note:
I am using NPOI. However, I can translate any Java example to C#.

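Code:
I use the following code to capture Record events. My Book1-unprotected.xls (no protection) shows all Record events (including the cell values). My Book1-protected.xls shows some of the records and then throws an exception.

I am just viewing processedRecords in the debugger.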

using System;
using System.Collections.Generic;
using System.IO;

using NPOI.HSSF.Record;
using NPOI.HSSF.Model;
using NPOI.HSSF.UserModel;
using NPOI.HSSF.EventUserModel;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;

namespace NPOI_small {
    class myListener : IHSSFListener {
        List<Record> processedRecords;

        private Stream fs;

        public myListener(Stream fs) {
            processedRecords = new List<Record>();
            this.fs = fs;

            HSSFEventFactory factory = new HSSFEventFactory();
            HSSFRequest request = new HSSFRequest();

            MissingRecordAwareHSSFListener mraListener;
            FormatTrackingHSSFListener fmtListener;
            EventWorkbookBuilder.SheetRecordCollectingListener recListener;
            mraListener = new MissingRecordAwareHSSFListener(this);
            fmtListener = new FormatTrackingHSSFListener(mraListener);
            recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
            request.AddListenerForAllRecords(recListener);

            POIFSFileSystem poifs = new POIFSFileSystem(this.fs);

            factory.ProcessWorkbookEvents(request, poifs);
        }

        public void ProcessRecord(Record record) {
            processedRecords.Add(record);
        }
    }
    class Program {
        static void Main(string[] args) {
            Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");

            myListener testListener = new myListener(fs); // Use EventModel 
            //HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel

            Console.Read();
        }
    }
}

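Update (for Juan Mellado): The exception is shown below. My best guess right now (from Victor Petrykin's answer) is that HSSFEventFactory uses RecordInputStream, which cannot natively decrypt protected records. After the exception is thrown, processedRecords contains 22 records, including the following potentially significant ones:

  • processedRecords[5] is a WriteAccessRecord with a garbled (probably encrypted) value in .name
  • processedRecords[22] is a RefreshAllRecord and is the last Record in the list

Exception: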

NPOI.Util.RecordFormatException was unhandled
  HResult=-2146233088
  Message=Unable to construct record instance
  Source=NPOI
  StackTrace:
       at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
       at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
       at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: NPOI.Util.RecordFormatException
       HResult=-2146233088
       Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
       Source=NPOI
       StackTrace:
            at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
            at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
            at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)
1 Answer

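I think this is a bug in the NPOI library code. As far as I understand, HSSFEventFactory uses the wrong stream type: it uses RecordInputStream instead of RecordFactoryInputStream, which contains the decryption logic and is what the original POI library and the UserModel use (that is why HSSFWorkbook works).

The following code also works, but it is not event-driven logic: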

POIFSFileSystem poifs = new POIFSFileSystem(fs);
Entry document = poifs.Root.GetEntry("Workbook");
DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);
//RecordFactory factory = new RecordFactory();
//List<Record> records = RecordFactory.CreateRecords(docStream);
RecordFactoryInputStream recFacStream = new RecordFactoryInputStream(docStream, true);
Record currRecord;
while ((currRecord = recFacStream.NextRecord()) != null) 
   ProcessRecord(currRecord);
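
If the loop above does work on the protected file, one way to keep an event-style design is to forward each record to an IHSSFListener. The following is only a sketch under that assumption: DecryptingRecordPump and Pump are hypothetical names (not part of NPOI), and the only NPOI calls used are the ones already shown in the question and in the snippet above.

```csharp
using System.IO;

using NPOI.HSSF.EventUserModel;
using NPOI.HSSF.Record;
using NPOI.POIFS.FileSystem;

namespace NPOI_small {
    // Hypothetical helper, not part of NPOI: reads the workbook stream through
    // RecordFactoryInputStream and hands every record to the given listener.
    static class DecryptingRecordPump {
        // Returns the number of records forwarded to the listener.
        public static int Pump(Stream fs, IHSSFListener listener) {
            POIFSFileSystem poifs = new POIFSFileSystem(fs);

            // "Workbook" is the entry name used in the snippet above.
            DocumentEntry entry = (DocumentEntry)poifs.Root.GetEntry("Workbook");
            DocumentInputStream docStream = new DocumentInputStream(entry);

            // Same constructor arguments as in the snippet above.
            RecordFactoryInputStream recStream = new RecordFactoryInputStream(docStream, true);

            int count = 0;
            Record record;
            while ((record = recStream.NextRecord()) != null) {
                listener.ProcessRecord(record);
                count++;
            }
            return count;
        }
    }
}
```

In the question's myListener constructor, the call factory.ProcessWorkbookEvents(request, poifs) could then be swapped for DecryptingRecordPump.Pump(this.fs, recListener), so the existing listener chain still receives the records. Note that this bypasses HSSFRequest entirely, so any per-record-type listener registration made on the request would not apply.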
answered 2013-04-09T13:07:32.813