I need help using textscan. I am trying to read data in the following format:
# ---------------------------------- WARNING ----------------------------------------
# The data you have obtained from this automated U.S. Geological Survey database
# have not received Director's approval and as such are provisional and subject to
# revision. The data are released on the condition that neither the USGS nor the
# United States Government may be held liable for any damages resulting from its use.
# Additional info: http://nwis.waterdata.usgs.gov/nwis/help/?provisional
#
# File-format description: http://nwis.waterdata.usgs.gov/nwis/?tab_delimited_format_info
# Automated-retrieval info: http://nwis.waterdata.usgs.gov/nwis/?automated_retrieval_info
#
# Contact: gs-w_support_nwisweb@usgs.gov
# retrieved: 2013-09-13 13:10:29 EDT (nadww01)
#
# Data for the following 1 site(s) are contained in this file
# USGS 08067074 CWA Canal at Thompson Rd nr Baytown, TX
# -----------------------------------------------------------------------------------
#
# Data provided for site 08067074
# DD parameter Description
# 01 00010 Temperature, water, degrees Celsius
# 02 00095 Specific conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius
#
# Data-value qualification codes included in this output:
# A Approved for publication -- Processing and review completed.
# P Provisional data subject to revision.
#
agency_cd site_no datetime tz_cd 01_00010 01_00010_cd 02_00095 02_00095_cd
5s 15s 20d 6s 14n 10s 14n 10s
USGS 08067074 2013-01-05 00:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:45 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:45 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:15 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:15 CST 10.2 A 392 A
USGS 08067074 2013-01-05 04:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:15 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 06:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 06:15 CST 10.1 A 391 A
USGS 08067074 2013-01-05 06:30 CST 10.1 A 391 A
USGS 08067074 2013-01-05 06:45 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:00 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:15 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:30 CST 10.1 A 390 A
USGS 08067074 2013-01-05 07:45 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:00 CST 10.0 A 390 A
USGS 08067074 2013-01-05 08:15 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:30 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:45 CST 10.0 A 390 A
USGS 08067074 2013-01-05 09:00 CST 10.0 A 390 A
USGS 08067074 2013-01-05 09:15 CST 10 A 390 A
USGS 08067074 2013-01-05 09:30 CST 10 A 390 A
USGS 08067074 2013-01-05 09:45 CST 10 A 390 A
USGS 08067074 2013-01-05 10:00 CST 10 A 390 A
USGS 08067074 2013-01-05 10:15 CST 10 A 390 A
USGS 08067074 2013-01-05 10:30 CST 10 A 390 A
USGS 08067074 2013-01-05 10:45 CST 10 A 390 A
USGS 08067074 2013-01-05 11:00 CST 10 A 390 A
USGS 08067074 2013-01-05 11:15 CST 10 A 390 A
USGS 08067074 2013-01-05 11:30 CST 10 A 390 A
USGS 08067074 2013-01-05 11:45 CST 10 A 389 A
USGS 08067074 2013-01-05 12:00 CST 10 A 389 A
USGS 08067074 2013-01-05 12:15 CST 10 A 389 A
USGS 08067074 2013-01-05 12:30 CST 10 A 389 A
USGS 08067074 2013-01-05 12:45 CST 10 A 389 A
USGS 08067074 2013-01-05 13:00 CST 10 A 389 A
USGS 08067074 2013-01-05 13:15 CST 10 A 389 A
USGS 08067074 2013-01-05 13:30 CST 10 A 389 A
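The only two data entries I care about are the date and the specific conductance (columns 3 and 7, respectively).
I was able to extract them reliably with the following code: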
%%
% Collect conductance data
filename = 'conductivityData_Temp_File';
%%
% Determine length of data file
fid = fopen(filename,'r');
fseek(fid, 0, 'eof');
chunksize = ftell(fid);
fseek(fid, 0, 'bof');
ch = fread(fid, chunksize, '*uchar');
N = sum(ch == sprintf('\n')); % number of lines
fclose(fid);
%%
% Read conductivity data
fileconductID = fopen(filename);
waterConductivityData = textscan(fileconductID, '%s %d %s %s %f %s %f %s', N, 'delimiter', '\t', 'EmptyValue', 0, 'headerlines', 27);
fclose(fileconductID);
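However, I found out that you can simply use 'CommentStyle' to ignore the comments. This matters because I am reading multiple files, and occasionally I come across one that does not contain 27 comment lines, which makes my program throw an error.
Can someone tell me how to adjust my textscan call so that it ignores the comment lines and skips the two header lines?
I apologize if the sample code I provided is complicated, but essentially my problem lies in this one line: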
waterConductivityData = textscan(fileconductID, '%s %d %s %s %f %s %f %s', N, 'delimiter', '\t', 'EmptyValue', 0, 'headerlines', 27);
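(If you would like to download a sample tab-delimited file, use this link: here)
Thanks!
Answer:
Thanks TryHard, that is a good approach, but I wanted to stay closer to what I was doing before. Apparently my delimiter was off.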
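% Read every column as text; 'CommentStyle' skips the lines that start with '#'
waterConductivityData = textscan(fileconductID,'%s %s %s %s %s %s %s %s %s ' , 'Delimiter', '\t', 'CommentStyle', '#');
% Rows 1 and 2 are the two header lines (column names and format codes), so start at row 3
dates = waterConductivityData{3}(3:end);
conductancesStr = waterConductivityData{7}(3:end);
temperaturesStr = waterConductivityData{5}(3:end);
% Convert the numeric columns from text to double
conductances = str2double(conductancesStr);
temperatures = str2double(temperaturesStr);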
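For anyone who would rather keep the typed format specifiers from the question, here is a rough sketch of another way to do it. As far as I know, 'HeaderLines' counts lines from the top of the file, so on its own it cannot skip the two header rows that follow a comment block of varying length. The sketch therefore steps past the '#' lines with fgetl, discards the format row, and only then calls textscan. The function name readNwisFile is made up for illustration, and the format string assumes the eight tab-separated columns shown in the sample above.

```matlab
function [dates, temperatures, conductances] = readNwisFile(filename)
% readNwisFile  Hypothetical helper, not part of the original post.
% Assumes a tab-delimited file with leading '#' comment lines followed by
% two header rows (column names, then format codes) and eight columns.

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs

% Read past the leading comment lines. When the loop ends, the
% column-name row has already been consumed into tline.
tline = fgetl(fid);
while ischar(tline) && strncmp(tline, '#', 1)
    tline = fgetl(fid);
end

% Discard the second header row (the format codes such as 5s, 15s, 20d).
fgetl(fid);

% Columns: agency, site, datetime, tz, temperature, code, conductance, code.
% site_no is read as text so that its leading zero is kept.
data = textscan(fid, '%s %s %s %s %f %s %f %s', 'Delimiter', '\t');

dates        = data{3};   % cell array of date strings
temperatures = data{5};   % numeric column vector
conductances = data{7};   % numeric column vector
end
```

It would be called as [dates, temperatures, conductances] = readNwisFile('conductivityData_Temp_File'); and, because nothing depends on the comment block having a fixed number of lines, it should cope with files whose headers differ in length.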