
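I'm trying to read a CSV-style file with mixed delimiters in Rust using csv and serde, but I'm struggling to see how to do it with these libraries. Each line looks roughly like this:

value1|value2|subvalue1,subvalue2,subvalue3|value4

and should be deserialized into a struct like this: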

struct Line {
    value1:u64,
    value2:u64,
    value3:Vec<u64>,
    value4:u64,
}

Any guidance on how to tell the library that there are two different delimiters, and that one of the columns has this nested structure?




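OK, I'm still a Rust beginner, so I can't promise this is all good, and I suspect it could be done more efficiently, but I do have a working solution: read each line with a `|`-delimited reader, and deserialize the nested column by running a second, `,`-delimited reader over that one field.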

use csv::ReaderBuilder;
use serde::de::Error;
use serde::{Deserialize, Deserializer};
use std::error::Error as StdError;

// The nested column: every comma-separated value of one field.
#[derive(Debug, Deserialize)]
pub struct ListType {
    values: Vec<u8>,
}

// Deserializes a single `|`-delimited field by treating it as its own
// `,`-delimited CSV record.
fn deserialize_list<'de, D>(deserializer: D) -> Result<ListType, D::Error>
where
    D: Deserializer<'de>,
{
    let buf: &str = Deserialize::deserialize(deserializer)?;

    let mut rdr = ReaderBuilder::new()
        .delimiter(b',')
        .has_headers(false)
        .from_reader(buf.as_bytes());

    if let Some(result) = rdr.deserialize().next() {
        let record: ListType = result.map_err(D::Error::custom)?;
        Ok(record)
    } else {
        Err(D::Error::custom("empty list field"))
    }
}

// `Deserialize` must be derived for the `#[serde(...)]` attribute to apply.
#[derive(Debug, Deserialize)]
struct Line {
    value1: u64,
    value2: u64,
    #[serde(deserialize_with = "deserialize_list")]
    value3: ListType,
    value4: u64,
}

fn read_line(line: &str) -> Result<Line, Box<dyn StdError>> {
    // has_headers(false): otherwise the only line is consumed as a header row.
    let mut rdr = ReaderBuilder::new()
        .delimiter(b'|')
        .has_headers(false)
        .from_reader(line.as_bytes());

    if let Some(result) = rdr.deserialize().next() {
        let record: Line = result?;
        Ok(record)
    } else {
        Err(From::from("no record found"))
    }
}
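If the data never uses quoting or escaping, the same two-level split can also be done without csv or serde at all. The sketch below is an alternative under that assumption, not the approach above: it implements `FromStr` for the question's `Line` (with `value3: Vec<u64>`) using only `str::split`, and `LineError` is a hypothetical error type added for illustration.

```rust
use std::num::ParseIntError;
use std::str::FromStr;

/// The struct from the question, with the nested column as a Vec.
#[derive(Debug, PartialEq)]
struct Line {
    value1: u64,
    value2: u64,
    value3: Vec<u64>,
    value4: u64,
}

/// Hypothetical error type for this hand-rolled parser.
#[derive(Debug, PartialEq)]
enum LineError {
    /// The line did not have exactly four `|`-separated fields.
    WrongFieldCount(usize),
    /// A field (or sub-field) was not a valid u64.
    BadNumber(ParseIntError),
}

impl From<ParseIntError> for LineError {
    fn from(e: ParseIntError) -> Self {
        LineError::BadNumber(e)
    }
}

impl FromStr for Line {
    type Err = LineError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Outer split on '|'. Assumes no quoted fields containing '|' or ','.
        let fields: Vec<&str> = s.trim_end().split('|').collect();
        if fields.len() != 4 {
            return Err(LineError::WrongFieldCount(fields.len()));
        }
        // Inner split on ','; an empty third column becomes an empty Vec.
        let value3 = if fields[2].is_empty() {
            Vec::new()
        } else {
            fields[2]
                .split(',')
                .map(str::parse::<u64>)
                .collect::<Result<Vec<u64>, _>>()?
        };
        Ok(Line {
            value1: fields[0].parse()?,
            value2: fields[1].parse()?,
            value3,
            value4: fields[3].parse()?,
        })
    }
}
```

With this, `"1|2|10,20,30|4".parse::<Line>()` gives `value3 == vec![10, 20, 30]`. It avoids constructing a second csv reader for every field, but a quoted field containing `|` or `,` would be split incorrectly, which is what the csv crate handles for you.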

[Edit]

I found the solution above unbearably slow, but I was able to get acceptable performance by deserializing the nested type by hand into a fixed-size array:

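use serde::de::Error;
use serde::{Deserialize, Deserializer};

#[derive(Debug, Deserialize)]
pub struct ListType {
    values: [Option<u8>; 8],
}

// Splits the field on ',' by hand instead of running a second CSV reader.
// Unused slots stay `None`. Use it on the struct field with
// #[serde(deserialize_with = "deserialize_farray")].
fn deserialize_farray<'de, D>(deserializer: D) -> Result<ListType, D::Error>
where
    D: Deserializer<'de>,
{
    let buf: &str = Deserialize::deserialize(deserializer)?;

    let mut dest = ListType { values: [None; 8] };

    for (ind, tok) in buf.split(',').enumerate() {
        // Stop at the first empty token (empty field or trailing comma).
        if tok.is_empty() {
            break;
        }
        // Report too many values instead of panicking on an out-of-bounds index.
        if ind >= dest.values.len() {
            return Err(D::Error::custom("more than 8 values in list field"));
        }
        let val = tok.parse::<u8>().map_err(D::Error::custom)?;
        dest.values[ind] = Some(val);
    }

    Ok(dest)
}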
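The bounded-array logic can also be pulled out into a plain function so it can be tested without any CSV reader; `deserialize_farray` would then only call it and wrap the error with `D::Error::custom`. This is a minimal sketch assuming Rust 1.51+ (for const generics); `parse_bounded` and `ArrayError` are hypothetical names, not part of csv or serde.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for the bounded parser.
#[derive(Debug, PartialEq)]
enum ArrayError {
    /// More comma-separated values than the array has slots.
    TooMany { capacity: usize },
    /// A value was not a valid u8.
    BadNumber(ParseIntError),
}

// Display lets the error be passed straight to `D::Error::custom`.
impl fmt::Display for ArrayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrayError::TooMany { capacity } => write!(f, "more than {} values", capacity),
            ArrayError::BadNumber(e) => write!(f, "invalid value: {}", e),
        }
    }
}

/// Parses "a,b,c" into an N-slot array, leaving unused slots as None.
fn parse_bounded<const N: usize>(buf: &str) -> Result<[Option<u8>; N], ArrayError> {
    let mut dest = [None; N];
    for (ind, tok) in buf.split(',').enumerate() {
        // Same rule as the answer: stop at the first empty token.
        if tok.is_empty() {
            break;
        }
        if ind >= N {
            return Err(ArrayError::TooMany { capacity: N });
        }
        dest[ind] = Some(tok.parse::<u8>().map_err(ArrayError::BadNumber)?);
    }
    Ok(dest)
}
```

`parse_bounded::<8>(buf)` reproduces the answer's `[Option<u8>; 8]` layout, and because the capacity is a type parameter, the same function works for any column width without changing the loop.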

answered 2021-05-26T14:24:01.853