matrix - Multiplying matrices and storing them in SRAM or SDRAM

Question

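I am starting a project in which I multiply matrices and synthesize the design on an FPGA (Altera DE2). Since I am just getting started, I would like some guidance on how to store the values in memory. What I want to compute is [C] = [A] * [B]. Note: are the values of [A], [B], and [C] stored in SRAM or SDRAM automatically? Reading the question "Verilog For Loop For Array Multiplication" made the multiplication itself clearer, but I still don't see how to get the data into and out of memory. Does anyone have code that I could adapt for reading and writing memory? Am I on the right track?

Edit:

I have this code for 4x4 matrix multiplication. Can you tell me whether it is correct? I tried running it, but the C values are not saved to memory.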

module rams(
input clk,
  //  SRAM Interface
  inout  [15:0] SRAM_DQ,   // SRAM Data bus 16 Bits
  output [17:0] SRAM_ADDR, // SRAM Address bus 18 Bits
  output SRAM_UB_N,        // SRAM High-byte Data Mask
  output SRAM_LB_N,        // SRAM Low-byte Data Mask 
  output SRAM_WE_N,    // SRAM Write Enable
  output SRAM_CE_N,        // SRAM Chip Enable
  output SRAM_OE_N        // SRAM Output Enable
);

parameter mat_size = 4;  // change the size of the matrices here.
reg [7:0] A_mat [0:mat_size*mat_size-1];
reg [7:0] B_mat [0:mat_size*mat_size-1];


wire [15:0] mem_in;
reg [17:0] mem_address;

wire [7:0] A,B;
wire [7:0] C;
wire [19:0] A_addr,B_addr,C_addr;
reg reset;
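wire start;
wire valid_output; // driven by mat_mult; declared explicitly instead of relying on an implicit net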
reg [9:0] Cr,Cc;

assign SRAM_ADDR = mem_address;
assign SRAM_UB_N = 1'b0;        // SRAM High-byte Data Mask
assign SRAM_LB_N = 1'b0;        // SRAM Low-byte Data Mask 
assign SRAM_CE_N = 1'b0;        // SRAM Chip Enable
assign SRAM_OE_N = 1'b0;        // SRAM Output Enable

reg [2:0] state;
parameter idle=0, read_A=1, read_B=2, start_process=3, do_nothing=4;

assign SRAM_WE_N = (valid_output ? 1'b0 : 1'b1);
assign start = !(valid_output | reset);//(valid_output ? 1'b0 : 1'b1);
assign SRAM_DQ = (valid_output ? mem_in : 16'hzzzz);

    // Instantiate the Unit Under Test (UUT)
    mat_mult uut (
        .clk(clk), 
        .reset(reset), 
        .start(start),
        .A_addr(A_addr), 
        .B_addr(B_addr), 
        .C_addr(C_addr), 
        .A(A), 
        .B(B), 
        .mat_size(mat_size), 
        .C(C), 
        .valid_output(valid_output)
    );

assign mem_in = {8'h00, C}; // zero-extend the 8-bit result to the 16-bit data bus

initial begin
    state = idle;
end     

always @(posedge clk)
begin
    case (state)
        idle :
            begin
                mem_address <= 16'h0000;
                state = read_A;
                reset <= 1'b1;
            end
        read_A :    
            begin
                A_mat[mem_address] <= SRAM_DQ;
                if(mem_address < mat_size*mat_size) begin
                    state = read_A;
                    mem_address <= mem_address + 1;
                end else begin
                    state = read_B;
                end 
            end
        read_B :    
            begin
                B_mat[mem_address-(mat_size*mat_size)] <= SRAM_DQ;
                if(mem_address < 2*mat_size*mat_size) begin
                    state = read_B;
                    mem_address <= mem_address + 1;
                end else begin
                    state = start_process;
                    reset <= 1'b0;
                end 
            end 
        start_process : 
            begin
                state = start_process;
                mem_address <= 2*mat_size*mat_size + C_addr;
                if(C_addr == mat_size*mat_size-1) begin 
                    state = do_nothing;
                end else begin
                    reset <= 1'b0;
                end
            end     
        do_nothing : 
            if(valid_output) begin
                reset <= 1'b1;
            end 
    endcase

end

assign A = A_mat[A_addr];
assign B = B_mat[B_addr];

endmodule

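I load the values of A and B together, in 16-bit hexadecimal format. I used Altera's DE_Control to transfer the values into memory, because I don't know how to do this with your loading code. The multiplication module is: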

module mat_mult(
    input clk,
    input reset,
     input start,
    output [19:0] A_addr,
    output [19:0] B_addr,
    output [19:0] C_addr,
    input [7:0] A,
    input [7:0] B,
     input [9:0] mat_size,
    output [7:0] C,
     output valid_output
    );

reg [9:0] Ar,Br,Bc,Cr,Cc;
reg [7:0] C_res;
reg v;

always @ (posedge clk or posedge reset)
    if (reset) begin
        Ar <= 10'b0000000000;
        Br <= 10'b0000000000;
        Bc <= 10'b0000000000;
        Cr <= 10'b0000000000;
        Cc <= 10'b0000000000;
        C_res <= 8'h00;
        v <= 1'b0;
    end else begin
        if (start) begin
            if (Br == mat_size-1) begin
                Br <= 10'b0000000000;
                if (Bc == mat_size-1) begin
                    Bc <= 10'b0000000000;
                    if (Ar == mat_size-1) begin
                        Ar <= 10'b0000000000;
                    end else begin
                        Ar <= Ar + 1;
                    end
                end else begin
                    Bc <= Bc + 1;
                end
                v <= 1'b1;
            end else begin
                Br <= Br + 1;
            end
            C_res <= C_res + A*B;
        end else begin
            C_res <= 8'h00;
            v <= 1'b0;
        end 
    end 

assign A_addr = (Ar * mat_size) + Br;
assign B_addr = (Br * mat_size) + Bc;
assign C_addr = (Ar * mat_size) + Bc;
assign C = C_res;
assign valid_output = v;

endmodule

score 0 · Accepted Answer

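When working with memory, remember that it is just a mapping between addresses and bytes. So if you consider a matrix (for simplicity I will assume a 4x4 matrix of 32-bit floats), all you really have is 16 32-bit floating-point numbers.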
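
To make the address-to-element mapping concrete, here is a minimal sketch, assuming row-major order (element (row, col) of an N x N matrix lives at address row*N + col). The module name rowmajor_addr and its parameters are hypothetical and chosen only for illustration. It is the same arithmetic the question's mat_mult uses for A_addr.

```verilog
// Hypothetical helper (not from the original post): maps (row, col) of an
// N x N matrix to a flat element address, assuming row-major order.
module rowmajor_addr #(
    parameter N      = 4,   // matrix dimension (square matrix assumed)
    parameter IDX_W  = 2,   // bits needed for a row or column index
    parameter ADDR_W = 4    // bits needed to address N*N elements
) (
    input  [IDX_W-1:0]  row,
    input  [IDX_W-1:0]  col,
    output [ADDR_W-1:0] addr
);
    // Each row occupies N consecutive addresses.
    assign addr = row * N + col;
endmodule
```

For N = 4, element (2, 1) maps to address 2*4 + 1 = 9.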

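How you store them differs depending on whether you intend to keep them in on-chip memory (SRAM cells integrated into the FPGA chip) or off-chip in some DDR-type memory.

Storing them on-chip is easier to work with. In that case, in Verilog, you simply declare an array and then read and write one element at a time. If your FPGA has RAM cells of the right size available, your synthesizer will infer a RAM from it.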

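module matrixmem (
    input             clk,
    input      [3:0]  addr,     // selects one of the 16 matrix elements
    input      [31:0] data_in,
    output reg [31:0] data_out, // reg because it is assigned in the always block
    input             write,
    input             read
);

reg [31:0] mem [0:15]; // 16 32-bit elements, enough to store one 4x4 matrix.

// Synchronous write and registered read; write takes priority over read.
always @(posedge clk) begin
    if (write)
        mem[addr] <= data_in;
    else if (read)
        data_out <= mem[addr];
end

endmodule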

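With this module you can then fetch the elements of an array one at a time by changing the address, where the address indicates which element of the matrix you want.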
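
As a usage illustration, here is a simulation-only sketch that writes one element and reads it back. It carries its own copy of the memory under the hypothetical name matrixmem_demo so that it compiles on its own. The testbench name, the clock period, and the test value are also assumptions.

```verilog
`timescale 1ns/1ps

// Stand-alone copy of the single-port memory described above.
// The name matrixmem_demo is hypothetical.
module matrixmem_demo (
    input             clk,
    input      [3:0]  addr,
    input      [31:0] data_in,
    output reg [31:0] data_out,
    input             write,
    input             read
);
    reg [31:0] mem [0:15];
    always @(posedge clk) begin
        if (write)
            mem[addr] <= data_in;
        else if (read)
            data_out <= mem[addr];
    end
endmodule

module matrixmem_tb;
    reg         clk = 1'b0;
    reg  [3:0]  addr;
    reg  [31:0] data_in;
    reg         write, read;
    wire [31:0] data_out;

    matrixmem_demo dut (.clk(clk), .addr(addr), .data_in(data_in),
                        .data_out(data_out), .write(write), .read(read));

    always #5 clk = ~clk;   // arbitrary simulation clock

    initial begin
        write = 1'b0; read = 1'b0; addr = 4'd0; data_in = 32'd0;
        // Write element (row 1, col 2) of a 4x4 matrix: address 1*4 + 2 = 6.
        @(negedge clk);
        addr = 4'd6; data_in = 32'h0000_002A; write = 1'b1;
        // Read the same address back on the following clock edge.
        @(negedge clk);
        write = 1'b0; read = 1'b1;
        @(negedge clk);
        read = 1'b0;
        if (data_out === 32'h0000_002A)
            $display("element (1,2) read back OK");
        else
            $display("mismatch: %h", data_out);
        $finish;
    end
endmodule
```

Inputs change on the falling clock edge so that they are stable when the memory samples them on the rising edge.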

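If you want to store this off-chip, it gets a bit more complicated, because you need to implement a DDR controller or use some IP that ships with the FPGA to talk to the external RAM. But in essence it works the same way, since each element of the matrix is stored at some address.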
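
For an asynchronous SRAM with the pins listed in the question (no DDR controller involved), the same idea can be sketched as below. This is a minimal sketch and not a verified controller: the module name sram_port_sketch and its user-side ports are hypothetical, and it assumes the clock is slow enough for one access per cycle. Check the setup and hold times in the SRAM datasheet before relying on it.

```verilog
// Hypothetical wrapper (not from the original post). Assumes an asynchronous
// 256K x 16 SRAM with the pin names used in the question, and a clock slow
// enough that one access fits in one cycle.
module sram_port_sketch (
    input             clk,
    input             wr_en,     // 1: write wr_data to addr, 0: read addr
    input      [17:0] addr,
    input      [15:0] wr_data,
    output reg [15:0] rd_data,   // read data, captured on the next clock edge
    inout      [15:0] SRAM_DQ,
    output     [17:0] SRAM_ADDR,
    output            SRAM_WE_N,
    output            SRAM_OE_N,
    output            SRAM_CE_N,
    output            SRAM_UB_N,
    output            SRAM_LB_N
);
    assign SRAM_ADDR = addr;
    assign SRAM_CE_N = 1'b0;     // chip always selected
    assign SRAM_UB_N = 1'b0;     // both bytes enabled
    assign SRAM_LB_N = 1'b0;
    assign SRAM_WE_N = ~wr_en;   // active-low write strobe
    assign SRAM_OE_N = wr_en;    // SRAM drives the bus only when reading

    // The FPGA drives the shared data bus only while writing.
    assign SRAM_DQ = wr_en ? wr_data : 16'hzzzz;

    // While reading, register whatever the SRAM puts on the bus.
    always @(posedge clk)
        if (!wr_en)
            rd_data <= SRAM_DQ;
endmodule
```

The key point is the tri-state data bus: exactly one side drives SRAM_DQ at any time, and each matrix element still lives at one address, just as in the on-chip case.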

score 0 · Accepted Answer

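Another precaution you should always keep in mind: when you write a sequential always block, any output port assigned inside it must be declared as type reg.

As for the solution here, since data_out is used to hold a value assigned in the always block, it needs to be declared as type reg. Otherwise it would not matter.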
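
A minimal sketch of the difference follows. The module and port names are hypothetical and used only to illustrate the rule.

```verilog
// Hypothetical example: one output assigned procedurally, one driven by a
// continuous assignment.
module out_decl_demo (
    input            clk,
    input      [7:0] d,
    output reg [7:0] q_reg,   // assigned in an always block, so it must be reg
    output     [7:0] q_wire   // driven by assign, so it stays a plain wire
);
    // Procedural assignment: the target has to be declared reg.
    always @(posedge clk)
        q_reg <= d;

    // Continuous assignment: the target has to be a net (wire).
    assign q_wire = d + 8'd1;
endmodule
```

Removing reg from q_reg makes this fail to compile as Verilog, because a net cannot be the target of a procedural assignment.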
