I trained a network using the provided ImageReader, and now I am trying to evaluate an RGB image in a C# project using the CNTK EvalDll.
The examples I have seen for EvalDll always take float/double arrays as input, never images.
How can I use the exposed interface to run a trained network on an RGB image?
I am assuming that you want the equivalent of what the ImageReader produces. Say your reader configuration looks like this:
features=[
    width=224
    height=224
    channels=3
    cropType=Center
]
You will need helper functions to crop the image and to resize it to the size the network expects. I will define two extension methods on System.Drawing.Bitmap, one for cropping and one for resizing:
open System.Collections.Generic
open System.Drawing
open System.Drawing.Drawing2D
open System.Drawing.Imaging

type Bitmap with
    /// Crops the image in the present object, starting at the given (column, row), and retaining
    /// the given number of columns and rows.
    member this.Crop(column, row, numCols, numRows) =
        let rect = Rectangle(column, row, numCols, numRows)
        this.Clone(rect, this.PixelFormat)

    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    member this.ResizeImage(width, height, useHighQuality) =
        // Rather than using image.GetThumbnailImage, use direct image resizing.
        // GetThumbnailImage throws odd out-of-memory exceptions on some
        // images, see also
        // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
        // Use the interpolation method suggested on
        // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
        let rect = Rectangle(0, 0, width, height)
        let destImage = new Bitmap(width, height)
        destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution)
        use graphics = Graphics.FromImage destImage
        graphics.CompositingMode <- CompositingMode.SourceCopy
        if useHighQuality then
            graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
            graphics.CompositingQuality <- CompositingQuality.HighQuality
            graphics.SmoothingMode <- SmoothingMode.HighQuality
            graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
        else
            graphics.InterpolationMode <- InterpolationMode.Low
        use wrapMode = new ImageAttributes()
        wrapMode.SetWrapMode WrapMode.TileFlipXY
        graphics.DrawImage(this, rect, 0, 0, this.Width, this.Height, GraphicsUnit.Pixel, wrapMode)
        destImage
Based on that, define a function to do the center crop:
/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The
/// aspect ratio is preserved.
let CenterCrop cropRatio (image: Bitmap) =
    let cropSize =
        float (min image.Height image.Width) * cropRatio
        |> int
    let startRow = (image.Height - cropSize) / 2
    let startCol = (image.Width - cropSize) / 2
    image.Crop(startCol, startRow, cropSize, cropSize)
Then tie them together: crop, resize, and then traverse the image in the plane order that OpenCV uses:
/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first resized to fit into an (targetSize x targetSize) bounding box,
/// then the image planes are converted to a CNTK tensor.
/// Returns a list with targetSize*targetSize*3 values.
let ImageToFeatures (image: Bitmap, targetSize) =
    // Apply the same image pre-processing that is typically done
    // in CNTK when running it in test or write mode: take a center
    // crop of the image, then re-size it to the network input size.
    let cropped = CenterCrop 1.0 image
    let resized = cropped.ResizeImage(targetSize, targetSize, false)
    // Ensure that the initial capacity of the list is provided
    // with the constructor. Creating the list via the default constructor
    // makes the whole operation 20% slower.
    let features = List (targetSize * targetSize * 3)
    // Traverse the image in the format that is used in OpenCV:
    // first the B plane, then the G plane, then the R plane.
    for c in 0 .. 2 do
        for h in 0 .. (resized.Height - 1) do
            for w in 0 .. (resized.Width - 1) do
                let pixel = resized.GetPixel(w, h)
                let v =
                    (match c with
                     | 0 -> pixel.B
                     | 1 -> pixel.G
                     | 2 -> pixel.R
                     | _ -> failwith "No such channel")
                    |> float32
                features.Add v
    features
Call ImageToFeatures on the image in question, feed the result into an instance of IEvaluateModelManagedF, and you are good to go. Here I assume that your RGB image comes in as myImage, and that you are doing binary classification with a network input size of 224 x 224:
let LoadModelOnCpu modelPath =
    let model = new IEvaluateModelManagedF()
    let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
    model.Init description
    model.CreateNetwork description
    model

let model = LoadModelOnCpu("myModelFile")
let featureDict = Dictionary()
featureDict.["features"] <- ImageToFeatures(myImage, 224)
model.Evaluate(featureDict, "OutputNodes.z", 2)
I implemented similar code in C# that loads the model, reads a test image, does the appropriate cropping/scaling/etc., and then runs the model. As Anton pointed out, the output does not match CNTK's output 100%, but it is very close.
Code for reading/cropping/scaling the image:
private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
{
    var rect = new Rectangle(col, row, numCols, numRows);
    return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
}

/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The
/// aspect ratio is preserved.
private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
{
    var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
    var startCol = (img.Width - cropSize) / 2;
    var startRow = (img.Height - cropSize) / 2;
    return ImCrop(img, startCol, startRow, cropSize, cropSize);
}

/// Creates a resized version of the present image. The returned image
/// will have the given width and height. This may distort the aspect ratio
/// of the image.
private static Bitmap ImResize(Bitmap img, int width, int height)
{
    return new Bitmap(img, new Size(width, height));
}
Code for loading the model and for reading the xml file that contains the pixel means:
public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
{
    var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
    Stopwatch stopWatch = Stopwatch.StartNew();
    var model = new IEvaluateModelManagedF();
    model.CreateNetwork(networkConfiguration, deviceId: -1);
    stopWatch.Stop();
    Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
    return model;
}
/// Read the xml mean file, i.e. the offsets which are subtracted
/// from each pixel in an image before using it as input to a CNTK model.
public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
{
    // Read and parse the pixel value xml file
    XmlTextReader reader = new XmlTextReader(XmlPath);
    reader.ReadToFollowing("data");
    reader.Read();
    var pixelMeansXml =
        reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
        .Select(Single.Parse)
        .ToArray();

    // Re-order the mean pixel values to be in the same order as the bitmap
    // image (as returned by the ImGetRGBChannels() function).
    int inputDim = 3 * ImgWidth * ImgHeight;
    Debug.Assert(pixelMeansXml.Length == inputDim);
    var pixelMeans = new float[inputDim];
    int counter = 0;
    for (int c = 0; c < 3; c++)
        for (int h = 0; h < ImgHeight; h++)
            for (int w = 0; w < ImgWidth; w++)
            {
                int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                pixelMeans[counter++] = pixelMeansXml[xmlIndex];
            }
    return pixelMeans;
}
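For reference, readXmlMeanFile only requires an XML file with a data element containing 3 * ImgWidth * ImgHeight whitespace-separated float values, stored pixel by pixel in row-major order with the channel values of each pixel interleaved (presumably in the same B, G, R order that the feature vector uses). A hypothetical, heavily truncated mean file could therefore look like the sketch below; the surrounding element name is a placeholder and the values are purely illustrative:

<?xml version="1.0"?>
<MeanImage>
  <data>
    104.0 117.0 124.0
    104.0 117.0 124.0
    ...
  </data>
</MeanImage>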
Code for loading an image and converting it to model input:
/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first resized to fit into a (targetSize x targetSize) bounding box,
/// then the image planes are converted to a CNTK tensor, and the mean
/// pixel values are subtracted. Returns a list with targetSize * targetSize * 3 floats.
private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
{
    // Apply the same image pre-processing that is typically done in CNTK:
    // take a center crop of the image, then re-size it to the network input size.
    var imgCropped = ImCropToCenter(img, 1.0);
    var imgResized = ImResize(imgCropped, targetSize, targetSize);

    // Convert pixels to CNTK model input.
    // Fast pixel extraction is ~5x faster while giving identical output.
    var features = new float[3 * imgResized.Height * imgResized.Width];
    var boFastPixelExtraction = true;
    if (boFastPixelExtraction)
    {
        var pixelsRGB = ImGetRGBChannels(imgResized);
        for (int c = 0; c < 3; c++)
        {
            byte[] pixels = pixelsRGB[2 - c];
            Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
            for (int i = 0; i < pixels.Length; i++)
            {
                int featIndex = i + c * pixels.Length;
                features[featIndex] = pixels[i] - pixelMeans[featIndex];
            }
        }
    }
    else
    {
        // Traverse the image in the format that is used in OpenCV:
        // first the B plane, then the G plane, then the R plane.
        // Note: calling GetPixel(w, h) repeatedly is slow!
        int featIndex = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < imgResized.Height; h++)
                for (int w = 0; w < imgResized.Width; w++)
                {
                    var pixel = imgResized.GetPixel(w, h);
                    float v;
                    if (c == 0)
                        v = pixel.B;
                    else if (c == 1)
                        v = pixel.G;
                    else if (c == 2)
                        v = pixel.R;
                    else
                        throw new Exception("No such channel");
                    // Subtract the pixel mean
                    features[featIndex] = v - pixelMeans[featIndex];
                    featIndex++;
                }
    }
    return features.ToList();
}
/// Convert bitmap image to R,G,B channel byte arrays.
/// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
{
    // Lock the bitmap's bits.
    Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
    BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);

    // Declare an array to hold the bytes of the bitmap.
    int bytes = bmpData.Stride * bmp.Height;
    byte[] rgbValues = new byte[bytes];
    byte[] r = new byte[bytes / 3];
    byte[] g = new byte[bytes / 3];
    byte[] b = new byte[bytes / 3];

    // Copy the RGB values into the array, starting from ptr to the first line.
    IntPtr ptr = bmpData.Scan0;
    Marshal.Copy(ptr, rgbValues, 0, bytes);

    // Populate the per-channel byte arrays; each pixel is stored as a B, G, R triple.
    int count = 0;
    int stride = bmpData.Stride;
    for (int row = 0; row < bmpData.Height; row++)
    {
        for (int col = 0; col < bmpData.Width; col++)
        {
            int offset = (row * stride) + (col * 3);
            b[count] = rgbValues[offset];
            g[count] = rgbValues[offset + 1];
            r[count++] = rgbValues[offset + 2];
        }
    }
    bmp.UnlockBits(bmpData);
    return new List<byte[]> { r, g, b };
}
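For completeness, the pieces above might be tied together roughly as in the sketch below. The file paths, the "OutputNodes.z" output node name, the two-class output size, the "features" input name and the 224 x 224 input size are assumptions carried over from the F# example and would need to match your own model and mean file:

// Hypothetical end-to-end usage; all paths, names and sizes below are assumptions.
var model = loadModel(@"myModelFile", "OutputNodes.z");
var pixelMeans = readXmlMeanFile(@"myMeanFile.xml", 224, 224);
using (var image = new Bitmap(@"myImage.jpg"))
{
    // Crop/resize the image and subtract the pixel means.
    var features = ImageToFeatures(image, 224, pixelMeans);
    // Feed the features into the model under the "features" input name.
    var inputs = new Dictionary<string, List<float>> { { "features", features } };
    List<float> outputs = model.Evaluate(inputs, "OutputNodes.z", 2);
    Console.WriteLine(string.Join(", ", outputs));
}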