我正在对一个名为 JOLT 的 JSON 转换库进行基准测试。该库看起来不错,但我们想确保它不会对我们的垃圾收集器造成太大压力。因此,我创建了一个小程序(转载如下),它执行以下操作:
- precompiles JSON tranformation scriptlet
- loop for a long time in each of ten threads, doing the following:
- create randomized JSON string input (approx 300 bytes)
- parse input using JOLT and apply tranform
当我们查看垃圾收集器日志时,我们从未看到完整的 GC(即使运行了半天),更可疑的是,报告发生的唯一 GC 发生在启动程序的前几分钟,然后几个小时过去了,我们没有看到一个额外的 GC。该程序每秒执行的交易数量令人印象深刻。根据 JOLT 文档:
"The transform process will create and discard a lot of objects, so the garbage collector will have work to do."
所以,我想我应该很高兴我们的 gc 日志即使经过一个小时的运行后仍然看起来像这样:
Server VM (25.66-b17) for linux-amd64 JRE (1.8.0_66-b17), built on Oct 6 2015 17:28:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 12439584k(11065340k free), swap 12721148k(12721148k free)
CommandLine flags: -XX:InitialHeapSize=199033344 -XX:MaxHeapSize=3184533504 -XX:MaxNewSize=348966912 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
2017-01-10T11:13:39.752-0800: 1.458: [GC (Allocation Failure) 2017-01-10T11:13:39.753-0800: 1.459: [ParNew: 51904K->6464K(58368K), 0.0123836 secs] 51904K->8148K(188096K), 0.0138403 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2017-01-10T11:13:40.416-0800: 2.122: [GC (Allocation Failure) 2017-01-10T11:13:40.416-0800: 2.123: [ParNew: 58368K->6463K(58368K), 0.0193642 secs] 60052K->10934K(188096K), 0.0197870 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2017-01-10T11:13:40.611-0800: 2.317: [GC (Allocation Failure) 2017-01-10T11:13:40.611-0800: 2.318: [ParNew: 58367K->4378K(58368K), 0.0098514 secs] 62838K->12472K(188096K), 0.0102622 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-01-10T11:13:40.762-0800: 2.468: [GC (Allocation Failure) 2017-01-10T11:13:40.762-0800: 2.468: [ParNew: 56282K->6464K(58368K), 0.0067945 secs] 64376K->14714K(188096K), 0.0071342 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-01-10T11:13:40.954-0800: 2.661: [GC (Allocation Failure) 2017-01-10T11:13:40.955-0800: 2.661: [ParNew: 58368K->6464K(58368K), 0.0102488 secs] 66618K->17490K(188096K), 0.0108123 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-01-10T11:13:41.216-0800: 2.922: [GC (Allocation Failure) 2017-01-10T11:13:41.216-0800: 2.922: [ParNew: 58368K->6464K(58368K), 0.0110529 secs] 69394K->21205K(188096K), 0.0114348 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2017-01-10T11:13:41.412-0800: 3.119: [GC (Allocation Failure) 2017-01-10T11:13:41.413-0800: 3.119: [ParNew: 58368K->5049K(58368K), 0.0073977 secs] 73109K->22139K(188096K), 0.0077819 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
但也许这好得令人难以置信?也许我没有指定正确的 JVM 开关来捕获 >all< GC,并且真的发生了一些没有记录的事情?
该程序不断运行(并不断产生小对象),所以看起来这会填满伊甸园空间,并且会发生内存分配失败,结果会启动 GC.. 即使我有(我认为) 对垃圾收集器概念的正确掌握,我有点困惑。我能想到的唯一可能发生在这里的事情是:1)也许 JOLT 将它的小对象分配在堆栈上(如在一个大数组中)而不是堆上?或 2)(更有可能),也许我没有正确指定我的 JVM 开关。这是我用开关所做的:
export MAVEN_OPTS=" -XX:+UseConcMarkSweepGC \
-XX:SoftRefLRUPolicyMSPerMB=1 \
-XX:+PrintGCTimeStamps \
-XX:+PrintGCDateStamps \
-XX:+PrintGCDetails \
-Xloggc:/tmp/gc.log -Xmx2048m "
然后我将程序作为单元测试运行,如下所示:
mvn clean test
下面给出了测试的源代码。
(如果需要,我可以在 github 上发布这件事的完整 maven 项目。)提前感谢您的帮助!
import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.math.stat.descriptive.SummaryStatistics;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.jupiter.api.DisplayName;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
public class BenchmarkJoltTest {
static int numThreads = 20;
static int numSecondsToRun = 360 * 1000; // 10 hours
static boolean randomize = true; /* for non-random we will validate that the output always remains constant */
static int hashOfExpectedOutput = 0;
static SummaryStatistics summaryStatistics = new SummaryStatistics();
static String inputJson = "";
static String transformSpecJson = "";
static Random random = new Random();
static Chainr chainr = null;
static AtomicLong completedTransformsCountForInterval = new AtomicLong(0);
static AtomicLong completedTransformsCountGrandTotal = new AtomicLong(0);
//summaryStatistics
@BeforeClass
public static void initialize() throws Exception {
System.out.println("HI HI HO");
inputJson =
String.join( " ", IOUtils.readLines(ClassLoader.getSystemResourceAsStream("input.json")));
transformSpecJson =
String.join( " ", IOUtils.readLines(ClassLoader.getSystemResourceAsStream("spec.json")));
chainr = Chainr.fromSpec(JsonUtils.jsonToList( transformSpecJson ));
/* Warm up: do JIT compilation, load classes and whatever else might need to be done only during startup */
for (int i = 0; i < 100; i++) {
runOneTransform(false);
}
}
@Test
@DisplayName("Benchmark for Jolt Json transforms")
public void runBenchmark() throws Exception {
runTransformsInThreadPool(10 /* threads */);
System.out.println("OK - here");
for (int i = 0; i < numSecondsToRun; i++) {
Thread.sleep(1000); /* 1 second */
double value = completedTransformsCountForInterval.doubleValue();
System.out.println("adding to count: " + value );
summaryStatistics.addValue(value);
completedTransformsCountForInterval.set(0L); // reset counter
if (completedTransformsCountForInterval.get() > 100) {
throw new Exception("how did the counter suddenly jump so high? -- unexpected");
}
}
System.out.println(
"Summary statistics for number of transforms performed per second\n\n" +
summaryStatistics.toString() + "\ntotalCount: "+ completedTransformsCountForInterval);
System.out.println("Grand total of completed operations: " + completedTransformsCountGrandTotal);
System.exit(0);
}
private void runTransformsInThreadPool(final int threadCount) throws InterruptedException, ExecutionException {
Callable<Void> task = new Callable<Void>() {
@Override
public Void call() throws Exception {
while (true) {
runOneTransform(true /* include in official count */);
}
}
};
List<Callable<Void>> tasks = Collections.nCopies(threadCount, task);
ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
for (Callable c : tasks) {
System.out.println("submitting " + c);
executorService.submit(c);
System.out.println("finished submitting ");
}
System.out.println("INVOKED all");
}
private static void runOneTransform(boolean includeInOfficialCount /* don't include if in warm-up mode */) throws Exception {
String targetJson = randomize ? getRandomizedJson() : inputJson;
Object transformedJson = chainr.transform(JsonUtils.jsonToObject(targetJson));
String transformedJsonAsString = JsonUtils.toJsonString(transformedJson);
if (hashOfExpectedOutput == 0) {
hashOfExpectedOutput = transformedJsonAsString.hashCode();
}
if (! randomize && hashOfExpectedOutput != transformedJsonAsString.hashCode()) {
throw new Exception("hash codes of transformed strings don't match. multi-threading issue ?");
}
if (includeInOfficialCount) {
completedTransformsCountForInterval.incrementAndGet();
completedTransformsCountGrandTotal.incrementAndGet();
} else {
System.out.println(transformedJsonAsString);
}
}
private static String getRandomizedJson() {
Integer num = random.nextInt() % 10;
inputJson.replaceAll("3", num.toString());
return inputJson;
}
}