2

我们最近对应用程序的持久性机制进行了更改,通过反射替换了慢速序列化以创建 SQL 语句字符串,这些语句通过 OpenHFT Chronicle 序列化对 MySQL 数据库执行,并将各种对象字段的硬编码附加到摘录中。正如预期的那样,这大大减少了持久事务所需的时间并减少了我们的延迟。但我们注意到更改前后的垃圾收集模式有些令人费解。虽然我们产生的垃圾更少,但每次垃圾收集所花费的时间却增加了很多,我们很难理解原因。

这是更改前后两次运行的示例,在RedHat 4.4.7 CentOS 6.7 上使用 Java 1.8.0_74 运行相同的 VM 参数。“After”运行似乎消耗更少的年轻代,并且在每次 GC 开始时消耗的总量更少,在 GC 结束时大致相同的堆,但实际运行 GC 需要更多时间,并且运行频率要低得多(GC 之间大约 300 秒而不是 70-80 秒)。

我们试图弄清楚这是如何发生的,以及我们是否可以做任何事情来让 GC 暂停回到我们更快地生成垃圾时的位置。关于 Chronicle 以及它使用堆外内存映射文件的方式是否会导致这种情况?这似乎不太可能,但它似乎是有牵连的,除非它真的只是垃圾生成方式导致它的原因。FWIW,作为一项实验,我们尝试了 G1 和 CMS,并且至少使用默认参数,它们显示相同的模式,如果有更糟的时间。

虚拟机参数:

-Xms75776M
-Xmx75776M
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Dsun.rmi.dgc.server.gcInterval=54000000
-Dsun.rmi.dgc.client.gcinterval=54000000
-XX:+DisableExplicitGC
-XX:MaxNewSize=19000M
-XX:+PrintGCApplicationStoppedTime
-XX:InitialSurvivorRatio=3

前:

15009.601: [GC (Allocation Failure) [PSYoungGen: 19226640K->142320K(19273216K)] 22796282K->3814635K(77411840K), 0.0596127 secs] [Times: user=0.53 sys=0.05, real=0.06 secs]
15080.410: [GC (Allocation Failure) [PSYoungGen: 19235824K->137920K(19282432K)] 22908139K->3913367K(77421056K), 0.0570436 secs] [Times: user=0.50 sys=0.06, real=0.06 secs]
15154.485: [GC (Allocation Failure) [PSYoungGen: 19243712K->129840K(19279360K)] 23019159K->4007451K(77417984K), 0.0605860 secs] [Times: user=0.55 sys=0.05, real=0.06 secs]
.
.
bunch of lines omitted to get to approximately the same end-point
.
16070.663: [GC (Allocation Failure) [PSYoungGen: 19195664K->146464K(19262976K)] 24408408K->5460932K(77401600K), 0.0616580 secs] [Times: user=0.55 sys=0.06, real=0.06 secs]

后:

15313.256: [GC (Allocation Failure) [PSYoungGen: 15092194K->2230066K(15050752K)] 16440428K->3981360K(73189376K), 0.3563350 secs] [Times: user=3.40 sys=0.15, real=0.36 secs]
15770.413: [GC (Allocation Failure) [PSYoungGen: 15050546K->2922738K(15321600K)] 16801840K->4674033K(73460224K), 0.3584541 secs] [Times: user=3.56 sys=0.00, real=0.36 secs]
16090.016: [GC (Allocation Failure) [PSYoungGen: 14896370K->3571346K(15244288K)] 16647665K->5322641K(73382912K), 0.4276084 secs] [Times: user=4.25 sys=0.00, real=0.42 secs]

更新: 所以我根据下面的一些建议进行了以下实验(这是在 UAT 机器上,而不是 PROD 机器上,所以它有点不同......例如,linux 版本是 CentOS 7.2 而不是 6.7)。我们将堆的大小限制为 32G,并设置了 200 毫秒的显式 GC 暂停目标,以及打开更多诊断信息。如果有的话,结果会更糟。它开始时非常合理,但建议的暂停时间最多为 1.94 秒。

java -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -Dsun.rmi.dgc.server.gcInterval=54000000 -Dsun.rmi.dgc.client.gcinterval=54000000 -XX:+DisableExplicitGC -Xms32000M -Xmx32000M


{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 9557504K, used 4096257K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 50% used  [0x00007f58fe580000,0x00007f59f85c0458,0x00007f5af2600000)
  from space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
  to   space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
 ParOldGen       total 21845504K, used 0K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9000000,0x00007f58fe580000)
 Metaspace       used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
  class space    used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
2.420: [GC (Metadata GC Threshold) 
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 4096257K->13270K(9557504K)] 4096257K->13366K(31403008K), 0.0181755 secs] [Times: user=0.11 sys=0.05, real=0.02 secs] 
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 9557504K, used 13270K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af32f59a8,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 96K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9018010,0x00007f58fe580000)
 Metaspace       used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
  class space    used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
}
{Heap before GC invocations=2 (full 1):
 PSYoungGen      total 9557504K, used 13270K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af32f59a8,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 96K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9018010,0x00007f58fe580000)
 Metaspace       used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
  class space    used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
2.438: [Full GC (Metadata GC Threshold) [PSYoungGen: 13270K->0K(9557504K)] [ParOldGen: 96K->12747K(21845504K)] 13366K->12747K(31403008K), [Metaspace: 20877K->20877K(1069056K)], 0.0530698 secs] [Times: user=0.27 sys=0.05, real=0.05 secs] 
Heap after GC invocations=2 (full 1):
 PSYoungGen      total 9557504K, used 0K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 12747K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c72e90,0x00007f58fe580000)
 Metaspace       used 20877K, capacity 21118K, committed 21296K, reserved 1069056K
  class space    used 2499K, capacity 2599K, committed 2608K, reserved 1048576K
}
{Heap before GC invocations=3 (full 1):
 PSYoungGen      total 9557504K, used 8192512K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
  from space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 12747K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c72e90,0x00007f58fe580000)
 Metaspace       used 22891K, capacity 23188K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
1332.241: [GC (Allocation Failure) 
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 8192512K->291885K(9557504K)] 8205259K->304649K(31403008K), 0.2443282 secs] [Times: user=0.48 sys=3.83, real=0.25 secs] 
Heap after GC invocations=3 (full 1):
 PSYoungGen      total 9557504K, used 291885K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 21% used [0x00007f5b45b00000,0x00007f5b5780b650,0x00007f5b99000000)
  to   space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
 ParOldGen       total 21845504K, used 12763K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c76e90,0x00007f58fe580000)
 Metaspace       used 22891K, capacity 23188K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
{Heap before GC invocations=4 (full 1):
 PSYoungGen      total 9557504K, used 8484397K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
  from space 1364992K, 21% used [0x00007f5b45b00000,0x00007f5b5780b650,0x00007f5b99000000)
  to   space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
 ParOldGen       total 21845504K, used 12763K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c76e90,0x00007f58fe580000)
 Metaspace       used 22939K, capacity 23252K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
2858.320: [GC (Allocation Failure) 
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 8484397K->677978K(9557504K)] 8497161K->690750K(31403008K), 0.5290410 secs] [Times: user=0.74 sys=8.70, real=0.53 secs] 
Heap after GC invocations=4 (full 1):
 PSYoungGen      total 9557504K, used 677978K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 49% used [0x00007f5af2600000,0x00007f5b1bc16a80,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 12771K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c78e90,0x00007f58fe580000)
 Metaspace       used 22939K, capacity 23252K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
{Heap before GC invocations=5 (full 1):
 PSYoungGen      total 9557504K, used 8870490K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
  from space 1364992K, 49% used [0x00007f5af2600000,0x00007f5b1bc16a80,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 12771K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c78e90,0x00007f58fe580000)
 Metaspace       used 22946K, capacity 23252K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
4467.385: [GC (Allocation Failure) 
Desired survivor size 1397751808 bytes, new threshold 7 (max 15)
[PSYoungGen: 8870490K->1096779K(9557504K)] 8883262K->1109558K(31403008K), 0.9701847 secs] [Times: user=1.96 sys=15.40, real=0.97 secs] 
Heap after GC invocations=5 (full 1):
 PSYoungGen      total 9557504K, used 1096779K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 80% used [0x00007f5b45b00000,0x00007f5b88a12d00,0x00007f5b99000000)
  to   space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
 ParOldGen       total 21845504K, used 12779K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c7ae90,0x00007f58fe580000)
 Metaspace       used 22946K, capacity 23252K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
{Heap before GC invocations=6 (full 1):
 PSYoungGen      total 9557504K, used 9289291K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 100% used [0x00007f58fe580000,0x00007f5af2600000,0x00007f5af2600000)
  from space 1364992K, 80% used [0x00007f5b45b00000,0x00007f5b88a12d00,0x00007f5b99000000)
  to   space 1364992K, 0% used [0x00007f5af2600000,0x00007f5af2600000,0x00007f5b45b00000)
 ParOldGen       total 21845504K, used 12779K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 0% used [0x00007f53c9000000,0x00007f53c9c7ae90,0x00007f58fe580000)
 Metaspace       used 22951K, capacity 23252K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
6153.256: [GC (Allocation Failure) 
Desired survivor size 2375548928 bytes, new threshold 6 (max 15)
[PSYoungGen: 9289291K->1364987K(9557504K)] 9302070K->2238887K(31403008K), 1.9590450 secs] [Times: user=5.38 sys=29.68, real=1.96 secs] 
Heap after GC invocations=6 (full 1):
 PSYoungGen      total 9557504K, used 1364987K [0x00007f58fe580000, 0x00007f5b99000000, 0x00007f5b99000000)
  eden space 8192512K, 0% used [0x00007f58fe580000,0x00007f58fe580000,0x00007f5af2600000)
  from space 1364992K, 99% used [0x00007f5af2600000,0x00007f5b45afed70,0x00007f5b45b00000)
  to   space 1364992K, 0% used [0x00007f5b45b00000,0x00007f5b45b00000,0x00007f5b99000000)
 ParOldGen       total 21845504K, used 873899K [0x00007f53c9000000, 0x00007f58fe580000, 0x00007f58fe580000)
  object space 21845504K, 4% used [0x00007f53c9000000,0x00007f53fe56ae90,0x00007f58fe580000)
 Metaspace       used 22951K, capacity 23252K, committed 23472K, reserved 1071104K
  class space    used 2602K, capacity 2688K, committed 2736K, reserved 1048576K
}
4

3 回答 3

1

我会考虑将堆保持在 32 GB 以下,以便您可以使用压缩的 oops。

通过拥有更多的收藏品,您可能会随着时间的推移传播工作。我也会尝试在新空间中使用。

于 2016-07-29T08:06:49.700 回答
1

您担心暂停时间,但没有设置暂停时间目标 ( MaxGCPauseMillis)。默认收集器 (ParallelGC) 没有默认暂停时间目标,因此您必须明确指定一个。

设置后,GC 将尝试调整年轻一代的大小和任期阈值以满足这些目标。

FWIW,作为一项实验,我们尝试了 G1 和 CMS,并且至少使用默认参数,它们显示相同的模式,如果有更糟的时间。

G1 确实有 200 毫秒的默认暂停目标,因此暂停时间上升到该范围不会是意料之外的。

于 2016-07-28T23:34:31.763 回答
1

您需要一些更详细的统计信息,考虑使用诸如 visualgc 之类的东西来观察堆的不同部分随着负载的发展而发生的情况和标志,例如-XX:+PrintTenuringDistribution深入研究伊甸园相关空间中正在发生的事情。

在第二种情况下,总堆大小不断增加的事实表明您正在逐渐流血到终身,因此增加的时间可能是幸存者在从和到幸存者空间之间 ping 的成本(然后最终被提升为终身制)。请注意,这是基于可用数据的假设,我们需要更详细的统计数据才能真正了解正在发生的事情(或者了解我个人没有的编年史行为方式)。

这可能意味着您有内存泄漏或 eden 现在太小(无论是就 maxtenuringthreshold 还是仅 eden 的绝对大小而言,一般来说,可能需要对 eden 进行一些调整以适应您的新堆使用模式)。

于 2016-07-28T21:56:41.330 回答