[Analysis] - Sudden CPU throttle and bottleneck analysis under 24-hour performance test

Description

  • In our recent 100-user, 24-hour tests, we observed a sudden spike in CPU utilisation after 15-17 hours of testing, causing the CPU to bottleneck and response times to max out for the APIs tested during that window.

  • This task is to analyse the issue on various fronts, identify the root cause behind these sudden spikes, and document the findings.

Attachments

5

Activity

M. Maharaja . April 16, 2023 at 3:22 AM

Moving this issue to Done, as it is not observed in the performance environment. The root cause will be tracked in FYI

M. Maharaja . April 3, 2023 at 5:11 AM

As suggested by OpenMRS, we moved our garbage collection strategy from Concurrent Mark Sweep (CMS) to G1GC.

With G1GC, we were able to have the Eden and Survivor spaces sized dynamically based on application needs, instead of using static sizes. The OMRS_JAVA_SERVER_OPTS for this have to be set in setenv.sh.

With this approach, we were able to reduce GC pauses, which in turn reduced the impact on CPU utilization from 95% to a maximum of 47.3% for a 100-user test. Please find attached the report for the 100-user, 24-hour test.
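
For readers unfamiliar with the switch described above, here is a minimal sketch of what enabling G1GC through OMRS_JAVA_SERVER_OPTS in setenv.sh could look like. These are not the options used in this test, which are not reproduced in this comment. The pause target is a placeholder assumption. The sketch deliberately sets no static young-generation sizes, so G1 can resize the Eden and Survivor spaces on its own.

```shell
# Illustrative sketch only: NOT the actual values from this ticket.
# Assumes any existing CMS flag (-XX:+UseConcMarkSweepGC) and any static
# young-generation sizing flags have already been removed from the options.

# Enable the G1 collector; the 200 ms pause target is a placeholder assumption.
OMRS_JAVA_SERVER_OPTS="${OMRS_JAVA_SERVER_OPTS} -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
export OMRS_JAVA_SERVER_OPTS
```

Any pause target or heap limit would need to be validated against the same 100-user, 24-hour test before being adopted.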

M. Maharaja . March 30, 2023 at 5:16 AM

Performance tuning suggestions from OpenMRS -

Arjun G March 29, 2023 at 7:16 AM

Raising talk thread in OpenMRS -

M. Maharaja . March 16, 2023 at 4:59 AM

Investigation document -

Fixed

Details

Assignee

Reporter

Labels

Dev Pair

Priority

Created March 10, 2023 at 5:01 AM
Updated April 16, 2023 at 3:22 AM
Resolved April 16, 2023 at 3:22 AM