Bahmni Performance Testing Journey (High Level Summary)

Goals

  1. To publish the baseline reports and show the advantages of upgrading to newer software components.

  2. To publish the capacity planning reports and to predict per-facility cloud running costs for different hardware configurations.

  3. To publish a Roadmap for the next set of experiments or features that promise to improve the performance (or reduce the per-facility costs) considerably.

  4. To integrate performance test runs with Bahmni deployments and make them available to anyone in the community to modify, run, and benchmark against their own Bahmni deployments.

More details here


Performance Test Plan Strategy

Test Strategy

The strategy focuses on realistic stress testing of the Bahmni LITE environment while meeting the following criteria:

  1. To have pauses between user interactions, giving each persona its breathing time (think time).

  2. To share the overall load between the personas based on their breathing time.

  3. To have a ramp-up and ramp-down of users at the beginning and end of each test.

  4. Each test starts with a set of patients already available, so that doctors can begin consultations from the start of the test.

  5. To maintain a seamless connection of scenarios between patient registration and consultation.

  6. To have a hard stop time at the end of the test to control the overall test duration.
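The ramp-up, ramp-down, and hard-stop criteria above can be pictured as a trapezoidal load profile. The following is a minimal sketch, not the suite's actual injection logic: the function name `target_users`, its parameters, and the example values are all assumptions made for illustration.

```python
def target_users(t, peak_users, ramp_s, duration_s):
    """Target concurrent users at second t of a ramp-up / steady / ramp-down profile.

    Hypothetical helper for illustration; names and shape are assumptions.
    """
    if ramp_s <= 0 or 2 * ramp_s > duration_s:
        raise ValueError("need 0 < ramp_s <= duration_s / 2")
    if t < 0 or t >= duration_s:
        return 0  # hard stop: nobody runs outside the test window
    if t < ramp_s:
        return round(peak_users * t / ramp_s)  # ramp-up
    if t > duration_s - ramp_s:
        return round(peak_users * (duration_s - t) / ramp_s)  # ramp-down
    return peak_users  # steady state


# Example: 90 users, 60 s ramps, 600 s hard stop (all values illustrative).
for second in (0, 30, 300, 570, 600):
    print(second, target_users(second, 90, 60, 600))
```

With these illustrative values the load climbs linearly for the first minute, holds at the peak, and falls back to zero by the hard stop.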

More details here


Test Scenarios

The performance test suite includes the following test scenarios:

  1. New Patient - Registration - Start OPD Visit

  2. Existing Patient - Patient Search - Start OPD Visit

  3. Upload Patient Document

  4. Doctor Consultation and Observations Flow

More details here


Infra Setup

The Performance Test environment runs on Kubernetes on AWS.

  • A separate namespace is created with a Bahmni Kubernetes Installation.

  • The existing RDS is shared with the performance namespace.

  • For monitoring, Grafana and a JVM dashboard are added.

More details here


Required Software

 

  • JDK 11

  • Gradle

  • Node.js

  • Newman

  • AWS credentials (needed only to run the test on the cloud)

  • Access to GitHub Actions (needed only to run the test on the cloud)

  • YourKit Java profiler (get the license from the Infra Team)

  • Network Bandwidth controller - Wondershaper

Code repository

Archive Report Path - GH Pages


Test Execution Steps 

 

  • Clone all the repositories.

  • Use the Wondershaper to set the network speeds only if needed.

  • Run the test data generator to create and upload new patients.

  • Copy the registrations.csv file from /output to /src/gatling/resources.

  • Start the test by providing the simulation type, the number of users, and the duration of the test.

  • To run the test against a different environment, update the respective environment properties in:

    • src/gatling/scala/configurations/protocols.scala and src/gatling/scala/api/constants.scala

  • To run the test in the cloud, use the trigger in GitHub Actions.
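The copy step in the list above can be scripted. This is a minimal sketch under stated assumptions: the helper name `stage_registrations` is hypothetical, and it assumes that `output` sits at the root of the test data generator checkout and `src/gatling/resources` at the root of the test suite checkout.

```python
from pathlib import Path
import shutil


def stage_registrations(generator_root, suite_root):
    """Copy registrations.csv from the generator's output folder into the
    Gatling resources folder. Both root locations are assumptions."""
    src = Path(generator_root) / "output" / "registrations.csv"
    dest_dir = Path(suite_root) / "src" / "gatling" / "resources"
    if not src.is_file():
        raise FileNotFoundError(f"run the test data generator first: {src} is missing")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the file's timestamps, which helps tell stale data from fresh data
    return Path(shutil.copy2(src, dest_dir / src.name))
```

Failing early when the file is missing avoids starting a long test run against an outdated patient list.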

More details here & here


Java Profiling

 

  • The YourKit profiler was used to profile the JVM during performance test runs.

  • It helped in analysing CPU and memory utilisation, troubleshooting code that slows down API responses, locating possible deadlocks, and so on.

  • Instructions for setting up YourKit on a remote machine can be found here.


Findings & Remediation

📗 Baseline Test Observations

 

  • By default, OpenMRS runs with the JVM's default memory management, which is not optimal for applications with large memory footprints. We therefore moved to CMS (Concurrent Mark Sweep), which gave us lower GC pause times and higher throughput with minimal patient data - BAH-2660.

  • We also configured the minimum and maximum heap sizes and the number of parallel GC threads.

  • This change reduced the maximum time taken by the POST API call that saves encounters, in the 90-user test run, from 4149 ms to 1551 ms.
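The settings named above map onto standard HotSpot options (`-Xms`, `-Xmx`, `-XX:ParallelGCThreads`, and a collector switch). The sketch below only shows how such an option string could be assembled: the helper name `jvm_gc_options` is hypothetical, and the heap sizes and thread count are placeholders, not the values used in the Bahmni test runs.

```python
def jvm_gc_options(min_heap, max_heap, gc_threads, collector="cms"):
    """Build a list of HotSpot JVM options for heap sizing and GC selection.

    Hypothetical helper; the values passed in below are placeholders.
    """
    collectors = {
        "cms": "-XX:+UseConcMarkSweepGC",  # Concurrent Mark Sweep
        "g1": "-XX:+UseG1GC",              # Garbage-First
    }
    if collector not in collectors:
        raise ValueError(f"unknown collector: {collector}")
    return [
        f"-Xms{min_heap}",                     # minimum (initial) heap size
        f"-Xmx{max_heap}",                     # maximum heap size
        f"-XX:ParallelGCThreads={gc_threads}",  # parallel GC worker threads
        collectors[collector],
    ]


# Placeholder values only; tune them for the actual deployment.
print(" ".join(jvm_gc_options("1g", "2g", 4)))
```

How the resulting string reaches the OpenMRS JVM depends on the deployment and is not covered here.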

More details about the baseline test reports can be found here 


📗 Long Duration Test Observations (24 hour test runs)

  1. Saving the consultation page was slow because of a Groovy parse class function. Disabling this function reduced the response time for a single API call from 2.5 s to 1 s - BAH-2870.

  2. The HIP health check module was pinging the OpenMRS patient and visit APIs every 5 seconds, which repeatedly brought the environment down with an out-of-memory exception whenever the patient count reached 125k - BAH-2441, BAH-2783. This has been fixed, and the fix reduced the maximum time taken by the POST API call that saves encounters, in the 70-user test run, from 60 s to 4 s.

  3. The HIP and Crater atom feeds were also polling OpenMRS for event feeds, causing long GC pauses, which in turn spiked CPU utilization - BAH-2801, BAH-2912.

  4. Switching the GC strategy from CMS to G1GC helped to control the CPU spikes.

  5. Without the HIP and Crater atom feeds, and with the updated G1GC settings, the 99th percentile response time dropped to 1.5 s.
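To put the before and after timings in the first two observations on a common scale, they can be expressed as percentage reductions. The helper name `percent_reduction` is an assumption made for this sketch; the input numbers are the ones reported above.

```python
def percent_reduction(before, after):
    """Relative improvement from `before` to `after`, in percent (one decimal)."""
    return round((before - after) / before * 100, 1)


# Single API call after disabling the Groovy parse class function: 2.5 s -> 1 s
print(percent_reduction(2.5, 1.0))  # 60.0

# Save-encounter POST after the HIP health check fix (70 users): 60 s -> 4 s
print(percent_reduction(60, 4))  # 93.3
```

In relative terms the HIP health check fix is the larger win, at roughly 93% against 60%.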

More details about the JVM configurations, infra setup and long duration test runs can be read here: https://bahmni.atlassian.net/wiki/spaces/BAH/pages/3110568005


Bahmni LITE Cost Estimates (Projected)

 

  • Based on the long-duration test results and corresponding AWS utilization bills we have come up with a cost calculator. Link: https://bahmni.atlassian.net/wiki/spaces/BAH/pages/3140714497

  • Anyone can create a cloud cost estimate for setting up Bahmni LITE by providing the number of users, users per clinic, and operational hours.

The load pattern is assumed to match our test suite. If the operations performed at your facility differ from the scenarios in the test suite, the results won’t match as-is. Please review the test scenarios to get a better understanding of the performance work done by the team.
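The linked calculator is the authoritative source for estimates. Purely to illustrate how the three inputs could combine, here is a toy sketch: the formula, the helper name `estimate_monthly_cost`, the `hourly_rate_per_clinic` parameter, and the 30-day month are all assumptions and are not taken from the actual calculator.

```python
import math


def estimate_monthly_cost(users, users_per_clinic, hours_per_day,
                          hourly_rate_per_clinic, days_per_month=30):
    """Toy estimate: clinics x running hours x an hourly rate.

    The rate and the formula are placeholders, not figures from AWS bills.
    """
    if users_per_clinic <= 0:
        raise ValueError("users_per_clinic must be positive")
    clinics = math.ceil(users / users_per_clinic)  # a partly filled clinic still counts
    running_hours = hours_per_day * days_per_month
    return clinics * running_hours * hourly_rate_per_clinic


# Placeholder inputs: 45 users, 20 users per clinic, 8 h/day, 0.5 currency units per hour.
print(estimate_monthly_cost(45, 20, 8, 0.5))
```

A real estimate needs the measured utilization behind the calculator, so treat any number from this sketch as a shape, not a quote.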


Future Recommendations for Performance Testing

 

Troubleshooting / Improvement stories

  1. Test the environment with multitenancy.

  2. Update the test suite with the latest changes in the application - BAH-2903.

  3. Reduce the impact of the HIP and Crater atom feeds on OpenMRS - BAH-2948.

  4. Optimize the API response time - BAH-2871, BAH-2890, BAH-2891, BAH-2892, BAH-2893.

  5. Optimize the application memory management - BAH-2949.

  6. Optimize the duplicate SQL queries - BAH-2716.

  7. Backlog stories list.

The Bahmni documentation is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)