Bahmni Lite Performance Long Duration Simulation Baselining

This is a living document capturing baselining snapshots taken while:

  • Troubleshooting and applying a patch

  • Changing Software or Network configurations

  • Adding new scenarios and changing load share

Source Code: GitHub - Bahmni/performance-test

Automation Technology Stack

Base Configuration

Hardware

The performance environment ran on an AWS EKS cluster with a single node.

Node (EC2: m5.xlarge)

  • RAM 16GB

  • 4 vCPU

  • 100GB Secondary storage

  • AWS LINUX x86_64

The cluster ran a total of 15 application pods, including openmrs, bahmni-web, postgresql, and rabbitmq.

Database (AWS RDS service: db.t3.xlarge)

  • RAM 16GB

  • 4 vCPU (2 core, 2.5 GHz Intel Scalable Processor)

  • 100GB Secondary storage

  • MySQL, max_connections = 1304

Software

OpenMRS Tomcat Server

  • Server version: Apache Tomcat/7.0.94

  • Server built: Apr 10 2019 16:56:40 UTC

  • Server number: 7.0.94.0

  • OS Name: Linux

  • OS Version: 5.4.204-113.362.amzn2.x86_64

  • Architecture: amd64

  • JVM Version: 1.8.0_212-8u212-b01-1~deb9u1-b01

  • ThreadPool: Max 200, Min 25 (default server.xml)

OpenMRS - Heap

  • Initial Heap: 256 MB

  • Max Heap: 768 MB

-Xms256m -Xmx768m -XX:PermSize=256m -XX:MaxPermSize=512m

OpenMRS Connection Pooling

hibernate.c3p0.max_size=50
hibernate.c3p0.min_size=0
hibernate.c3p0.timeout=100
hibernate.c3p0.max_statements=0
hibernate.c3p0.idle_test_period=3000
hibernate.c3p0.acquire_increment=1
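For reference, a small sketch restating these pool settings with their standard c3p0 meanings, checked against the MySQL connection limit quoted above. The dictionary form is illustrative only, not how OpenMRS actually loads the properties:

```python
# Sketch: the c3p0 pool settings above with their standard meanings as comments.
MYSQL_MAX_CONNECTIONS = 1304  # RDS max_connections from the database configuration above

c3p0 = {
    "hibernate.c3p0.max_size": 50,           # maximum pooled connections
    "hibernate.c3p0.min_size": 0,            # pool may shrink to empty when idle
    "hibernate.c3p0.timeout": 100,           # idle connections expire after 100 s
    "hibernate.c3p0.max_statements": 0,      # statement caching disabled
    "hibernate.c3p0.idle_test_period": 3000, # validate idle connections every 3000 s
    "hibernate.c3p0.acquire_increment": 1,   # grow the pool one connection at a time
}

# A single OpenMRS instance can never exhaust the database connection limit:
assert c3p0["hibernate.c3p0.max_size"] <= MYSQL_MAX_CONNECTIONS
```

Note that the 50-connection cap is well below both the 200 Tomcat worker threads and the 1304-connection database limit.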

Client Configuration

The performance test simulation (Gatling) is executed on the client machine.

Client (EC2: c5.xlarge)

  • RAM 8GB

  • 4 vCPU

  • 8GB storage

  • AWS LINUX x86_64


📗 40 Concurrent Users - 8 Hours

  • Network: 60 Mbps

  • Ramp Up: 5 mins

  • Database pre-state: 543 Patients
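As a back-of-the-envelope sketch (it is an assumption here that the load shares in the table below map directly onto concurrent users), the 40 users would be apportioned across scenarios as follows:

```python
# Illustrative only: split 40 concurrent users by the documented load shares.
# Frontdesk and Doctor each carry 50% of traffic; Frontdesk is subdivided further.
TOTAL_USERS = 40

frontdesk_shares = {
    "New Patient Registration Start OPD Visit": 0.40,
    "Existing Patient Search using ID Start OPD Visit": 0.30,
    "Existing Patient Search using Name Start OPD Visit": 0.20,
    "Upload Patient Document": 0.10,
}

frontdesk_users = {
    name: TOTAL_USERS * 0.5 * share for name, share in frontdesk_shares.items()
}
doctor_users = TOTAL_USERS * 0.5  # Doctor Consultation carries 100% of its half

# The split must account for every concurrent user.
assert abs(sum(frontdesk_users.values()) + doctor_users - TOTAL_USERS) < 1e-9
```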

OpenMRS JVM Configuration:

-Xms2048m -Xmx2048m -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:MetaspaceSize=768m -XX:MaxMetaspaceSize=768m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=50 -XX:MaxTenuringThreshold=15 -XX:+UseParNewGC -XX:ParallelGCThreads=16 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:CMSInitiatingOccupancyFraction=85 -XX:+CMSScavengeBeforeRemark

Report Link: https://bahmni.github.io/performance-test/longduration_report-20221124061027477_40users_8hrs/index.html

Report Observations:

Needs to be ANALYSED

| Simulations | Scenario | Load share | Patient Count | Min Time (ms) | 95th Percentile (ms) | 99th Percentile (ms) | Max Time (ms) |
|---|---|---|---|---|---|---|---|
| Frontdesk (50% Traffic) | New Patient Registration Start OPD Visit | 40% | 1920 | 131 | 215 | 282 | 533 |
| Frontdesk (50% Traffic) | Existing Patient Search using ID Start OPD Visit | 30% | 1440 | 32 | 91 | 167 | 245 |
| Frontdesk (50% Traffic) | Existing Patient Search using Name Start OPD Visit | 20% | 1440 | 27 | 50 | 76 | 176 |
| Frontdesk (50% Traffic) | Upload Patient Document | 10% | 480 | 88 | 152 | 202 | 327 |
| Doctor (50% Traffic) | Doctor Consultation (8 Observations, 2 Lab Orders, 3 Medication) | 100% | 1920 | 562 | 1065 | 1191 | 1592 |

 

📗 50 Concurrent Users - 8 Hours

  • Network: 60 Mbps

  • Ramp Up: 5 mins

  • Database pre-state: 2464 Patients

OpenMRS JVM Configuration:

-Xms2048m -Xmx2048m -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:MetaspaceSize=768m -XX:MaxMetaspaceSize=768m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=50 -XX:MaxTenuringThreshold=15 -XX:+UseParNewGC -XX:ParallelGCThreads=16 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:CMSInitiatingOccupancyFraction=85 -XX:+CMSScavengeBeforeRemark

Report Link: https://bahmni.github.io/performance-test/longduration_report-20221125061309057_50users_8hrs/index.html

Report Observations:

The execution ran successfully for the full 8 hours. However, OpenMRS went down some time later while the environment was idle, due to the same issue observed in the 70-concurrent-user, 8-hour run below.

Needs to be monitored

Needs to be ANALYSED

| Simulations | Scenario | Load share | Patient Count | Min Time (ms) | 95th Percentile (ms) | 99th Percentile (ms) | Max Time (ms) |
|---|---|---|---|---|---|---|---|
| Frontdesk (50% Traffic) | New Patient Registration Start OPD Visit | 40% | 2440 | 130 | 226 | 295 | 478 |
| Frontdesk (50% Traffic) | Existing Patient Search using ID Start OPD Visit | 30% | 1680 | 32 | 85 | 173 | 2272 |
| Frontdesk (50% Traffic) | Existing Patient Search using Name Start OPD Visit | 20% | 1680 | 29 | 77 | 131 | 277 |
| Frontdesk (50% Traffic) | Upload Patient Document | 10% | 480 | 89 | 143 | 166 | 263 |
| Doctor (50% Traffic) | Doctor Consultation (8 Observations, 2 Lab Orders, 3 Medication) | 100% | 2400 | 572 | 1199 | 1338 | 1716 |

 

📕 70 Concurrent Users - 8 Hours

  • Network: 60 Mbps

  • Ramp Up: 5 mins

  • Database pre-state: 8361 Patients

OpenMRS JVM Configuration:

-Xms2048m -Xmx2048m -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:MetaspaceSize=768m -XX:MaxMetaspaceSize=768m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=50 -XX:MaxTenuringThreshold=15 -XX:+UseParNewGC -XX:ParallelGCThreads=16 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:CMSInitiatingOccupancyFraction=85 -XX:+CMSScavengeBeforeRemark

Report Link: https://bahmni.github.io/performance-test/longduration_report-20221117114025035_70users_8hrs/index.html

Report Observations:

The test was up and running for 5 hours, after which the OpenMRS application went down due to a “Java Heap - Out of Memory” error. The same issue was observed in every “70 concurrent users - 8 hours” test, with OpenMRS going down at varying time intervals.

JVM Observation:

Overall Heap Size: 1.94GB | CMS Old Gen Size: 1GB | Par Eden Space: 910MB

*Note: All of the above memory regions were maxed out when OpenMRS went down.
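The Par Eden Space figure follows directly from the JVM flags used in this run. A quick sanity check, assuming the standard HotSpot young-generation layout (eden plus two survivor spaces):

```python
# Sanity check: derive Par Eden Space from this run's JVM flags.
# -XX:SurvivorRatio gives the eden : single-survivor-space ratio;
# young gen = eden + 2 survivor spaces.
new_size_mb = 1024   # -XX:NewSize=1024m / -XX:MaxNewSize=1024m
survivor_ratio = 16  # -XX:SurvivorRatio=16

survivor_mb = new_size_mb / (survivor_ratio + 2)
eden_mb = survivor_mb * survivor_ratio
print(f"Eden ≈ {eden_mb:.0f} MB")  # ≈ 910 MB, matching the observation above
```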


🟣 Tests after HIU(ABDM) Fix

After the above test failures, a complete analysis was performed across the Bahmni Lite services. The health-check services implemented under HIU (ABDM) for OpenMRS were found to be steadily filling up the heap, causing the application to crash once it reached the maximum allocation. The issue has since been fixed and is no longer reproducible.

📗 40 Concurrent Users - 24 Hours

 

Hardware

The performance environment ran on an AWS EKS cluster with a single node.

Node (EC2: t3.large)

  • RAM 8GB

  • 2 vCPU

  • 100GB Secondary storage

  • AWS LINUX x86_64

The cluster ran a total of 20 application pods, including openmrs, bahmni-web, postgresql, and abdm.

Database (AWS RDS service: db.t3.xlarge)

  • RAM 16GB

  • 4 vCPU (2 core, 2.5 GHz Intel Scalable Processor)

  • 100GB Secondary storage

  • MySQL, max_connections = 1304

Software

OpenMRS Tomcat Server

  • Server version: Apache Tomcat/7.0.94

  • Server built: Apr 10 2019 16:56:40 UTC

  • Server number: 7.0.94.0

  • OS Name: Linux

  • OS Version: 5.4.204-113.362.amzn2.x86_64

  • Architecture: amd64

  • JVM Version: 1.8.0_212-8u212-b01-1~deb9u1-b01

  • ThreadPool: Max 200, Min 25 (default server.xml)

OpenMRS - Heap

  • Initial Heap: 1024 MB

  • Max Heap: 1536 MB

-Xms1024m -Xmx1536m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=1024m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=40 -XX:+UseParNewGC -XX:ParallelGCThreads=2 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:CMSInitiatingOccupancyFraction=85 -XX:+CMSScavengeBeforeRemark -XX:+UseGCOverheadLimit -XX:+UseStringDeduplication

OpenMRS Connection Pooling

hibernate.c3p0.max_size=50
hibernate.c3p0.min_size=0
hibernate.c3p0.timeout=100
hibernate.c3p0.max_statements=0
hibernate.c3p0.idle_test_period=3000
hibernate.c3p0.acquire_increment=1

Report

  • Network: 60 Mbps

  • Duration: 24 hours

  • Ramp Up: 5 mins

  • Database pre-state: 75000 patient records

Report Link: https://bahmni.github.io/performance-test/longduration_report-20230130141239257_40users_24hrs_all_omods_afterhipfix/index.html

Report Observations:

Needs to be ANALYSED

| Simulations | Scenario | Load share | Patient Count | Min Time (ms) | 95th Percentile (ms) | 99th Percentile (ms) | Max Time (ms) |
|---|---|---|---|---|---|---|---|
| Frontdesk (50% Traffic) | New Patient Registration Start OPD Visit | 40% | 5760 | 152 | 484 | 648 | 1389 |
| Frontdesk (50% Traffic) | Existing Patient Search using ID Start OPD Visit | 30% | 4320 | 54 | 472 | 676 | 1977 |
| Frontdesk (50% Traffic) | Existing Patient Search using Name Start OPD Visit | 20% | 4320 | 119 | 352 | 507 | 1492 |
| Frontdesk (50% Traffic) | Upload Patient Document | 10% | 1440 | 142 | 482 | 581 | 1135 |
| Doctor (50% Traffic) | Doctor Consultation (8 Observations, 2 Lab Orders, 3 Medication) | 100% | 5760 | 1364 | 4056 | 4531 | 7291 |

📗 70 Concurrent Users - 24 Hours

 

Hardware

The performance environment ran on an AWS EKS cluster with a single node.

Node (EC2: m5.xlarge)

  • RAM 16GB

  • 4 vCPU

  • 100GB Secondary storage

  • AWS LINUX x86_64

The cluster ran a total of 20 application pods, including openmrs, bahmni-web, postgresql, and abdm.

Database (AWS RDS service: db.t3.xlarge)

  • RAM 16GB

  • 4 vCPU (2 core, 2.5 GHz Intel Scalable Processor)

  • 100GB Secondary storage

  • MySQL, max_connections = 1304

Software

OpenMRS Tomcat Server

  • Server version: Apache Tomcat/7.0.94

  • Server built: Apr 10 2019 16:56:40 UTC

  • Server number: 7.0.94.0

  • OS Name: Linux

  • OS Version: 5.4.204-113.362.amzn2.x86_64

  • Architecture: amd64

  • JVM Version: 1.8.0_212-8u212-b01-1~deb9u1-b01

  • ThreadPool: Max 200, Min 25 (default server.xml)

OpenMRS - Heap

  • Initial Heap: 1024 MB

  • Max Heap: 2536 MB

-Xms1024m -Xmx2536m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=1024m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=40 -XX:+UseParNewGC -XX:ParallelGCThreads=2 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:CMSInitiatingOccupancyFraction=85 -XX:+CMSScavengeBeforeRemark -XX:+UseGCOverheadLimit -XX:+UseStringDeduplication

OpenMRS Connection Pooling

hibernate.c3p0.max_size=50
hibernate.c3p0.min_size=0
hibernate.c3p0.timeout=100
hibernate.c3p0.max_statements=0
hibernate.c3p0.idle_test_period=3000
hibernate.c3p0.acquire_increment=1

Report

  • Network: 60 Mbps

  • Duration: 24 hours

  • Ramp Up: 5 mins

  • Database pre-state: 90500 patient records

Report Link: https://bahmni.github.io/performance-test/longduration_report-20230213133118638_70users_24hours_AfterHIPfix_m5xlarge/index.html

Report Observations:

Needs to be ANALYSED

| Simulations | Scenario | Load share | Patient Count | Min Time (ms) | 95th Percentile (ms) | 99th Percentile (ms) | Max Time (ms) |
|---|---|---|---|---|---|---|---|
| Frontdesk (50% Traffic) | New Patient Registration Start OPD Visit | 40% | 10080 | 130 | 252 | 309 | 674 |

The Bahmni documentation is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)