Creating robust deployment
Scenarios
- Database Ids may overflow over time, and the Id sequence may have developed unnecessary gaps.
- The system leaks memory slowly and has to be restarted
- The system becomes slower over time and has to be restarted
- The system becomes slower with increase in database size
- Individual services run out of threads
- Individual services run out of the database connections allocated to them
- The system is not reporting errors
- The system is using more disk space than it should
- On machine restart the services do not come back up
- After a service restart, the service doesn't become available
- Active to Passive data replication is not happening
- Automatic switch from active to passive doesn't work
- Automatic switch back from passive to active doesn't work
- Database connections become stale if not used for some time
- Failed events do not resolve on their own (including after service restarts)
- A very large file is uploaded, causing the server to become less responsive
- Redirect loop causes denial of service (low priority)
- An inefficient report slows down or hampers production operations
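The first scenario (Id overflow and gaps) comes down to simple arithmetic once the current maximum Id and the row count of a table are known. The sketch below is only an illustration: it assumes a signed 32-bit Id column whose Ids start at 1, and the function name `id_headroom` and the sample numbers are hypothetical.

```python
# Largest value a signed 32-bit integer Id column can hold (assumed column type).
SIGNED_INT32_MAX = 2**31 - 1


def id_headroom(current_max_id, row_count, column_max=SIGNED_INT32_MAX):
    """Summarise how close an Id column is to overflowing.

    current_max_id and row_count are assumed to come from queries such as
    SELECT MAX(id) and SELECT COUNT(*) on the table being checked.
    """
    return {
        # Ids still available before the column overflows.
        "remaining_ids": column_max - current_max_id,
        # Share of the Id range already consumed.
        "used_fraction": current_max_id / column_max,
        # Ids that were skipped (assumes Ids start at 1); a large value
        # points to unnecessary gaps.
        "gap_count": current_max_id - row_count,
    }


# Placeholder numbers, not measurements from a real deployment.
print(id_headroom(current_max_id=1_500_000_000, row_count=900_000))
```

A table that has used a large share of its Id range while holding comparatively few rows is a candidate for investigating where the gaps come from.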
Manual Testing
Automated Testing
Automation
The above scenarios need to be tested manually or in an automated fashion.
Environment
Functional Tests
Set up an environment with enough disk space, CPU and memory. Ensure that all the basic scenarios are covered by the functional tests. Run these tests continuously for days.
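Running the tests continuously is most useful when the duration of each run is recorded, so that a gradual slowdown becomes visible. The following is a minimal sketch of such a loop. The names `soak` and `slowdown_ratio` are hypothetical, and `run_once` stands for whatever starts one pass of the functional tests in a given deployment.

```python
import statistics
import time


def soak(run_once, iterations, pause_seconds=0.0):
    """Call run_once repeatedly and return the duration of each call in seconds."""
    durations = []
    for _ in range(iterations):
        started = time.monotonic()
        run_once()
        durations.append(time.monotonic() - started)
        time.sleep(pause_seconds)
    return durations


def slowdown_ratio(durations):
    """Compare the mean of the last quarter of runs with the first quarter."""
    window = max(1, len(durations) // 4)
    early = statistics.mean(durations[:window])
    late = statistics.mean(durations[-window:])
    if early == 0:
        # Runs too fast to measure; no meaningful ratio can be computed.
        return float("inf")
    return late / early


# Stand-in workload; a real run would trigger the functional test suite.
durations = soak(lambda: sum(range(10_000)), iterations=8)
print(len(durations))
```

A ratio that keeps growing over days suggests one of the "becomes slower over time" scenarios listed above.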
Environment Configuration
The system should be configured so that, after it gives a warning, it continues working for some time, for example when OpenMRS is running out of threads. The settings to configure are:
- Connection Pool Size (min, max, increment)
  - Each service should ideally use only one connection pool
  - min=5, increment=1
  - max=depending on the size of the deployment
- Thread Pool Size
  - The maximum size of the thread pool should not be very high (e.g. at JSS, a thread pool size of 100 for OpenMRS and 20 for OpenELIS should be enough)
- Failed events size
  - This size should be 10, so that the problem gets reported immediately
  - While fixing the issue in production, one may temporarily increase this size
- Database
  - Maximum number of connections (this includes the connections used for ad hoc usage too, so keep this number slightly higher than the connection pool size given to the application)
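The relationship between the pool maximums and the database connection limit can be written down as a small calculation. This is a sketch under assumptions: the function name `required_max_connections` is hypothetical, and the pool sizes and headroom are placeholder numbers, not recommended values.

```python
def required_max_connections(pool_max_sizes, adhoc_headroom):
    """Return the database-side connection limit needed for the given pools.

    pool_max_sizes maps each connection pool on one database server to its
    maximum size; adhoc_headroom is the number of extra connections reserved
    for ad hoc usage.
    """
    return sum(pool_max_sizes.values()) + adhoc_headroom


# Placeholder pool maximums for services sharing one database server.
pools = {"OpenMRS Application": 40, "OpenMRS Dynamic Reports": 10}
print(required_max_connections(pools, adhoc_headroom=10))  # prints 60
```

If the database limit is lower than this sum, a pool can fail to grow to its configured maximum even though the application settings look correct.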
Connection Pool Size
Sub-System | Service | Pool Name | Min | Max | Increment |
---|---|---|---|---|---|
OpenMRS | Application | default | | | |
OpenMRS | Dynamic Reports | default | | | |
OpenELIS | | | | | |
OpenERP | | | | | |
OpenERP | Atom Feed Service | | | | |
Jasper Reports | | | | | |
Thread Pool Size
Sub-System | Service | Pool Name | Min | Max | Increment |
---|---|---|---|---|---|
OpenMRS | Application | default | | | |
OpenELIS | Application | | | | |
OpenERP | Application | | | | |
OpenERP | Atom Feed Service | | | | |
Jasper Reports | | | | | |
Database
Server | Database | Max Connections |
---|---|---|
MySQL | OpenMRS | |
PostgreSQL | OpenELIS (clinlims) | |
PostgreSQL | OpenERP | |
MySQL | Jasper | |
Tomcat
Monitoring
- Icinga
- How to notify when something goes wrong in the production environment
- Test whether monitoring is working or not
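Icinga runs check plugins that follow the Nagios plugin convention: a plugin prints one status line and exits with 0 (OK), 1 (WARNING) or 2 (CRITICAL). The sketch below shows the core of a disk usage check in that style. The function names and the 80% and 90% thresholds are assumptions chosen for illustration, not Bahmni defaults.

```python
import shutil

# Exit codes defined by the Nagios plugin convention that Icinga follows.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_disk_usage(used_fraction, warn_at=0.80, crit_at=0.90):
    """Map a used-space fraction to a (status code, status line) pair."""
    if used_fraction >= crit_at:
        return CRITICAL, f"DISK CRITICAL - {used_fraction:.0%} used"
    if used_fraction >= warn_at:
        return WARNING, f"DISK WARNING - {used_fraction:.0%} used"
    return OK, f"DISK OK - {used_fraction:.0%} used"


def check_disk(path="."):
    """Classify the usage of the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return classify_disk_usage(usage.used / usage.total)


status, message = check_disk()
print(status, message)
```

A real plugin would pass `status` to `sys.exit` after printing the message. One way to test whether monitoring works is to lower `warn_at` temporarily so that the check reports a warning, and then confirm that the notification actually arrives.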
Troubleshooting
- It should be straightforward to get the runtime system parameters without bringing down the system.
- The hospital's system administrator should be able to issue the command to extract these parameters from the running system.
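On a Linux server, some runtime parameters of a running process (thread count, resident memory) can be read from `/proc/<pid>/status` without stopping the process. The following is a minimal sketch assuming a Linux host. The function names are hypothetical, and the sample text only imitates the layout of that file.

```python
def parse_proc_status(text):
    """Turn the 'Key:\tvalue' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def runtime_summary(pid):
    """Read thread count and resident memory of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        fields = parse_proc_status(handle.read())
    return {
        "threads": int(fields["Threads"]),
        # VmRSS is reported as '<number> kB'.
        "resident_memory_kb": int(fields["VmRSS"].split()[0]),
    }


# Sample text in the layout of /proc/<pid>/status; the values are made up.
sample = "Name:\tjava\nVmRSS:\t  204800 kB\nThreads:\t42\n"
print(parse_proc_status(sample)["Threads"])  # prints 42
```

Calling `runtime_summary` with the process Id of a service gives a quick view of whether that service is approaching its thread pool limit or growing in memory.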
The Bahmni documentation is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)