The need for a speech assistant

Capturing observations and data in Bahmni has been done primarily with a keyboard and mouse. While this has worked so far, we wanted to explore faster methods of capturing patient data, giving doctors more consultation time with the patient and reducing time spent on computer screens. In the long run, the speech assistant could also be extended to other areas of Bahmni, such as faster navigation and quick views of dashboards.

The solution

After a few rounds of brainstorming, we converged on the idea of trying out a speech assistant for consultation notes and using it for initial user testing and general feedback. Some notable decisions are described below.

Workflows for speech assistant

1. Initiating the speech assistant

The button to initiate the speech assistant appears on the patient dashboard, in the bottom-right corner, as seen in the screenshot below.

The button remains visible on every tab within the patient's consultation session.

2. Recording in the consultation box

Once the doctor clicks the button, the consultation box with the speech-to-text converter opens, as seen below.

The doctor can drag the consultation box to reposition it.

Once the box is open, the “Save notes” button is disabled, because there are no notes in the box yet. The doctor can now click “Start recording”.

After clicking “Start recording”, the doctor can start speaking to capture the notes. Note that the “Save notes” button is enabled only after the doctor clicks the stop button, and editing the notes is disabled while the assistant is listening.

In the screenshot above, the “Save notes” button has been enabled after the recording was stopped.

Note: The doctor can also type in this box using the keyboard, without using voice as the primary input.
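The button behaviour described above (save enabled only after stopping, editing disabled while listening) can be sketched as a small state machine. This is an illustrative sketch only; the class and state names are hypothetical and not Bahmni's actual implementation.

```typescript
// Illustrative sketch of the consultation-box states described above.
// Names (RecordingState, ConsultationBox) are hypothetical, not Bahmni's code.

type RecordingState = "idle" | "recording" | "stopped";

class ConsultationBox {
  state: RecordingState = "idle";
  notes = "";

  // "Save notes" is enabled only after recording has stopped and notes exist.
  get saveEnabled(): boolean {
    return this.state === "stopped" && this.notes.length > 0;
  }

  // Editing (typing) is disabled while the assistant is listening.
  get editingEnabled(): boolean {
    return this.state !== "recording";
  }

  startRecording(): void {
    this.state = "recording";
  }

  // Called by the speech-to-text backend as audio is transcribed.
  appendTranscript(text: string): void {
    if (this.state === "recording") this.notes += text;
  }

  stopRecording(): void {
    this.state = "stopped";
  }
}
```

Because the doctor may also type notes directly, a real implementation would gate keyboard input on the same `editingEnabled` check rather than a separate flag.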

3. Saving the notes

Once the notes are saved, the doctor can currently verify them in the following places:

→ Inside the visit summary

→ On the same consultation box

(See the attached screen recording: Screen Recording 2022-11-23 at 1.00.10 PM.mov)

Barriers to adoption

While interacting with the doctors, we identified possible barriers that could hinder adoption.

Technical overview

High Level Architecture Diagram:

Details:

The Speech Assistant is bundled with the Bahmni apps and is available at https://speech.mybahmni.in/

Setting up Speech Assistant Feature with Bahmni

Demo of Speech Assistant (YouTube video):

https://www.youtube.com/watch?v=i2R_odYHAeA

Next Steps (and limitations)

  1. The language model currently used by Vakyansh is a general-purpose English model. To recognise medical terms, the model needs to be trained with the relevant vocabulary; the trained medical model can then be used by Vakyansh to return accurate text.

  2. The Vakyansh API performs better when deployed on a GPU machine. One API instance can comfortably serve up to 10 concurrent audio connections; to handle more, the API needs to be scaled out.
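One common way to scale past a single instance's concurrency limit is to run several API instances behind a reverse proxy. The fragment below is a minimal sketch using nginx; the upstream hostnames, port, and server name are illustrative assumptions, not Bahmni's actual deployment.

```nginx
# Hypothetical nginx config fanning audio requests out across
# several Vakyansh API instances (hostnames/ports are illustrative).
upstream vakyansh_api {
    least_conn;                      # send each new connection to the least-busy instance
    server vakyansh-1.internal:8000;
    server vakyansh-2.internal:8000;
    server vakyansh-3.internal:8000;
}

server {
    listen 443 ssl;
    server_name speech.example.org;

    location / {
        proxy_pass http://vakyansh_api;
        proxy_http_version 1.1;                   # required for long-lived/streaming connections
        proxy_set_header Upgrade $http_upgrade;   # allow WebSocket upgrades for audio streams
        proxy_set_header Connection "upgrade";
    }
}
```

`least_conn` suits long-lived audio connections better than the default round-robin, since connection durations vary with consultation length.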

  3. If an NLP library is applied to extract meaning from the transcribed sentences, the use case could be extended to other consultation tasks such as Medications, Symptoms, etc.