OCR in Bahmni
Objective | Provide OCR abilities in Bahmni |
Due date | |
Key outcomes | Phase 1, Outcome A: Ability to scan Covid RT-PCR test results into Bahmni via OCR. |
Status | In analysis / Prototype |
Collaborators | KCDH/IIT + Thoughtworks |
Slack | #bahmni-ocr |
Code Repo | |
Issue List | |
Problem Statement
Provide OCR abilities in Bahmni
Bahmni enables users to upload scanned patient documents (e.g., lab reports and prescriptions) and attach them to the patient's dashboard. Although the documents are always available for doctors to view, the fact that they can be viewed only as attachments has its disadvantages: the values they contain cannot be charted, queried or reported on. It is desirable that the data in these documents be extracted through OCR and recorded as observations/indicators in the patient's medical history. This will make it easier for Bahmni to chart the data and to use it in reports and analytics, since the values are then stored as structured data rather than as content inside a scanned image.
Use Case 1: Lab Reports (printed)
Many hospitals where Bahmni is installed outsource their lab tests to third-party labs. The reports received from these labs are generally uploaded as patient documents.
On-demand OCR can be provided in the document upload area.
If the external lab returns reports by email, a dedicated email address can be configured, and OCR can automatically read the attached images, extract the data and record it on the patient's dashboard.
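For the email flow, the first step would be pulling the report images out of incoming messages. A minimal sketch using only Python's standard `email` package follows; `extract_report_images` is a hypothetical name, and mailbox polling (e.g. over IMAP), PDF attachments and matching a report to a patient are assumed to be handled elsewhere.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import List, Tuple


def extract_report_images(raw_email: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, data) for every image attachment in a raw email.

    Hypothetical helper: only the attachment-extraction step is shown;
    mailbox polling and patient matching are out of scope.
    """
    # policy.default makes the parser return EmailMessage objects,
    # which provide iter_attachments() and get_content().
    msg = BytesParser(policy=policy.default).parsebytes(raw_email)
    images = []
    for part in msg.iter_attachments():
        if part.get_content_maintype() == "image":
            # For non-text parts, get_content() returns the decoded bytes.
            images.append((part.get_filename() or "unnamed", part.get_content()))
    return images


# Usage: build a sample email with one image attachment and extract it.
sample = EmailMessage()
sample["Subject"] = "RT-PCR report"
sample.set_content("Report attached.")
sample.add_attachment(b"\x89PNG placeholder", maintype="image",
                      subtype="png", filename="report.png")
images = extract_report_images(sample.as_bytes())
print(images)  # [('report.png', b'\x89PNG placeholder')]
```

The extracted bytes would then be handed to the same OCR pipeline used for manually uploaded documents.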
For Phase 1, the team is focusing on OCR of Covid reports, so that we can build the feature end to end for scanning and reporting Covid RT-PCR test results in Bahmni. Once Covid reports are done, we will select the next "set" of lab tests to recognise, for instance the CBC panel.
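One sub-step of the Covid RT-PCR flow is turning the OCR'd result text into a coded value that can be stored and reported on. The sketch below assumes a small set of result phrases ("Detected", "Not Detected", "Positive", "Negative", "Inconclusive"); the phrase list, the coded values and the function name `normalize_rtpcr_result` are hypothetical and not taken from the PoC code.

```python
import re
from typing import Optional

# Hypothetical phrase-to-code table for printed RT-PCR results.
# Order matters: "not detected" must be tested before "detected".
RESULT_PATTERNS = [
    ("Negative", re.compile(r"\b(?:not\s+detected|negative)\b", re.IGNORECASE)),
    ("Inconclusive", re.compile(r"\b(?:inconclusive|indeterminate)\b", re.IGNORECASE)),
    ("Positive", re.compile(r"\b(?:detected|positive)\b", re.IGNORECASE)),
]


def normalize_rtpcr_result(result_text: str) -> Optional[str]:
    """Map OCR'd text from the result region to a coded value, or None."""
    for coded_value, pattern in RESULT_PATTERNS:
        if pattern.search(result_text):
            return coded_value
    return None  # unrecognised: leave for manual review


for text in ["SARS-CoV-2 RNA: NOT DETECTED", "Result: Detected", "Sample rejected"]:
    print(repr(text), "->", normalize_rtpcr_result(text))
```

Applying this to the text of the result region only, rather than to the whole report, avoids false matches from reference-range columns that may print "Negative" as the expected value.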
Other OCR Use Cases for the Future
Lab Reports (handwritten)
Prescriptions (printed / handwritten)
Consultation Notes (printed / handwritten)
Discharge Summary (printed / handwritten)
Payment Receipts / Insurance Claim Documents
Current PoC Status (OCR for Covid Lab Reports)
Code: https://github.com/document-analysis-tools/ocr-ner-extractor
Mark the regions to extract from lab reports (using the open-source Label Studio).
Extract the text from each region (OCR of printed text) using Tesseract models.
Use NLP libraries such as MedCAT and spaCy to extract "meaning" from the text (for example, identifying the patient ID, name or a clinical term).
Receive a JSON representation of the original lab report, with the relevant data elements extracted and identified.
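The region, OCR and JSON steps above can be sketched in Python, assuming the rectangle results from a Label Studio JSON export (where `x`, `y`, `width` and `height` are percentages of `original_width`/`original_height`). The OCR engine is passed in as a function, so with Pillow and pytesseract one could pass something like `lambda box: pytesseract.image_to_string(image.crop(box))`; the usage below uses a fake OCR lookup instead. The NLP step (MedCAT/spaCy) is not shown, and the names `to_pixel_box`, `extract_report` and the `{"fields": ...}` output shape are hypothetical, not taken from the ocr-ner-extractor repository.

```python
import json
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def to_pixel_box(result: dict) -> Box:
    """Convert a Label Studio rectangle result to a pixel box.

    Label Studio stores x, y, width and height as percentages of the
    original image size; rotation is assumed to be 0 here.
    """
    v = result["value"]
    w, h = result["original_width"], result["original_height"]
    return (
        round(v["x"] * w / 100),
        round(v["y"] * h / 100),
        round((v["x"] + v["width"]) * w / 100),
        round((v["y"] + v["height"]) * h / 100),
    )


def extract_report(results: List[dict], ocr_region: Callable[[Box], str]) -> str:
    """OCR every labelled rectangle and return a JSON document of fields.

    ocr_region is injected so the OCR engine (e.g. Tesseract) can be swapped
    or faked; the {"fields": {...}} output shape is hypothetical.
    """
    fields: Dict[str, str] = {}
    for result in results:
        if result.get("type") != "rectanglelabels":
            continue  # skip non-rectangle annotations
        label = result["value"]["rectanglelabels"][0]
        fields[label] = ocr_region(to_pixel_box(result)).strip()
    return json.dumps({"fields": fields}, indent=2)


# Usage with one labelled region on a 1000x2000 px scan and a fake OCR engine.
annotation = [{
    "type": "rectanglelabels",
    "original_width": 1000,
    "original_height": 2000,
    "value": {"x": 10, "y": 5, "width": 30, "height": 2.5,
              "rotation": 0, "rectanglelabels": ["PatientName"]},
}]
fake_ocr = {(100, 100, 400, 150): " Jane Doe\n"}
print(extract_report(annotation, lambda box: fake_ocr[box]))
```

Injecting the OCR function keeps the region logic testable without Tesseract installed, and lets the same code run against different OCR models.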
Video recording of the PoC showcase: Bahmni PAT Call on 20-Apr-2022 (Wednesday)
Reference materials
Digital Scanned Documents for Bahmni - Initial Proposal by KCDH: (Presentation Link)
IIT/KCDH: https://rnd.iitb.ac.in/research-glimpse/adaptive-framework-end-end-corrections-indic-ocr
Sample Lab Reports
The Bahmni documentation is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)