OCR in Bahmni

Objective	Provide OCR abilities in Bahmni
Due date
Key outcomes	Phase 1, Outcome A: Ability to scan Covid RT-PCR test results into Bahmni via OCR.
Status	in-Analysis Prototype
Collaborators	KCDH/IIT + Thoughtworks
Slack	#bahmni-ocr
Code Repo	https://github.com/venkatapathy/ocr-editor
Issue List	https://github.com/venkatapathy/ocr-editor/issues

1 Problem Statement
2 UseCase1: Lab Reports (printed)
3 Other OCR Use-cases for Future
4 Current POC Status (OCR for Covid Lab Reports)
5 Reference materials

Problem Statement

Provide OCR abilities in Bahmni

Bahmni lets users upload scanned patient documents (e.g., lab reports and prescriptions) and attach them to the patient's dashboard. Although doctors can always view these documents, they are available only as attachments, which limits how the data in them can be used. It is desirable to extract the data from the documents through OCR and record it as observations/indicators against the patient's medical history. This would make it easier for Bahmni to chart the data and to use it in reports and analytics, since it would no longer be just textual data.

UseCase1: Lab Reports (printed)

Many hospitals where Bahmni is installed have outsourced labs. Reports received from the third-party labs are generally uploaded as patient documents.

An on-demand OCR function can be provided in the document upload area.
If the external lab returns reports by email, an email address can be specified, and OCR can be configured to read the images, extract the data, and log it to the patient’s dashboard automatically.

For Phase 1, the team is focussing on OCR of Covid reports, so that we can develop the end-to-end feature for scanning and reporting Covid RT-PCR test results in Bahmni. Once Covid reports are done, we will select the next “set” of lab tests to recognise, for instance the CBC panel.

Other OCR Use-cases for Future

Lab Reports (hand-written)
Prescriptions (printed / hand-written)
Consultation Notes (printed / hand-written)
Discharge Summary (printed / hand-written)
Payment Receipts / Insurance Claim Documents

Current POC Status (OCR for Covid Lab Reports)

Code: https://github.com/document-analysis-tools/ocr-ner-extractor

Mark the regions to extract from lab reports (using the open-source Label Studio).
Extract the text from those regions (OCR of printed text) using Tesseract models.
Use NLP libraries such as MedCAT and spaCy to extract "meaning" from the text (for example, identifying the patient ID, name, or a clinical term).
Receive a JSON representation of the original lab report, with the appropriate data elements extracted and identified.
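
To make the shape of the final step more concrete, the following is a minimal sketch of how labelled regions and their OCR text might be turned into a JSON representation. It is not the PoC's actual code: the `Region` class, the region labels, the JSON field names, and the sample values are all hypothetical, and simple regular expressions stand in for the MedCAT/spaCy extraction step. The OCR text is supplied as plain strings, in place of what Tesseract would return for each region.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Region:
    """One marked region of a lab report (hypothetical structure)."""
    label: str  # region label, e.g. as assigned during annotation
    text: str   # text that an OCR engine would return for this region


# Illustrative patterns only: a regex stand-in for the NLP extraction step.
# The labels and formats are assumptions, not taken from the PoC.
PATTERNS = {
    "patient_id": re.compile(r"\b([A-Z]{3}\d{4,})\b"),
    "patient_name": re.compile(r"Name\s*:\s*(.+)", re.IGNORECASE),
    "test_result": re.compile(r"\b(positive|negative)\b", re.IGNORECASE),
}


def extract_fields(regions):
    """Return one extracted value (or None) per region label."""
    fields = {}
    for region in regions:
        pattern = PATTERNS.get(region.label)
        match = pattern.search(region.text) if pattern else None
        fields[region.label] = match.group(1).strip() if match else None
    return fields


def to_report_json(regions):
    """Serialise the raw region text plus the extracted fields as JSON."""
    report = {
        "regions": [asdict(region) for region in regions],
        "extracted": extract_fields(regions),
    }
    return json.dumps(report, indent=2)


# Made-up sample text, standing in for OCR output from a Covid report.
sample_regions = [
    Region("patient_id", "Patient ID: GAN200012"),
    Region("patient_name", "Name : Test Patient"),
    Region("test_result", "SARS-CoV-2 RT-PCR: Negative"),
]

print(to_report_json(sample_regions))
```

Keeping the raw region text alongside the extracted values is one possible design choice: it would let a reviewer compare what was recognised against what was recorded before the data is saved against the patient.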

Video recording of the PoC showcase: Bahmni PAT Call on 20-Apr-2022 (Wednesday)

Reference materials

Digital Scanned Documents for Bahmni - Initial Proposal by KCDH: (Presentation Link)
IIT/KCDH: https://rnd.iitb.ac.in/research-glimpse/adaptive-framework-end-end-corrections-indic-ocr
Sample Lab Reports

Sample Covid Report