OCR in Bahmni

Objective

Provide OCR abilities in Bahmni

Due date

 

Key outcomes

Phase 1, Outcome A: Ability to scan Covid RT-PCR test results into Bahmni via OCR.

Status

In Analysis / Prototype

Collaborators

KCDH/IIT + Thoughtworks

Slack

#bahmni-ocr

Code Repo

https://github.com/venkatapathy/ocr-editor

Issue List

https://github.com/venkatapathy/ocr-editor/issues

Problem Statement

Provide OCR abilities in Bahmni

Bahmni lets users upload scanned patient documents (e.g. lab reports, prescriptions) and attach them to the patient's dashboard. The documents are always available for doctors to view, but only as attachments, so their contents cannot be searched, charted, or aggregated. It is desirable to extract the data from these documents through OCR and record it as observations/indicators in the patient's medical history. Bahmni can then chart the data and use it in reports and analytics, since it is stored as structured values rather than as text inside an image.

UseCase1: Lab Reports (printed)

Many hospitals where Bahmni is installed have outsourced labs. Reports received from the third-party labs are generally uploaded as patient documents.

  1. On-demand OCR functionality can be provided in the document upload area.

  2. If the external lab returns reports by email, an email address can be specified, and OCR can be configured to read the attached images, pull out the data, and log it to the patient’s dashboard automatically.
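As a rough illustration of the email option, the sketch below pulls image and PDF attachments out of a raw email so they could be handed to OCR. It uses only the Python standard library. The function names, the sample addresses, and the choice to accept PDFs are assumptions made for illustration; fetching mail from a mailbox and matching a report to a patient are not covered.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def image_attachments(raw_email: bytes) -> list:
    """Return (filename, content_bytes) pairs for image and PDF attachments."""
    # policy.default makes the parser return an EmailMessage,
    # which provides iter_attachments().
    msg = message_from_bytes(raw_email, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        is_image = part.get_content_maintype() == "image"
        is_pdf = part.get_content_type() == "application/pdf"
        if is_image or is_pdf:
            name = part.get_filename() or "unnamed"
            found.append((name, part.get_payload(decode=True)))
    return found


def build_sample_email() -> bytes:
    """Build a small test email with one fake PNG attachment (hypothetical data)."""
    msg = EmailMessage()
    msg["From"] = "lab@example.org"
    msg["To"] = "reports@example.org"
    msg["Subject"] = "RT-PCR report"
    msg.set_content("Report attached.")
    msg.add_attachment(
        b"\x89PNG fake bytes",
        maintype="image",
        subtype="png",
        filename="report.png",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    for filename, content in image_attachments(build_sample_email()):
        print(filename, len(content), "bytes")
```

Each returned pair could then be passed to the same OCR step used for manually uploaded documents.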

For Phase 1, the team is focussing on OCR of Covid reports, so that we can develop the feature end to end: scanning Covid RT-PCR test results and reporting them in Bahmni. Once Covid reports are done, we will select the next “set” of lab tests to recognise, for instance the CBC panel.

Other OCR Use-cases for Future

  1. Lab Reports (handwritten)

  2. Prescriptions (printed / handwritten)

  3. Consultation Notes (printed / handwritten)

  4. Discharge Summary (printed / handwritten)

  5. Payment Receipts / Insurance Claim Documents

Current POC Status (OCR for Covid Lab Reports)

Code: https://github.com/document-analysis-tools/ocr-ner-extractor

  1. Mark the regions to extract from lab reports, using the open-source Label Studio.

  2. From the marked regions, extract the text (OCR of printed text) using Tesseract models.

  3. Use NLP libraries such as MedCAT and spaCy to extract "meaning" from the text (for example, identifying a patient ID, a name, or a clinical term).

  4. Receive a JSON representation of the original lab report, with the appropriate data elements extracted and identified.
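The steps above can be sketched roughly as follows. This is not the POC code: the region labels, the patient ID pattern, and the output field names are hypothetical, and plain regular expressions stand in for the MedCAT/spaCy step so that the example runs with the standard library alone. The OCR engine is passed in as a function; in practice it could wrap a Tesseract call such as `pytesseract.image_to_string`.

```python
import json
import re
from typing import Callable, Dict

# Assumed result wording on a Covid RT-PCR report (illustrative only).
RESULT_PATTERN = re.compile(
    r"\b(not detected|detected|positive|negative)\b", re.IGNORECASE
)
# Assumed ID shape: a short uppercase prefix followed by digits, e.g. GAN200001.
PATIENT_ID_PATTERN = re.compile(r"[A-Z]{2,5}\d{3,}")


def extract_fields(region_texts: Dict[str, str]) -> dict:
    """Step 3 stand-in: pick out data elements from the OCR text of each region."""
    patient_id = PATIENT_ID_PATTERN.search(region_texts.get("patient_id", ""))
    result = RESULT_PATTERN.search(region_texts.get("result", ""))
    # Collapse the stray whitespace that OCR output often contains.
    name = " ".join(region_texts.get("patient_name", "").split())
    return {
        "patient_id": patient_id.group(0) if patient_id else None,
        "patient_name": name or None,
        "test": "Covid RT-PCR",
        "result": result.group(1).lower() if result else None,
    }


def report_to_json(region_images: Dict[str, object],
                   ocr: Callable[[object], str]) -> str:
    """Steps 2 to 4: OCR each marked region, extract fields, serialise to JSON."""
    texts = {label: ocr(image) for label, image in region_images.items()}
    return json.dumps(extract_fields(texts), indent=2)


if __name__ == "__main__":
    # Strings stand in for cropped region images; the fake OCR returns them as-is.
    regions = {
        "patient_id": "Patient ID: GAN200001",
        "patient_name": "  Asha   Rao ",
        "result": "SARS-CoV-2 RNA: Not Detected",
    }
    print(report_to_json(regions, ocr=lambda image: image))
```

Keeping the OCR engine injectable means the extraction logic can be tested on sample text without Tesseract installed.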

Reference materials

 

Sample Covid Report