Objective

Provide OCR abilities in Bahmni

Due date

Key outcomes

Phase 1, Outcome A: Ability to scan Covid RT-PCR test results into Bahmni via OCR.

Status

in-Analysis, Prototype

Collaborators

KCDH/IIT + Thoughtworks

Slack

#bahmni-ocr

Code Repo

https://github.com/venkatapathy/ocr-editor

Issue List

https://github.com/venkatapathy/ocr-editor/issues

Problem Statement

Provide OCR abilities in Bahmni

...

  1. Lab Reports (hand-written)

  2. Prescriptions (printed / hand-written)

  3. Consultation Notes (printed / hand-written)

  4. Discharge Summary (printed / hand-written)

  5. Payment Receipts / Insurance Claim Documents

Current POC Status (OCR for Covid Lab Reports)

Code: https://github.com/document-analysis-tools/ocr-ner-extractor

  1. Mark the regions to extract from lab reports, using the open-source Label Studio tool.

  2. Extract the text from those regions (OCR of printed text) using Tesseract models.

  3. Use NLP libraries such as MedCAT and spaCy to extract "meaning" from the text (for example, identifying a patient ID, a name, or a clinical term).

  4. Receive a JSON representation of the original lab report, with the appropriate data elements extracted and identified.
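
The four steps above can be sketched roughly as follows. This is a minimal illustration and not the code in the repository: the field names, the output JSON layout, and the regular expressions are all assumptions made for the example. The regular expressions are a simple stand-in for the MedCAT / spaCy step, and the OCR step is passed in as a function so the sketch runs without Tesseract installed. In practice that function might wrap something like pytesseract.image_to_string(image.crop(box)). The only part taken from Label Studio is the convention that rectangle regions are exported with x, y, width and height as percentages of the original image size.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

# (left, top, right, bottom) in pixels, the order PIL's Image.crop expects.
Box = Tuple[int, int, int, int]


def region_to_pixel_box(result: dict) -> Box:
    """Step 1: convert one Label Studio rectangle (percent units) to pixels."""
    value = result["value"]
    width, height = result["original_width"], result["original_height"]
    left = round(value["x"] * width / 100)
    top = round(value["y"] * height / 100)
    right = round((value["x"] + value["width"]) * width / 100)
    bottom = round((value["y"] + value["height"]) * height / 100)
    return (left, top, right, bottom)


# Hypothetical patterns standing in for the NLP step (MedCAT / spaCy).
FIELD_PATTERNS = {
    "patient_id": re.compile(r"Patient\s*ID\s*[:\-]\s*(\S+)", re.I),
    "patient_name": re.compile(r"\bName\s*[:\-]\s*([A-Za-z .]+)", re.I),
    "result": re.compile(r"\b(positive|negative)\b", re.I),
}


def extract_fields(text: str) -> Dict[str, str]:
    """Step 3: pull known data elements out of the OCR text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def process_report(results: List[dict], ocr: Callable[[Box], str]) -> dict:
    """Steps 1 to 4: regions -> OCR text -> fields -> JSON-ready dict."""
    regions = []
    for result in results:
        box = region_to_pixel_box(result)
        text = ocr(box)  # Step 2: OCR is supplied by the caller.
        regions.append({
            "label": result["value"]["rectanglelabels"][0],
            "box": list(box),
            "text": text,
            "fields": extract_fields(text),
        })
    return {"regions": regions}


def fake_ocr(box: Box) -> str:
    """Placeholder OCR used only for this example; returns fixed text."""
    return "Patient ID: GAN12345\nName: Asha Rao\nSARS-CoV-2 RT-PCR: Negative"


# Made-up annotation in the shape of a Label Studio rectangle result.
sample_results = [{
    "original_width": 1000,
    "original_height": 2000,
    "value": {"x": 10, "y": 5, "width": 50, "height": 10,
              "rectanglelabels": ["report_body"]},
}]

report = process_report(sample_results, fake_ocr)
print(json.dumps(report, indent=2))
```

Passing the OCR step in as a function keeps the region handling and field extraction testable on their own, and lets the Tesseract model or language be changed without touching the rest of the pipeline.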

Reference materials

  • sample_covid_report.pdf
  • PDFReportServlet - 139900.pdf
  • Sample Report.pdf
  • ReportPrint.pdf
