I want to extract the field values from the OCR text below:

Pafient Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017

I want to extract only the value that follows each field name. For example, "Thomas Joseph" is the value of the field "Patient Name". The other fields should be handled the same way, and the output should be saved to Excel.

What Python code would do this?

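My attempt:

import re
import pytesseract as pt  # assumption: 'pt' is the pytesseract module

text = pt.image_to_string(img1)  # img1: the scanned bill image, loaded earlier
print(text)
s = re.findall(r'\s:\s(\w+)', text)  # captures only the first word after each " : "
print(s)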
Dawny33
Shyama

2 Answers

As @spacedman correctly mentioned, this would be answered more quickly on Stack Overflow. In the meantime, you can build a dictionary of field names and values as shown here. There might be a better way, but this is a quick workaround.

# -*- coding: utf-8 -*-
# Python 2.7 only: relies on str.decode() and the print statement
import re
st = '''Pafient Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017'''
st = st.decode('utf-8').replace('\n','')+'<eof>'
words = ['Pafient Name','MRNO','Doctor','An','Gandar','Admission Data','PM Bill No','Discharge Date','Bill Dale','<eof>']
print {words[i]:st[st.index(words[i])+len(words[i]):st.index(words[i+1])].replace(':','').strip() for i in range(len(words)-1)}
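The snippet above targets Python 2.7. As a rough Python 3 sketch of the same idea (slice out the text between consecutive field labels), something like the following might work. The function name `extract_fields` and the variable names are made up for illustration, and the label list is an assumption: it has to match the OCR output exactly, misspellings included. Unlike the original, this version treats "Bill No" rather than "PM Bill No" as the label and keeps colons that occur inside values.

```python
import re

ocr_text = """Pafient Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017"""

# Labels as they appear in the OCR output (assumed known in advance).
labels = ["Pafient Name", "MRNO", "Doctor", "An", "Gandar",
          "Admission Data", "Bill No", "Discharge Date", "Bill Dale"]


def extract_fields(text, labels):
    """Map each label to the text between it and the next label."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    # A label counts only when it is followed by a colon.
    alternation = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(r"\b(" + alternation + r")\s*:\s*")
    matches = list(pattern.finditer(flat))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(flat)
        fields[current.group(1)] = flat[current.end():end].strip()
    return fields


fields = extract_fields(ocr_text, labels)
for name, value in fields.items():
    print(name, "->", value)
```

The values still contain OCR noise (for example the trailing `?` in the MRNO), so some cleanup per field would probably be needed before relying on them.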
Kiritee Gak
  • I tried to execute this and got an error. – Shyama Apr 10 '17 at 07:48
  • Well, invest some of your time in it, and post *that* error on Stack Overflow if you are unable to resolve it. – Kiritee Gak Apr 10 '17 at 07:58
  • File "", line 8 print (words[i]:st[st.index(words[i])+len(words[i]):st.index(words[i+1])].replace(':','').strip() ^ SyntaxError: invalid syntax – Shyama Apr 10 '17 at 08:56
  • Try it in Python 2.7; you might be using 3.x. Either way, remove the dictionary comprehension, write a normal loop, print the substrings, and debug from there to find the type of error. This should give you some clue. – Kiritee Gak Apr 10 '17 at 09:24
  • In which version of Python did you execute the above code? Can you show me the output you got? – Shyama Apr 10 '17 at 09:26
  • `{'Discharge Date': u'22-Feb-20$? 1D5\u2018F AM', 'PM Bill No': u'IDOGIII.-H-17', 'Doctor': u'Haneef M', u'Pa\ufb01ent Name': u'Thomas Joseph', 'An': u'513!', 'Bill Dale': u'E2-Feb-2017', 'Gandar': u'Male', 'MRNO': u'DQ026151?', 'Admission Data': u"19-Feb-2V'3\u2018\xa5T122'$"}` and 2.7.x version – Kiritee Gak Apr 10 '17 at 09:27
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/56853/discussion-between-shyama-and-kiritee). – Shyama Apr 10 '17 at 09:28
It may not be perfect, but it almost does the job.

import re
# data holds the OCR text from the question
re.findall(r'(?<=: )\w{2}-\w{3}-\d{4}|(?<=: )\d{2}-\w{3}-\w{2}|(?<=: )\s?\w+\s?\w+\s?\w+',data)

# ['Thomas Joseph MRNO','DQ026151','Haneef M An','513','Male','19-Feb-2V','IDOGIII','22-Feb-20','E2-Feb-2017']
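The question also asks for the values to be saved to Excel, which the snippet above does not do. One minimal sketch, assuming the values have already been collected into a dictionary of field name to value, is to write a CSV file with the standard library, since Excel opens CSV directly. The function name `save_fields_csv`, the output file name, and the two sample fields are placeholders for illustration.

```python
import csv


def save_fields_csv(fields, path):
    """Write one header row of field names and one row of values."""
    # utf-8-sig adds a byte-order mark, which helps Excel detect UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        writer.writerow(fields.keys())
        writer.writerow(fields.values())


if __name__ == "__main__":
    # Sample values taken from the OCR text in the question.
    sample = {"Pafient Name": "Thomas Joseph", "MRNO": "DQ026151?"}
    save_fields_csv(sample, "bill_fields.csv")
```

If a real .xlsx workbook is required, pandas' `DataFrame.to_excel` (which needs an Excel writer package such as openpyxl installed) is a common route; that is outside this sketch.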
Jil Jung Juk