OCRs

noggin not available yet!

Come to a Technical Meeting and not only natter but get your Jowett going better.
Jowett Technical Weekend
Tony Fearn
Posts: 1727
Joined: Thu Feb 09, 2006 5:33 pm
Your interest in the forum: Early pre-wars. Owner of 1933 'Flying Fox' 'Sarah Jane', and 1934 Short saloon 'Mary Ellen'.
Given Name: Anthony
Location: Clayton le Moors, Lancashire, the Premier County in the British Isles!!

OCRs

Post by Tony Fearn »

Hi Keith.

I've just had a look at the 'latest comments' section in the Gallery, and this is one of the items I tried to read.
Perhaps there's an easy explanation for some of this wording, although most of the rest was comprehensible.

OCR from One Note
/Sufrpleueenl to ‘ The .%foiôr Trodr,” July 27, 1949 Joweft Javelin 1948-9 Model PA Martnfoclurene: Jometi Cor&e, LId., idle, &odford,........

/head valve engine and unit consteLiction of body and chassis, ........

/Such modifications as have been in-trod Liced affecting service are listed here.....

/Car and engine numbers are tlissmo. production day of each year. 1)8 = 1948, 1)9=1949, and so on.

Then I clicked on a link and found the actual article I'd scanned.

Is there a need for the bit about the OCR scan?

Regards.

Tony.

Forumadmin has deleted a mile of space in here that was presumably pasted in from the Gallery comment.
Forumadmin
Site Admin
Posts: 20389
Joined: Tue Feb 07, 2006 5:18 pm
Your interest in the forum: Not a lot!
Given Name: Forum

Re: OCRs

Post by Forumadmin »

Do a search on JT for "OCR".

This post comes up along with others.
The point of putting raw (unedited) OCR text into the Gallery is that others can correct it. As you say, most of it can be read if the scan was any good.
The choice of OCR engine also makes a difference. I have concentrated on Microsoft OneNote, as it is essentially free and available to anyone with a Windows machine.
To show how this editing can be done, I copied the comment into Microsoft Word. I then made a first pass at formatting the text by deleting all the carriage return / line feed characters to create proper paragraphs.
Some OCR engines might well do this automatically, and there are tools available that can do it as well. The Word text was then put back as a further comment.
The next step would be to view the image (in this case a PDF) and correct any spelling and other formatting issues.
The main point of this correction is to make the document more searchable, so that people can find what they want. It is not to make it more readable, although if the text is to be copied into another document, HTML page or JT posting, then additional correction would be required.
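
For anyone who would rather script the first pass than do it by hand in Word, here is a rough sketch in Python of the carriage return / line feed clean-up described above. It is only an illustration, not what was actually used on the Gallery comment. The function name unwrap_paragraphs and the sample text are made up for the example, and it assumes that a blank line in the OCR output marks a real paragraph break. It also assumes that a hyphen at the end of a line is a word split by the scanner, which will be wrong for genuinely hyphenated words, so the result still needs checking against the image.

```python
import re


def unwrap_paragraphs(raw: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs.

    Assumption: blank lines are real paragraph breaks; any other
    line break is just where the scanned column happened to wrap.
    """
    # Normalise Windows (CR/LF) and bare CR line endings to LF.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Split into blocks wherever there is at least one blank line.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text)
    paragraphs = []
    for block in blocks:
        # Assumption: a hyphen at a line end is a split word, so rejoin it.
        block = re.sub(r"(\w)-\n(\w)", r"\1\2", block)
        # Turn the remaining line breaks and extra spaces into single spaces.
        block = " ".join(block.split())
        if block:
            paragraphs.append(block)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    # Invented sample text, not taken from the scanned article.
    sample = "The first line was bro-\r\nken by the scanner\r\nand wrapped.\r\n\r\nSecond para-\r\ngraph here."
    print(unwrap_paragraphs(sample))
```

This only fixes the line breaks. Misread letters and words still have to be corrected by eye against the scan.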