Hi Keith.
I've just had a look at the 'latest comments' section in the Gallery, and this is one of the items I tried to read.
Perhaps there's an easy explanation for some of this wording, although most of the rest was comprehensible.
OCR from OneNote
/Sufrpleueenl to ‘ The .%foiôr Trodr,” July 27, 1949 Joweft Javelin 1948-9 Model PA Martnfoclurene: Jometi Cor&e, LId., idle, &odford,........
/head valve engine and unit consteLiction of body and chassis, ........
/Such modifications as have been in-trod Liced affecting service are listed here.....
/Car and engine numbers are tlissmo. production day of each year. 1)8 = 1948, 1)9=1949, and so on.
Then I clicked on a link and found the actual article I'd scanned.
Is there a need for the bit about the OCR scan?
Regards.
Tony.
Forumadmin has deleted a mile of space in here that was presumably pasted in from the Gallery comment.
OCRs
Re: OCRs
Do a search on JT for "OCR".
This post comes up along with others.
The point of putting raw (unedited) OCR into the Gallery is that it can be corrected by others. As you say, most of it can be read if the scan was any good.
The choice of OCR engine also makes a difference. I have concentrated on Microsoft OneNote as it is essentially free and available to anyone with a Windows machine.
To show how this editing can be done, I copied the comment into Microsoft Word. I then made a first pass at formatting the text by deleting all the carriage return / line feed characters to create proper paragraphs.
Some OCR engines might well do this automatically and there are some tools available that can do it as well. The Word text was then put back as a further comment.
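For anyone who would rather script that first pass than do it by hand in Word, here is a minimal sketch in Python. It is not the method used for the Gallery comment. The function name `unwrap_paragraphs` is made up for illustration, and it assumes that a blank line marks a real paragraph break and that a hyphen at the end of a line is a word split by the scan, which will not always be true.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs.

    Assumption: blank lines separate paragraphs; every other line break
    is just where the scanned column happened to wrap.
    """
    # Normalise Windows (CR LF) and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Re-join words hyphenated across a line break ("in-" / "troduced").
        # Caveat: this also closes up genuine hyphens that fall at a line end.
        block = re.sub(r"(\w)-\n(\w)", r"\1\2", block)
        # Replace the remaining line breaks with single spaces.
        joined = " ".join(line.strip() for line in block.split("\n") if line.strip())
        if joined:
            paragraphs.append(joined)

    # Keep one blank line between paragraphs.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    raw = "Such modifications as have been in-\r\ntroduced affecting service\r\nare listed here."
    print(unwrap_paragraphs(raw))
```

The result still needs checking against the image, as it only removes line breaks and does nothing about misread letters.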
The next step would be to view the image (in this case a PDF) and correct any spelling and other formatting issues.
The main point of this correction is to make the document more searchable, so people can find what they want. It is not to make it more readable, although if the text is to be copied into another document, HTML page or JT posting, then additional correction would be required.
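To illustrate the searchability point: a search for "Jowett" will not find a scan that reads "Joweft". The sketch below is one possible way to flag such words for a human to check against the image, using Python's standard `difflib` module. The function name `suggest_corrections`, the word list passed to it and the 0.8 cutoff are all assumptions for the example, not anything the Gallery or JT actually provides.

```python
import difflib
import re


def suggest_corrections(ocr_text, vocabulary, cutoff=0.8):
    """Map OCR words that nearly match a known term to that term.

    `vocabulary` is a hypothetical list of words people are likely to
    search for; `cutoff` is an arbitrary similarity threshold (0 to 1).
    """
    # Compare in lower case but report terms with their original spelling.
    lookup = {term.lower(): term for term in vocabulary}
    suggestions = {}
    for token in re.findall(r"[A-Za-z]+", ocr_text):
        key = token.lower()
        if key in lookup:
            continue  # already spelled correctly, nothing to flag
        matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        if matches:
            suggestions[token] = lookup[matches[0]]
    return suggestions


if __name__ == "__main__":
    raw = "Joweft Javelin 1948-9 Model PA"
    print(suggest_corrections(raw, ["Jowett", "Javelin", "Bradford"]))
```

This only catches words that are nearly right. Badly mangled words will still need someone to read the image, and any suggestion should be confirmed by eye before the text is changed.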