Which do you prefer?

Keith Andrews
Posts: 941
Joined: Wed Jul 26, 2006 8:11 am
Location: New Zealand
Contact:

Post by Keith Andrews »

Raven Search capabilities.
Seem to be the same as available on JowettTalk. We want the search to also look through all attachments in the JowettGallery including pdf text image files.
Gallery/albums is designed for hosting photo albums, not so much technical information.
phpBB2 with the attachment mod is far better suited to this and to discussion on the subject. This is how most sites operate effectively. You are trying to use a format that is not designed for the way it is used here.

I also note that with Nuke, phpBB2 and Gallery in the one portal, they all work off the same membership database, so there is no need to register several times for each function of the website.
Access control
Need to be able to control what editors can do at various levels. Exact requirements yet to be fully defined. But at a very granular level users should only be able to edit their own data, or data to which they have been given explicit permission by the owner. Groups of users should also be able to be defined and added to access control lists. Access control lists should be able to be defined by department admins as well.
Access control also needs to flow from Forum to Gallery and to other applications such as register update, stock control.
All this is built into Nuke/Gallery/phpBB2 in the administration CP.
One can create groups, define who can edit, post and upload, and where, and have hidden sections for admins, mods, members or different groups.
One can set up sections viewable only to members or administration. These appear only to the appropriate visitors or members, like the scrolling block on the index page of kakariki.net.
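For anyone trying to picture the requirement, here is a minimal sketch of the rule described above: a user may edit an item if they own it, if the owner has named them, or if they belong to a group on the item's access list. It is not how Nuke, Gallery or phpBB2 implement permissions; the class, the group table and the user names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of content (forum post, gallery photo, register entry)."""
    owner: str
    # Users and groups the owner has explicitly allowed to edit this item.
    editors: set = field(default_factory=set)
    editor_groups: set = field(default_factory=set)


# Hypothetical group table: group name -> set of user names.
GROUPS = {
    "magazine-editors": {"alice", "bob"},
    "admins": {"carol"},
}


def can_edit(user: str, item: Item) -> bool:
    """Return True if `user` may edit `item`."""
    if user == item.owner:
        return True
    if user in item.editors:
        return True
    # Group check: any group on the item's list that contains the user.
    return any(user in GROUPS.get(g, set()) for g in item.editor_groups)


scan = Item(owner="dave", editor_groups={"magazine-editors"})
print(can_edit("dave", scan))     # True: the owner
print(can_edit("bob", scan))      # True: via the magazine-editors group
print(can_edit("visitor", scan))  # False: no permission given
```

The same check could in principle sit in front of the forum, the gallery and the register, provided they all share one membership and group table.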
Edit capabilities
Now this is where requirements really start to get complex.
I would like auto spell check at least in English, if not other languages.
Yes, there are mods that do spell checking and translation of pages and of the whole site. kakariki.net also has a text translator at the bottom of each forum page.
Unlike Jowetts, kakariki.net has a large proportion of active non-English-speaking members: Spanish, Portuguese, German, Swedish, Hebrew, French, Dutch, Latvian, Icelandic, Romanian, Japanese... and of course American.
Hiding details

It is not good if the out of the box solution requires mods to prevent showing any member's details. I hope the latest release has all the mods that you put in, now as default settings.
I can't remember whether the out-of-the-box default setting does or doesn't show members' details. If it doesn't, it is just a setting in the Admin CP that changes this.

Bottom line: Nuke allows you to set all sorts of settings. It can be sort of compared to Active Directory and ISA on a domain controller server.

Also to note, I use Hmailer and Squirrel for email and webmail, which I have customised and integrated (it doesn't take much) to match both kakariki.net and jowettnz.net.
My Spelling is Not Incorrect...It's 'Creative'
Forumadmin
Site Admin
Posts: 20389
Joined: Tue Feb 07, 2006 5:18 pm
Your interest in the forum: Not a lot!
Given Name: Forum
Contact:

Post by Forumadmin »

We are getting there. I just need some help deciding on a few things.
Optical Character Recognition
We need OCR tools for the document scans. OCR is notoriously difficult and we will need something that deals with the club magazine formats. It needs to separate out columns and pictures and deliver in either Word, pdf or html.

It should be possible to have a server version for later correction of recognised text, or a desktop version that allows immediate correction. Spell checking and formatting would be good additional functions. Do any recognise different fonts and sizes of fonts?

We need to test a few different products and come up with a recommendation.

Picture Standards and processing software
There have been a few discussions over the years on the best format to store pictures. jpg wins the day for photos, but text, line and B/W need thought.

The current Gallery processing tools are abysmal. I have not updated them, since they were what was available from the hosting company. Now that the site is on the home server we could try something better. Any ideas?

Though last night I overloaded the memory on the site processing 1.5MB files from Patrick Di Marco. So I will need to change the server hardware.

Upload
The current Gallery has many upload services that do not work (and probably are not needed).
The upload capability on JowettTalk is a bit clunky. Any ideas on what is available that is easy to use, can upload multiple files, and gives progress reports?
To save on server overload, not having to do file conversion would be preferred, but then we would need pre-download checking on file size, type and any other attributes we deem required.
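As a rough illustration of the kind of checking meant here, the sketch below rejects a file on name and size alone, before the server does any processing. The size limit and the list of allowed types are assumptions picked for the example, not agreed settings, and `check_upload` is a made-up helper rather than part of Gallery or JowettTalk.

```python
from pathlib import PurePath

# Assumed limits, for illustration only.
MAX_BYTES = 1_000_000
ALLOWED_TYPES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".pdf"}


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_TYPES:
        problems.append(f"file type {suffix or '(none)'} is not allowed")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


print(check_upload("flatfour-1962-11.pdf", 400_000))
print(check_upload("scan.bmp", 1_500_000))
```

Returning every problem at once, rather than stopping at the first, lets the uploader fix the file in one go.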

Reorganisation of items
Once items are uploaded to the server they may need moving. Somehow we need to maintain 'referential integrity' such that, if they are moved, anything that referenced them still finds them.
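One common way to get this, sketched below under my own assumptions, is to give every item a permanent ID and have everything refer to the ID rather than to the file path. Moving an item then only changes one entry in a lookup table. The `Catalogue` class and the paths are hypothetical, just to show the idea.

```python
class Catalogue:
    """Maps permanent item IDs to wherever the item currently lives."""

    def __init__(self):
        self._location = {}  # item id -> current path
        self._next_id = 1

    def add(self, path: str) -> int:
        """Register a new item and return its permanent ID."""
        item_id = self._next_id
        self._next_id += 1
        self._location[item_id] = path
        return item_id

    def move(self, item_id: int, new_path: str) -> None:
        """Record that an item has moved; references to its ID stay valid."""
        if item_id not in self._location:
            raise KeyError(item_id)
        self._location[item_id] = new_path

    def resolve(self, item_id: int) -> str:
        """Return the current path for an ID."""
        return self._location[item_id]


catalogue = Catalogue()
page = catalogue.add("uploads/incoming/13.pdf")   # hypothetical path
catalogue.move(page, "parts/1952book/13.pdf")     # item is reorganised
print(catalogue.resolve(page))                    # the ID still finds it
```

A forum post would then link to the ID, not to the path, so nothing breaks when the item is moved.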
Alastair Gregg
websitedesign
Posts: 757
Joined: Sun Aug 27, 2006 10:43 pm
Your interest in the forum: E2 SA 922 HKY 770
D7 CB 6079 CVG 166
E2 PD 22113 MVU 377
Given Name: Alastair
Location: Corrie, Isle of Arran.
Contact:

OCR et al

Post by Alastair Gregg »

I have Acrobat 9.3 Pro extended. It has OCR which I like and use. You may want to generate a standard document to test the different types of software. Let me have your test piece. I will print it and then scan it back in and you can grade the results.

JPG definitely. You need to define how much compression you are prepared to allow. For simplicity, can we not use JPG for line and B/W as well?

I suspect Word would be good for text even if you went for Wordpad.

The upload function on the gallery is similar to WordPress and eBay. I don't know of anything else similar to compare it with.

You are on your own with referential integrity. Never had to do it.
Compliments of the Season,

Alastair Gregg

Post by Keith Andrews »

Optical Character Recognition
.......It needs to separate out columns and pictures and deliver in either Word, pdf or html.
Why not just take the original electronic form and upload that, or convert it to pdf? The main issue is editing down the graphics when it is originally made.
This can be done in Office 2007.

Are you going with Nuke?
If so, that is what the built-in articles mod is for.

Picture Standards and processing software
....but text, line and B/W need thought.
That is what the grey scale option is for on the scanner.
Upload
The current Gallery has many upload services that do not work (and probably are not needed).
The upload capability on JowettTalk is a bit clunky. Any ideas on what is available that is easy to use, can upload multiple files, and gives progress reports?
To save on server overload, not having to do file conversion would be preferred, but then we would need pre-download checking on file size, type and any other attributes we deem required.

Both Gallery and the forum attachment mod have min/max settings for pictures, for storage, thumbnails and uploads.

What are you using for server picture-editing software?
Reorganisation of items
Once items are uploaded to the server they may need moving. Somehow we need to maintain 'referential integrity' such that, if they are moved, anything that referenced them still finds them.
Now that is a time-consuming mission. I did come across an open-source website link checker ages ago, which worked very well.


I get the impression you want the server to do everything automatically. As far as I know there is no portal (at least not an open-source one, and I doubt even a very expensive one) that can remove all responsibility from the member.

A few notes on server spell checkers
I tried them and scrapped them.
1/ People need to post, get in and out, and be on their way: user friendly. All the sites I have been a member of that have the spell checker set to auto get complaints from members, and either remove it and suggest members use ieSpell, or set it as an option only. The latter rarely gets used.
2/ We have members who are dyslexic, and foreigners. Yes, the English can be a bit "creative", but any reasonable member recognises and accepts their problems, even to the extent that when they apologise for bad English in their post, others say "don't worry about it, we can understand what you are asking".
In my book, any member making fun of these people gets a warning. I have never banned anyone for this yet.
I have had 'stick' over the years at sites, hence my sig, and they get a very blunt reply. I may not be able to spell, but that doesn't mean my debating skills are lacking.
My Spelling is Not Incorrect...It's 'Creative'

Post by Forumadmin »

Image

This shows the use of Office 2007 OneNote to OCR.
The pdf from jowett.net parts is displayed on the left.
You need to select the area you want to OCR. In this case a column. If you do the whole lot at once it gets confused. In this case I did two columns separately.

The area is copied (Ctrl+C) and pasted into OneNote (Ctrl+V). Then right-click on the area in OneNote and select "Copy Text from Picture". Then paste (Ctrl+V) into where you want it, in this case an Excel column.
Then you would need to reformat and correct.
This test made two mistakes: it misread Nut as 14ut and concatenated the next to last lines together.
Last edited by Forumadmin on Tue Feb 02, 2010 10:41 pm, edited 5 times in total.
Leo Bolter
Posts: 367
Joined: Sat Feb 10, 2007 10:32 am
Your interest in the forum: Proud owner of:
1 x 1951 Jowett Jupiter
1 x 1952 LE Velocette
1 x 1952 Jowett Bradford
2 x 1982 Princess 2 litre
Location: R. D. 2, Palmerston North, 4472, New Zealand.
Contact:

Optical Character Recognition programs

Post by Leo Bolter »

In regard to Optical Character Recognition programs.

When employed at Massey University I used ABBYY FineReader v8 Professional and found it to be the best of the several OCR programs we evaluated for use on Windows machines. It handled all we could throw at it and produced exact replicas of the original hard copies in most cases . . small print, images, columns of text, tables etc. were excellent . . . even old and blurry Gestetner-printed pages with sketches were handled well in most cases. It checked spelling too. I suggest you have a look at the latest at http://www.abbyy.com/

By the way, when I retired and moved from a machine running XP at home to using an iMac I was bitterly disappointed to find that there were no Mac OCR apps that held a candle to ABBYY . . . and that there was no ABBYY made for a Mac! On revisiting their site just a few moments ago I was delighted to find there's a Mac version available now. :) I'll be looking into that, I can assure you! A good OCR is a terrific asset and an excellent one is even more so!
R. Leo Bolter,
Palmerston North,
New Zealand.

JCC of NZ - Member No 0741.
JOAC - Member No 0161

Car: Jupiter (E1-SA-513-R)

Skype name = jupiter1951
Messenger name = r.l.bolter"at"massey.ac.nz

Post by Forumadmin »

Why not just take the original electronic form and upload that, or convert it to pdf? The main issue is editing down the graphics when it is originally made.
I have Jowetteers going back to 1964. The early ones were done on wax stencils (some of which I still have). Note this was before Xerox. So, as Leo says, we need a good OCR. I have looked at ABBYY and it gets good write-ups.
Before we all fork out on getting a copy, I want to stretch Adobe and Office to the limit so everyone can help take the info out of these old magazines and put it in a searchable and reformattable format in our archive.

Of course, if the current editors of the club mags can direct their electronic copy to us to reformat for the web then great.

Why not each try some old Jowetteers and see if you can create a Word doc or pdf doc out of them. See how many mistakes it makes in OCR and how difficult it is to retain the layout and formatting. Probably the hardest for OCR to do is a table so try http://jowett.org/jowettnet/dt/parts/1952book/13.pdf

Regarding English and grammar.

It helps if we write so that we reach as many people as possible. This means both native English speakers and those with other mother tongues.

Picture formats
jpg really is only good for colour photos. It is not good for coloured documents with a small colour palette, or for b/w line or text. There are too many artifacts. GIF and TIFF are then more suitable. png is also favoured by some. If you need to manipulate the image then you need a lossless format. Anyway the proof is in the eating. You need to choose a format that is most suitable for the document, to retain quality with the best compression. As far as video is concerned there are too many to choose from. So a pin might be the best bet.

Referential Integrity
Oh yes you have had to do it. When someone borrows your tools and puts them back in the wrong drawer!

Post by Leo Bolter »

By the way, but not exactly relevant to the topic.

I have ALL the "Flat Four" (JCC of NZ magazine) from the very first (Volume 1 Number 1 - November 1962). Early ones were printed on Foolscap paper* using wax stencils on a Gestetner. Because the paper that had to be used was "almost blotting paper", the ink was inclined to blur the text, with the result that the page was often quite hard to read by human eyes, let alone by the machines we have available today.

* Here's a wee problem . . . none of my scanners will accommodate Foolscap paper. That probably means that in reality the really old stuff on "odd" paper sizes (larger than A4) will have to be digitised in the first instance using a camera set up on a tripod, then OCR'd . . Hmmmmm! :(

Acrobat OCR

Post by Alastair Gregg »

I took the suggested page, converted it to a bitmap and then back to a PDF, and did the OCR. This link will take you to the result: www.thegreggsplace.co.uk/Tyrone/GearchangeOCR.pdf
I have had some problems accessing this. It worked first time!

The text is good and selectable but (and it may be my lack of skill) I can't make it copy the table as a table, either into Word or anything else for that matter. I will go and see what I can do with higher resolutions.

I feel sure Alan would oblige with output from new magazines, after our presentation to the exec, but I don't know how far back he keeps the electronic data.

I asked Alan some years back and also Steve Waldebburg. I got one electronic copy but that is all.

Delighted to see ABBYY does not cost a king's ransom. :D

JCCA are currently shipped as PDF :D
Last edited by Alastair Gregg on Tue Feb 02, 2010 10:55 pm, edited 2 times in total.

Post by Forumadmin »

Alastair,
See annotation on your post above.
Leo,
My scanner does foolscap so I will send a ship to collect them!
Unfortunately my scanner driver does not work with Windows 7, so I have to attach to the Windows XP Virtual Machine to get the stuff into Office!

Here is the result of Google doing the OCR on a page in jowettnet. Looks good.
Image

Image?

Post by Alastair Gregg »

I'm sure your annotation said something about an image!! I'm further down the road to madness than I had first thought.

I have been monkeying around with the PDF so may have caused your problems. I have stopped now so please try again :D It works every time for me.

This is an example in a Word document. I had to create the table manually, though. www.thegreggsplace.co.uk/Tyrone/GearboxtableLHS.doc


When I spoke to Alan he was happy provided he did not need to learn new things. His plate will be full with the Jowetteer. With the Exec hopefully buying into this I'm sure somehow we can get the electronic copy.

Post by Forumadmin »

You are not going mad as I changed the annotation.

What we need is a quantitative test. Your image-to-Word conversion had several errors. We need to compare different products using the same image. Is there a test standard about, with different fonts and different layouts that might trip up OCR?
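One possible way to put a number on it, offered only as a sketch: type up a correct reference copy of the test page by hand, then count how many single-character insertions, deletions and substitutions it takes to turn the reference into each product's output (the edit distance), and divide by the length of the reference. The function names are my own, and the sample strings reuse the Nut/14ut misread from the OneNote trial.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, ocr_output: str) -> float:
    """Edits needed per character of the hand-checked reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


print(edit_distance("Nut", "14ut"))    # 2: one substitution, one insertion
print(char_error_rate("Nut", "Nut"))   # 0.0: a perfect read
```

This only measures the text. It says nothing about whether columns, tables and pictures came through in the right place, so layout would still need judging by eye.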

What is the oldest Jowetteer you have?

The Adobe formatting was brilliant. Looks like it might have the job!

By the way what I just showed, by editing your post, was that within JowettTalk we can have collaborators working on the same document. I just need to set up groups and give them permissions.
Last edited by Forumadmin on Wed Feb 03, 2010 12:25 am, edited 1 time in total.

Quantitative test

Post by Alastair Gregg »

After coming in from howling at the moon, it's good to hear I'm not going mad. :)

I don't know of such a test, but have we not got a likely candidate in the page you and I have had a dabble with? We can quite easily check to see how many errors there are, or are you thinking of something much harder? If so, I don't know enough to suggest anything.

Post by Leo Bolter »

Gentlemen.

For OCR test purposes you could use the method I used to compare programs* . . . make your own "worst case scenario" standard test piece!

Take an A4-sized section of newsprint, with both a colour and a monochrome picture, with captions and columns of text and, as it's unlikely to happen to have a table, actually paste (sellotape?) a table into one corner. Now there's a challenge for any OCR application . . . sorting out the images to be shown as images, the columns and the font size as text, and the tables with all the info. in the appropriate cells . . and all this on grotty newsprint!

If you consider that you need to have a "standard test" to do on glossy paper with hi-res coloured images, you may be able to use this testing as an excuse to sneak down to the local newsagent and pick up a copy of Playboy! . . . Please send me any results of OCRing, so I can check the results! :D :shock:

* and photocopiers too, when I was evaluating them on behalf of the University Purchasing Officer.

Leo

Post by Forumadmin »

What, your test for photocopiers was Playboy, or was the secretary sitting on it like the Twiglets ad?