Some people eat, sleep and chew gum, I do genealogy and write...

Sunday, February 20, 2011

Genealogist's View -- Speech Recognition Revisited

As I mentioned in my last post on the subject of speech recognition, I was motivated by Anne Roach at RootsTech to return to the issue. Since my last go-around, there has been a considerable increase in computer speed and storage capabilities. Hoping that these improvements would also show an improvement in speech recognition, I decided to try it again.

I must say that the results so far are mixed. It does appear that the program recognizes the vast majority of my words and speech patterns without any difficulty. Obviously, the increase in computer speed has a direct effect on the ability of the program to accurately represent what is spoken. But I will have to admit that the program is still cranky and makes some of the same errors that it made previously. The real issue is whether or not speech recognition is more efficient than using the keyboard. In the case of genealogy, there is obviously an issue as to whether or not the programs can recognize names efficiently.

Another issue is whether or not having to stop and recalibrate the program periodically actually saves any time. For the record, I am using DragonDictate, the Macintosh program that utilizes the Dragon NaturallySpeaking software. I had previously used Dragon NaturallySpeaking on a PC. During the last few days, I have consistently tried to use the program in a variety of circumstances in order to determine whether or not it is an effective way of speeding up my text entry. So far, I have used the program to dictate my posts using Blogger, e-mail using Thunderbird (the Mozilla software program), OpenOffice and several other programs. I am still undecided as to whether or not there is any actual gain in productivity.

For example, here is a list of ten names as copied from my genealogy:

Samuel Shepherd
Susanna Dexter
James Newton
Maria
Ann Kadwale
Mary Mitchell
William Tarbutt
Andreas Jensen
Jens Jorgensen
Niels Pedersen

I keyed these names into the post. Following is the same list of names read into the post by the program:

Samuel Sheppard
Susanna Dexter
James Newton
Maria
and Well
Mary Mitchell
William carpet
Andreas Jensen
Jens Jorgensen
Niels Patterson

I guess you could say that the results were either pretty good or pretty bad, depending on how much correction work you want to do. Only six of the ten names were transcribed correctly. The main problem that I see is the variation of Shepherd and Sheppard. Such variations in spelling could be entirely confusing if they were not caught during proofreading. It is also apparent that the more unusual name Ann Kadwale may be too difficult for the program. I can hardly see myself spending hours teaching the program the thousands of names in my database.
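For anyone who wants to repeat this kind of test with a longer list, the comparison can be automated. The following is only a minimal sketch in Python, using the two lists shown above; the function name `compare_names` is my own invention for illustration and has nothing to do with the dictation software itself.

```python
# Names as keyed in by hand (the reference list).
typed = [
    "Samuel Shepherd", "Susanna Dexter", "James Newton", "Maria",
    "Ann Kadwale", "Mary Mitchell", "William Tarbutt", "Andreas Jensen",
    "Jens Jorgensen", "Niels Pedersen",
]

# The same names as transcribed by the speech recognition program.
dictated = [
    "Samuel Sheppard", "Susanna Dexter", "James Newton", "Maria",
    "and Well", "Mary Mitchell", "William carpet", "Andreas Jensen",
    "Jens Jorgensen", "Niels Patterson",
]


def compare_names(expected, heard):
    """Return the number of exact matches and a list of (expected, heard) mismatches."""
    mismatches = [(e, h) for e, h in zip(expected, heard) if e != h]
    return len(expected) - len(mismatches), mismatches


correct, errors = compare_names(typed, dictated)
print(f"{correct} of {len(typed)} names transcribed correctly")
for expected_name, heard_name in errors:
    print(f"  {expected_name!r} came out as {heard_name!r}")
```

The comparison is deliberately strict: a near miss such as Sheppard for Shepherd counts as an error, because in a genealogy database it is exactly that kind of variation that causes trouble later.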

Here is another example of the difficulties faced using speech recognition. The following is a selection of text copied and pasted from photo.net explaining Digital Camera Basics:
Digital cameras are confusing to a lot of new users. In this basic guide to digital camera technology we hope to try to give digital beginners at least some basis to use in deciding which digital camera is appropriate for them. When shopping for a digital camera it's at least good to know what the basic terms like white balance, pixel, ppi and dpi mean and how they affect image and print quality. It's also important to know the difference between things like optical zoom and digital zoom as well as the advantages and disadvantages between storage formats such as Compact Flash (CF), Microdrives, Sony Memory Stick, Secure Digital (SD), Multimedia and camera interface technologies such as USB 1.1, USB 2.0 and Firewire IEEE 1394.
 Here is the same paragraph dictated using speech recognition software:
 digital cameras are confusing to a lot of new users. In this basic guide to digital camera technology we hope to try to give digital beginners at least some basis to use in deciding which digital camera is appropriate for them. When shopping for a digital camera it's at least good to know what the basic terms like white balance. Pixel, PPI and DPI mean and how they affect image and print quality. It is also important to know the difference between things like optical zoom and digital zoom as well as the advantages and disadvantages between storage formats such as CompactFlash (C F), Micro drives, Sony Memory Stick, Secure Digital (It S D close parens, Multimedia and camera interface technologies such as U S B1 .1, U S B2 .0 and FireWire India E E.co 1394.
That is exactly how it came out without any editing. Again, the text is either pretty good or pretty bad depending on how much editing you want to do. I recognize that if I were more familiar with the commands, many of the "errors" could have been corrected in the dictation. If I were to key in the paragraph, I would probably proof my typing as I went along. There is no way I could have typed the paragraph as fast as I could read it, but any gain in speed would seem to be lost in time spent correcting the errors.
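One rough way to put a number on "pretty good or pretty bad" is to count how many words would have to be touched to turn the dictated text back into the original. The sketch below uses Python's standard `difflib` module on one phrase from the paragraph above; the function name `words_to_fix` is hypothetical, and the count is only an approximation of real editing effort, not a formal accuracy measure.

```python
from difflib import SequenceMatcher


def words_to_fix(original: str, dictated: str) -> int:
    """Count the words that fall inside differing spans of the two texts.

    For each span where the texts disagree, the longer side of the span
    is counted, so a word split in two ("Micro drives") counts as two.
    """
    a, b = original.split(), dictated.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


original = "storage formats such as Compact Flash (CF), Microdrives, Sony Memory Stick"
dictated = "storage formats such as CompactFlash (C F), Micro drives, Sony Memory Stick"

changed = words_to_fix(original, dictated)
print(f"{changed} words need attention out of {len(original.split())} in the original")
```

Pasting in the two full paragraphs instead of the short phrase would give a figure for the whole test, which could then be weighed against the time saved by dictating.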

So, the question arises as to whether or not a combination of speech recognition and typing is appropriate. Unfortunately, the speech recognition program does not recognize keyed-in text as part of the text in its memory. So you could not go back and use speech recognition commands to correct the text if you had keyed part of the text and done another part through dictation.

I am not giving up yet, but I suspect that I will reach some of the same conclusions that I reached previously: speech recognition may save some time entering data, but the time savings may be lost in editing. Stay tuned as I continue to use the program, learn more of what the commands can do and do additional training of the program.

1 comment:

  1. This confirms what I have seen in my use of speech recognition software. There's been possibly a bit of an increase in accuracy, but it's not perfect. I want this to work, but the fact remains, speech recognition is HARD. Even text recognition is difficult, as OCR software has some of the same issues, especially with less than perfect scans to read from.