
Speaker-independent speech recognition?

Discussion in 'Embedded' started by larwe, Jan 8, 2009.

  1. larwe

    larwe Guest

    I've got a very specific speech recognition application in mind, and
    I'm looking for a reference that will indicate if it's feasible. I
    want to recognize just one magic word, which would be a well-solved,
    high-accuracy problem if we were talking about a boom mike and a
    silent environment. The difficulty is that there may be lots of other
    noises in the background, other people saying things, etc.

    The application is something like a telematics device where you get
    its attention by saying "Computer...", except that the word in
    question can be assumed to be a unique word nobody would ever use for
    any other purpose. However the specifics of this application are
    something along the lines of:

    - If the computer doesn't recognize that you want its attention, a
    ninja will beat you to death with a frozen muskrat, and
    - If the computer hears your dog barking and thinks it was you trying
    to get its attention, you'll be charged $1,000 for the CPU time.

    Is there an article someone can reference for me that will give some
    feel for the best I can expect from today's technology? Ideally, some
    information on the upper practical % limit to catching validly spoken
    words, and the lower practical limit to the number of false positives
    I'll see on other noises.

    I see a lot of information about % recognition accuracy on the vendor
    websites, but they refer mostly to noise-free environments and of
    course to large dictionaries.
     
    larwe, Jan 8, 2009
    #1

  2. Maybe something here will help:
    http://marf.sourceforge.net/

    Good luck.
    Richard
     
    Richard Seriani, Jan 8, 2009
    #2

  3. larwe

    larwe Guest

    Lots to read here, thanks for the pointer. Not sure this will directly
    give me the statistic I need, but I may be able to gather enough
    samples of my "magic word" being spoken in different conditions to run
    it through this s/w and generate some of my own stats.
     
    larwe, Jan 8, 2009
    #3
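A minimal sketch of how such home-grown statistics could be computed, assuming each recorded clip has already been run through some recognizer that returns a confidence score (the score lists, labels and threshold below are hypothetical, not output from MARF or any other package). It reports the miss rate and false-alarm rate at a chosen threshold, plus a Wilson score interval to indicate how far a rate measured on a small sample can be trusted.

```python
import math

def detection_rates(scores, labels, threshold):
    """Return (miss_rate, false_alarm_rate) at a given score threshold.

    scores: hypothetical detector confidence per clip (higher = more word-like)
    labels: True if the clip really contains the magic word
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    misses = sum(1 for s in pos if s < threshold)         # real word, not detected
    false_alarms = sum(1 for s in neg if s >= threshold)  # noise, detected anyway
    return misses / len(pos), false_alarms / len(neg)

def wilson_interval(k, n, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for k events in n trials."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

if __name__ == "__main__":
    # Made-up scores purely for illustration.
    scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
    labels = [True, True, True, False, False, False]
    print(detection_rates(scores, labels, threshold=0.5))
    # Zero false alarms in 100 noise clips still leaves a non-trivial upper bound.
    print(wilson_interval(0, 100))
```

The interval is the practical point: observing no false alarms in a hundred noise clips does not demonstrate a false-alarm rate anywhere near zero, so a requirement as strict as the one described would need a very large test set.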
  4. Sounds a lot like "word spotting" of the old (Cold War) days.
    Lots of unencrypted voice radio transmissions in Russian
    were recorded. People were trained to listen in and pick out
    key words among all the uninteresting routine conversations.
    A recording that possibly contained something of value
    was then passed on to people who actually
    understood the language.
    There was funding in the '70s/'80s to do that initial
    step more cheaply by computer. Doubtful whether anything useful
    came out of it.

    Regards, JRD
     
    Rafael Deliano, Jan 8, 2009
    #4
  5. larwe

    larwe Guest

    It's not the same kind of application at all - really it's more like a
    voice-operated "clapper" switch than anything else - but the
    requirements are similar. The costs of a false negative and a false
    positive are both pretty high, though a false negative is much more
    costly.

    I think cheap DSP technology has come a long way in the past 20-30
    years :)
     
    larwe, Jan 8, 2009
    #5
  6. I'll take the muskrat, Alex. Actually, it sounds like an old Firesign
    Theatre line.

    Scott
     
    Not Really Me, Jan 8, 2009
    #6
  7. larwe

    larwe Guest

    Never really got into Firesign Theatre. I prefer the Goon Show,
    Hancock's Half Hour, etc.

    Anyway, I was trying to demonstrate (flippantly) the real fact that
    both a false hit and a false miss have real costs in this application
    - a false miss is dangerous, a false hit is financially expensive.
     
    larwe, Jan 9, 2009
    #7
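The asymmetric costs described above can be turned into a concrete operating point. The following is a rough sketch, assuming a detector that emits a confidence score per clip and assuming numeric costs for a miss and a false alarm (all names and numbers here are hypothetical). It picks the threshold that minimizes the average cost over a labeled sample. Note that the average is weighted by the class mix of the sample, which may not match how often the word and the noise occur in the field.

```python
def expected_cost(scores, labels, threshold, cost_miss, cost_false_alarm):
    """Average cost per clip when detections fire at score >= threshold."""
    total = 0.0
    for s, is_word in zip(scores, labels):
        if is_word and s < threshold:
            total += cost_miss            # false miss: the dangerous case
        elif not is_word and s >= threshold:
            total += cost_false_alarm     # false hit: the expensive case
    return total / len(scores)

def best_threshold(scores, labels, cost_miss, cost_false_alarm):
    """Try every observed score (plus 'never fire') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, cost_miss, cost_false_alarm),
    )

if __name__ == "__main__":
    # Made-up scores purely for illustration.
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
    labels = [True, True, True, False, False, False]
    # A miss is assumed to be 100x worse than a false alarm.
    print(best_threshold(scores, labels, cost_miss=100, cost_false_alarm=1))
```

Making a miss far more expensive than a false alarm pushes the chosen threshold down, so the detector fires more readily and accepts extra false hits in exchange for fewer misses.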
  8. Looks like Sphinx:

    http://cmusphinx.sourceforge.net/html/cmusphinx.php

    Funnily enough, I heard of it first in the context of a speech/VR interface
    for Infocom 'adventure'-type games.
     
    przemek klosowski, Jan 9, 2009
    #8
  9. On Thu, 08 Jan 2009 17:48:34 +0100, larwe <> wrote:
    I've got a very specific speech recognition application in mind, and
    I'm looking for a reference that will indicate if it's feasible. I
    want to recognize just one magic word, [...]

    - If the computer doesn't recognize that you want its attention, a
    ninja will beat you to death with a frozen muskrat, and
    - If the computer hears your dog barking and thinks it was you trying
    to get its attention, you'll be charged $1,000 for the CPU time.

    Sounds like it could be a military application: firing too late is
    dangerous and firing for no reason is costly. Or remote assistance:
    screaming "help" too late is dangerous and rescuing you with a helicopter
    for no reason is costly.

    [...]

    I see a lot of information about % recognition accuracy on the vendor
    websites, but they refer mostly to noise-free environments and of
    course to large dictionaries.

    I think accuracy will be a lot higher in the case of whistled languages
    like Silbo.
    http://en.wikipedia.org/wiki/Silbo_Gomero_language

    Can your users be trained to whistle?
     
    Boudewijn Dijkstra, Jan 9, 2009
    #9