A cache by jorwant

Hidden : 8/13/2022

Difficulty:

Terrain:

Size: micro

Geocache Description:

The cache is not at the listed coordinates, but is nearby. This is the fourth in a series of mystery caches devoted to AI. No programming is needed to solve this puzzle; in fact, if you do use programming this cache owner will be heartily impressed. Message the CO or mail AI.in.Cambridge@gmail.com if you need help.

This puzzle provides two independent paths to determine the cache coordinates. Choose whichever works for you.

Path 1, for the sound of hearing

For this cache I'm literally going to speak the coordinates out loud. The British male voice speaks the true coordinates, and the other three voices speak false coordinates.

Can you discriminate the four voices enough to tease out the British voice? If you can't, take solace in the fact that computers can't either, and proceed to Path 2.

Path 2, for the good at quizzing

Note: not all answers are single digits.

Speech recognition is hard, both for humans and computers. In some situations computers can hear better than people. But when it comes to speech, today's computers fall short, especially when the speech is fast. English speakers speak about 150 words per minute, and as speaking speed increases, speech recognition accuracy degrades faster for computers than for humans. Humans have trouble comprehending 300 words per minute. Beyond that, the speed at which lip muscles move imposes an upper limit on how fast speech can be created.

A = the speaking record for humans, in words per minute, set by Sean Shannon in 1995.

Early speech recognition success came from a technique abbreviated HMM, which is used in many areas of AI as well as physics and economics. Broadly, the technique involves reasoning about a sequence of situations (called states) and the transitions between them. For instance, consider that (in English) the sounds for b and p are similar. Let's say you hear a sound which could be either a b or p. But immediately afterward, you hear an aw sound. You would conclude that the previous sound was a p, since paw is a more likely utterance than baw. And if you'd heard the word "cat's" earlier, you'd be more confident in your conclusion.
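The b-versus-p reasoning above can be sketched as a tiny HMM decoded with the standard Viterbi algorithm. This is only an illustration: the three states, the two observation labels ("plosive" and "vowel"), and every probability below are made-up assumptions, not measurements from any real speech system.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Walk backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


STATES = ["b", "p", "aw"]
# Hypothetical numbers: b and p are equally likely to start an utterance.
START = {"b": 0.5, "p": 0.5, "aw": 0.0}
# Hypothetical numbers: "aw" follows p more often than it follows b.
TRANS = {
    "b": {"b": 0.45, "p": 0.45, "aw": 0.1},
    "p": {"b": 0.3, "p": 0.3, "aw": 0.4},
    "aw": {"b": 0.4, "p": 0.4, "aw": 0.2},
}
# b and p produce the same ambiguous "plosive" sound; aw produces a "vowel".
EMIT = {
    "b": {"plosive": 1.0, "vowel": 0.0},
    "p": {"plosive": 1.0, "vowel": 0.0},
    "aw": {"plosive": 0.0, "vowel": 1.0},
}

decoded = viterbi(["plosive", "vowel"], STATES, START, TRANS, EMIT)
print(decoded)  # ['p', 'aw']
```

The first sound alone is a coin flip between b and p; only the transition into "aw" tips the decision toward p, which is the same inference described above.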

B = the number of letters in the first "M" of HMM, named for a Russian mathematician.

The fundamental unit of speech is called a phoneme. The b and p phonemes sound like one another; oi and oo don't. A phoneme might last a few hundred milliseconds, and different phonemes are expressed at different pitches/frequencies. Vowels are often constrained in frequency (think about singing oo at a particular pitch), while fricatives like f and th contain lots of frequencies. At the extreme, white noise contains lots and lots of frequencies, which is why it's so easily tuned out by our brains.

A sonogram is a graph of sound, converting data from the aural to the visual. A sonogram's x-axis is time, and y-axis is frequency. A pure tone will be a horizontal line, so vowels will approximate that. Fricatives have more height. With training, people can read sonograms of speech and know what's being said. Consider this sonogram, which depicts someone speaking the name of a common meal in English:

That sharp vertical line just before 200 milliseconds is a forced alveolar stop, which gives you a hint about the first letter of the word.

C = the number of letters in the word that's being spoken in the above sonogram.
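For the curious, the sonogram axes described earlier (time across, frequency up) can be sketched in a few lines. This is a rough illustration using only the Python standard library and a naive Fourier transform; the 64-sample frame length and the synthetic test tone are arbitrary assumptions, and real tools use windowing and much faster transforms.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of one frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def sonogram(samples, frame_len=64):
    """One spectrum per consecutive frame: rows are time steps, columns are frequency bins."""
    return [
        frame_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# A synthetic pure tone with exactly 8 cycles in every 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * i / 64) for i in range(256)]

# For each time step, find the frequency bin holding the most energy.
peaks = [max(range(len(row)), key=row.__getitem__) for row in sonogram(tone)]
print(peaks)  # [8, 8, 8, 8]
```

The loudest bin stays the same in every frame, which is the "horizontal line" a pure tone draws on a sonogram.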

No one's put much work into training computers to read sonograms, but at a deep mathematical level, reading long-duration sonograms is how the latest generation of AI speech recognition systems works. These systems use a type of neural network with two types of memory: one looking back over a long period of time (think words) and the other looking back over the short term (think phonemes). To AI researchers, this technique is known by a four-letter initialism.

D = the total number of letters in the four words of that technique. (Hint: it's more than ten.)

The cache is waiting to hear from you at:

N 42 22. A – (45 · B) – 1
W 71 05. (35 · D) – (C / 2)
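If you would rather let a computer do the arithmetic, the formula can be sketched as below. It assumes each expression yields the three digits after the decimal point of the minutes, and the values passed in the example are deliberately made-up placeholders, not the puzzle answers.

```python
def cache_coordinates(a, b, c, d):
    """Plug quiz answers A, B, C, D into the coordinate formula from the listing."""
    north = a - 45 * b - 1   # N 42 22. A - (45 * B) - 1
    west = 35 * d - c / 2    # W 71 05. (35 * D) - (C / 2)
    # Assumption: each result fills the three decimal places of the minutes.
    return f"N 42 22.{north:03.0f}", f"W 71 05.{west:03.0f}"


# Placeholder inputs for illustration only; substitute your own answers.
example = cache_coordinates(a=700, b=5, c=8, d=20)
print(example)  # ('N 42 22.474', 'W 71 05.696')
```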

Additional Hints

Chmmyr: Sbe Cngu 2, jura fbyivat sbe Q, gur svefg yrggre bs gur sbhe-yrggre vavgvnyvfz vf "Y". Pnpur: ovfba va n gerr, fvk srrg bss gur tebhaq.

Decryption Key

A|B|C|D|E|F|G|H|I|J|K|L|M
-------------------------
N|O|P|Q|R|S|T|U|V|W|X|Y|Z

(letter above equals below, and vice versa)
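The key above is a simple letter swap, so it can also be applied in code. The sketch below builds a translation table straight from the two rows of the key; the function name `decrypt_hint` is a made-up helper, and the sample inputs are just the two label words from the hint.

```python
TOP_ROW = "ABCDEFGHIJKLM"
BOTTOM_ROW = "NOPQRSTUVWXYZ"

# Each letter maps to the one directly above or below it in the key.
_plain = TOP_ROW + BOTTOM_ROW
_swapped = BOTTOM_ROW + TOP_ROW
KEY_TABLE = str.maketrans(_plain + _plain.lower(), _swapped + _swapped.lower())


def decrypt_hint(text):
    """Swap every letter using the key; punctuation, digits, and spaces are unchanged."""
    return text.translate(KEY_TABLE)


print(decrypt_hint("Chmmyr"))  # Puzzle
print(decrypt_hint("Pnpur"))   # Cache
```

Because the swap is symmetric, running the function on its own output gives back the original text.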

AI in Cambridge #4: Speech Recognition Mystery Cache