The human voice is the organ of the soul

Right now, tens of millions of speech-impaired people around the world have no choice but to use mass-produced, generic voices straight from a computer. Rupal Patel has a different idea.

His voice would project from his 5-foot-11 frame, streaking out of the dugout when he coached his sons’ Little League teams and filling hotel ballrooms packed with his clients. John spent his days as a management consultant and executive coach, flying from his Maine home to New York or Philadelphia or taking the train down to Boston. His voice was his instrument; timely interjection is everything in coaching. The days he didn’t spend on the road were spent on the phone, talking through problems with clients. Then on the drive home, he’d crank the radio, singing along and drumming the rhythm with his hands.

In 2006 John opened his own coaching business and spent the next year constantly on the road. Two weeks of travel every month brought on a bout of pneumonia, which, even after he recovered, left him with overwhelming fatigue. But his wife, Linda, knew something else was off.

By each afternoon, his speech was slow and slurred, she says. For John, it was hard to enunciate; his lips felt awkward. They chalked it up to his work schedule.

One week that June, he headed back out on the road, to a meeting in Chicago. John and his clients sat down for a steak dinner and poured some wine. He began talking, and the two clients exchanged glances. They thought he was drunk; he hadn’t taken a sip.

John and Linda Gregoire in 2014

Small things started to pile up: trouble opening jars, cutting himself while cooking dinner, difficulty showing his son a guitar chord. In New York, he fell over on the subway. One October morning, Linda looked out the window and saw him in the driveway. He looked nice: khakis, a blazer. He placed the garbage in the back of his car and pulled to the street to toss it in the can. As he went to throw it, his momentum bowled him over, and he rolled down the driveway. Linda sprinted out the front door, crying. They made an appointment to see a neurologist.

On December 17, John and Linda took the Amtrak Downeaster train to meet with a specialist at St. Elizabeth’s Medical Center in Boston.

With each stop, the train grew fuller with Christmas shoppers; at one point the whole car started singing carols. When they arrived, they walked 20 minutes through the early winter slush, Linda holding John up the whole way. The doctor was blunt: John had ALS. He gave John 18 months to live and offered the couple a coupon for a discounted wheelchair.

Linda limped John back across the now frozen slush and onto the train. She curled up, put her head on her husband’s shoulder, and cried. He tilted his head back and closed his eyes. They stayed like that for two and a half hours, all the way back to Maine.

John went back to work, and a new normal set in. Clients were understanding; email would suffice instead of phone conversations. Every afternoon at 3:30, John drove to Starbucks for a Venti-size coffee. Placebo or not, the coffee helped his speech. But it couldn’t last forever. Within a year, the man who could command a ballroom with his voice was resigned to typing on an iPad and having it speak for him.

“Bossy Ryan” took over from there. The mechanical, demanding voice on John’s iPad earned that nickname from Linda. Instead of John sneaking up behind her and whispering in her ear, she had tone-deaf Bossy Ryan. “You hear somebody on the phone, sometimes you can identify who that person is before they even say their name,” Linda says. “You take that away from them and just give them Bossy Ryan? It’s not right.”

Although John had fully switched over to the iPad, he still had a BlackBerry, and on it a voicemail greeting: Hi, this is John Gregoire. I can’t take your call, leave a message. Every day, Linda would call John’s number just to hear that anodyne message, over and over. Hi, this is John … She knew every crack, every pause.

One day she made the call, and it was gone. She checked the number and tried again. Still nothing. She jumped in her car and bolted to the AT&T store. It turned out that the employees who’d helped Linda arrange to have John’s BlackBerry shut off had neglected to tell her she’d lose the message. Linda cried in the store.

“You gotta get it back!” she pleaded. “It’s in the cloud somewhere, right?”

The employees shook their heads.

John and Linda still have home movies, but they’re too hard to watch. To look back on those things now, Linda says, what’s the point? They’re a painful reminder of what they’ll never get back: the whispers, the kisses, even the fights. There was something about hearing just his voice on the greeting, though, that had been soothing.


Rupal Patel strides onto the TED stage at San Francisco’s SFJAZZ Center. A black slide with white type rises behind her. “In the words of the poet Longfellow,” she says, “the human voice is the organ of the soul.”

A photo of Stephen Hawking slides across the screen, and then one of a little girl, then three more people. All use communication devices to help them speak. And all of them, Rupal says, may be using the same voice. This problem, this lack of individualization, this lack of soul, was driven home for her 11 years earlier.

It was August 2002, and she had just gotten off the stage at the conference of the International Society for Augmentative and Alternative Communication, in Odense, Denmark. She walked into the crowded technology exhibit hall, where people from all over the world were pitching voice programs, software, and research. There, she stumbled across a conversation between a young girl, no older than 10, and a middle-aged man. When they spoke, they used the same voice.

That voice, known colloquially as “Perfect Paul” (or even more colloquially, the “Stephen Hawking voice”), is the most popular artificial voice on the planet. You may also know it as the longtime voice of the National Weather Service. Developed in the early ’80s by Digital Equipment Corporation for its DECtalk speech synthesizer, it got its name because it was the clearest artificial voice on the market.

As Rupal looked around the exhibit hall, she saw hundreds of people using only a handful of voices. “We wouldn’t dream of fitting a little girl with the prosthetic of a grown man,” she tells the TED audience. “So why then the same prosthetic voice?”

She reached out to Tim Bunnell, a speech synthesis expert who was already building personalized voices for people who lost the ability to speak later in life, people who’d banked their speech knowing it was escaping them. He did this by clipping together a person’s speech samples and reconstructing his or her voice. Rupal had to find a way to reverse engineer the system with whatever vocal ability an individual had. Maybe that was the ability to pronounce one or two vowels, or maybe it was just a noise from deep within their larynx. Whatever it was, Rupal was going to capture it.

“What happens next is best described by my daughter’s analogy—she’s 6,” Rupal says on the TED stage. “She calls it mixing colors to paint voices.”

To create a voice, Rupal takes those unique source sounds from a speech-impaired person and combines them with full speech from someone roughly the same age and gender (similar regional accents are helpful, too).

She introduces the story of Samantha, a 17-year-old with perisylvian syndrome, a rare disorder that limits her ability to speak. But Samantha can still produce vowel-like sounds. Rupal plays an audio recording of Samantha pronouncing an “Ahhh” sound. She pauses. “Now, Samantha can say this,” Rupal says.

A new recording begins. It’s feminine and youthful and optimistic. “This voice is only for me,” Samantha says. “I can’t wait to use my new voice with my friends.”

Two weeks before the TED talk, on Thanksgiving Day 2013, Rupal had set up a website, VocalID.org, which contained a small section where visitors could sign up to donate their voices. Before walking on stage, Rupal had 10 surrogate voices she could use to make personalized voices. Two hundred, she thought, would be a good start. That alone would be 10 times larger than her biggest study. Within a week, 1,500 people had signed up. Two months later: 8,000 people.

Money started trickling in from the National Institutes of Health and the National Science Foundation. Rupal hired a programmer to build out the website, making it possible for those 8,000 waiting donors to give their voices. But what should they say?

Rupal compiled 3,500 sentences based on their sounds and sound combinations. Some were selected for rhythm, melody, and emotion; 250 constituted our most familiar phrases.

Hi.
Good to see you.
I love you.

Donors flooded the site. Voice drives surfaced at middle schools and as bar mitzvah projects. Chinese residents used the voice donation as English language practice. More than 10,000 people have now started or completed donating, no small feat considering the 3,500 sentences take roughly six or seven hours to orate.

To take all those donations and actually turn them into usable voices, Rupal needed to build a more complex algorithm that would search through the bank and pull out appropriate matches. In stepped Geoff Meltzner.

Geoff and Rupal had met eight years earlier, when he was working on silent speech recognition. (The work was fascinating: Imagine sensors on your neck and face that can recognize words even if you just mouth them silently.) Geoff built the algorithm Rupal needed and installed a computer in his basement to work through all the permutations. Before he came on, Rupal had to spend 40 to 50 hours manually massaging every voice created by her team at Northeastern University, smoothing out transitions or tweaking certain letter combinations. The computer’s fine-tuning helped drop that number to about 15 hours.
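The article does not describe how Geoff's matching algorithm works, so the following is only a toy sketch of the general idea, using the criteria mentioned earlier: a donor of the same gender, roughly the same age, and ideally a similar regional accent. Every name in it (`Donor`, `rank_donors`, `max_age_gap`) and the five-year age window are assumptions made for illustration, not VocalID's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Donor:
    """One hypothetical entry in the voice bank."""
    name: str
    age: int
    gender: str
    accent: str


def rank_donors(bank, age, gender, accent, max_age_gap=5):
    """Return same-gender donors within max_age_gap years, best match first.

    The ranking rule is an assumption: accent matches come first,
    then donors are ordered by how close they are in age.
    """
    candidates = [
        d for d in bank
        if d.gender == gender and abs(d.age - age) <= max_age_gap
    ]
    # False sorts before True, so donors whose accent matches lead the list.
    return sorted(candidates, key=lambda d: (d.accent != accent, abs(d.age - age)))


if __name__ == "__main__":
    bank = [
        Donor("a", 17, "F", "northeast"),
        Donor("b", 16, "F", "south"),
        Donor("c", 18, "F", "northeast"),
        Donor("d", 17, "M", "northeast"),
        Donor("e", 40, "F", "northeast"),
    ]
    for donor in rank_donors(bank, age=17, gender="F", accent="northeast"):
        print(donor.name)
```

A real system would presumably also compare acoustic properties of the recordings themselves, which this sketch leaves out entirely.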

To supplement the federal grants, Rupal launched an Indiegogo crowdfunding campaign with a goal of raising $70,000. Among the incentives, she included a “Trailblazer” option: If you donated $10,000, you could get one of the first voices off the line. (Or, in this case, out of Geoff’s basement.) Rupal didn’t expect anyone to donate that much; four families did. She chose three additional voice beneficiaries and got to work.

The recipients included a 59-year-old man with ALS from Windham, Maine, and a 12-year-old girl with cerebral palsy from Plano, Texas.

excerpt ©2016 Pace Communications. This article appears in print on page 64 in the May 2016 issue of Southwest: The Magazine.