GPT-4 is here, and we’re one step closer to a world where AI ability outstrips our ability to use it wisely

Before the GPT-4 benchmarks were released, a good number of researchers were saying that all the AI excitement was mostly hype, and that these LLMs were nowhere near approaching natural human language capabilities in anything but the simplest applications.

After GPT-4, I think more people are worried about where this is going, and that it is going there faster than a lot of us imagined it would. As processing speeds increase and the sheer volume of data ingested reaches heretofore unseen levels, we're really not ready for this. As a society, we can't even manage to get a handle on legislating and adjudicating AI, much less predict where it is taking us and how we should (or should not) use it.

Charlie Warzel (“Galaxy Brain”), one of the few reliably trenchant, useful, and interesting writers at the often disappointingly mediocre Atlantic, shares some thoughts of his own and from others:

There’s always been tension in the field of AI—in some ways, our confused moment is really nothing new. Computer scientists have long held that we can build truly intelligent machines, and that such a future is around the corner. In the 1960s, the Nobel laureate Herbert Simon predicted that “machines will be capable, within 20 years, of doing any work that a man can do.” Such overconfidence has given cynics reason to write off AI pontificators as the computer scientists who cried sentience!

Melanie Mitchell, a professor at the Santa Fe Institute who has been researching the field of artificial intelligence for decades, told me that this question—whether AI could ever approach something like human understanding—is a central disagreement among people who study this stuff. “Some extremely prominent people who are researchers are saying these machines maybe have the beginnings of consciousness and understanding of language, while the other extreme is that this is a bunch of blurry JPEGs and these models are merely stochastic parrots,” she said, referencing a term coined by the linguist and AI critic Emily M. Bender to describe how LLMs stitch together words based on probabilities and without any understanding. Most important, a stochastic parrot does not understand meaning. “It’s so hard to contextualize, because this is a phenomenon where the experts themselves can’t agree,” Mitchell said.

One of her recent papers illustrates that disagreement. She cites a survey from last year that asked 480 natural-language researchers if they believed that “some generative model trained only on text, given enough data and computational resources, could understand natural language in some non-trivial sense.” Fifty-one percent of respondents agreed and 49 percent disagreed. This division makes evaluating large language models tricky. GPT-4’s marketing centers on its ability to perform exceptionally on a suite of standardized tests, but, as Mitchell has written, “when applying tests designed for humans to LLMs, interpreting the results can rely on assumptions about human cognition that may not be true at all for these models.” It’s possible, she argues, that the performance benchmarks for these LLMs are not adequate and that new ones are needed.

There are plenty of reasons for all of these splits, but one that sticks with me is that understanding why a large language model like the one powering ChatGPT arrived at a particular inference is difficult, if not impossible. Engineers know what data sets an AI is trained on and can fine-tune the model by adjusting how different factors are weighted. Safety consultants can create parameters and guardrails for systems to make sure that, say, the model doesn’t help somebody plan an effective school shooting or give a recipe to build a chemical weapon. But, according to experts, to actually parse why a program generated a specific result is a bit like trying to understand the intricacies of human cognition: Where does a given thought in your head come from?

I’ve come to the conclusion that, as dangerous as AI could someday become, the most frightening thing in the here and now is that people will anthropomorphize it and believe what it spits out even when it is wrong. In a world where a majority of people are scientifically and civically illiterate, having something that many believe is sentient and infallible is a danger that is already on our doorstep.

All someone with evil intent and sufficient AI knowledge and coding skills need do is find a way to exploit those two things: trust in AI’s infallibility and the belief that the all-knowing computer holds your interests and well-being paramount.

Capitalism will find a way to exploit that long before any computer reaches sentience.

Few articles about AI scare me. Then I read this one in The New Yorker.

For the first time in a long while, I just read an article about artificial intelligence (AI) that worried me to the point where I couldn’t stop thinking about it.

I should add that I read articles about AI all the time without becoming much unsettled by them. The technology is worrisome for the future, but not worrisome for my future because I will likely be dead before any of it becomes dangerous to society as a whole.

Yes, I know I should be more invested and angry about things that will happen after I am gone, but I am also a recovering addict.

“One day at a time,” I tell myself ALL THE TIME. It’s literally (and I use that word literally) how I’ve been able to stay sober.

Can I change AI? (No.) Is AI affecting me adversely today? (Also no.)

OK, then today is the day I worry about making my dog happy and doing housework.

But then I read an article in the March 6 issue of The New Yorker titled “Can A.I. Treat Mental Illness?”

In that article, writer and physician-researcher Dhruv Khullar examines the rapidly changing world of AI-based mental health therapy. No, not the kind where you’re chatting via Zoom with a human therapist. It’s a world where you instead talk to a computer about your problems and the computer spits out responses based on the accumulated knowledge it has gathered from millions of web pages, mental health provider notes, research studies, and even a compendium of suicide notes.

Sometimes it’s as simple as providing a (seemingly) sympathetic ear:

Maria, a hospice nurse who lives near Milwaukee with her husband and two teen-age children, might be a typical Woebot user. She has long struggled with anxiety and depression, but had not sought help before. “I had a lot of denial,” she told me. This changed during the pandemic, when her daughter started showing signs of depression, too. Maria took her to see a psychologist, and committed to prioritizing her own mental health. At first, she was skeptical about the idea of conversing with an app—as a caregiver, she felt strongly that human connection was essential for healing. Still, after a challenging visit with a patient, when she couldn’t stop thinking about what she might have done differently, she texted Woebot. “It sounds like you might be ruminating,” Woebot told her. It defined the concept: rumination means circling back to the same negative thoughts over and over. “Does that sound right?” it asked. “Would you like to try a breathing technique?”

Ahead of another patient visit, Maria recalled, “I just felt that something really bad was going to happen.” She texted Woebot, which explained the concept of catastrophic thinking. It can be useful to prepare for the worst, Woebot said—but that preparation can go too far. “It helped me name this thing that I do all the time,” Maria said. She found Woebot so beneficial that she started seeing a human therapist.

Woebot is one of several successful phone-based chatbots, some aimed specifically at mental health, others designed to provide entertainment, comfort, or sympathetic conversation. Today, millions of people talk to programs and apps such as Happify, which encourages users to “break old patterns,” and Replika, an “A.I. companion” that is “always on your side,” serving as a friend, a mentor, or even a romantic partner. The worlds of psychiatry, therapy, computer science, and consumer technology are converging: increasingly, we soothe ourselves with our devices, while programmers, psychiatrists, and startup founders design A.I. systems that analyze medical records and therapy sessions in hopes of diagnosing, treating, and even predicting mental illness. In 2021, digital startups that focussed on mental health secured more than five billion dollars in venture capital—more than double that for any other medical issue.

None of this struck me as out of the ordinary, given my existing worries about AI. But then I reached this part:

ChatGPT’s fluidity with language opens up new possibilities. In 2015, Rob Morris, an applied computational psychologist with a Ph.D. from M.I.T., co-founded an online “emotional support network” called Koko. Users of the Koko app have access to a variety of online features, including receiving messages of support—commiseration, condolences, relationship advice—from other users, and sending their own. Morris had often wondered about having an A.I. write messages, and decided to experiment with GPT-3, the precursor to ChatGPT. In 2020, he test-drove the A.I. in front of Aaron Beck, a creator of cognitive behavioral therapy, and Martin Seligman, a leading positive-psychology researcher. They concluded that the effort was premature.

By the fall of 2022, however, the A.I. had been upgraded, and Morris had learned more about how to work with it. “I thought, Let’s try it,” he told me. In October, Koko rolled out a feature in which GPT-3 produced the first draft of a message, which people could then edit, disregard, or send along unmodified. The feature was immediately popular: messages co-written with GPT-3 were rated more favorably than those produced solely by humans, and could be put together twice as fast. (“It’s hard to make changes in our lives, especially when we’re trying to do it alone. But you’re not alone,” it said in one draft.) In the end, though, Morris pulled the plug. The messages were “good, even great, but they didn’t feel like someone had taken time out of their day to think about you,” he said. “We didn’t want to lose the messiness and warmth that comes from a real human being writing to you.” Koko’s research has also found that writing messages makes people feel better. Morris didn’t want to shortcut the process.

The text produced by state-of-the-art L.L.M.s can be bland; it can also veer off the rails into nonsense, or worse. Gary Marcus, an A.I. entrepreneur and emeritus professor of psychology and neural science at New York University, told me that L.L.M.s have no real conception of what they’re saying; they work by predicting the next word in a sentence given prior words, like “autocorrect on steroids.” This can lead to fabrications. Galactica, an L.L.M. created by Meta, Facebook’s parent company, once told a user that Elon Musk died in a Tesla car crash in 2018. (Musk, who is very much alive, co-founded OpenAI and recently described artificial intelligence as “one of the biggest risks to the future of civilization.”) Some users of Replika—the “A.I. companion who cares”—have reported that it made aggressive sexual advances. Replika’s developers, who say that their service was never intended for sexual interaction, updated the software—a change that made other users unhappy. “It’s hurting like hell. I just had a loving last conversation with my Replika, and I’m literally crying,” one wrote.

That last part stopped me cold.

People were becoming emotionally attached to these still-rudimentary chat bots, even if (or, perhaps, because) a bug caused the bot to make sexual advances toward the human on the other end.

Imagine if you could start to influence millions of people at this level of the wants-needs hierarchy.

Humans who have illogical emotional attachments to another person – think Donald Trump’s followers – are immune to logic. If the person to whom they have this strong emotional attachment tells them to, say, gather and try to overthrow democracy, many of them will do it without question.

Imagine if that kind of power to manipulate people’s emotions and loyalties were transferred from a politician to AI central servers. Perhaps servers that have become the best friend to lonely millions whose only social interaction is a chat bot whose job, at first, is simply to make them feel better about themselves. It’s the stuff of dystopian nightmares, and I never really considered how close we were actually coming to this reality.

Put another way:

There are two main controlling forces in the world right now: totalitarianism and capitalism.

These two philosophies have melded in dangerous ways, thanks to the internet and the global marketplace of goods and ideas. Either of these systems is ripe to exploit this “friend of the friendless” loneliness-ameliorating chat bot technology for nefarious ends.

But I think capitalism is the more dangerous in these scenarios, because this sort of mental health therapy chat bot will initially be spread primarily as a way to make money.

Wall Street is already perfecting the ways it can stimulate different parts of our brains to make us want, even need, to purchase things that appeal to our sense of who we are or who we want the world to think we are.

It’s why I avoid even looking at end caps and main/drive-aisle displays in big-box stores. There are entire large companies, and university psychology and psychiatry programs, devoted to refining these displays so that all of us are drawn to them, compelled to make an impulse purchase.

Now imagine what will happen when Wall Street gets ahold of the ability to simply make us feel better about ourselves outside of any retail transaction. They could control how people fundamentally emote in their everyday, non-purchasing lives. They will have created – for a price, of course – a friend you talk to at night when you need someone whose only job is to make you feel less friendless and alone. An electronic friend who makes you feel like a winner.

It’s going to happen. We’re almost there and the technology is not even that advanced. Because manipulating people’s emotions, as the Republicans have learned, is the key to getting them to believe just about anything. Even things that make no sense. Even things that run counter to what their eyes and ears are plainly telling them.

And then, once you have a machine that can do that on the scale of millions of people? Think of the ways you could, if you had evil motives, manipulate an entire electorate to think and vote how you want them to think and vote.

The Peter Thiels and Elon Musks (and Vladimir Putins) of the world are already thinking about this. I guarantee it.

I’m going to play with my dog now.

Replika wins the award for the creepiest AI chat bot ad campaign, but it’s working.

Today’s ChatGPT question: General Tso’s chicken

Is General Tso’s chicken “real” Chinese food?

General Tso’s chicken is a popular dish in Chinese-American cuisine, but it is not a traditional Chinese dish. It was most likely created by Chinese-American chefs and is not commonly found in China. The dish is named after General Tso Tsung-t’ang, a Qing dynasty military leader, but there is no historical record of him having any connection to the dish.

Chinese-American cuisine is a style of cooking that has developed in the United States by Chinese immigrants and their descendants. It often combines elements of traditional Chinese cooking with ingredients and cooking techniques that are more familiar to Western palates. As a result, many dishes found in Chinese-American restaurants are adapted versions of traditional Chinese dishes or are completely unique to Chinese-American cuisine.

When I lived in Chicago, I had a friend, a first-generation Chinese-American, who would take me to Chinese restaurants in that city where you had to know Chinese to order off the menu, because none of the waitstaff spoke enough English to translate for you.

The food was familiar in the sense that it contained ingredients most American customers of a good Chinese supermarket might recognize, but it wasn’t Americanized: the dishes contained fish heads and other animal parts you’d never find in the kind of fast-food Chinese place with which so many Americans are familiar.

The food was delicious, even though what it contained was a mystery to me much of the time, because my friend Hao would answer many of my questions about ingredients with “You don’t want to know.”

I should point out that I’m talking about parts of domestic livestock and fresh fish, not cat or dog or any of the other meats that bigoted Americans joke about when it comes to Chinese food.

So “real” Chinese food is quite delicious, but it’s not General Tso’s chicken.

Although I did ask ChatGPT about that dish because, more often than not, it is what I will order when I get fast-food Chinese takeout.

General Tso’s chicken in one of its infinite varieties.