The big AI headline in Washington today was Vice President Kamala Harris hosting the CEOs of Microsoft, Google, Anthropic and OpenAI in a closed-door meeting. But the real attention of the AI community is now fixed on August, and on an event that could provide a very public reckoning for the large language models these tech corporations have produced.

Tucked into the White House’s press release Thursday on “new actions that will further promote responsible American innovation in artificial intelligence (AI) and protect people’s rights and safety” was a nod to DEFCON 31 — a giant hacker convention held across multiple Las Vegas hotels from August 10-13 that now has an unusual endorsement from the Biden administration.

Amid all the noise and bluster about regulating AI, it’s the most concrete move yet to provide some public accountability — and public testing — of the fast-moving platforms at the heart of the conversation.

Collectively, generative AI models — ChatGPT, Midjourney and their ilk — have suddenly lit a fire under the federal government. Congress has more questions than answers right now, and the White House has laid out (purely voluntary) AI development guidelines, frameworks and roadmaps.

But now the White House has effectively signed onto a public experiment to find out whether rapidly developing AI models are secure and safe enough for widespread adoption — for the public and for the government itself.

This isn’t a formal audit — instead, the plan is to let the world (or at least the part of the world at DEFCON this year) test models from Anthropic, Google, Hugging Face, NVIDIA, OpenAI and Stability AI, makers of some of the most popular LLMs out there.

“Nobody's ever done anything like this before,” said Seed AI CEO Austin Carson — one of the people organizing the DEFCON 31 hacking exercise.

It’s not precisely unprecedented, although the scale of the hacking exercise might be. The Pentagon held a red-teaming exercise at last year’s DEFCON for a project to build individual “micro” electric grids for some of its military bases.

August’s exercise is meant to function as a pilot, Carson says. “If you go on ChatGPT and you try to do some stuff, it’ll kick you out eventually, right? But we want folks to have at least this hour… to explore all the ways this thing possibly works, explore the guardrails and the functionality.”

Carson plans to bring in a few hundred students from a coalition of community colleges (alongside the regular DEFCON crowd of hackers) — and let them rip.

So, basically: a giant coordinated red-team exercise on a bevy of headline-grabbing AI platforms, to test whether they are worthy of public trust.

But will it be enough? Heather Frase, a senior fellow at Georgetown’s Center for Security and Emerging Technology who works on AI standards and testing, doesn’t think so. But she does think it’s a step in the right direction.

The anxieties around AI adoption are manifold — from triggering the apocalypse to sinister dark patterns that define people’s experience in the digital world.

“It's like when you're playing a video game and you get to a new continent on your screen — there's so much you don't know,” Frase said.

The DEFCON assessments are “going to give us a huge thing of value in a short amount of time,” she said, “but it is not sufficient.”

“We need to see this as a long-term commitment,” Frase said. “There's gonna be things that we're not gonna discover quickly, even with DEFCON. So we are in this for the long haul.”
Scale AI CEO Alexandr Wang — whose company is building the testing platform for this massive red-teaming exercise — said something similar: “This test at DEFCON is not going to be the last time that we have to test and evaluate these models for safety, reliability, and accuracy. These are going to be requirements for society going forward.”

He compared his company to Switzerland, “unbiased and not beholden to any single ecosystem.”

As for which AI models from each company will come out to play: “We're collaborating directly with all of these foundation model companies to understand which models they would like to test and evaluate,” Wang said.

“Advances in technology, including the challenges posed by AI, are complex. Government, private companies, and others in society must tackle these challenges together,” Vice President Kamala Harris said in a statement released after her meeting with the four tech CEOs. “President Biden and I are committed to doing our part – including by advancing potential new regulations and supporting new legislation – so that everyone can safely benefit from technological innovations.”

And all those big AI model-makers — all those CEOs — are expected to play ball, providing jailbroken versions of their models for DEFCON 31.

And if you’re curious about what the big tech participants are gaining from this exercise: it’s not just consumer trust — it’s government trust, said Frase.

The open-door policy that Harris is adopting with big tech CEOs is actually a diplomatic tightrope walk to get all the players at the table, said Carson — a signal that the White House is looking to have a constructive conversation with tech corporations that exist largely outside of any data regulations in the U.S.

And in a nutshell, the White House’s ultimate goal in setting up the public AI assessment in August was “to evaluate the models that are out in the wild and being used today,” said Wang.