This week, Digital Future Daily is focusing on the fast-moving landscape of generative AI and the conversation about how and whether to regulate it — from pop culture to China to the U.S. Congress. Read day one here, on the controversy over a fake single from Drake and the Weeknd, and day two here, on the not-actually-that-crazy idea of automated inventors.

There’s a growing push to regulate AI. But what would that actually look like?

AI is either an idea or a tool, depending on who you ask. And as a rule, the government doesn’t just start regulating tools or ideas. It usually finds a lever — a place to intervene, and ideally, a hard rationale for doing so.

Last week, the leading futurist Jaron Lanier laid out a powerful case for both in the New Yorker. He argued for a principle called “data dignity” — or the concept that “digital stuff would typically be connected with the humans who want to be known for having made it.”

In practical terms, this means you or I would actually have some claim on the huge data trails we leave — and on the ways they’re being used to train powerful artificial minds like GPT-4.

For Lanier and the wider community of experts who have been exploring this idea, “data dignity” has two key pillars. For one, it’s a way to keep AIs closely tethered to people, rather than spinning off on their own in terrifying ways. It also offers clear, practical guidelines for regulating how they’re built and used — as well as who profits from them.

Which, in some ways, seems almost obvious. If these models are worthless without the stuff we post on the internet, shouldn’t we have a say over when, where, and how that stuff is used?

But in another way, it represents a radical change in how we think about data online — one that likely has plenty of public support, even if the concept of “data dignity” itself is still unfamiliar to most people.

On the internet we’ve come to know over the past 20 years, we’ve come to assume that free services require us to hand over control of massive gobs of data to Big Tech platforms — which then use it to serve us hyper-targeted ads for, I don’t know, a dating service for singles in the Florida Panhandle who love Bruce Springsteen. It’s a trade we make knowingly, even if there are plenty of objections to the business model.

But the rise of large language models and other tools that relentlessly scrape the internet for our personal data adds a new dimension to the debate over that data, one with serious ramifications for privacy, accountability and even the global economy.

“It's not just creative people whose work is being transformed and reproduced,” said E. Glen Weyl, an economist and author who has written extensively about data ownership and provenance. (Weyl is also a researcher at Microsoft, but spoke to me in his personal capacity.)

“Pretty much everyone who has done anything on the internet — and we don't know exactly what these models were trained on, because it hasn’t been disclosed, but imagine that it's most things that are publicly available online — anyone who's contributed there is helping create these models, and therefore helping create something that's both an engine of productivity, and also potentially an engine of labor displacement.”

It’s a familiar doomy tale of technological exploitation: Companies that are just as much black boxes as their AI models are taking our data and using it to build machines that will put us out of jobs.
The promise of “data dignity,” as its advocates lay it out, is that internet users would not just have more authority over and awareness of how their data is used, but that they might even be proportionally compensated for it.

“Just like on Spotify, where there's a royalty that gets paid when a song gets played, there should be attribution of [data] both in moral terms so that people know where it's coming from, and in economic terms [accruing] to the people who created that,” Weyl said.

Is that even possible, in practical terms? It’s an idea that would totally upend the status quo of the digital economy — so naturally, some big-stick government regulation would be required to start enforcing it.

I asked Maritza Johnson, a data privacy expert and the founding director of the University of San Diego’s Center for Digital Society, what she thinks government could do to steer the current regime of digital rights in this direction if “data dignity” really does become a broader political cause. She made the case that it can use the same regulatory tools for accountability that already exist for other industries.

“We need to move away from this fairy tale that [data rights are] up to the individual and recognize this is a collective problem,” Johnson said. She cited the examples of Facebook and Twitter, both caught using phone numbers ostensibly collected for two-factor authentication to target advertising, and said regulators should be empowered to set explicit rules for how data is used and presented to users, and to punish companies that don’t follow them.

Weyl said much the same thing, arguing that tracking the provenance of data is quite easy and simply needs a regulatory push to ensure it happens. For the trickier part — that is, getting users paid — he argued regulators could take a page from labor law.

“From the legal perspective, [regulators could try] to give support and encouragement to some of these collective management organizations the same way that labor law did for labor unions… the Authors’ Guild, the Writers’ Guild, etc. can move beyond just the like basics of limiting what these models can do to try to invest in building the regime they really want once these things are pervasive.”

When you start talking about humanity’s future, there’s another reason that tying the data LLMs use back to the original source could be important. Some thinkers in the data dignity movement argue that the risk AI poses to privacy is existential, making it imperative that internet users have control over how personal data, like their speech patterns, written syntax, and even their gait, is used.

“These systems are capable of imitating just about anything,” Weyl told me. “They can create an essentially perfect replica of the person who is being imitated… and have it do arbitrary things to people that you care about. Many of the things that you believe are secret are not going to be secret anymore.”

Johnson argues this is why regulators need to act now, before the AI business model runs roughshod over privacy.

“Privacy and security are really hard to retrofit onto a system, which is why you have a lot of talk about ‘privacy by design,’” she said. “It would take a really big regulatory stick to be sure that companies actually do it, but it’s extremely important.”