alas, AI ransomware is here
Hey there.
Ok, I bear some awkward news.
No, it's not about Donald Trump — the current and potentially last POTUS — attempting to tweet-fire a member of the FOMC — headlines like those no longer feel like news to me. Reading the New York Times these days is a bit like watching a movie whose script was written in the 1930s and hastily rewritten to mention iPhones, Bitcoin, and slightly more tanned, less war-traumatized characters.
But I digress… It's time to read about why we should care a bit more about AI Safety — once again.
A few weeks ago I finished Mustafa Suleyman's "The Coming Wave" over my sweet week of holidays. Some of you will surely have heard about this book before — or maybe not; honestly, Mustafa has since been through a few PR chagrins, so I wouldn't be surprised if his book sales have followed suit.
But in any case, I found it a decent read on AI and biotech (less so the biotech — Mustafa is a CS guy), both of which are obvious umbrella terms for a shit ton of new developments in computer science and the life sciences.
The core idea of the book, conveyed through the pre-dystopian imagery it leans on, is that the culmination of ultra-fast advances in these two fields will engulf Humanity like a wave in the next few decades. Whether this wave will crush us or embrace us is beside the point to him: his answer is to contain it.
I don't want to spoil the messaging too much — although the book is not a thriller — but he speculates that this will be really, really hard. A lot of the book is a fairly good empirical account of how incompetent we have been at avoiding the totally foreseeable misanthropic behaviors our computers have displayed.
About halfway through the book, Suleyman describes, as one among a vast ecology of perils States will face in their desperate attempt to hold up their end of the Great Bargain, how distinctly problematic the wild west of "vibe hacking" will be.
He goes on at length, imagining varied scenarios where malicious actors unleash incredibly smart coding agents onto our ever-growing public and private networks — instead of running scripts or pre-written pieces of malware over and over, they'll let the agents linger and learn until they find an opening.
I remember thinking to myself something along these lines: "this is what I don't like about these forward-facing, futuristic, chronicle-type books. They always oversell the problems and hype the risks while paying little or no attention to the obvious solutions the same tech could produce."
And, in some sense, my initial counter-argument was simple: I briefly worked in cybersec and ran a SOC for a while, so I am aware of the effort that commercial companies, and even state agencies and institutions, have started putting into securing their products and internal operations.
So if smart, evil black-hat agents can eventually find openings, then smart white-hat coding agents can eventually find fixes, and if there's a matchup I am betting on whichever side has the most computing power — recall that AI coding agents run on FLOPs. A LOT of FLOPs. And since those FLOPs cost dollars, I am leaning toward whoever is protected by the law, which is most likely the white-hat hackers.
That's it, that's my simple heuristic. Could I be wrong? I am sure I could, and if I am wrong it is almost certainly because, somewhere along the AI-development-race-to-AGI-by-2027 folly, labs became obsessed with protecting their models and their research behind closed doors.
One keystone on the containment path is transparency: the idea that companies, governments, or any entity that develops, maintains, and/or regulates these emergent technologies must be incentivized — and even arm-wrestled, if necessary — into revealing critical research findings to the open-source community. I am buying more and more into this idea, and I think it's glaringly obvious that without a framework of transparency, we're just not ready to deal with all the havoc that bots will inject into our unprotected systems.
A shining example of this paradigm is CVE — Common Vulnerabilities and Exposures — which organizes a community of partners who openly share and publish new vulnerabilities in software. In cybersecurity, initiatives like CVE are the norm, and the community has thrived on this type of openness — I think AI labs and researchers should follow suit.
All this was really just context. Here's the news I want you to look at: Anthropic's first-ever "Threat Intelligence Report".
I am genuinely impressed with this initiative by Anthropic's Safeguards team, and writing this is my way of saluting them for their work and openness. Anthropic takes unreasonable amounts of flak for talking too much about safety, but I think safety is vastly underrated in Silicon Valley. It's pretty clear that with models of this size, emergent capabilities and unforeseen jailbreaking techniques can wreak way more havoc than any other technology we have invented so far.
Let's talk about what the folks at Anthropic discovered.