3 Comments

Thank you for the thought-provoking piece. Do you think LLMs are in use by the spy agencies? If so, which one would you speculate has the best system?

author

I would guess that the NSA has the best, primarily because it has access to many systems in many languages that long predate the WWW. Recall the 90s controversy over CALEA wiretapping and trace mechanisms, and the later one over the FBI's use of 'Stingray' cell-site simulators; these are just the tip of the iceberg. The USPS takes a photo of every standard-sized letter I get and emails me a copy. If the snail mail people can do that, I'm sure the NSA has technology orders of magnitude better, with far larger digital storage and mining capacity.

As the NSA surely knows, I have a cousin who worked for them in the early 90s. He mastered Oracle and built applications for them. I still have a small notebook with their imprint from the tourist center at Ft Meade. Of course, since ECHELON they have been sharing SIGINT with the Five Eyes, so I can easily imagine that, for their very specific applications, they hold data on people and organizations that is entirely unique.

What I specifically recall is that while working at large US companies back when Google was newer, say around 2003, I could find company financials faster with Google than executives could on their own intranets. I've always considered MapReduce something of a joke, but it was the hallmark of the dawn of the 'Big Data' era in commercial computing. My nickel says IBM is staying alive on contracts with the Feds, who are captured by their own regulations. I understand horizontal scale-out computing, but I still know very little about supercomputing. The NSA has plenty of supercomputers.

LLMs can be very useful for abstracting (summarizing) large collections of documents. That is exceedingly important work for the data collection function of a national security agency. But the most important thing is that these agencies already capture first-rate sources. That lets them curate much more efficiently and rely less than civilians do on their LLMs being accurate. So that's what I would do: tailor LLMs to sort more unstructured data into what must already be a fairly rigorous classification process. Understand that the NSA, for example, will always collect WAY more than it inspects. The two processes are independent, like a cow's multiple stomachs. Well, not exactly.

Also understand that the NSA will have legal access to systems outside its own. I don't know how large that 'dark web' might be, but it was all purpose-built, so I expect it does what it does fairly well. Considering the capabilities Palantir adds, it has got to be impressive. Remember that because these systems serve law enforcement, they will track persons of interest longitudinally without their knowledge. Once an administrative warrant is signed, the FBI can tail you for many months before an arrest is made.

The most compelling fact I learned in all of this is that the CIA (and probably other agencies too) has the right to violate patents. So they can re-engineer any publicly known system and could surreptitiously swap in their own backdoor modifications. Consider the known flaws in TLS. ( https://cheapsslsecurity.com/blog/tls-versions-what-they-are-and-which-ones-are-still-supported/ )
