Conway, AR
Many years ago I found myself in Arkansas. I had a contract to run a class on the latest technology in my field, which was a remarkable OLAP engine called Essbase. I was at the data center of a profoundly sharp group of folks at a company called Acxiom. They were in the business of using hacked-together supercomputers to do the heavy lifting for the credit card companies. I learned a few things during that trip, one of which is very relevant to controversies of the day.
The major task faced by Acxiom was called fulfillment. Imagine you’re the marketing genius at Citibank and you have decided, as you have decided many times before, that you would like to send out a mass mailing telling X number of people that they have been pre-selected for a special offer. I’m sure you have been on the receiving end of this genius decision on at least one occasion. Well, back in the early 2000s not a lot of people had enough compute hardware to calculate X. Ahh, but the boys at Acxiom had a secret weapon. Citibank would dump their entire catalog of cardholders onto magtape and FedEx it over to Conway, AR with the magic query. Acxiom would copy that tape into their robotic tape library and then pipe the query through a DEC Alpha 64-bit pre-processor glued to an IBM mainframe. In that way, before horizontal scaling and Hadoop had been invented and perfected, they could come up with a select list. Moreover, they could run this over a few iterations and rightly qualify the list according to the budget of Citi’s marketing department. In come 20 million-odd undifferentiated cardholders, out goes a nicely pruned short list. If you weren’t the NSA, this task was impossibly expensive. So that was Acxiom’s business model.
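To make that fulfillment loop concrete, here is a minimal sketch in modern Python terms. The column names, cutoffs, and pandas usage are my own invention; the real thing ran on tape robots, a DEC Alpha front end, and a mainframe, not a laptop.

```python
# Hypothetical fulfillment loop: filter the full cardholder dump against the
# marketing query, then tighten the criteria until the select list fits the
# mailing budget. Column names and cutoffs are invented for illustration.
import pandas as pd

def build_select_list(cardholders: pd.DataFrame, budget: int) -> pd.DataFrame:
    cutoff = 700                      # starting credit-score floor (made up)
    selected = cardholders
    while len(selected) > budget and cutoff <= 850:
        selected = cardholders[
            (cardholders["credit_score"] >= cutoff)
            & (cardholders["months_on_book"] >= 12)
        ]
        cutoff += 10                  # tighten the query on each iteration
    return selected

# In come 20 million-odd cardholders, out goes a pruned short list:
# short_list = build_select_list(pd.read_csv("cardholder_dump.csv"), budget=500_000)
```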
Along the way, Acxiom became very good at imputation. They could look at a set of linked data and, with some accuracy, determine which blanks in the huge set of imperfect data could be guessed. So the opening line of the humorous story was “Did you know that there are 14 different sexes?” Beginning with ‘male’ and ‘female’ there were various methods of imputation, the first and most obvious being by first name. To which the appropriate banter flows towards a boy named Sue. And so it went through some very interesting methods, of which I remember none. In recent years the imputation rabbit hole caught and twisted the ankle of any number of proto-Wokies and celebrity economists like Stephen Dubner, who have made all sorts of predictions about America’s DaShawns and LaTiefas and their prospects on resumes. The simple fact is that we all do imputations, and we pick between colors of breakfast cereal boxes in microeconomic ways that aggregate significantly. There is psychology involved in the personal selections; in the Acxiom algorithms, not so much. Intent matters.
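For flavor, here is a minimal sketch of that first and most obvious method, guessing sex from first name. The lookup table and example records are invented for illustration; they are not Acxiom's.

```python
# Toy imputation of a missing 'sex' field from first name.
# The name table and example rows below are invented, not real data.
NAME_TO_SEX = {"james": "M", "robert": "M", "mary": "F", "linda": "F"}

def impute_sex(record: dict) -> dict:
    if record.get("sex"):                       # already filled in, leave it alone
        return record
    guess = NAME_TO_SEX.get(record.get("first_name", "").lower())
    record["sex"] = guess or "unknown"          # names not in the table stay 'unknown'
    return record

rows = [
    {"first_name": "Mary", "sex": None},
    {"first_name": "Sue", "sex": None},         # a boy named Sue: the imputation punts
]
print([impute_sex(r) for r in rows])
```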
A New Race Box
These days, it is increasingly difficult to discern the intent of just about any decision people can put at the feet of racial identification and subsequent judgment. In that regard I have been generally impressed with the trend at the Census Bureau in its evolution towards multiple choice. I think their hand should be nudged in a direction I believe to be fair, which is essentially to expand their ability to capture the nuance of the following sorts of statements:
“I am considered black by most people, and some consider me African-American, but if I had a choice on most applications I would put ‘decline to state’, because I should have a choice whether or not I should declare a race knowing that racial decisions will be made about me and this racial group.”
“I am considered Asian, and many people consider me Chinese, but I am from [blank filled] TAIWAN and I don’t want to be clumped into this big group that includes Indians, Malaysians and Pacific Islanders even though technically Taiwan is an island in the Pacific Ocean.”
So somebody named Hoyt came up with the following form:
Now I went and signed the petition over at Change.org, and then Hoyt went and poisoned his pill by adding the word ‘reparations’ to his raisons d’être. But I still think this level of detail in the matter of racial identification is key from a data science point of view. Why? Because it more accurately describes what’s going on. You aren’t just simply ‘White’. People call you white even though you don’t care about race and you’d rather not be stereotyped. I think that applies to most people: they don’t want to be racially stereotyped. So where exactly in life do we get to say so? I think it would be important for the Census to be that place.
Obviously, this would throw a great deal of racesplaining into disarray, giving us one less thing to be sure about in this world of distrust and confusion. I say that’s a good start.
I was explaining to Ms. Chin the other day that I am becoming more of an absolutist when it comes to deracializing American thought and prejudice. My part is to acknowledge that we humans are being racialized. The way the Census and government agencies report it, people self-identify as a certain race. This is true, but would we always, if we had a choice not to? Why can’t we self-identify out? Because race matters? Well, how much it matters to you is different than how much it matters to me. How much it matters to the powers that be is what’s at stake. There is a huge difference between being racialized and being acculturated. It’s actually not subtle if you think about it.
The patron saint in all of this is Ward Connerly. We shall be on the lookout for the Supreme Court decision. Perhaps his day has come.
Outside of the Race Box
This week I am in Vegas for a trade show. I got to talk to some blockchain people and some capitalists and some AI people and some fintech people. It’s so good to be back in the open throngs of free-breathing folks; as much as I hate crowds, this is one I like.
Anyway, the related story here is that I spoke at length to a guy who is in a credit-services startup, and I learned the basics of the FCRA (the Fair Credit Reporting Act), which is the regulatory framework for credit providers. The standard narrative when it comes to AI and algorithmic bias is that evil racist programmers, and inadvertently racist idiot programmers, are screwing over poor hapless people of color. It turns out that there are more subtle things going on.
So first, understand that as a data engineer, I have always known that there are hundreds of attributes you can attach to any individual in a database other than race or sexual preference. Blood type and zodiac sign are two examples I like to use, with the connotation that they aren’t as socially poisoned as race, and as an individual one is free to take them seriously or not.
Well, it turns out that FCRA regulations define a set of protected classes that a credit provider is restricted from using when making a decision to approve a line of credit. Hashtag ‘To lend or not to lend, that cannot be a racial question’. I believe that marital status is another such question. You can certainly market to marrieds or Filipinos if you like, but you cannot use such marketing information materially in your approve/decline decisioning.
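In data engineering terms, that boils down to keeping protected attributes out of the model's reach at decision time, even when they live in the same warehouse marketing draws from. A minimal sketch, with invented column names and a protected-class list that is only illustrative:

```python
# Hypothetical split of applicant attributes: protected fields may exist in the
# warehouse (and may drive marketing), but they never reach the credit decision.
PROTECTED = {"race", "sex", "marital_status", "national_origin", "religion", "age"}

def decision_features(applicant: dict) -> dict:
    """Strip protected-class fields before anything scores this applicant."""
    return {k: v for k, v in applicant.items() if k not in PROTECTED}

applicant = {
    "fico": 640, "bank_cash_flow": 2200, "car_balance": 9000,
    "marital_status": "married", "race": "decline_to_state",
}
print(decision_features(applicant))   # only the permissible fields remain
```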
Now it turns out that, despite what people say about redlining, the most important determining factor in the decisioning process is the FICO score. According to another presentation I saw, on average, among the top 10 factors used, the FICO score carries about 69% of the weight and the other nine together carry 31%. I didn’t take down all the factors, but three that I recall were: balance on the primary car, balance on the secondary car, and bank account cash flow. In the second presentation, the company goes through what it calls a ‘second-look’ process. In other words, we know that just relying on FICO gives us an easy no. What else could we look at that gives us a reasonable yes?
So this is interesting. What if you did an experiment that assigned different weights to the ten factors (imagining that there might be a couple dozen candidates to choose from), mixing and weighting them differently each time? Now take six or seven of these weighting mixes and see how differently the protected classes fare under each. It turns out that you can likely come up with a mix of metrics that minimizes the approval-rate differentials across a protected class. And then, given the actual payment histories of those approved by the new mix, it can be demonstrated that those previously excluded by the lazy mix are making their loan payments just as reliably.
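Here is a rough sketch of what such an experiment might look like. The factors, weights, and data are all invented, and the protected attribute is used only to measure approval-rate gaps, never to score anyone; the real second-look shops presumably do something far more sophisticated.

```python
# Toy experiment: score applicants under several candidate weight mixes and
# measure the approval-rate gap across a protected class. All numbers invented.
import random

FACTORS = ["fico", "car_balance_1", "car_balance_2", "cash_flow"]

def score(applicant: dict, weights: dict) -> float:
    return sum(weights[f] * applicant[f] for f in FACTORS)

def approval_gap(applicants: list, weights: dict, cutoff: float = 0.6) -> float:
    """Absolute difference in approval rates between the two groups."""
    outcomes = {True: [], False: []}
    for a in applicants:
        outcomes[a["protected"]].append(score(a, weights) >= cutoff)
    rate = {group: sum(flags) / len(flags) for group, flags in outcomes.items()}
    return abs(rate[True] - rate[False])

random.seed(1)
applicants = [
    {f: random.random() for f in FACTORS} | {"protected": i % 2 == 0}
    for i in range(1000)
]

# The "lazy mix" leans almost entirely on FICO; the others spread the weight.
mixes = [
    {"fico": 0.69, "car_balance_1": 0.11, "car_balance_2": 0.10, "cash_flow": 0.10},
    {"fico": 0.40, "car_balance_1": 0.20, "car_balance_2": 0.15, "cash_flow": 0.25},
    {"fico": 0.25, "car_balance_1": 0.25, "car_balance_2": 0.25, "cash_flow": 0.25},
]
best = min(mixes, key=lambda w: approval_gap(applicants, w))
print("mix with the smallest approval-rate gap:", best)
```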
This appeals to me for both practical and intellectual reasons. It demonstrates that lazy evaluations of people tend to go wrong in a way that is actually measurable. It co-opts the term ‘algorithmic bias’ and breaks it down to the specific algorithms used and how they are used. There’s not just one corrupt ring of computing power that dominates. There are at least two startup companies (and probably more) that saw how things went wrong and fixed it. Now I don’t know if they reverse-engineered the results or what, but they delivered an equality of outcomes using a smarter set of decision criteria by which to increase the size of the market.
This is something that was certainly conceivable. I knew it quite some time ago. Your marital status has nothing to do with whether or not your house is vulnerable to burning down. It matters what your roof is made of and how close your house is to a fire hydrant. But the guy I was talking to said, well, maybe marital status is not out of the question. Maybe the historical discriminations we have made and the outcomes we saw were the result of bad algorithm weights and mixes. We have only recently gained the compute power, AI methods and startup funding to review all of our previous methods. We have just been imputing intent in a reactionary way before now.
So it is lazy to just bark out the politically incendiary charge that TransUnion, Equifax and Experian are institutionally racist in intent. It is slightly different to suggest their algorithms might be biased and unfair, but that might just be weasel-wording the same overheated political message. To demonstrate a superior method of determining creditworthiness is a highly commendable step in the right direction. Yay capitalists. Yay AI dudes. This is the kind of deracialization that solves a politically racialized social issue by using methods other than weighting race itself to produce more equitable outcomes. The good intent of the FCRA is thus served without fighting racial discrimination with racial discrimination.
Progress?
So there is some useful technology. The question remains as to how these improvements will be spun politically if it becomes reasonable to question the gut decisions and previous partial arithmetic that provided the basis for the enactment of the FCRA in the first place. This stoic makes no bets in that fight, but I will pay attention to the new methodologies.
There isn't a lot that gives this old man hope these days, but this does. More of this, please.
There's more data than ever to work with, and someone will do the work. The key is whether the work is done for good or for other reasons. When we project the impact of AI on this, it starts to look like a first-past-the-post, winner-take-all situation, at least for long enough that the impact will be exponential, for the good or for the bad.
Godspeed. I hate interesting times.
Addendum: It has come to my attention that the author of this proposal, Carlos Hoyt, has a working relationship with two individuals I highly respect. Those would be Greg Thomas and Sheena Mason. The editorial about 'reparations' has been removed from the proposal, and with it my prior objection. I urge you to sign the petition at Change.org (https://bit.ly/3SOz5FS).