We met with author Erik Larson to discuss Wikipedia, the influence algorithm he developed to support the idea of influence in rankings, artificial intelligence, and much more. Enjoy!
"I gave up and said, 'It's not gonna work. I don't think the information is available on Wikipedia. We're gonna have to do something else.' And then by 6:00 AM, I had actually scribbled out the whole algorithm." – Erik Larson
Tech entrepreneur and author Erik Larson shares how he developed the idea for the ranking algorithm used by Academic Influence while reading Hannah Arendt. Using data from Wikipedia, Larson developed a way to show both how important a field is to an individual and how important an individual is to a field. This algorithm is the backbone of Academic Influence and allows us to quantify how influential our guests are. Larson also discusses how he got interested in computer science and artificial intelligence.
Check out our article on the Most Influential Computer Scientists Today to find out who’s leading the field.
And if you’re interested in pursuing an academic career in computer science, take a look at the following:
If you want to take a deeper dive into the fascinating topic of artificial intelligence, check out our article Controversial Topic: Artificial Intelligence.
(Editor’s Note: The following transcript has been lightly edited to improve clarity.)
Karina Macosko: Hi! My name is Karina Macosko from Academic Influence, and I’m here with Erik Larson. And you kind of came up with the original idea for ranking people based on how influential they are, and that is how we get our system for all of the people we interview on here.
But first, I wanna ask you, how did you originally get into computer science?
Erik: I was majoring in Math as an undergraduate, and so I had Philosophy going on one hand and I had Math going on the other hand. And sort of somewhere in there were all these interesting questions about Computer Science. And so I started delving into the philosophy of artificial intelligence.
And then as I continued doing Math work, I started getting interested in actually programming and engineering systems to exhibit intelligent behavior.
So I ended up... That’s the short answer. And then actually, when I was in graduate school, I taught myself to program, as I was studying, originally, Philosophy. And then I was hired by a high tech company, and that’s where I professionally increased my skills.
So as a graduate student, I started working as a software developer while I was finishing. And then I ended up switching to do... My PhD was primarily Computer Science, so it just eventually... It was sort of an, it was an evolution of interest, and then it became a kind of financial incentive as I was in Austin when there were very, very high-paying jobs for that sort of work. So as opposed to being a Philosophy professor, that was much better, that was much better for me, so yeah. I also have been...
Karina: Well, and what coding language...
Erik: Go ahead, yeah.
Karina: I’m sorry, go ahead.
Erik: No, no, it’s okay, go ahead. I’m done.
Karina: And what coding language did you start with, originally, that you taught yourself?
Erik: So I was working for my... So apart from the stuff we did in Math, which was really old, like Pascal, I was working for a company called Electronic Data Systems, which used to be owned by Ross Perot Jr., who is a famous business person in Texas here. And at EDS, I was linking together all of their Excel spreadsheets in their call center using Visual Basic.
So my first programming language was this really embarrassing Visual Basic language, which is hardly a real language, but that was... I did a, I wrote thousands and tens of thousands of lines of code for EDS. And then my real language, when I transitioned out of that job, was Java, yeah.
So now, and now, I’m doing Python because sort of everybody’s doing Python, it’s like... So nobody, Java is not the language of choice anymore for my work, so yeah.
Karina: Wow! That is so interesting. And can you kinda give us the rundown of how you came up with this original idea for ranking people based on how influential they are.
Erik: Yeah, so I was actually reading... [chuckle] I was actually, and I remember distinctly, the idea sort of came, they had, when you think about ideas, there’s this what they say is a myth that you have this moment.
But in this case, that actually was true, I just sort of was invaded by this idea while I was reading something that seemed to be totally unrelated. So I was reading, the person, she was a philosopher in the mid-19th century or 20th century, The Banality of Evil. Do you know this phrase? Do you know who I’m talking about?
Karina: No, I don’t, but...
Erik: I can’t think of her name now. So she’s a famous philosopher, she was one of the disciples of Heidegger. And anyway, she went and covered the Nuremberg Trials. And she has a whole philosophy of technology that I was reading as background to write The Myth of Artificial Intelligence. So I was actually reading this person that has nothing to do with influence algorithms.
And it just popped into my head, I felt like, "Shouldn’t we be able to exploit the fact that Wikipedia publishes on influential people and mentions the topics that those influence... That mentions the primary topics where those influential people have influence?" And so that was the original idea.
And the story is kind of like... I mean it’s not like I invented the microchip or something, so let’s be clear that this is not the biggest idea in the world. But it is, it was very exciting how it came about.
I ended up stopping what I was doing. I went home and I stayed up all night looking at Wikipedia pages and scratching out how we could use that to compute an influence score.
And I reached a couple of these points where I gave up and said, "It’s not gonna work. I don’t think the information is available on Wikipedia, we’re gonna have to do something else." And then by 6:00 AM, I had actually scribbled out the whole algorithm, and I texted Bill, who shall remain Bill, the Morpheus Bill person that I was working for. And I texted him at 6:00 in the morning and I said, "I have this algorithm worked out, and I wanna talk to you about it." And so when I had presented the idea to him, we started the company. And that’s what started. And then other people got involved and I became less important. [chuckle]
Karina: Wow! That is...
Erik: Yeah, I went to writing the book, yeah. Go ahead.
Karina: Yeah, and well, we just interviewed Dame Wendy on here, who is sort of an expert in Wikipedia. So it’s so interesting to compare those two.
But could you kind of explain how the algorithm works?
Erik: Yeah, I mean the basic idea of the algorithm is if you take a person page on Wikipedia and you start counting from the very first word of the first sentence, if you start counting until you hit a topic, what they do, that topic, roughly speaking, the number of tokens or words between the beginning of their description and the introduction of that topic will tell you how important that topic is to that person’s bio.
So if they mentioned that the person is a gardener in the first sentence, the person is probably an influential gardener, just intuitively. If they mentioned that the person gardens in their spare time and they’re into astrophysics, and gardening shows up in the references or in the final sort of family life section or something at the end of the article, gardening is probably less important. So that’s the first half of the algorithm, is that the distance between the beginning and the occurrence of a subject matter will give you a rough idea of how important that subject matter is to the influence of the person, relative to that subject.
The second part is to flip it, and this was really the innovation, is to flip it and say, "If you go to that topic and you do the exact same thing and you start counting from the beginning of the description of the topic to the occurrence of the name of the person, that distance measured, combined with the original person page distance measured, in other words, person to topic, and then from that topic to person, a weighted statistical combination of those two will give you a rough idea of how influential Wikipedia, and therefore, via sort of reasonable extension, the world, thinks that person is to that topic."
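(Editor’s Note: The two-directional idea described above can be sketched in a few lines of Python. This is only an illustration, not the Academic Influence implementation: the simple tokenizer, the `1 / (1 + offset)` proximity transform, the equal weights, and the made-up page texts are all assumptions, since the interview does not specify how the two distances are actually combined.)

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_offset(page_text, phrase):
    """Number of tokens before the first occurrence of `phrase`, or None."""
    page, target = tokens(page_text), tokens(phrase)
    if not target:
        return None
    for i in range(len(page) - len(target) + 1):
        if page[i:i + len(target)] == target:
            return i
    return None


def proximity(offset):
    """Map a token distance into (0, 1]; earlier mentions score higher.

    The 1 / (1 + offset) form is an illustrative choice, not from the source.
    """
    return 0.0 if offset is None else 1.0 / (1.0 + offset)


def influence(person, topic, person_page, topic_page, w_person=0.5, w_topic=0.5):
    """Weighted combination of person-to-topic and topic-to-person proximity."""
    person_to_topic = proximity(first_offset(person_page, topic))
    topic_to_person = proximity(first_offset(topic_page, person))
    return w_person * person_to_topic + w_topic * topic_to_person


# Made-up page texts standing in for Wikipedia articles.
bio = ("Ada Example is a gardener and writer. "
       "In her spare time she studies astrophysics.")
gardening_page = ("Gardening is the practice of growing plants. "
                  "Notable gardeners include Ada Example.")
astrophysics_page = "Astrophysics applies physics to astronomical objects."

gardener_score = influence("Ada Example", "gardener", bio, gardening_page)
astro_score = influence("Ada Example", "astrophysics", bio, astrophysics_page)
```

In this toy data, "gardener" appears early in the bio and the gardening page names the person, so `gardener_score` comes out higher than `astro_score`, where the topic appears late and the topic page never mentions her.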
And so that’s the idea that I had when I was reading Hannah Arendt, that’s her name, A-R-E-N-D-T, Hannah Arendt. So that’s the idea that I had, reading this totally off stuff, and I don’t know where it came from, it just popped into my head. The idea was like, "There has to be some way of exploiting the information that’s sitting in Wikipedia to compute an influence measure."
"…I was trying to figure out a way to quantify what we mean by that, as a way to counteract the popularity measures that are very dominant on the web." – Erik Larson
And I had been thinking about influence for a long time as a computer scientist, and I was trying to figure out a way to quantify what we mean by that, as a way to counteract the popularity measures that are very dominant on the web.
So Google, very broadly speaking, uses a popularity measure, which is to say that the more links into the web page, in other words, the more popular it is, the more authority that web page has. And all of the major social media platforms also compute a kind of popularity or sort of insert popularity as the measure of relevance. So the number of likes on Instagram, and so on, is basically a popularity measure that is also supposed to give you an idea of influence.
"…you can have somebody that's an expert in string theory or something, but… They don't have a huge Instagram following." – Erik Larson
And roughly speaking, it does. The problem is, is that you can have extremely influential people who are not, by and large, popular in modern or in broad media terms, right? Like you can have somebody that’s an expert in string theory or something, but they’re not sort of... They don’t have a huge Instagram following.
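(Editor’s Note: To make the contrast concrete, here is a minimal Python sketch of a popularity measure that simply counts inbound links, in the broad sense Erik describes. The link graph and page names are made up for illustration, and real search engines use far more elaborate ranking than a raw count.)

```python
from collections import Counter


def inbound_link_counts(links):
    """Count how many distinct pages link to each target page.

    `links` maps a source page to the pages it links to (hypothetical data).
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


# A toy web: the celebrity attracts more inbound links than the expert.
links = {
    "celebrity_fan_site": ["celebrity"],
    "gossip_blog": ["celebrity", "celebrity_fan_site"],
    "news_portal": ["celebrity", "string_theorist"],
    "physics_dept": ["string_theorist"],
}
counts = inbound_link_counts(links)
```

By raw link count the celebrity outranks the string theorist here, even though the latter may be the more influential figure within their field.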
Erik: And so you wanna capture... There’s bubbles of really brilliant, influential people in the world that aren’t surfacing on our major platforms, and my interest was in making sure that we could find them and we could develop software to find them. So that’s what happened. [chuckle]
Karina: Wow! That is really incredible! And to see what the whole algorithm has become, do you know if anybody else is trying to do the same kind of thing, ranking people’s influence, or even using Wikipedia to see how influential people are?
Erik: I don’t, there are many companies that are using, that have influence platforms, but they’re... And I’m not so much into the marketing of this anymore, so I’m not the best resource for this kind of thing.
But there are lots of companies that attempt to compute influence by basically getting a cross-section of that person’s footprint on the web. But in terms of being able to find authority or influence that’s measured in terms of professional expertise, I don’t think that we have very good measures of doing that, and I don’t think that the focus, the mainstream focus for other companies to do influence measurements, I think it’s primarily social media-based, still. That was certainly the case when I was doing this a couple of... Three years ago, and I don’t think it’s changed much.
So yeah, so it’s pretty unique. In fact, I think we, at one point, submitted a provisional patent, and I’m not sure what happened to that, but I think that that’s a subject of another discussion, but I think it’s potentially patentable material, so yeah.
Karina: Wow! And you’re probably pretty biased on this, but how good do you think the algorithm is at actually figuring out how influential people are?
Erik: Well, I stepped off the boat after we did the original implementation. There was a guy that I used to work with that I had recruited into the company that then left. And then we had actually DARPA funding in the initial stage. So we did a kind of an alpha version. And at that time, the results were pretty impressive.
So in general, this is... I think the product that you’re referring to has evolved or progressed since then. And I know the lead developer on that, and he’s very, very competent, very, very smart guy. And I’m sure he’s done wonderful stuff that I hadn’t thought of. [chuckle]
But even the alpha version, when I was involved, the first, which is the prototype version, the first thing, that actually would... It would reliably, if you put in a name, you would reliably get a sort of list of what that person was influential on, for a very, very large set of names, because you’re dealing with the entire data set of Wikipedia, which is huge.
Karina: Right. [chuckle]
"Wow! It's magic! And we're not even using Google, it's not even a Google search." – Erik Larson
Erik: And then conversely, if you put in a topic, you get a list of names that roughly, intuitively, would track people that are actively influential on those topics. So it was easy, even in the very beginning, like you would type in "molecular chemistry" and there would just, there’d just be this list of people that were just like really good molecular chemists. It’s like, "Where did they come from?" [chuckle] Like, "Wow! It’s magic! And we’re not even using Google, it’s not even a Google search. How did we do this?" Well, we exploited Wikipedia, and so... "Exploited" is a CS term meaning, "in a good way". It’s not like exploiting cheap labor, it’s, or offshore workers. It’s leveraging. We leveraged, so yeah.
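(Editor’s Note: The two kinds of lookup Erik describes, name to topics and topic to names, can be sketched as simple queries over a table of scores. The names, topics, and scores below are invented for illustration, and the function names are hypothetical; this is not the actual Academic Influence interface.)

```python
def top_topics(scores, person, k=3):
    """Topics on which `person` scores highest, best first."""
    matches = [(t, s) for (p, t), s in scores.items() if p == person]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in matches[:k]]


def top_people(scores, topic, k=3):
    """People who score highest on `topic`, best first."""
    matches = [(p, s) for (p, t), s in scores.items() if t == topic]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [p for p, _ in matches[:k]]


# Hypothetical (person, topic) -> influence score table.
scores = {
    ("Person A", "molecular chemistry"): 0.9,
    ("Person B", "molecular chemistry"): 0.4,
    ("Person A", "gardening"): 0.1,
    ("Person C", "gardening"): 0.7,
}

chemists = top_people(scores, "molecular chemistry")
person_a_topics = top_topics(scores, "Person A")
```

With this toy table, a query for "molecular chemistry" lists Person A ahead of Person B, and a query for Person A lists molecular chemistry ahead of gardening.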
Karina: Wow! Well, thank you so much for taking the time to talk with me. We interview all these people, but it’s really great to see how we get that list, how we figure out who’s influential in what field, so thank you so much.