Illustration by Gayathri Satheesh.

AI Research has a Transparency Problem

AI research is held back by practices that distract it from its ultimate goal: the betterment of human life. To achieve the ambitious goals of AI, researchers must be allowed to return to their transparent practices, independent of tech giants.

Two years ago, I read my first deep learning paper. The paper was complicated and full of terminology I did not understand, but I comprehended the core premise: there was a model that could generate coherent, human-like sentences from a few simple words. I tested it out and was amazed by its performance. A few basic prompts gave me grammatically coherent and responsive suggestions. When I typed in “The best football player in the world is …”, the model answered Lionel Messi and continued with a glowing description of his excellent performance last season. I was sold. There was no question about it: this model was legit.
The paper introduced a model called GPT-2. GPT-2 was released in 2019 and would pave the way for “large language models” (LLMs), a specific form of deep-learning-based model that has taken over the field of AI within the last four years. LLMs, and deep learning models in general, are probabilistic black boxes, the specific details of their functioning as unfamiliar to their designers as to their users. Models are evaluated on their end performance, not on how they are constructed. Usually, a single metric (something that tells us how “accurate” a model is) is used to rank models. So, the way to construct the best possible model for a job is to use your prior intuition, combine things that have worked well together before and see what leads to the best-performing model.
“State-of-the-art research” is often reduced to throwing things at the wall and seeing what sticks. It helps if you have more data and more resources: the larger the model, the better it performs. A cursory look at the state of academia in LLMs shows that the best-performing models, and subsequently the most impactful publications, come from large corporations like Google, Microsoft and Facebook. Independent researchers, especially those from smaller universities and institutions, cannot compete with the massive resources these tech giants can pour into a single research project. GPT-3, the successor to GPT-2 and one of the largest and most impressive LLMs trained to date, was trained by OpenAI, an AI research lab with investment from Microsoft, and cost 20 million dollars to train.
Of course, there is nothing inherently wrong with intuition-based research. The scientific method begins with a well-informed intuition, and research has always been resource intensive. However, the problem in deep learning lies in the institutionalization of speculative claims. Because performance is prioritized over understanding the model, papers that report a significant boost in performance are often published with wonky and speculative claims, based simply on intuition. If the authors of highly impactful papers make a claim, it is very likely to be propagated into subsequent papers as fact.
The prioritization of performance metrics over analyzing the appropriate use of the model has already led to a significant problem. Models are trained on data, and their outputs can only be as good as that data. Often, datasets are biased, and the models subsequently amplify and perpetuate these biases. GPT-3, the most impressive LLM, was trained on a vast body of text scraped from the internet and has been shown to have a significant problem with amplifying negative stereotypes. Given the close ties between deep learning academia and the tech industry, it is also disturbing to consider how quickly a new model can be deployed into a service and reach millions of people.
There is a larger problem when the research culture is set by large corporations. Private corporations often have interests that conflict with independent research. OpenAI, the creator of GPT, is an excellent example of how conflicting motives can perpetuate problematic practices. OpenAI was started in 2015 as a non-profit research institute, with the ambitious and admirable goal of making a fairer and better world through AI. However, like most non-profits, it quickly ran out of resources and, to attract investment, abandoned its initial non-profit status. With the status change came a 1 billion dollar investment from Microsoft. When OpenAI released GPT-3, its most notable work, it had an exclusive deal with Microsoft. Instead of releasing the model to the public, as is tradition in most deep learning research, OpenAI opted for closed beta access through an API. You had to apply for access, and even then, you would not get access to the entire model.
OpenAI justified this explicit lack of transparency by touting the model’s ability to generate human-like text and citing its concern that the model would be used for misinformation, scams and legal abuse. However, if the model were so dangerous, why release it through an API at all? A year after its release, essay-writing websites have already claimed to have integrated the GPT-3 API into their systems, and email generators powered by GPT-3 could easily be used for spam and phishing. Independent researchers, who would benefit from analyzing and building upon the model, are left without an invaluable resource, and those researching its ethical implications have to query the API inefficiently.
Google AI, the research division of Google, has also been embroiled in controversy. Timnit Gebru, a researcher at Google AI, wrote a paper criticizing the environmental implications and the pitfalls of LLMs. Although the paper had passed internal and peer review, a group of product leaders at Google demanded its retraction, citing vague objections regarding its quality. Regardless of attempts to address the objections, Google insisted on the retraction, which eventually led to Gebru’s public outcry and her subsequent departure from Google for “behavior inconsistent with the expectations of a Google manager.” While one can argue about the specific cause of Gebru’s dismissal, the fact that Google can interfere with an independent criticism of its own models does not bode well for research that delves into the ethics of LLMs.
In the late 18th century, German states decided to use the newly formed science of fiscal forestry to maximize the revenue from their forests. They would replace the wild, unruly trees in the natural forest with trees having the largest yield of timber. Their failure to consider the rich, lively ecosystem necessary for the proper yield ultimately led to the collapse of the entire forestry industry. The history of science is filled with similar myopic decisions, where under the guise of rationality, complex systems are stripped of their inherent subtleties when modeled using underdeveloped sciences. LLMs aim to model language, possibly the most complex system devised by humanity. It is essential not to let their development fall solely to corporations whose main interest is to master legible control for financial gain. While corporations are going to move vertically, creating larger and more complex models, I think it is essential for independent researchers to move laterally, and focus not only on questioning specific LLMs, but also on theoretically analyzing how the specific components that make up an LLM work.
Prajjwal Bhattarai is a contributing writer. Email him at