Ethics and Legality of Generative AI in Journalism
Lucas Eaton, Howard Community College
Mentored by: Bethany Pautrat, M.A.
Abstract
Journalism has persisted through ages of technological advancement, becoming ever more accessible and efficient over the centuries. News journalism has a responsibility to deliver accurate, up-to-date, impartial information to its consumers. With the advent of generative Artificial Intelligence (generative AI) tools that can automatically create text, audio, images, video, and other forms of media based on a written prompt, journalistic institutions have been forced to adapt. This paper analyzes the breadth of uses for generative AI in news journalism, the positions of industry groups on generative AI, expert opinions on the copyright status and use of generative AI, and the ethical codes of news companies, with the goal of determining under which circumstances generative AI tools are legally and ethically acceptable for use in these companies’ workflows.
Introduction
Generative AI tools are machine learning programs that can create and modify various forms of media, including text, images, video, and audio [1], automatically, based on a written “prompt” describing what the user wants the program to output [2]. These outputs are shaped by the data, such as text or images, that the program is “trained” on, allowing it to produce human-like responses that mimic patterns in its training data [1]. Generative AI systems are already used in some fashion across the journalism industry, for everything from writing stories to summarizing research [1] to translation. The Associated Press, one of the largest institutions in journalism, has been using AI tools for “automatically generating” stories about corporate earnings since 2014 [3], and in 2023 made an agreement with OpenAI, the creator of the popular generative AI tool ChatGPT, to essentially trade a license to part of the AP’s enormous text archive for OpenAI’s “technology and product expertise” [4]. It is apparent at this point that, barring unforeseen circumstances, generative AI has received industry approval and is here to stay.
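To make the prompt-driven workflow described above concrete, the following minimal sketch (a hypothetical illustration, not AP’s or OpenAI’s actual system) shows structured earnings data being turned into a written prompt, handed to a placeholder generation function, and treated as an unvetted draft requiring human review.

```python
# Hypothetical sketch of a prompt-based generation workflow; the data,
# function names, and stubbed model call are illustrative assumptions.

earnings = {"company": "Example Corp", "quarter": "Q3",
            "revenue_usd_m": 412.5, "eps": 1.07}

def build_prompt(data: dict) -> str:
    """Compose a plain-language prompt from structured data."""
    return (
        f"Write a three-sentence earnings brief: {data['company']} reported "
        f"{data['quarter']} revenue of ${data['revenue_usd_m']} million and "
        f"earnings per share of ${data['eps']}."
    )

def generate_draft(prompt: str) -> str:
    """Stand-in for a call to any generative AI text model."""
    return f"[model output for prompt: {prompt!r}]"

draft = generate_draft(build_prompt(earnings))
print("UNVETTED DRAFT (requires human review):")
print(draft)
```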
However, despite their increasingly common use in researching stories, compiling data, writing, and other parts of the journalism workflow, the implementation of these tools in the industry remains controversial. Their unrestricted use faces opposition from journalism-related industry groups such as Reporters Without Borders (RSF) and the National Writers Union (NWU), experts in the AI and journalism fields, and various news companies.
When it comes to generative AI, RSF is mainly focused on ensuring that AI tools are truthful, accountable, and safe [5], whereas the NWU is more focused on protecting the livelihoods and rights of the writers it represents [6]. Experts are concerned about the ethics of feeding people news that AI has had a hand in creating, the copyright status of AI-generated works, and whether these machines can be trusted to provide factual, reliable information. The New York Times, The Washington Post, USA TODAY, CNN, the Associated Press, and various other companies all have standards restricting their own use of generative AI tools in some way [7, 8, 9, 10, 11].
However, none of these organizations rules out its use entirely, with the AP proudly displaying its history of using AI tools in its journalism on a dedicated web page [3]. Despite all these concerns, the industry seems intent on embracing, or at least accounting for, this generative AI future.
This paper analyzes, compares, and contrasts the opinions and actions of these news companies and industry organizations regarding the use and effects of generative AI in the journalism industry. It also examines expert interpretations of copyright law and of the ethics of journalists using generative AI systems in order to reach a conclusion on the ethics and legality of generative AI for journalistic purposes. Finally, it offers an example of a theoretical generative AI model that would satisfy the requirements of all of these organizations.
Methodology
This paper employs a review of prior research on the ethical considerations of using generative AI tools in journalism, the copyright status of generative AI outputs, and the legal and ethical status of AI training data. In addition, it draws on articles from the National Writers Union (NWU) and Reporters Without Borders (RSF) on the topic of generative AI. Finally, it reviews the ethical guidelines that individual news organizations have set for their own use of generative AI. Taking the opinions and actions of all of these parties into consideration, it seeks a conclusion on which uses of AI in journalism are acceptable and which are not.
Ethical Overview from the Perspective of Experts
A computer cannot be held accountable. It is generally agreed that the unrestricted, blind use of generative AI tools in journalism is unwise. Fears of misinformation, copyright issues, the ethical sourcing of AI training data, and more permeate the writings of scholars in the field. However, many also agree that AI tools can be used ethically in one form or another in news journalism, even if the industry needs to amend its values of accuracy, impartiality, and accountability [5], with one paper saying that the use of generative AI for journalism “…challenges traditional conceptions of journalistic practice and necessitates the development of new ethical standards [1].” Despite these concerns, the industry has begun to use these tools in numerous ways.
A large area of concern is the possibility of AI-created misinformation being published by news sources that people consider dependable. Shi and Sun [1] claim that generative AI is a threat to the credibility, accuracy, and quality of journalism, and others agree. Noain-Sánchez [12], who interviewed a collection of experts in the field, notes that those experts “…place especial emphasis on the role of the human journalist as an irreplaceable agent and as the professional that must supervise AI outputs…,” viewing AI tools as fallible and considering it necessary to fact-check their outputs to prevent misinformation, which AI tools will often present with complete confidence. These experts consider AI tools imperfect, but believe that their upsides, such as efficiently processing large amounts of data, outweigh their downsides. Because of the possibility of misinformation, they treat them as just another tool to augment human journalists rather than a true replacement. They also generally agree that writers and editors must keep an eye out for incorrect information, even when an AI tool is used to compile and sift through data rather than to write independently.
Tomlinson, Patterson, and Torrance [13], writing for the San Diego Law Review, raise fears of purposeful misinformation. They consider the possibility of the training data an AI bases its outputs on being deliberately skewed to push an agenda or produce misinformation, claiming that “Any biases, inaccuracies, or malicious alterations that are present in the initial dataset, or introduced into the training set… have the potential to compromise the integrity and trustworthiness of the AI model.” A biased training set, whether created accidentally or on purpose, could lead to news articles that omit certain information. This could be difficult to catch with conventional fact checking, as the result may not be technically wrong, just missing part of the full story. The authors also consider that biased training sets could cause
AI-generated news to disproportionately focus on topics that paint certain groups in a bad light, saying that “An AI-generated news summary might disproportionately focus on crime stories involving specific racial or ethnic groups if the training data overrepresent such stories, perpetuating harmful stereotypes and misconceptions [13].” AP is already piloting the use of AI to generate summaries of its articles [3], so AI-generated summaries of the news in general may be commonplace soon. This is an issue that would need to be solved by AI companies themselves, and journalists and their organizations would need to choose their AI tools carefully to avoid it. Several AI companies, including OpenAI and Google, claim to make attempts to do so. OpenAI claims that they “…teach our AI and implement filters to help prevent it from generating biased and harmful outputs [14].” Google says that they are “Employing rigorous design, testing, monitoring, and safeguards to mitigate unintended or harmful outcomes and avoid unfair bias [15].” In theory, the fears of these authors should therefore not be an issue with these specific companies’ AI models, though OpenAI and Google are careful not to rule out the possibility, saying only that they attempt to avoid it. This reinforces Shi and Sun’s point that writers, editors, and readers all need to focus on fact checking in the age of AI journalism, as it is difficult to truly avoid all misinformation when AI is involved [1].
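The over-representation problem described above can be made concrete with a simple audit of label shares in a training set. The following minimal sketch is an illustration of that idea under assumed data, labels, and a threshold; it is not drawn from the cited authors or from any AI company’s practice.

```python
# Hypothetical sketch of auditing a training set for over-representation;
# the records, label names, and 50% threshold are illustrative assumptions.
from collections import Counter

training_articles = [
    {"topic": "crime", "group": "group_a"},
    {"topic": "crime", "group": "group_a"},
    {"topic": "business", "group": "group_b"},
    {"topic": "crime", "group": "group_a"},
    {"topic": "sports", "group": "group_b"},
]

def overrepresented(records, key, threshold=0.5):
    """Return labels whose share of the dataset exceeds the threshold."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items() if n / total > threshold}

print(overrepresented(training_articles, "group"))  # {'group_a': 0.6} -> possible skew
```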
Copyright Overview from the Perspective of Experts
Another area that experts have focused on is the legal status of AI-generated works and of the AI models themselves. Current US copyright law does not consider AI-generated news copyrightable material, and there is a push to change that; if that initiative succeeds, it could have far-reaching consequences in the journalism space. In the United States, copyright is not granted to any work not created by a human [16]. This means that, presently, it is difficult in the United States to copyright a work generated by an AI model. This rejection can be seen with two different AI-generated artworks, one with an AI listed as the sole author and one with an AI listed as a co-author, both denied copyright for the reason above. Kuai [16] explains the reasoning: “…in protecting human creative work as a practice, all copyright laws uphold the anthropocentric value of insisting on originality and creativity. This could potentially put automated news in a legal quagmire since the fact-based nature of news and the journalistic pursuit of factualness may come at odds with being ‘original’ or ‘creative’.” Essentially, Kuai is saying that under this idea of copyright as protection for human work, the output of, for example, an AI chatbot could only be copyrighted if it were recognized as the work of either the person who typed in the prompt that led to the output or some other human party.
In “Blurring the lines: how AI is redefining artistic ownership and copyright,” authors Watiktinnakorn, Seesai, and Kerdvibulvech [17] state, based on a survey of legal professionals, creative professionals, and others, that “humans and machines are now collaborators in the creative process. As a result, both parties involved rightfully deserve recognition and protection under copyright laws.” If this concept becomes the new standard of copyright, it could have a major impact on the use of AI in journalism. News companies may be less likely to rely heavily on AI tools if they are unable to copyright the resulting content. This idea is not without opposition. The authors found that “Among the eight individuals engaged in art production, a unanimous sentiment emerged, stressing the necessity and critical importance of enacting legal regulations to govern AI. They emphasized that the ramifications of AI directly impact their professions [17].” These ideas, though professed by artists, can be assumed to carry over to many other creative professions, including journalism. If these creators’ views are reflected in law, their fears may lead to the continuation of the status quo in copyrighting AI-generated works. On the other hand, the legal professionals surveyed believe that there should be exceptions in copyright law to favor the use of these technologies [17]. If such exceptions are adopted, they could lead to an explosion of AI-generated news, though if news companies follow the prior advice of experts and keep a close eye on the outputs of their AI tools, they may be unable to increase throughput as much as would otherwise be possible. Even with these exceptions, however, it is uncertain whether AI-generated news would be a copyrightable work at all, given its pursuit of factualness over being original and creative [16]. This would make the entire copyright conversation irrelevant for the news media unless AI-generated news is deemed a creative work in the same way AI-generated artwork would be.
The beliefs of the legal experts interviewed by Watiktinnakorn, Seesai, and Kerdvibulvech on the copyrightability of AI works are not without theoretical backing. O’Callaghan, writing for the Cornell International Law Journal [18], claims that the United States takes a philosophically utilitarian approach to copyright, granting it based on what would maximize the economic incentive to create. O’Callaghan [18] follows this by arguing that AI systems, despite not needing economic incentives to create works, do need economic incentives to be created in the first place by AI companies. Allowing AI-generated works to be copyrighted would therefore have a trickle-down effect to the creators of the AI model, as users of the AI could justify spending money on it because they would be able to copyright its outputs. This would fulfill the idea of economic protection as a reason for copyright. It is of note that this argument assumes generative AI companies should be economically incentivized to create their models in the first place. The author also includes arguments against the need for such economic advantages, acknowledging that some believe AI companies do not need the extra help because they already enjoy enough of them; one example given is that the AI systems themselves are already protected under intellectual property law [18]. If generative AI outputs are considered copyrightable, AI companies would gain various advantages immediately. Less human input is needed for their outputs compared to traditional writing and research, creating a competitive advantage for AI tools over human journalism, where multiple people can be on the payroll to write one article. In this case the AI tools would be competing with human staff for “jobs,” and AI companies would have a market among companies looking to decrease their labor costs. This would have enormous consequences for journalists.
The final realm of copyright for AI is the use of copyrighted works in training sets, the collections of data, such as websites, books, articles, images, and other material, on which an AI bases its outputs. Spica [19] argues that, based on legal precedent and how generative AI functions, the use of copyrighted works for training generative AI falls under the fair use doctrine. She does, however, believe that “…creators still have a valid copyright infringement claim against individuals claiming authorship over AI-generated output that exhibits a substantial similarity to the creator’s copyrighted work.” In short, her conclusion is that what the AI is trained on does not matter as long as its output does not “plagiarize” the works on which it was trained; only the output matters, not the source. On the one hand, this means that journalists would not be able to get compensation for works of theirs that AI companies train on. On the other hand, they could reasonably assume copyright over anything they create using generative AI tools, provided it is not too similar to a preexisting document, making generative AI a more viable tool for serious work. This also means there is even more incentive to proofread AI outputs, since editors would need to ensure that the outputs are not too close to existing works in addition to fact checking them.
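As a rough illustration of how a newsroom might pre-screen for the kind of substantial similarity Spica describes, the following sketch compares word n-grams between an AI draft and a known source text. The overlap metric and flagging threshold are hypothetical editorial heuristics, not a legal test, and they are not drawn from Spica’s analysis.

```python
# Hypothetical editorial pre-screen: flag AI drafts whose word n-grams
# overlap heavily with a known source text. The 0.2 threshold is an
# arbitrary illustration, not a legal standard for substantial similarity.

def ngrams(text: str, n: int = 5) -> set:
    """Return the set of word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(draft: str, source: str, n: int = 5) -> float:
    """Fraction of the draft's n-grams that also appear in the source."""
    draft_grams = ngrams(draft, n)
    return len(draft_grams & ngrams(source, n)) / len(draft_grams) if draft_grams else 0.0

draft = "the city council approved the new budget after a lengthy public debate"
source = "the city council approved the new budget after months of public debate"

if overlap_ratio(draft, source) > 0.2:
    print("Flag for editor: draft closely tracks an existing work.")
```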
Positions of Industry Groups
Various groups relevant to journalism have taken stances on the use of AI in the field. One of these groups is Reporters Without Borders (RSF), an international non-profit with the goal of promoting freedom of information [20]. They have published, in collaboration with other organizations, the Paris Charter on AI and Journalism. A second group is the National Writers Union (NWU), which represents and advocates for journalists, authors, and other writers in the United States. They have published their Platform and Principles for Policy on Generative AI on their website, taking a position in defense of the livelihoods of the writers they represent [6]. Both documents make a statement about the use of generative AI and provide guidance on its use while furthering the respective organizations’ goals.
Reporters Without Borders intends the charter to be a guide for journalists and companies worldwide when using AI in journalism. They make it clear with the first principle that AI cannot change the core values of journalistic ethics. They quickly corroborate the necessity of accuracy stated by Shi and Sun [1] and others, along with the necessity of unbiased outputs raised by Tomlinson, Patterson, and Torrance [13], stating that the use of generative AI cannot mean the rolling back of ethical standards of accuracy, non-discrimination, and impartiality [5]. In addition to this, they make it clear to the reader that “media outlets are always responsible for the content they publish” and that the media needs to be transparent about where and how AI was used [5]. When it comes to the AI systems themselves, they believe that AI systems employed by the media should be evaluated by the media company and by third parties to ensure that they adhere to standards of journalistic ethics, and that they need to “respect privacy, intellectual property and data protection laws [5].” Another important part of this charter is RSF’s belief that people and organizations involved in journalism should be “included in any global or international institutional oversight of AI governance and regulation [5].” As journalists’ creations have some of the largest reach and influence of any work likely to involve AI tools, and as journalists have a major stake in how those tools collect and output information due to copyright and ethical issues, it makes sense to include their perspectives in these conversations. The final important part of this charter is one that is not popular among AI companies. The charter states that “AI system owners must credit sources, respect intellectual property rights, and provide just compensation to rights holders. This compensation must be passed on to journalists through fair remuneration. AI system owners are also required to maintain a transparent and detailed record of the journalistic content utilized to train and feed their systems [5].” Generally, AI companies treat all information on the internet as if it is freely available and theirs to use as training data [6]. RSF does not accept this view. They want the people who own the content AI is trained on to be fairly compensated for their work, and they want AI companies to be transparent about their use of specific “journalistic content” for training [5]. Overall, they want a better deal for journalists in the age of generative AI while keeping journalists accountable for what they create with those tools. To summarize, they want to allow the use of AI if and only if it is used ethically by their standards.
The National Writers Union (NWU) is an organization that seeks to protect its members before all else. The NWU makes both ethical and legal statements in support of their generally negative stance on AI within this document [6]. They view AI as a threat to their members’ livelihoods and roles in their field, with one of the first passages stating, “As generative AI technologies expand to displace human creators of almost every type of copyrighted work, we must remember not to leave any creative worker behind.” The NWU [6] views generative AI as a force that must be mitigated and agrees with many of the aforementioned authors that generative AI cannot be used without a human element. With this organization’s opinions added to the pile, a consensus begins to form on the use of generative AI in the creation of news reporting. The NWU shares the fear of biased training data raised by Tomlinson, Patterson, and Torrance [13], saying that “generative AI reproduces, and sometimes enhances, pre-existing social inequities and biases such as racism, sexism, homophobia, transphobia and more [6].” They do not provide any solutions for preventing these issues when using generative AI tools, only an acknowledgement that they exist; RSF’s [5] advice on evaluating AI systems before using them would help prevent them.
Many of the authors whose ideas are considered in this paper addressed misinformation and copyright only on the output side of the generative AI equation. The NWU, in its goal to protect writers, also considers the sourcing of the training data for these AI models. They claim that “Generative AI works because it ‘ingests’ voluminous amounts of human-made creativity – the work of millions of human lives – which should be protected from exploitation and erosion. On this issue, even more than in other copyright debates, our humanity matters [6].” This is a call to action from a moral stance rather than a legal one, introducing the idea that AI companies should not have free rein to use whatever they want for training AI models. This sentiment is echoed by RSF [5] and others, and is clarified in a later sentence: “Right now, generative AI companies are benefiting handsomely from algorithms they’ve trained on millions of pieces of our work that they haven’t paid a cent for, even though without the work of creators as input, these systems would not work at all [6],” which introduces the core of their argument. They believe that the use of their creations as training data without consent is tantamount to theft, and they want to be compensated both for the ingestion and for the continuous usage of the content [6]. This is at odds with the conclusion of Spica [19] that the use of copyrighted material for training AI falls under the fair use doctrine. The NWU specifically petitions the government to declare, through law or statute, that using copyrighted content for AI training is not fair use. This is a logical position for the NWU: even if others have concluded that it is fair use, it is the NWU’s job to advocate for its members.
When it comes to writers’ own use of AI, the NWU [6] also agrees with RSF [5] that AI-generated content should have a transparent dataset, but goes a step further, saying that the AI’s output should provide attribution to the training data used as “sources” for that output. In addition, they agree with RSF and other experts that AI-generated content must be labeled as such. Once again, a consensus begins to form in the industry: generative AI outputs used in journalism must be fact-checked, unbiased, and labeled as AI-generated content so as to ensure they do not mislead the reader, whether intentionally or not, about the nature of the content itself or the information within it.
One thing that industry experts, groups representing journalists, and scholars can all agree on is that the AI cat is out of the bag, and there is no putting it back. The NWU states that, “Where we cannot protect jobs from displacement by AI, we must ensure that we’re providing pathways to safe, just, and accessible economic opportunities [6].” They, just like all other parties, have concluded that AI is here to stay and that they must adapt to the best of their ability. They now need to race to get the best possible outcome for themselves, as trying to avoid AI altogether would result in a greater loss than embracing it where they can and fighting it where they feel they must.
NYT vs. OpenAI
In December 2023, the New York Times Company filed a lawsuit against OpenAI, the creator of ChatGPT, along with Microsoft, one of OpenAI’s largest investors. The suit alleges that OpenAI illegally used the Times’ copyrighted works when it trained its AI [21]. The case has yet to be decided as of January 2025. However, it shows that the NWU [6] is not alone in its view that non-consensual AI training is theft. In addition, the Times’ claims are in certain ways supported by the ideas of O’Callaghan [18] and Spica [19]. The NYT claims that because these AI tools can reproduce content very similar to the Times’ articles, these generative AI products are causing financial harm to the Times by providing access to its reporting without paying for the content [21]. As reported by NPR [22], an OpenAI lawyer stated that the infringement the Times refers to only happened after “‘thousands of tens of thousands’ of queries. In essence, [the OpenAI lawyer] argued that the publishers primed the chatbot to spit out text that was lifted from the publishers’ websites [22].” If true, this would mean that the Times’ claims of too-similar reproduction are less applicable in the real world than the Times suggests, which could reduce the credibility of its arguments.
However, if the publishers can do it, then in theory anybody can, effectively circumventing the New York Times’ paywall. If such reproduction is possible and repeatable, it is a real issue, and an infringement of copyright even if the use of the training data itself is fair use as Spica [19] asserts; this would essentially mean the New York Times Company wins unless OpenAI finds a way to prevent it entirely, or at least enough that it becomes inconsequential. O’Callaghan [18] argues that AI companies need a financial incentive to make generative AI products, and claims that making it possible to copyright AI-generated material would provide that incentive through increased commercial activity. The NYT turns this on its head, claiming that under the utilitarian United States copyright philosophy, OpenAI’s use of its copyrighted works damages its business by reducing its financial incentive to produce content [21]. In theory, OpenAI’s ability to use the Times’ copyrighted works without payment would then constitute a loss to the Times’ “economic value of creativity and innovation [18],” and could provide a reason to create an exception to the fair use doctrine in the case of AI training, as requested by the NWU [6], since it could shift the playing field between OpenAI and the New York Times in a way that rewards the more original, creative endeavors of the NYT.
Present Uses of Generative AI in Journalism
AI has been used in the news industry for over a decade, with the first widespread use for published pieces being the aforementioned case of the Associated Press automating the creation of financial reports. The organization has since extended its use of AI into other applications, including, as of December 9, 2024, automating sports writing, transcribing videos, and timestamping videos. Additionally, AP is working on using AI both to follow trends on social media and for image recognition [3]. Though AP is the company that advertises its use of AI most heavily, other major publications are not ignoring the possibilities AI brings. The New York Times [7], The Washington Post [8], USA TODAY [11], CNN [10], and FOX [23, 24] are all either known to use generative AI or (with the exception of FOX) have considered the possibility enough to publish standards surrounding its use in their newsrooms. Due to its self-admitted heavy use of AI compared to what other media outlets advertise, AP will be the company against which other companies’ AI use and policies are benchmarked in this paper, along with the industry groups.
The Associated Press [3], in addition to its page explaining how it uses AI tools in the newsroom, has two pages on AI ethics. Near the top of their code of ethics, they state that “the central role of the AP journalist – gathering, evaluating and ordering facts into news stories, video, photography and audio for our members and customers – will not change. We do not see AI as a replacement of journalists in any way [3],” and specify that despite their use of AI, they are still committed to their previous ethical standards and have not lowered them to promote the use of AI. They have a strict stance on publishing AI-generated content, saying that “Any output from a generative AI tool should be treated as unvetted source material [3].” As AI tools do not necessarily “know” where their outputs come from, this is an understandable policy, especially considering the fears of misinformation presented by various scholars in the field [1, 13]. AP also treats AI-generated imagery in line with the guidelines of industry experts, including RSF [5], stating that they will not use “AI-generated images that are suspected or proven to be false depictions of reality [3],” making an exception only when the image itself is the topic of the news story, in which case it will be labeled as AI generated. This policy is agreed upon by the industry at large and follows the ideas behind principle seven of the Paris Charter on AI and Journalism [5]. Overall, the only part of RSF’s guidelines on the journalism itself that is not reflected in AP’s guidelines is principle three, which requires that “AI systems used in journalism undergo prior, independent evaluation [5].” It is expected that the Associated Press, one of the largest news agencies in the world, would want to follow these guidelines as closely as possible to preserve its reputation in the age of AI, while not missing the opportunity to strengthen and optimize its own operations.
Despite its lawsuit against OpenAI, The New York Times Company [7] states that “Machine learning already helps us report stories we couldn’t otherwise, and generative A.I. has the potential to bolster our journalistic capabilities even more.” This suggests that the NYT is currently using AI in its reporting in some fashion, though it does not elaborate on which stories AI is helping it report that it otherwise could not. However, the Times does establish what it believes are the benefits to its audience, saying that “The Times will become more accessible to more people through features like digitally voiced articles, translations into other languages and uses of generative A.I. we have yet to discover [7].” None of these use cases is out of the ordinary, and they are possibly less prone to misinformation than writing articles or doing research with generative AI. When it comes to the standards they hold themselves to, the Times claims that they must only use AI “With human guidance and review [7],” stating that “Generative A.I. can sometimes help with parts of our process, but the work should always be managed by and accountable to journalists [7].” This concept of strict human oversight is corroborated by the opinions of RSF, along with Shi and Sun [1]. Overall, the New York Times follows the same track as the Associated Press, just without advertising what it specifically uses AI for, such as AP’s writing of headlines. Just like AP, they want the opportunity to use AI in their operations while maintaining their reputation and credibility.
CNN’s [10] guidelines are fairly similar to those of AP and the NYT, stating the familiar commitments to transparency and accuracy. They say that they will “clearly signify to our users and audiences when they are seeing, hearing or reading AI content [10],” matching RSF’s guidelines on disclosing the use of AI [5]. When it comes to accuracy, they state both that they are committed to accuracy in their journalism and that they have human oversight and guardrails for AI. Though they are not as specific as other organizations about what kind of oversight and guardrails they have, they do state that their “…employees have oversight and responsibility for the AI systems and tools used, and the content they produce [10].” This, in theory, would put them in line with expert recommendations for fact checking. They do not state specific processes, but CNN [10] does say that it holds its employees accountable for AI-generated content, which, if enforced, would presumably have a similar effect to the more specific guidelines of other companies. CNN does go one step further than many of its peers and RSF [5] when discussing AI, however, saying that they are “committed to fair representation and diversity [10].” This, depending on the interpretation, could help them counter fears that generative AI tools could produce information biased against certain groups due to flawed training data [13]. If their writers and editors keep an eye out for a lack of “…fair representation and respect for diversity…[10]” when “…utilizing innovative technologies such as AI tools and services [10],” they could catch more obscure issues of bias than they would by checking responses for outright misinformation alone. Overall, CNN’s policies around the use of generative AI are largely in line with the recommendations of RSF [5], though, as with the others, they do not specifically state that they follow the third principle of the Charter and will only use AI tools that have been independently evaluated. However, they do state that “We undertake rigorous due diligence to evaluate our internally built tools and potential partners to ensure they adhere to both CNN and Warner Bros. Discovery’s AI principles [10],” suggesting that they do perform their own investigations into the AI tools they use. Though not exactly what the Charter recommends, this is a step above using tools of unknown credibility.
The Washington Post [8] has published policies fairly similar to those of the above outlets. They say they will be transparent about AI, will not attempt to pass off realistic AI-generated images, video, or visual works without disclosing their use, and will verify any information sourced from AI tools [8]. The specific wording, “Employ AI to generate images, video or visual works that purport to represent reality [8],” leaves open the possibility of using AI to generate imagery that does not purport to represent reality without labeling it as AI generated.
USA TODAY [11] has guidelines for the use of AI tools akin to those of other news outlets, though their published policies appear intended for use by their own employees in addition to being read by third parties, and their wording is more specific than that of some peer organizations. They have similar ideas about fact checking AI-generated content and not treating it as a source. However, they go a step further than certain other outlets and say that they do “…not use AI-generated photo-realistic images in our news coverage [11].” Some news outlets, such as AP [3], have not gone as far and allow such images with disclosure under specific circumstances. In addition, similar to CNN [10], USA TODAY attempts to be representative and diverse in its reporting when using AI, and specifies that AI-generated content in particular must not “discriminate against any individual or group based on race, ethnicity, religion, gender, sexual orientation, or any other characteristic [11].” This is a much more specific stance than the one taken by CNN, which simply states, “We are committed to fair representation and diversity [10],” and it puts USA TODAY, based on published policy, at the top of the list of outlets that may share Tomlinson, Patterson, and Torrance’s fears of biased data sets [13]. If USA TODAY is specifically looking out for biases in AI-generated content in addition to misinformation, it would be the most resistant of all of these companies to training set manipulation, hindering the possibility of biased results for or against one group or another showing up in articles.
Several trends emerge across these outlets’ policies when compared to the recommendations of RSF [5]. All of the AI use guidelines require at some level that AI-generated content be checked for factuality, matching the recommendations of principles two and four of the Paris Charter on AI and Journalism. Most of these outlets additionally require AI-generated content to be labeled as such, especially if it is an image, fitting with principle five and, to an extent, principle seven. The only one of these outlets that explicitly rules out realistic-looking AI-generated imagery in line with principle seven is USA TODAY [11]. Other outlets either rule it out only when the image is not itself part of the news (such as AP [3]) or simply do not mention AI-generated imagery as a discrete concept. Outlets attempt to keep the door open for the use of non-realistic AI-generated images, which does not violate the recommendations of RSF [5]. Looked at ethically, however, the use of AI-generated imagery may still be considered immoral under the NWU’s demand for compensation for the creators of the works used to train AI models [6]. Another principle not followed by all media outlets is principle three, which recommends that “AI systems used in journalism undergo prior, independent evaluation [5].” Most of these media companies do not state that they review the tools they use for “adherence to the core values of journalistic ethics [5].” Though it can be assumed that the outputs would be screened for violations of the codes, this is a step missing from the policy pages of many major outlets. Overall, the major US news outlets analyzed in this paper are mostly aligned with the recommendations of Reporters Without Borders.
Further Discussion and Conclusions
Generative AI has taken the journalism world by storm over the past several years, and just about every organization has adapted its operations and policies to this new paradigm. There are growing conflicts among journalists and others in creative disciplines, the organizations that employ and represent them, and the AI companies at the forefront of bringing AI tools to the masses – and to journalists themselves.
The alignment of news agencies in the United States with the recommendations of scholars and their own industry groups is not exact. One guideline that the vast majority of the major news outlets included in this paper can agree on is that, as suggested by Shi and Sun [1], they needed to update their codes of ethics to account for AI-generated content. In general, these organizations adopted ideas similar to those suggested by Reporters Without Borders [5]. The New York Times, The Washington Post, USA TODAY, CNN, and the Associated Press have published promises to label AI-generated content as such at some level and to fact check AI-generated content, in line with RSF recommendations. Only USA TODAY explicitly forbids the use of realistic-looking AI-generated imagery in all circumstances as recommended by RSF; most other organizations have looser limits on their ability to publish realistic AI-generated imagery. However, there are gaps in these companies’ AI policies. Only a minority of these organizations claim to specifically attempt to avoid bias and discrimination in AI outputs. Even if these concepts are forbidden elsewhere in an ethical code, because humans are not writing AI outputs, extra care must be taken to verify there are no biases inherent in AI-generated or AI-assisted works. Additionally, many organizations do not state in their published ethical codes that they use independently evaluated AI tools, putting them again at odds with RSF guidance and furthering the risk of data set manipulation, which can lead to tainted, biased outputs.
The final two principles of the Paris Charter on AI and Journalism are in part related to the ongoing conflict of interests between AI companies and journalists. AI companies want free rein over their models and free use of any data they can gather, regardless of licensing.
Organizations such as the National Writers Union and Reporters Without Borders advocate for AI companies to credit the sources of the data they use to train AI models (which includes news reporting) and to compensate the rights holders of that data. AI companies claim that their use of news as training data falls under fair use in the United States. This dispute is why The New York Times Company has sued OpenAI and Microsoft for using its reporting to train AI without consent, and the outcome of the case will most likely be the deciding factor in whether journalists ever get the compensation and rights they believe they deserve. The case is far from decided; however, Spica [19] believes it will fall in favor of OpenAI, preventing journalists from getting the compensation their representative organizations believe they deserve. Even so, the outcome may not be exclusively bad for journalists. Spica, along with other experts, believes that although AI-generated works are currently denied copyright in the United States due to the lack of a human author, the law may end up shifting to favor the people behind an AI-generated work, provided the output of a generative AI tool is sufficiently original. This would benefit news organizations, which would otherwise have more difficulty copyrighting news created with AI assistance than they have with human-written content.
Despite the industry at large following the ethical guidance of the experts, there are still gaps that should be filled. Any news organization that does not have a published code of ethics or policies relating to AI should publish one for the sake of transparency. Accountability is also important, being included in principle one of the Paris Charter on AI and Journalism. As a computer cannot be held accountable, the people who create, edit, and publish works made in conjunction with generative AI tools must be held accountable for misinformation, bias, or other issues with the work, even if those issues originate with the generative AI tool. In addition, many of these news organizations do not specify how, if at all, they screen AI tools before beginning to use them. Publishing this information is a next step that would further the credibility of AI-assisted news writing and help catch issues with a model’s factuality and bias before putting it into action in the newsroom. This would also help counter data set manipulation by ideally catching issues before they start and allowing outlets with concerns about bias and discrimination in their AI outputs to check for them ahead of time. This could benefit the news companies materially as well, by reducing the probability of needing to rewrite or re-research a news piece due to issues caused by a flawed, discriminatory dataset.
Another possibility for preventing misinformation and biased data sets is proposed by Dierickx et al. [25]. These authors believe that one way to keep AI tools aligned with the goals of accurate, fair, and transparent journalism is to make them smaller, more specialized, and more carefully built. By better vetting the data that goes into the models, so that misinformation is not trained on in the first place, and by having developers follow their framework for designing AI tools, they believe journalists could use AI tools in their work more confidently and safely. All of this would also bring outlets more in line with the guidance of RSF, which prioritizes accuracy, fairness, and transparency. The consensus of the industry is that there is no concrete ethical or legal reason at present not to use AI in journalism, if, and only if, the outputs are vetted properly. The ethics and legality of training on unlicensed data, however, are still being debated.
Creating a model that follows the Open Source Initiative’s (OSI) [2] definition of open source AI, while also using only licensed data, would be a concrete way to create a truly transparent and fair AI model and is possibly the most universally ethical approach. The OSI requires that information on what data was used to train the model, along with where and how to obtain it, be publicly available [2]. This would help with transparency and theoretically make it easier for third parties to look for issues of bias in the training data. Combined with all training data being properly licensed, such a model would align with the positions of the NWU and RSF on compensation for the rights holders of training data, transparency, and non-discrimination. A model created under these conditions would not be guaranteed to produce only factual outputs; still, the transparency of the training set would make it easier to trace the root sources of misinformation.
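As an illustration of the kind of training-data transparency described above, the following sketch shows a public manifest recording each training source and its license, plus a check that rejects unlicensed entries. The field names, license identifiers, and sources are hypothetical assumptions, not an OSI-defined schema.

```python
# Hypothetical training-data provenance manifest and license check;
# field names, license identifiers, and sources are illustrative only.
import json

ALLOWED_LICENSES = {"CC-BY-4.0", "publisher-license-2023", "public-domain"}

manifest = [
    {"source": "example-news-archive", "url": "https://example.org/archive",
     "license": "publisher-license-2023"},
    {"source": "scraped-forum-posts", "url": "https://example.org/forum",
     "license": "unknown"},
]

def unlicensed_entries(entries):
    """Return every training source whose license is not in the allowed set."""
    return [entry for entry in entries if entry["license"] not in ALLOWED_LICENSES]

problems = unlicensed_entries(manifest)
if problems:
    print("Training sources failing the licensing requirement:")
    print(json.dumps(problems, indent=2))
```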
The discussion around the use of AI in journalism is rapidly developing. Journalists, AI companies, and institutions are all still figuring out their place in this new world. AI companies and writers are in conflict over the ethical sourcing of training data, and the copyrightability of AI-generated works is a subject of contention. As these groups fight for their interests, laws and concepts of ethics may shift. However, in a field as important as journalism, ethical guidelines must be upheld as much as humanly possible. Though it remains to be seen to what extent generative AI tools, journalists, and the law will adapt to one another, a few things are clear. Fact checking of AI outputs, and transparency when using them, are necessary. Journalism needs to continue to be a reliable, impartial source of information, and most major institutions claim to be committed to this. Without these things, AI-generated journalism cannot be trusted; with them, there is no concrete reason not to trust it.
Further Research
After the New York Times’ lawsuit against OpenAI and Microsoft is decided, some of the speculation in this paper will be outdated, but I believe more opportunities for research will open up in its place. In the future, researchers could examine the real-world practicality of smaller, more focused models for journalism as suggested by Dierickx et al. [25], and examine exactly what it would take, technologically and financially, to create a functional real-world AI model that satisfies both journalists and AI companies. Additionally, future research could investigate how to compensate the rights holders of data used to train AI, and how much compensation would be fair.
Acknowledgements
I would like to thank Bethany Pautrat and the rest of the HNUR staff involved in the Rouse Scholars Program for their mentorship and support, and for convincing me that this was a topic worth writing a paper about.
Contact: lucas.eaton@howardcc.edu
References
[1] Y. Shi and L. Sun, “How Generative AI Is Transforming Journalism: Development, Application and Ethics,” Journalism and Media, vol. 5, no. 2, pp. 582–594, Jun. 2024, doi: https://doi.org/10.3390/journalmedia5020039.
[2] Open Source Initiative, “The Open Source AI Definition – 1.0,” Open Source Initiative, 2024. https://opensource.org/ai/open-source-ai-definition
[3] “Artificial Intelligence,” The Associated Press. https://www.ap.org/solutions/artificial-intelligence/
[4] The Associated Press, “AP, Open AI agree to share select news content and technology in new collaboration,” The Associated Press, Jul. 13, 2023. https://www.ap.org/media-center/press-releases/2023/ap-open-ai-agree-to-share-select-news-content-and-technology-in-new-collaboration/
[5] Reporters Without Borders, “RSF and 16 partners unveil Paris Charter on AI and Journalism | RSF,” rsf.org, Nov. 10, 2023. https://rsf.org/en/rsf-and-16-partners-unveil-paris-charter-ai-and-journalism
[6] “Platform and Principles for Policy on Generative AI,” National Writers Union, 2023. https://nwu.org/issues-we-care-about/generative-ai/ (accessed Feb. 13, 2025).
[7] S. Dolnick and Z. Seward, “Principles for Using Generative A․I․ in The Times’s Newsroom,” The New York Times Company, May 09, 2024. https://www.nytco.com/press/principles-for-using-generative-a%E2%80%A4i%E2%80%A4-in-the-timess-newsroom/
[8] The Washington Post, “Policies and Standards,” Washington Post, Jan. 01, 2016. Accessed: Feb. 13, 2025. [Online]. Available: http://www.washingtonpost.com/policies-and-standards/
[9] N. Meir and The Associated Press, “Standards around generative AI,” The Associated Press, Nov. 22, 2024. http://www.ap.org/the-definitive-source/behind-the-news/standards-around-generative-ai/
[10] CNN, “ABOUT CNN DIGITAL,” CNN, Feb. 28, 2014. https://www.cnn.com/about (accessed Dec. 08, 2024).
[11] USA TODAY, “USA TODAY NETWORK Principles of Ethical Conduct For Newsrooms,” Usatoday.com, Dec. 04, 2023. https://cm.usatoday.com/ethical-conduct/
[12] Noain-Sánchez, “Addressing the Impact of Artificial Intelligence on Journalism: the perception of experts, journalists and academics,” Communication & Society, vol. 35, no. 3, pp. 105–121, Jun. 2022, doi: https://doi.org/10.15581/003.35.3.105-121.
[13] Tomlinson, D. J. Patterson and A. W. Torrance, “Turning Fake Data into Fake News: The AI Training Set as a Trojan Horse of Misinformation,” The San Diego Law Review, vol. 60, (4), pp. 641, 2023.
[14] OpenAI, “Safety & responsibility,” openai.com, 2024. https://openai.com/safety/ (accessed Dec. 08, 2024).
[15] Google, “Google AI Principles,” Google AI, 2023. https://ai.google/responsibility/principles/ (accessed Dec. 08, 2024).
[16] Kuai, “Unravelling Copyright Dilemma of AI-Generated News and Its Implications for the Institution of Journalism: The Cases of US, EU, and China,” New Media & Society, vol. 26, no. 9, pp. 5150–5168, Aug. 2024, https://doi.org/10.1177/14614448241251798.
[17] Watiktinnakorn, Seesai and C. Kerdvibulvech, “Blurring the lines: how AI is redefining artistic ownership and copyright,” Discover Artificial Intelligence, vol. 3, (1), pp. 37-10, 2023.
[18] O’Callaghan, “Can output produced autonomously by AI systems enjoy copyright protection, and should it?: An analysis of the current legal position and the search for the way forward,” Cornell International Law Journal, vol. 55, (4), pp. 305-350, 2022.
[19] Spica, “Public Interest, the True Soul: Copyright’s Fair Use Doctrine and the Use of Copyrighted Works to Train Generative AI Tools.,” Texas Intellectual Property Law Journal, vol. 33, no. 1, pp. 67–91, Sep. 2024.
[20] Reporters Without Borders, “Who are we?,” rsf.org, Jan. 22, 2016. https://rsf.org/en/who-are-we#rsf-is-2846
[21] Pope, “NYT v. OpenAI: The Times’s About-Face,” Harvard Law Review, Apr. 10, 2024. https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-timess-about-face/
[22] Allyn, “‘The New York Times’ takes OpenAI to court. ChatGPT’s future could be on the line,” NPR, Jan. 14, 2025. https://www.npr.org/2025/01/14/nx-s1-5258952/new-york-times-openai-microsoft
[23] FOX, “Business Ethics,” FOX. https://www.foxcorporation.com/corporate-governance/sobc/business-ethics/ (accessed Dec. 08, 2024).
[24] “FOX Corporation Uses AI to Change the Face of Media | AWS Summit New York 2023 video | AWS,” Amazon Web Services, 2023. https://aws.amazon.com/solutions/case-studies/fox-summit-ny-2023-keynote/ (accessed Dec. 08, 2024).
[25] Dierickx et al., “A data-centric approach for ethical and trustworthy AI in journalism,” Ethics and Information Technology, vol. 26, (4), 2024.