LLMs to Fill the Void: A Framework to Mitigate Forecasting Failures in U.S. Intelligence

By Enaya Bokhari

GW AI Policy Accelerator and Enaya Bokhari

May 20, 2026

This memo was written by undergraduate Enaya Bokhari as part of the GW AI Policy Accelerator’s semester-long AI Policy Writing Fellowship. For more information or to get involved, contact GWaipolicy@gmail.com.

The U.S. Intelligence Community (IC) has long been known to be better at explaining what is happening in a given moment than at forecasting what comes next. This is most evident in the IC’s failure to foresee, and thus effectively mitigate, major historical developments such as Pearl Harbor, the Korean War, the 1979 Iranian Revolution, and the September 11 attacks. Artificial intelligence (AI) in the form of Large Language Models (LLMs) provides the best current solution to fill the forecasting void, now more than ever as the Trump Administration devotes large swaths of federal funding to AI deployment as a top national security priority. Efficiencies in the intelligence cycle’s earlier stages, such as open-source data collection, have been realized. Yet enhancements in the later task of forecasting remain largely unrealized. Effective national security depends on quality intelligence analysis, which in turn depends on accurate predictive capabilities. Without quality forecasts, U.S. intelligence analysis will continue to suffer, and may even lag behind as allies and adversaries alike race to deploy AI in their governments. It is thus urgent that AI be used strategically to mitigate forecasting failures. What the IC needs, beyond funding, is a deployment strategy for all 18 agencies, one that enhances the current intelligence cycle with increased LLM usage and allows greater space for analysts to develop and sharpen their forecasting tradecraft.

On March 20, 2026, the Trump Administration released its National Policy Framework for Artificial Intelligence, furthering the agenda of the AI Action Plan issued in July 2025. The framework focuses on accelerating innovation and maintaining U.S. dominance in AI by centralizing control at the federal level and minimizing regulation. Elsewhere, David Sacks, co-chair of the President’s Council of Advisors on Science and Technology and formerly Trump’s AI czar, told a gathering of tech executives and lawmakers that AI is a driving force of the American economy. Earlier, on February 9, the Washington Post reported that the White House had rapidly expanded AI use across federal agencies in keeping with the April 2025 Office of Management and Budget memo urging deployment of AI across the executive branch. Additionally, in a September 2025 essay, tech expert John S. Hollywood of the RAND Corporation argued that AI could improve policymaking by using tools like “virtual stakeholders” to generate more ideas and analysis, with its value lying mainly in augmenting human judgment rather than replacing it. Tech policy, specifically AI policy, has become intrinsic to national security discussions, and deploying AI in government is no longer a possibility stifled by past federal policies keen on regulating usage. Thus, the void of quality forecasts in intelligence analysis finally has a tangible solution.

I. Intelligence Background

The lack of quality forecasts to mitigate national security threats has long been evident in U.S. foreign policy conversations. The most prominent scholar on this issue, Philip E. Tetlock, attributes these failures to the tradecraft itself, or the lack thereof. In his 2005 book, Expert Political Judgment, he argues that most analysts are overconfident in superficial predictions or “yes/no” answers. Predictive capabilities must be developed, and that requires a discipline that leans away from intuition and embraces probabilistic thinking as opposed to determinism. In his chapter of Understanding the Intelligence Cycle, Aaron Brantly, a professor of International Affairs and Cyber at the U.S. Military Academy, argued over a decade ago that digital information, or cyber, overwhelms and disrupts the linear intelligence cycle when it comes to collecting and processing intelligence (Brantly, 2013). Almost two decades after Tetlock’s book, AI expert Philipp Schoenegger, alongside Tetlock, demonstrated how LLMs can boost human thinking, decreasing cognitive biases in tasks requiring predictive capabilities (Schoenegger et al., 2024). These works underscore that forecasting challenges, and how to address them, have gained considerable attention. In fact, AI is already being deployed in intelligence analysis.
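The contrast between “yes/no” answers and probabilistic thinking can be made concrete with a scoring rule. The sketch below is illustrative only. It uses one common form of the Brier score, a standard measure of probabilistic forecast accuracy (the mean squared gap between stated probabilities and what actually happened, where lower is better). The five events, their outcomes, and both sets of forecasts are hypothetical and are not drawn from any intelligence product or from the works cited here.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; a forecaster who is always certain and always
    wrong scores 1.0.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical results for five questions: 1 = the event occurred, 0 = it did not.
outcomes = [1, 0, 0, 1, 0]

# Hypothetical "yes/no" analyst: always fully certain, right on three of five.
yes_no = [1.0, 1.0, 0.0, 0.0, 0.0]

# Hypothetical probabilistic analyst: leans the right way, with stated uncertainty.
probabilistic = [0.7, 0.4, 0.2, 0.6, 0.1]

print(f"yes/no analyst:        {brier_score(yes_no, outcomes):.3f}")
print(f"probabilistic analyst: {brier_score(probabilistic, outcomes):.3f}")
```

In this made-up example the yes/no analyst scores 0.400 and the probabilistic analyst scores 0.092. The point is not the specific numbers but the mechanism: a confident miss is penalized heavily, while a calibrated estimate that admits uncertainty loses little even when it is imperfect.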

In 2023, the Intelligence Advanced Research Projects Activity (IARPA), under the Office of the Director of National Intelligence (ODNI), launched the REASON program. REASON, which stands for Rapid Explanation, Analysis and Sourcing Online, aims to help human analysts in the IC enhance their analytical output with AI. The application does this in two distinct ways: 1) it automatically points analysts to relevant information they have not yet considered, including disconfirming data; and 2) it acts as a “logic checker” that compares competing explanations, highlights evidentiary strengths and weaknesses, and exposes gaps in reasoning to improve analytic rigor. The challenges in intelligence analysis, and in forecasting specifically, are well defined, and technology as an analytical aid has been suggested for over a decade. The latest advancements in AI agents provide the right technology to assist intelligence analysis. AI has seen implementation in the earlier collection and processing stages, with IARPA’s REASON bringing it to the third stage, analysis. Despite this, AI agents may not be used to their full potential.

II. Analysis

Incorporating LLMs into the four-stage linear intelligence cycle (collection, processing, analysis, and dissemination) is not new. The earlier stages of collecting and processing open-source information are largely assisted by machine learning, as seen at the ODNI and the DoD’s National Security Agency (NSA). Moreover, initiatives like REASON have demonstrated that AI can assist in the analysis process as well. Still, taking AI implementation to the next level in the third stage is what is needed to improve human analysts’ ability to forecast effectively. At its current level, AI in the analysis stage merely acts as a fact-checker of human labor, such as drafted reports, as seen with REASON. Deriving meaning, contextualizing, and creating insights from processed intelligence is the crux of the analysis stage, and frontier LLMs are at the stage where they can assist with almost all of the associated tasks. In other words, they can accelerate the process of human understanding of events that have already transpired. At a general level, tasks in this third stage include evaluating source reliability, identifying patterns or trends across processed intelligence, drafting judgments (the writing labor itself), and flagging intelligence gaps and uncertainties. All of these, especially drafting labor, can be effectively assisted by frontier models.

At its core, analysis is about making sense of the who, what, where, and when of a particular development. Intelligence organizations are forward-looking in their products, specializing in what is likely to happen, i.e., the forecast (Friedman, 2010). Yet this role is neglected when most analysts spend the bulk of their time trying to assess what is happening in the here and now, which could be due to the present-oriented media landscape analysts operate within (Judson, 2013). As a result, finished products across the intelligentsia, not limited to the IC, devote most of their space to synthesizing why something is happening, and offer superficial policy recommendations born from underdeveloped forecasts. The next step, then, is to move beyond explanation and begin assessing what these dynamics are likely to mean going forward; in other words, to develop predictive capabilities.

LLMs cannot yet fully take over the role of analyzing intelligence, for they are not a substitute for human intuition and cognitive abilities. Moreover, adequate generated products require sound human prompts to begin with, and additional oversight for hallucinations (GSMA Intelligence, 2025). Still, they can certainly assist human analysts, and the cycle’s analysis stage as a whole, in the broad task of making sense of collected intelligence. In this way, the input from LLMs and AI agents can reduce the amount of time analysts spend evaluating the present, or the what is. This will create room for analysts to think about the what next, in other words, the forecast.

Forecasting is a skill set that has not been developed, in large part because analysts have had to spend an inordinate amount of their time separating the signal from the noise, and the signal they find is often riddled with errors and cognitive biases (Huo, Yu, and Ji, 2026). This has especially been the case with the expansion of Open-Source Intelligence (OSINT). With the advent of the internet in the 1990s, followed by social media in the following decade, there has been an exponential increase in the amount of publicly available information. Keeping up with the volume of OSINT has been a challenge. Thus, the causes of analytical and forecasting hurdles are twofold: this information frenzy works in tandem with underdeveloped forecasting tradecraft. Together, these two factors perpetuate the cycle of inaccurate judgments in intelligence products.

The emergence of nascent AI tools in the late 1990s and early 2000s, such as neural networks for signals intelligence (SIGINT) classification, began to assist IC analysts in finding patterns humans would miss. Still, these tools were mostly narrow and task-specific. It wasn’t until the late 2010s and early 2020s that AI in the form of LLMs became a force multiplier in collection and collation, now condensing exponentially larger amounts of intelligence in minutes. With this, human analysts gained the bandwidth to focus on drawing inferences from the data. In essence, analysts have moved from sorting data themselves (collecting and processing, as per the intelligence cycle) to interpreting machine-generated insights.

III. Forecast

The rapid expansion of generative AI in the form of GPT, Claude, Gemini, and Llama can also help accelerate the process of analysis. LLMs can now be deployed to help human analysts unpack how actors are behaving and the logic behind their actions. That, in turn, can create the time and space for analysts to develop predictive capabilities. The what next has always been an afterthought, but now that LLMs can help make sense of developments unfolding in real time, or the what is, it no longer has to be.

The U.S. IC has missed many major developments over the decades. These include the Soviet intervention in Afghanistan, the 1979 revolution in Iran, the collapse of the Soviet Union, the rise of China as a major geoeconomic competitor, and the September 11 attacks. In a way, these misses are understandable considering the challenge of keeping up with events that have already transpired. The ongoing revolution in LLM technology now shows how this chronic failure can be rectified.

Allies and adversaries alike are swiftly reacting to the global AI race. The U.S. still spearheads innovation, owning over half of global compute alone. Deployment, however, is the subsequent imperative in leading AI, and it is where Washington is less proficient (Bokhari and Polyak, 2025). For example, Washington’s largest geoeconomic competitor, China, is inching closer in deployment with a more rapid strategy. This is seen in its diffusion of cost-efficient models built with “distillation” techniques, which use American frontier models to train Chinese ones (Chan, 2026). Elsewhere, Beijing has deployed machine-learning technology across sectors, including government services such as taxation and welfare programs. Additionally, the nature of one-party rule is conducive to a far more rapid deployment strategy. The U.S. need not mimic exactly how and where China deploys, but it must devote attention to fixing its own enduring issues, such as those relevant to national security decision-making.

If the IC continues without a strategy to deploy AI in analysis, human analysts will have less time to spend on forecasting, and forecasting abilities will remain underdeveloped. Consequently, U.S. decision-making will likely continue to suffer because policies will be crafted without much foresight of emerging global developments. The inability to predict is an issue in and of itself, and it is amplified by the risk of lagging behind competitors in the current deployment race. Thus, falling behind is a twofold issue: 1) the inability to deploy AI at scale; and 2) the risk that U.S. foreign policy decision-making falls behind in the same way as other governments race to deploy and diffuse the technology within their systems.

IV. Policy Recommendations

The Trump Administration must deploy LLMs to enhance analysis within the IC:
LLMs are already being deployed and boosting efficiency within the intelligence cycle’s earlier stages, i.e., collection and processing. While current usage must continue, it alone is not sufficient to mitigate forecasting inadequacies; that requires predictive tradecraft to first be prioritized and then developed. Defining where in the cycle (the analysis stage) to further deploy AI to aid forecasting, and doing so at the enterprise level, is where current usage falls short in the IC and beyond. Increasing LLM usage in the third stage, analysis, which derives the how and why, means using LLMs for the hefty sums of drafting labor (judgments, reports, SITREPs, memos, etc.). The analysis stage should thus become AI-assisted, with human analysts providing the necessary oversight but largely devoting themselves to synthesizing the what next.
Human analysts must prioritize forecasting within the third ‘analysis’ stage of the intelligence cycle:
With the time and resources left over from analysis that is largely AI-assisted, the third stage can increasingly be dedicated to predictive capabilities. This is where human analysts should expend most of their energy, as most LLM usage will go toward basic assessments of the how and why of a particular development. With this, predicting the what next can become its own task within the broader third stage of analysis, and not merely an afterthought.
This strategy should be implemented simultaneously across the 18 agencies overseen by the ODNI:
Considering the strategic losses the U.S. will face if underdeveloped forecasts persist, it is imperative that the IC prioritize a deployment strategy that secures uniform progress in U.S. intelligence analysis. This way, intelligence products dealing with domestic and foreign developments alike will be enhanced. Thus, this framework must achieve two things: 1) create the time and opportunities necessary to sharpen forecasting tradecraft; and 2) avoid creating bureaucratic inefficiencies that stifle competence across agencies.
Each agency must operationalize constant human oversight as AI usage increases:
With increased AI usage, human oversight must increase in parallel. None of the intelligence cycle’s four stages, even with AI-assisted analysis, should be entirely AI-driven, as agents must work in tandem with human analysts. Instead of having human analysts reviewed by AI, this framework advocates the opposite. The strategic dynamic between human and machine in this third stage can be understood as that of a military commander and their oracles, or in this case, AI agents (Probasco et al., 2025). This requires agencies to operationalize constant review of generated insights or products wherever a stage uses AI. Implementation may include the creation of oversight roles that work alongside analysts. Their job would be to continuously review and fact-check generated insights without disrupting the analyst’s workflow. Establishing an additional final review of the finished intelligence product would also be worthwhile. This final pre-dissemination review would specifically check for AI hallucinations, the practicality of recommendations, and other generated errors. Still, the details of execution should largely be left to each agency and its established bureaucratic systems, so as not to impose inefficiencies from the ODNI’s broader level.

The Trump Administration’s treatment of AI as an intrinsic asset to U.S. national security forges the necessary environment and opportunities to fix existing shortcomings in intelligence work. The real solution, and likewise the real challenge, is how the U.S. Intelligence Community will integrate AI within existing frameworks to solve existing issues. Predictive capabilities are the core of intelligence and yet remain the industry’s outstanding deficiency. Adopting an AI strategy for the IC will not only accelerate Washington’s position in global AI deployment but will also enhance its broader foreign policy decision-making as the global superpower.

Works Cited

Arora, Gunisha. “Beyond the Numbers: Why AI Can’t Replace Human Analysts.” GSMA Intelligence, (June, 2025). https://www.gsmaintelligence.com/blogs/beyond-the-numbers-why-ai-cant-replace-human-analysts
Bokhari, Kamran, and Mark Polyak. “The Geopolitics of Code: Artificial Intelligence and the Future of American Soft Power.” The National Interest, (July, 2025). https://nationalinterest.org/blog/techland/the-geopolitics-of-code-artificial-intelligence-and-the-future-of-american-soft-power
Brantly, Aaron F. “Defining the Role of Intelligence in Cyber: A Hybrid Push and Pull.” In Understanding the Intelligence Cycle, edited by Mark Phythian, 79-96. London: Routledge, (2013).
Brennan Center for Justice. “Rethinking Intelligence: Interview with George Friedman.” YouTube video. Posted (August, 2014).

Chan, Kyle. “China Is Running Multiple AI Races.” Brookings Institution, (March, 2026). https://www.brookings.edu/articles/china-is-running-multiple-ai-races/
Huo, Yupeng, Litongxing Yu, and Yushan Ji. “Cognitive Biases in Military Intelligence Analysis: Amplification Mechanisms and Intervention Strategies in the Digital-Intelligent Era.” BMC Psychology (2026). https://www.researchgate.net/publication/403439203_Cognitive_biases_in_military_inteligence_analysis_amplification_mechanisms_and_intervention_strategies_in_the_digital-intelligent_era
Intelligence Advanced Research Projects Activity. “REASON: Rapid Explanation, Analysis and Sourcing Online-Technical Description.” Office of the Director of National Intelligence, (December 2022). https://www.iarpa.gov/images/Propsers DayPDFs/REASON/REASON TechnicalDescriptionfinal122222-1.pdf
Judson, David D. “Geopolitical Intelligence, Political Journalism and ‘Wants’ vs. ‘Needs’.” Forbes, (October 2013). https://www.forbes.com/sites/stratfor/2013/10/29/geopolitical-intelligence-political-journalism-and-wants-vs-needs/
Probasco, Emelia, Helen Toner, Matthew Burtell, and Tim G. J. Rudner. AI for Military Decision-Making: Harnessing the Advantages and Avoiding the Risks. Center for Security and Emerging Technology, Georgetown University, (April 2025). https://cset.georgetown.edu/publication/ai-for-military-decision-making/
Spitzer, Philipp, Katelyn Morrison, Violet Turri, Michelle Feng, Adam Perer, and Niklas Kühl. “Improving Human Forecasting Accuracy.” ACM Transactions on Interactive Intelligent Systems 15, no. 1 (March 2025). https://dl.acm.org/doi/pdf/10.1145/3707649
Tetlock, Philip E. Expert Political Judgment: How Good Is It? How Can We Know? Princeton, NJ: Princeton University Press, (2005).
White House. Accelerating Federal Use of Artificial Intelligence through Innovation, Governance, and Public Trust. Washington, DC: The White House, (February 2025). https://www.whitehouse.gov/wp-content/uploads/2025/02/M-25-21-Accelerating-Federal-Use-of-AI-through-Innovation-Governance-and-Public-Trust.pdf
White House. America’s AI Action Plan: Winning the Race. Washington, DC: The White House, (July 2025). https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf
White House. National Policy Framework for Artificial Intelligence: Legislative Recommendations. Washington, DC: The White House, (March 2026). https://www.whitehouse.gov/wp-content/uploads/2026/03/03.20.26-National-Policy-Framework-for-Artificial-Intelligence-Legislative-Recommendations.pdf

A guest post by

Enaya Bokhari

An undergraduate's independent research and analyses aiming to strategically understand the world. | GW Elliott School of International Affairs ’29 | Fiction portfolio: https://enayabokhari.wixsite.com/writingwebsitereal
