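| Can AI weather models predict out-of-distribution gray swan tropical cyclones?
Y. Qiang Sun, Pedram Hassanzadeh, Mohsen Zand, and Dorian S. Abbot

Artificial intelligence (AI) has impressed scientists with its ability to forecast the weather days in advance. Compared with traditional weather-prediction systems, these new models run much faster and use far less computing power. But there is a catch: AI largely predicts what it has seen before. When it comes to extreme weather, that is a serious problem.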
 
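Researchers from the University of Chicago, New York University, and the University of California, Santa Cruz set out to investigate this problem. Their study, which focused on tropical cyclones, shows that while AI does well at typical day-to-day forecasting, it struggles with rare or unprecedented events, such as Category 5 hurricanes, 100-year floods, or heat waves that shatter every historical record.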
 
Smart models with clear limits
 
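AI models such as ChatGPT learn from vast amounts of data, and weather-focused neural networks work the same way. Scientists train them on decades of past weather data; the model used in this study, FourCastNet, was trained on the ERA5 reanalysis for 1979–2015. From these data, the models learn to predict what is likely to happen next.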
 
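When given the latest weather data, these models can quickly produce forecasts that rival those of traditional physics-based models running on supercomputers. The trouble starts when something completely unexpected happens.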
 
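Pedram Hassanzadeh, an associate professor of geophysical sciences at the University of Chicago, is the study's corresponding author. "AI weather models are one of the biggest achievements of AI in science," he said. "What we found is that they are remarkable, but not magical. We've only had these models for a few years, so there is a lot of room for innovation."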
 
AI underestimates dangerous weather
 
The question was this: if a dangerous weather event never appears in the training data, can an AI model still capture it?
 
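The researchers tested this by training an AI model on data from which all hurricanes stronger than Category 2 had been removed. In practice, this meant dropping every training sample in which sea-level pressure fell below 988 hPa anywhere in the tropics. They then asked the model to forecast situations that, in reality, developed into Category 5 hurricanes. The result?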
 
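"It always underestimated the event. The model knows something is coming, but it always predicts it will only be a Category 2 hurricane," said Yongqiang Sun, a research scientist at the University of Chicago and the study's first author. This kind of error is known as a false negative, and in weather forecasting it can be deadly. Overestimating a storm may trigger unnecessary evacuations, which are costly but not dangerous. Underestimating it, however, can leave people unprepared for disaster.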
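The training-set construction summarized in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each training sample has already been reduced to its calendar month and its minimum mslp over the region of interest (the whole tropics for noTC, a single basin for noWP or noNA). The names `Sample`, `make_notc`, and `make_rand` are hypothetical, and matching the seasonal distribution month by month is our reading of the caption.

```python
import random
from dataclasses import dataclass

# 25th percentile of tropical minimum mslp in ERA5, 1979-2015 (Fig. 1)
THRESHOLD_HPA = 988.0


@dataclass
class Sample:
    # Hypothetical reduced view of one training snapshot.
    month: int            # calendar month, used to match the seasonal distribution
    min_mslp_hpa: float   # minimum mean sea-level pressure over the region (hPa)


def make_notc(samples, threshold=THRESHOLD_HPA):
    """noTC-style set: drop every sample whose regional minimum mslp is below threshold."""
    return [s for s in samples if s.min_mslp_hpa >= threshold]


def make_rand(samples, notc, threshold=THRESHOLD_HPA, seed=0):
    """Rand-style set: same size per month as noTC, built by random removal
    that never discards a strong-storm sample (mslp below threshold)."""
    rng = random.Random(seed)
    rand = []
    for month in range(1, 13):
        in_month = [s for s in samples if s.month == month]
        target = sum(1 for s in notc if s.month == month)
        strong = [s for s in in_month if s.min_mslp_hpa < threshold]
        weak = [s for s in in_month if s.min_mslp_hpa >= threshold]
        if len(strong) > target:
            raise ValueError(f"month {month}: too many strong samples to match noTC size")
        # Keep every strong sample, then fill the rest with a random subset of weak ones.
        rand.extend(strong + rng.sample(weak, target - len(strong)))
    return rand


samples = [Sample(1, 1000.0), Sample(1, 995.0), Sample(1, 980.0), Sample(1, 1005.0),
           Sample(2, 1002.0), Sample(2, 960.0), Sample(2, 999.0)]
notc = make_notc(samples)
rand = make_rand(samples, notc)
print(len(notc), len(rand))  # 5 5
```

The Rand control matters because it has exactly as much data as noTC; any gap in skill between the two can then be attributed to the missing strong storms rather than to a smaller training set.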
 
 
Fig. 1 Schematic overview of this study. (A) Training of five versions of FourCastNet. The panel depicts the histogram of minimum mean sea-level pressure (mslp) in the tropics (30°S–30°N) in the training set (ERA5, 1979–2015). Note that a lower mslp corresponds to a stronger TC. Vertical lines indicate the 5th and 25th percentiles, which are 970 hPa and 988 hPa, respectively. For FourCastNet-Full, the full training dataset is utilized. For FourCastNet-noTC, samples with instances of mslp below 988.0 hPa anywhere in the tropics are removed from the training set. FourCastNet-Rand uses a training set of the same size and seasonal distribution as noTC but with samples removed randomly (while ensuring that samples with mslp < 988.0 hPa are retained). Two additional models are also trained for which samples below 988 hPa only over the tropical Western Pacific (noWP) or tropical North Atlantic (noNA) basin are removed. For each training set, five independent versions (realizations) are trained from different random weight/bias initializations to account for model uncertainty. (B) Testing of the five models. The forecast skill of each trained model is evaluated for TCs with mslp below 970 hPa (Category 5) in the test set. The right panels provide an example of the forecast results for Hurricane Lee (2023), a Category 5 TC. Shading represents the 25th to 75th percentile range of forecasts, derived from five model realizations and 51 different initial conditions (ICs) provided by an ensemble of data assimilations (EDA) from ECMWF; see Methods and Data.

AI memorizes extreme weather
 
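Traditional weather models use equations to describe how the atmosphere works. They build in physics, mathematics, and knowledge of how heat, pressure, and wind interact.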
 
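Neural networks don't work this way. They are more like fancy autocomplete tools, making predictions based only on what they have seen before. This difference matters: it means AI models can miss events that lie outside their training history.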
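The autocomplete analogy can be made concrete with a deliberately crude toy. The sketch below is not FourCastNet, which is a far more flexible neural network; it uses a k-nearest-neighbor regressor, which can only average pressures it has stored, on a hypothetical one-variable relation `toy_pressure` between a "forcing" index and a storm's central pressure. Storms deeper than 988 hPa are withheld from training, loosely echoing the noTC experiment.

```python
def toy_pressure(x):
    # Hypothetical ground truth: stronger forcing x gives lower central pressure (hPa).
    return 1010.0 - 5.0 * x


def knn_predict(train, x, k=3):
    # Average the pressures of the k training points whose forcing is closest to x.
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(p for _, p in nearest) / len(nearest)


# Forcing values 0.0, 0.5, ..., 10.0; storms deeper than 988 hPa are withheld.
train = [(i / 2, toy_pressure(i / 2)) for i in range(21)]
train = [(x, p) for x, p in train if p >= 988.0]

x_extreme = 10.0
print(toy_pressure(x_extreme), knn_predict(train, x_extreme))  # 960.0 992.5
```

Because each prediction is an average of stored values, it can never be deeper than the deepest storm in the training set (990 hPa here), however strong the forcing. A neural network is not bound this strictly, but the FourCastNet-noTC results described above behave in a qualitatively similar way: the model sees a storm coming yet caps its intensity.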
 
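Scientists have also begun using AI to explore long-term risks and future climate scenarios. But if AI cannot predict extremes, its usefulness for these tasks is limited.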
 
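Still, there is hope. The researchers found that if the model has seen similar extreme events, even in a different part of the world, it makes much better predictions. For example, if the AI has never seen a strong Atlantic hurricane but has seen intense Pacific storms, it can still predict powerful Atlantic hurricanes. "This was a surprising and encouraging finding," Hassanzadeh said. "It means these models can forecast events that were not seen in one region but occasionally occur in another."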
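A variant of the nearest-neighbor toy suggests why this cross-basin transfer is possible. In this hypothetical sketch both basins obey the same dynamics (`toy_pressure`), deep storms are withheld only from the North Atlantic ("NA"), loosely mirroring FourCastNet-noNA, and the basin label is stored but not used as an input, so a strong Atlantic case is matched to Western Pacific ("WP") analogs. It illustrates the idea in the quote above, not the mechanism inside FourCastNet.

```python
def toy_pressure(x):
    # Hypothetical dynamics shared by both basins: stronger forcing -> lower pressure (hPa).
    return 1010.0 - 5.0 * x


def knn_predict(train, x, k=3):
    # Average the pressures of the k samples whose forcing is closest to x.
    # The basin label is stored but deliberately not used as an input.
    nearest = sorted(train, key=lambda s: abs(s[1] - x))[:k]
    return sum(s[2] for s in nearest) / len(nearest)


forcings = [i / 2 for i in range(21)]  # 0.0, 0.5, ..., 10.0
# Samples are (basin, forcing, pressure). Deep storms are withheld from "NA" only.
train = [("WP", x, toy_pressure(x)) for x in forcings]
train += [("NA", x, toy_pressure(x)) for x in forcings if toy_pressure(x) >= 988.0]

# A strong North Atlantic case: true pressure 960 hPa, never seen in that basin.
print(knn_predict(train, 10.0))  # 962.5
```

The prediction lands close to the true 960 hPa only because the Pacific examples share the same underlying relation; if the two basins followed different dynamics, borrowing analogs this way would not help.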
 
 
Fig. 2 FourCastNet’s difficulty in extrapolating to gray swan TCs. Forecasting of all 20 Category 5 TCs from the test set (2018–2023) by three versions of FourCastNet trained on different datasets: FourCastNet-Full (left column: A and D), FourCastNet-Rand (middle column: B and E), and FourCastNet-noTC (right column: C and F). The dashed line shows the critical threshold at the 25th percentile of minimum mslp (roughly a Category 3 TC) used in the noTC training set. All panels show the evolution of the median mslp (solid line) and the interquartile range from the 25th to the 75th percentile (shading) over all 20 Category 5 TCs, 5 realizations of each trained model, and 51 perturbed ICs from EDA (5,100 forecasts). Shading for ERA5 is over the 20 TCs. Forecasts are initialized 1 d before each TC reached the critical threshold (weak phase, top row) or 1 d after the TC reached this threshold (strong phase, bottom row). The latter ICs are out-of-distribution. As an additional note, detailed analysis shows that none of the ensemble members in the FourCastNet-noTC forecasts reached the observed lowest mslp values. Although a few members’ mslp reached 970 hPa, this occurred because these members transitioned to an unstable state that eventually led to blow-up, rather than capturing realistic intensification of the storm.

Combining AI with classic tools
 
So what is the solution? The researchers believe the next step is to combine traditional physics with AI.
 
"Our hope is that if AI models can really learn atmospheric dynamics, they will be able to figure out how to predict gray swans," Hassanzadeh said.
 
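To get there, scientists are exploring new techniques. One of them is called active learning. In this approach, the AI model helps steer physics-based simulations toward producing more examples of rare events. Those examples can then be used to improve the AI's accuracy.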
 
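Jonathan Weare, a co-author of the study, is a professor at New York University's Courant Institute of Mathematical Sciences. "Long simulations or observational datasets aren't going to work," he said. "We need to think about smarter ways to generate data."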
 
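"In this case, that means answering the question: 'Where should I put my training data to get better performance on extremes?' Luckily, we think AI weather models themselves, when combined with the right mathematical tools, can help answer this question."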
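A minimal sketch of that idea, letting a model decide where the next expensive simulation should go, is below. Everything in it is a simplifying assumption rather than the researchers' method: `expensive_simulation` is a hypothetical stand-in for a physics-based run, the surrogate is a straight-line least-squares fit instead of an AI weather model, and the acquisition rule (simulate the unlabeled candidate the surrogate predicts to be most extreme) is only one simple choice. Practical active-learning schemes usually also weigh the surrogate's uncertainty.

```python
def expensive_simulation(x):
    # Hypothetical physics-based run: central pressure (hPa) for forcing x.
    return 1010.0 - 0.5 * x * x


def fit_line(data):
    # Ordinary least-squares fit p = a + b * x; this is the cheap surrogate.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    mp = sum(p for _, p in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxp = sum((x - mx) * (p - mp) for x, p in data)
    b = sxp / sxx
    return mp - b * mx, b


def active_learning(candidates, labeled, budget):
    # Each round: refit the surrogate, then spend one simulation on the
    # unlabeled candidate it predicts to have the lowest (most extreme) pressure.
    labeled = dict(labeled)
    for _ in range(budget):
        a, b = fit_line(list(labeled.items()))
        pool = [x for x in candidates if x not in labeled]
        if not pool:
            break
        x_next = min(pool, key=lambda x: a + b * x)
        labeled[x_next] = expensive_simulation(x_next)
    return labeled


candidates = [i / 2 for i in range(21)]  # forcings 0.0, 0.5, ..., 10.0
# Start from weak storms only, as if no simulation had yet produced an extreme.
start = {x: expensive_simulation(x) for x in (0.0, 1.0, 2.0, 3.0, 4.0)}
result = active_learning(candidates, start, budget=3)
print(sorted(set(result) - set(start)), min(result.values()))  # [9.0, 9.5, 10.0] 960.0
```

Here three targeted simulations add a 960 hPa storm to a dataset whose deepest example was 1002 hPa. Spending the same three runs on randomly chosen candidates would often miss the extreme tail entirely, which is the motivation behind Weare's question about where to put training data.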
 
 
 Fig. 4 FourCastNet generalizes across tropical regions for dynamically similar events. (A) Comparison of the forecast skill of FourCastNet-noWP against other models for Category 5 TCs (from the test set) in the Western Pacific, initialized at the TC’s weak phase. (B) As in (A) but for TCs in the North Atlantic basin. (C and D) As in (A and B) but initialized at the strong phase of the TCs. Solid lines and shading are as in Fig. 2. |