By some measures, AI systems can now compete with traditional numerical methods at producing weather forecasts. But because training penalizes errors, their predictions tend to come out “smeared”: the further forward in time the forecast goes, the more the model avoids specific predictions that are likely to be wrong. As a result, a storm’s track begins to spread out, or the storm itself loses its sharply defined edges.
Even so, AI remains very attractive, because the alternative—numerical atmospheric circulation models—is extremely computationally intensive. Those models have nonetheless been very successful, and the ensemble model run by the European Centre for Medium-Range Weather Forecasts is considered best in class.
In a paper published today, Google’s DeepMind claims its new AI system can outperform the European model’s forecasts out to at least a week, and often longer. DeepMind’s system, called GenCast, integrates several computational approaches used by atmospheric scientists with the diffusion models commonly used in generative AI. The result is a system that maintains high resolution while significantly cutting computational costs.
Ensemble prediction
Traditional computational methods have two main advantages over AI systems. First, they are built directly on atmospheric physics: they incorporate the known rules that govern the behavior of real-world weather and calculate some of the details in ways that are directly informed by empirical data. They are also run as ensembles, meaning that multiple instances of the model are executed. Because the weather is chaotic, these runs gradually diverge, and that divergence provides a measure of the forecast’s uncertainty.
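To make the divergence point concrete, here is a toy sketch (not drawn from the paper) that runs a small ensemble of a classic chaotic system, the Lorenz-63 equations, from nearly identical starting points; the growing spread between runs is exactly the kind of uncertainty estimate an ensemble provides.

```python
# Illustrative only: a toy demonstration of why ensembles measure uncertainty.
# The Lorenz-63 system (a classic stand-in for chaotic atmospheric dynamics)
# is integrated from several slightly perturbed starting states; the spread
# of the runs grows with lead time, just as it does in operational ensembles.
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 system one step with simple Euler integration."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

rng = np.random.default_rng(0)
base = np.array([1.0, 1.0, 1.0])
members = [base + 1e-6 * rng.standard_normal(3) for _ in range(10)]  # tiny perturbations

for step in range(2001):
    members = [lorenz63_step(m) for m in members]
    if step % 500 == 0:
        spread = np.std([m[0] for m in members])  # ensemble spread in x
        print(f"step {step:5d}  spread in x = {spread:.6f}")
# The spread starts near 1e-6 and grows by orders of magnitude: the ensemble's
# divergence is itself the forecast-uncertainty estimate.
```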
There has already been at least one attempt to integrate aspects of traditional weather models with AI. An internal Google project used a traditional atmospheric circulation model that divided the Earth’s surface into a grid of cells, but it used AI to predict the behavior of each cell. This improved computational performance considerably, but at the cost of relatively large grid cells and thus relatively low resolution.
To tackle AI weather forecasting, DeepMind decided to skip the physics and instead embrace the ability to run ensembles.
GenCast is based on a diffusion model, which has some features that are helpful here. In essence, these models are trained on originals (images, text, weather patterns) that have been injected with noise, and they learn to produce variants close to the original from the noisy versions. Once trained, the model can be fed pure noise, and it will evolve that noise toward something resembling its training targets.
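As an illustration of the general diffusion idea (this is a toy stand-in, not GenCast’s architecture or training code), the sketch below “denoises” two-dimensional points rather than weather fields, using a closed-form estimator in place of a trained neural network.

```python
# A minimal sketch of the diffusion idea described above, on toy 2-D "weather
# states" rather than real atmospheric fields. The "denoiser" here is the
# posterior mean under the empirical data distribution, a stand-in for the
# neural network a real diffusion model would train.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=[3.0, -2.0], scale=0.3, size=(500, 2))  # toy training set

def denoise(x_noisy, noise_std):
    """Estimate the clean sample behind x_noisy by weighting the training data
    by how likely each point is to have produced x_noisy at this noise level."""
    sq_dist = np.sum((data - x_noisy) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * noise_std ** 2))
    weights /= weights.sum()
    return weights @ data

# Reverse process: start from pure noise and step through decreasing noise
# levels, each time blending the current state toward the denoised estimate.
x = rng.normal(size=2) * 5.0            # pure noise
for noise_std in np.linspace(5.0, 0.05, 50):
    x_hat = denoise(x, noise_std)        # model's guess at the clean state
    x = x + 0.2 * (x_hat - x)            # move part of the way toward it
    x = x + noise_std * 0.05 * rng.normal(size=2)  # keep a little stochasticity

print("sample drawn by the toy diffusion loop:", x)   # lands near [3, -2]
```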
In this case, the target is realistic weather data, so the system takes pure noise and evolves it based on the current state of the atmosphere and its recent history. For longer-range forecasts, that “history” includes both real data and the predictions made at earlier steps. Because the system advances in 12-hour increments, the forecast for day three incorporates the starting conditions, the earlier history, and the forecasts already generated for days one and two.
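Schematically, that rollout looks something like the loop below; `predict_next_state` is a hypothetical placeholder for one 12-hour forecast step, not an actual GenCast API.

```python
# A schematic of the autoregressive rollout described above (not DeepMind's
# code). `predict_next_state` stands in for one 12-hour forecast step; it is a
# placeholder that simply extrapolates the state and adds noise.
import numpy as np

rng = np.random.default_rng(2)

def predict_next_state(prev_state, current_state, noise):
    """Placeholder for one 12-hour step conditioned on the two most recent
    states (GenCast conditions on the current state and recent history)."""
    return current_state + 0.5 * (current_state - prev_state) + 0.1 * noise

state_shape = (4, 8)                      # stand-in for a lat/lon field
history = [rng.normal(size=state_shape),  # state 12 hours ago
           rng.normal(size=state_shape)]  # current (analysis) state

forecast = []
for step in range(6):                     # 6 x 12 h = a 3-day forecast
    noise = rng.normal(size=state_shape)
    next_state = predict_next_state(history[-2], history[-1], noise)
    forecast.append(next_state)
    history.append(next_state)            # later steps condition on earlier forecasts

print(f"produced {len(forecast)} twelve-hour steps; the day-3 forecast is step 6")
```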
This also makes it easy to produce ensemble forecasts: feed the system different patterns of noise, and each run yields slightly different weather data. That serves the same purpose as ensembles do in traditional weather models, providing a measure of the forecast’s uncertainty.
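A minimal sketch of that idea, with an invented `forecast_once` placeholder standing in for a full forecast run:

```python
# Sketch of the ensemble idea: run the same (placeholder) forecast with
# different noise draws and use the spread of the results as an uncertainty
# estimate. Names here are illustrative, not part of GenCast's API.
import numpy as np

def forecast_once(initial_state, rng):
    """One stand-in forecast: deterministic drift plus noise-dependent detail."""
    return initial_state * 0.9 + 0.2 * rng.normal(size=initial_state.shape)

initial_state = np.ones((4, 8))                     # toy atmospheric field
members = [forecast_once(initial_state, np.random.default_rng(seed))
           for seed in range(8)]                    # 8 ensemble members

ensemble = np.stack(members)
print("ensemble mean  :", ensemble.mean(axis=0)[0, :3])
print("ensemble spread:", ensemble.std(axis=0)[0, :3])  # larger spread = less confidence
```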
For each grid square, GenCast tracks six weather measures at the surface, plus six more that describe the state of the atmosphere at 13 different altitudes (defined by barometric pressure). The grid squares are 0.25 degrees on a side, a higher resolution than the European model uses for its predictions. Despite that resolution, DeepMind estimates that a single instance (meaning one run, not a complete ensemble) can produce a 15-day forecast on one of Google’s tensor processing units in just eight minutes.
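Those figures imply a fairly large state at each forecast step. The back-of-the-envelope calculation below assumes a standard 0.25-degree latitude/longitude grid and one value per variable per level; the exact layout GenCast uses may differ.

```python
# Rough estimate of the size of one forecast state, based on the figures above
# (0.25-degree grid, 6 surface variables, 6 atmospheric variables at 13
# pressure levels). The exact layout is an assumption for illustration.
import numpy as np

lat_points = int(180 / 0.25) + 1          # 721 latitudes
lon_points = int(360 / 0.25)              # 1440 longitudes
surface_vars = 6
atmos_vars = 6
pressure_levels = 13

channels = surface_vars + atmos_vars * pressure_levels   # 84 values per grid square
state = np.zeros((lat_points, lon_points, channels), dtype=np.float32)

print(f"grid: {lat_points} x {lon_points}, {channels} channels per square")
print(f"one state ~ {state.nbytes / 1e6:.0f} MB in float32")
```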
Ensemble forecasts are produced by running many versions of this process in parallel and pooling the results. Given the amount of hardware Google has at its disposal, the entire process, start to finish, likely takes less than 20 minutes. The source code and training data are being placed on the GitHub page of DeepMind’s GraphCast project, and given the relatively modest computational requirements, it seems likely that individual academic research groups will start experimenting with it.
Measure of success
DeepMind reports that GenCast dramatically outperforms the best traditional forecast models. Using standard benchmarks in the field, which check a variety of output values at different points in the future, the company found that GenCast was more accurate than the European model on 97 percent of the tests. In addition, the confidence values derived from the ensemble’s uncertainty were generally reasonable.
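The benchmarks in question are standard probabilistic skill scores; one of the most widely quoted for ensembles is the continuous ranked probability score (CRPS), sketched below for a single variable at one location. This is a generic textbook estimator, not DeepMind’s evaluation code.

```python
# CRPS from an ensemble: mean error against the observation minus half the
# mean pairwise spread between members (lower is better). A sharp, accurate
# ensemble scores better than a hedged, spread-out one.
import numpy as np

def crps_ensemble(members, observed):
    """Ensemble-based CRPS estimate for a single scalar observation."""
    members = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(members - observed))
    term_pair = np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - 0.5 * term_pair

sharp_ensemble = [20.1, 20.4, 19.8, 20.2]   # tightly clustered around the truth
blurry_ensemble = [18.0, 22.5, 16.9, 23.8]  # hedged, spread-out forecast
truth = 20.3
print("CRPS, sharp forecast :", round(crps_ensemble(sharp_ensemble, truth), 3))
print("CRPS, blurry forecast:", round(crps_ensemble(blurry_ensemble, truth), 3))
```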
Until now, AI weather forecasters trained on real-world data have generally struggled with extreme weather events, because such events rarely appear in the training set. GenCast, however, handled them well, generally outperforming the European model on extreme highs and lows of temperature and atmospheric pressure (values that occur less than one percent of the time, down to the 0.01st percentile).
DeepMind also went beyond standard testing to see whether GenCast could be useful in practice. One of these tests involved projecting the tracks of tropical cyclones, an important job for forecast models. For the first four days, GenCast was significantly more accurate than the European model, and it held its lead out to about a week.
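Track accuracy of this sort is typically summarized as the great-circle distance between the forecast and observed storm centers at each lead time; a generic version of that calculation, with made-up coordinates, looks like this:

```python
# Great-circle (haversine) distance between a forecast storm center and the
# observed one, in kilometers. This is a generic calculation, not the paper's
# exact protocol; the coordinates below are invented for illustration.
import math

def track_error_km(lat_forecast, lon_forecast, lat_observed, lon_observed):
    """Distance (km) between forecast and observed storm centers."""
    r_earth = 6371.0
    phi1, phi2 = math.radians(lat_forecast), math.radians(lat_observed)
    dphi = math.radians(lat_observed - lat_forecast)
    dlam = math.radians(lon_observed - lon_forecast)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r_earth * math.asin(math.sqrt(a))

# Hypothetical 4-day-ahead positions for one storm:
print(f"track error: {track_error_km(23.5, -75.0, 24.1, -74.2):.0f} km")
```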
One of DeepMind’s most interesting tests checked forecasts of global wind power output, based on information from a database of power plants. This involved using GenCast to predict wind speeds at 10 meters above the surface (lower than where most turbines sit, but the closest available approximation) and then converting those numbers into an estimate of the electricity generated. GenCast outperformed the traditional weather model by 20 percent for the first two days and stayed ahead, with a shrinking lead, out to a week.
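The conversion step can be pictured as running the forecast wind speed through a turbine power curve. The sketch below uses typical textbook cut-in, rated, and cut-out speeds; these are assumptions, not the values from the paper or the power-plant database.

```python
# Simplified turbine power curve: zero output below cut-in and above cut-out,
# a cubic ramp between cut-in and rated speed, and flat rated power in between.
import numpy as np

def power_output_mw(wind_speed_ms, rated_mw=3.0, cut_in=3.0, rated_speed=12.0, cut_out=25.0):
    """Approximate electricity output (MW) from a 10-meter wind-speed forecast."""
    v = np.asarray(wind_speed_ms, dtype=float)
    power = np.zeros_like(v)
    ramp = (v >= cut_in) & (v < rated_speed)
    power[ramp] = rated_mw * ((v[ramp] - cut_in) / (rated_speed - cut_in)) ** 3
    power[(v >= rated_speed) & (v < cut_out)] = rated_mw
    return power

forecast_wind_10m = np.array([2.0, 6.5, 11.0, 14.0, 27.0])  # m/s, hypothetical forecasts
print("estimated output (MW):", power_output_mw(forecast_wind_10m))
```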
The researchers don’t spend much time investigating why performance appears to decline gradually out to about a week; ideally, a clearer picture of GenCast’s limitations would help guide further improvements. In any case, today’s paper marks the second report of forecasts being improved by something akin to a hybrid approach, one that combines aspects of traditional forecasting systems with AI. And since the two efforts take very different approaches, it raises the possibility that some of their features could eventually be combined.
Nature, 2024. DOI: 10.1038/s41586-024-08252-9