This is the official repository for the Pangu-Weather paper.
Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast, arXiv preprint: 2211.02556, 2022.
by Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu and Qi Tian
Resources including pseudocode, pre-trained models, and inference code are released.
The downloaded files should be organized in the following hierarchy:
├── root
│ ├── input_data
│ │ ├── input_surface.npy
│ │ ├── input_upper.npy
│ ├── output_data
│ ├── model_jit_cpu_1.onnx
│ ├── model_jit_cpu_3.onnx
│ ├── model_jit_cpu_6.onnx
│ ├── model_jit_cpu_24.onnx
│ ├── inference_cpu.py
│ ├── inference_gpu.py
│ ├── inference_iterative.py
If you use a CPU environment, please run:
pip install -r requirement_cpu.txt
If you use a GPU environment, please first confirm that the CUDA version is 11.6 and the cuDNN version is 8.2.4 for Linux or 8.5.0.96 for Windows (please see this page for details). Then, please run:
pip install -r requirement_gpu.txt
Please download the four pre-trained models (~1.1GB each) from Google Drive:
The 1-hour model: model_jit_cpu_1.onnx
The 3-hour model: model_jit_cpu_3.onnx
The 6-hour model: model_jit_cpu_6.onnx
The 24-hour model: model_jit_cpu_24.onnx
These models are stored using the ONNX format, and thus can be used via different languages such as Python, C++, C#, Java, etc.
Please prepare the input data using numpy. There are two files that shall be put under the input_data folder, namely, input_surface.npy and input_upper.npy.
input_surface.npy stores the input surface variables. It is a numpy array shaped (4,721,1440) where the first dimension represents the 4 surface variables (MSLP, U10, V10, T2M in the exact order).
input_upper.npy stores the upper-air variables. It is a numpy array shaped (5,13,721,1440) where the first dimension represents the 5 upper-air variables (Z, Q, T, U and V in the exact order), and the second dimension represents the 13 pressure levels (1000hPa, 925hPa, 850hPa, 700hPa, 600hPa, 500hPa, 400hPa, 300hPa, 250hPa, 200hPa, 150hPa, 100hPa and 50hPa in the exact order).
In both cases, the dimensions of 721 and 1440 represent the size along the latitude and longitude, where the numerical range is [90,-90] degrees and [0,359.75] degrees, respectively, and the spacing is 0.25 degrees. For each 721x1440 slice, the data format is exactly the same as the .nc file downloaded from the ERA5 official website.
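The grid described above can be written out explicitly. The following is a minimal sketch, assuming only the ranges and spacing stated in the text; the names LAT, LON and grid_index are illustrative and do not come from the repository.

```python
import numpy as np

# Latitude runs from north to south, longitude runs eastward from 0,
# both at 0.25-degree spacing, as described in the text.
LAT = np.linspace(90.0, -90.0, 721)
LON = np.arange(1440) * 0.25


def grid_index(lat_deg, lon_deg):
    """Return (row, col) of the grid point nearest to a location.

    lat_deg is expected in [-90, 90]; lon_deg may be negative or above 360
    and is wrapped into [0, 360).
    """
    row = int(round((90.0 - lat_deg) / 0.25))
    col = int(round((lon_deg % 360.0) / 0.25)) % 1440
    return row, col
```

With such a helper, the value of a surface variable at a given location can be read as input_surface[k, row, col], where k is the variable index.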
Note that the numpy arrays should be in single precision (.astype(np.float32)), not in double precision.
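Before running inference, it may help to verify that both arrays match the layout described above. The following is a minimal sketch of such a check; check_inputs and the two shape constants are hypothetical names, not part of the released scripts.

```python
import numpy as np

SURFACE_SHAPE = (4, 721, 1440)     # MSLP, U10, V10, T2M
UPPER_SHAPE = (5, 13, 721, 1440)   # Z, Q, T, U, V on 13 pressure levels


def check_inputs(surface, upper):
    """Raise ValueError if either array has the wrong shape or dtype."""
    checks = (
        ("input_surface", surface, SURFACE_SHAPE),
        ("input_upper", upper, UPPER_SHAPE),
    )
    for name, arr, shape in checks:
        if arr.shape != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {arr.shape}")
        if arr.dtype != np.float32:
            raise ValueError(
                f"{name}: expected float32, got {arr.dtype}; "
                "convert with .astype(np.float32)"
            )
```

The check covers shape and precision only; it cannot tell whether the variables and pressure levels are in the required order.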
We support ERA5 initial fields and ECMWF initial fields (e.g., the initial fields of the HRES forecast), where the latter often leads to a slight accuracy drop (mainly for T2M because the two fields are quite different in temperature). A .nc file of ERA5 can be transformed into a .npy file using the netCDF4 package, and a .grib file of the ECMWF initial fields can be transformed into a .npy file using the pygrib package. Note that Z represents geopotential, not geopotential height, so the values should be multiplied by a factor of 9.80665 if the raw data contains the geopotential height.
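Once the individual fields have been read (for example with netCDF4 or pygrib), assembling them is a matter of stacking in the required order and casting to single precision. The following is a minimal sketch, assuming the fields are already available as numpy arrays in a dict; the dict keys, stack_fields and height_to_geopotential are hypothetical names chosen for illustration, and reading the files themselves is not shown.

```python
import numpy as np

G0 = 9.80665  # factor given in the text for height -> geopotential

# Variable order required by the models, as listed in the text.
SURFACE_ORDER = ("MSLP", "U10", "V10", "T2M")
UPPER_ORDER = ("Z", "Q", "T", "U", "V")


def height_to_geopotential(height):
    """Convert geopotential height to geopotential by multiplying by G0."""
    return np.asarray(height) * G0


def stack_fields(fields, order):
    """Stack per-variable arrays along a new first axis as float32.

    `fields` maps a variable label to its array: (721, 1440) for surface
    variables, (13, 721, 1440) for upper-air variables. The caller is
    responsible for ordering the pressure levels from 1000hPa to 50hPa.
    """
    return np.stack([np.asarray(fields[name]) for name in order]).astype(np.float32)
```

The result of stack_fields(surface_fields, SURFACE_ORDER) can then be written with np.save("input_data/input_surface.npy", ...), and likewise for the upper-air array.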
We temporarily do not support other kinds of initial fields due to the possibly dramatic differences in the fields when Z<0.
We provide an example of transferred input files, input_surface.npy and input_upper.npy, which correspond to the ERA5 initial fields at 12:00UTC, 2018/09/27. Please download them from Google Drive:
After the above steps are finished, please check inference_cpu.py for an example of making a 24-hour weather forecast on CPU with the 24-hour model, and inference_gpu.py for the GPU version.
For example, running the following command, one can get the 24-hour forecast in the output_data folder:
python inference_cpu.py # python inference_gpu.py for gpu environment
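For readers who want to call a model directly rather than through the provided scripts, the following is a minimal sketch using ONNX Runtime. It is not a copy of inference_cpu.py. The helper names make_session and run_forecast are hypothetical, and the tensor names "input" and "input_surface" as well as the output order (upper-air first, surface second) are assumptions that should be checked against session.get_inputs() and session.get_outputs().

```python
import numpy as np


def make_session(model_path, providers=("CPUExecutionProvider",)):
    """Create an ONNX Runtime session for one of the downloaded models."""
    import onnxruntime as ort  # imported here so the helper below works without it
    return ort.InferenceSession(model_path, providers=list(providers))


def run_forecast(session, upper, surface,
                 upper_name="input", surface_name="input_surface"):
    """Run one model step and return (upper_forecast, surface_forecast).

    The default tensor names and the output order are assumptions; verify
    them with session.get_inputs() and session.get_outputs().
    """
    feeds = {
        upper_name: upper.astype(np.float32, copy=False),
        surface_name: surface.astype(np.float32, copy=False),
    }
    out_upper, out_surface = session.run(None, feeds)
    return out_upper, out_surface


# Example usage (requires onnxruntime and the downloaded files):
#   session = make_session("model_jit_cpu_24.onnx")
#   upper = np.load("input_data/input_upper.npy")
#   surface = np.load("input_data/input_surface.npy")
#   out_upper, out_surface = run_forecast(session, upper, surface)
```

Passing the session in as an argument keeps model loading, which is slow for files of this size, separate from the forecast call, so one session can be reused across many initial fields.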
Also, inference_iterative.py shows an example of generating forecasts every 6 hours over one week.
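The idea behind iterative forecasting is that each forecast is fed back in as the next initial field. The following is a minimal sketch of that loop, assuming a single fixed-lead-time step; it is not taken from inference_iterative.py, which may combine models differently, and the names rollout and STEPS_PER_WEEK are hypothetical.

```python
def rollout(step, upper, surface, n_steps):
    """Apply `step` repeatedly, feeding each forecast back in as the next input.

    `step` is any callable (upper, surface) -> (upper, surface), for example
    one call to the 6-hour model. Returns a list with one (upper, surface)
    pair per step, in chronological order.
    """
    forecasts = []
    for _ in range(n_steps):
        upper, surface = step(upper, surface)
        forecasts.append((upper, surface))
    return forecasts


# One week at 6-hour spacing is 7 * 24 / 6 = 28 steps.
STEPS_PER_WEEK = 7 * 24 // 6
```

Because every step consumes the previous output, errors accumulate with the number of steps, which is one reason models with different lead times are provided.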
pseudocode.py contains the pseudocode that elaborates our main algorithm. It is written in Python and can be implemented using any deep learning library, e.g., PyTorch or TensorFlow.
Note that training each model requires downloading about 60TB of ERA5 data and 3000 GPU-days of computational resources (on V100 GPUs).
Pangu-Weather is released under the MIT license.
Also, please note that all models were trained using the ERA5 dataset provided by ECMWF. Please do follow their policy and note that commercial use of these models is forbidden.
If you use these resources in your research, please cite our paper:
@article{bi2022pangu,
title={Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast},
author={Bi, Kaifeng and Xie, Lingxi and Zhang, Hengheng and Chen, Xin and Gu, Xiaotao and Tian, Qi},
journal={arXiv preprint arXiv:2211.02556},
year={2022}
}