General Questions:

Q1: Can we get the PPT slides that were presented during the Beijing training camp?
Answer: Yes, these PPT files will be uploaded to the ASC19 official website very soon: www.asc-events.org/ASC19/

Q2: We need a license to import data into the Teye application. Where can we get a Teye license?
Answer: You need to provide some information, such as your name, email address, the reason for the request and your MAC address. The license is a one-month trial. We provide technical support only for ASC, so we cannot answer questions about problems you encounter while installing the software.

Q3: The first task, designing an HPC system, requires the design to be based on the Inspur NF5280M5. Our team does not have an NF5280M5, so can we design the cluster and evaluate its power consumption theoretically?
Answer: The cluster design should be based on the Inspur NF5280M5, but you are not required to actually build it. Your design should satisfy the requirements and be reasonable, your theoretical analysis should be correct, and you should highlight the strengths of your design.

Q4: How do we submit the preliminary contest results?
Answer: Upload your preliminary contest results via FTP. The address of the FTP server is 47.94.106.168. We strongly recommend that all teams use an FTP client, such as FlashFXP or FileZilla, to connect to the server. The FTP account and password have been sent to the email addresses of the team advisor and team leader.
If you are outside mainland China and have problems uploading via FTP, you can use a cloud drive such as OneDrive instead and include the link in your proposal.
The FTP server runs on a cloud host with a maximum bandwidth of 50 Mb/s, so we suggest uploading your results as early as possible, or uploading the largest files first.
Please send your proposal to [email protected] and also upload it to the FTP server. Upload CESM and SISC results data to the FTP server only.
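For teams whose results sit on a Linux cluster without a graphical FTP client, a command-line upload is one alternative. The sketch below assumes curl is installed; the function name upload_results and the file names in the usage line are hypothetical, and USER/PASSWORD stand for the account details sent by email. With DRY_RUN=1 it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch: upload result files to the ASC19 FTP server with curl.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: upload_results USER PASSWORD FILE...   (hypothetical helper)
upload_results() {
  local user=$1 pass=$2 file
  shift 2
  for file in "$@"; do
    # -T uploads the file; a URL ending in "/" keeps the local file name.
    # -C - asks curl to resume a partial upload instead of restarting it.
    run curl -C - -T "$file" --user "$user:$pass" "ftp://47.94.106.168/"
  done
}
```

For example, `upload_results "$USER_NAME" "$PASSWORD" exp2_eval.tar.gz exp1_eval.tar.gz` sends the files in the order given, so list the largest first as recommended above. Passing the password on the command line makes it visible to other users via ps; curl's --netrc option reads credentials from ~/.netrc instead.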

About AI Application Task - SR

Q1: How can I calculate the RMSE score when I do not have the ground-truth images?
Answer: The committee does not supply the ground-truth (GT) images. You can build your own test set to evaluate the RMSE of your model.
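For reference, the usual definition of RMSE between a super-resolved output image ŷ and its ground-truth image y, each with N pixel values (all color channels counted), is given below. The committee's exact convention (pixel value range, color space, border cropping) is not stated in this FAQ, so treat it as an assumption. One common way to build your own test set is to downscale high-resolution images you already have, run your model on the downscaled versions, and compare the outputs with the originals.

```latex
\mathrm{RMSE}(\hat{y}, y) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}
```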

About HPC Application Task - CESM

Q1: In the notifications, the compset of EXP2 is required to be B1850CN in the title, and B in the text (screenshot below). Should the compset of EXP2 be B_2000(B) or B_1850_CN? They are different compsets and are likely to generate different results.
Answer: This is a typo. Please use the compset B_2000 (B).

Q2: When I run EXP1, some files are missing. The error message is: File was not found in svn repo: https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/ocn/docn7/SOM/UNSET
Answer: This happens because DOCN_SOM_FILENAME is UNSET in EXP1. You need to specify the data file for the ocean model, DOCN_SOM_FILENAME, with the following command:
./xmlchange -file env_run.xml -id DOCN_SOM_FILENAME -val pop_frc.gx1v6.091112.nc
This setting is also given in the notifications.

Q3: CESM stops because some input data is missing. How can I solve this problem?
Answer: We only provide the input files listed in the *.input_data_list file. If other data is missing when you run CESM, please download it from the following website:
https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/
User name: guestuser
Password: friendly
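When several files are missing, it may be quicker to let the case scripts fetch them. The sketch below assumes the check_input_data script in the CESM 1.2 case directory with its -inputdata, -check and -export options, and that DIN_LOC_ROOT is set in your shell to the same input-data root the case uses. The single-file fallback uses plain svn export with the guest account above; rel_path is a hypothetical placeholder you must fill in from the error message.

```bash
# Sketch only; run from your case directory ($CASEROOT).
# Assumption: DIN_LOC_ROOT holds the same input-data root the case uses.
: "${DIN_LOC_ROOT:?set DIN_LOC_ROOT to your input-data root first}"

# List the input files the case needs that are missing locally ...
./check_input_data -inputdata "$DIN_LOC_ROOT" -check
# ... and try to fetch them from the svn input-data server.
./check_input_data -inputdata "$DIN_LOC_ROOT" -export

# Fallback for one file: rel_path is a placeholder for the part of the failing
# URL after .../trunk/inputdata/ in the error message.
rel_path="replace/with/path/after/inputdata"
mkdir -p "$(dirname "$DIN_LOC_ROOT/$rel_path")"
svn export --username guestuser --password friendly \
  "https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/$rel_path" \
  "$DIN_LOC_ROOT/$rel_path"
```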

Q4: How should I control the run length to be 10 years? Should I set STOP_N to 3650 or STOP_DATE to 18600101 (the models seem to start at 1850)? 3650 days doesn't mean strictly 10 years considering leap years, and I want a precise definition of "ten-year run length" from you.
Answer: The run length is controlled by STOP_OPTION and STOP_N. You can set STOP_OPTION=nyears and STOP_N=10. In the ASC19 test cases, CALENDAR in env_build.xml is NO_LEAP, so you do not need to consider leap years. See Chapter 4, "Running CESM", of the CESM User's Guide to learn how to run the model.
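A minimal sketch of those two settings, using the same xmlchange syntax as the DOCN_SOM_FILENAME command in the EXP1 answer:

```bash
# Run from your case directory ($CASEROOT) before submitting the case.
./xmlchange -file env_run.xml -id STOP_OPTION -val nyears   # count run length in model years
./xmlchange -file env_run.xml -id STOP_N -val 10            # stop after 10 model years
```

Because CALENDAR is NO_LEAP, every model year has 365 days, so ten model years are exactly 3650 model days; STOP_OPTION=ndays with STOP_N=3650 would cover the same period.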

Q5: Which file does DOUT_S_ROOT/.../log refer to, and which files does Screen output (*log) mean?

Answer: A log file is generated every time you run the model. For example, atm.log.190119-081612.gz is the log file generated when you ran the model at 08:16:12 on January 19, 2019. You only need to provide the log files corresponding to your final result, not all of them. Here, .../ refers to the component directories atm, cpl, ocn and so on. Screen output (*log) means the screen output of the steps you executed.

Q6: Is the evaluation data the final data we computed for our results? I saw that this step is not required; does it affect the score?
Answer: You do not need to do this verification yourself. We will verify the evaluation data you submit, and whether or not you do the verification will not affect your score. You can check the accuracy of your results yourself if you wish.

Q7: Can we use CESM1.2.2-FASTCAM instead of CESM1.2.2?
Answer: I have not heard of CESM1.2.2-FASTCAM and could not find any information about it online. What does FASTCAM refer to here? Please provide some details about it.

Q8: After successfully running CESM, I found only one file, “ccsm_timing_stats”, and one empty “checkpoints” folder in the generated timing folder; there is no file containing the CESM TIMING PROFILE. I then changed the SAVE_TIMING setting in env_run.xml, but it did not help.
How can I generate the CESM TIMING PROFILE?
Answer: For information on timing data, please refer to http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/x1516.html .
The CCSM TIMING PROFILE is not a separate file; it is part of the $CASEROOT/timing/ccsm_timing.$CASE.$date file. If there is no timing/ccsm_timing.$CASE.$date file in the $CASEROOT directory, please modify the $CASE.run file. Find the following block:
if ($DOUT_S == 'TRUE') then
   echo "Archiving cesm output to $DOUT_S_ROOT"
   echo "Calling the short-term archiving script st_archive.sh"
   cd $RUNDIR; $CASETOOLS/st_archive.sh
endif
and add this line after endif to return to the case directory after archiving:
cd $CASEROOT

Q9: Can we use E3SM (Energy Exascale Earth System Model) instead of CESM 1.2.2?
Answer: E3SM requires relatively large computing resources, which many teams in the competition may not have. To keep results comparable across teams, we do not recommend using E3SM.

Q10: How should we save the desired output? The document says we should "extract" the five variables (U10,TS, PS, Z3, SST) "to the specific file". I've managed to read these variables from the history output files with MATLAB. However, each variable of each file is a 2D or 3D matrix. In what format should we save these variables?
Answer: The format should be netCDF (.nc). You can use netCDF tools to read and write .nc files; there are many, such as NCO and NCL, and MATLAB can also write .nc files. In my opinion, NCO is probably the most convenient.
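As one way to do this with NCO, the sketch below collects U10, TS, PS and Z3 from all monthly CAM history files of a case into one netCDF file. It assumes NCO is installed and that it runs in the directory holding the $CASE.cam.h0.*.nc files; the function name extract_cam_vars and the output name ${CASE}.cam.eval.nc are hypothetical. SST for EXP2 comes from the POP files instead, as described under Q17 and Q25.

```bash
#!/usr/bin/env bash
# Sketch: gather the four CAM evaluation variables into a single netCDF file.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: extract_cam_vars CASE   (hypothetical helper)
extract_cam_vars() {
  local case_name=$1
  # ncrcat concatenates the listed variables from all monthly h0 files along
  # the record (time) dimension; -O overwrites an existing output file.
  run ncrcat -O -v U10,TS,PS,Z3 "${case_name}".cam.h0.*.nc "${case_name}.cam.eval.nc"
}
```

For example, `extract_cam_vars exp1` writes exp1.cam.eval.nc; running it first with DRY_RUN=1 shows which files the glob picked up.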

Q11: Is there a way to verify the model results? We ran CESM on multiple clusters with different compilers, and the results all differ slightly. Optimizations like tuning the compiling parameters also have an impact on the results. Is there a reference result we can compare ours to?
Answer: You can use the AMWG diagnostics package to evaluate your results. It compares your results with observational data and with model data, and it can also calculate the RMSE. Because the package is too large for us to upload, please download it with the following commands:
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/previous_versions/amwg_diag5.6/
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/previous_versions/obs_data_5.6/
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/previous_versions/map_files_5.6/
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/cam35_data/

Q12: After running cesm_setup, I get the following problem and don't know where I made a mistake. Thanks for your help.

Answer: Your error is an expression syntax error. The preview_namelist script is a csh script, so make sure the C shell (csh) is installed on your machine.

Q13: I tried to run CE1's EXP1 with multiple threads and found the following errors when running. How can I solve this problem?

Answer: I looked at your three errors, and all of them occurred while reading the file rtmi.I1850CRUCLM45BGC.0241-01-01.R05_simyr1850_c130515.nc. Please check the size of this file; it may not have been uploaded completely. Try downloading and uploading the file again.

Q14: In what form should we submit the evaluation data: monthly averages, daily output, or some other frequency?
Answer: Each case produces one result. For CAM there are two kinds of history files, h0 and h1. Please provide the h0 (monthly) files.

Q15: After successfully setting up the two cases (EXP1 and EXP2) and running them for a short model time (5 days), we estimated that, with the default settings, running EXP1 and EXP2 for ten model years on our 4-node cluster (each node with two Xeon Silver 4110 CPUs) would take roughly a week of wall time per case. Is such a long execution time expected on a cluster like ours? If not, what execution time would you expect for these two cases on our cluster (or on yours)? While trying to reduce the execution time, we studied the CESM User's Guide and came up with a question: are we allowed to change the coupling interval settings in env_run.xml (such as NCPL_BASE_PERIOD and ATM_NCPL)?
Answer: I used 64 cores on my cluster, and each experiment took about one day before any optimization. The ten-year run length was chosen with every team's computing resources and the verification of results in mind. ASC cares about your optimization methods, not about the size of your computing resources. If your computing resources are really limited, you can consider shortening the run, but do not shorten it too much. This will not have much impact on your score, but please be sure to explain it in the final discussion. Please do not modify NCPL_BASE_PERIOD or ATM_NCPL, as this may affect the results.

Q16: I was surprised not to find 'configure', which is required to build the model, in my cesm1_2_1/models/utils/pio directory. The ASC19 Preliminary Contest Notifications say "If you couldn’t download the pio and genf90 successfully, you could download them form the following websites:", but on Baidu SkyDrive I can only find the CESM input data.
Answer: The file ASC19_CESM_inputdata.tar.bz2 on Baidu SkyDrive includes the model source code cesm1_2_2. Please copy cesm1_2_2/models/utils/pio into your source code.

Q17: Where can I find the five evaluation variables?
Answer: U10, TS, PS and Z3 are in the $CASE.cam.h0.*.nc files. For EXP2, SST is in the $CASE.pop.h.*.nc files. In EXP1 the ocean model is “dead”, so you do not need to provide SST.
If the pop files for EXP2 do not contain an SST variable, do the following before running the case (see the question “CAM: How do I use B compset history output to create SST/ICE data files to drive an F compset?” in Chapter 6 of the CESM User's Guide):
Save monthly averaged SST from pop2. To do this, copy $CCSMROOT/models/ocn/pop2/input_templates/gx1v6_tavg_contents to $CASEROOT/SourceMods/src.pop2 and change the 2 in front of SST to 1 for monthly frequency.
If you have already run the case successfully and do not want to run it again, use the following procedure to obtain SST from the TEMP variable:
ncrcat -v TEMP $CASE.pop.h.${yyyy}-${mm}.nc temp.${yyyy}-${mm}.nc
ncra -O -h -F -d z_t,1,1 temp.${yyyy}-${mm}.nc sst.${yyyy}-${mm}.nc

Q18: How large is the output of these two cases over 10 years? I would like to know the 10-year output size of EXP1 and of EXP2.
Answer: After extracting the five variables, the size of the two cases should be around 4.5 GB after compression.

Q19: When verifying the results, do I need to write an RMSE program to compare them, or can I use existing tools? For example, can the cprnc tool be used for the comparison? Is the comparison made against a ten-year baseline result?
Answer: The review will use the CESM AMWG diagnostics package; you can also write your own RMSE program. Results are compared against both observations and model data.
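If you want a quick RMSE of your own, NCO can compute one without a separate program. The sketch below assumes NCO is installed and that both files hold the variable on the same grid with the same number of time records; the function name nco_rmse and the diff_*/rmse_* file names are hypothetical. It is a plain, unweighted RMSE over all grid points and times, so it will not reproduce the AMWG package's statistics exactly.

```bash
#!/usr/bin/env bash
# Sketch: unweighted RMSE of one variable between two netCDF files using NCO.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: nco_rmse VAR MINE.nc REFERENCE.nc   (hypothetical helper)
nco_rmse() {
  local var=$1 mine=$2 ref=$3
  # diff = mine - ref, for the chosen variable only
  run ncdiff -O -v "$var" "$mine" "$ref" "diff_${var}.nc"
  # root mean square over every dimension (no -a given), i.e. the RMSE
  run ncwa -O -y rms -v "$var" "diff_${var}.nc" "rmse_${var}.nc"
  # print the resulting single value
  run ncks -H -v "$var" "rmse_${var}.nc"
}
```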

Q20: Do we still need to do result verification and error analysis?
Answer: For verification, you only need to provide the data for these variables. We will evaluate it with the diagnostics package provided for CESM, so you do not need to verify it yourself. If you want to check the correctness of your results, you can download the diagnostics package and verify them yourself. For more information on the extracted variables and verification, see the previously published FAQ summary.

Q21: What does running ten years mean? Which one do I need to generate, .cam.h1.0010-01-01-00000.nc or .cam.h1.0011-01-01-00000.nc?
Answer: After running ten years, you should have generated .cam.h1.0011-01-01-00000.nc.

Q22: We tried to compile CESM with both the PGI and GNU compilers. With the GNU compiler we enabled -O2 optimization and everything works fine. With the PGI compiler, when we enable -O2 (and the fast flag), compilation succeeds, but CLM cannot be initialized at runtime and all processes hang. If we use the PGI compiler without optimization flags as our baseline (since adding optimization flags breaks the run) and optimize from there, will it affect our final score?
Answer: I cannot diagnose your problem from the description "the CLM cannot be initialized normally at runtime, and all the processes are stuck." It may be a PE layout problem, a netCDF library problem or something else. Try compiling netCDF with the same PGI compiler. If the problem persists, please attach your log files (cesm.log*, cpl.log* and clm.log*), your env_mach_pes.xml and your Macros file.

Q23: What do the Command line file (*.sh) and the Screen output (*.log) mean? Does Evaluation data mean I should move the related nc files into a folder named “Evaluation data”? And how can I view the variable values in an nc file?
Answer: Command line file (*.sh): put the steps from create_newcase to the run in a script file and name it exp1.sh or exp2.sh.
Screen output (*.log): the screen output of that script, which can be named exp1.log or exp2.log.
Since the netCDF library is already installed on your Linux system, you can use the ncdump command to view the variable values in an nc file. Put the nc files in a folder named Evaluation data.
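A command line file might look like the following sketch, which assumes the CESM 1.2 workflow create_newcase, cesm_setup, $CASE.build and $CASE.submit. The CESM root, resolution, compset and machine name are placeholders you must take from the notifications and your own setup, and the function name run_experiment is hypothetical. With DRY_RUN=1 it only prints the steps.

```bash
#!/usr/bin/env bash
# exp1.sh: sketch of a "command line file" covering create_newcase through submit.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: run_experiment CCSMROOT CASE RES COMPSET MACH   (hypothetical helper)
run_experiment() {
  local ccsmroot=$1 case_name=$2 res=$3 compset=$4 mach=$5
  run "$ccsmroot/scripts/create_newcase" -case "$case_name" -res "$res" \
      -compset "$compset" -mach "$mach"
  run cd "$case_name"
  # Put the case-specific ./xmlchange settings here (run length, DOCN_SOM_FILENAME, ...).
  run ./cesm_setup
  run "./${case_name}.build"
  run "./${case_name}.submit"
}

# Only start a run when all five arguments are given.
if [[ $# -eq 5 ]]; then
  run_experiment "$@"
fi
```

Running it as `./exp1.sh "$CCSMROOT" exp1 RES COMPSET MACH 2>&1 | tee exp1.log` (with RES, COMPSET and MACH replaced) keeps the screen output in exp1.log. To inspect a result file, `ncdump -h file.nc` prints its header (dimensions, variables, attributes) and `ncdump -v TS file.nc` prints the values of TS.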

Q24: I got an error while running EXP1. Here is the error message from the .log file:
[[email protected]] HYD_pmcd_pmi_alloc_pg_scratch (pm/pmiserv/pmiserv_utils.c:527): assert (pg->pg_process_count * sizeof(struct HYD_pmcd_pmi_ecount)) failed
[[email protected]] HYD_pmci_launch_procs (pm/pmiserv/pmiserv_pmci.c:108): error allocating pg scratch space

I think something must be wrong with MPI, but I can run the test cases fine. Is running input_EXP1 limited by memory, storage space or computing capability? I am testing on my own computer.
Answer: I think something is wrong with your MPI installation. Limits on memory, storage space or computing capability would not produce this error. The following thread reports the same error and may help you solve it:
https://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-March/012006.html

Q25: How do I extract the SST variable from the history files?
Answer: If you have already run the case successfully and do not want to run it again, use the following procedure to obtain SST from the TEMP variable:
ncrcat -v TEMP $CASE.pop.h.${yyyy}-${mm}.nc temp.${yyyy}-${mm}.nc
ncra -O -h -F -d z_t,1,1 temp.${yyyy}-${mm}.nc sst.${yyyy}-${mm}.nc

After these steps, sst.${yyyy}-${mm}.nc contains the sea surface temperature, but the variable is still named TEMP. You can rename it with ncrename:
ncrename -v TEMP,SST sst.${yyyy}-${mm}.nc
Note that the evaluation data you provide should come from the monthly files, i.e. $CASE.cam.h0.${yyyy}-${mm}.nc and $CASE.pop.h.${yyyy}-${mm}.nc.
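To process every month of a run, the ncrcat / ncra / ncrename steps above can be chained in a loop. This sketch assumes NCO is installed, that it runs in the directory holding the POP history files, and that model years are numbered from 0001 as the h1 file names in Q21 suggest; the function name extract_sst is hypothetical, and -O is added so reruns overwrite earlier outputs.

```bash
#!/usr/bin/env bash
# Sketch: build monthly SST files from POP history output for a range of model years.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: extract_sst CASE FIRST_YEAR LAST_YEAR   (hypothetical helper)
extract_sst() {
  local case_name=$1 first_year=$2 last_year=$3
  local y m yyyy mm
  for ((y = first_year; y <= last_year; y++)); do
    for ((m = 1; m <= 12; m++)); do
      printf -v yyyy '%04d' "$y"   # history files use 4-digit years: 0001, 0002, ...
      printf -v mm '%02d' "$m"
      # Keep only TEMP from the monthly POP history file.
      run ncrcat -O -v TEMP "${case_name}.pop.h.${yyyy}-${mm}.nc" "temp.${yyyy}-${mm}.nc"
      # Take the top ocean level (z_t index 1, Fortran-style with -F): the sea surface.
      run ncra -O -h -F -d z_t,1,1 "temp.${yyyy}-${mm}.nc" "sst.${yyyy}-${mm}.nc"
      # Rename TEMP to SST in place.
      run ncrename -v TEMP,SST "sst.${yyyy}-${mm}.nc"
    done
  done
}
```

For a ten-year EXP2 run, `extract_sst exp2 1 10` produces sst.0001-01.nc through sst.0010-12.nc. If you prefer a single file, `ncrcat -O sst.*.nc sst_all.nc` should concatenate them along the time dimension.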

Q26: If we cannot provide all ten years of data, how will that affect our results?
Answer: The result evaluation carries a relatively small part of the score compared with the optimization methods in your proposal. How much of the run you provide has little effect on the score, but if you provide none, we cannot verify whether your optimization plan is correct.
We will calculate multiple sets of RMSE based on all teams' submissions. Incomplete runs will not affect your score unless you provide no results at all.

Q27: How can I find the history data?
Answer: If a case runs successfully, the history data is stored in the DOUT_S_ROOT you set. If you set a 10-year run but only completed 5 years because of machine limits, you can find the history data in the EXEROOT you set.

Q28: We managed to run CESM on a single node, but we encountered segmentation faults when we tried to apply it to our cluster.

Answer:
Try using the same NTASKS for all components in env_mach_pes.xml with the number of threads (NTHRDS) set to 1, and see whether a similar error occurs.
Check your mkbatch file for the following commands:
limit stacksize unlimited
limit coredumpsize unlimited
If that solves the problem, make sure your PE settings meet the model's requirements when you set them up. If you still have problems, please attach your atm log file, since the error occurred while initializing the atm component.
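The uniform-layout test suggested above can be scripted. This sketch assumes the CESM 1.2 env_mach_pes.xml variable names NTASKS_<COMP>, NTHRDS_<COMP> and ROOTPE_<COMP> and the component list ATM LND ICE OCN CPL GLC ROF WAV; the function name set_uniform_pes is hypothetical, and setting every ROOTPE to 0 (all components sharing the same tasks) is our own reading of "the same NTASKS".

```bash
#!/usr/bin/env bash
# Sketch: give every component the same task count and a single thread.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: set_uniform_pes NTASKS   (hypothetical helper; run from $CASEROOT)
set_uniform_pes() {
  local ntasks=$1 comp
  for comp in ATM LND ICE OCN CPL GLC ROF WAV; do
    run ./xmlchange -file env_mach_pes.xml -id "NTASKS_${comp}" -val "$ntasks"
    run ./xmlchange -file env_mach_pes.xml -id "NTHRDS_${comp}" -val 1
    # Assumption: start every component on task 0 (fully sequential layout).
    run ./xmlchange -file env_mach_pes.xml -id "ROOTPE_${comp}" -val 0
  done
}
```

After changing the PE layout, the case normally has to be set up again (in CESM 1.2, ./cesm_setup -clean followed by ./cesm_setup) and rebuilt. In a bash job script, the equivalents of the two csh limit lines are ulimit -s unlimited and ulimit -c unlimited.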

Q29: When I run EXP2, the CSM execution finishes but there are no output files. I think the run failed, because there is no "SUCCESSFUL TERMINATION" in the cpl log file, but there is not enough information about the error. As the attached log shows, there is no hint, just that MPI killed the processes. I also ran 'ulimit -s unlimited', and the thread count is 1, so the OpenMP stack size is not the cause.
Answer: According to your cesm.log file, the error happened while the output file was being written. Please check your computer's resources, such as memory and storage space. The CESM machine file you provided looks fine; I can run it properly on my machine. Your log file only shows that the model was killed, without detailed error information. You can add -traceback (ifort) or -fbacktrace (gfortran) to FFLAGS when compiling the model, so that you can find out which routine caused MPI to kill the run.

Q30: I found that the program did not stop running even after I left the process running for more than 10 hours.
Answer: Your log file shows that the error occurred when calling mpi_init(ierr). It seems that MPI_PATH, INC_MPI and LIB_MPI are not set correctly. Please check them.

Contact Us
Technical Support Yu Liu, Weiwei Wang [email protected]
Media Jie He [email protected]
Collaboration Vangel Bojaxhi [email protected]
General Information [email protected]

 

Copyright 2019 Asia Supercomputer Community. All Rights Reserved