Finals

Summary

Q1: Can we use a pre-trained model for the FSR problem? Is it a must to train the model with the training dataset provided by the committee?
Answer: A pre-trained model is allowed to be used in the final competition. There is no strict requirement to train the model with the training dataset provided by the committee.

Q2: When studying wtdbg2, we found that the result might vary under the same configuration (same software environment, hardware environment, and parameters), so how can we verify the correctness of the output?
Answer: Because there are random functions in the application, the results will differ slightly; this is normal. You can check the values of "TOT", "N50" and other indicators recorded in the log file and compare them with the standard results of your test data (if your test data provides a reference result). If the error is within 15%, the result is correct.

Q3: Regarding the math optimization, which optimizations are allowed?
Answer: Please note the sentence in the final competition notification: you must ensure that the new algorithm is mathematically equivalent to the original one.

Q4: Which namelist parameters can be modified?
Answer: There are many parameters in the namelist variables. For details, please refer to http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/modelnl/modelnl.html
Because the final competition is mainly judged on running time, the parameters in the namelist files are in principle not allowed to be modified. However, the namelist parameters related to communication and the process division used for parallelism can be tried. Which parameters may be modified will be announced in the on-site instructions.

Q5: Regarding the FAQ on the official website of the competition, we have a doubt: the FAQ mentioned that the example provided in the preliminary round can be run with 64 cores. We are confident that this figure is not accurate, and we hope the committee can check it carefully and give a reasonable explanation.
Answer: The earlier reply in the preliminary round was a mistake. After checking our previous runs, the preliminary case took about one day using 192 cores on our platform. Please check the results accordingly.

About SR

Q1: Where can I get the download link of the dataset provided by the committee?
Answer: The training dataset and test dataset will not be provided before the final competition.

Q2: Is achieving a higher IS value the only requirement in the final competition?
Answer: In the final competition, the IS value will be the only metric used to measure the performance of the FSR model.

Q3: Are the images in the dataset similar to the example in the notification?
Answer: Yes, the images in the dataset provided by the committee will be similar to the example in the final competition notification. There will be some background noise in the face images.

Q4: In SphereFace paper, the cropped faces are obtained by similarity transformation with face landmarks detected by MTCNN. I'm wondering if the same preprocessing method will be used in the ASC contest as well.  If so, will the landmark detecting code be given on the spot in the final competition?
Answer: MTCNN will not be used in the final competition. So, the landmark detecting code will not be provided.

Q5: Can we use a new network in the final competition?
Answer: A new network and a new model are allowed to be used in the final competition.

Q6: What is the maximum size of face images? Will it exceed the size of 150x150?
Answer: The size of LR and SR images will be given before the final competition. The size will not exceed 150x150.

Q7: Will the landmarks of the face images be provided in the final competition?
Answer: The landmarks will not be used in the scoring script.

Q8: Can we use Horovod for PyTorch to accelerate and parallelize the training process?
Answer: In the final competition, Horovod is allowed to be used to accelerate the training process.
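For reference, below is a minimal sketch (not an official template) of how Horovod typically wraps a PyTorch training setup. The tiny convolution model, random tensors, batch size and learning rate are placeholders for your own FSR network and training data.

import torch
import horovod.torch as hvd

hvd.init()                                    # one worker process per GPU
torch.cuda.set_device(hvd.local_rank())       # pin each worker to its local GPU (assumes CUDA)

# Stand-in model and data to keep the sketch self-contained;
# replace them with your own FSR network and training set.
model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda()
dataset = torch.utils.data.TensorDataset(torch.randn(256, 3, 32, 32),
                                         torch.randn(256, 3, 32, 32))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4 * hvd.size())
# Average gradients across all workers on every optimizer step.
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
# Start every worker from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

# Give each worker a distinct shard of the data.
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(dataset, batch_size=16, sampler=sampler)
# Launch with, for example:  horovodrun -np 4 python train.py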

Q9: How many pictures are there in the training set?
Answer: The number of images in the training dataset provided by the committee will not be less than 5,000. The number of images in the test dataset will not be less than 100.

Q10: Are there side faces?
Answer: There will be side-face images in the training dataset and the test dataset.

Q11: Can we use other datasets to train our model?
Answer: Any training dataset is allowed to be used in the final competition.

Q12: Is it necessary to train the model in the final competition?
Answer: It is not strictly required to train the FSR model during the final competition.

Q13: Will the final dataset include the labels (like the names of the people) of the face images?
Answer: Labels are not required in the FSR challenge; therefore, they will not be provided.

Q14: Will the final scoring script include the evaluation model's network architecture and parameters so that we can use them for self-evaluation?
Answer: The scoring script will contain the evaluation model's network architecture and parameters.

Q15: Will the resolution of the face images be fixed in the final competition?
Answer: The resolution of the face images will be fixed in the final competition.

Q16: Is there any noise in the face images (such as background, clothes)?
Answer: There will be some background and clothes in the face images.

Q17: Will the face detection model be used to extract the face of the SR picture and then evaluate the result?
Answer: The face detection model will not be used in the final competition.

Q18: When can I get the scoring script?
Answer: The scoring script will be provided on site at the final competition.

About WTDBG

Q1: Which version of WTDBG should I select? The latest version of WTDBG is V2.4, but the git link for WTDBG in the "ASC Final Competition Notification Techsupport" is V2.3. Can I use the latest WTDBG V2.4?
Answer: The version of WTDBG we announced in the "ASC19 Final Competition Notification Techsupport" is V2.3, but that is just an example. In fact, we do not restrict the version of the software. If there is an updated version of WTDBG, we recommend using the latest one.

Q2: When studying wtdbg2, we found that the result might vary under the same configuration (same software environment, hardware environment, and parameters), so how can we verify the correctness of the output?
Answer: Because there are random functions in the application, the results will differ slightly; this is normal. You can check the values of "TOT", "N50" and other indicators recorded in the log file and compare them with the standard results of your test data (if your test data provides a reference result). If the error is within 5%, the result is correct.

About CESM

Q1: Can we use other CESM versions (such as CESM2), or can we only use CESM1.2.2?
Answer: You should use CESM 1.2.2 as much as possible. The final competition cases were tested with CESM 1.2.2, and it is not guaranteed that CESM2 can run them correctly.

Q2: When studying CESM, I found that there is an external library (ESMF) that can be used as the coupler/driver of CESM. The question is: Are we allowed to use that external library? Will it affect the output result of CESM?
Answer: You can use the ESMF.

Q3: As for CESM, we found that the outputs are volatile with some compilation flags, PE layouts and other optimizations. Although the evaluation method has been specified in the Preliminary Contest Notifications, its acceptable RMSE interval is still unknown, which makes us afraid that our modifications may generate a result that will be regarded as a wrong answer. Could you specify the evaluation in the final competition?
Answer: The RMSE is calculated by the diagnostics package of CESM1. For the reference values, please refer to the following website:
http://www.cesm.ucar.edu/experiments/cesm1.0/diagnostics/trk1_1deg_chm_1850_b55.01/atm_70-99-obs/set1/table_GLBL_ANN_obs.asc

The website is for reference only, and the reference values for different questions may differ. As long as the error does not exceed 1/10 of the reference value, we consider the result correct. Under normal circumstances, the error caused by compile options and optimizations will be within this range.

Q4: In CESM 1.2.2, the user can modify the default namelist variables by editing the user_nl_* files to make dynamic adjustments to a module's operation. We found that some namelist variables can change the algorithm used by the model, which has a certain impact on the efficiency of the model. Our question is: are there namelist variables that are not allowed to be modified? If so, we hope the committee can explain further.
Answer: So that all teams use the same algorithm, the namelist variables are not allowed to be modified.

Q5: We would like to know more details about the types of CESM runs tested at the final competition, more specifically the compsets and the duration of the simulations. If we know the duration of each simulation, we can estimate a goal for optimization.
Answer: We can't tell you more details about CESM compsets and duration of simulations right now. But what we can tell you is that the compset will contain the CAM module. You just need to make sure your optimization is fast enough.

Preliminary

General Questions:

Q1: Do you have the PPTs that were displayed during the Beijing training camp?
Answer: Yes, these PPT files will be uploaded to the ASC19 official website very soon: www.asc-events.org/ASC19/

Q2: We need a license to be able to import data into the Teye application. Where can we get the Teye license?
Answer: You should provide some information, such as your name, email, reason for the request, and MAC address. You only have a one-month trial period. We only provide technical support for ASC, so we may not be able to answer questions you encounter during the software installation.

Q3: The first question about designing the HPC system requires it to be based on the Inspur NF5280M5, but our team does not have an NF5280M5. Can we design the cluster and the power evaluation theoretically?
Answer: The cluster design should be based on the Inspur NF5280M5, but you are not required to actually build it. Your design should satisfy the requirements, be reasonable, have correct theoretical analysis, and highlight the strong points of the design.

Q4: How do we submit the preliminary contest results data?
Answer: You can use FTP to upload your preliminary contest results data. The address of the FTP server is: 47.94.106.168. We highly recommend that all teams use an FTP client to connect to the FTP server, such as FlashFXP, FileZilla, etc. The FTP account and password have been sent to the email of the team advisor and team leader.
If you are outside mainland China and have problems with FTP uploading, you can choose a network disk such as OneDrive and include the address in your proposal.
The FTP server uses cloud hosts with a maximum bandwidth of 50 Mb/s, so we suggest you upload your results data as early as you can, or upload the largest data files in advance.
Please send your proposal to [email protected] and also upload it to the FTP server. The CESM and SISC results data should only be uploaded to the FTP server.

About AI Application Task - SR

Q1: I want to know how to calculate the RMSE score, as I do not have the ground-truth images.
Answer: The committee does not supply the GT images; you can build your own test set to evaluate the RMSE of your model.
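The official scoring script was not released with the preliminary notification, so the following is only a rough self-check sketch of a pixel-wise RMSE between a super-resolved image and its ground-truth image; the file names are hypothetical.

import numpy as np
from PIL import Image

def rmse(sr_path, hr_path):
    # Load both images as float arrays and compute the pixel-wise RMSE.
    sr = np.asarray(Image.open(sr_path), dtype=np.float64)
    hr = np.asarray(Image.open(hr_path), dtype=np.float64)
    return float(np.sqrt(np.mean((sr - hr) ** 2)))

print(rmse("sr_0001.png", "hr_0001.png"))     # hypothetical file names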

About HPC Application Task - CESM

Q1: In the notifications, the compset of EXP2 is required to be B1850CN in the title, and B in the text (screenshot below). Should the compset of EXP2 be B_2000 (B) or B_1850_CN? They are different compsets and are likely to generate different results.
Answer: This is a typo. Please choose the compset B_2000 (B).

Q2: When I run EXP1, some files are missing. The error message is as follows: File was not found in svn repo: https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/ocn/docn7/SOM/UNSET
Answer: This is because DOCN_SOM_FILENAME in EXP1 is UNSET. You should specify the data file name of the ocean model, "DOCN_SOM_FILENAME", with the following command:
./xmlchange -file env_run.xml -id DOCN_SOM_FILENAME -val pop_frc.gx1v6.091112.nc
You can find it in the notifications.

Q3: When running CESM, it stops because of some missing input data. How can I solve this problem?
Answer: We only provide the input files mentioned in the file *.input_data_list. If data are missing when running CESM, please download them from the following website:
https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/
User name: guestuser
Password: friendly

Q4: How should I control the run length to be 10 years? Should I set STOP_N to 3650 or STOP_DATE to 18600101 (the models seem to start at 1850)? 3650 days does not strictly mean 10 years considering leap years, and I would like a precise definition of the "ten-year run length".
Answer: The run length can be controlled with STOP_OPTION and STOP_N. You can set STOP_OPTION=nyears and STOP_N=10. In the ASC19 tests, the CALENDAR in env_build.xml is NO_LEAP, so leap years do not need to be considered here. You can refer to Chapter 4, "Running CESM", of the CESM user guide to learn how to run the model.

Q5: Which file does the DOUT_S_ROOT/.../log refer to, and which files does Screen output (*log) mean?
Answer: There will be a log file every time you run the model. For example, atm.log.190119-081612.gz refers to the log file generated when you ran the model at 8:16:12 on January 19, 2019. You only need to provide the log file corresponding to your final result; you do not need to provide all of them. Here ".../" refers to atm, cpl, ocn and so on. The Screen output (*log) means the screen output of the submitted run step.

Q6: Is the Evaluation data the last data we measured for our results? I saw that this step is not necessary; does it have an impact on scores?
Answer: You do not need to do this verification yourself; we will verify the Evaluation data you submit, and whether or not you do the verification yourself will not affect the score. You can verify the accuracy of the results yourself if you wish.

Q7: Can we use CESM1.2.2-FASTCAM instead of CESM1.2.2?
Answer: I have not heard of cesm1.2.2-FASTCAM, and I have not found the corresponding information on the website. What does FASTCAM refer to here? Please provide some information about FASTCAM.

Q8: After I successfully ran CESM, I found that there is only one file, "ccsm_timing_stats", and one empty "checkpoints" folder in the generated timing folder, and there is no file related to the CESM TIMING PROFILE. Later I changed the "SAVE_TIMING" setting in the env_run.xml file, but it still did not help.
How can I solve the problem of generating the CESM TIMING PROFILE?
Answer: For information about the timing data, please refer to http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/x1516.html .
The CCSM TIMING PROFILE is not a separate file; it is part of the $CASEROOT/timing/ccsm_timing.$CASE.$date file. If there is no timing/ccsm_timing.$CASE.$date file in the $CASEROOT directory, please modify the $CASE.run file:
if ($DOUT_S == 'TRUE') then
   echo "Archiving cesm output to $DOUT_S_ROOT"
   echo "Calling the short-term archiving script st_archive.sh"
   cd $RUNDIR; $CASETOOLS/st_archive.sh
endif
After this block you can add:
cd $CASEROOT

Q9: Can we use E3SM (Energy Exascale Earth System Model) instead of CESM 1.2.2?
Answer: E3SM is likely to require relatively large computing resources, and in the competition many teams' computing resources may not meet its requirements. To keep your results comparable with other teams', we do not recommend using E3SM.

Q10: How should we save the desired output? The document says we should "extract" the five variables (U10, TS, PS, Z3, SST) "to the specific file". I have managed to read these variables from the history output files with MATLAB. However, each variable of each file is a 2D or 3D matrix. In what format should we save these variables?
Answer: The format should be an nc file. You can use netCDF software to read and also write nc files. There are many netCDF tools, such as nco and ncl, and MATLAB can write nc files too. nco is probably the most convenient.
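For reference, here is a small Python sketch (using the netCDF4 package) that copies the four atmosphere variables from a CAM monthly history file into a new nc file. The file names are hypothetical, and an nco command such as ncks -v should achieve much the same result.

import netCDF4

src = netCDF4.Dataset("case.cam.h0.0010-12.nc")           # hypothetical input history file
dst = netCDF4.Dataset("evaluation.cam.0010-12.nc", "w")   # hypothetical output file

for name in ("U10", "TS", "PS", "Z3"):
    var = src.variables[name]
    # Create any dimensions this variable needs that do not exist yet.
    for dim in var.dimensions:
        if dim not in dst.dimensions:
            dst.createDimension(dim, src.dimensions[dim].size)
    out = dst.createVariable(name, var.dtype, var.dimensions)
    out[:] = var[:]

dst.close()
src.close()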

Q11: Is there a way to verify the model results? We ran CESM on multiple clusters with different compilers, and the results all differ slightly. Optimizations like tuning the compiling parameters also have an impact on the results. Is there a reference result we can compare ours to?
Answer: You can use the AMWG diagnostics package to evaluate your results. It can compare your results with observation data and model data, and it can calculate the RMSE too. Because the diagnostics package is too big to upload, you can download it with the following commands:
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/previous_versions/amwg_diag5.6/
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/previous_versions/obs_data_5.6/
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/previous_versions/map_files_5.6/
svn export https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/cam35_data/

Q12: After running cesm_setup, I get the following problem and don't know where I made a mistake. Thanks for your help.

Answer: Your mistake is in the expression syntax. The preview_namelist script is a csh file; you should make sure that there is a C shell (csh) on your machine.

Q13: I tried to run EXP1 of CESM with multiple threads and found the following errors when running. How can I solve this problem?

Answer: I looked at your three errors, and all of them occurred when reading the file rtmi.I1850CRUCLM45BGC.0241-01-01.R05_simyr1850_c130515.nc. Please check the size of the file; perhaps it was not uploaded successfully during the upload process. You can try to download and upload the file again.

Q14: In what form should we submit the evaluation data? Is it the monthly average, is it written once a day, or how often?
Answer: Each case will output one set of results. For CAM, there will be two kinds of files, h0 and h1. Please provide the h0 (monthly) files.

Q15: After successfully setting up the two cases (EXP1 and EXP2) and running for a short model time (5 days), we found that with the default settings, running EXP1 and EXP2 for ten model years on our 4-node cluster (each node with two Xeon Silver 4110 CPUs) would take roughly a week of wall time for each case. Are we expected to observe such a long execution time on such a cluster? If not, what execution time do you expect for these two cases on our cluster (or yours)? On the other hand, while trying to reduce the execution time, we studied the CESM User Guide and came up with a question: are we allowed to change the coupling interval settings in env_run.xml (such as NCPL_BASE_PERIOD and ATM_NCPL)?
Answer: I used 64 cores on my cluster, and each EXP took about 1 day before any optimizations were done. Taking into account everyone's computing resources and the verification of the results, the current run length is ten years. The ASC competition focuses not on your computing resources but on your optimization methods. If your computing resources are really limited, you can consider shortening your run length, but do not shorten it too much; this will not have much impact on your score, but please be sure to explain it in your final discussion. Please do not modify NCPL_BASE_PERIOD and ATM_NCPL, as this may affect the results.

Q16: I am surprised to find that in my cesm1_2_1/models/utils/pio directory I cannot find 'configure', which is required to build the model. In the ASC19 Preliminary Contest Notifications there is the sentence "If you couldn’t download the pio and genf90 successfully, you could download them form the following websites:", but in the Baidu SkyDrive I can only find the CESM input data.
Answer: The file ASC19_CESM_inputdata.tar.bz2 in the Baidu SkyDrive includes the model source code cesm1_2_2; please copy cesm1_2_2/models/utils/pio to your source code.

Q17: Where can I find the data for the five evaluation variables?
Answer: The variables U10, TS, PS and Z3 can be found in the $CASE.cam.h0.*.nc files. The variable SST can be found in the $CASE.pop.h.*.nc files for the EXP2 case. For the EXP1 case, the ocean model is "dead", so you do not need to provide the SST variable.
If your pop files do not have an "SST" variable for the EXP2 case, you can do the following steps before running the case; refer to the question "CAM: How do I use B compset history output to create SST/ICE data files to drive an F compset?" in Chapter 6 of the CESM USER GUIDE.
Save monthly averaged SST information from pop2. To do this, copy $CCSMROOT/models/ocn/pop2/input_templates/gx1v6_tavg_contents to $CASEROOT/SourceMods/src.pop2 and change the 2 in front of SST to 1 for monthly frequency.
If you have already run the case successfully and do not want to run it again, please do the following procedure to obtain the SST from the TEMP variable:
ncrcat -v TEMP $CASE.pop.h.${yyyy}-${mm}.nc temp.${yyyy}-${mm}.nc
ncra -O -h -F -d z_t,1,1 temp.${yyyy}-${mm}.nc sst.${yyyy}-${mm}.nc

Q18: How big are the output files of these two cases over 10 years? I want to know the 10-year output data size of EXP1 and of EXP2.
Answer: After extracting the five variables, the total size for the two cases should be around 4.5 GB after compression.

Q19: When verifying the results, do I need to write an RMSE procedure to compare them, or can I use related tools? For example, can cprnc in the tools directory be used for comparison verification? Is the comparison made against the results of a ten-year baseline?
Answer: The review will use the CESM AMWG diagnostics package, or you can write your own RMSE program. The verification results will be compared with both the observations and the model data.
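As a rough self-check before the official AMWG-based review, one possible way to compute the RMSE of a single field between two history files is sketched below; the file names and the choice of the TS variable are hypothetical.

import numpy as np
import netCDF4

def field_rmse(file_a, file_b, varname):
    # Read the same variable from both files and compute the RMSE over all points.
    a = np.asarray(netCDF4.Dataset(file_a).variables[varname][:], dtype=np.float64)
    b = np.asarray(netCDF4.Dataset(file_b).variables[varname][:], dtype=np.float64)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical file names: your optimized run vs. an unoptimized baseline run.
print(field_rmse("opt.cam.h0.0010-12.nc", "base.cam.h0.0010-12.nc", "TS"))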

Q20: Do we still need to do result verification and error analysis?
Answer: For verification, you only need to provide the data for these variables. We will use the diagnostics package provided by CESM to do the evaluation; you do not need to verify it yourself. If you want to check the correctness of your results, you can download the diagnostics package and verify them yourself. For more information on the extracted variables and verification, see the frequently asked questions summary published earlier.

Q21: What does running ten years mean? Which one do I need to generate, .cam.h1.0010-01-01-00000.nc or .cam.h1.0011-01-01-00000.nc?
Answer: After running ten years, you should generate the .cam.h1.0011-01-01-00000.nc file.

Q22: We tried to compile CESM with the PGI compiler and the GNU compiler. With the GNU compiler we turned on O2-level optimization and everything works fine. But with the PGI compiler, when we turn on O2-level optimization (and the fast flag), compilation passes normally, but CLM cannot be initialized at runtime and all the processes hang. If we end up using the PGI compiler without the compiler optimization options (since adding the optimization flags does not work properly), will it affect our final score?
Answer: I cannot confirm your problem from the sentence "the CLM cannot be initialized normally at runtime, and all the processes are stuck." It may be a PE setting problem, a netCDF library mistake, or another reason. You need to try to compile netCDF with the same PGI compiler. If there is still a problem, please attach your log files (cesm.log*, cpl.log* and clm.log*), your env_mach_pes.xml and the Macros file.

Q23: What do the Command line file (*.sh) and the Screen output (*.log) mean? Does the Evaluation data mean that I should move the related nc files to a folder named "Evaluation data"? And how can I view the variable values of an nc file?
Answer: Command line file (*.sh): you can put the steps from create_newcase to run in a script file and name it exp1.sh or exp2.sh.
Screen output (*.log): it can likewise be named exp1.log or exp2.log.
On a Linux system, since the netCDF library is already installed, you can use the ncdump command to view the variable values in an nc file. You can put the nc files in a folder named "Evaluation data".

Q24: I got an error when running EXP1; here is the error message from the .log file:
[[email protected]] HYD_pmcd_pmi_alloc_pg_scratch (pm/pmiserv/pmiserv_utils.c:527): assert (pg->pg_process_count * sizeof(struct HYD_pmcd_pmi_ecount)) failed
[[email protected]] HYD_pmci_launch_procs (pm/pmiserv/pmiserv_pmci.c:108): error allocating pg scratch space

I think there must be something wrong with MPI, but I could run the test cases well. The problem is: is the running of input_EXP1 limited by memory, storage space or just any computing capability? Because I am doing the task on my own computer for testing.
Answer: I think this should be something wrong with your MPI. If it were limited by memory, storage space or computing capability, this error would not occur. You can look at the following website; the error there is the same as yours, and maybe it will solve your problem.
https://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-March/012006.html

Q25: How can I extract the SST variable from the history files?
Answer: If you have already run the case successfully and do not want to run it again, please do the following procedure to obtain the SST from the TEMP variable.
ncrcat -v TEMP $CASE.pop.h.${yyyy}-${mm}.nc temp.${yyyy}-${mm}.nc
ncra -O -h -F -d z_t,1,1 temp.${yyyy}-${mm}.nc sst.${yyyy}-${mm}.nc

From the above steps, the variable in sst.${yyyy}-${mm}.nc is the sea surface temperature, but its name is still TEMP. The variable name can be changed with ncrename:
ncrename -v TEMP,SST sst.${yyyy}-${mm}.nc
Note that the evaluation data you provide should be the monthly files, meaning the variables come from $CASE.cam.h0.${yyyy}-${mm}.nc and $CASE.pop.h.${yyyy}-${mm}.nc.

Q26: If we can't provide all the data for ten years, what effect will it have on the results?
Answer: The result evaluation carries a score that is relatively small compared with the optimization methods in the proposal. The number of run results you provide has little effect on the score, but if nothing is provided, we cannot verify whether your optimization plan is correct.
We will calculate multiple sets of RMSE for comparison based on what all the teams submit. Incomplete runs will not affect your score unless you do not provide any results at all.

Q27: How can I find the history data?
Answer: If you run a case successfully, the history data will be stored in the DOUT_S_ROOT you set. If you set the run length to 10 years but only ran 5 years because of machine limitations, you can find the history data in the EXEROOT you set.

Q28: We managed to run CESM on a single node, but we encountered segmentation faults when we tried to run it on our cluster.

Answer: You can first try keeping the NTASKS settings in env_mach_pes.xml the same and setting NTHREADS to 1, and see whether a similar error occurs.
Check your mkbatch file for the following commands:
limit stacksize unlimited
limit coredumpsize unlimited
If the problem is solved, then you need to meet those requirements when setting up the PEs. If you still have problems, please attach your atm log file, since the error happened when initializing the atm component.

Q29: When I run EXP2, the CSM execution finishes but there are no output files. I think it is because there is no "SUCCESSFUL TERMINATION" in the cpl log file, but there is not enough information about the error. As shown in the attached log, there is no hint and MPI was simply killed. I also did 'ulimit -s unlimited', and the thread count is 1, so the OpenMP stack size should not be related either.
Answer: I find in your cesm.log file that the error happened when the output file was being written. You should check your computer resources, such as memory and storage space. There is nothing wrong with the CESM machinefile you provided; I can run it properly on my machine. But from your log file I can only see that the model was killed; the detailed error information is not visible. You can add -traceback (ifort) or -fbacktrace (gfortran) to FFLAGS when compiling the model, so that you can find out which procedure caused MPI to be killed.

Q30: I found that the program did not stop running even after I left the process for more than 10 hours.
Answer: In your log file, I find that the error occurred at the call to mpi_init(ierr); it seems that you did not set MPI_PATH, INC_MPI and LIB_MPI correctly. Please check them.

Contact Us
Technical Support Yu Liu, Weiwei Wang [email protected]
Media Jie He [email protected]
Collaboration Vangel Bojaxhi [email protected]
General Information [email protected]

 
