If I need to run many serial programs "in parallel" (the problem is simple but takes time - say, reading in many different data sets for the same program), the solution is simple as long as I only use one node: I launch the serial jobs with an ampersand after each command so the script continues, e.g. in the job script:
./program1 & ./program2 & ./program3 & ./program4
which will run each serial program on a different processor. This works well on a login server or a standalone workstation, and of course only requires one node.
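In full, the sort of one-node job script I mean looks something like this (the program names and resource requests are just placeholders; the wait keeps the job alive until all the background processes finish):

    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=01:00:00
    cd $PBS_O_WORKDIR
    ./program1 &
    ./program2 &
    ./program3 &
    ./program4 &
    wait    # job exits only after all four background processes finish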
But what if I have 110 different data sets and need to run 110 instances of the same program? If I request a number of nodes (say 14) with a script that launches a sequence of 110 ./program# commands, will the batch system spread the processes across processors on different nodes, or will all 110 try to run on the first 8-core node?
I have tried using a simple MPI wrapper where each rank reads a different data set, but it produces many errors: about 100 of the 110 processes succeed and the rest crash. I have also considered job arrays, but I'm not sure my system supports them.
I have tested the serial program at full scale on individual data sets - there are no runtime errors, and there is enough memory available on each node.
No, PBS will not automatically distribute the processes across your nodes. But this is a common thing to want to do, and you have a few options.
-
The simplest, and in some ways most advantageous, approach is to break your work into 1-node-sized chunks and submit those chunks as individual jobs. They will start faster; a 1-node job normally gets scheduled sooner than a 14-node job, simply because there are many more one-node-sized holes in the schedule. This works especially well if all the tasks take roughly the same amount of time, because then the split is easy.
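As a rough sketch, assuming the data sets are numbered 1 to 110 and run_chunk.pbs is a hypothetical 1-node script that launches data sets $FIRST through $LAST with ampersands and a wait, the submission loop could look like:

    # Submit one 1-node, 8-core job per chunk of 8 data sets (numbering is an assumption).
    for first in $(seq 1 8 110); do
        last=$(( first + 7 < 110 ? first + 7 : 110 ))
        qsub -l nodes=1:ppn=8 -v FIRST=$first,LAST=$last run_chunk.pbs
    done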
-
If you want to do it all within a single job (say, to make bookkeeping easier), you may or may not have access to the pbsdsh command; there are good discussions of it elsewhere. It lets you run a script on every processor assigned to your job. You then write a script that checks $PBS_VNODENUM to see which of the NNODES*PPN tasks it is, and runs the appropriate piece of work.
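A minimal sketch, assuming the inputs are named data_1 through data_110 and that a helper script called worker.sh (both names are mine, not from your setup) sits in the submission directory:

    # Job script: launch one copy of worker.sh on every processor in the job.
    #PBS -l nodes=14:ppn=8
    cd $PBS_O_WORKDIR
    pbsdsh bash $PBS_O_WORKDIR/worker.sh

    # worker.sh: pbsdsh sets PBS_VNODENUM to 0 .. NNODES*PPN-1, one value per task.
    cd $PBS_O_WORKDIR
    i=$(( PBS_VNODENUM + 1 ))
    if [ $i -le 110 ]; then
        ./program data_$i      # the two spare slots (111, 112) simply do nothing
    fi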
-
If pbsdsh won't work for you, GNU Parallel is another tool that can simplify these tasks. It is like xargs, if you are familiar with that, but runs the commands in parallel, including across multiple nodes. You would submit your (say) 14-node job and run a GNU Parallel command on the first node. The nice thing is that it also does the scheduling for you, keeping the cores busy even if the tasks are not all the same length. GNU Parallel is what we recommend to users on our system for this sort of thing. Note that if it is not installed on your system and for some reason your sysadmins will not install it, you can set it up in your home directory; it is not a complicated build.
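A minimal sketch of such a job script, assuming the inputs match data_*.in (my naming, not yours), the working directory is on a shared filesystem, and passwordless ssh between the job's nodes works (parallel reaches the other nodes over ssh):

    #PBS -l nodes=14:ppn=8
    cd $PBS_O_WORKDIR
    # $PBS_NODEFILE lists each node once per core; keep one entry per node.
    sort -u $PBS_NODEFILE > nodes.txt
    # Run up to 8 tasks at a time on each listed node, in the submission directory.
    parallel --jobs 8 --sshloginfile nodes.txt --workdir $PBS_O_WORKDIR \
        ./program {} ::: data_*.in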