
Fabio Ruini's blog

'cause Italians blog better

Habemus Grid (maybe)

OK, maybe it is not a grid made of a thousand or more machines like the one at Virginia Tech, but let's say we are now closer than we were before. P-ARTS, short for Plymouth Advanced Robots Training Suite, is finally up and running. Only locally for now, since we are waiting for the technicians to assign static IPs to the various machines, but in any case it exists and it works. This is despite a few small problems (unrelated to the deployment) with Xgrid running applications developed in Qt, which we hope to solve as soon as possible.

In the meantime, since Mac OS X Server provides a web server (as does the client version, for that matter) complete with a pre-configured wiki, why not make use of it? So I have jotted down a few lines for the users who will be working with P-ARTS.

How to submit a job to the grid

First of all, it is essential to know that jobs should be submitted to the grid using shell scripts. In the simplest scenario, the script just runs the executable generated by our software once. In more complicated scenarios, the script can also do additional work, such as running multiple instances of the software and passing them different input parameters. For more information about how to create a Bourne shell script, have a look at this link.
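As an illustration of the "multiple instances" case, here is a minimal sketch of what such a script might look like. The executable name ./my_app and the numeric parameter passed to it are hypothetical placeholders, not part of P-ARTS; replace them with your own program and arguments.

```shell
#!/bin/sh
# start_app.sh: run a program several times, each with a different parameter.
# ./my_app is a hypothetical executable name; override it via the APP variable.
APP="${APP:-./my_app}"

run_all() {
    for param in 1 2 3; do
        # Keep the output of each run in its own file inside the working directory
        "$APP" "$param" > "run_${param}.txt"
    done
}

# Only start the runs if the executable can actually be found
if command -v "$APP" >/dev/null 2>&1; then
    run_all
else
    echo "executable not found: $APP" >&2
fi
```

Writing each run to a separate file in the working directory keeps the outputs of the different instances from overwriting each other.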

Once our script has been prepared, the syntax to be used for submitting a job to the controller is the following:

$ xgrid -h < controller’s IP > -p < password > -job submit < sh script > -in < directory where the script is located >

For example, if the Xgrid controller is identified by the IP address 192.168.1.1 (with password “password”), the script to run is called start_app.sh, and it is stored in the Release sub-directory of the current working directory, the command to submit the job is the following:

$ xgrid -h 192.168.1.1 -p password -job submit Release/start_app.sh -in Release

Pay attention to how the “/” character is used. When a path begins with this symbol (that is, when it is an absolute path), the xgrid command will not copy the .sh file to the agent machines, assuming it is already present on them at the same location. This often causes a runtime error, since we do not have direct control over the agent machines. Conversely, if the path does not begin with “/” (that is, it is a relative path), both the script and all the other files in the directory specified after the -in parameter will be copied to the agents.
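To avoid submitting an absolute path by mistake, the check can be automated. The following is only a sketch: is_relative and submit_job are hypothetical helper names, and CONTROLLER and PASSWORD are variables assumed to hold the controller's IP and password.

```shell
#!/bin/sh
# Return success (0) when the given path is relative, i.e. does not start with "/".
is_relative() {
    case "$1" in
        /*) return 1 ;;
        *)  return 0 ;;
    esac
}

# Hypothetical wrapper around the submit syntax shown above.
# CONTROLLER and PASSWORD are assumed to be set by the caller.
submit_job() {
    script=$1
    dir=$2
    if ! is_relative "$script"; then
        echo "refusing to submit: '$script' is an absolute path" >&2
        return 1
    fi
    xgrid -h "$CONTROLLER" -p "$PASSWORD" -job submit "$script" -in "$dir"
}
```

With these definitions, submit_job Release/start_app.sh Release would run the same command as the example above, while submit_job /tmp/start_app.sh /tmp would be rejected before reaching the controller.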

How to check the status of a job

When we submit a job, we should receive a response from the controller similar to the following (apart from the number, of course):

{
jobIdentifier = 1869;
}

jobIdentifier is the ID of the job we have submitted. It is important to make a note of this number, since it is required to check the status of the job and, even more importantly, to retrieve the data generated by it.
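Rather than copying the number by hand, a script can extract it from the response. This is a small sketch that assumes the response has the layout shown above; parse_job_id is a hypothetical helper name.

```shell
#!/bin/sh
# Read the controller's response on standard input and print only the numeric job ID.
parse_job_id() {
    sed -n 's/.*jobIdentifier = \([0-9][0-9]*\);.*/\1/p'
}

# Example with the response shown above
printf '{\n    jobIdentifier = 1869;\n}\n' | parse_job_id
```

In a real session, the output of the submit command would be piped into parse_job_id and the result stored in a variable for the later commands.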

The command:

$ xgrid -h < controller’s IP > -p < password > -job list

returns the list of all the jobs present on the grid, regardless of their status (they could be queued, running, finished, suspended, etc.). The output produced by this command looks something like:

{
jobList = (
1855
1868
1869
);
}
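If the IDs are needed one per line, for example to loop over them, the listing can be filtered. The sketch below assumes the layout shown above and also tolerates a trailing comma after each ID, in case the real output separates entries that way; parse_job_list is a hypothetical helper name.

```shell
#!/bin/sh
# Print one job ID per line from the output of "xgrid ... -job list" read on stdin.
# Lines holding only a number (optionally followed by a comma) are treated as IDs.
parse_job_list() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\),\{0,1\}[[:space:]]*$/\1/p'
}

# Example with the listing shown above
printf '{\njobList = (\n1855\n1868\n1869\n);\n}\n' | parse_job_list
```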

Once we know the ID of the job whose status we want to see, we can obtain it by running the command:

$ xgrid -h < controller’s IP > -p < password > -job attributes -id < jobID >

The typical output (generated in this case from the previous example) looks like:

{
jobAttributes = {
activeCPUPower = 0;
applicationIdentifier = "com.apple.xgrid.cli";
dateNow = 2008-10-22 18:25:46 +0100;
dateStarted = 2008-10-22 18:06:23 +0100;
dateStopped = 2008-10-22 18:06:23 +0100;
dateSubmitted = 2008-10-22 18:06:23 +0100;
jobStatus = Finished;
name = "Release/start_app.sh";
percentDone = 100;
taskCount = 1;
undoneTaskCount = 0;
};
}

The value of jobStatus is typically the parameter we are interested in. When our job is “Finished” we can proceed to collect the results it has generated.
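Checking the status by hand quickly becomes tedious, so here is a sketch of a polling loop. It assumes the attribute layout shown above; parse_job_status and wait_for_job are hypothetical helper names, CONTROLLER and PASSWORD are assumed to be set by the caller, and the default interval and number of attempts are arbitrary choices.

```shell
#!/bin/sh
# Extract the value of jobStatus from the "-job attributes" output read on stdin.
parse_job_status() {
    sed -n 's/.*jobStatus = \([A-Za-z]*\);.*/\1/p'
}

# Poll the controller until the job reports "Finished".
# Arguments: job ID, seconds between checks (default 30), max checks (default 120).
# Returns 0 once the job is Finished, 1 if the attempts run out first.
wait_for_job() {
    job_id=$1
    interval=${2:-30}
    tries=${3:-120}
    while [ "$tries" -gt 0 ]; do
        status=$(xgrid -h "$CONTROLLER" -p "$PASSWORD" -job attributes -id "$job_id" | parse_job_status)
        [ "$status" = "Finished" ] && return 0
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

The limit on the number of attempts is there because only the “Finished” value is handled: a job that ends in any other state would otherwise keep the loop running forever.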

How to retrieve the data generated by a job

When a job has finished, it is possible to retrieve all the data it has generated using the command:

$ xgrid -h < controller’s IP > -p < password > -job results -id < jobID > -out < local directory where to store the data >

This command copies the entire working directory (including all its sub-folders) created on the agents during the execution of the job into the directory of the local machine specified after the -out parameter. Note that this directory must already exist when the command is run.

Continuing the previous example, the command might look like this:

$ xgrid -h 192.168.1.1 -p password -job results -id 1869 -out /Users/Shared

If our program has created, for example, two text files and two sub-directories containing two more files each, the 6 files and the directory structure will be copied into the directory /Users/Shared of our machine (the machine from which the command above was executed).
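Since the output directory has to exist before the results are retrieved, a small wrapper can create it first. This is a sketch only: fetch_results is a hypothetical helper name, and CONTROLLER and PASSWORD are assumed to be set by the caller.

```shell
#!/bin/sh
# Create the output directory if needed, then retrieve the results of a job.
# CONTROLLER and PASSWORD are assumed to be set by the caller.
fetch_results() {
    job_id=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    xgrid -h "$CONTROLLER" -p "$PASSWORD" -job results -id "$job_id" -out "$out_dir"
}
```

Giving each job its own directory, for instance fetch_results 1869 results/job_1869, should keep the files of different jobs from mixing together.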

How to suspend/resume, stop/restart or delete a job

Once the job we submitted to the grid has finished and we have collected all the generated data, we must remember to delete the job from the controller. To do so, we use the xgrid command with the following syntax:

$ xgrid -h < controller’s IP > -p < password > -job delete -id < jobID >

Replacing delete with suspend, resume, stop or restart will, respectively, suspend, resume, stop or restart a job present on the grid.
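Because these five commands differ only in one word, they can be wrapped in a single function. The sketch below uses a hypothetical helper name, job_action, and assumes CONTROLLER and PASSWORD are set by the caller; it rejects any action other than the five listed above.

```shell
#!/bin/sh
# Apply one of the actions listed above to a job, rejecting anything else.
# CONTROLLER and PASSWORD are assumed to be set by the caller.
job_action() {
    action=$1
    job_id=$2
    case "$action" in
        delete|suspend|resume|stop|restart)
            xgrid -h "$CONTROLLER" -p "$PASSWORD" -job "$action" -id "$job_id"
            ;;
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
}
```

For example, job_action delete 1869 would remove the job from the earlier example once its results have been collected.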
