Scheduler 2.0

This document describes changes made to the scheduler as of 2.0.0.

Design Goals

There are several problems with the previous scheduler that we need to address with the new version. These include:

  • Improve agent communications, reducing the overhead the scheduler incurs when communicating with agents.
  • Define an easy-to-understand, consistent terminology for talking about data moving through fossology.
  • Support job preemption: if a more important job enters the system, all agents working on the less important job will be preempted.
  • Move away from a unified Scheduler.conf file to Agent.conf and Host.conf files.
  • Create an easy-to-use agent API that takes care of communication, heartbeat, etc.
  • Make it possible for more than one instance of an agent to be working on a single job.
  • Make it easier to understand how information is saved within the scheduler, i.e. make it clear what is what inside the scheduler.
  • Improve logging, e.g. any error information generated by a job is stored with the job in the database instead of in one unified log file.

Quick Explanation of terminology

There are several terms used later that need some clarification:

  • agent: the actual process that the scheduler will be communicating with. An instance of copyright would be an example of an agent.
  • meta agent: the information necessary for the creation of an agent. Copyright would have just one meta agent within the new scheduler.
  • job stream: all of the steps necessary to get a particular upload from unpack all the way to its finished state; a list of jobs.
  • job: a specific step in the process of working on an upload. Each job is associated with a different type of agent. For example, if a person uploaded a .tar file and wanted license and copyright analysis run on the file, there would be a job for the unpack step, a job for the license step and a job for the copyright step. There may be multiple agents allocated to any one job.

This terminology is not fixed and is subject to change.

Scheduler Architecture

The hardest problem to fix for the new scheduler will be agent communication. Every agent needs a different amount of information (wget needs a URL and file path, but nomos just needs the upload_pk for database access), and distributing all of this information causes the current scheduler to spend all of its time communicating with agents. We have decided on a classic client/server communication style for both the agent communication and the UI communication. Every time a new communication channel is needed (whether with an agent or the user interface), the scheduler will create a new thread to manage that channel. This makes the communication logic much simpler.

For the following figures, the formatting is as follows:

  • Blue boxes are individual computers.
  • Red boxes represent a process; anything inside a red box must be running on only one system.
  • Green boxes represent different threads within a single system.
  • Purple boxes represent some type of memory construct.

Multi-thread the scheduler

{{ :task:scheduer_threaded.png |Proposed Multi-threaded scheduler layout}}

Within the scheduler, the master thread would take a job out of the job queue and perform the SQL query to get all of the information that will be passed to the agent. It then creates a new job with this information by spawning a new thread for each agent that it is going to have run on this data. These threads perform the IO with the agents and report any status changes back to the master thread so that it can take action.
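As a rough sketch of this layout (not the actual scheduler code; the names agent_t, agent_comm_thread and spawn_agent_thread are invented for this illustration), the master thread could hand each agent's data to a dedicated pthread that then performs blocking IO with that agent:

/* illustrative only: spawning one communication thread per agent */
#include <pthread.h>
#include <stdio.h>

typedef struct {
  int  agent_fd;          /* pipe connected to the agent process     */
  char data[1024];        /* information gathered by the SQL query   */
} agent_t;

/* communication thread: blocking IO with one agent, reporting any
 * status change back to the master thread                           */
static void* agent_comm_thread(void* arg)
{
  agent_t* agent = (agent_t*)arg;
  /* write agent->data to agent->agent_fd, then block reading replies,
   * forwarding status changes to the master thread                    */
  (void)agent;
  return NULL;
}

/* called by the master thread for every agent assigned to the new job */
static int spawn_agent_thread(agent_t* agent)
{
  pthread_t tid;
  if (pthread_create(&tid, NULL, agent_comm_thread, agent) != 0)
  {
    fprintf(stderr, "ERROR: could not create agent communication thread\n");
    return -1;
  }
  return pthread_detach(tid);
}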

Pros:

  • The scheduler can take advantage of multiple processors on whatever machine it is running on.
  • Simplified code for communicating with agents, since all IO would be blocking instead of non-blocking as it is in the current scheduler.
  • The master thread cannot get swamped with communications between the agents and can concentrate on managing new jobs.
  • Would use the pthread API. Using pthread mutexes and condition variables is much easier than inter-process synchronization.

Cons:

  • Debugging a multi-threaded process can get extremely complicated. (big problem)
  • The master thread still performs the starting database query. (small problem)

No more Scheduler.conf

That is not entirely true: Scheduler.conf does still exist for the new scheduler, but in a very stripped-down version. Currently, the main purpose of Scheduler.conf is to provide a list of agents to start on other machines. This will now be handled by two new types of files, Agent.conf files and Host.conf files. Every Agent.conf file will specify the information necessary to start and manage one type of agent. Each Host.conf file will provide the information necessary to start and manage an agent on any one machine, including localhost. The Agent.conf and Host.conf files will each be placed in a known location and the scheduler will simply read every file in those locations. This means that, unlike the current scheduler, where adding a new agent involves changing mkschedconf.c and recompiling, the new scheduler will only need the new Agent.conf to be placed in the correct directory followed by a scheduler reload. It also means that the scheduler will be able to start any type of agent it would like on any of the host machines instead of being limited to what is specified in Scheduler.conf. There can be as many Agent.conf files as needed, so long as they are in the correct location. (An illustrative Host.conf sketch follows the Agent.conf example below.)

Agent.conf files

With the new scheduler each agent will come with its own configuration file.

Each configuration file must provide the following:
  • command: the command line that will be used when starting the agent
  • max: the maximum number of instances of this agent running concurrently, -1 for no max.
  • special: list of anything special about the agent.
Possible special flags:
  • EXCLUSIVE -- the agent will not run concurrently with any other agents
  • NOEMAIL -- when the agent finishes running, it will not generate an email notification
  • LOCAL -- only run this agent on the same computer as the scheduler

An example Agent.conf for copyright:

; scheduler configure file for this agent
[default]

; command: The command that the scheduler will use when creating an instance of this agent. 
; This will be parsed like a normal Unix command line.
command = copyright

; max: The maximum number of this agent that is allowed to exist at any one time. 
; This is set to -1 if there is no limit on the number of instances of the agent.
max = -1

; special: Scheduler directive for special agent attributes.
; A comma separated list of values.
; Directives:
;     EXCLUSIVE: the agent cannot run concurrently with any other agent.
;     NOEMAIL: do not send notification emails when this agent finishes.
;     LOCAL: only run this agent on the same computer as the scheduler.
special = 
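
Host.conf files follow the same general idea. No example is given in this document, so the following is purely an illustrative sketch: the key names used here (address, agent_dir, max) are assumptions made for this example rather than a fixed format.

; scheduler configure file for one host (illustrative sketch only)
[default]

; address: the network address the scheduler uses to reach this host
address = localhost

; agent_dir: directory on the host where the fossology agents are installed
agent_dir = /usr/local/share/fossology

; max: the maximum number of agents allowed to run on this host at one time
max = 10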

fossology.conf (previously Scheduler.conf)

This will now hold all fossology configuration information that is not agent specific (i.e. does not belong in an Agent.conf file). It will be formatted to match the php.ini format to make reading it from the user interface very easy. A default fossology.conf is provided during the installation of fossology and should be self-explanatory.
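
As a minimal sketch of what such a file could look like, assuming a php.ini-style layout as described above (the section and key names here, FOSSOLOGY, port, address, depth and path, are illustrative guesses rather than the definitive format; the installed default is the authoritative reference):

; fossology.conf: non-agent-specific configuration (illustrative sketch)
[FOSSOLOGY]

; port the scheduler listens on for UI/CLI connections
port = 24693

; address the scheduler binds to
address = localhost

; depth of the repository tree and path to the repository
depth = 3
path = /srv/fossology/repository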

Common UI changes

The old scheduler had different modes for running with and without command line input. The new version of the scheduler removes this distinction and provides a single point of entry for both the GUI and the CLI. To accomplish this, the scheduler will listen on a port specified in the conf file. Any time a new connection is made on the port, the scheduler will open a set of sockets for the connection and wait for credentials. Any time someone logs into the web interface, the web interface will open a new connection to the scheduler and forward the person's credentials. This means that the CLI can be another program that also opens a connection on the port and forwards anything entered to the scheduler. After any command is entered, the scheduler will respond with "received" on the same socket. This response will be sent even for commands that already have a response, such as status.

This is very nice from the scheduler's point of view since both the GUI and the CLI can be treated identically. It also simplifies many of the notification procedures within fossology:
  • Currently every job stream ends with a job that informs the UI that it has successfully run; in the new version the scheduler can simply inform the GUI that it is finished, instead of the scheduler status needing to be monitored.
  • Currently the scheduler status is saved in the database. This is not a good practice since the scheduler status isn't persistent information and would change if the system were to crash. For the new scheduler, the GUI will simply be able to ask the scheduler for its status using a socket authenticated by whoever requested the status.
  • It will be possible to create a CLI for a scheduler that is already running. This has the added advantage that many actions that currently require a scheduler restart can simply become commands entered using the CLI. Ideally, the only thing that should need a scheduler restart is a re-installation of fossology or the scheduler.

List of commands that can be sent to the scheduler:

  • close: this is technically directed to the GUI or CLI; it closes the connection to the scheduler.
  • stop: the scheduler will gracefully shut down. Depending on what is currently running, this could take some time.
  • pause <job_id>: the job provided will pause. All agents associated with the job will stop using processor power.
  • agents: the scheduler will respond with a space-separated list of valid agents.
  • reload: the scheduler will reload the configuration information for the agents and hosts.
  • status: the scheduler will respond on the same socket with all of the status information.
  • status <job_id>: the scheduler will respond with a detailed status for the specific job.
  • restart <job_id>: the scheduler will restart the job specified. Used exclusively on jobs that have been paused; if the job isn't paused this will produce an error.
  • verbose <level>: the verbosity level for the scheduler will be changed to match level.
  • verbose <job_id> <level>: the verbosity level for all of the agents owned by the specified job will be changed to level.
  • priority: changes the priority of a particular job within the scheduler; this does not change the priority of the related job in the database.
  • database: causes the scheduler to check the database job queue for new jobs.

The only command that should produce a response from the scheduler is the status command. If a job_id is not sent with the command, the response will look like the following, with one line for each job after the initial line:

scheduler:[#] daemon:[#] jobs:[#] log:[str] port:[#] verbose:[#]
job:[#] status:[str] type:[str] priority:[#] running:[#] finished:[#] failed:[#]
job:[#] status:[str] type:[str] priority:[#] running:[#] finished:[#] failed:[#]
...
end

If a job_id is provided, the format will look like the following, with one line for each running/finished/failed agent after the initial line:

job:[#] status:[str] type:[str] priority:[#] running:[#] finished:[#] failed:[#]
agent:[#] host:[str] type:[str] status:[str] time:[str]
agent:[#] host:[str] type:[str] status:[str] time:[str]
...
end

It is also important to note that when a user or the scheduler pauses a job, this only releases the job's hold on the processor. Any memory that has been allocated by the agents that belong to the job will remain allocated.
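
To make the command/response protocol concrete, here is a minimal sketch of a CLI-style client written against the description above. The port number is hard-coded purely for illustration, authentication is omitted, and this is not the actual fossology CLI; it simply connects, sends the status command and prints lines until the terminating end:

/* illustrative sketch of a minimal scheduler client */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
  struct sockaddr_in addr;
  char buf[1024];
  ssize_t n;

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) { perror("socket"); return 1; }

  /* connect to the scheduler's UI/CLI port (value assumed for this sketch) */
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port   = htons(24693);
  inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
  if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) != 0)
  {
    perror("connect");
    return 1;
  }

  /* ask for the full status listing */
  write(fd, "status\n", 7);

  /* print everything the scheduler sends until the "end" line arrives.
   * (A real client would buffer and parse the response line by line.)  */
  while ((n = read(fd, buf, sizeof(buf) - 1)) > 0)
  {
    buf[n] = '\0';
    fputs(buf, stdout);
    if (strstr(buf, "end") != NULL)
      break;
  }

  close(fd);
  return 0;
}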

What is What and Who owns Who

One of the goals of the new scheduler is to make it easier to understand how information is processed within the scheduler. This section is an explanation of exactly that. For this it is important to understand some of the inner workings of the new scheduler, specifically how it is multi-threaded. The multi-threading within the scheduler exists simply to facilitate communication with the agents and any UI currently connected to the scheduler. Every time a new agent is created, a new thread is spawned to manage its communications. If a communication thread receives anything that would involve changing a data structure internal to the scheduler, it passes the information off to the main thread instead of changing it directly. This is done to reduce the need for locking large data structures and to hopefully eliminate the possibility of a race condition within the scheduler. The communication threads send information to the main thread using events: a communication thread packages the function and arguments and passes them to a concurrent queue that the main thread is waiting on.
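
The event mechanism can be pictured roughly like the sketch below. This is a simplified illustration under assumed names (event_t, event_queue_t, event_signal, event_loop), not the scheduler's actual implementation:

#include <pthread.h>
#include <stdlib.h>

/* an event packages a function and its argument so the main thread
 * can execute it on behalf of a communication thread                */
typedef struct event {
  void (*func)(void* arg);
  void* arg;
  struct event* next;
} event_t;

typedef struct {
  event_t*        head;
  event_t*        tail;
  pthread_mutex_t lock;
  pthread_cond_t  wait;
} event_queue_t;

/* called by a communication thread: enqueue the event and wake the main thread */
void event_signal(event_queue_t* q, void (*func)(void*), void* arg)
{
  event_t* e = malloc(sizeof(event_t));
  e->func = func;
  e->arg  = arg;
  e->next = NULL;

  pthread_mutex_lock(&q->lock);
  if (q->tail) q->tail->next = e; else q->head = e;
  q->tail = e;
  pthread_cond_signal(&q->wait);
  pthread_mutex_unlock(&q->lock);
}

/* main thread loop: wait for events and run them, so every change to
 * the scheduler's data structures happens on a single thread          */
void event_loop(event_queue_t* q)
{
  for (;;)
  {
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
      pthread_cond_wait(&q->wait, &q->lock);
    event_t* e = q->head;
    q->head = e->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    e->func(e->arg);
    free(e);
  }
}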

When a new job stream is found in the job queue, the scheduler will create only the jobs for that job stream that have all the preconditions fulfilled. Once created the jobs will have agents allocated to them. Allocated agents belong to the job and will remain in the scheduler's data structures until the job is removed from the system. Even if agents fail, the information about what the agent was processing is needed since it will be given to a different agent in an attempt to resolve the failure. Therefore, within the system, a job stream owns jobs, and these jobs own agents. Jobs are responsible for cleaning up any agents allocated to them.
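
The ownership relationship can be sketched as nested data structures. The types below are invented for this illustration (the scheduler's real structures differ); they simply show that a job stream holds jobs, each job holds the agents allocated to it, and removing a job is what cleans up its agents:

/* illustrative ownership sketch, not the scheduler's real types */
typedef struct agent {
  int           pid;        /* process being managed                       */
  char*         data;       /* chunk of upload data given to this agent    */
  struct agent* next;       /* next agent owned by the same job            */
} agent_t;

typedef struct job {
  int           job_id;
  agent_t*      agents;     /* allocated agents; kept (even after failure)
                             * until the job is removed from the system    */
  struct job*   next;       /* next job in the same job stream             */
} job_t;

typedef struct {
  int           stream_id;  /* one upload, from unpack to its finished state */
  job_t*        jobs;       /* only jobs whose preconditions are fulfilled   */
} job_stream_t;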

Within a job, when an agent is ready for data, it will inform the main thread that it is waiting. The main thread will then take a chunk of data from the job that the agent belongs to and allocate it to the agent. The main thread will then inform the communication thread that there is pending data. The communication thread will then be responsible for sending the data to the corresponding process. It is important to note that the communication threads are using blocking IO on the pipe from the corresponding process. The only way for the main thread to wake a communication thread is to write to this pipe. As a result, any string that starts with "@" is reserved as a communication from the scheduler instead of the corresponding process. Writing anything that starts with "@" to stdout within an agent will result in undefined behavior.
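
In practice this means a communication thread's read loop can tell who wrote to the pipe by looking at the first character of each line. A simplified sketch of that dispatch (the function names are invented for this illustration):

#include <stdio.h>

/* illustrative read loop for one communication thread; "stream" wraps the
 * pipe shared by the agent process and the scheduler's main thread        */
void comm_thread_loop(FILE* stream)
{
  char line[1024];

  while (fgets(line, sizeof(line), stream) != NULL)
  {
    if (line[0] == '@')
    {
      /* written by the scheduler's main thread, e.g. new data is
       * pending or the agent should shut down                     */
      /* handle_scheduler_directive(line); */
    }
    else
    {
      /* written by the agent process itself: heartbeat, verbose
       * output, errors, etc.                                      */
      /* handle_agent_message(line); */
    }
  }
}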

Log file improvements

The improvements suggested below have been implemented and are documented here.

The current log file is hard to read and understand. Messages are interlaced, their source is hard to decipher and the formatting leaves something to be desired. Since the scheduler is getting a complete rewrite, it seemed like an important step to create new logging guidelines. Several steps have been taken to ensure a much more readable log file:

  • Before every log message, in addition to the time stamp, a source will be included. Sources will be in all caps in an attempt to make them easier to identify.
  • All agent messages will include the job that owns the agent, the agent's pid and the type of agent (i.e. nomos, buckets, copyright, etc.).
  • Every line in the log file will start with the time stamp to make it easier to read.
  • Messages from agents will be written to a separate log file.

Within the logging system there are now going to be four different logging functions in an attempt to make the log readable (a short usage sketch follows the list):

  • lprintf: acts much like a standard printf but prints formatted data to the log file. The formatting for this data is extremely strict. All lines will be prepended with a time stamp.
  • alprintf: prints to the agent's log file instead of the scheduler log file. Other than this it is functionally identical to lprintf.
  • vlprintf: identical to lprintf() but takes a va_list. This provides exactly the same functionality as vprintf() does for printf().
  • clprintf: identical to lprintf() but should be used when printing to the log file from any thread that isn't the main thread. This essentially creates an event that will call lprintf, to prevent messages from getting intermixed.
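
A short sketch of the intended usage, based only on the descriptions above. The prototypes shown here are assumptions (printf-style, as described); the real declarations live in the scheduler's logging header:

#include <stdarg.h>

/* assumed printf-style prototypes, see the descriptions above */
int lprintf (const char* fmt, ...);
int vlprintf(const char* fmt, va_list args);
int clprintf(const char* fmt, ...);

/* example helper: forwards its arguments to the log using vlprintf(),
 * exactly the way vprintf() is used in place of printf()              */
void log_warning(const char* fmt, ...)
{
  va_list args;
  va_start(args, fmt);
  vlprintf(fmt, args);
  va_end(args);
}

/* from the main thread, log directly:                                  */
/*   lprintf("NOTE: scheduler startup complete\n");                     */
/* from any other thread, let clprintf() queue the message as an event: */
/*   clprintf("ERROR: agent closed its connection unexpectedly\n");     */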

Setting Scheduler verbosity

The new scheduler uses a very powerful system for the verbosity levels: it combines the use of a bit mask with standard verbosity levels. This makes it possible to get powerful debugging info while keeping non-debugging info easy to understand. Any verbose level below 8 will be interpreted strictly as a verbosity level. For these verbose levels, if 1 is turned on, so is verbose level 0. Anything 8 and above will be interpreted as a bit mask for debugging the scheduler. For the bit mask, each bit corresponds to turning on verbosity for a particular source file. The highest-order bit is used for things that should be file independent. Currently only the SPECIAL command from the agents is under this category, as it can create a large amount of useless information in the log files.

The currently defined levels:
0: Scheduler will only print errors. All other logging is disabled
1: Scheduler will print errors and notifications. Notifications include the scheduler startup and shutdown messages
2: Scheduler will print errors, notifications, and warnings.

The verbosity levels can be set in the fossology startup file (e.g. /etc/init.d/fossology) with SCHEDULEROPT. By default the value is set to

SCHEDULEROPT="--daemon --reset --verbose=1"
The currently defined bit mask:

mask          relevant file   integer value for fo_cli
00000001000   job.c           8
00000010000   agent.c         16
00000100000   scheduler.c     32
00001000000   event.c         64
00010000000   interface.c     128
00100000000   database.c      256
01000000000   host.c          512
10000000000   all files       1024

The debugging levels can be set using the scheduler command line tool, fo_cli.

.../fo_cli  -v <integer>

NOTE: this can result in very large values for the verbosity level when turning on debugging output. For example, a verbose level of 1016 is the smallest value that will turn on all verbosity levels.
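
For example, to turn on debugging output for job.c and agent.c only, the two mask values from the table are added together; adding a base verbosity level on top of the mask is assumed here to combine the same way, since the low bits are interpreted as the plain verbosity level:

.../fo_cli -v 24     # 8 (job.c) + 16 (agent.c)
.../fo_cli -v 26     # 24 + 2, also keeping errors, notifications and warnings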

Agent API

When creating the new Agent API, the goal was to make it as easy to use as possible while providing all the power needed by the scheduler. Goals when developing the new API included:

  • making the need to print "OK" and "BYE" more transparent.
  • adding a set of verbose and error functions that provide correctly formatted data to the scheduler.
  • making it simpler to set up the heartbeat.
  • adequate documentation of how the handshake between the agent and the scheduler should take place.

Toward this goal, five functions were created that handle all of the handshake between the scheduler and the agents.

  • fo_scheduler_heart: agents should call this to update the number of items processed by the agent.
  • fo_scheduler_connect: an agent should call this before calling any other API functions or parsing its command line arguments.
  • fo_scheduler_next: an agent should call this every time it needs a new piece of information from the scheduler. When this function returns NULL, the agent should clean up any allocated memory and call the next method.
  • fo_scheduler_disconnect: this closes the connection to the scheduler and provides the scheduler with the return code of the agent. Anything printed by the agent after this will not get placed in its log file or interpreted by the scheduler.
  • fo_scheduler_set_special: this allows the agent to set a special attribute about itself. Currently the only option is SPECIAL_NOKILL, which causes the scheduler not to kill the agent even if it doesn't seem to be processing information.

There is also now a verbose flag defined by the API. All agents should use this as their verbose flag instead of whatever they are using now. This is implemented this way so that the scheduler is able to change the agent's verbose level. In addition to this flag, there are several verbose macros that can be used to correctly interpret the verbose flag if an agent does not wish to keep track of the verbose level itself. All of these macros take the form "VERBOSE#" where # is the verbose level, ranging from 1 to 4. Beyond this they act exactly like a standard printf.
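
As a small sketch of how an agent might use these macros (the helper name and its arguments are made up for this example):

#include <libfossology.h>

/* illustrative helper inside an agent's processing loop */
static void report_progress(int files_done, int upload_pk)
{
  /* printed only when the scheduler has raised this agent's
   * verbose flag to level 2 or higher                        */
  VERBOSE2("processed %d files in upload %d\n", files_done, upload_pk);
}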

An example minimal agent:

/* include the header for the fossology api */
#include <libfossology.h>

int main(int argc, char** argv)
{
  /* before parsing argv and argc make sure   */
  /* to initialize the scheduler connection   */
  fo_scheduler_connect(&argc, argv);

  /* we can set the agent to not be killed    */
  /* even if it appears to be dead            */
  fo_scheduler_set_special(SPECIAL_NOKILL, TRUE);

  /* enter the main agent loop, continue to   */
  /* loop until receiving NULL                */
  while(fo_scheduler_next() != NULL)
  {
    /* current simply provides access to last */
    /* message from the scheduler             */
    printf("%s\n", fo_scheduler_current());

    /* call the heart beat with the number of */
    /* items processed since last call        */
    fo_scheduler_heart(1);
  }

  /* after cleaning up agent, disconnect from */
  /* the scheduler. 0 indicates the agent ran */
  /* correctly. Anything else is a failure    */
  fo_scheduler_disconnect(0);

  return 0;
}