Jim Zajkowski

Munki Client Scheduling

May 09, 2023

A question on the MacAdmins Slack asked how groups handle scheduling Munki. We built out a server-side solution that offers a lot of flexibility with a very simple API.

Our requirements:

  • Allow users to run Managed Software Center manually at any time.
  • Support flexible scheduling policies, such as daily maintenance windows or a delay until a future date (for example, for faculty traveling to low-connectivity areas such as sub-Saharan Africa).
  • Be able to change schedules without client code changes.

First, we have a step in the Munki preflight script: it checks whether the current Munki run is of type auto, and if it is, it runs our scheduling client:

if [[ "$1" == "auto" ]]; then
    # if the scheduler returns zero, that means run now;
    # anything non-zero means we want to stop running.
    if ! /usr/local/izzy/bin/izzy scheduler; then
        exit 1
    fi
fi

The izzy schedule command calls a JSON API, passing the client’s current UTC offset in seconds. The server returns either an “ask me again after” time or the token now. If the answer is not now, the client stores that time locally (as the modification time of a file) and skips the server call on later runs until that time has passed.

izzy schedule
let izzy = IzzySwift()
let nextActionSleepfile = "/var/tmp/next_izzy_action_at"
let fileManager = FileManager.default
let now = Date()
let zone = TimeZone.current

// If we have a sleep file and its mtime is in the future, bail now
// without contacting the server.
if fileManager.fileExists(atPath: nextActionSleepfile),
    let attribs = try? fileManager.attributesOfItem(atPath: nextActionSleepfile),
    let modifiedTime = attribs[FileAttributeKey.modificationDate] as? Date,
    modifiedTime > now {
        print("Scheduler: \(nextActionSleepfile) exists and has an mtime in the future; sleeping")
        throw ExitCode(1)
}

// Ask izzy what to do, sending our UTC offset in seconds
let utcOffset = zone.secondsFromGMT()
if let response = izzy.getJson(path: "/api/v1/next_action_at",
                               params: [ "offset": String(utcOffset) ]) {
    // good response (as? avoids a crash if "status" is missing or not a string)
    if response["status"] as? String == "success",
        let nextAction = response["next_action_at"] as? String {

        // now
        if nextAction == "now" {
            print("Scheduler: IzzyWeb says run now.")
            // delete the touch file; a failed delete should not block the run
            if fileManager.fileExists(atPath: nextActionSleepfile) {
                try? fileManager.removeItem(atPath: nextActionSleepfile)
            }
            throw ExitCode.success

        } else {
            // create a touch file whose mtime is the next action time,
            // so we don't re-hit the API over and over
            if let date = nextAction.toDate() {
                fileManager.createFile(atPath: nextActionSleepfile,
                                       contents: nil,
                                       attributes: [FileAttributeKey.modificationDate: date.date])
                print("Scheduler: updated \(nextActionSleepfile); deferring for now.")
                throw ExitCode(1)
            }
        }

    // failed
    } else {
        print("Scheduler: got an error: \(response)")
        throw ExitCode.failure
    }
}

The server-side scheduling logic is implemented in Rails, using single-table inheritance (STI): a ClientScheduler base class with a single required method, next_action_at().

class ClientScheduler < ApplicationRecord
  # All concrete schedulers implement this method.
  # opts carries the request parameters, such as :offset.
  def next_action_at(opts)
    raise NotImplementedError, "Abstract base class"
  end
end

The simplest schedulers are the HourlyScheduler, which always thinks it’s a good time to run, and the NeverScheduler, which always returns 6 hours from now.

class HourlyScheduler < ClientScheduler
  # Always a good time to run
  def next_action_at(opts)
    :now
  end
end

class NeverScheduler < ClientScheduler
  # Never runs: always asks the client to check back in six hours
  def next_action_at(opts)
    Time.now + 6.hours
  end
end
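
To connect these return values to the JSON the client parses, the endpoint only has to serialize them. The following is a minimal sketch in plain Ruby rather than the actual controller: the status and next_action_at keys come from the client code above, while the render_next_action helper and the ISO 8601 timestamp format are assumptions made for illustration.

```ruby
require "json"
require "time"

# Hypothetical helper: turn a scheduler's return value into the JSON
# payload the client expects. Only the key names are taken from the
# client code; the timestamp format (ISO 8601, UTC) is an assumption.
def render_next_action(result)
  value = result == :now ? "now" : result.getutc.iso8601
  JSON.generate("status" => "success", "next_action_at" => value)
end

puts render_next_action(:now)
puts render_next_action(Time.utc(2023, 5, 9, 12, 0, 0))
```

Keeping the schedulers ignorant of the wire format means a new policy only has to return :now or a Time.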

The most complicated is the MaintenanceWindowScheduler. The server uses the UTC offset supplied by the client, rather than the server’s own local time, to decide whether the client is inside or outside its window.

MaintenanceWindowScheduler
class MaintenanceWindowScheduler < ClientScheduler
  def next_action_at(opts)
    if opts.nil? || opts[:offset].nil?
      raise "Need :offset specified"
    end

    # Client's UTC offset in seconds
    offset = opts[:offset].to_i
    now_utc = ActiveSupport::TimeZone.new("UTC").now
    midnight_utc = ActiveSupport::TimeZone.new("UTC").parse("00:00")

    # Seconds since the client's local midnight. The modulo wraps the
    # value into 0...24h for offsets on either side of UTC.
    client_offset_from_midnight =
        (now_utc.to_i - midnight_utc.to_i + offset) % 24.hours.to_i

    local_window_start_time = self.window_starts_as_offset
    local_window_end_time = self.window_ends_as_offset

    # Non over-midnight window (eg, 4 PM to 8 PM)
    if local_window_end_time > local_window_start_time

      # In the window?
      # _________S###^####E_____
      if client_offset_from_midnight >= local_window_start_time &&
         client_offset_from_midnight <= local_window_end_time
        return :now
      end

      # Compute next start time
      if client_offset_from_midnight >= local_window_end_time
        # Next start is tomorrow - window passed today
        next_start_time = local_window_start_time + 24.hours
      else
        # Next start is later today - window hasn't arrived yet
        next_start_time = local_window_start_time
      end

    # If the end time is earlier than the start time,
    # the window spans midnight eg 8 PM to 8 AM
    else
      # Treat it as two windows, one from midnight to the
      # end time, and another from the start time until midnight
      # ###t##E____________S###t##
      if client_offset_from_midnight <= local_window_end_time ||
         client_offset_from_midnight >= local_window_start_time
        return :now
      end

      # Next time is always later today
      next_start_time = local_window_start_time
    end

    # 6 hours or the next start time, whichever comes first
    time_until_next_start =
      [ next_start_time - client_offset_from_midnight, 6.hours ].min
    now_utc + time_until_next_start
  end
end

If the client isn’t in its maintenance window, the server returns an “ask again” time of either the start of the window or six hours in the future, whichever comes first. Remember: the client doesn’t poll the server again before this time, so six hours is a reasonable trade-off between reducing network traffic and staying responsive to server-side schedule changes.
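
As a worked example of this arithmetic, here is a standalone sketch in plain Ruby (no Rails). The helper names client_seconds_since_midnight and seconds_until_next_check are hypothetical, chosen for this example; the logic mirrors the window checks and the six-hour cap described above, with all times expressed as seconds since the client's local midnight.

```ruby
HOUR = 3600
DAY  = 24 * HOUR

# Seconds since the client's local midnight, given the current UTC time
# and the client's UTC offset in seconds. With a positive divisor, Ruby's
# % never returns a negative value, so offsets on either side of UTC wrap.
def client_seconds_since_midnight(now_utc, offset)
  (now_utc.hour * HOUR + now_utc.min * 60 + now_utc.sec + offset) % DAY
end

# Seconds to wait before asking again; 0 means "run now". A window whose
# end is not later than its start is treated as spanning midnight.
def seconds_until_next_check(client_secs, window_start, window_end, cap = 6 * HOUR)
  in_window =
    if window_end > window_start
      client_secs.between?(window_start, window_end)
    else
      client_secs <= window_end || client_secs >= window_start
    end
  return 0 if in_window

  # (start - now) % DAY is the wait until the next start, today or tomorrow
  [(window_start - client_secs) % DAY, cap].min
end

now_utc = Time.utc(2023, 5, 9, 17, 0, 0)                 # 17:00 UTC
secs = client_seconds_since_midnight(now_utc, -4 * HOUR) # 13:00 for a UTC-4 client

# 8 PM to 8 AM window: seven hours away, so the cap applies
puts seconds_until_next_check(secs, 20 * HOUR, 8 * HOUR)  # prints 21600
# 4 PM to 8 PM window: three hours away
puts seconds_until_next_check(secs, 16 * HOUR, 20 * HOUR) # prints 10800
```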

We’re able to build a number of other complicated policies just by implementing a class with this single next_action_at method. Over the last five years, it’s been quite flexible and durable.
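
To illustrate how little a new policy needs, here is a sketch of a “delay until a future date” scheduler like the one mentioned in the requirements. It is written in plain Ruby so it runs on its own: the DelayUntilScheduler class, its resume_at value, and the injectable clock are hypothetical. In a Rails STI model, resume_at would more likely be a column on the shared table than a constructor argument.

```ruby
# Plain-Ruby stand-in for the Rails base class, so the sketch is self-contained.
class ClientScheduler
  def next_action_at(opts)
    raise NotImplementedError, "Abstract base class"
  end
end

# Hypothetical policy: hold automatic runs until a fixed date, asking the
# client to check back at least every six hours so that a server-side
# schedule change is still noticed.
class DelayUntilScheduler < ClientScheduler
  SIX_HOURS = 6 * 3600

  # clock is injectable only to make the example easy to exercise
  def initialize(resume_at, clock: -> { Time.now })
    @resume_at = resume_at
    @clock = clock
  end

  def next_action_at(_opts = nil)
    now = @clock.call
    return :now if now >= @resume_at

    # The resume date or six hours from now, whichever comes first
    [@resume_at, now + SIX_HOURS].min
  end
end

scheduler = DelayUntilScheduler.new(Time.utc(2023, 6, 1),
                                    clock: -> { Time.utc(2023, 5, 31, 22) })
p scheduler.next_action_at # the resume date, since it is under six hours away
```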