Munki Client Scheduling
A question on the MacAdmins Slack asked how groups handle scheduling Munki. We built out a server-side solution that offers a lot of flexibility with a very simple API.
Our requirements:
- Allow users to run Managed Software Center manually at any time.
- Support flexible scheduling policies, such as daily maintenance windows or a delay until a future date (for example, for faculty traveling to low-connectivity areas like sub-Saharan Africa).
- Be able to change schedules without client code changes.
First, we have a step in the Munki preflight script: it checks whether the current Munki run is auto, and if it is, it runs our scheduling client:
    if [[ "$1" == "auto" ]]; then
        # if the scheduler returns zero, that means run now;
        # anything non-zero means we want to stop running.
        if ! /usr/local/izzy/bin/izzy scheduler; then
            exit 1
        fi
    fi
The izzy schedule command calls a JSON API and provides the client’s current timezone offset in seconds. The server returns either an “ask me again after” time or the token now. If the answer is not now, that time is stored on the client as the modification time of a touch file, so the client skips polling the server until that time has passed.
izzy schedule
    let izzy = IzzySwift()
    let nextActionSleepfile = "/var/tmp/next_izzy_action_at"
    let fileManager = FileManager.default
    let now = Date()
    let zone = TimeZone.current

    // if we have a sleep file, and it's got an mtime in the future, just bail now.
    if fileManager.fileExists(atPath: nextActionSleepfile),
       let attribs = try? fileManager.attributesOfItem(atPath: nextActionSleepfile),
       let modifiedTime = attribs[FileAttributeKey.modificationDate] as? Date,
       modifiedTime > now {
        print("Scheduler: \(nextActionSleepfile) exists and has an mtime in the future; sleeping")
        throw ExitCode(1)
    }

    // Ask izzy what to do
    let utcOffset = zone.secondsFromGMT()
    if let response = izzy.getJson(path: "/api/v1/next_action_at",
                                   params: [ "offset": String(utcOffset) ]) {
        // good response (as? rather than as!, so a malformed status
        // falls into the error branch instead of crashing)
        if response["status"] as? String == "success",
           let nextAction = response["next_action_at"] as? String {

            // now
            if nextAction == "now" {
                print("Scheduler: IzzyWeb says run now.")
                // delete touch file (best effort; a failed delete should not block the run)
                if fileManager.fileExists(atPath: nextActionSleepfile) {
                    try? fileManager.removeItem(atPath: nextActionSleepfile)
                }
                throw ExitCode.success

            } else {
                // create a touchfile for the next action, so we don't re-hit the API over and over
                if let date = nextAction.toDate() {
                    fileManager.createFile(atPath: nextActionSleepfile,
                                           contents: nil,
                                           attributes: [FileAttributeKey.modificationDate: date.date])
                    print("Scheduler: updated \(nextActionSleepfile); deferring for now.")
                    throw ExitCode(1)
                }
            }

        // failed
        } else {
            print("Scheduler: got an error: \(response)")
            throw ExitCode.failure
        }
    }
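The client code above distinguishes two kinds of successful response: the literal string now, or a timestamp to wait for. A minimal sketch of how such a response body could be built is shown below. The keys status and next_action_at come from the client code; the helper name next_action_payload and the ISO 8601 timestamp format are assumptions made for illustration, not the actual server implementation.

```ruby
require "json"
require "time"

# Hypothetical helper: turn a scheduler's answer (:now or a Time) into the
# JSON body the client reads. The ISO 8601 format is an assumption.
def next_action_payload(answer)
  value = answer == :now ? "now" : answer.utc.iso8601
  JSON.generate("status" => "success", "next_action_at" => value)
end

puts next_action_payload(:now)
# {"status":"success","next_action_at":"now"}
puts next_action_payload(Time.utc(2024, 1, 15, 20, 0, 0))
# {"status":"success","next_action_at":"2024-01-15T20:00:00Z"}
```

Whatever format the server really uses, the client only needs to tell the token now apart from a string it can parse into a date.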
The server-side scheduling logic is implemented in Rails, using a polymorphic ClientScheduler class (single-table inheritance, STI) with a single required method, next_action_at().
    class ClientScheduler < ApplicationRecord
      # All concrete schedulers implement this method.
      # opts carries the client-supplied parameters, such as :offset.
      def next_action_at(opts)
        raise NotImplementedError, "Abstract base class"
      end
    end
The simplest schedulers are the HourlyScheduler, which always thinks it’s a good time to run, and the NeverScheduler, which never says now and always tells the client to ask again six hours from now.
    class HourlyScheduler < ClientScheduler
      # Always a good time to run
      def next_action_at(opts)
        :now
      end
    end

    class NeverScheduler < ClientScheduler
      # Never says :now; just keeps deferring by six hours
      def next_action_at(opts)
        Time.now + 6.hours
      end
    end
The most complicated is the MaintenanceWindowScheduler. The server uses the client-provided timezone offset to decide whether the system is inside or outside its window in the client’s local time, rather than relying on the server’s own time zone.
MaintenanceWindowScheduler
    class MaintenanceWindowScheduler < ClientScheduler
      def next_action_at(opts)
        if opts.nil? || opts[:offset].nil?
          raise "Need :offset specified"
        end

        offset = opts[:offset].to_i
        local_time_in_utc = ActiveSupport::TimeZone.new("UTC").now
        midnight_in_utc = ActiveSupport::TimeZone.new("UTC").parse("00:00").to_i

        # Seconds since the client's local midnight. The modulo wraps the
        # value into 0...86400: a negative offset can push the sum below
        # zero, and a positive offset can push it past 24 hours.
        client_offset_from_midnight =
          (local_time_in_utc.to_i - midnight_in_utc + offset) % 24.hours.to_i

        local_window_start_time = self.window_starts_as_offset
        local_window_end_time = self.window_ends_as_offset

        # Non over-midnight window (eg, 4 PM to 8 PM)
        if (local_window_end_time > local_window_start_time)

          # In the window?
          # _________S###^####E_____
          if (client_offset_from_midnight >= local_window_start_time &&
              client_offset_from_midnight <= local_window_end_time)
            return :now
          end

          # Compute next start time
          if (client_offset_from_midnight >= local_window_end_time)
            # Next start is tomorrow - window passed today
            next_start_time = local_window_start_time + 24.hours
          else
            # Next start is later today - window hasn't arrived yet
            next_start_time = local_window_start_time
          end

        # If the end time is earlier than the start time,
        # the window spans midnight eg 8 PM to 8 AM
        else
          # Treat it as two windows, one from midnight to the
          # end time, and another from the start time until midnight
          # ###t##E____________S###t##
          if (client_offset_from_midnight <= local_window_end_time ||
              client_offset_from_midnight >= local_window_start_time)
            return :now
          end

          # Next time is always later today
          next_start_time = local_window_start_time
        end

        # 6 hours or the next start time, whichever comes first
        time_until_next_start =
          [ next_start_time - client_offset_from_midnight, 6.hours ].min
        return local_time_in_utc + time_until_next_start
      end
    end
If the client isn’t in its maintenance window, the scheduler returns an “ask again” time of either the start of the window or six hours in the future, whichever comes first. Remember: the client doesn’t re-poll the server before this time; six hours is a reasonable compromise between lowering network traffic and staying responsive to server-side schedule changes.
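To make the window arithmetic concrete, here is a small worked sketch in plain Ruby with no Rails dependency. The function and constant names are hypothetical and exist only for this illustration; times are expressed as seconds since midnight, and the six-hour cap mirrors the behavior described above.

```ruby
DAY = 24 * 60 * 60       # seconds in a day
POLL_CAP = 6 * 60 * 60   # never defer longer than six hours

# Seconds since the client's local midnight. Ruby's % returns a
# non-negative result for a positive divisor, so this wraps in both
# directions (negative and positive offsets).
def client_seconds_since_midnight(utc_seconds_since_midnight, offset)
  (utc_seconds_since_midnight + offset) % DAY
end

# Returns :now inside the window, otherwise the number of seconds to wait.
def seconds_until_window(local, window_start, window_end)
  if window_end > window_start
    # Same-day window, eg 4 PM to 8 PM
    return :now if local >= window_start && local <= window_end
    next_start = local > window_end ? window_start + DAY : window_start
  else
    # Window spans midnight, eg 8 PM to 8 AM
    return :now if local <= window_end || local >= window_start
    next_start = window_start
  end
  [next_start - local, POLL_CAP].min
end

# 01:00 UTC on a UTC-5 client is 20:00 local time
local = client_seconds_since_midnight(3_600, -18_000)   # 72000
p seconds_until_window(local, 72_000, 28_800)           # :now
# 13:00 local, window opens at 20:00: seven hours away, capped at six
p seconds_until_window(46_800, 72_000, 28_800)          # 21600
# 15:00 local, window 16:00 to 20:00: one hour away
p seconds_until_window(54_000, 57_600, 72_000)          # 3600
```

The second call shows the cap at work: the window is seven hours away, but the client is told to check back after six, so a schedule change made in the meantime is still picked up.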
We’re able to build a number of other complicated policies just by implementing a class with this single next_action_at method. Over the last five years, it’s been quite flexible and durable.
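As an illustration of how little a new policy needs, here is a sketch of a "delay until a future date" scheduler of the kind mentioned in the requirements. This is not the production code: DelayUntilScheduler, its resume_at value, and the injectable clock are all hypothetical, and the base class is a plain-Ruby stand-in so the example runs without Rails. In a Rails STI model, resume_at would more likely be a database column.

```ruby
# Stand-in for the Rails base class so the sketch runs without Rails.
class ClientScheduler
  def next_action_at(opts)
    raise NotImplementedError, "Abstract base class"
  end
end

SIX_HOURS = 6 * 60 * 60

# Hypothetical policy: hold off until a fixed date, then run normally.
class DelayUntilScheduler < ClientScheduler
  # clock is injectable so the behavior can be checked with a fixed time.
  def initialize(resume_at, clock: -> { Time.now })
    @resume_at = resume_at
    @clock = clock
  end

  def next_action_at(_opts)
    now = @clock.call
    return :now if now >= @resume_at
    # Cap the deferral at six hours so a changed schedule is picked up.
    [@resume_at, now + SIX_HOURS].min
  end
end

noon = Time.utc(2024, 1, 1, 12, 0, 0)
scheduler = DelayUntilScheduler.new(Time.utc(2024, 1, 10), clock: -> { noon })
p scheduler.next_action_at({}) == noon + SIX_HOURS   # true
```

Capping the answer at six hours follows the same reasoning as the maintenance window: the client will not call back before the returned time, so a shorter horizon keeps it responsive to server-side changes.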