Jim Zajkowski

Jamf and the Case of the Exploding Database

May 19, 2023

“Huh, that’s weird, why did mysqldump break? When did the DB get so big?” I asked aloud. My cat gave me an annoyed look for interrupting her nap, turned over, and fell right back to sleep.

“Wait, does this database actually have, uh, is that, 1.3 billion extension attribute rows!? 240 gigabytes? of EAs?! What the heck!”

+-----------------------------------+------------+--------------+--------------+------------+
| TABLE_NAME                        | table_rows | data_length  | index_length | Size in MB |
+-----------------------------------+------------+--------------+--------------+------------+
| extension_attribute_values        | 1367505492 | 124721840128 | 127488458752 |  240526.48 |
| event_logs                        |  182470223 |  28996288512 |  12575621120 |   39646.06 |
| log_actions                       |  185294463 |  24295505920 |   5252284416 |   28178.97 |
| policy_history                    |   38554225 |   3448963072 |   4084613120 |    7184.58 |
| logs                              |   39372868 |   2586886144 |   3100901376 |    5424.30 |
| hardware_reports                  |   18167461 |   2960146432 |    391643136 |    3196.52 |
| operating_systems                 |   27521885 |   2778710016 |    490717184 |    3117.97 |
| reports                           |   18804657 |   1035616256 |   1672773632 |    2582.92 |
| object_history                    |    9148848 |   1548763136 |    904937472 |    2340.03 |
| mobile_device_management_commands |     342277 |    467894272 |   1638137856 |    2008.47 |
+-----------------------------------+------------+--------------+--------------+------------+
The SQL behind that report:
SELECT table_name, table_rows, data_length, index_length,
    round(((data_length + index_length) / 1024 / 1024),2) "Size in MB",
    round((data_free / 1024 / 1024),2) "Free in MB"
FROM information_schema.TABLES
WHERE table_schema = "jamf_production" ORDER BY (data_length + index_length) DESC limit 10;

It turns out that every time your client “submits inventory” or you write an EA value to Jamf via an API, Jamf constructs a “report” by creating a parent row in the reports table and a number of rows in other tables: fonts, hardware_reports, hard_drive_partitions, hard_drives, extension_attribute_values, and a few more. This is typical database normalization.

Even if the only change is a single EA value, Jamf duplicates every row from all of these tables. That means if you have, say, 50 extension attributes, an API write to just one of them creates 50 more rows: 49 with the old values and 1 with the new.
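If you want to see that fan-out on your own instance, counting the EA rows attached to a few recent reports makes it obvious. Here’s a small Go sketch; the report_id columns on reports and extension_attribute_values are assumptions based on Jamf’s own flush queries, not documented schema, so verify them before trusting the numbers.

// Package easanity is a rough sanity check for the report fan-out described
// above: it counts the extension_attribute_values rows attached to the 20
// newest reports. The report_id columns on both tables are assumptions
// drawn from Jamf's own flush queries, not a documented schema.
package easanity

import "database/sql"

// PerReportEARows returns report_id -> number of EA rows for the 20 newest
// reports. It assumes extension_attribute_values.report_id is indexed, so
// the join touches only a handful of rows despite the table's size.
func PerReportEARows(db *sql.DB) (map[int64]int64, error) {
	rows, err := db.Query(`
		SELECT r.report_id, COUNT(*) AS ea_rows
		FROM (SELECT report_id FROM reports ORDER BY report_id DESC LIMIT 20) r
		JOIN extension_attribute_values eav ON eav.report_id = r.report_id
		GROUP BY r.report_id`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	counts := make(map[int64]int64)
	for rows.Next() {
		var id, n int64
		if err := rows.Scan(&id, &n); err != nil {
			return nil, err
		}
		counts[id] = n
	}
	return counts, rows.Err()
}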

We run a web app to manage Munki and Jamf for our field staff, and it writes EAs to trigger scope changes for policies. Our app compares the existing EA value before trying to change it, but it still generated hundreds of extra writes - or in one very bad case, 55,000. 55,000 reports times 50 EAs is 2.75 million EA rows.

Our app was making these writes because Jamf was (correctly) stripping trailing spaces from the value provided, but the app’s data source had an extra space on the end. That is, it was comparing "Unit Name " with "Unit Name" and, seeing they were different, would write the EA back as "Unit Name ". Jamf would remove the space, and the cycle repeated forever. We had a similar issue where Jamf returned values as strings but the app compared them to integers, so every comparison was a mismatch.
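The fix was simple once we saw it: normalize the value the same way Jamf will before deciding whether to write at all. Here’s a minimal sketch of that guard, written in Go for illustration - EAClient is a hypothetical stand-in for whatever client wraps the Jamf Pro API in your app.

// Sketch of the write guard that would have prevented the loop: normalize
// both values the same way Jamf does before deciding anything changed.
// EAClient is a hypothetical stand-in for whatever wraps the Jamf Pro API.
package easync

import "strings"

type EAClient interface {
	FetchEA(computerID int, name string) (string, error)
	WriteEA(computerID int, name, value string) error
}

// SyncEA writes an extension attribute only when the trimmed value really
// differs from what Jamf already has. Jamf strips trailing spaces on write,
// so comparing untrimmed strings ("Unit Name " vs "Unit Name") loops
// forever, generating a new report - and a full set of duplicated EA rows -
// on every run.
func SyncEA(c EAClient, computerID int, name, desired string) error {
	current, err := c.FetchEA(computerID, name)
	if err != nil {
		return err
	}
	want := strings.TrimSpace(desired)
	if strings.TrimSpace(current) == want {
		return nil // no change: no write, no extra report
	}
	return c.WriteEA(computerID, name, want)
}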

Jamf Support suggested we clear this out by flushing the reports table at increasingly short intervals, starting at the longest option and working down to one week. The problem is that Jamf’s report-flushing logic executes a DELETE FROM table WHERE report_id IN ([list of 5,000 IDs]). MySQL is particularly bad at this kind of query, and it tends to bog down the entire table, causing any pending INSERTs or UPDATEs to hang. Our Jamf instance is pretty busy - about 15,000 endpoints - so the connection pool quickly filled with hung database connections. After enough of a delay, Kubernetes notices Jamf has stopped handling web requests and force-restarts it.

If the table is indexed properly, a single-row DELETE ... WHERE id = x executes in milliseconds and doesn’t bog down other queries.

I reverse-engineered the methods Jamf uses to flush the database and wrote a few small utilities in Go to execute millions of row-level deletes. It took about two weeks to finish the deletions, but the database is back down to a reasonable size, and we didn’t have to break our production environment while hoping Jamf’s manual flush worked.
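The tools aren’t public yet, but the core loop is small enough to sketch. This is a minimal illustration rather than the actual utility: it assumes you’ve already worked out which report IDs are safe to remove, and that the child tables reference reports through an indexed report_id column (which Jamf’s own flush query implies) - check both against your schema before deleting anything.

// Minimal sketch of the row-level cleanup loop, not the real tool. Each
// statement deletes only one report's worth of rows from one table, so
// InnoDB locks a few index records at a time and normal inventory traffic
// keeps flowing; the cost is wall-clock time rather than blocked clients.
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// childTables lists the per-report tables named earlier; extend it to
// match whatever your Jamf version actually populates.
var childTables = []string{
	"extension_attribute_values",
	"hardware_reports",
	"hard_drive_partitions",
	"hard_drives",
	"fonts",
}

func deleteReport(db *sql.DB, reportID int64) error {
	for _, t := range childTables {
		if _, err := db.Exec("DELETE FROM "+t+" WHERE report_id = ?", reportID); err != nil {
			return err
		}
	}
	// Remove the parent row last so a failed run can be retried safely.
	_, err := db.Exec("DELETE FROM reports WHERE report_id = ?", reportID)
	return err
}

func main() {
	// Placeholder DSN; point it at your Jamf database.
	db, err := sql.Open("mysql", "jamf:secret@tcp(127.0.0.1:3306)/jamf_production")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// doomed would come from your own query for reports that are safe to drop.
	doomed := []int64{ /* ... */ }
	for _, id := range doomed {
		if err := deleteReport(db, id); err != nil {
			log.Printf("report %d: %v", id, err)
		}
	}
}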

I hope to release those tools, along with a garbage collector for Jamf databases, soon, once they’re cleaned up a bit - they’re pretty hacky right now.