Metrics
Data Model
Sampling Methodology
Taskobra’s data model is both simple and robust. It has six main parts:
- Users
- Roles
- Systems
- Components
- Snapshots
- Metrics
Users
User representation is kept simple, with a username and OAuth token. Users can be assigned Roles at the site level or the system level.
Roles
There are two types of Role available to Users in taskobra. The first is Site Roles, which apply to the entire data model. The second is System Roles, which apply to the data for a particular System. There are four different Roles: Administrator, Observer, Reporter, and Owner.
Role | System/Host | Taskobra | Description |
---|---|---|---|
Administrator | ![]() | ![]() | System/Host Administrators are System Observers and Reporters, and can also insert and update data in the System and Component tables for systems they own. Taskobra Administrators can affect any data in the data model, which means they can add/remove Users, Systems, Components, and read or write Snapshot data. |
Observer | ![]() | ![]() | Taskobra Observers can read data from any tables. System/Host Observers can read data from the System, Component, and Snapshot tables. |
Reporter | ![]() | ![]() | Reporters can insert data into the System, Component, and Snapshot tables. |
Systems
Individual Systems, or Hosts, in taskobra are represented by multiple tables.
The root is the System table, which contains an ID, the owner’s ID, and a system name. Other information about the System is determined compositionally by querying through the associative SystemComponent table. The SystemComponent table represents a many-to-many relationship between System IDs and Abstract Component IDs.
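To make the System/SystemComponent relationship concrete, here is a minimal in-memory SQLite sketch. The column names (`owner_id`, `component_id`, etc.), the helper functions, and the sample rows are assumptions for illustration; the text only describes these fields in prose, and taskobra's real schema is defined by its ORM.

```python
import sqlite3

# Hypothetical column names; the documentation only describes these fields in prose.
SCHEMA = """
CREATE TABLE System (
    id       INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL,
    name     TEXT NOT NULL
);
-- Associative table linking Systems to Abstract Component IDs (many-to-many).
CREATE TABLE SystemComponent (
    system_id    INTEGER NOT NULL REFERENCES System(id),
    component_id INTEGER NOT NULL,
    PRIMARY KEY (system_id, component_id)
);
"""


def component_ids(conn: sqlite3.Connection, system_id: int) -> list:
    """Abstract Component IDs attached to one System."""
    rows = conn.execute(
        "SELECT component_id FROM SystemComponent WHERE system_id = ? ORDER BY component_id",
        (system_id,),
    )
    return [row[0] for row in rows]


def system_names(conn: sqlite3.Connection, component_id: int) -> list:
    """Names of every System that includes a given Abstract Component ID."""
    rows = conn.execute(
        "SELECT s.name FROM System AS s "
        "JOIN SystemComponent AS sc ON sc.system_id = s.id "
        "WHERE sc.component_id = ? ORDER BY s.name",
        (component_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO System VALUES (?, ?, ?)",
                 [(314, 42, "build-box"), (315, 42, "laptop")])
# Abstract Component 7 is linked to both Systems, so the relationship is many-to-many.
conn.executemany("INSERT INTO SystemComponent VALUES (?, ?)",
                 [(314, 7), (314, 8), (315, 7)])
print(component_ids(conn, 314))  # [7, 8]
```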
Components
The gatekeeper to Component identification is the ComponentType table, which contains a mapping of Abstract Component ID to Concrete Component ID and Type. The Abstract Component ID is a generic ID of the component without any knowledge of the type. The Concrete Component ID is the ID of a Component of known type. There is a table for each type of component that taskobra takes measurements of. These are indexed by Concrete Component ID.
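The two-step lookup described above (abstract ID to concrete ID and type, then a query against the per-type table) can be sketched as follows. The schema, column names, `resolve` helper, and sample rows are hypothetical, and only two of the five component tables are modeled. Because the table name comes from data, the sketch checks it against a whitelist before interpolating it into SQL.

```python
import sqlite3

# Hypothetical schema: ComponentType maps an Abstract Component ID to a
# Concrete Component ID plus the name of the per-type table holding it.
SCHEMA = """
CREATE TABLE ComponentType (
    abstract_id INTEGER PRIMARY KEY,
    concrete_id INTEGER NOT NULL,
    type        TEXT NOT NULL
);
CREATE TABLE CPU    (id INTEGER PRIMARY KEY, manufacturer TEXT, model TEXT, cores INTEGER);
CREATE TABLE Memory (id INTEGER PRIMARY KEY, manufacturer TEXT, capacity INTEGER);
"""

# Only known per-type tables may be interpolated into SQL.
CONCRETE_TABLES = {"CPU", "Memory"}


def resolve(conn: sqlite3.Connection, abstract_id: int):
    """Return (type, row as dict) for an Abstract Component ID, or None if unknown."""
    found = conn.execute(
        "SELECT concrete_id, type FROM ComponentType WHERE abstract_id = ?",
        (abstract_id,),
    ).fetchone()
    if found is None:
        return None
    concrete_id, kind = found
    if kind not in CONCRETE_TABLES:
        raise ValueError(f"unknown component type: {kind}")
    cur = conn.execute(f"SELECT * FROM {kind} WHERE id = ?", (concrete_id,))
    columns = [d[0] for d in cur.description]
    row = cur.fetchone()
    return kind, (dict(zip(columns, row)) if row is not None else None)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO CPU VALUES (1, 'ExampleCorp', 'X1', 8)")
conn.execute("INSERT INTO Memory VALUES (1, 'ExampleCorp', 17179869184)")
# Concrete IDs are only unique within their own type table, so both are 1 here.
conn.executemany("INSERT INTO ComponentType VALUES (?, ?, ?)",
                 [(7, 1, "CPU"), (8, 1, "Memory")])
print(resolve(conn, 7))
```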
CPU | GPU | Memory | Network Adapter | Storage |
---|---|---|---|---|
Manufacturer | Manufacturer | Manufacturer | Manufacturer | Manufacturer |
Model | Model | Capacity | Type | Capacity |
ISA | Architecture | Timings | MAC Address | Maximum Write Rate |
TDP | TDP | Frequency | Maximum Send Rate | Maximum Read Rate |
Cores | Cores | | Maximum Receive Rate | |
Threads/Core | Memory | | | |
Maximum Frequency | | | | |
Minimum Frequency | | | | |
For each component type, there are defined metrics with specific formats. These metrics are each associated with a snapshot ID and Concrete Component ID.
Snapshots
A Snapshot is a collection of metrics representing the host System’s state at a specific time. Each Snapshot contains a System ID and a timestamp, and is associated with a set of records in the Metrics tables for each Component type. Because we periodically prune the backend database, each Snapshot also needs to record the period of time it covers, which is represented as a base and an exponent. Pruning uses these values to determine how many data points to combine.
Metrics
Metrics are simple data points which contain a Snapshot ID, a Component ID, and some number of values, depending on their specific format. Metrics are the building blocks of all the views available in the Web Front-End. For example, the total system CPU utilization can be computed by taking the mean of all CpuUtilization metrics for a given Snapshot across each Core and Thread in a system.
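As a minimal sketch of that computation, assuming each CpuUtilization row carries a percent-busy `value` (the dataclass and function names are illustrative, not taskobra's ORM classes), the total is simply the mean over every row belonging to the Snapshot. Returning `None` when there are no rows mirrors the "No Data" behavior described for missing metrics.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, Optional


@dataclass(frozen=True)
class CpuUtilization:
    """One row of the CpuUtilization metric table (field names are illustrative)."""
    snapshot_id: int
    cpu_id: int
    core: int
    thread: int
    value: float  # assumed: percent busy for this core/thread


def total_cpu_utilization(metrics: Iterable[CpuUtilization],
                          snapshot_id: int) -> Optional[float]:
    """Mean utilization across every core and thread recorded for one Snapshot."""
    values = [m.value for m in metrics if m.snapshot_id == snapshot_id]
    # No rows for this Snapshot means the front end should show "No Data".
    return fmean(values) if values else None


rows = [
    CpuUtilization(snapshot_id=0, cpu_id=1, core=0, thread=0, value=10.0),
    CpuUtilization(snapshot_id=0, cpu_id=1, core=0, thread=1, value=30.0),
    CpuUtilization(snapshot_id=0, cpu_id=1, core=1, thread=0, value=50.0),
    CpuUtilization(snapshot_id=0, cpu_id=1, core=1, thread=1, value=70.0),
    CpuUtilization(snapshot_id=1, cpu_id=1, core=0, thread=0, value=99.0),
]
print(total_cpu_utilization(rows, 0))  # 40.0
```

An unweighted mean treats every hardware thread equally, which matches "across each Core and Thread"; a view that wanted per-core figures would group by `core` first.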
CPU Metrics
CPU Utilization | CPU Frequency | CPU Temperature |
---|---|---|
Snapshot ID | Snapshot ID | Snapshot ID |
CPU ID | CPU ID | CPU ID |
Core | Core | |
Thread | ||
Value | Value | Value |
GPU Metrics
GPU Utilization | GPU Temperature |
---|---|
Snapshot ID | Snapshot ID |
GPU ID | GPU ID |
Value | Value |
Memory Metrics
Memory Used | Memory Commit | Memory Paged |
---|---|---|
Snapshot ID | Snapshot ID | Snapshot ID |
Memory ID | Memory ID | Memory ID |
Value | Value | Value |
Network Adapter Metrics
Send Rate | Receive Rate |
---|---|
Snapshot ID | Snapshot ID |
Network Adapter ID | Network Adapter ID |
Value | Value |
Storage Metrics
Read Rate | Write Rate |
---|---|
Snapshot ID | Snapshot ID |
Storage ID | Storage ID |
Value | Value |
Pruning
When analyzing performance or debugging problems, it is important to have data at one-second resolution. On the other hand, looking at system and pool performance trends requires data to persist for long periods of time, but doesn’t need frequent samples. Rather than store a linear set of samples, taskobra can be configured to automatically compress aging data.
We average snapshots over logarithmically growing time periods, combining multiple snapshots into one. Each time period is represented by a base and an exponent. Raw snapshot data is generally reported with an exponent of 0, along with a Reporter-configurable rate and base. For example, the following snapshots are 1 second apart, with base 3 pruning.
Snapshot | User | System | Time | Rate | Base | Exponent |
---|---|---|---|---|---|---|
0 | 42 | 314 | 1000 | 1 | 3 | 0 |
1 | 42 | 314 | 1001 | 1 | 3 | 0 |
2 | 42 | 314 | 1002 | 1 | 3 | 0 |
After applying the pruning algorithm, the snapshot table would look something like this:
Snapshot | User | System | Time | Rate | Base | Exponent |
---|---|---|---|---|---|---|
3 | 42 | 314 | 1001 | 1 | 3 | 1 |
Pruned snapshots cover $\text{Rate} \times \text{Base}^{\text{Exponent}}$ seconds, with a timestamp in the center of that range, and contain an average of all the snapshots they were generated from.
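As a worked check, plugging the pruned snapshot from the example (Rate 1, Base 3, Exponent 1, Time 1001) into that rule gives its span and covered interval, assuming times are in seconds and each raw snapshot's timestamp is likewise the center of its own one-second range:

$$
\text{Rate} \times \text{Base}^{\text{Exponent}} = 1 \times 3^{1} = 3\ \text{s},
\qquad
\left[1001 - \tfrac{3}{2},\ 1001 + \tfrac{3}{2}\right] = [999.5,\ 1002.5]
$$

This interval is exactly the union of the three raw one-second ranges centered at 1000, 1001, and 1002.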
Properties of Pruning
- Lazy evaluation: only two snapshots need to be in memory at a time
- Pruning the same data twice yields the same result
- Pruning the result of a prune yields the same result
- Two-element merging is associative and commutative, and has an identity element
- Expects data sorted in descending time order
  - This means data should be queried from the database youngest to oldest
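One way to obtain a two-element merge with the associative, commutative, and identity properties listed above is to carry weighted sums instead of averages, and divide only at the end; averaging two averages directly is not associative. The sketch below assumes each snapshot carries a single numeric value and uses hypothetical names (`Partial`, `from_snapshot`, `finish`). It only covers the merge itself, not how taskobra chooses which snapshots to combine, and is not taken from taskobra's source.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Running totals for a (partially) merged snapshot; hypothetical representation."""
    weight: int = 0          # raw sampling periods covered, i.e. Base ** Exponent
    time_sum: float = 0.0    # sum of timestamps, each weighted by its snapshot's weight
    value_sum: float = 0.0   # sum of metric values, weighted the same way

    def merge(self, other: "Partial") -> "Partial":
        # Component-wise addition: associative, commutative, and the
        # default Partial() (all zeros) is the identity element.
        return Partial(
            self.weight + other.weight,
            self.time_sum + other.time_sum,
            self.value_sum + other.value_sum,
        )


def from_snapshot(time: float, value: float, base: int, exponent: int) -> Partial:
    weight = base ** exponent
    return Partial(weight, time * weight, value * weight)


def finish(p: Partial, base: int) -> tuple:
    """Turn totals back into (timestamp, average value, exponent)."""
    exponent, covered = 0, 1
    while covered < p.weight:
        covered *= base
        exponent += 1
    if covered != p.weight:
        raise ValueError("merged weight must be a power of the base")
    return p.time_sum / p.weight, p.value_sum / p.weight, exponent


# The three raw snapshots from the example table (base 3, exponent 0),
# youngest first, each paired with a made-up metric value.
raw = [(1002, 30.0), (1001, 20.0), (1000, 10.0)]
merged = reduce(Partial.merge, (from_snapshot(t, v, 3, 0) for t, v in raw), Partial())
print(finish(merged, 3))  # (1001.0, 20.0, 1)
```

Weighting timestamps by span also keeps the merged timestamp at the center of the combined range when the inputs are contiguous, which matches the centering rule for pruned snapshots.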
Metrics Monitoring
Using psutil
Python’s psutil is a mature, well-supported, cross-platform library that provides system information and statistics. Statistic availability is not the same across all platforms; for instance, temperatures are only available on Linux and FreeBSD, and fan speeds only on Linux and macOS.
Our ORM is resilient to missing data, since snapshots are composed of entries from many metric tables. A missing entry simply means the query returns no items, and the front end can make that fact clear to users with a message like “No Data.”
In the future, extensions could cover statistics on more platforms by adding fallback routines or by contributing back to psutil.
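A minimal sketch of how a Reporter might gather raw readings with psutil, assuming a flat dict as the sample format (`collect_sample` and its keys are hypothetical, not taskobra's Reporter code). psutil only defines `sensors_temperatures` and `sensors_fans` on platforms that support them, so the sketch probes with `getattr` and records `None`, which the front end can present as "No Data". Note that `cpu_percent(percpu=True)` reports one value per logical CPU; mapping those onto the Core/Thread columns of CpuUtilization would need extra topology information.

```python
import psutil


def collect_sample(interval: float = 0.1) -> dict:
    """Gather one snapshot's worth of raw readings (hypothetical format)."""
    sample = {
        # Utilization of each logical CPU, in percent, measured over `interval` seconds.
        "cpu_utilization": psutil.cpu_percent(interval=interval, percpu=True),
        # Bytes of physical memory in use.
        "memory_used": psutil.virtual_memory().used,
    }
    # Sensor functions only exist on some platforms, so probe for them instead
    # of calling them unconditionally; None stands for "No Data".
    for key, name in (("temperatures", "sensors_temperatures"), ("fans", "sensors_fans")):
        read = getattr(psutil, name, None)
        sample[key] = read() if read is not None else None
    return sample


if __name__ == "__main__":
    for key, value in collect_sample().items():
        print(key, value)
```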
Using Open Hardware Monitor
Using the pythonnet module and the API provided by Open Hardware Monitor, statistics such as temperatures and fan speeds can be retrieved. It supports Windows 7/8/10 and all x86 Linux installations. Supporting these statistics requires additional development effort, as well as extra user setup time because of the external dependencies.