This change fixes the case where the first record after telempostd
starts is not inserted into the telemetry journal.
Notice: this changes the logic. Previously, records were inserted into
the journal as soon as they were processed. After this change, records
are inserted into the journal only when they are successfully delivered
or when record_server_delivery_enabled is set to false.
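A minimal sketch of the new journal-insertion rule, for illustration only. The helper name should_insert_in_journal is hypothetical and not taken from the telempostd sources; it reduces the decision to the two conditions stated above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper illustrating the rule above; not the actual
 * telempostd code. Returns true when a record should be added to the
 * telemetry journal.
 */
bool should_insert_in_journal(bool delivered, bool server_delivery_enabled)
{
        /* Delivery disabled: the record will never be sent, journal it now. */
        if (!server_delivery_enabled)
                return true;

        /* Otherwise journal it only once the backend has accepted it. */
        return delivered;
}
```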
Newer Linux kernels detect, interpret, and display BERT errors in klog.
This makes bertprobe unnecessary, as all the information can
be simply grabbed from klog. This also removes the need to
encode the binary data into HEX/ASCII, so all the encoding
code can be removed as well, including the test suite.
The change was implemented by adding a new pattern for BERT
in oops_parser.c, with some additional minor changes.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
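The kind of klog pattern this relies on can be sketched as a simple substring match. This assumes the kernel announces BERT records with a line containing "BERT: Error records from previous boot:" (as printed by the ACPI APEI BERT driver); the macro BERT_PATTERN and the helper is_bert_line are hypothetical, and the real pattern in oops_parser.c may differ.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical pattern: assumes the kernel's BERT driver logs this line
 * before dumping the error records. The real pattern lives in
 * oops_parser.c and may differ.
 */
#define BERT_PATTERN "BERT: Error records from previous boot:"

/* Returns true if a klog line marks the start of a BERT report. */
bool is_bert_line(const char *klog_line)
{
        return strstr(klog_line, BERT_PATTERN) != NULL;
}
```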
telemetrics-client currently starts as soon as the package is installed.
This change makes sure that starting telemetry for the first time
requires two steps: 1) telemctl opt-in and 2) telemctl start.
Signed-off-by: Alex Jaramillo <alex.jch@gmail.com>
The build is broken for multiple binaries when configured to log
to the systemd journal:
$ ./configure --enable-logtype=systemd
$ make
All binaries that use the routine "telem_log" must link against additional
libraries when logging to the systemd journal.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
Replaced strncpy with memcpy and added some buffer overflow checks in
order to avoid the GCC 9 compiler warning:
warning: ‘__builtin___strncpy_chk’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
While in there, removed one unused global variable and declared the remaining
global variables as static.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
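The replacement pattern can be sketched as: measure the source, check that it plus its terminating NUL fits in the destination, then memcpy. copy_string is a hypothetical helper, not the function used in the patch; refusing to copy rather than truncating is one possible policy, chosen here for illustration.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst (capacity dst_size bytes)
 * without strncpy. Returns false, leaving dst untouched, if src plus
 * its terminating NUL would not fit.
 */
bool copy_string(char *dst, size_t dst_size, const char *src)
{
        size_t len = strlen(src);

        /* Overflow check against the destination capacity. */
        if (dst_size == 0 || len >= dst_size)
                return false;

        memcpy(dst, src, len);
        dst[len] = '\0';        /* memcpy does not terminate the string */
        return true;
}
```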
The telempostd and telemprobd daemons both used the same
name, "initialize_daemon", for daemon initialization.
Although the name was the same, the initialization code of each daemon
was different, which could lead to confusion.
Also moved the telemprobd-specific code "stage_record" from iorecord.c to
telemdaemon.c.
Modified local.mk accordingly.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
Have a default configuration set in the binary, with conf files only
being used to override the defaults. This allows simpler configs that
only toggle specific options, which may be useful for user
configuration or testing. See src/data/example.2.conf for an example of
a new simplified config.
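A minimal sketch of built-in defaults with conf-file overrides. The struct layout and the config_set_defaults/config_apply helpers are hypothetical; only the server URL default comes from this description, and the record_retention_enabled default of false is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the client configuration. */
struct config {
        const char *server;
        bool record_retention_enabled;
};

/* Built-in defaults compiled into the binary. */
void config_set_defaults(struct config *c)
{
        c->server = "https://clr.telemetry.intel.com/v2/collector";
        c->record_retention_enabled = false;    /* assumed default */
}

/*
 * Apply one key/value pair read from a conf file. Keys the file does
 * not mention keep their built-in defaults; unknown keys are ignored.
 * The value string must outlive the config (it is not copied).
 */
void config_apply(struct config *c, const char *key, const char *value)
{
        if (strcmp(key, "server") == 0)
                c->server = value;
        else if (strcmp(key, "record_retention_enabled") == 0)
                c->record_retention_enabled = (strcmp(value, "true") == 0);
}
```

In this sketch, a conf file containing only a record_retention_enabled line changes that one option and leaves the server at its built-in default.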
By default the telemetrics-client is configured with
https://clr.telemetry.intel.com/v2/collector as the built-in server
location. This is the normal Clear Linux telemetrics backend. It can
be changed via the configure flag --with-backendserveraddr=URI. This
allows anyone else using this project to easily specify their own
default backend without patching. As usual, users can specify a
different server location in a configuration file.
Signed-off-by: California Sullivan <california.l.sullivan@intel.com>
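One way a configure-time default can coexist with a runtime override, sketched under the assumption that --with-backendserveraddr ends up as a preprocessor define. BACKEND_SERVER_ADDR and effective_server are illustrative names, not necessarily the project's.

```c
#include <stddef.h>

/*
 * Assumption: configure passes the chosen URI as a compile-time define.
 * BACKEND_SERVER_ADDR is a hypothetical name; the fallback below is the
 * Clear Linux backend named in the description.
 */
#ifndef BACKEND_SERVER_ADDR
#define BACKEND_SERVER_ADDR "https://clr.telemetry.intel.com/v2/collector"
#endif

/* A server set in a configuration file takes precedence over the built-in one. */
const char *effective_server(const char *conf_server)
{
        return conf_server != NULL ? conf_server : BACKEND_SERVER_ADDR;
}
```

Under this assumption, configuring with a different --with-backendserveraddr=URI only changes the fallback; a server entry in a configuration file still wins.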
Most probes allow passing a non-default configuration file
on the command line using the "-f" switch.
For example:
$ hprobe -f custom_cfg_file.conf
If the file "custom_cfg_file.conf" contains
server = http://<my backend server>
one would expect the hprobe payload to be sent to the server
http://<my backend server>. However, this is not the case, as the
various daemons delivering the payload to the backend are blissfully
unaware of the config file hprobe wanted to use.
The solution is to include the absolute path of the config file
specified on the command line as part of the payload. Once the
payload is about to be sent via the routine "post_record_http",
the routine checks if a non-default config file was requested.
If so, configuration is re-initialized with the file.
Upon exit, the routine re-initializes the original (default) configuration.
The non-default file may no longer exist at send time,
for example when sending spooled records. In that case we
intentionally don't send anything to the backend.
If the record does not contain the optional configuration file
information, it's business as usual.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
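The decision made in post_record_http can be sketched as a small pure function, leaving the actual reload and restore of the configuration to the caller. choose_config and enum cfg_action are hypothetical; access(2) with R_OK serves as the existence check here, which may not be what the real code uses.

```c
#include <stddef.h>
#include <unistd.h>

/* What to do with a record before posting it (hypothetical). */
enum cfg_action {
        USE_DEFAULT_CONFIG,     /* record names no config file */
        USE_RECORD_CONFIG,      /* reload config from the record's file */
        SKIP_RECORD             /* file is gone: send nothing */
};

/*
 * cfg_path is the absolute config path stored in the record, or NULL
 * when the record carries none.
 */
enum cfg_action choose_config(const char *cfg_path)
{
        if (cfg_path == NULL)
                return USE_DEFAULT_CONFIG;

        /* Spooled records may outlive the file; don't fall back silently. */
        if (access(cfg_path, R_OK) != 0)
                return SKIP_RECORD;

        return USE_RECORD_CONFIG;
}
```

On USE_RECORD_CONFIG a caller would re-initialize the configuration from cfg_path, send the record, then restore the default configuration; on SKIP_RECORD it sends nothing.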
We don't need to stay resident for a fixed two hours. Instead, exit
cleanly after five minutes of inactivity. This also involves reducing the
default and maximum values of spool_process_time, as it was previously set
to 30 minutes.
telempostd required several changes with timers, whereas telemprobd only
required changing the values of spool_process_time and
TM_DAEMON_EXIT_TIME, and refreshing the timeout when handling a client.
Signed-off-by: California Sullivan <california.l.sullivan@intel.com>
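The idle-exit behavior can be sketched with a timestamp that is refreshed on every client and compared against a five-minute limit. IDLE_EXIT_SECONDS and struct idle_timer are hypothetical stand-ins (the unit and value of TM_DAEMON_EXIT_TIME are not given in this message); times are passed in explicitly so the logic can be tested without sleeping.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical constant: five minutes of inactivity. */
#define IDLE_EXIT_SECONDS (5 * 60)

struct idle_timer {
        time_t last_activity;
};

/* Call whenever a client is handled, pushing the exit deadline back. */
void idle_timer_refresh(struct idle_timer *t, time_t now)
{
        t->last_activity = now;
}

/* True once the daemon has been idle for the full timeout. */
bool idle_timer_expired(const struct idle_timer *t, time_t now)
{
        return difftime(now, t->last_activity) >= IDLE_EXIT_SECONDS;
}
```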
The previous sequence of operations was:
1. klogscanner waits for/monitors the kernel buffer for messages.
2. When messages arrive, it creates a "raw" file in /var/cache/telemetry/oops.
3. Once this file is placed into /var/cache/telemetry/oops, it is detected by oopsprobe.
4. oopsprobe parses the "raw" file, creates a new payload, and sends it to the backend.
This commit merges the operations:
1. klogscanner waits for/monitors the kernel buffer for messages.
2. When messages arrive, it parses the klog buffer, creates a new payload, and sends it to the backend.
While at it, also declare routines as "static" when possible.
Fixed a few misplaced memory frees.
Simplify the main loop operations, move allocate/free buffer outside
of the loop.
Return SUCCESS if terminated by the signal SIGTERM. This is a clean
way to stop the service. This prevents spamming the journal (and sending
journal/error telemetry data via journalprobe) each time telemetry is
restarted.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
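Treating SIGTERM as a clean stop can be sketched with a flag set by the signal handler and consulted when choosing the exit status. The names are hypothetical, and plain signal() is used for brevity where a daemon might prefer sigaction().

```c
#include <signal.h>
#include <stdlib.h>

/* Set from the signal handler; volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t got_sigterm = 0;

static void on_sigterm(int sig)
{
        (void)sig;
        got_sigterm = 1;
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_sigterm_handler(void)
{
        return signal(SIGTERM, on_sigterm) == SIG_ERR ? -1 : 0;
}

/*
 * Exit status for the daemon's main(): a SIGTERM stop is reported as
 * success, so restarts don't show up as failures in the journal.
 */
int daemon_exit_status(int loop_failed)
{
        if (got_sigterm)
                return EXIT_SUCCESS;
        return loop_failed ? EXIT_FAILURE : EXIT_SUCCESS;
}
```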
This change alters the logic for posting telemetry messages to the backend
to handle cases when the network or name resolution service is not
available when the daemon starts. Currently, if a message is not
delivered, it is spooled and has to wait 900 seconds (the default
value) for a future loop to process it. With this change, when the
daemon starts and is unable to deliver telemetry messages, it will
attempt to resend them using a delay of t*t seconds for t = 1 up to
t = 7 (i.e. 1, 4, 9, 16, 25, 36, 49 seconds).
Signed-off-by: Alex Jaramillo <alex.v.jaramillo@intel.com>
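The retry schedule works out as follows; retry_delay_seconds and MAX_RETRY_STEP are hypothetical names used only to illustrate the t*t delays.

```c
/* Hypothetical names; the schedule itself is t*t seconds for t = 1..7. */
#define MAX_RETRY_STEP 7

/* Delay in seconds before retry t (1-based), or 0 once retries are used up. */
unsigned int retry_delay_seconds(unsigned int t)
{
        if (t == 0 || t > MAX_RETRY_STEP)
                return 0;
        return t * t;
}
```

A caller would sleep for retry_delay_seconds(t) before attempt t, stopping on success or when it returns 0. In the worst case the seven retries wait 1 + 4 + ... + 49 = 140 seconds in total, compared with 900 seconds for the next spool pass.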
This change adds a daemon that handles telemetry record retention,
record reporting, and record limiting policies. This new daemon
receives records in a folder that is monitored by inotify.
Signed-off-by: Alex Jaramillo <alex.v.jaramillo@intel.com>
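A minimal sketch of watching a record folder with inotify(7) on Linux. The helper names are hypothetical; watching IN_CLOSE_WRITE and IN_MOVED_TO (so a record is picked up only once fully written or renamed into place) is an assumption about how such a daemon might be set up, not a description of the actual code.

```c
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: returns an inotify fd watching dir, or -1 on error. */
int watch_record_dir(const char *dir)
{
        int fd = inotify_init1(IN_CLOEXEC);

        if (fd < 0)
                return -1;
        /* IN_CLOSE_WRITE: a writer finished; IN_MOVED_TO: a file was renamed in. */
        if (inotify_add_watch(fd, dir, IN_CLOSE_WRITE | IN_MOVED_TO) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

/*
 * Block until events arrive and copy the first event's file name into
 * name. Sketch only: a real loop would walk every event in the buffer.
 * Returns 0 on success, -1 on error.
 */
int next_record_name(int fd, char *name, size_t size)
{
        /* Aligned buffer, as in the inotify(7) example program. */
        char buf[4096]
                __attribute__((aligned(__alignof__(struct inotify_event))));
        ssize_t n = read(fd, buf, sizeof buf);

        if (n < (ssize_t)sizeof(struct inotify_event))
                return -1;

        const struct inotify_event *ev = (const struct inotify_event *)buf;
        if (ev->len == 0)
                return -1;      /* event on the directory itself */

        int w = snprintf(name, size, "%s", ev->name);
        return (w < 0 || (size_t)w >= size) ? -1 : 0;
}
```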
This release contains changes that add source file references as a
workaround for failing libcheck tests. This change allows the telemetry
client to be built using gcc8. This has to be fixed properly in the
future.
Signed-off-by: Alex Jaramillo <alex.v.jaramillo@intel.com>
This change fixes a bug where the journal unit tests fail if the system
where the code is built has telemetry running.
Signed-off-by: Alex Jaramillo <alex.v.jaramillo@intel.com>
This change contains:
* New configuration keys: record_retention_enabled and
record_server_delivery_enabled. These keys are needed to control
remote delivery of records and record retention. These keys are
optional to preserve backward compatibility with existing custom
configurations.
* Record copy implementation. This change allows saving copies of
records locally when the feature is enabled in the configuration. This
operation is independent of record spooling and record reporting
to the remote server.
* New telem_journal argument to print a record's payload from its
local copy (if it exists).
Signed-off-by: avjarami <alex.v.jaramillo@intel.com>
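How the two new keys might combine can be sketched as independent bit flags, reflecting that local copies are independent of spooling and server delivery. The struct, enum, and function are hypothetical. The defaults used when a key is absent (retention off, delivery on, matching the previous behavior) are an assumption.

```c
#include <stdbool.h>

/* Hypothetical view of the two configuration keys. */
struct record_policy {
        bool record_retention_enabled;
        bool record_server_delivery_enabled;
};

/* Assumed defaults when a key is absent from the configuration. */
static const struct record_policy record_policy_defaults = {
        .record_retention_enabled = false,
        .record_server_delivery_enabled = true,
};

/* Independent actions for a newly received record (hypothetical). */
enum record_action {
        ACTION_NONE      = 0,
        ACTION_KEEP_COPY = 1 << 0,      /* save a local copy */
        ACTION_DELIVER   = 1 << 1,      /* spool/send to the remote server */
};

unsigned int record_actions(const struct record_policy *p)
{
        unsigned int actions = ACTION_NONE;

        if (p->record_retention_enabled)
                actions |= ACTION_KEEP_COPY;
        if (p->record_server_delivery_enabled)
                actions |= ACTION_DELIVER;
        return actions;
}
```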
Address comments from the first code review and fix the travis-ci
check_journal error in the prune test.
Signed-off-by: avjarami <alex.v.jaramillo@intel.com>
* An event_id header is needed to group records when multiple records
are generated by the same event.
* Add a new parameter to telem_record_gen, making it possible
for this utility to tag multiple records with the same event_id.
Signed-off-by: avjarami <alex.v.jaramillo@intel.com>
Additional headers added: board_name, cpu_model, and bios_version.
* Board name is a combination of board_name and board_vendor from
the DMI filesystem.
* CPU model is read from /proc/cpuinfo.
* BIOS version is taken from the DMI filesystem.
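Reading these values can be sketched with two small helpers: one reads the first line of a sysfs file such as /sys/class/dmi/id/board_name, board_vendor, or bios_version, and one extracts the "model name" field from /proc/cpuinfo text. Both helper names are hypothetical, and how board_name and board_vendor are actually combined is not shown. On some architectures /proc/cpuinfo may lack a "model name" field.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: read the first line of a small text file (e.g. a DMI sysfs entry). */
int read_first_line(const char *path, char *out, size_t size)
{
        if (size == 0)
                return -1;

        FILE *f = fopen(path, "r");
        if (f == NULL)
                return -1;

        char *line = fgets(out, (int)size, f);
        fclose(f);
        if (line == NULL)
                return -1;

        out[strcspn(out, "\n")] = '\0';         /* drop trailing newline */
        return 0;
}

/* Hypothetical: copy the value of the first "model name" field in cpuinfo text. */
int parse_cpu_model(const char *cpuinfo, char *out, size_t size)
{
        const char *key = strstr(cpuinfo, "model name");
        if (key == NULL || size == 0)
                return -1;

        const char *colon = strchr(key, ':');
        if (colon == NULL)
                return -1;

        const char *val = colon + 1;
        while (*val == ' ' || *val == '\t')
                val++;

        size_t len = strcspn(val, "\n");
        if (len >= size)
                len = size - 1;                 /* truncate to fit */
        memcpy(out, val, len);
        out[len] = '\0';
        return 0;
}
```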
When frame addresses detected during the stack scan were not previously
found by unwinding, the string "? " is added as a prefix to the function
name.
However, the current oops parsing code strips "? " if encountered, so
the backtraces from kernel oopses are missing vital information; the
presence of "? " provides a hint for debugging a stack trace and
indicates that the frame info is "unreliable".
This commit removes the "? " strip code, so that the function name retains
the prefix, and updates the unit tests that check for oops lines that
should contain the prefix.
Signed-off-by: Patrick McCarty <patrick.mccarty@intel.com>
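Keeping the prefix can be sketched as trimming only leading whitespace from a call-trace line, so "? " survives and can later be checked to flag the frame as unreliable. parse_trace_frame and frame_is_unreliable are hypothetical helpers, not the oops_parser.c code, and they keep the offset/size suffix as-is.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: copy a call-trace frame such as " ? foo+0x10/0x20" into
 * out. Leading whitespace is skipped, but a "? " prefix is kept.
 * Returns 0 on success, -1 for an empty line or zero-sized buffer.
 */
int parse_trace_frame(const char *line, char *out, size_t size)
{
        while (*line == ' ' || *line == '\t')
                line++;

        size_t len = strcspn(line, "\n");
        if (len == 0 || size == 0)
                return -1;
        if (len >= size)
                len = size - 1;                 /* truncate to fit */

        memcpy(out, line, len);
        out[len] = '\0';
        return 0;
}

/* A retained "? " prefix marks the frame info as unreliable. */
bool frame_is_unreliable(const char *frame)
{
        return strncmp(frame, "? ", 2) == 0;
}
```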
The user may want to disable recycling (daemon auto exit).
This can be the case if the telemd daemon is not configured
to respawn, or if the service is automatically restarted by
the system after a package update.
This patch introduces the daemon_recycling_enabled config
entry.
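The effect of the new entry on the idle-exit decision can be sketched as follows; should_recycle is a hypothetical helper, not the daemon's actual code.

```c
#include <stdbool.h>

/*
 * Hypothetical: decide whether the daemon should exit ("recycle") after
 * its idle timeout. With daemon_recycling_enabled set to false it stays
 * resident regardless of idleness.
 */
bool should_recycle(bool daemon_recycling_enabled, bool idle_timeout_expired)
{
        return daemon_recycling_enabled && idle_timeout_expired;
}
```

Following the "server = ..." example earlier in this log, disabling recycling would presumably be a conf line such as "daemon_recycling_enabled = false", though the exact value syntax is an assumption here.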