The Recording Import Worker (RIW) imports audio files and their respective metadata into the IV system via the VRX API. The standardised format for recordings and metadata is described in the Call Recorder Connector Interface Technical Specification. Any connector that uses the RIW is responsible for preparing the metadata in this format.
The worker can be configured with a client-provided custodian file containing custodian details. This data is inspected and matched with participants listed in the call's metadata based on a pre-configured matching field.
If participant-custodian matches are identified, the custodian information is used to determine the language models most suitable for transcribing and processing the recording within IV. The recording is then imported into the IV system by making a request to VRX, with the language models, recording metadata and custodian information included in the import body.
If the worker is configured without a custodian file, all recordings are imported without attempting to match any participants.
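The participant-custodian matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RIW's actual implementation: the function name `match_custodians`, the dictionary shapes for participants and custodians, and the exact-equality comparison are all hypothetical. The field names `userName` and `cloud9_username` are taken from the example configuration in this document.

```python
def match_custodians(participants, custodians, metadata_field, custodian_field):
    """Return (participant, custodian) pairs whose matching fields are equal.

    Assumption: participants and custodians are plain dicts, and a match
    means the two configured fields hold exactly the same string.
    """
    # Index custodians by the configured custodian field for fast lookup.
    # If two custodians share a value, the first one listed wins.
    index = {}
    for custodian in custodians:
        key = custodian.get(custodian_field)
        if key is not None:
            index.setdefault(str(key), custodian)

    matches = []
    for participant in participants:
        value = participant.get(metadata_field)
        if value is None:
            # This participant has no value in the matching field.
            continue
        custodian = index.get(str(value))
        if custodian is not None:
            matches.append((participant, custodian))
    return matches


# Hypothetical sample data for illustration only.
participants = [{"userName": "alice"}, {"userName": "bob"}]
custodians = [{"cloud9_username": "alice", "name": "Alice Example"}]
matches = match_custodians(participants, custodians, "userName", "cloud9_username")
```

With no custodians supplied, the function returns an empty list, which corresponds to the case where no participant can be matched.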
Installation
To integrate the Recording Import Worker into your IV installation, add the following configuration variable to the hosts.ini file:
recording_import_worker_enabled=true
Once installed, confirm the RIW is running by checking the output of the container logs:
docker logs recording_import_worker
This should display the following logs:
2024-01-15 10:21:51,575 recording_import_worker [1] [MainThread] [INFO ] GearmanRecordingImportWorker:41 <module> - Initialising gearman recording import worker...
2024-01-15 10:21:51,597 python3_gmtasks [1] [MainThread] [INFO ] __init__:149 serve_forever - Num workers: 1 of 1
2024-01-15 10:21:51,599 python3_gmtasks [7] [MainThread] [INFO ] __init__:223 _worker_process - Registering 6cafd49fd6ad.RecordingImportWorker.1 task recording-import
Configuration
Once installed, the RIW can be configured with the file config.yaml. This configuration file controls the worker-specific settings and certain VRX import settings, such as IV groups/users. It also allows the 'recorders' and 'custodians' to be configured. These are discussed in the following sections.
When installed, the config file is mounted so that the RIW container can access it. By default, it is located at /usr/app/conf/config.yaml within the Docker container and at {DATA_DIRECTORY}/recording_import_worker/config.yaml on the host machine.
The config.yaml file has the following configurable sections:
- worker
- vrx_api
- queues
- recorders
- custodians
Worker configuration
Configuration Variable | Description | Required | Default |
---|---|---|---|
worker_name | The name assigned to the worker. | No | RecordingImport |
worker_tasknames | A list of task names that the worker is responsible for processing. | Yes | - |
max_requeue_count | The maximum number of times a task can be requeued in case of failures. | No | 10 |
cutoff_time_interval | The time interval (in seconds) after which a task is considered as failed if not completed. | No | 30 |
For example:
worker:
  worker_name: "RecordingImport"
  worker_tasknames: "[recording-import]"
  max_requeue_count: 10
  cutoff_time_interval: 30
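To make the required and default values in the table above concrete, the sketch below applies the documented defaults to a parsed `worker` section and rejects a section that omits `worker_tasknames`. The function name `resolve_worker_config` and the use of a plain dict for the parsed section are assumptions for illustration; how the RIW itself loads and validates this section is not described here.

```python
# Defaults copied from the worker configuration table.
WORKER_DEFAULTS = {
    "worker_name": "RecordingImport",
    "max_requeue_count": 10,
    "cutoff_time_interval": 30,
}

# Keys the table marks as required.
REQUIRED_WORKER_KEYS = ("worker_tasknames",)


def resolve_worker_config(section):
    """Return the worker section with documented defaults filled in.

    Raises ValueError if a required key is missing.
    """
    missing = [key for key in REQUIRED_WORKER_KEYS if key not in section]
    if missing:
        raise ValueError(f"worker section is missing required keys: {missing}")
    resolved = dict(WORKER_DEFAULTS)
    resolved.update(section)  # explicit settings override the defaults
    return resolved


worker = resolve_worker_config({"worker_tasknames": "[recording-import]"})
```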
VRX configuration
Config Variable | Description | Required | Default |
---|---|---|---|
user_id | The identifier for the user in IV. | No | 1 |
group_id | The identifier for the IV group. | No | 1 |
request_timeout | The maximum time (in seconds) the system will wait for a request to complete before timing out. | No | 500 |
api_root | The root endpoint for the VRX API. | No | /vrxServlet/v2 |
ca_cert | The file path to the CA certificate inside the container. | Yes | - |
For example:
vrx_api:
  user_id: 1
  group_id: 1
  request_timeout: 500
  api_root: "/vrxServlet/v2"
  ca_cert: /usr/app/ssl/ca-cert.pem
Recorder configuration
The RIW can have any number of recorders configured in config.yaml.
When a job is received by the RIW to import a recording, the job_data will include a recorder-id. This allows for multiple IV connectors to import recordings to IV through the same RIW instance, but with their own import settings and participant matching fields.
A recorder configuration will contain the following fields:
Config Variable | Description | Required | Default |
---|---|---|---|
id | Unique identifier for each recorder. | Yes | - |
custodian_file_id | Identifier for the custodian file associated with the recorder. | No | None |
import_unmatched_recordings | Import recordings regardless of successful custodian matching. | No | False |
participant_mapping.metadata_field | The participant field in the recording's metadata that will be used to match to a custodian. | Yes (if `custodian_file_id` supplied) | None |
participant_mapping.custodian_field | The custodian field to match a participant to. | Yes (if `custodian_file_id` supplied) | None |
iv_options.models | List of IV language model ids to use for importing. | Yes | - |
iv_options.diarization_enabled | Flag to enable or disable speaker diarization. | No | False |
iv_options.treat_all_files_as_single_channel | Treat all files as if they were a single channel. | No | False |
iv_options.sentiment.enabled | Enable or disable sentiment analysis. | No | False |
iv_options.sentiment.deconvolution_enabled | Enable or disable deconvolution in sentiment analysis. | No | False |
For example:
recorders:
  - id: recorder-1
    custodian_file_id: custodians-1
    import_unmatched_recordings: True
    participant_mapping:
      metadata_field: userName
      custodian_field: cloud9_username
    iv_options:
      models:
        - 1
        - 2
        - 4
      diarization_enabled: True
      treat_all_files_as_single_channel: False
      sentiment:
        enabled: True
        deconvolution_enabled: True
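The rules in the recorder table can be illustrated with a short sketch: look up a recorder by the recorder-id carried in a job, check the conditionally required fields, and decide whether a recording should be imported. The function names `find_recorder`, `validate_recorder` and `should_import` are hypothetical, and the import decision is one reading of `import_unmatched_recordings` combined with the statement that recordings are always imported when no custodian file is configured. It is not a description of the RIW's internal logic.

```python
def find_recorder(recorders, recorder_id):
    """Return the recorder entry whose id equals recorder_id."""
    for recorder in recorders:
        if recorder.get("id") == recorder_id:
            return recorder
    raise KeyError(f"no recorder configured with id {recorder_id!r}")


def validate_recorder(recorder):
    """Return a list of problems found in one recorder entry."""
    problems = []
    if not recorder.get("id"):
        problems.append("id is required")
    if not recorder.get("iv_options", {}).get("models"):
        problems.append("iv_options.models is required")
    # participant_mapping fields are required only with a custodian file.
    if recorder.get("custodian_file_id"):
        mapping = recorder.get("participant_mapping") or {}
        for field in ("metadata_field", "custodian_field"):
            if not mapping.get(field):
                problems.append(f"participant_mapping.{field} is required")
    return problems


def should_import(recorder, match_count):
    """Assumed decision rule for importing a recording."""
    if not recorder.get("custodian_file_id"):
        return True  # no custodian file: import without matching
    if match_count > 0:
        return True  # at least one participant matched a custodian
    return bool(recorder.get("import_unmatched_recordings", False))


recorders = [{
    "id": "recorder-1",
    "custodian_file_id": "custodians-1",
    "import_unmatched_recordings": False,
    "participant_mapping": {"metadata_field": "userName",
                            "custodian_field": "cloud9_username"},
    "iv_options": {"models": [1, 2, 4]},
}]
recorder = find_recorder(recorders, "recorder-1")
```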
Custodian Configuration
For the custodian specification, please see the following page with a standard format for both the custodian configuration and custodian file.
Queue Configuration
The worker queue names can be configured in config.yaml with the following fields:
Config variable | Default | Description |
---|---|---|
import_queue | recording-import | This queue holds the job tickets for recordings that need to be imported. |
finished_queue | recording-finished | This queue receives tickets indicating a recording has been successfully imported without any errors. |
failed_queue | recording-failed | This queue receives tickets indicating a recording import has failed and could not be completed. |
For example:
queues:
  import_queue: recording-import
  finished_queue: recording-finished
  failed_queue: recording-failed
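As a rough illustration of how the three queues and `max_requeue_count` might interact, the sketch below picks the destination queue for a job ticket. The routing rule is an assumption: it supposes that a failed import is put back on the import queue until the requeue limit is reached, and only then sent to the failed queue. The function name `next_queue` and its parameters are hypothetical.

```python
# Default queue names copied from the queue configuration table.
QUEUE_DEFAULTS = {
    "import_queue": "recording-import",
    "finished_queue": "recording-finished",
    "failed_queue": "recording-failed",
}


def next_queue(succeeded, requeue_count, queues=None, max_requeue_count=10):
    """Return the name of the queue a job ticket should go to next.

    Assumed rule: success -> finished queue; failure -> back to the
    import queue while requeue_count is below max_requeue_count,
    otherwise the failed queue.
    """
    names = {**QUEUE_DEFAULTS, **(queues or {})}
    if succeeded:
        return names["finished_queue"]
    if requeue_count < max_requeue_count:
        return names["import_queue"]
    return names["failed_queue"]
```

For instance, `next_queue(False, 3)` would return the import queue name under this rule, while `next_queue(False, 10)` would return the failed queue name.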
Environment variables
As with other IV components, the following environment variables can be set in /opt/intelligent-voice/.env. During installation, some of these variables can be set in the hosts.ini file.
Env variable | Hosts.ini variable | Description | Required | Default |
---|---|---|---|---|
GEARMAN_SERVERS | - | List of Gearman servers to take jobs from, in the form [gearman1.example.com, gearman2.example.com] | No | [gearman.intelligentvoice.ivlocal:4730] |
RECORDING_IMPORT_WORKER_COUNT | - | Worker count | No | 1 |
IV_API_HOSTNAME | - | IV API hostname | No | vrx-servlet.intelligentvoice.ivlocal |
IV_API_PORT | - | IV API port | No | 8443 |
IV_API_AUTH_USR | iv_api_auth_user | IV API username | No | - |
IV_API_AUTH_PASSWD | iv_api_auth_passwd | IV API password | No | - |
LOGGING_LEVEL | logging_level | Logging level, can be set to ERROR, WARNING, INFO, DEBUG. | No | INFO |
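The sketch below shows one way the environment variables in the table could be read with their documented defaults, including splitting the bracketed `GEARMAN_SERVERS` list. The function names `read_environment` and `api_base_url` are hypothetical, as is the use of `https` in the base URL (assumed here because a CA certificate is configured for the VRX API). The default `api_root` is the one documented in the VRX configuration table.

```python
import os


def read_environment(environ=None):
    """Read the RIW environment variables, applying documented defaults."""
    env = os.environ if environ is None else environ

    # GEARMAN_SERVERS has the form "[host1, host2]"; strip the brackets
    # and split on commas.
    raw = env.get("GEARMAN_SERVERS", "[gearman.intelligentvoice.ivlocal:4730]")
    servers = [s.strip() for s in raw.strip().strip("[]").split(",") if s.strip()]

    return {
        "gearman_servers": servers,
        "worker_count": int(env.get("RECORDING_IMPORT_WORKER_COUNT", "1")),
        "api_hostname": env.get("IV_API_HOSTNAME",
                                "vrx-servlet.intelligentvoice.ivlocal"),
        "api_port": int(env.get("IV_API_PORT", "8443")),
        "api_user": env.get("IV_API_AUTH_USR"),        # no documented default
        "api_password": env.get("IV_API_AUTH_PASSWD"),  # no documented default
        "logging_level": env.get("LOGGING_LEVEL", "INFO"),
    }


def api_base_url(settings, api_root="/vrxServlet/v2"):
    """Build the VRX base URL; the https scheme is an assumption."""
    return f"https://{settings['api_hostname']}:{settings['api_port']}{api_root}"


# Passing an empty mapping yields the documented defaults.
settings = read_environment({})
```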
Example: config.yaml
worker:
  worker_name: "RecordingImport"
  worker_tasknames: "[recording-import]"
  max_requeue_count: 10
  cutoff_time_interval: 30
vrx_api:
  user_id: 1
  group_id: 1
  request_timeout: 500
  api_root: "/vrxServlet/v2"
  ca_cert: /usr/app/ssl/ca-cert.pem
recorders:
  - id: recorder-1
    custodian_file_id: custodians-1
    import_unmatched_recordings: true
    participant_mapping:
      metadata_field: userName
      custodian_field: cloud9_username
    iv_options:
      models:
        - 1
        - 2
        - 3
        - 4
      diarization_enabled: true
      treat_all_files_as_single_channel: true
      sentiment:
        enabled: true
        deconvolution_enabled: true
  - id: recorder-2
    custodian_file_id: custodians-2
    import_unmatched_recordings: true
    participant_mapping:
      metadata_field: userID
      custodian_field: redbox_user_id
    iv_options:
      models:
        - 1
        - 3
      diarization_enabled: True
      treat_all_files_as_single_channel: true
      sentiment:
        enabled: false
      vox: 6
  - id: recorder-3
    custodian_file_id: custodians-1
    import_unmatched_recordings: false
    participant_mapping:
      metadata_field: userID
      custodian_field: email_address_list
    iv_options:
      models:
        - 2
        - 3
        - 4
      diarization_enabled: false
      treat_all_files_as_single_channel: false
custodians:
  - id: custodians-1
    file: /usr/app/custodians/custodians-1.csv
  - id: custodians-2
    file: /usr/app/custodians/custodians-2.xml
queues:
  import_queue: recording-import
  finished_queue: recording-finished
  failed_queue: recording-failed
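Because recorders refer to custodian files by id, a full configuration like the one above is only consistent if every `custodian_file_id` names an entry in `custodians`. The sketch below checks this on an already-parsed configuration (a plain dict). The function name `check_cross_references` is hypothetical, and whether the RIW performs such a check at start-up is not stated in this document.

```python
def check_cross_references(config):
    """Return a list of consistency problems in a parsed config dict."""
    problems = []
    custodian_ids = {c.get("id") for c in config.get("custodians", [])}
    seen_recorders = set()

    for recorder in config.get("recorders", []):
        recorder_id = recorder.get("id")
        # The recorder table describes id as a unique identifier.
        if recorder_id in seen_recorders:
            problems.append(f"duplicate recorder id: {recorder_id}")
        seen_recorders.add(recorder_id)

        # custodian_file_id is optional, but if set it must resolve.
        custodian_file_id = recorder.get("custodian_file_id")
        if custodian_file_id is not None and custodian_file_id not in custodian_ids:
            problems.append(
                f"recorder {recorder_id} references unknown "
                f"custodian file id: {custodian_file_id}"
            )
    return problems


# Reduced version of the example configuration, as a parsed dict.
config = {
    "recorders": [
        {"id": "recorder-1", "custodian_file_id": "custodians-1"},
        {"id": "recorder-2", "custodian_file_id": "custodians-2"},
        {"id": "recorder-3", "custodian_file_id": "custodians-1"},
    ],
    "custodians": [
        {"id": "custodians-1", "file": "/usr/app/custodians/custodians-1.csv"},
        {"id": "custodians-2", "file": "/usr/app/custodians/custodians-2.xml"},
    ],
}
problems = check_cross_references(config)
```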
Starting and stopping
The service runs continuously within a Docker container and can be started and stopped with these commands:
systemctl start recording_import_worker.service
systemctl stop recording_import_worker.service