The IV system consists of several services and workers; if any of them fails, the system may not work properly. This page describes basic health checks for the IV system, followed by common issues and their solutions.
- 1 IV Services and Containers
- 2 Operational FAQ
  - 2.1 Checking containers health
    - 2.1.1 Logging
  - 2.2 Checking or restarting system services
  - 2.3 Automatic Update enable/disable
  - 2.4 Backup
    - 2.4.1 Passwords
  - 2.5 Database
    - 2.5.1 Database Login
    - 2.5.2 Managing API User
    - 2.5.3 Purge Database
  - 2.6 SSL Certificates
  - 2.7 Process Queues and Sequences
- 3 Common Installation Issues
- 4 Common Operation Issues
  - 4.1 Services failures
    - 4.1.1 ASR worker failed to startup
    - 4.1.2 NGINX failed to startup
    - 4.1.3 Sphinxsearch failed to startup
    - 4.1.4 JumpToWeb returned 502 Bad Gateway
    - 4.1.5 API Connection failed
  - 4.2 Import Failures
  - 4.3 Processing Queues
    - 4.3.1 tagger_fail
IV Services and Containers
Core services and workers:
Node | Service Name | Container Name | Docker compose file name
---|---|---|---
app/all-in-one | mariadb.service | mariadb | docker-compose.mariadb.yml
app/all-in-one | vrx_servlet.service | vrx-servlet | docker-compose.vrx_servlet.yml
app/all-in-one | gearman.service | gearman | docker-compose.gearman.yml
app/all-in-one | jumptoweb.service | jumptoweb | docker-compose.jumptoweb.yml
app/all-in-one | jumptoweb_api.service | jumptoweb-api | docker-compose.jumptoweb_api.yml
app/all-in-one | sphinxsearch.service | sphinx | docker-compose.sphinxsearch.yml
app/worker/all-in-one | watchtower | watchtower | docker-compose.watchtower.yml
worker/all-in-one | vad_worker.service | vad_worker | docker-compose.vad_worker.yml
worker/all-in-one | asr_worker.service | asr_worker | docker-compose.asr_worker.yml
worker/all-in-one | tritonserver.service | triton_server | docker-compose.tritonserver.yml
Optional features:
Node | Service Name | Container Name | Docker compose file
---|---|---|---
app/all-in-one | gearman_stats.service | gearman_stats | docker-compose.gearman_stats.yml
app/all-in-one | elasticsearch.service | elasticsearch | docker-compose.elasticsearch.yml
app/all-in-one | opensearch.service | opensearch | docker-compose.opensearch.yml
worker/all-in-one | diarization_worker.service | diar_worker | docker-compose.diarization_worker.yml
worker/all-in-one | tagger_worker.service | tagger_worker | docker-compose.tagger_worker.yml
worker/all-in-one | sentiment_worker.service | sentiment_worker | docker-compose.sentiment_worker.yml
worker/all-in-one | transcript_summariser_worker.service | transcript_summariser_worker | docker-compose.transcript_summariser_worker.yml
worker/all-in-one | ocr_worker.service | ocr_worker | docker-compose.ocr_worker.yml
worker/all-in-one | lmbuilder_worker.service | lmbuilder_worker | docker-compose.lmbuilder_worker.yml
worker/all-in-one | fail_worker.service | fail_worker | docker-compose.fail_worker.yml
worker/all-in-one | voice_biometric_worker.service | voice_biometric_worker | docker-compose.voice_biometric_worker.yml
Operational FAQ
Checking containers health
All components from IV 6.1 onwards are dockerised, and their health status can be checked with docker commands:
docker ps -a
If a container is unhealthy, you can investigate further by checking its health log or the docker logs:
docker inspect -f '{{ range .State.Health.Log }}{{printf "%s\n" .}}{{end}}' <container_name>
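To spot problems at a glance, you can also filter for containers that report an unhealthy state. This is a minimal sketch using standard docker ps filters:

```bash
# List only the containers whose health check currently reports "unhealthy"
sudo docker ps --filter "health=unhealthy" --format "{{.Names}}: {{.Status}}"
```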
Logging
Major logs for IV 6.1+ components are sent to the container logs, and the IV installer by default forwards all docker logs to syslog. Unlike IV 6.0 and earlier, there are no local files storing previous logs of a component: when a service is restarted, its previous logs are gone.
View container logs:
sudo docker logs <container_name>
After running the IV installer, docker logs are sent to syslog by default, so the logs can also be seen in syslog (/var/log/syslog on Ubuntu, /var/log/messages on Red Hat).
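For example, to follow a container's log live, or to search the syslog copy (a sketch using vrx-servlet; how docker messages are tagged in syslog depends on the logging configuration the installer applied):

```bash
# Follow the last 100 lines of a container's log
sudo docker logs -f --tail 100 vrx-servlet

# Search the syslog copy on Ubuntu (use /var/log/messages on Red Hat)
sudo grep vrx-servlet /var/log/syslog
```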
Checking or restarting system services
All components from IV 6.1 onwards are dockerised using docker compose files, and most services can be started and stopped via systemctl.
To check service status:
sudo systemctl status <service_name>
To start/stop/restart a service:
sudo systemctl <start/stop/restart> <service1> <service2>
To enable/disable a service (an enabled service starts automatically after a server reboot; disable is the reverse):
sudo systemctl <enable/disable> <service>
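To check several services in one pass, a small loop over the service names from the tables above works. This is a sketch for the core app-node services:

```bash
# Print the active state of each core app-node service
for svc in mariadb vrx_servlet gearman jumptoweb jumptoweb_api sphinxsearch; do
    echo "${svc}.service: $(systemctl is-active ${svc}.service)"
done
```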
Automatic Update enable/disable
Watchtower is a service that checks for updates to existing container images and then updates the containers automatically. The IV installer enables this feature by default for regular security patch updates. Depending on the IV version, enabling or disabling watchtower works differently from other services (see below).
If you do not want automatic updates of container images, disable the watchtower.
- For IV 6.1.X, watchtower is controlled by a docker compose command:
  sudo docker compose -f /opt/intelligent-voice/docker-compose.watchtower.yml down
- For IV 6.2 onwards, watchtower is controlled by a system service:
  sudo systemctl stop watchtower
  sudo systemctl disable watchtower
To enable watchtower, run the reverse:
- For IV 6.1.X, watchtower is controlled by a docker compose command:
  sudo docker compose -f /opt/intelligent-voice/docker-compose.watchtower.yml up -d
- For IV 6.2 onwards, watchtower is controlled by a system service:
  sudo systemctl start watchtower
  sudo systemctl enable watchtower
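Either way, you can confirm the current state by checking whether the watchtower container is running:

```bash
# Shows a line only if a watchtower container is up
sudo docker ps --filter "name=watchtower" --format "{{.Names}}: {{.Status}}"
```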
Manual Update of container images
When watchtower is disabled, container images are not updated automatically, so regular manual updates are necessary. Here are the steps to pull and apply new container images:
- Stop all existing services:
  sudo systemctl stop <worker_services>
  sudo systemctl stop <app_services>
  sudo systemctl stop mariadb.service
- Pull the latest container images:
  sudo docker pull <image_name>:<image_tag>
- Start the services:
  sudo systemctl start mariadb.service
  sudo systemctl start <app_services>
  sudo systemctl start <worker_services>
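As a concrete example, updating a single worker image might look like the following; the registry path and tag are placeholders, so use the image names referenced in your compose files:

```bash
# Hypothetical example: refresh only the ASR worker image
sudo systemctl stop asr_worker.service
sudo docker pull registry.example.com/intelligent-voice/asr_worker:latest   # placeholder image name
sudo systemctl start asr_worker.service
```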
Backup
Configuration files and IV databases are stored in /opt/intelligent-voice:
- docker compose files for each service
  - owned by root
  - store all settings for the IV containers (e.g. container name, network settings, volume mappings)
- .env file
  - owned by root
  - stores all environment variables that the IV containers need
- data folder
  - owned by 30000:30000
  - stores files that the IV containers may need (e.g. SSL certificates, model files)
  - stores environment-specific files
- database backup
  sudo bash -c "docker exec -it mariadb mysqldump -uroot -p********** --all-databases --single-transaction | gzip > /backup/database_full_backup_`date +%F_%H%M`.sql.gz"
Installer files should also be backed up for future upgrades. Essential files that will be used in upgrades:
- hosts.ini
- files/ssl
- generated-passwords.yaml
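A simple way to keep these together (a sketch, run from the installer directory):

```bash
# Archive the installer files needed for future upgrades
tar czf iv-installer-backup.tar.gz hosts.ini files/ssl generated-passwords.yaml
```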
Passwords
When running the installer, a file is generated automatically: generated-passwords.yaml. You can also find the passwords, under similar names, in the IV environment file (/opt/intelligent-voice/.env). Here are the details of these passwords:
Password type | Description
---|---
drupal_mariadb_password | Database password for the drupal account (JumpToWeb/WebUI dashboard service connection to the database)
iv_api_auth_passwd | API password. Default API username: iv
auth_db_password | Database password for the tomcat_admin account (jumptoweb_api connection to the database)
webservices_mariadb_password | Database password for the webservices account (backend service vrx-servlet connection to the database)
mysql_sphinxsearch_password | Database password for the sphinxsearch account (sphinxsearch connection to the database)
tomcat_script_password | Database password for the root account
gearman_password | Database password for the gearman account (queuing system connection to the database)
ev_password | Not in use
drupal_admin_password | JumpToWeb dashboard password. Default username: admin
elastic_password | Database password for the elastic account (elasticsearch connection to the database)
red_box_mariadb_password | For a specific IV service
verint_key_password | For a specific IV service
Database
Database Login
Usually it is not necessary to log in to the database, and it is not advised to modify any entries in it, but some operations, such as creating a new API user, require changes in the database.
Command to login to the database:
sudo docker exec -it mariadb mysql -u <user> -p <database_name>
Database users (passwords can be found in the .env file):
- drupal: user account for the jumptoweb service to connect to the database; limited access to the drupal database
- gearman: user account for the gearman service to connect to the database; limited access to the gearman database
- sphinxsearch: user account for the sphinxsearch service to connect to the database; limited access to the obsilon database
- tomcat_admin: user account for the jumptoweb_api service to connect to the database; limited access to the jumpto_admin database
- webservices: user account for the vrx_servlet service to connect to the database; limited access to the drupal and obsilon databases
- root: root account of the database
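For example, to look up the webservices password in the environment file and then log in to the obsilon database (a sketch; the password is entered at the prompt):

```bash
# Find the password, then log in when prompted
sudo grep webservices_mariadb_password /opt/intelligent-voice/.env
sudo docker exec -it mariadb mysql -u webservices -p obsilon
```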
Database names:
- drupal: stores data required by the JumpToWeb dashboard
- gearman: stores gearman queues
- jumpto_admin: stores API access data
- obsilon: main database of IV; stores all items being processed and their details
Managing API User
Add a new API User
Refer to the page Adding new API users
Disable an existing API User
Disabling an existing API user essentially means changing its permission from allowed to something else. Steps are as below:
- Log in to the database
- Change the permission of the specific user:
  update jumpto_admin.user_roles set role_name = 'disabled-users' where user_name = 'myUser';
- Verify the API user login
Re-enable an API User
Re-enabling an API user essentially means changing its permission back to allowed. Steps are as below:
- Log in to the database
- Change the permission of the specific user:
  update jumpto_admin.user_roles set role_name = 'auth-users' where user_name = 'myUser';
- Verify the API user login
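To check a user's current role without an interactive session (a sketch using the same table and columns as the statements above; the root password is entered at the prompt):

```bash
# Prints the role currently assigned to myUser
sudo docker exec -it mariadb mysql -u root -p -e \
  "SELECT user_name, role_name FROM jumpto_admin.user_roles WHERE user_name = 'myUser';"
```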
Purge Database
Usually IV does not recommend purging the database, because it loses all records that have already been processed or are pending. If it has been advised, or you want a clean environment to start over, here are the steps:
- Stop all IV services:
  sudo systemctl stop <worker_services>
  sudo systemctl stop <app_services>
  sudo systemctl stop mariadb.service
- Delete the database folder:
  sudo rm -rf /opt/intelligent-voice/data/mariadb/mysql/
- (if necessary) Delete cache or import files on disk:
  sudo rm -rf /opt/intelligent-voice/data/vrx-servlet/cache/*
  sudo rm -rf /opt/intelligent-voice/data/vrx-servlet/downloadedFiles/*
- Start all IV services:
  sudo systemctl start mariadb.service
  sudo systemctl start <app_services>
  sudo systemctl start <worker_services>
SSL Certificates
The SSL certificates installed by the IV installer are self-signed by default. To change them to specific certificates, or to certificates for specific domains for security reasons, you need three files to install a new certificate:
- A private key (could be a .pem or .key file)
- A certificate file (could be a .pem or .crt file)
- A CA file (also called a chain file or bundle file)
The CA file must be in PEM format. If you receive these files in a different format, they can be converted using the openssl tool installed on the IV server, like this:
openssl x509 -in ca-cert.crt -out ca-cert.pem
The easiest approach is to create a CA file containing all the certificates in the chain (root and any intermediates), even if the root CA is already contained in the OS trusted cert store. This is because:
- some of the IV workers trust only the content of the CA file, not the OS trusted cert store
- the default IV install configures servers like tomcat to serve the ca-cert.pem file for intermediates
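For example, a chain file can be assembled by concatenating the PEM certificates in order (file names here are illustrative):

```bash
# Intermediates first, root last - all in PEM format
cat intermediate-cert.pem root-cert.pem > ca-cert.pem
```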
From installer v4.0 onwards, the nginx service is installed by default and works as a reverse proxy for trusted SSL certificates. For external connections to the IV system (e.g. the JumpToWeb dashboard and the API), the default port is 443 (HTTPS).
The certificates for external connections (UI or API) are in /opt/intelligent-voice/data/ssl/nginx. The SSL certificates may be .pem or .crt files and must be paired with the private key file. You can use the following commands to verify the certificates (the two modulus hashes must match):
openssl verify -CAfile /opt/intelligent-voice/data/ssl/nginx/ca-cert.pem /opt/intelligent-voice/data/ssl/nginx/server-cert.pem
openssl x509 -in /opt/intelligent-voice/data/ssl/nginx/server-cert.pem -noout -modulus | openssl md5
openssl rsa -in /opt/intelligent-voice/data/ssl/nginx/server-key.pem -noout -modulus | openssl md5
To use the newly added certificates, you may also need to change the configuration file /opt/intelligent-voice/data/nginx/site.conf.
cp server-cert.pem /opt/intelligent-voice/data/ssl/nginx
cp server-key.pem /opt/intelligent-voice/data/ssl/nginx
chown 30000:30000 /opt/intelligent-voice/data/ssl/nginx/*.pem
vim /opt/intelligent-voice/data/nginx/site.conf
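After swapping the certificates and updating site.conf, restart nginx and check what is now being served. This is a sketch; replace the hostname with your own:

```bash
sudo systemctl restart nginx.service

# Inspect the certificate presented on port 443
openssl s_client -connect localhost:443 -servername iv.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```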
Process Queues and Sequences
IV uses gearman to queue the jobs sent to the workers. Each queue name is the same as the corresponding worker name. Here is a short description of the queues:
- vad: Voice Activity Detection, separates voice activity from noise
- diar: Diarization, separates sentences from different speakers
  - Job run condition: vad complete
- asr: Speech Recognition, transcribes detected speech into text
  - Job run condition: vad complete
- tagger: Tagging, extracts topics or tags from the transcript
  - Job run condition: asr complete
- sentiment: Sentiment detector, calculates sentiment scores from the transcript
  - Job run condition: asr complete
- transcript_summariser: extracts a summary from the resulting transcripts
  - Job run condition: asr complete
- vrx-healthcheck: test jobs can be created in this gearman queue; when consumed normally, no task is carried out and no jobs remain in the queue
- *_fail: when a worker encounters an error and returns a failure to IV, the job is put into the corresponding fail queue
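Queue depths can be inspected with gearadmin, assuming the tool is available inside the gearman container:

```bash
# Columns: function name, total jobs, running jobs, available workers
sudo docker exec -it gearman gearadmin --status
```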
Common Installation Issues
Prerequisite package installation
These are some common issues seen during IV installations and the corresponding actions taken; they may not be up to date.
Installing Ansible 2.14+ on Ubuntu 20.04 LTS
The default Python on Ubuntu 20.04 LTS does not support Ansible 2.14+, so the first thing to do is install Python 3.9+ and then install Ansible under that Python:
sudo apt install python3.9 python3-pip
python3.9 -m pip install --user ansible
Upgrading Ansible using pip
pip3 install --upgrade --user ansible
Installing Ansible collection for community.docker
To install the community.docker collection, here is the suggestion:
ansible-galaxy collection install community.docker
Installing Ansible collection for ansible.posix
To install the ansible.posix collection, here is the suggestion:
ansible-galaxy collection install ansible.posix
Installing Docker Engine
Official page from docker for installing docker engine: https://docs.docker.com/engine/install/
Installing GPU drivers
Nvidia reference for GPU driver: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#overview
Nvidia reference for nvidia-container-runtime: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Running the Ansible playbook
Missing Ansible collections
If community.docker is missing from the Ansible collections, there is an error message like the one below:
ERROR! couldn't resolve module/action 'community.docker.docker_login'. This often indicates a misspelling, missing collection, or incorrect module path. The error appears to be in '/media/install/installer-v4.0/install.yml': line 591, column 9, but may be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

    - name: Log into private registry
      ^ here
To fix this, check the section Installing Ansible collection for community.docker
Missing variables
If the playbook reports any missing variable, such as 'default_email_address' is undefined, please check or add it in the configuration file (hosts.ini). Here is an example of the missing default_email_address error:
fatal: [app -> localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'default_email_address' is undefined. 'default_email_address' is undefined\n\nThe error appears to be in '/media/install/installer-v4.0/install.yml': line 173, column 13, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Generate CA key and certificate\n ^ here\n"}
List not recognised
If there is a fatal error about Invalid data passed to 'loop' when running the playbook, the Ansible version is likely below 2.14. In this case, please upgrade Ansible.
fatal: [iv-proc1]: FAILED! => {"msg": "Invalid data passed to 'loop', it requires a list, got this instead: [AnsibleUndefined, AnsibleUndefined, AnsibleUndefined, AnsibleUndefined]. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup."}
Common Operation Issues
Services failures
ASR worker failed to startup
The ASR worker service is the core worker for transcription, and it must be up and running to transcribe recordings. Here are some common reasons the ASR worker fails to start:
- Unable to connect to the GPU driver (see the verification sketch after this list)
  - Check the GPU driver with nvidia-smi; it should return the GPU driver version and current status as below. If not, please try updating the GPU driver.
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  Tesla T4                       On  | 00000001:00:00.0 Off |                  Off |
    | N/A   69C    P0              73W / 70W  |  6849MiB / 16384MiB  |    100%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
  - Check the configuration between the docker service and the GPU driver (/etc/docker/daemon.json):
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
- Misconfigured to enable GPU on a server without a GPU
  - If the server does not have a GPU card but asr_worker is configured for GPU, the service fails to start because it cannot find a GPU.
  - Check the configuration file (/opt/intelligent-voice/.env) for the GPU settings:
    GPU_ENABLED=false
    ASR_DOCKER_RUNTIME=runc
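After correcting the driver or the configuration, restart docker and the worker and confirm a container can reach the GPU. This is a sketch; the CUDA image tag is only an example:

```bash
# Restart docker to pick up daemon.json changes, then the worker
sudo systemctl restart docker
sudo systemctl restart asr_worker.service

# Confirm a container can see the GPU via the nvidia runtime
sudo docker run --rm --runtime=nvidia nvidia/cuda:12.3.0-base-ubuntu20.04 nvidia-smi
```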
NGINX failed to startup
- Port is already allocated
  This error is usually found when starting the container. Check with: systemctl status nginx.service. When the port is already allocated to another service, nginx fails to come up. Port 8443 should be reserved for vrx-servlet; the nginx service uses port 443 (see the diagnostic sketch after this list).
  Error response from daemon: driver failed programming external connectivity on endpoint nginx (f00ec93a2df40e7308e496269bde1ae08df372ff0ab7a8a31a4583b77bc37abe): Bind for 0.0.0.0:8443 failed: port is already allocated
- Missing entries for containers in iptables
  Usually there are several ACCEPT rules for the docker containers' external connectivity. Check with: sudo iptables -L. If the rules are not there, please stop all running services and then restart the docker service.
  Chain DOCKER (3 references)
  target     prot opt source               destination
  ACCEPT     tcp  --  anywhere             172.18.0.2           tcp dpt:ies-lm
  ACCEPT     tcp  --  anywhere             172.18.0.3           tcp dpt:sphinxapi
  ACCEPT     tcp  --  anywhere             172.18.0.3           tcp dpt:sphinxql
  ACCEPT     tcp  --  anywhere             172.18.0.7           tcp dpt:mysql
  ACCEPT     tcp  --  anywhere             172.18.0.11          tcp dpt:teradataordbms
  ACCEPT     tcp  --  anywhere             172.18.0.11          tcp dpt:vcom-tunnel
  ACCEPT     tcp  --  anywhere             172.18.0.11          tcp dpt:irdmi
  ACCEPT     tcp  --  anywhere             172.18.0.12          tcp dpt:https
  ACCEPT     tcp  --  anywhere             172.18.0.12          tcp dpt:81
  ACCEPT     tcp  --  anywhere             172.18.0.13          tcp dpt:pcsync-https
  ACCEPT     tcp  --  anywhere             172.18.0.13          tcp dpt:webcache
  ACCEPT     tcp  --  anywhere             172.18.0.14          tcp dpt:gearman
- Insufficient permissions for the nginx container to access the certificates
  After installing a new certificate, please make sure the container has permission to read the files.
  # ls -l /opt/intelligent-voice/data/ssl/nginx/
  total 40
  -r--r--r-- 1 30000 30000 1359 Oct 23  2023 ca-cert.pem
  -r--r--r-- 1 30000 30000  899 Jun 27 17:00 intelligentvoice.dev.crt
  -r--r--r-- 1 30000 30000  227 Jun 27 17:00 intelligentvoice.dev.key
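For the port conflict above, these commands help identify what currently holds the port (a sketch for port 8443):

```bash
# Which process is listening on 8443?
sudo ss -tlnp | grep 8443

# Which container publishes 8443?
sudo docker ps --format "{{.Names}}: {{.Ports}}" | grep 8443
```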
Sphinxsearch failed to startup
If sphinxsearch.service fails to start, here is a possible reason:
- Permissions of the data files (/opt/intelligent-voice/data/sphinxsearch/sphinxsearch/) do not allow docker to access them.
  Solution: change the owner of the data files to user 30000 and group 30000:
  sudo chown -R 30000:30000 /opt/intelligent-voice/data/sphinxsearch/sphinxsearch/
JumpToWeb returned 502 Bad Gateway
This is usually because the nginx container cannot connect to the vrx-servlet container. One common cause is that vrx-servlet was restarted by the IV nightly patch while nginx was not. It can be solved by restarting nginx.service, and avoided by making the nginx container depend on the vrx-servlet container.
Adding the dependency to nginx:
- Edit /opt/intelligent-voice/docker-compose.nginx.yml:
  labels:
    com.centurylinklabs.watchtower.depends-on: "vrx-servlet"
- Restart nginx.service:
  sudo systemctl restart nginx.service
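You can confirm the dependency label was applied to the running container (assuming the container is named nginx):

```bash
# Prints "vrx-servlet" when the label is in place
sudo docker inspect -f '{{ index .Config.Labels "com.centurylinklabs.watchtower.depends-on" }}' nginx
```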
API Connection failed
The most common reason is that the nginx container cannot connect to the vrx-servlet container. To fix this, please check the section JumpToWeb returned 502 Bad Gateway.
Import Failures
Item Errors on JumpToWeb page
When you go to the report page in JumpToWeb and an item is marked Item Errors without a retry button, it likely means the file could not be found inside the vrx-servlet container.
To confirm, please try to list the file inside the container (taking /mnt/import/test.wav as an example):
sudo docker exec -it vrx-servlet ls -l /mnt/import/test.wav
If the file does not exist inside the container, please check whether the folder is mounted into the container in /opt/intelligent-voice/docker-compose.vrx_servlet.yml:
- "/mnt/import/:/mnt/import/:z"
Otherwise, please copy the file to the mounted folder (/opt/intelligent-voice/data/vrx-servlet/import/) and import it as /import/test.wav:
sudo cp /mnt/import/test.wav /opt/intelligent-voice/data/vrx-servlet/import/test.wav
sudo chown 30000:30000 /opt/intelligent-voice/data/vrx-servlet/import/test.wav
If the folder is mounted into the container but the file still does not exist inside it, the folder may have been mounted after the container was created. In this case, please restart the container. You may also delay the startup of vrx_servlet.service until after the folder is mounted; see the sketch below.
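One way to delay the startup is a systemd override using the standard RequiresMountsFor directive. This is a sketch; adjust the path to your mount:

```bash
# Opens an editor for an override file for the service
sudo systemctl edit vrx_servlet.service

# Then add to the override:
#   [Unit]
#   RequiresMountsFor=/mnt/import
```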
Processing Queues
tagger_fail
A common cause of tagger_fail is being unable to connect to the sphinxsearch service. The tagger worker needs the sphinxsearch service to add topics to the database; if this connection fails, the tagger worker returns the job as failed.
After the tagger_worker is fixed, you may need to requeue the jobs with MoveWorker, moving jobs from the tagger_fail queue to the tagger queue.
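To see how many jobs are waiting in the fail queue before and after requeuing (assuming gearadmin is available inside the gearman container):

```bash
# Show the tagger and tagger_fail queue depths
sudo docker exec -it gearman gearadmin --status | grep tagger
```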