The Intelligent Voice system is a batch processing system designed to process hours of audio recordings as efficiently as possible. We also offer systems designed for real-time transcription, low-latency keyword spotting, IoT hub devices, embedded applications and more. For details on running these solutions on AWS, please contact us.
Single VM for Evaluation / Lab Use
A single VM is recommended for evaluation and lab use.
For a basic installation of Intelligent Voice 6 with GPU acceleration on AWS, we recommend a single GPU instance type with 128 GB RAM and 4 CPUs. The minimum storage requirement is 500 GB.
Operating systems supported:
- Red Hat Enterprise Linux 9 (recommended) or 8
- Ubuntu 22.04 LTS (recommended) or 20.04 LTS
- Oracle Linux 8
To install on a single VM the g4dn.8xlarge type is recommended.
Multiple VMs for Production Use
Installing on multiple VMs is recommended for production use, to improve resilience and scalability, and to reduce costs.
Production Deployment Example for 10,000+ Hours a Day
The following is an example of a full production system deployment with autoscaling.
This system uses a single application server VM, with Auto Scaling groups deploying AMIs for all the compute instances. The current state of the IV job queue is sent to CloudWatch, and scaling rules are created based on the number of jobs.
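The queue-driven scaling described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the metric name `QueuedJobs`, the `WorkerGroup` dimension, and the `jobs_per_instance`, `min_instances` and `max_instances` values are all hypothetical. The dictionary returned by `queue_metric` follows the shape of a `MetricData` entry for the CloudWatch `PutMetricData` API, but nothing is sent to AWS here.

```python
import math


def queue_metric(queued_jobs: int, group: str) -> dict:
    """Build one metric datum describing the current job queue depth.

    The metric and dimension names are assumptions made for this sketch.
    """
    return {
        "MetricName": "QueuedJobs",
        "Dimensions": [{"Name": "WorkerGroup", "Value": group}],
        "Value": float(queued_jobs),
        "Unit": "Count",
    }


def desired_instances(
    queued_jobs: int,
    jobs_per_instance: int = 10,  # hypothetical jobs one instance can absorb
    min_instances: int = 0,
    max_instances: int = 50,
) -> int:
    """Translate a queue depth into an instance count, clamped to the group limits."""
    if queued_jobs < 0:
        raise ValueError("queued_jobs must not be negative")
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(min_instances, min(max_instances, needed))
```

In a real deployment the scaling decision would normally be left to a scaling policy attached to the group rather than computed in application code; the function above only makes the arithmetic of such a rule explicit.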
The database and file store can optionally use Amazon RDS for MariaDB and S3, to support high availability configurations and/or cross-region replication.
This solution is well suited to processing 10,000 to 100,000 hours of audio per day. To scale up to 1,000,000 audio hours per day or more, multiple application servers can be run behind a load balancer, or traffic can be sharded across multiple IV systems.
The diagram below shows how to configure the system with 9 Auto Scaling groups supporting all optional features.
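To put these volumes in perspective, 10,000 hours of audio per day is roughly 417 hours of audio for every hour of wall-clock time. The sketch below makes that arithmetic explicit. The per-worker speed-up is a hypothetical planning input, not a measured Intelligent Voice figure, and the function names are illustrative only.

```python
import math


def required_realtime_factor(audio_hours_per_day: float) -> float:
    """Aggregate speed, as a multiple of real time, needed to keep up with the daily volume."""
    return audio_hours_per_day / 24.0


def workers_needed(audio_hours_per_day: float, per_worker_speedup: float) -> int:
    """Estimate how many workers are needed if each one runs at per_worker_speedup x real time.

    per_worker_speedup is an assumption supplied by the caller.
    """
    if per_worker_speedup <= 0:
        raise ValueError("per_worker_speedup must be positive")
    return math.ceil(required_realtime_factor(audio_hours_per_day) / per_worker_speedup)


if __name__ == "__main__":
    for volume in (10_000, 100_000, 1_000_000):
        print(volume, round(required_realtime_factor(volume), 1))
```

An estimate like this ignores queueing delays and uneven arrival of jobs during the day, so it is best treated as a lower bound when sizing the maximum of each group.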
If you don't use some of the features, they don't need to be installed. For example, a Relativity integration does not require Sentiment, Voice Biometrics or Text from Video.
Note the following dependencies:
- ASR, Diarization and Voice Biometrics require VAD
- Summarization and Sentiment require ASR
- Tagger requires ASR or VideoOCR
- Voice Biometrics requires Elasticsearch
- JumpToWeb requires Sphinxsearch (on app server)
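The dependencies above can be checked mechanically before deciding which components to install. The following is a minimal sketch: the `REQUIREMENTS` table and the `missing_dependencies` helper are hypothetical names introduced for illustration, and the feature strings simply mirror the list above. Each feature maps to a list of requirement groups; every group must be satisfied, and a group is satisfied when any one of its members is selected, which is how the "ASR or VideoOCR" rule for Tagger is expressed.

```python
# Requirement groups per feature, transcribed from the dependency list above.
REQUIREMENTS = {
    "ASR": [{"VAD"}],
    "Diarization": [{"VAD"}],
    "Voice Biometrics": [{"VAD"}, {"Elasticsearch"}],
    "Summarization": [{"ASR"}],
    "Sentiment": [{"ASR"}],
    "Tagger": [{"ASR", "VideoOCR"}],  # either one is sufficient
    "JumpToWeb": [{"Sphinxsearch"}],
}


def missing_dependencies(selected):
    """Return {feature: [unmet requirement groups]} for the selected components."""
    chosen = set(selected)
    problems = {}
    for feature in sorted(chosen):
        unmet = [
            sorted(group)
            for group in REQUIREMENTS.get(feature, [])
            if not (group & chosen)  # no member of this group was selected
        ]
        if unmet:
            problems[feature] = unmet
    return problems


if __name__ == "__main__":
    print(missing_dependencies({"Tagger", "Sentiment", "ASR"}))
```

An empty result means the selection is self-consistent. Each unmet group is reported as a list of alternatives, so a Tagger installed on its own is reported as needing one of ASR or VideoOCR.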