The Intelligent Voice system is a batch processing system designed to process hours of audio recordings as efficiently as possible. We also offer systems designed for real-time transcription, low-latency keyword spotting, IoT hub devices, embedded applications, and more; for details on running these solutions on Azure, please contact us.
Minimum Requirements
A basic installation of Intelligent Voice 5 with GPU acceleration on Azure requires a single GPU instance with 4 vCPUs and 128 GB RAM.
At the time of writing (August 2022), the Standard_NC16as_T4_v3 type is recommended. Other supported sizes:
- Standard_NC64as_T4_v3
- Any NCv2-series
- Any NCv3-series
Intelligent Voice 5 has full support for installation on Red Hat Enterprise Linux 7 and Ubuntu 20.04.
The minimum storage requirement is 500 GB.
Production deployment example
NOTE: See also the AWS page for more details on possible cloud configurations.
Below is an example of a full production system deployment with autoscaling.
This system uses a single application server, with a Virtual Machine Scale Set deploying processing node images from a Shared Image Gallery. The current state of the Gearman queue is sent to Application Insights using Telegraf, and scaling rules are created based on the number of jobs.
This solution is ideal for processing around 10,000 hours of audio per day. To scale the solution down below 5,000 audio hours per day, install the Triton inferencing component on the application server. To scale up to 100,000 audio hours per day, change the Triton instance type to add GPU support and increase the size of the application server as described below. To scale up to 1,000,000 audio hours per day, or to add High Availability, add additional application servers behind a load balancer.
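The scale-out signal here is simply the number of jobs waiting in the Gearman queue. In the deployment above Telegraf reports this, but as an illustration of what is being measured, the Python sketch below queries Gearman's admin status command directly and pushes the total as a custom metric to Application Insights. The instrumentation key and the metric name gearman_jobs_queued are placeholders, not part of the product.

import socket

from applicationinsights import TelemetryClient  # pip install applicationinsights


def gearman_queue_depth(host="localhost", port=4730):
    """Sum queued jobs across all functions via Gearman's admin protocol."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"status\n")
        data = b""
        # The response is one line per function, terminated by a "." line.
        while not data.endswith(b".\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    total = 0
    for line in data.decode().splitlines():
        if line == ".":
            break
        # Each line: function<TAB>total<TAB>running<TAB>available_workers
        fields = line.split("\t")
        if len(fields) == 4:
            total += int(fields[1])
    return total


if __name__ == "__main__":
    tc = TelemetryClient("<instrumentation-key>")  # placeholder key
    tc.track_metric("gearman_jobs_queued", gearman_queue_depth())
    tc.flush()

A Virtual Machine Scale Set autoscale rule can then add processing nodes while gearman_jobs_queued stays above a threshold and remove them as the queue drains.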
App server
Standard_B12ms (12 vCPUs, 48 GiB memory)
To scale beyond 10,000 hours per day, increase the CPU allocation (e.g. Standard_D16as_v5, then Standard_D32as_v5).
Azure images: Red Hat Enterprise Linux 7.8 or Ubuntu 20.04 LTS (with the latest updates applied before starting the install)
OS Disk: 500 GB+ Premium SSD LRS. An example partitioning scheme for RHEL using 610 GB:
Filesystem                 Size  Mounted on
/dev/mapper/rootvg-rootlv   40G  /
/dev/mapper/rootvg-usrlv    28G  /usr
/dev/sda2                  494M  /boot
/dev/mapper/rootvg-optlv   190G  /opt
/dev/sda1                  500M  /boot/efi
/dev/mapper/rootvg-homelv  6.0G  /home
/dev/mapper/rootvg-varlv   150G  /var
/dev/mapper/rootvg-tmplv    35G  /tmp
(Tomcat 8 temporary files are in the default location /opt/apache-tomcat-8.5.61/temp/.)
Mount a share or container on the filesystem under /data (see the Microsoft guides Mount SMB Azure file share on Linux and How to mount Blob storage as a file system with BlobFuse).
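Before starting ingestion, it is worth confirming that the share is actually mounted, so audio is not written to the OS disk if the mount failed after a reboot. A minimal Python check, assuming the /data path used above:

import os
import tempfile

# Fail fast if /data is not a mount point (e.g. the SMB or BlobFuse
# mount did not come up after a reboot) or is not writable.
if not os.path.ismount("/data"):
    raise SystemExit("/data is not mounted")
with tempfile.NamedTemporaryFile(dir="/data"):
    pass  # raises OSError if /data is read-only or full
print("/data is mounted and writable")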
Inference server
Standard_B12ms (12 vCPUs, 48 GiB memory)
Not required below 5,000 audio hours per day; install the component on the app server instead. From 10,000 to 50,000 audio hours per day, use Standard_NC4as_T4_v3, then scale up as described in the NVIDIA document Scaling Triton Inference Server.
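Whichever size is used, the Triton server can be health-checked with NVIDIA's Python client before jobs are routed to it. A minimal sketch, assuming the tritonclient package is installed; inference.internal is a placeholder hostname for your inference server (or the app server, when the component is co-installed):

import tritonclient.http as httpclient  # pip install tritonclient[http]

# Placeholder URL: substitute your inference server's address.
client = httpclient.InferenceServerClient(url="inference.internal:8000")

if client.is_server_live() and client.is_server_ready():
    # List the models Triton has loaded and their current state.
    for model in client.get_model_repository_index():
        print(model["name"], model.get("state"))
else:
    raise SystemExit("Triton is not ready")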
Azure Database for MariaDB
Memory Optimized, Compute Gen 5, 2 vCores (20 GB RAM)
This is suitable for 10,000-50,000 hours per day.
Not recommended for 5,000 hours per day or fewer; use the default install on the app server instead.
Use larger instance sizes to scale up.
ASR images
This should be the lowest spot price T4 instance type if available in your region, or else the lowest spot price from the other compatible GPU types listed above. Typically this will be Standard_NC4as_T4_v3.
All other images
The VAD, Tagger, OCR, Diarization, LexiQal Sentiment, and LexiQal Credibility images should all use the lowest full-utilization (not burstable) x86-64 spot price instances meeting the minimum requirement of 2 vCPUs and 4 GB RAM. Typically this will be:
Standard_F2s_v2
In some regions, other instance types such as Standard_D2pls_v5 or Standard_D2as_v5 may have lower spot prices.
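Spot prices vary by region and over time, so it is worth checking the current cheapest candidate programmatically. The following Python sketch uses the public Azure Retail Prices API (which needs no authentication) to compare Linux spot prices for the sizes mentioned above; the region and candidate list are placeholders to adjust for your deployment.

import requests

REGION = "uksouth"  # placeholder; use your deployment region
CANDIDATES = ["Standard_F2s_v2", "Standard_D2pls_v5", "Standard_D2as_v5"]

# Build an OData filter restricted to the candidate sizes in one region.
sku_clause = " or ".join(f"armSkuName eq '{s}'" for s in CANDIDATES)
params = {
    "$filter": (
        "serviceName eq 'Virtual Machines' and priceType eq 'Consumption' "
        f"and armRegionName eq '{REGION}' and ({sku_clause})"
    )
}
url = "https://prices.azure.com/api/retail/prices"
prices = {}
while url:
    resp = requests.get(url, params=params)
    resp.raise_for_status()
    body = resp.json()
    for item in body["Items"]:
        # Keep Linux spot meters only; the same size also appears with
        # on-demand, low-priority and Windows meters.
        if "Spot" in item["skuName"] and "Windows" not in item["productName"]:
            sku = item["armSkuName"]
            prices[sku] = min(prices.get(sku, float("inf")), item["retailPrice"])
    url, params = body.get("NextPageLink"), None  # the link embeds the filter

for sku, price in sorted(prices.items(), key=lambda kv: kv[1]):
    print(f"{sku}: ${price:.4f}/hour")

The same approach works for the ASR images: swap CANDIDATES for the T4 and other compatible GPU sizes listed above.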