The Intelligent Voice system is a batch processing system designed to process hours of audio recordings as efficiently as possible. We also offer systems designed for real-time transcription, low-latency keyword spotting, IoT devices, embedded applications and more. For details of the requirements for these, please contact us.
Minimum Requirement: single Virtual Machine with GPU acceleration
Intelligent Voice can be installed in a single virtual machine with a small amount of GPU acceleration. Suitable applications include:
- Low-volume batch processing (using NASRv5+ language models):
  - Up to 400 hours of audio recordings per day for single-language runs
  - Up to 200 hours of audio recordings per day for dual-language runs
- Functional system evaluation and compatibility testing
- Software development and integration testing
- Test and QA systems
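As a rough illustration of the throughput limits quoted above, the following Python sketch checks whether a planned daily workload fits within the minimum single-VM installation. The function and constant names are hypothetical and are not part of any Intelligent Voice tooling; only the 400 and 200 hour figures come from the list above.

```python
# Daily throughput ceilings quoted above for the minimum single-VM install
# (NASRv5+ language models). Keys are the number of languages per run.
MAX_HOURS_PER_DAY = {1: 400, 2: 200}


def fits_minimum_vm(hours_per_day: float, languages: int = 1) -> bool:
    """Return True if the workload is within the quoted single-VM ceiling."""
    if languages not in MAX_HOURS_PER_DAY:
        # Only single and dual language figures are quoted in this document.
        raise ValueError("only single and dual language runs are quoted")
    return 0 <= hours_per_day <= MAX_HOURS_PER_DAY[languages]


if __name__ == "__main__":
    print(fits_minimum_vm(350, languages=1))  # True: 350 <= 400
    print(fits_minimum_vm(350, languages=2))  # False: 350 > 200
```

Workloads above these ceilings point towards one of the larger configurations described later on this page.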
The minimum requirements are:
- 8 or more x86 vCPUs (with AVX, Intel "Sandy Bridge" or later / AMD "Bulldozer" or later)
- at least 64GB RAM
- at least 500GB storage*
- at least 1 GPU card**; an NVIDIA L4 or NVIDIA Tesla T4 card is recommended
*If the disk is partitioned, there must be sufficient space mounted under /var/lib for container images (320GB if all features are installed) and under /opt for language models (40GB for general data, plus approximately 1.5GB per model for models before NASRv5.1).
**If diarization (speaker separation) is required, add one more GPU card of the same specification.
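The minimum requirements and footnotes above can be expressed as a small pre-installation check. This is a minimal Python sketch under stated assumptions: all function names are hypothetical, the AVX test assumes a Linux-style /proc/cpuinfo "flags" line, and the /opt estimate covers only models before NASRv5.1, because no per-model size is given here for later models.

```python
def required_gpus(diarization: bool) -> int:
    """One GPU card as a minimum, plus one more if diarization is required."""
    return 2 if diarization else 1


def opt_space_gb(pre_v51_models: int) -> float:
    """Approximate /opt space: 40GB general data plus ~1.5GB per model.

    Only valid for models before NASRv5.1 (the figure quoted above).
    """
    return 40 + 1.5 * pre_v51_models


def has_avx(cpuinfo_text: str) -> bool:
    """Look for the 'avx' token on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return "avx" in value.split()
    return False


def check_minimum(vcpus: int, ram_gb: float, storage_gb: float,
                  gpus: int, avx: bool, diarization: bool = False) -> list[str]:
    """Return a list of unmet minimum requirements (empty if all are met)."""
    problems = []
    if vcpus < 8:
        problems.append("need 8 or more x86 vCPUs")
    if not avx:
        problems.append("CPU must support AVX")
    if ram_gb < 64:
        problems.append("need at least 64GB RAM")
    if storage_gb < 500:
        problems.append("need at least 500GB storage")
    if gpus < required_gpus(diarization):
        problems.append(f"need at least {required_gpus(diarization)} GPU card(s)")
    return problems


if __name__ == "__main__":
    print(check_minimum(8, 64, 500, 1, avx=True, diarization=True))
    print(opt_space_gb(4))
```

On a Linux host, has_avx could be fed the contents of /proc/cpuinfo. The remaining inputs are left as plain arguments so the check does not depend on any particular hardware-inventory tool.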
Operating systems supported:
- Red Hat Enterprise Linux 9 (recommended) or 8
- Ubuntu 22.04 LTS (recommended) or 20.04 LTS
- Oracle Linux 8
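A host can be compared against the supported operating systems listed above by reading /etc/os-release. The sketch below is illustrative only: the function names are hypothetical, and it assumes the distributions report the ID values "rhel", "ubuntu" and "ol" (Oracle Linux) with a VERSION_ID such as "9.3" or "22.04".

```python
# Supported releases from the list above, keyed by the os-release ID field.
# RHEL and Oracle Linux are matched on major version, Ubuntu on full version.
SUPPORTED = {
    "rhel": {"8", "9"},
    "ubuntu": {"20.04", "22.04"},
    "ol": {"8"},
}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines from os-release content into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(os_release_text: str) -> bool:
    """Return True if ID and VERSION_ID match a supported release."""
    fields = parse_os_release(os_release_text)
    distro = fields.get("ID", "")
    version = fields.get("VERSION_ID", "")
    # Ubuntu versions are compared in full; others by major version only.
    key = version if distro == "ubuntu" else version.split(".")[0]
    return key in SUPPORTED.get(distro, set())


if __name__ == "__main__":
    with open("/etc/os-release", encoding="utf-8") as handle:
        print(is_supported(handle.read()))
```

Parsing the text rather than reading the file inside is_supported keeps the check easy to run against a sample os-release file taken from another machine.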
Additional vCPUs, GPUs and storage are required to increase performance.
For help with sizing larger installations, please see below or contact us.
High Performance GPU Single Server
To get the best performance from Intelligent Voice we recommend servers with NVIDIA GPU cards.
An example system specification suitable for production systems:
- 1 x AMD EPYC 9124 3.0GHz 16-core CPU
- 128 GB RAM
- 2 x 1TB SSD
- 2 x NVIDIA L4 GPU
An example server spec for larger installations:
- 2 x AMD EPYC 75F3 2.95GHz 32-core CPU
- 512GB RAM
- 2 x 2TB NVMe SSD
- 4 x NVIDIA Tesla A100 80GB
Additional storage is required for audio data and outputs. The amount depends on the audio / video file formats and the required retention period.
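To give a feel for how file format and retention period drive this additional storage, here is a minimal Python sketch. It assumes uncompressed PCM audio (as in a typical WAV file), which is an illustrative assumption and not a format requirement of Intelligent Voice. The function names and the output_overhead parameter are hypothetical, and compressed audio or video formats will give very different figures.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bits_per_sample: int,
                       channels: int) -> int:
    """Bytes in one hour of uncompressed PCM audio (headers ignored)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600


def retention_storage_gb(hours_per_day: float, retention_days: int,
                         bytes_per_hour: int,
                         output_overhead: float = 0.0) -> float:
    """Estimate storage in GB (10^9 bytes) for audio kept over a retention period.

    output_overhead is a hypothetical fraction added for transcripts and
    other outputs, e.g. 0.1 adds 10% on top of the audio itself.
    """
    total_bytes = hours_per_day * retention_days * bytes_per_hour
    return total_bytes * (1 + output_overhead) / 1e9


if __name__ == "__main__":
    # Assumed example: 8 kHz, 16-bit, mono telephony-style recordings.
    per_hour = pcm_bytes_per_hour(8000, 16, 1)
    print(per_hour)
    print(round(retention_storage_gb(400, 90, per_hour), 1))
```

Under these assumptions, 400 hours per day retained for 90 days comes to roughly 2TB of audio before any outputs are counted.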
High Performance GPU Multiple Servers
Intelligent Voice can scale over any number of GPU servers. An example system for higher volume processing:
1 x application server:
- 2 x AMD EPYC 75F3 2.95GHz 32-core CPU
- 512GB RAM
- 2 x 2TB NVMe SSD
5 x GPU processing node servers:
- 1 x AMD EPYC 9124 3.0GHz 16-core CPU
- 128 GB RAM
- 2 x 1TB SSD
- 8 x NVIDIA A2 GPU
Larger Clusters & Geographical Distribution
Intelligent Voice can scale to millions of audio hours per day and can optionally synchronize across multiple geographic regions. Please contact us for more information.