Internet fetch support

Vocalizer supports fetching many types of data from web servers and local file systems:

  • Input text when using the Speech Server
  • Digital audio recordings
  • User dictionaries
  • Rulesets
  • ActivePrompt databases (including digital audio recordings referenced by these databases)

For these fetches, Vocalizer supports the HTTP/1.1 protocol with the following features:

  • HTTP access for unencrypted requests
  • HTTPS access using the popular OpenSSL open source toolkit, supporting the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) network protocols. On Linux, Vocalizer uses the OS-supplied OpenSSL libraries, which it loads dynamically using dlopen. On Windows, it uses its own build of the OpenSSL libraries.
  • FILE access for unencrypted local file system access
  • Configurable disk cache for http and https access with HTTP/1.1 compliant caching policies
  • Configurable cookie support for http and https access
  • Configurable base URLs for http and https access
  • Configurable timeouts for http and https access
  • Proxy server support for http and https access
  • Configurable file extension to MIME content type mapping rules for file:// access

Web server configuration

Applications use web servers to return a MIME content type for each fetched document. Most web servers require configuration to support all file extensions commonly used with Vocalizer.

In addition, web servers use HTTP/1.1 response headers to control the caching policy for all fetched data. Most web servers require configuration to avoid problems such as re-fetching data every time it is used.

Configuring MIME content types

Vocalizer relies on the receipt of correct MIME content types from web servers. To avoid problems, configure all web servers to return the proper MIME content types for http and https access.

When the web server does not support a file extension used by Vocalizer, it returns one of the following content types:

  • application/octet-stream (HTTP/1.1 compliant method)
  • text/plain (not HTTP/1.1 compliant)

In both cases, the incorrect type leads to incorrect handling of data. Vocalizer might speak SSML markup as plain text, fail to do an audio insertion, or fail to load tuning data.

System administrators must configure web servers to return MIME content types as described in the following table. For local file system access, use the inet_extension_rules parameter to map file extensions to MIME content types. This table shows the default mappings. If applications use different file extensions, update the web server and inet_extension_rules configurations.

| MIME content type | Vocalizer data type | Recommended file extension |
|---|---|---|
| application/edct-bin-dictionary | Binary format user dictionary | .bdc or .dcb |
| application/ssml+xml | SSML input text | .ssml (or .xml) |
| application/synthesis+ssml | SSML input text | .ssml |
| application/x-vocalizer-activeprompt-db | ActivePrompt database | (application defined) |
| application/x-vocalizer-activeprompt-db;mode=automatic | ActivePrompt database, overriding it to work in automatic insertion mode | (application defined) |
| application/x-vocalizer-rettt+text | User-defined ruleset | (application defined) |
| audio/basic | Headerless 8kHz mulaw audio recording | .ulaw or .mulaw |
| audio/basic | Audio recording with a Sun/NeXT AU header containing 8kHz 16-bit linear PCM samples, 8kHz alaw samples, 8kHz mulaw audio samples, or 22kHz 16-bit linear PCM audio samples | .au or .snd |
| audio/L16;rate=8000 | Headerless 8kHz 16-bit linear PCM audio recording | .L16 |
| audio/L16;rate=22050 | Headerless 22kHz 16-bit linear PCM audio recording | (application defined) |
| audio/x-alaw-basic | Headerless 8kHz alaw audio recording | .alaw |
| audio/x-nist | Audio recording with a NIST SPHERE header containing 8kHz 16-bit linear PCM samples, 8kHz alaw samples, 8kHz mulaw audio samples, or 22kHz 16-bit linear PCM audio samples. NIST SPHERE shorten and wavpack compression is not supported. | .nis, .nist, or .sph |
| audio/x-wav (not audio/wav as some web servers incorrectly return) | Audio recording with a RIFF WAV header containing 8kHz 16-bit linear PCM samples, 8kHz alaw samples, 8kHz mulaw audio samples, or 22kHz 16-bit linear PCM audio samples | .wav |
| text/email | Email input text using the default Vocalizer character encoding for the current language | Not applicable |
| text/html | HTML input text using the default Vocalizer character encoding for the current language | .html |
| text/plain | Plain input text using the default Vocalizer character encoding for the current language | .txt |
| text/plain;charset=euc-jp | Plain input text using the Japanese EUC encoding | (application defined) |
| text/plain;charset=iso-8859-1 | Plain input text using the ISO-8859-1 (Latin 1) encoding | (application defined) |
| text/plain;charset=shift-jis | Plain input text using the Japanese Shift-JIS encoding | (application defined) |
| text/plain;charset=utf-16 | Plain input text using the Unicode UTF-16 encoding (the best recommended encoding) | (application defined) |
| text/plain;charset=utf-8 | Plain input text using the Unicode UTF-8 encoding (another recommended encoding) | (application defined) |
| text/plain;charset=windows-1252 | Plain input text using the Windows-1252 encoding | (application defined) |
| text/plain;charset=charset | Plain input text with another character encoding as specified by charset | (application defined) |

Note: audio/basic has two interpretations according to its IETF specification, one where it is headerless and another where it has an AU header. Vocalizer supports both and internally disambiguates between them.
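
The default mappings listed above can be turned into a small check that a web server returns a suitable content type for a given file. The following Python sketch is illustrative only: the names EXPECTED_TYPES, is_expected_type, and looks_like_fallback are hypothetical helpers, not part of Vocalizer, and the dictionary covers only the rows of the table that have a recommended file extension.

```python
import os

# Extension -> acceptable MIME content types, copied from the default
# mappings in the table. Hypothetical helper data, not a Vocalizer API.
EXPECTED_TYPES = {
    ".bdc": ("application/edct-bin-dictionary",),
    ".dcb": ("application/edct-bin-dictionary",),
    ".ssml": ("application/ssml+xml", "application/synthesis+ssml"),
    ".ulaw": ("audio/basic",),
    ".mulaw": ("audio/basic",),
    ".au": ("audio/basic",),
    ".snd": ("audio/basic",),
    ".l16": ("audio/L16;rate=8000",),
    ".alaw": ("audio/x-alaw-basic",),
    ".nis": ("audio/x-nist",),
    ".nist": ("audio/x-nist",),
    ".sph": ("audio/x-nist",),
    ".wav": ("audio/x-wav",),
    ".html": ("text/html",),
    ".txt": ("text/plain",),
}

# Generic types a web server tends to return for unknown extensions.
FALLBACK_TYPES = {"application/octet-stream", "text/plain"}


def normalize(content_type):
    """Lower-case a Content-Type value and drop whitespace around ';'."""
    return ";".join(part.strip() for part in content_type.split(";")).lower()


def is_expected_type(filename, content_type):
    """Return True if content_type matches the table entry for filename."""
    ext = os.path.splitext(filename)[1].lower()
    got = normalize(content_type)
    for expected in EXPECTED_TYPES.get(ext, ()):
        want = normalize(expected)
        # An entry without parameters also accepts added parameters
        # (for example a charset); an entry with parameters must match fully.
        if got == want or (";" not in want and got.split(";")[0] == want):
            return True
    return False


def looks_like_fallback(filename, content_type):
    """Return True if the server sent a generic type for a known extension."""
    return (normalize(content_type) in FALLBACK_TYPES
            and not is_expected_type(filename, content_type))
```

For example, is_expected_type("hello.wav", "audio/wav") returns False because the table requires audio/x-wav, and looks_like_fallback("hello.wav", "application/octet-stream") returns True, which suggests the web server has no mapping for .wav files.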

Configuring cache policies

Vocalizer relies on Cache-Control headers to determine how often to fetch the data it needs. If a web server does not provide expiration information, Vocalizer falls back to a default that it calculates from the Last-Modified timestamp. This default might not be optimal for your system: Vocalizer might re-fetch data more often than needed.
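
To give a feel for what a Last-Modified based default can look like, the following Python sketch implements the heuristic that the HTTP caching specification mentions as typical: treat a response as fresh for a fraction (commonly 10%) of the time elapsed since it was last modified. This is an assumption for illustration only. The formula Vocalizer actually uses is not stated here, and heuristic_freshness_seconds is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness_seconds(date_header, last_modified_header, percent=10):
    """Estimate a freshness lifetime from the Date and Last-Modified headers.

    Assumption: the generic HTTP heuristic (a percentage of the document's
    age), not Vocalizer's documented behavior.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    age = int((date - last_modified).total_seconds())
    # A document "modified in the future" gets no heuristic freshness.
    return max(0, age * percent // 100)
```

Under this assumed heuristic, a recording modified ten days before it is fetched would be treated as fresh for one day, while a recording modified ten minutes before it is fetched would be re-fetched after one minute. An explicit Cache-Control header removes this dependence on the modification time.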

To avoid performance problems, configure all web servers to specify the cache policy for the file types that are commonly used with Vocalizer:

  • Ensure that the web server allows Vocalizer to cache the data by specifying the proper HTTP/1.1 Cache-Control headers. For example, "Cache-Control: max-age=86400" allows Vocalizer to cache fetched data for 24 hours (86400 seconds).
  • Choose the cache duration carefully based on each application’s requirements. Be especially careful with digital audio recording insertions: make the duration long enough so that fetches are infrequent, but short enough so that Vocalizer uses updated data within a reasonable time frame.
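
Because max-age is expressed in seconds, it is easy to be off by a factor of 60 when converting from minutes or hours. The following Python sketch shows the conversion and a simple way to read the value back from a header. Both function names are hypothetical helpers for illustration, and the parser handles only the max-age directive.

```python
def cache_control_for(hours):
    """Build a Cache-Control header line for a cache duration in hours."""
    seconds = int(hours * 3600)
    return f"Cache-Control: max-age={seconds}"


def max_age_seconds(header_value):
    """Return max-age in seconds from a Cache-Control value, or None."""
    for directive in header_value.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None
```

For example, cache_control_for(24) produces "Cache-Control: max-age=86400", and max_age_seconds("public, max-age=86400") returns 86400. How the header is attached to responses depends on the web server in use and is configured there.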

The Internet fetch cache does not persist across process restarts: each time Vocalizer starts, it begins with a fresh cache. If the web server specifies a long cache duration and you require an emergency update, stop and restart the process that controls Vocalizer.

For information on optimizing Internet fetch performance, see Limiting delays when internet fetching is used.