sgc

The sgc utility precompiles an XML grammar (a *.grxml text file) into a binary grammar (*.gram), or trains a Statistical Language Model (SLM) natural language grammar from a prepared training file. It can also generate a binary output SLM when run with the -slm option.

The utility is located in: %SWISRSDK%\amd64\bin

Usage

sgc grammar1.grxml grammar2.grxml ...
   [-baseline path]
   [-batch filename]
   [-language lang]
   [-langver version]
   [-lexicon_uri uri_for_arpa_dictionary]
   [-load_arpa filename]
   [-no_gram]
   [-no_logo]
   [-no_script_verify]
   [-optimize n]
   [-out filename]
   [-slm filename]
   [-test filename]
   [-train filename]

Options

grammar1.grxml grammar2.grxml ...

The locations and names of one or more grammars to be compiled.

-baseline path

Specifies the starting location for all relative file path references.

-batch filename

Specifies a batch file containing instructions to compile several files.

-language lang

Specifies the language to use in the compilation.

-langver version

Specifies the language pack version for compilation. Use this parameter to compile a grammar with the same language version used by the application. (If the application loads a binary grammar compiled for a different language version, Recognizer returns an error.)

  • If your system has only one version of a language installed, this parameter is not required for that language.
  • If your system has more than one version of a language, sgc uses the newest by default. The parameter is required if the application uses a previous version.

This example compiles for a version of US English:

-langver "en-us 9.0.0"

When the grammar covers more than one language, you can specify the version of each:

-langver "en-us 9.0.0,fr-ca 10.0.0"
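When a build script assembles the command line, the quoted value can be generated from a mapping of languages to versions. The following Python sketch is illustrative only: the helper name langver_value is hypothetical and not part of sgc, and it simply mirrors the "language version,language version" form shown in the examples.

```python
def langver_value(versions):
    """Join {language: version} pairs into the value passed to -langver.

    Hypothetical helper, not part of sgc: it reproduces the
    "lang version,lang version" form used in the documented examples.
    """
    return ",".join(f"{lang} {ver}" for lang, ver in versions.items())


# Pass the value as a single argument so the embedded spaces survive.
args = ["-langver", langver_value({"en-us": "9.0.0", "fr-ca": "10.0.0"})]
print(args)
```

Keeping the value as one list element avoids manual quoting when the list is later handed to a process launcher.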

-lexicon_uri uri_for_arpa_dictionary

Specifies the pronunciation dictionary to use when compiling an ARPA n-gram file. Ignored unless -load_arpa is also used.

-load_arpa filename

Specifies a file that contains SLM training data written in ARPA format. The resulting binary grammar is a simple loop over all vocabulary words in the training file. When this option is used, an input grammar is not allowed. Cannot be used with -train.

Note: Recognizer uses the Katz backoff formula: if an n-gram does not exist in the language model, the (n-1)-gram likelihood is used instead, combined with the corresponding backoff weight.
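The backoff rule can be illustrated with a small lookup over ARPA-style tables. This Python sketch is not Recognizer's implementation. It assumes the common ARPA convention of log10 probabilities and log10 backoff weights, and the function name and all numbers are invented for illustration.

```python
import math

# Toy model in the style of an ARPA file: log10 probabilities and
# log10 backoff weights. All values are invented for illustration.
LOGPROB = {
    ("call",): -1.0,
    ("home",): -1.25,
    ("mom",): -1.5,
    ("call", "home"): -0.25,
}
BACKOFF = {
    ("call",): -0.5,
    ("home",): -0.25,
}


def katz_logprob(words):
    """Return log10 P(last word | preceding words), backing off as needed."""
    words = tuple(words)
    if words in LOGPROB:
        return LOGPROB[words]
    if len(words) == 1:
        return -math.inf  # unknown word: no shorter n-gram to fall back to
    context = words[:-1]
    # A context with no stored weight backs off at no cost (log10 1 = 0).
    return BACKOFF.get(context, 0.0) + katz_logprob(words[1:])


print(katz_logprob(["call", "home"]))  # bigram is present in the model
print(katz_logprob(["call", "mom"]))   # backs off to the unigram "mom"
```

In the second call the bigram is missing, so the result is the backoff weight of the context "call" added to the unigram score of "mom" (addition in the log domain corresponds to multiplying probabilities).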

-no_gram

This option is available with -train. It suppresses output of the binary grammar file. Use it when configuration parameters inside the training file already write FSM and wordlist output.

-no_logo

Suppresses the version information, for example when running sgc from a script.

-no_script_verify

Does not check ECMAScript when compiling the grammar.

-optimize n

Sets the optimization level for the compilation. Values are 0–12 (but not 10). Generally, lower values compile faster but yield slower recognition, and higher values compile more slowly but yield faster recognition.

-out filename

Specifies the filename for the compiled output.

-slm filename

Generates a binary output SLM. The input file can be an SLM training file or an slmxml file. For example:

sgc -slm mytraining.xml

generates the SLM mytraining.slm. See Interpolated SLMs.

-test filename

Specifies an input file of test sentences. The compiler reports perplexity measurements for each sentence provided.
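For orientation, perplexity is conventionally the inverse geometric mean of the per-word probabilities. The Python sketch below shows that standard definition only. How sgc counts words (for example, whether sentence-end tokens are included) is not stated here, so the function and its input values are assumptions for illustration.

```python
def perplexity(log10_probs):
    """Perplexity of one sentence from per-word log10 probabilities."""
    if not log10_probs:
        raise ValueError("need at least one word probability")
    mean_logprob = sum(log10_probs) / len(log10_probs)
    # Undo the log10 and invert: a lower value means a better-predicted sentence.
    return 10 ** (-mean_logprob)


# Invented per-word scores for a three-word test sentence.
print(perplexity([-1.0, -2.0, -3.0]))
```

A sentence whose words the model predicts well has log probabilities near zero and therefore a perplexity near 1.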

-train filename

Specifies a file that contains SLM training data. This option requires a training file, not an SRGS speech grammar. The resulting output is a binary grammar that is a simple loop over all vocabulary words in the training file. Cannot be used with the -load_arpa option.
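Several options carry constraints: -train and -load_arpa are mutually exclusive, -load_arpa does not allow an input grammar, -no_gram is available with -train, and -optimize accepts 0–12 but not 10. A wrapper script can check these before invoking the compiler. The Python sketch below is a hypothetical helper (build_sgc_args is not part of sgc). It only assembles an argument list and does not run the utility.

```python
def build_sgc_args(grammars=(), optimize=None, train=None,
                   load_arpa=None, no_gram=False):
    """Return an argument list for sgc, rejecting documented conflicts."""
    if train and load_arpa:
        raise ValueError("-train cannot be used with -load_arpa")
    if load_arpa and grammars:
        raise ValueError("-load_arpa does not allow an input grammar")
    # "Available with -train" is read here as: only valid with -train.
    if no_gram and not train:
        raise ValueError("-no_gram is available only with -train")
    if optimize is not None and (optimize == 10 or not 0 <= optimize <= 12):
        raise ValueError("-optimize accepts 0-12, excluding 10")

    args = ["sgc", *grammars]
    if optimize is not None:
        args += ["-optimize", str(optimize)]
    if train:
        args += ["-train", train]
    if load_arpa:
        args += ["-load_arpa", load_arpa]
    if no_gram:
        args.append("-no_gram")
    return args


print(build_sgc_args(["mygrammar.grxml"], optimize=9))
```

Failing early in the wrapper gives a clearer message than a rejected compiler invocation partway through a batch.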

Example

> sgc mygrammar.grxml -optimize 9 -no_script_verify