
Passive Recording

Passive Recording with the PIKA HMP Digital PCIe board

Description:

Passively recording (tapping) a T1/E1 span means recording the audio and signaling channels without the network being able to tell that a device is connected. The logger card neither responds to nor injects signaling information into the circuit being monitored. (In passive recording mode, the card does not “terminate” a call.)

Using the on-board HDLCs, and the low level HMP API, it is possible to capture the ISDN messages and record the audio conversations to file. PIKA has created a sample application to demonstrate these capabilities.

Passive Logging Application

Getting Started:

1. Install the PIKA Low Level HMP API ( 2.7.x or newer )
2. Reboot
3. Remove the line jumpers on the PIKA HMP PCIe Digital board
4. Install the board in the PC
5. Follow the cabling diagram to ensure proper connectivity
6. Download, compile and run the sample Digital Logging application

Cabling:

Passive Logging Cabling

Please note that the TX pins on the Logger card are NOT connected. This will prevent the board from transmitting messages on the monitored line or from changing the line characteristics that could affect link quality.

By removing the jumpers on the board, a high impedance circuit is activated to preserve the line quality of the monitored line.

Passive Logging Port

Notes:

  1. This sample application can be used in either a Windows or Linux environment
  2. Basic testing of the sample application has been performed with up to 3 PCIe boards in a single system
  3. This functionality is not currently available at the GP High Level API layer because the HDLC API is not present at the GP API layer
  4. The sample application has been written to passively record based on ISDN events.
    (RBS and MFR2 protocols are not implemented in the sample application but could be
    added by customers to log calls based on these protocols)
  5. This application note specifies the HMP PCIe board because of the integrated high impedance circuit on its receive pins
  6. It is important to follow the cabling diagram exactly; otherwise the tap may be noticeable on the network and could degrade line quality or bring the line down completely
  7. There is a compile flag to set the Debug Logging level in the application

Support:

For more information please contact PIKA Support:

Email: support@pikatech.com
Forum: http://www.pikatech.com/forum/
Phone: 613.591.1555 x1

Advanced Tone Detection

The advanced tone detection (ATD) application is a block process with a fixed block size of 192 samples, or 24 ms.

The ATD application is a flexible tone detection application that can be configured to detect almost any tone or multi-tone signal that can be found in a telephone network, such as: fax and modem answer tones, call progress tone, special information tones, and multi-frequency tones (e.g. MF/R1).

The concepts of tones and tone groups are used to describe the desired operation of an ATD detector. ATD is designed to detect tone groups. A tone group is defined as a group of one to four user-defined tones. A default tone descriptor file is provided along with a user-friendly application, PikaATD, which can be used to add tones and tone groups. The default tone descriptor file contains the following tone groups:

  • Call progress tone
  • Modem calling tone (1300 Hz)
  • Fax/modem answer tone (2100 Hz)
  • Fax CNG calling tone (1100 Hz)
  • Pulse talk-off rejection
  • Special information tones (SIT) 0-4
  • MF/R1 KP, digits 0-9, ST, Spare 0-2, 2600Hz, Fast 2600 Hz

The default tones can be modified if desired, e.g. call progress tone; however, the default tone groups should not be deleted or re-ordered. Additional tones and tone groups can be appended to the existing default set using PikaATD.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • Flexible architecture:
    – Up to 63 unique tones can be described with the following parameters:
    Frequency
    Bandwidth (tolerance)
    Level
    Debounce resolution
    – Up to 127 tone groups can be described with the following parameters:
    A combination of 1 to 4 tones
    Signal to noise ratio
    Twist threshold
    Debounce period
  • Efficient use of DSP memory. Memory is dynamically allocated according to the requirements specified in PikaSetup.
  • PikaATD, included with MonteCarlo, is a very intuitive and easy to use application that can be used to modify or expand the tone descriptor file
  • A single tone descriptor file applies to all instances of ATD on a given DSP; however, each instance of ATD is capable of enabling or disabling any subset of tone groups
  • A computationally efficient algorithm is used to perform the tone detection
  • The ATD application can also be used as an energy meter. For example:
    ATD can be configured to report when an energy threshold has been exceeded
    The user can poll ATD to measure the average energy over a user-defined time period

Specifications

  • All tones within a tone group must have the same bandwidth
  • Tone specification parameters:
    Parameter | Minimum | Maximum
    Frequency (Hz) | 350 | 3500
    Level (detection threshold, dBm0) | -60 | 0

    The following parameters have a pre-defined set of valid values:

    Filter bandwidth (Hz): 36, 72, 109, 145, 218, 291, 437, or 582
    Debounce period resolution (ms): 12 or 24
  • Tone group specification parameters:
    Parameter | Minimum | Maximum
    Tones in group (number of tones) | 1 | 4
    nSNR (see note 1) | 0.000 | 1
    Twist (dB) (see note 2) | 0 | 100
    Tone present debounce time (ms) | 0 | 32767
    Tone absent debounce time (ms) | 0 | 32767

Resource Requirements

Memory requirements:

Application Memory (in DSP words): 2203 + (15 x NT) + (10 x NG)
Process Memory (per process, in DSP words): 118
Host Allocated Memory (per process, in DSP words): NT + NG + (NG x 4)

(NT = Number of Tones, NG = Number of Tone Groups)
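
As a quick sanity check, the memory formulas above can be evaluated for a planned tone set. The sketch below is illustrative only: the function name and dictionary keys are made up for this example, and the range checks simply reuse the 63-tone and 127-group limits quoted under Features.

```python
def atd_memory_words(num_tones: int, num_groups: int) -> dict:
    """Evaluate the ATD memory formulas (all results in DSP words)."""
    # Limits taken from the Features list: up to 63 tones, up to 127 groups.
    if not 1 <= num_tones <= 63:
        raise ValueError("ATD describes at most 63 unique tones")
    if not 1 <= num_groups <= 127:
        raise ValueError("ATD describes at most 127 tone groups")
    return {
        "application": 2203 + 15 * num_tones + 10 * num_groups,
        "process": 118,  # per ATD process
        "host_allocated": num_tones + num_groups + 4 * num_groups,  # per process
    }


if __name__ == "__main__":
    # Hypothetical configuration: 10 tones arranged into 6 tone groups.
    print(atd_memory_words(10, 6))
```

Because the process and host-allocated figures are per process, they would be multiplied by the number of ATD instances when budgeting a whole DSP.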

Sample MIPS requirements:

Operating Mode | MIPS
Detector enabled with moving average | 0.20
Call progress tone | 0.33
Special Information Tones (SIT) | 0.67
MF/R1 | 1.16

Note 1: Normalized signal to noise ratio (nSNR) is used as a detection threshold and is expressed as the ratio of the desired signal energy to the total signal energy.
Note 2: Twist is a detection threshold applicable to multi-tone signals. It specifies the maximum energy difference between the tones that make up the tone group.
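
The two thresholds defined in these notes can be illustrated with a few lines of arithmetic. This is only a sketch of the definitions, not the DSP algorithm: it assumes the per-tone energies and the total signal energy have already been measured in linear units, and the function names and the direction of the comparisons (nSNR at or above its threshold, twist at or below its threshold) are assumptions made for the example.

```python
import math


def nsnr(tone_energies, total_energy):
    """Normalized SNR: desired (tone) energy divided by total signal energy."""
    if total_energy <= 0:
        return 0.0
    return sum(tone_energies) / total_energy


def twist_db(tone_energies):
    """Energy difference in dB between the strongest and weakest tone."""
    weakest = min(tone_energies)
    if weakest <= 0:
        return math.inf
    return 10 * math.log10(max(tone_energies) / weakest)


def group_detected(tone_energies, total_energy, nsnr_min, twist_max_db):
    """Assumed decision rule: enough tone energy, and tones balanced enough."""
    return (nsnr(tone_energies, total_energy) >= nsnr_min
            and twist_db(tone_energies) <= twist_max_db)


if __name__ == "__main__":
    energies = [4.0, 1.0]  # two tones, one four times stronger than the other
    print(nsnr(energies, 6.0), twist_db(energies))
    print(group_detected(energies, 6.0, nsnr_min=0.8, twist_max_db=8.0))
```

For a single-tone group the twist is 0 dB by construction, so only the nSNR threshold matters in that case.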

Audio

The audio application is a block process with a variable block size. PIKA MonteCarlo uses a default block size of 24 ms. The audio format used and the sample rate of the converted data determine legal values for the block size. Other than GSM 6.10, 24 ms is a valid process block size for all modes of operation.

The audio application can be used to:

  • Play audio from the host to a pulse code modulation (PCM) stream
  • Record audio from a PCM stream to the host

The following audio formats are supported: 3-bit and 4-bit ADPCM, µ-law, A-law, 16-bit linear, 8-bit linear, and the ETSI and Microsoft variants of GSM 6.10 (see note 1). Furthermore, other than for Global System for Mobile Communications (GSM), the following sampling rates are supported (see note 2): 4k, 6k, 8k (default) or 11k samples per second. The format of the audio data when written to a PCM stream or read from a PCM stream is 16-bit linear. Subsequent format conversions are carried out by VPOS, PIKA Technologies’ proprietary voice processing operating system.

Full duplex operation requires two audio processes, one to play and one to record. Common applications for audio are: audio logging, IVR systems, text-to-speech applications, voice recognition applications, and host-based VoIP.

In most cases audio data is played from the host or recorded to the host. A buffer is allocated in DSP memory to transfer compressed speech to/from the host. An audio buffer consists of 2-15 sub-buffers, and each sub-buffer contains a configurable number of frames, where the frame size is equal to the process block size, e.g. 24 ms. When recording, once a sub-buffer is filled it is sent to the host. During playback, when a sub-buffer is played, a request is sent to the host indicating an available sub-buffer.

Some factors that should be considered when choosing the number of frames in a sub-buffer and the number of sub-buffers are:

  • DSP memory usage
  • Message transfer rate between the DSP and Host
  • Host response time to service the DSP messages
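
The trade-off between these factors can be made concrete with a small calculation. The sketch below uses only the relationships described above (a sub-buffer holds a whole number of frames, and one message is exchanged per sub-buffer); the function name, the dictionary keys, and the "host slack" figure (the time covered by the remaining sub-buffers while one is being serviced) are assumptions introduced for illustration.

```python
def audio_buffer_plan(frame_ms, frames_per_sub_buffer, sub_buffers):
    """Summarize timing consequences of an audio buffer layout."""
    if not 2 <= sub_buffers <= 15:
        raise ValueError("an audio buffer consists of 2-15 sub-buffers")
    sub_buffer_ms = frame_ms * frames_per_sub_buffer
    return {
        # Audio carried by one sub-buffer, i.e. the time between host messages.
        "sub_buffer_ms": sub_buffer_ms,
        # DSP-to-host message rate for one play or record process.
        "messages_per_second": 1000 / sub_buffer_ms,
        # Total audio held in DSP memory for this process.
        "total_buffer_ms": sub_buffer_ms * sub_buffers,
        # Rough time the host has to respond before under/overflow (assumption).
        "host_slack_ms": sub_buffer_ms * (sub_buffers - 1),
    }


if __name__ == "__main__":
    print(audio_buffer_plan(frame_ms=24, frames_per_sub_buffer=5, sub_buffers=2))
    print(audio_buffer_plan(frame_ms=24, frames_per_sub_buffer=1, sub_buffers=8))
```

The second call resembles a low-latency layout: many small sub-buffers raise the message rate but keep each transfer short.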

An additional mode of operation, Internet mode, is also available. In Internet mode the audio buffers would likely contain more sub-buffers, but each sub-buffer would be small, e.g. 24 ms, or 1 frame. Recording in Internet mode is the same as recording in normal mode. Playback in Internet mode provides some additional features. Audio frames can be inserted or skipped, so that the trade-off between latency and lost audio frames can be managed. The process can be configured to use frame reconstruction if a frame is missing. Additional messages can be sent to the host when buffer underflow occurs, allowing the host to throw away late audio data. MonteCarlo supports two methods of transferring audio data to the DSP: PK_AUDIO_OutputAddBuffer and PK_AUDIO_OutputPassBuffer. In Internet mode, PK_AUDIO_OutputPassBuffer should be used as it incurs less latency.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • DC filtering
  • Automatic gain control (AGC)
  • Programmable gain, for both Play and Record
  • Voice activity detection (VAD)
    – When VAD is enabled, a record process will always record the audio data; however, it will only send audio to the host when the VAD has detected speech
  • Time-stamping
    – When VAD is enabled, an additional message is sent from the DSP to the host containing timing information about the start of a speech segment or the end of a speech segment
  • Pre-speech buffering
    – Pre-speech buffering requires that VAD be enabled
    – When speech is detected, this feature allows the user to have a configurable number of sub-buffers preceding the detected speech sent to the host
  • GSM 6.10, ETSI and Microsoft variant
    – Not all features of Audio are supported with GSM
    Playback speed control is not supported
    Frame insertion/skipping is not supported with Microsoft GSM
    Sample rate conversion is not supported
    Variable frame size is not supported. GSM requires that the frame size, and therefore the process block size, be 20 ms. Having a process block size of 20 ms implies that a Play Record process configured to support GSM will not be able to support other compression formats where 20 ms is an invalid frame size, e.g. all other formats except 3-bit ADPCM and formats using 6k samples/s
  • Flexible DSP memory usage
    – Configurable frame size, sub-buffer size and number of sub-buffers
    – DSP audio buffers store data as 24-bit words (see note 3)
  • Pitch corrected variable speed playback
  • Audio frame reconstruction during playback
  • Source of recorded data can be pulse code modulation (PCM) buffer or data buffer if pre-processing is required, e.g. echo cancellation
  • Destination of play data can be the PCM buffer or another data buffer if post-processing is required
  • Source of play data can be a shared buffer, i.e. another DSP process, instead of host buffers
  • Transparent mode, which allows a user to record the exact data on a PCM stream by disabling the DC filtering, AGC and input/output gain
  • Host based VoIP:
    – Small audio buffers to minimize latency
    – Skipping and insertion of frames for jitter buffer management
    – Frame reconstruction for late or lost frames
  • Two modes of stopping a playback:
    – Stop: stops play back upon receiving the message
    – Stop-friendly: stops playback once all audio buffers have been played
  • Supports the insertion of a tone before starting a recording
  • Nibble-swapping to support Dialogic ADPCM (adaptive differential pulse code modulation)
  • Support for pausing during playback
  • Reports overflow and underflow of audio buffers to host. Buffer underflow will cause a playback to pause. Buffer overflow will result in lost recorded data.
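
The interaction between VAD and pre-speech buffering listed above can be sketched on the host side as a small gating routine. This is a simplified model under stated assumptions, not the DSP implementation: each sub-buffer is assumed to arrive with a single speech/no-speech flag, the VAD debounce and holdover periods are ignored, and the function name is hypothetical.

```python
from collections import deque


def gate_sub_buffers(sub_buffers, speech_flags, pre_speech=2):
    """Return the sub-buffers that would be sent to the host.

    Every sub-buffer is recorded, but only those flagged as speech are sent,
    preceded by up to `pre_speech` sub-buffers held back just before speech.
    """
    history = deque(maxlen=pre_speech)  # most recent non-speech sub-buffers
    sent = []
    for buf, is_speech in zip(sub_buffers, speech_flags):
        if is_speech:
            sent.extend(history)  # flush the pre-speech context first
            history.clear()
            sent.append(buf)
        else:
            history.append(buf)   # oldest entry drops out automatically
    return sent


if __name__ == "__main__":
    bufs = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]
    flags = [False, False, False, True, True, False, False, False]
    print(gate_sub_buffers(bufs, flags, pre_speech=2))
```

With two pre-speech sub-buffers, the example sends b1 and b2 ahead of the speech in b3 and b4, so the onset of the utterance is not clipped.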

Specifications

As alluded to earlier, only certain frame sizes are allowed. The primary reason for the restriction is that the compressed data is packed into 24 bit words. The following table illustrates the legal frame sizes for the various compression schemes.

Legal frame sizes (ms) by sampling rate:

Encoding Format | Bits per audio sample | 4 k samples/s | 6 k samples/s | 8 k samples/s | 11 k samples/s
µ-law | 8 | 6, 12, 18, 24, 30 | 2, 4, 6, 8 … 30 | 6, 12, 18, 24, 30 | 6, 12, 18, 24, 30
A-law | 8 | 6, 12, 18, 24, 30 | 2, 4, 6, 8 … 30 | 6, 12, 18, 24, 30 | 6, 12, 18, 24, 30
8-bit linear | 8 | 6, 12, 18, 24, 30 | 2, 4, 6, 8 … 30 | 6, 12, 18, 24, 30 | 6, 12, 18, 24, 30
3-bit ADPCM | 3 | 2, 4, 6, 8 … 30 | 2, 4, 6, 8 … 30 | 2, 4, 6, 8 … 30 | 8, 16, 24
4-bit ADPCM | 4 | 6, 12, 18, 24, 30 | 2, 4, 6, 8 … 30 | 6, 12, 18, 24, 30 | 6, 12, 18, 24, 30
GSM | GSM is a frame-based compression algorithm with a frame size of 20 ms (see note 4)
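
The packing restriction can be checked numerically. The sketch below tests whether the compressed data for one frame fills a whole number of 24-bit words, which is the primary reason given above. Two things are assumptions inferred from the table rather than stated rules: frame sizes are tried in 2 ms steps up to 30 ms, and "11 k" is taken to mean 11000 samples per second. The check agrees with the 8 k and 11 k columns of the table; for 3-bit ADPCM at 6 k samples/s the table lists more sizes than this simple check yields, so the table remains the authority.

```python
def legal_frame_sizes(bits_per_sample: int, rate_hz: int, max_ms: int = 30):
    """Frame sizes (ms) whose compressed data packs exactly into 24-bit words."""
    sizes = []
    for ms in range(2, max_ms + 1, 2):       # assumed 2 ms granularity
        if (rate_hz * ms) % 1000:
            continue                          # not a whole number of samples
        samples = rate_hz * ms // 1000
        if (samples * bits_per_sample) % 24 == 0:
            sizes.append(ms)
    return sizes


if __name__ == "__main__":
    for name, bits in (("A-law", 8), ("3-bit ADPCM", 3), ("4-bit ADPCM", 4)):
        print(name, "at 8 k samples/s:", legal_frame_sizes(bits, 8000))
```

The same reasoning shows why 20 ms is awkward for 8-bit formats at 8 k samples/s: 160 samples of 8 bits is 1280 bits, which is not a multiple of 24.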

Parameter Specifications

Parameter | Min Value | Max Value | Typical Value
Input Gain | -40 dB | +24 dB | 0 dB
Output Gain | -40 dB | +24 dB | 0 dB
AGC target gain constant | - | - | -15 dBm0
Min AGC gain | -40 dB | +24 dB | +18 dB
Max AGC gain | -40 dB | +24 dB | -6 dB
AGC attack rate | - | - | 170 ms
AGC decay rate | - | - | 750 ms
Speech Detect Threshold | - | - | -36 dBm0
Play rate (for variable speed playback) | 0.25 | 4.0 | 1.0
VAD min frame energy threshold | -120 dBm0 | +3 dBm0 | -40 dBm0
VAD debounce period, in frames | 1 | 32767 | 3
VAD holdover energy threshold | -120 dBm0 | +3 dBm0 | -40 dBm0
VAD holdover debounce period, in frames | 1 | 32767 | 41
Number of pre-speech sub-buffers | 1 | Number of SubBuffers-2 | 2

Resource Requirements

Memory requirements: (for version 1.3.5)

Application Memory (in DSP words): 6725
Process Memory (per process, in DSP words): 193
Host Allocated Memory (per process, in DSP words): see PK_AUDIO_GetBufferSize (see note 5)

Host Allocated Memory = (max bits per audio sample / 24) x 8 (k samples/s) x FrameSize (ms) x FramesPerSubBuffer x SubBuffersPerFrame

For standard GSM:
Host Allocated Memory = 11 x FramesPerSubBuffer x SubBuffersPerFrame

For Microsoft GSM:
Host Allocated Memory = 22 x FramesPerSubBuffer x SubBuffersPerFrame

In these formulas, SubBuffersPerFrame is the number of sub-buffers in the audio buffer (2 in the example in note 5).
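
The host allocated memory formulas can be evaluated directly; the sketch below is a plain transcription of them, not a replacement for PK_AUDIO_GetBufferSize. The function and parameter names are hypothetical. The constants 11 and 22 for GSM are consistent with the frame packing described in note 1 (33 bytes per ETSI frame, 65 bytes per Microsoft frame pair) rounded up to whole 24-bit words, which the helper reproduces.

```python
def words_for_bytes(num_bytes: int) -> int:
    """24-bit DSP words needed to hold num_bytes, rounded up."""
    return -(-num_bytes * 8 // 24)


def host_words_pcm(max_bits_per_sample, frame_ms, frames_per_sub_buffer,
                   sub_buffers, ksamples_per_s=8):
    """Host allocated memory (DSP words) for the non-GSM formats."""
    frame_bits = max_bits_per_sample * ksamples_per_s * frame_ms
    if frame_bits % 24:
        raise ValueError("frame does not pack into whole 24-bit words")
    return frame_bits // 24 * frames_per_sub_buffer * sub_buffers


def host_words_gsm(frames_per_sub_buffer, sub_buffers, microsoft=False):
    """Host allocated memory (DSP words) for ETSI or Microsoft GSM."""
    unit = words_for_bytes(65 if microsoft else 33)  # 22 or 11 words
    return unit * frames_per_sub_buffer * sub_buffers


if __name__ == "__main__":
    # Example from note 5: A-law, 24 ms frames, 5 frames per sub-buffer,
    # 2 sub-buffers.
    print(host_words_pcm(8, 24, 5, 2))
```

Running the example reproduces the 640 DSP words quoted in note 5.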

MIPS Requirements

Operation | A-law | µ-law | 3-bit ADPCM | 4-bit ADPCM | 8-bit linear | GSM
Mips per Audio Resource Enabled (see note 6) | 0.72 (overhead incurred whenever a PCM stream is enabled)
Mips per Active Play Process | 0.46 | 0.48 | 0.57 | 0.57 | 0.38 | TBD
Mips per Active Record Process | 0.46 | 0.44 | 0.73 | 0.67 | 0.30 | TBD

Notes

Note 1: GSM is supported through OpenVPOS. The standard sre file for Audio does not support GSM; a special sre file must be loaded. The Microsoft variant packs two frames into 65 bytes, whereas the ETSI variant packs each frame into 33 bytes.

Note 2: The sampling rate of the input audio data is determined by the hardware codec, and is always 8000 samples per second. In order to support the other sampling rates, the DSP will perform a sample rate conversion. This implies that using a sampling rate of 11 kHz does not provide higher fidelity.

Note 3: In the case of 16-bit linear, MonteCarlo hides the fact that the DSP will actually be asked for µ-law or A-law, and MonteCarlo will convert it to 16-bit linear. This allows better use of DSP memory, since 16-bit samples do not fit well in 24-bit DSP words, and the conversion is not computationally expensive for the host.

Note 4: Microsoft GSM combines two frames for more efficient data packing. This effectively changes the frame size to 40 ms, since it takes two 20 ms frames to fill the Microsoft-compatible data structure.

Note 5: Example: A-law with a 24 ms frame size, 5 frames per sub-buffer and 2 sub-buffers would require: ((24 x 8) / 3) x 5 x 2 = 640 DSP words

Note 6: This applies to any DSP application that requires PCM I/O, but is currently described here since Audio is the flagship DSP application. Whenever an audio resource is enabled, processing is required by VPOS. VPOS cannot differentiate between channel 0, 32, 64, or 96. Therefore, when a channel is enabled in the first frame, i.e. channels 0 to 31, the respective channel in all frames is also enabled. So, from VPOS’ point of view, channels on a PrimeNet MM are enabled in groups of 4 and channels on a Daytona MM are enabled in groups of 2. The implication here is that the choice of channel has an impact on DSP real-time. If 4 channels were required on a PrimeNet MM, then it would be wise to choose the same channel in each frame, i.e. x, x+32, x+64, x+96. Choosing channels 0 to 3 would actually enable processing on 16 channels, 0-3, 32-35, 64-67, and 96-99, wasting almost 9% of the available Mips on the DSP.
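
The channel-grouping effect described in note 6 can be reproduced with a short calculation. The sketch below assumes 32 channels per frame, 4 frames on a PrimeNet MM and 2 on a Daytona MM, as the note describes, and uses the 0.72 Mips per enabled audio resource from the MIPS table. The function names are hypothetical.

```python
PER_RESOURCE_MIPS = 0.72  # overhead per enabled audio resource (MIPS table)


def enabled_channels(selected, frames=4, channels_per_frame=32):
    """Channels VPOS actually processes when `selected` channels are enabled."""
    slots = {ch % channels_per_frame for ch in selected}
    return sorted(slot + frame * channels_per_frame
                  for slot in slots for frame in range(frames))


def wasted_mips(selected, frames=4):
    """Mips spent on channels that were enabled only as a side effect."""
    extra = len(enabled_channels(selected, frames)) - len(set(selected))
    return extra * PER_RESOURCE_MIPS


if __name__ == "__main__":
    print(len(enabled_channels([0, 1, 2, 3])), wasted_mips([0, 1, 2, 3]))
    print(enabled_channels([5, 37, 69, 101]), wasted_mips([5, 37, 69, 101]))
```

Choosing channels 0 to 3 enables 16 channels and wastes 12 x 0.72 = 8.64 Mips, which matches the "almost 9%" in the note if the DSP provides roughly 100 Mips (an inference, not a figure stated here). Choosing the same slot in each frame wastes nothing.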

CAS

The channel associated signaling (CAS) application (see note 1) is a block process with a variable process block size. PIKA MonteCarlo uses a default process block size of 6 ms.

Only one CAS process needs to run on a DSP, as one process can be used to process the signaling for all channels, provided there is enough DSP memory and DSP real-time available.

The signaling bits, ABCD bits, are presented to the DSP by the hardware framer (see note 2) on a pulse code modulation (PCM) channel. In the case of T1, the framer implements the robbed bit signaling (RBS), robbing the least significant bit on every 6th frame for each channel. In the case of E1, RBS is not used; timeslot 16 is used to carry the ABCD signaling bits. The DSP application does not need to know how the signaling is being implemented, E1 or T1, since the framer always presents the signaling bits to the DSP in the same manner.

The functionality of the CAS application is as follows:

  • Detect and report signaling transitions on the incoming ABCD signaling bits
  • Generate signaling transitions on the outgoing ABCD signaling bits

Transitions are detected and generated on a per bit basis. A template is used to describe a transition sequence on an individual bit. Up to 8 templates per signaling bit can be defined by the host. It is possible to activate a subset of the templates as a mask is used to enable each template. The templates contain the following type of information:

  • Enumeration indicating whether the first transition is to 0 or 1
  • The minimum or maximum number of transitions depending on whether we are detecting or generating the signaling
  • Timing information, measured in 125 us time increments

A signal generator operates independently on each signaling bit. To generate a desired signaling pattern it is possible to synchronize multiple signal generators.
A detector will report signaling transitions that match the template. Transitions that do not match a template are reported as isolated transitions.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA DSP and HMP architectures
  • Written primarily in C with some Motorola assembly for the DSP563xx
  • This DSP application uses 24-bit arithmetic mode

Features

  • Flexible architecture allows the generation and detection of the ABCD signaling for various T1 and E1 protocols
  • Up to 8 templates per signaling bit can be defined
  • Templates can be set and changed at run-time
  • Signaling bit states can be polled
  • An Output channel can be detached from the CAS process when the default VPOS563 pre-fill value is correct for the output channel. This allows for significant DSP real-time savings.
  • Error conditions are reported by the DSP
  • The host can abort current or flush pending signaling transitions, with acknowledgements from the DSP
  • Timing of all signaling events is reported
  • Timeout conditions following a signaling event are reported

Specifications

  • In order to limit potential real-time bursts, the CAS process is limited to processing 10 messages per activation period, i.e. process block size
  • The resolution of the specified time increments is 1 sample, or 125 us
  • The maximum time period that can be reported is 0x7FFFF, approximately 65.5 s
  • A single DSP can support up to 240 channels of signaling (240 channels = 8 E1s)
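
The timing figures in these specifications follow from the 125 us sample period, as the short sketch below shows. The helper names are hypothetical; only the constants come from the text above.

```python
SAMPLE_PERIOD_US = 125        # one time increment = 1 sample
MAX_REPORTED_TICKS = 0x7FFFF  # largest time period that can be reported


def ms_to_ticks(ms: float) -> int:
    """Convert milliseconds to 125 us time increments (8 per millisecond)."""
    return round(ms * 1000 / SAMPLE_PERIOD_US)


def ticks_to_seconds(ticks: int) -> float:
    """Convert 125 us time increments to seconds."""
    return ticks * SAMPLE_PERIOD_US / 1_000_000


if __name__ == "__main__":
    print(ms_to_ticks(6))                        # default 6 ms process block
    print(ticks_to_seconds(MAX_REPORTED_TICKS))  # approximately 65.5 s
```

A default 6 ms process block therefore spans 48 time increments, and the 0x7FFFF limit corresponds to the "approximately 65.5 s" quoted above.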

Resource Requirements

Memory requirements:

Application Memory (in DSP words): 3042
Process Memory: 423
Host Allocated Memory (see note 3): (34 x G) + (39 x D)

G = Number of generator channels, D = Number of detector channels. In general, G = D.

MIPS Requirements
Approximately 0.36 + (n × 0.221) Mips, n = number of channels
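
For capacity planning, the CAS memory and MIPS formulas above can be evaluated together. The sketch below is only arithmetic on the published figures; the function names are hypothetical, and the 240-channel check reuses the per-DSP limit given under Specifications.

```python
def cas_host_memory_words(generators: int, detectors: int) -> int:
    """Host allocated memory for the CAS process: (34 x G) + (39 x D)."""
    return 34 * generators + 39 * detectors


def cas_mips(channels: int) -> float:
    """Approximate CAS load: 0.36 + (n x 0.221) Mips for n channels."""
    if not 0 <= channels <= 240:
        raise ValueError("a single DSP supports up to 240 signaling channels")
    return 0.36 + channels * 0.221


if __name__ == "__main__":
    # One E1 carries 30 signaling channels; 8 E1s give the 240-channel maximum.
    for e1_spans in (1, 4, 8):
        n = 30 * e1_spans
        print(e1_spans, cas_host_memory_words(n, n), round(cas_mips(n), 2))
```

At the 240-channel maximum with G = D, this gives 17520 words of host allocated memory and about 53.4 Mips.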

Notes

Note 1: This application was formerly referred to as RBS; this name is misleading since RBS is a T1 protocol and the DSP application can be used to support CAS for both T1 and E1.

Note 2: The CAS application (formerly RBS) is designed to work with the Comet PM4351 E1/T1 Transceiver Framer. It should work with any framer that uses a dedicated H.100 channel for each signaling channel and places the ABCD bits in the 4 least significant bit positions of the byte.

Note 3: The values in the formula are the current sizes of the structures. Upon receiving the initialize message, the DSP application will respond with a message that specifies the size of this block of memory.

Voice Conferencing

The voice conferencing application is a stream process that runs every 0.125 ms (or 1 sample).

Voice conferencing involves adding several parties to a phone conversation. One process of the PIKA voice conferencing application can support a number of conferees and conferences simultaneously. The conference process is controlled by a parameter structure that is downloaded from the host.

Each conference can run in one of two modes. One is the sample switching mode that should be used for building large conferences. When running in this mode, the conferencing process broadcasts the loudest speaker to all other conferees in the conference, and the channel containing the loudest speaker receives the second loudest speaker. The other operating mode is the sample summation mode that is mainly used for mixing audio signals, building small conferences or simultaneously logging both sides of a duplex conversation. When running in the summation mode, the conferencing process adds all input signals to get the conference sum. The output for each conferee is then determined by subtracting the conferee input signal from the conference sum.
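
The two modes can be illustrated for a single sample instant. The sketch below follows the description above and nothing more: gains, saturation, and the robustness parameters are left out, and the per-conferee loudness values passed to the switching function are an assumption, since the way the process measures loudness is not described here. The function names are hypothetical.

```python
def summation_outputs(inputs):
    """Sample summation: each conferee hears the sum minus their own input."""
    total = sum(inputs)
    return [total - sample for sample in inputs]


def switching_outputs(inputs, levels):
    """Sample switching: everyone hears the loudest speaker, and the loudest
    speaker hears the second loudest."""
    if len(inputs) < 2 or len(inputs) != len(levels):
        raise ValueError("need at least two conferees with one level each")
    order = sorted(range(len(inputs)), key=lambda i: levels[i], reverse=True)
    loudest, second = order[0], order[1]
    return [inputs[second] if i == loudest else inputs[loudest]
            for i in range(len(inputs))]


if __name__ == "__main__":
    samples = [100, -50, 25]
    print(summation_outputs(samples))
    print(switching_outputs([10, 200, -30], levels=[0.1, 0.9, 0.4]))
```

Summation mixes every input into every output, so its cost and noise grow with conference size, whereas switching forwards only one source per output. This is consistent with the recommendation to use switching for large conferences.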

Conferences running on different DSPs can be interconnected to form a larger conference. In order to bridge two conferences, each conference needs to use one of its channels (conferees) to connect with the other conference.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • Each channel has programmable input and output gains
  • Any of the conferees can be a member of any conference
  • The application supports two operating modes: sample switching and sample summation. The operating mode can be selected on a per conference basis.
  • DSP real time is only used for active conferees and conferences
  • Larger conferences can be created by connecting conferences between DSPs
  • A gain robustness parameter controls the conference's sensitivity to high loop gain in sample switching mode
  • A long distance robustness parameter controls the conference's sensitivity to echoes with a significant delay in sample switching mode
  • Supports coaching mode
  • Supports talk only mode and listen only mode
  • DTMF discrimination and clamping (see note 1)

Specifications

  • The numbers of conferees and conferences are limited only by the amount of available real time, the number of available channels, and the amount of available memory. Therefore, the PIKA DSP resource calculator should be used to determine the maximum number of conference groups and members that can be used on a DSP.
  • The input gains can be individually set to any level between +6 dB and -40 dB in 0.1 dB steps. The output gains can be individually set to any level between +24 dB and -40 dB in 0.1 dB steps.
  • Maximum delay is 0.5 ms

  • Gain robustness is a parameter with values 0 to 4 (see note 2). A value of 0 is for minimum robustness and a value of 4 is for maximum robustness.

Gain Robustness | Typical Maximum Gain (dB)
0 | -3
1 | 0
2 | 3
3 | 6
4 | 9

  • Long distance robustness is a parameter with values 0 to 9. A value of 0 is for minimum robustness and a value of 9 is for maximum robustness.

Long Distance Robustness | Typical Maximum Distance (km) (see note 3)
0 | 400
1 | 800
2 | 1200
3 | 1600
4 | 2000
5 | 2400
6 | 2800
7 | 3200
8 | 3600
9 | 4000

Resource Requirements

Memory requirements:

Application Memory (in DSP words): 427
Process Memory (per process, in DSP words): 92 + (14 x Nc)
Host Allocated Memory (per process, in DSP words): 96 + (13 x Nc)

(Nc = Number of conferees)

MIPS Requirements

Operating Mode | MIPS
Voice Conference overhead (sample switching only) | 5.31
Voice Conference overhead (sample summation only) | 3.13
Voice Conference overhead (sample switching and sample summation) | 5.96
Voice Conference per conferee (sample switching) | 0.445
Voice Conference per conference (sample switching) | 0.556
Voice Conference per conferee (sample summation) | 0.264
Voice Conference per conference (sample summation) | 0.122
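
A rough load estimate can be assembled from the figures above. The sketch assumes the numbers combine additively: one overhead term per conferencing process (chosen by which modes are in use), plus a per-conference and a per-conferee term for each active conference. That additive model and the function name are assumptions; the PIKA DSP resource calculator remains the authoritative tool.

```python
OVERHEAD = {"switching": 5.31, "summation": 3.13, "both": 5.96}
PER_CONFEREE = {"switching": 0.445, "summation": 0.264}
PER_CONFERENCE = {"switching": 0.556, "summation": 0.122}


def conference_mips(conferences):
    """Estimate Mips for a list of (mode, conferee_count) active conferences."""
    modes = {mode for mode, _ in conferences}
    unknown = modes - set(PER_CONFEREE)
    if unknown:
        raise ValueError(f"unknown mode(s): {sorted(unknown)}")
    if not modes:
        return 0.0
    overhead = OVERHEAD["both"] if len(modes) == 2 else OVERHEAD[modes.pop()]
    load = sum(PER_CONFERENCE[mode] + count * PER_CONFEREE[mode]
               for mode, count in conferences)
    return overhead + load


if __name__ == "__main__":
    print(round(conference_mips([("summation", 3)]), 3))
    print(round(conference_mips([("switching", 10), ("summation", 2)]), 3))
```

Under this model, a single three-party summation conference costs about 4.04 Mips, most of which is fixed overhead.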

Notes

Note 1: DTMF discrimination and clamping require assistance from the host.
Note 2: The gain robustness can be set as high as 9. However, increasing it from 4 to 9 yields no significant improvement in robustness.
Note 3: The typical maximum distances apply when echo cancellers are not provided in the circuit.

Dial Pulse Detection

The dial pulse detection feature allows applications to detect audible clicks when a number is dialed from a rotary or pulse phone. The clicks are then used as if they were dual-tone multi-frequency (DTMF) tones.

PIKA Technologies’ dial pulse detection solution consists of two modules: a click detector and a digit detector. The digit detector is integrated into MonteCarlo and runs on the host. The click detector runs on the DSP.

The pulse detection DSP application, i.e. the click detector, is a block process with a fixed block size of 192 samples, or 24 ms.

The click detector detects and parameterizes incoming clicks. The digit detector analyzes the incoming clicks and reports detected digits to the application. The digit detector is trained to a global database of pulse digits.

The digit detector uses parameters found in a “score” file to make decisions on the presence of dial-pulse digits. If different score files are available, they can be selected in PikaSetup.

Platform Support

  • MonteCarlo 6.2, or later
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx, using the 16-bit arithmetic mode

Features

  • Digit detection is parameterized by training to a database of dial-pulse digits. The parameters are stored in a score file
  • Flexible DSP memory usage

Specifications

  • Specifications are determined by the score file
  • Dial pulse detector is suitable for detecting dial-pulse digits with a nominal rate of 10 pulses per second

Resource Requirements

Application Memory (in DSP words): 2221
Process Memory (per process, in DSP words): 234
Host Allocated Memory (per process, in DSP words): 0

MIPS Requirements

Operating Mode | MIPS
Detector enabled | 1.0

DTMF

The dual-tone multi-frequency (DTMF) application is a block process with a variable process block size; however, the process block size must be a multiple of 80 or 96 samples, 10 or 12 ms. PIKA MonteCarlo uses a default process block size of 192 samples, or 24 ms.

The DTMF receiver detects DTMF signals and reports the detected signals to the host and/or another DSP process, e.g. a Real-time Transport Protocol (RTP) process in a VoIP application.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx
  • This DSP application uses 16-bit arithmetic mode

Features

  • Detect and report the following DTMF digits:

DTMF Digit | Reported Code | Row Frequency (Hz) | Column Frequency (Hz)
1 | 0x01 | 697 | 1209
2 | 0x02 | 697 | 1336
3 | 0x03 | 697 | 1477
4 | 0x04 | 770 | 1209
5 | 0x05 | 770 | 1336
6 | 0x06 | 770 | 1477
7 | 0x07 | 852 | 1209
8 | 0x08 | 852 | 1336
9 | 0x09 | 852 | 1477
0 | 0x0A | 941 | 1336
* | 0x0B | 941 | 1209
# | 0x0C | 941 | 1477
A | 0x0D | 697 | 1633
B | 0x0E | 770 | 1633
C | 0x0F | 852 | 1633
D | 0x00 | 941 | 1633

  • Detection events, which specify the tone number and tone duration, are reported at the end of a DTMF signal
  • Can optionally enable the reporting of the start of a detected signal, including the elapsed time since the detection of the last DTMF signal
  • The maximum power of the detected signal is also reported. In some applications this feature can be used to determine the source of the DTMF signal.
  • Can be used to support RFC2833 for VoIP applications. Events are reported to another DSP process instead of the host. For VoIP applications the process block size of the DTMF detector should be the same as the codec.
  • Several parameters are available to be modified, giving a user the ability to tune the DTMF detector to their application

Specifications

Specifications for generating DTMF signals can be found in Reference Document [1].

Parameter | Operation | Non-Operation
Frequency Tolerance | <= 1.5% | > 3.5%
Signal Duration | >= 36 ms | < 23 ms
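
The operation and non-operation limits above can be read as a three-way classification: signals that must be detected, signals that must be rejected, and a band in between where behaviour is not specified. The sketch below expresses that reading using the standard DTMF keypad frequencies. It is not the DSP detector, which works on signal energy rather than on measured frequencies; the function names and the "reject" and "unspecified" return values are assumptions made for the example.

```python
ROWS = (697, 770, 852, 941)        # row (low group) frequencies, Hz
COLS = (1209, 1336, 1477, 1633)    # column (high group) frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def nearest(freq_hz, nominals):
    """Return (index, relative deviation) of the closest nominal frequency."""
    idx = min(range(len(nominals)), key=lambda i: abs(freq_hz - nominals[i]))
    return idx, abs(freq_hz - nominals[idx]) / nominals[idx]


def classify(row_hz, col_hz, duration_ms):
    """Return the digit if detection is required, else 'reject'/'unspecified'."""
    row, row_dev = nearest(row_hz, ROWS)
    col, col_dev = nearest(col_hz, COLS)
    worst = max(row_dev, col_dev)
    if worst > 0.035 or duration_ms < 23:
        return "reject"            # non-operation region
    if worst <= 0.015 and duration_ms >= 36:
        return KEYPAD[row][col]    # operation region
    return "unspecified"           # between the two limits


if __name__ == "__main__":
    print(classify(697, 1209, 40))
    print(classify(697 * 1.05, 1209, 40))
    print(classify(697, 1209, 30))
```

The gap between the two limits gives an implementation room for measurement error; tuning the parameters described below effectively moves the detector's behaviour within that gap.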

In general the following parameters do not need to be modified. Changing a value will generally improve the performance in one area at the expense of the performance in a different but related area.

# | Parameter | Default Value (converted DSP value) | Minimum | Maximum
0 | Detector debounce period (ms) (steps of 12 ms) | 40 (2) | 28, 40, 52, 64, 76 …
1 | Silence debounce period (ms) (steps of 12 ms) | 40 (4) | 4, 16, 28, 40, 52, 64 …
2 | Row frequency level detect threshold (dBm0) | -43 (0x0028c31d) | -55 | -16
3 | Column frequency level detect threshold (dBm0) | -43 (0x0028c31d) | -55 | -16
4 | Total energy level detect threshold (dBm0) | -36 (0x00cc4b73) | -55 | -16
5 | Total signal to noise (see note 1) threshold (10,000=1.0) | 9400 (0x1275) | 0 | 9999
6 | Row signal to noise (see note 1) threshold (10,000=1.0) | 7588 (0x6120) | 0 | 9999
7 | Column signal to noise (see note 1) threshold (10,000=1.0) | 7588 (0x6120) | 0 | 9999
8 | Forward twist threshold (dB) | 12 (0x0814) | 0 | 30
9 | Reverse twist threshold (dB) | 8 (0x1449) | 0 | 30
10 | Row energy bridging ratio (10,000=1.0) | 9999 (32440) | 0 | 9999
11 | Column energy bridging ratio (10,000=1.0) | 9999 (32440) | 0 | 9999
12 | Ignore inter-digit guard time if tone changes (state: true (1) or false (0)) | True (1) | N/A | N/A

Embedded within MonteCarlo are equations that convert parameter values to the DSP values. If sending messages directly to the DSP application, e.g. using OpenVPOS, then it will be necessary to use the DSP values. An application exists for carrying out the conversion.

Resource Requirements

Memory requirements

Application Memory
(in DSP words)

Process Memory
(per process in DSP words)

Host Allocated Memory
(per process in DSP words)

2001

157

0

MIPS Requirements

Operating Mode | MIPS
Idle | 0.0073
Enabled | 0.71

Notes

Note 1: A normalized signal to noise ratio is expressed as the ratio of the desired signal energy to the total signal energy.

Reference Document
[1] ITU-T Recommendation Q.23 Technical Features of Push-Button Telephone Sets

Echo Cancellation

The echo canceller application is a block process with a variable block size. MonteCarlo uses a default process block size of 24ms. The block size can vary from 6ms to 30ms, in steps of 2ms.

It is designed to perform hybrid echo cancellation in conformance with G.168, with some exceptions (see note 1), supporting an echo path length of up to 128 ms. It is designed to be used primarily in conjunction with other DSP or host based applications, such as VoIP, pulse detection, speaker verification, audio conferencing, and audio conferencing with coaching. It can also be used as a network echo canceller, but is not optimized for this type of application as it will introduce a delay equal to at least one and a half times the process block size.
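
As a quick worked example of the block size rules above, the sketch below converts a block size in milliseconds to samples and computes the minimum added delay of one and a half blocks. The function names are hypothetical, and the 8 kHz sample rate is an assumption (it matches the 48 to 240 sample range listed in the specifications).

```python
SAMPLE_RATE_HZ = 8000  # assumption: 8 kHz telephony sampling


def block_size_samples(block_ms: int) -> int:
    """Validate a block size (6..30 ms in 2 ms steps) and return it in samples."""
    if block_ms < 6 or block_ms > 30 or block_ms % 2 != 0:
        raise ValueError("block size must be 6..30 ms in steps of 2 ms")
    return block_ms * SAMPLE_RATE_HZ // 1000


def min_added_delay_ms(block_ms: int) -> float:
    """Lower bound on the delay added when used as a network echo canceller."""
    block_size_samples(block_ms)  # reuse the range check
    return 1.5 * block_ms


print(block_size_samples(24), min_added_delay_ms(24))
```

For the MonteCarlo default of 24 ms this gives 192 samples per block and at least 36 ms of added delay.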

The echo canceller uses two input channels and may use one output channel. The two input channels are the signal channel that contains the signal with echo, and the reference channel (see Figure 1). The output channel is optional. The output can be written either to any pulse code modulation (PCM) channel, or to a specified location in DSP memory. This way the echo canceller output can be easily read by another DSP application and at the same time minimize channel usage.

Figure 1: Echo cancellation diagram

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA DSP and HMP architectures
  • Written in Motorola assembly for the DSP563xx

Features

  • ITU-T G.168 compliant (see note 1)
  • Fast convergence
  • Robust double-talk detection
  • Low divergence during double-talk
  • Can be used to detect changes in the echo path
  • Support for VoIP applications
  • Cut-through support for speech recognition and IVR applications
  • Pause/Resume feature
    – This allows the echo canceller to save its state and stop the echo cancellation process. The input signal is written to the output unchanged. If the echo canceller needs to be disabled and re-enabled during a call, then the pause/resume feature avoids having to re-train the echo canceller in the middle of the call.
  • Echo suppressor
    – The echo suppressor removes the residual echo
  • Comfort noise generator
    – When the echo suppressor is active, comfort noise is generated that matches the noise received from the signal channel, so as to prevent the line from sounding dead
  • Flexible output interface
    – Cancelled output signal can be placed on a PCM channel or at a specified DSP memory location, so that it can be used as input by another DSP application
  • Flexible architecture
    – A number of parameters are visible to the user and, if necessary, can be altered to adjust the echo canceller’s sensitivity to specific conditions:
      o Double-talk threshold: The ratio of the signal level to the reference signal level. If the signal is below this threshold, it is considered to be echo and the prediction filter coefficients may be updated. If it is above this threshold, it is considered to be double-talk and coefficient updating is disabled. The default value is –6.0 dB.
      o Speech-present threshold: When the energy of the reference signal is less than the speech-present threshold, the reference source is considered to be silent and the adaptive filter coefficients are not updated. The default level is -40 dBm0.
      o Echo-suppression threshold: Echo suppression is enabled when the difference between the reference and the cancelled signal level is less than this threshold.
      o Adaptation rate
      o Adaptation stage duration
      o Echo-path change index threshold
      o Echo-path change detection duration
      o Signal channel speech present threshold
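
The way the double-talk and speech-present thresholds gate coefficient updates can be sketched as a small decision function. This is a simplified illustration of the rules described above, not the DSP implementation: the names are hypothetical, and levels are assumed to be block levels already expressed in dBm0.

```python
from enum import Enum


class Action(Enum):
    ADAPT = "adapt"                      # treat the signal as echo, update the filter
    FREEZE_REFERENCE_SILENT = "silent"   # reference below speech-present threshold
    FREEZE_DOUBLE_TALK = "double_talk"   # near-end speech detected


def adaptation_decision(signal_dbm0: float, reference_dbm0: float,
                        double_talk_threshold_db: float = -6.0,
                        speech_present_dbm0: float = -40.0) -> Action:
    """Decide whether the adaptive filter coefficients may be updated."""
    if reference_dbm0 < speech_present_dbm0:
        return Action.FREEZE_REFERENCE_SILENT
    ratio_db = signal_dbm0 - reference_dbm0  # signal level relative to reference
    if ratio_db < double_talk_threshold_db:
        return Action.ADAPT
    return Action.FREEZE_DOUBLE_TALK


print(adaptation_decision(-30.0, -10.0))
print(adaptation_decision(-12.0, -10.0))
```

Raising the double-talk threshold toward 0 dB lets the filter adapt on louder returns at the cost of a greater risk of adapting during double-talk.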

Specifications

Parameter | Minimum | Maximum | Default Value
Compensated echo path length (ms) | 1 | 128 |
Block Size (samples) | 48 | 240 |
Double-talk threshold (dB) | | | -6
Speech-present threshold (dBm0) | -65.9 | 3.2 | -40.0
Echo-suppression threshold (dB) | | | -18.0
Adaptation rate | 0 | 524288 | Stage 1 = 524288; Stage 2 = 340736; Stage 3 = 183552
Adaptation stage duration (ms) | 0 | 4096 (sum of both stages) | Stage 0 = 750; Stage 1 = 2500
Echo-path change index threshold | 0 | 8388607 | 104858
Echo-path change detection duration (ms) | | | 5000
Signal channel speech present threshold (dB) | -65.9 | -17.8 | -40.0

Resource Requirements

Memory requirements

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
1624 | 113 | 2M – 1 + Q

Where:
M = 8..1024 (desired tail length in samples)
Q = M (only if echo path detection is enabled, otherwise equal to 0)
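
As a worked example of the 2M – 1 + Q formula, the sketch below computes the host allocated memory for a tail length given in milliseconds. The function name is hypothetical and the 8 kHz sample rate used to convert milliseconds to samples is an assumption.

```python
def host_allocated_words(tail_ms: float, path_change_detection: bool,
                         sample_rate_hz: int = 8000) -> int:
    """Host allocated memory in DSP words: 2M - 1 + Q."""
    m = int(round(tail_ms * sample_rate_hz / 1000))  # tail length in samples
    if not 8 <= m <= 1024:
        raise ValueError("tail length must be 8..1024 samples")
    q = m if path_change_detection else 0  # Q = M only with echo path detection
    return 2 * m - 1 + q


print(host_allocated_words(128, False))
print(host_allocated_words(128, True))
```

A 128 ms tail (M = 1024) therefore needs 2047 words per process, or 3071 words when echo path detection is enabled.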

Sample MIPS Requirements

Operating Mode | MIPS (tail length = 6 ms) | MIPS (tail length = 12 ms) | MIPS (tail length = 32 ms) | MIPS (tail length = 128 ms)
Block size 10 ms | 3.1 | 4.4 | 8.8 | 29.8
Block size 24 ms | 2.7 | 3.9 | 8.0 | 27.7
Block size 30 ms | 2.6 | 3.8 | 7.9 | 27.4

Table 1: Real-time MIPS requirements for the echo canceller with updates activated.

Operating Mode | MIPS (tail length = 6 ms) | MIPS (tail length = 12 ms) | MIPS (tail length = 32 ms) | MIPS (tail length = 128 ms)
Block size 10 ms | 1.4 | 1.9 | 3.7 | 12.2
Block size 24 ms | 1.1 | 1.5 | 3.0 | 10.1
Block size 30 ms | 1.0 | 1.5 | 2.9 | 9.9

Table 2: Real-time MIPS requirements for the echo canceller with updates frozen.

Notes

Note 1: The echo disabler tone and maximum delay requirement of 1ms are not supported, as these features apply to network echo cancellers. The tests specified in G.168 for the use of the echo canceller with low speed V. series modems have not been performed. However, there is no reason to expect the echo canceller to fail in these tests.

FAX

PIKA Technologies provides 4 free fax ports per system in the MonteCarlo SDK.

PIKA’s fax bundle is based on Commetrex’s fax solution. It consists of three modules: a T.30 protocol engine, an image conversion library, and a fax modem application.

The T.30 protocol engine implements ITU-T T.30 Recommendation, which defines the procedures that are necessary for document transmission between two facsimile terminals in the public switched telephone network (PSTN). It is used to establish and manage communications between two fax modems. It can be described by five separate and consecutive phases: call establishment, pre-message procedure, message transmission, post-message procedure and call release.

The image conversion library is used to support real-time conversions required by T.30 and a limited set of off-line conversions.

The fax modem application provides ITU-T V.17, V.34 (HMP only), V.27ter, V.29, V.21 fax modems and the modem controller. V.21 is a robust low-speed modem and is used during the pre- and post-message procedures. The message, or image data, is transmitted using the higher speed modems: V.27ter, V.29, V.17 and V.34. The modem controller configures the modems and controls the modem operation. It processes modem events and data and forwards these results to the T.30 protocol engine. It also handles commands and data from the T.30 protocol engine.

Platform Support

  • PIKA DSP platform
  • PIKA HMP platform

Features

Modem Features

  • Adheres to all the mandatory features of ITU-T V.17, V.29, V.27ter, and V.21
  • High speed V.34 fax support (HMP platform only)
  • Optimized implementation
  • High noise immunity
  • High speed modems are capable of detecting the presence of a V.21 signal
  • Large receiver dynamic range

Fax Features

  • Supports all the mandatory features of T.30
  • Implements T.30 Annex A to support Error Correction Mode (ECM)
  • Automatic detection of incoming fax calls
  • Allows both sender and receiver to add a header and footer to each sent or received document
  • Supports subscriber ID
  • Supports third-party fax viewers or editors
  • Multi-channel capability
  • The Commetrex Conversion Library (CCL) supports all T.4 and T.6 formats including:
    o Metric and inch based page sizes
    o All resolutions including 100×100, 200×200, 300×300 and 400×400
    o MH, MR, and MMR coding
    o TIFF-F Format
  • Extensive compatibility testing against more than 130 different emulated fax devices with thousands of different configurations and settings, using QualityLogic’s FAXLAB

Interoperability

PIKA Technologies’ Fax offering interoperates with a large number of devices from different vendors.

PIKA – Fax Interoperability

Specifications

Modem | Speed (bps)
V.21 | 300
V.27ter | 2400
V.27ter | 4800
V.29 | 7200
V.29 | 9600
V.17 | 7200
V.17 | 9600
V.17 | 12000
V.17 | 14400
V.34 | 16800
V.34 | 19200
V.34 | 21600
V.34 | 24000
V.34 | 26400
V.34 | 28800
V.34 | 31200
V.34 | 33600

Receiver dynamic range: -10 ~ -50 dBm

GainPad

The GainPad application is a stream process that runs every 0.125 ms (or 1 sample). It is an efficient means of amplifying or attenuating a signal.

A GainPad process can be configured to handle any number of simultaneous channels, limited only by the real-time and memory limitations.

It can be used to establish a loss and level plan for a PC-PBX component of an application.
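
To make the gain setting concrete, the sketch below converts a gain in dB (quantized to the 0.1 dB increments and the -40.0 to 24.0 dB range given in the specifications) to a linear factor and applies it to a few samples. The function names are hypothetical, and the 16-bit saturation is an assumption for illustration, not a description of the DSP code.

```python
def gain_db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    steps = round(gain_db * 10)  # quantize to 0.1 dB increments
    if not -400 <= steps <= 240:
        raise ValueError("gain must be within -40.0..24.0 dB")
    return 10.0 ** (steps / 10.0 / 20.0)


def apply_gain(samples, gain_db: float):
    """Scale linear PCM samples, saturating to 16 bits (assumed sample width)."""
    factor = gain_db_to_linear(gain_db)
    scaled = (int(round(s * factor)) for s in samples)
    return [max(-32768, min(32767, v)) for v in scaled]


print(apply_gain([1000, -1000], 6.0))
print(apply_gain([1000, -1000], -6.0))
```

A loss plan would typically be a table of such gains, one per connection type, applied to each GainPad channel.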

In addition, since the process input and output channel numbers are specified separately, the GainPad application can also be used to perform switching within a single stream. For example, it can be used to perform switching on a PIKA InLine MM, where there is no hardware switch.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • Low delay
  • Amplifies or attenuates digital input signals
  • Supports switching
  • Low DSP real time usage

Specifications

Parameter | Minimum Value | Maximum Value
Signal gain (in 0.1 dB increments) | -40.0 | 24.0
Signal delay (ms) | 0.5


Resource Requirements

Memory requirements

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Additional memory required for each channel
196 | 63 | 9

MIPS Requirements

Operating Mode MIPS
Gain Pad Idle 0.212
Gain Pad real time for one active stream and channel 1.345
Real time for each additional active stream 0.320
Real time for each additional active channel 0.104

GFSK

The generic frequency shift keying (GFSK) DSP application is a block process with a fixed block size of 192 samples, or 24 ms.

FSK is a robust digital modulation technique used by modems in which two different frequencies are used to represent the binary states 0 and 1. FSK modems can be found in legacy data networks and are also used to support Caller ID.

The PIKA GFSK is a generic FSK modem where the characteristics of the modem can be defined through the DSP API. Both full and half-duplex modes of operation are supported.
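
The idea of defining a modem by its baud rate and its mark and space frequencies can be illustrated with a minimal continuous-phase FSK transmitter. This is a host-side sketch only: `fsk_modulate` is a hypothetical name, not part of the DSP API, the 8 kHz sample rate is an assumption, and the defaults are the V.23 Forward values from the examples listed for this application.

```python
import math


def fsk_modulate(bits, baud=1200, mark_hz=1300.0, space_hz=2100.0,
                 sample_rate_hz=8000, amplitude=1.0):
    """Return FSK samples: mark frequency for 1 bits, space frequency for 0 bits."""
    # Ceiling division, because the bit period need not be a whole number of samples.
    total = -(-len(bits) * sample_rate_hz // baud)
    phase = 0.0
    out = []
    for n in range(total):
        bit_index = min(n * baud // sample_rate_hz, len(bits) - 1)
        freq = mark_hz if bits[bit_index] else space_hz
        out.append(amplitude * math.sin(phase))
        # Accumulating phase keeps the waveform continuous across bit boundaries.
        phase = (phase + 2.0 * math.pi * freq / sample_rate_hz) % (2.0 * math.pi)
    return out


samples = fsk_modulate([1, 0, 1, 1, 0, 0, 1, 0])
print(len(samples))
```

Accumulating phase rather than restarting the sine at each bit avoids discontinuities, which would otherwise widen the transmitted spectrum.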

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • Flexible architecture. The following are examples of parameters that can be specified by a user:
    – Transmitter baud rate. Since FSK is a binary modulation scheme, the baud rate is equivalent to the data rate.
    – Mark frequency, used to represent a binary “1”
    – Space frequency, used to represent a binary “0”
    – Transmit level
    – Expected receiver baud rate along with a valid range
    – Energy threshold used to enable the receiver
    – Energy threshold used to disable the receiver
    – Parameters to define the receiver filters
  • Full and half duplex modes supported, i.e. one DSP process can perform as a transmitter, receiver or both
  • The DSP application does not process the raw data; therefore, any desired protocols can be implemented on the Host
  • Templates are available for common modems: ITU V.21, ITU V.23, BELL-103, BELL-202, North American Caller ID, Japanese Caller ID, European Caller ID
  • A MonteCarlo API, modem_design(), is available that can be used to generate all the required parameters for a user-defined FSK modem
  • A finite impulse response (FIR) band pass filter (BPF) is used in the receiver to suppress interference such as power line noise, rings and, in the case of full-duplex operation, the echo from the transmitted FSK signal.

Specifications

Parameter | Minimum Value | Maximum Value
Transmit level (dBm0) | -70 | 3.14
Receive enable threshold (dBm0) | -70 | 3.14
Receive disable threshold (dBm0) | -70 | 3.14
Baud Rate | 75 | 2400
Transmit Mark Frequency (Hz) | 350 | 3500
Transmit Space Frequency (Hz) | 350 | 3500
Receive Mark Frequency (Hz) | 350 | 3500
Receive Space Frequency (Hz) | 350 | 3500

Examples

Modem Type | Baud | Tx Space (Hz) | Tx Mark (Hz) | Rx Space (Hz) | Rx Mark (Hz)
V.21 Originate | 300 | 1180 | 980 | 1850 | 1650
V.21 Answer | 300 | 1850 | 1650 | 1180 | 980
V.23 Forward | 1200 | 2100 | 1300 | 2100 | 1300
V.23 Backward | 75 | 450 | 390 | 450 | 390
Bell 103 Originate | 300 | 1070 | 1270 | 2025 | 2225
Bell 103 Answer | 300 | 2025 | 2225 | 1070 | 1270
Bell 202 | 1200 | 2200 | 1200 | 2255 | 1200
North American Caller ID | 1200 | 2200 | 1200 | 2288 | 1200
Japanese Caller ID | 1200 | 2100 | 1200 | 2100 | 1300
European Caller ID | 1200 | 2100 | 1200 | 2100 | 1300

Resource Requirements

Memory requirements

 

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
2288 | 292 | See PK_FSK_GetTxBufferSize and PK_FSK_GetRXBufferSize

Sample MIPS Requirements

 

MIPS for various modems

Operating Mode | V.21 | V.23 Forward | V.23 Backward | Bell 103 | Bell 202 | Caller ID
Transmit | 0.72 | 0.88 | 0.67 | 0.72 | 0.88 | 0.88
Receive | 1.24 | 1.44 | 1.18 | 1.24 | 1.44 | 1.44

MFR2

The multi-frequency R2 (MFR2) application is a block process with a variable block size of multiples of 12 ms (96 samples). PIKA MonteCarlo uses a default block size of 12ms.

MFR2 is a scheme of inter-register signaling (passing phone numbers). It is typically used between exchanges (registers) and is described in CCITT Q.441 and Q.442. Inter-register signaling uses forward and backward in-band multi-frequency signals to transfer called and calling party numbers, as well as the calling party category.

The MFR2 transceiver consists of an R2 signal detector and an R2 tone generator, to detect and transmit both forward and backward tone signals. An MFR2 process can act as part of an outgoing or incoming register.

This application supports part of the R2 inter-register signaling, compelled type, protocol described in the ITU standards Q.440 to Q.490, and it is compatible with all the different variants of MFR2 international implementations. Low-level signaling procedures, required to achieve a high signaling throughput are also provided on the DSP. The MonteCarlo driver and host application support the majority of the signaling procedures.

The DSP application has two modes of operation to support low-level signaling procedures:

  • Manual mode: Tone generation and detection are under the control of the host. The only action taken by the DSP application when a tone is received is to report the detection of the tone to the host.
  • Automatic mode: This mode is used to detect or generate a sequence of tones without host intervention. It therefore reduces the number of messages exchanged between the DSP and the host, and ensures that maximum signaling throughput is achieved as long as the host responds to messages within about 100 ms.

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • Flexible architecture. Should the need arise, the following are examples of parameters that can be modified by the user:
    – Tone present debounce period
    – Tone absent debounce period
    – Normalized detector noise threshold
    – Twist threshold
    – Signal total energy threshold
    – Per tone signal energy threshold
    – Tone output level
  • Operates as part of both incoming and outgoing register
  • Efficient and robust receiver design

Specifications

Parameter | Minimum Value | Maximum Value | Default
Transmit level (dBm0) | -30 | -3.0 | -8.5
Tone present debounce period (ms) | 7 (reject), 30 (accept) | 7 + 12*N (reject), 30 + 12*N (accept) | 7, 30
Tone absent debounce period (ms) | 7 (reject), 30 (accept) | 7 + 12*N (reject), 30 + 12*N (accept) | 7, 30
Normalized detector noise threshold | 0.5 | 1.0 | 0.92
Twist threshold (dB) | 4.0 | 16.0 | 9.0
Signal total energy threshold | -43.0 | -18.0 | -33.0
Per tone signal energy threshold | -48.5 | -23.5 | -38.5

Where:
N = 1..32766
M = 1..32765

Resource Requirements

Memory requirements

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
3664 | 182 | 32 (see note 1)

Sample MIPS Requirements

Operating Mode MIPS
MFR2 Transceiver idle 0.001
MFR2 tone generator active and tone detector idle 0.264
MFR2 tone detector active and tone generator idle 0.952
MFR2 tone generator and tone detector active 1.064

Note
Note 1: PIKA MonteCarlo uses a default value of 32 for the generated tone buffer.

Speech Detector

The Speech Detector application is a block process with a fixed block size of 24 ms (192 samples).

It uses the changing energy levels and absolute level of the incoming signal to detect the presence of speech. The detection state is sent to the host. A built-in tone detector is used to avoid the detection of tones as speech.

The speech detector is commonly used in applications where speech needs to be distinguished, such as:

  • Audio logging:
    – Detect the start and end of a call
    – Turn recording on and off during speech and silence respectively
  • Call center applications:
    – Detect the start of a call and switch from auto-dialer to an agent
    – Detect the presence of an answering machine

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx

Features

  • Flexible architecture. A number of parameters are visible to the user and if necessary, they can be altered to adjust the sensitivity of the detector. The following is a list of tradeoffs that can be adjusted:
    – Noise sensitivity and speech detection difficulty level
    – Fast tone detection and false detection due to speech.
  • Built-in tone detection to avoid false detection of tone signals

Specifications

Parameter | Min. Value | Max. Value | Default
Speech detection threshold (dBm0) | -60 | 0 | -53
Timeout duration (ms) | 1 | 32767 | 0 (disabled)

Resource Requirements

Memory requirements

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
930 | 92 | 0

Sample MIPS Requirements

Operating Mode MIPS
Detector Idle 0.001
Detector enabled 0.1

Tone Generator

The tone generator is a block process with a fixed process block size of 192 samples, or 24 ms.

A tone generator process can be used to generate any single or dual-frequency tone. Tones are generated using a 500 sample look-up-table. Upon initialization, a set of tone templates is downloaded to the DSP. To play a tone, the index to the appropriate template is sent to the DSP.
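
Table-driven tone generation of this kind can be sketched with a phase accumulator that steps through a 500-entry sine table. This illustrates the general technique only: the index truncation (no interpolation), the floating-point arithmetic, the 8 kHz sample rate and all names are assumptions rather than details of the DSP implementation.

```python
import math

TABLE_SIZE = 500  # look-up table length given in the text
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def generate_block(freqs_hz, amplitudes, phases, block_size=192,
                   sample_rate_hz=8000):
    """Generate one block of a single or dual tone.

    `phases` holds one table position per frequency and is updated in place,
    so consecutive blocks join without a phase jump.
    """
    out = []
    for _ in range(block_size):
        sample = 0.0
        for i, (freq, amp) in enumerate(zip(freqs_hz, amplitudes)):
            sample += amp * SINE_TABLE[int(phases[i]) % TABLE_SIZE]
            # Advance by the number of table entries covered in one sample period.
            phases[i] = (phases[i] + freq * TABLE_SIZE / sample_rate_hz) % TABLE_SIZE
        out.append(sample)
    return out


phases = [0.0, 0.0]
block = generate_block([697.0, 1209.0], [0.5, 0.5], phases)  # DTMF digit "1"
print(len(block))
```

With a 500-entry table at 8 kHz, each table step corresponds to 16 Hz, so frequencies that are not multiples of 16 Hz rely on the fractional part of the accumulator.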

The tone description table can hold up to 64 tone templates. Each tone template describes a single or a dual frequency tone, including the frequency and amplitude of each tone, duration of the tone(s), and the minimum duration of the ensuing silence.

The first 60 tone templates are common for all instances of the tone generator application. These would normally be loaded at initialization. This implies that any change made to one of the first 60 templates will affect all tone generator processes on a given DSP. The remaining 4 tone templates are not common and must be set for each individual process, giving a user 4 custom tone templates per tone generator process. The custom templates can be changed during run-time so that an application can load the template before playing a custom tone. Since templates can be dynamically defined during run-time, there is no effective limit to the number of custom tone templates.
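
The split between 60 common templates and 4 per-process custom templates can be modelled as below. This is a hypothetical host-side model for illustration: the class and field names, and the convention of a zero second frequency for single tones, are assumptions and do not reflect the actual DSP data layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional

COMMON_SLOTS = 60   # shared by every tone generator process on a DSP
TOTAL_SLOTS = 64    # the remaining 4 slots are private to each process


@dataclass(frozen=True)
class ToneTemplate:
    freq1_hz: int
    amp1: int
    freq2_hz: int     # assumed convention: 0 for a single-frequency tone
    amp2: int
    tone_ms: int
    silence_ms: int   # minimum silence after the tone


class ToneTable:
    common: Dict[int, ToneTemplate] = {}  # class-level, so shared by all instances

    def __init__(self) -> None:
        self.custom: Dict[int, ToneTemplate] = {}  # per-process templates

    def set_template(self, index: int, template: ToneTemplate) -> None:
        if 0 <= index < COMMON_SLOTS:
            ToneTable.common[index] = template  # affects all processes
        elif index < TOTAL_SLOTS:
            self.custom[index] = template       # affects this process only
        else:
            raise IndexError("template index must be 0..63")

    def get(self, index: int) -> Optional[ToneTemplate]:
        table = ToneTable.common if index < COMMON_SLOTS else self.custom
        return table.get(index)


proc_a, proc_b = ToneTable(), ToneTable()
proc_a.set_template(0, ToneTemplate(350, 8000, 440, 8000, 1000, 0))
print(proc_b.get(0) is not None)
```

Because a custom slot can be rewritten just before it is played, four slots are enough to play an unbounded set of application-defined tones.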

Platform Support

  • PIKA MonteCarlo 6.x
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in Motorola assembly for the DSP563xx
  • This DSP application uses 16-bit arithmetic mode

Features

  • Can generate single or dual frequency tones
  • Up to 64 tone description templates are supported, 60 common and 4 custom templates
  • Up to 8 tones can be queued for playing
  • A tone can be aborted immediately or when the current tone has been played. When a tone is aborted the play list is also flushed.
  • A message is sent from the DSP to the host acknowledging that a tone has been played
  • A message is sent from the DSP to the host acknowledging that a tone has been aborted
  • Silence generation. A tone generator has a mode where it generates silence. In general this is not required on the 56303, since VPOS563 pre-fills the output buffer with silence; however, the mode exists and could be used to overwrite the output of another process with silence, provided the processes are scheduled in the right order.

Specifications

Parameter | Minimum | Maximum | Resolution
Tone(s) Duration (ms) | 0 | 32767 | 1
Silence Duration (ms) (see note 1) | 0 | 32767 | 1
Tone Frequency (Hz) (see note 2) | ~300 | ~3500 | 1
Tone Amplitude (dBm0) | (Silence, no tone) | 3.155 | Linear value from 0 to 32767


Resource Requirements

Memory requirements:

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
1469 | 119 | 0

MIPS Requirements:

Operating Mode | MIPS
Idle | 0.005
Enabled | 0.25

Notes

Note 1: The silence duration refers to the minimum amount of silence that will be played between tones when several tones are on the play list. The silence will be longer when the play list is empty.

Note 2: The theoretical range is 0 to 4000 Hz, however the line interface will generally limit the effective range.

G.711

The G.711 application is a block process with a fixed block size of 80 samples, or 10ms.

G.711 is the international standard for encoding telephone audio on a 64 kbps channel. It is a pulse code modulation (PCM) scheme operating at an 8 kHz sample rate. The standard provides two encoding laws: A-law and µ-law. The A-law encoder converts 13-bit linear PCM samples into 8-bit compressed PCM samples, and the decoder performs the reverse conversion. The µ-law encoder converts 14-bit linear PCM samples into 8-bit compressed PCM samples. µ-law is used in North America and Japan, while A-law is used in most other countries.
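
For readers unfamiliar with companding, the sketch below shows µ-law encoding and decoding in the widely used 16-bit formulation (bias 132, clip 32635). It is a generic textbook version, not PIKA's implementation, and it takes 16-bit input, of which only the most significant bits influence the code word.

```python
BIAS = 0x84    # 132, added before the segment search
CLIP = 32635   # largest magnitude that can be encoded after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit linear PCM sample to an 8-bit mu-law code word."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number = position of the highest set bit, from bit 14 down to bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # code words are sent inverted


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code word back to a 16-bit linear PCM sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


print(hex(linear_to_ulaw(0)), ulaw_to_linear(linear_to_ulaw(1000)))
```

The logarithmic segments give small signals fine quantization steps and large signals coarse ones, which is how 8 bits can cover the dynamic range of 14-bit linear audio.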

The G.711 application consists of an encoder and a decoder. The encoder reads its input from a PCM channel, and the decoder could write to either a PCM channel or a reference channel of an echo canceller.

The G.711 process can interact with PIKA Technologies’ real-time protocol (RTP) application. It can recover missing frames by using a proprietary error concealment technique. It also can skip and insert frames for jitter buffer management. G.711 is one of the codecs included in the PIKA VoIP suite.


Platform Support

  • PIKA MonteCarlo 6.2
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written primarily in C with some Motorola assembly for the DSP563xx

Features

  • Fully compliant with ITU-T G.711 recommendation
  • Optimized implementation
  • Configurable encoder input gain and decoder output gain
  • Supported by the PIKA VoIP RTP application
  • Recovers lost or late frames using a proprietary error concealment technique
  • Supports skipping and insertion of frames
  • Zeroing of encoder input data to support out-of-band DTMF (RFC2833)

Specifications

  • 8kHz sampling rate
  • Coding rate: 64 kbps
  • Encoder input: 13-bit linear PCM for A-law, 14-bit linear PCM for µ-law (see note 1)
  • Encoder output: A-law or µ-law encoded samples
  • Decoder input: A-law or µ-law encoded samples
  • Decoder output: 13-bit linear PCM for A-law, 14-bit linear PCM for µ-law
  • Range of input/output gain: – ~ +24dB
  • Mean Opinion Score (MOS): 4.1

Resource Requirements

Memory requirements:

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
1598 | 323 | 0 (see note 2)

MIPS Requirements

Operating Mode | MIPS
Idle | 0.01
Encoder | 0.35
Decoder | 1.04
Encoder & Decoder | 1.39

Notes

Note 1: Only the 13 or 14 most significant bits are used for 16-bit linear PCM.
Note 2: The host allocated memory is handled by other applications, such as the RTP.

G.726

The G.726 application is a block process with a fixed block size of 80 samples, or 10ms.

ITU-T G.726 Recommendation specifies speech compression and decompression at rates of 16, 24, 32 and 40 kbps based on Adaptive Differential Pulse Code Modulation (ADPCM). Annex A of the G.726 Recommendation extends the algorithm to allow use of a linear PCM interface at input and output.

The G.726 application complies with G.726 Annex A. It consists of an encoder and a decoder. The encoder takes its linear PCM input samples from a PCM channel. The encoded samples are packed into the encoder output buffer. The coding rate can be controlled by the host. The decoder reads the encoded samples from the data buffer allocated by the host, decodes them and writes the decoded samples to the output PCM channel. The host or another process can write to the data buffer to provide encoded samples for the decoder.
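
As a worked example of how the coding rate affects the packed encoder output, the sketch below computes the bits per code word and the bytes produced for one block. It assumes the 8 kHz sample rate and the fixed 80-sample (10 ms) block described above; the function name is hypothetical.

```python
def g726_frame_bytes(rate_kbps: int, block_samples: int = 80,
                     sample_rate_hz: int = 8000) -> int:
    """Bytes of packed ADPCM code words produced for one block."""
    if rate_kbps not in (16, 24, 32, 40):
        raise ValueError("G.726 rates are 16, 24, 32 and 40 kbps")
    bits_per_sample = rate_kbps * 1000 // sample_rate_hz  # 2, 3, 4 or 5 bits
    return bits_per_sample * block_samples // 8


for rate in (16, 24, 32, 40):
    print(rate, g726_frame_bytes(rate))
```

At 32 kbps, for example, each 10 ms block yields 40 bytes, half of the 80 bytes that G.711 produces for the same block.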

The G.726 process is designed to interact with PIKA Technologies’ real-time protocol (RTP) application. It can recover missing frames by using a proprietary error concealment technique. It can also skip and insert frames for jitter buffer management. G.726 is one of the codecs included in the PIKA VoIP suite.

Platform Support

  • PIKA MonteCarlo 6.2
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written in C and Motorola assembly for the DSP563xx
  • This DSP application uses 16-bit arithmetic mode

Features

  • Fully bit exact with ITU-T G.726 Annex A
  • Optimized implementation
  • Configurable coding rate
  • Configurable encoder input gain and decoder output gain
  • Recovers lost or late frames using a proprietary error concealment technique
  • Supports skipping and insertion of frames
  • Zeroing of encoder input data to support out-of-band DTMF (RFC2833)

Specifications

  • 8kHz sampling rate
  • Coding rate: 40, 32, 24, or 16 kbps
  • Encoder input: 16-bit linear PCM (see note 1)
  • Encoder output: packed ADPCM code words
  • Decoder input: packed ADPCM code words
  • Decoder output: 16-bit linear PCM
  • Range of input/output gain: – ~ +24dB
  • Mean Opinion Score (MOS): 3.85 (@32 kbps)

Resource Requirements

Memory requirements:

Application Memory (in DSP words) | Process Memory (per process in DSP words) | Host Allocated Memory (per process in DSP words)
4535 | 454 | 0 (see note 2)

MIPS Requirements

Operating Mode | MIPS
Idle | 0.01
Encoder | 6.58
Decoder | 5.84
Encoder + Decoder | 12.2

Notes

Note 1: Annex A of G.726 defines a 14-bit linear PCM interface. Therefore, the encoder input is limited to 14 bits internally before encoding and the decoder output is shifted back to 16 bits before it is sent to the output PCM buffer.
Note 2: The host allocated memory is handled by other applications, such as the RTP.

RTP

The Real-Time Protocol (RTP) application is a block process with a variable block size. The process block size must match the block size of the codec. It is possible to change the block size after initialization.

It provides end-to-end transport functions suitable for applications transmitting interactive real-time data, such as audio. RTP is commonly used in Internet telephony applications.

RTP does not in itself guarantee real-time delivery of multimedia data. It is often encapsulated within the User Datagram Protocol (UDP). In general, UDP is an unreliable transport mechanism: packet delivery is not guaranteed, message duplication is possible, and sequencing is not preserved. The RTP header includes timestamps as well as sequence numbers. These allow the receiver to detect lost, duplicated, or out-of-sequence packets. In addition, this information can be used to compensate for packet delay jitter.

RTP header layout (bit positions 0 to 31 across each 32-bit word):

Word 0 | V | P | X | CCount | M | Payload type | Sequence number
Word 1 | Timestamp
Word 2 | Synchronization source (SSRC) identifier
Word 3 onward | Contributing source (CSRC) identifiers

Figure 1: RTP Header Format

Figure 1 shows the RTP header fields. The first twelve bytes are present in every RTP packet. Below is a brief description of each field and its use. For more details, please refer to [4].

Version (V): This field identifies the version of the RTP packet. This application supports version 2.

Padding (P): Indicates whether there are additional padding octets at the end of the packet that are not part of the payload.

Extension (X): Indicates whether the fixed header is followed by exactly one header extension.

CSRC Count (Ccount): Contains the number of CSRC identifiers that follow the fixed header.

Marker (M): The marker bit is defined by the profile. For audio applications that send either no packet or comfort-noise, the marker bit is used to identify the first packet of a talk spurt.

Payload type (PT): This field identifies the format of the RTP payload and determines its interpretation by the application. It can be changed to adapt to a variation in bandwidth.

Sequence Number: The receiver can use this to restore the packet sequence or to detect lost packets. The initial value of the sequence number is a random value. It increments by one for each RTP packet transmitted.

Timestamp: The timestamp is incremented monotonically and linearly in time. The initial value of the timestamp is a random value. The resolution of the clock depends on the format of the payload. It could be used to detect different delay jitter within a single stream and compensate for it.

SSRC: This identifier is chosen randomly, with the intent that no two sources within the same RTP session will have the same SSRC identifier. This identifies the originator of the frame.

CSRC: The CSRC list identifies the contributing sources for the payload contained in the RTP packet. The number of identifiers is given by Ccount.
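
The field descriptions above can be made concrete with a small pack and parse routine for the fixed header. The field widths used here (V: 2 bits, P: 1, X: 1, CSRC count: 4, M: 1, payload type: 7, sequence number: 16, timestamp: 32, SSRC: 32) follow the public RTP specification (RFC 3550). The names are hypothetical and the code is a host-side sketch, unrelated to the DSP implementation.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def pack_header(h: RtpHeader) -> bytes:
    """Build the 12-byte fixed header followed by any CSRC identifiers."""
    b0 = (h.version << 6) | (int(h.padding) << 5) | (int(h.extension) << 4) | len(h.csrc)
    b1 = (int(h.marker) << 7) | h.payload_type
    fixed = struct.pack("!BBHII", b0, b1, h.sequence, h.timestamp, h.ssrc)
    return fixed + b"".join(struct.pack("!I", c) for c in h.csrc)


def parse_header(data: bytes) -> RtpHeader:
    """Parse the fixed header and the CSRC list from the start of a packet."""
    if len(data) < 12:
        raise ValueError("an RTP packet has at least 12 header bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    count = b0 & 0x0F                      # number of CSRC identifiers that follow
    end = 12 + 4 * count
    if len(data) < end:
        raise ValueError("packet too short for its CSRC list")
    csrc = struct.unpack("!%dI" % count, data[12:end])
    return RtpHeader(b0 >> 6, bool(b0 & 0x20), bool(b0 & 0x10),
                     bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, csrc)


hdr = RtpHeader(2, False, False, True, 0, 4660, 160, 0x12345678, ())
print(parse_header(pack_header(hdr)) == hdr)
```

Network byte order ("!" in the format string) matters here, since all multi-byte RTP header fields are big-endian.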

The RTP process interacts with other processes such as a codec, DTMF receiver and Tone Generator. The currently supported codecs are G.711 (PCMA and PCMU), G.726 (all rates). The RTP process interacts with the DTMF receiver and Tone Generator to control out of band DTMF signaling. The out of band signaling can be accomplished using the RTP protocol described in RFC 2833 [2] or via a TCP/IP based protocol managed by the host via the RTP process.

The RTP application has two modes of operation: transmitting and receiving. The transmitter creates RTP packets and the receiver terminates RTP packets. These packets may be either audio codec packets or tone packets as described in RFC 2833 [2]. An RTP process can work either as a transmitter or receiver, not both. In addition, the RTP process fulfills the requirements of the Real-Time Control Protocol (RTCP) to monitor data delivery. The transmit process provides information required to send RTCP sender reports, and the receive process provides information required to generate receiver reports.

The RTP receiver supports two modes of jitter buffer (see note 1) management:

  • Fixed delay:
    The delay is specified by the host and can change during the call. This can be used in conjunction with the host implementing its own adaptive jitter buffering algorithm.
  • Adaptive:
    The latency introduced by the jitter buffer changes dynamically with the network conditions. This is accomplished by inserting or skipping a number of frames as needed. Overall, this mechanism helps keep packet loss at the desired level while keeping the delay due to the jitter buffer to a minimum.

In cases where packets are lost over the network or arrive late, the RTP receiver designates the corresponding frames in the jitter buffer as invalid and leaves it up to the voice decoder to recover them.
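The fixed delay mode and the handling of lost or late packets can be sketched as follows. This Python illustration assumes one codec frame per RTP packet so that the 16-bit sequence number indexes frames directly; `FixedJitterBuffer` and its methods are hypothetical names, and the real receiver runs on the DSP rather than in Python.

```python
from typing import Dict, Optional, Tuple


class FixedJitterBuffer:
    """Fixed-delay jitter buffer that flags missing frames as invalid (None)."""

    def __init__(self, initial_latency_frames: int) -> None:
        self.initial_latency = initial_latency_frames
        self._frames: Dict[int, bytes] = {}
        self._next_seq: Optional[int] = None   # next sequence number to play
        self._started = False

    def push(self, seq: int, frame: bytes) -> None:
        """Store a received frame unless its playout slot has already passed."""
        seq &= 0xFFFF
        if self._next_seq is None:
            self._next_seq = seq
        # Modulo-2^16 comparison: the upper half of the range means "behind".
        if ((seq - self._next_seq) & 0xFFFF) >= 0x8000:
            return   # arrived too late, drop it
        self._frames[seq] = frame

    def pop(self) -> Optional[Tuple[int, Optional[bytes]]]:
        """Return (seq, frame) for the next slot; frame is None if invalid.

        Returns None while the initial latency is still being filled.
        """
        if self._next_seq is None:
            return None
        if not self._started:
            if len(self._frames) < self.initial_latency:
                return None
            self._started = True
        seq = self._next_seq
        self._next_seq = (seq + 1) & 0xFFFF
        return seq, self._frames.pop(seq, None)
```

The decoder would call `pop` once per frame period; a `None` frame tells it to run its own loss recovery for that slot.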

The RTP application is designed in accordance with the specifications described in [2], [3], [4], and [5] with the following exceptions:

  • This application does not transmit or receive RTCP packets.
  • Out of band signaling can only be used to send DTMF tones. Most other tones such as call progress tones are not designed for machine detection, and can therefore be effectively sent in band. Unrecognized Named Tone Events can be passed to the host if desired.

Platform Support

  • PIKA MonteCarlo 6.2
  • VPOS, PIKA Technologies’ proprietary voice processing operating system, which supports the PIKA AllOnBoard architecture
  • Written primarily in C with some Motorola assembly for the DSP563xx
  • This DSP application uses a 16-bit arithmetic model

Features

  • Supports G.711 (PCMA and PCMU), G.729ab, G.726 (40, 32, 24 and 16 Kbps) and G.723.1
  • Supports 1ms frame resolution for G.711 and G.726 reception
  • Supports dynamic payload types
  • Supports out-of-band DTMF signaling as specified in RFC2833
  • Supports fixed delay and adaptive jitter buffer management
  • Creates data for RTCP reports
  • Disables DTMF detection while generating a digit on the receiver side in order to prevent false detection of echoed DTMF signals
  • Flexible architecture. A number of parameters are visible to the user and if necessary, they can be altered to adjust the performance of the application:
    – Jitter buffer size
    – Packetization rate: The number of encoder frames per RTP packet
    – Initial Latency: This parameter indicates the number of frames to be placed in the jitter buffer before starting the decoder
    – Enable/disable out-of-band DTMF signaling
    – Play tone delay: This is used to delay the generation of a DTMF tone when the RFC 2833 protocol is being used. Delaying the start of a tone ensures that tone generation does not end prematurely when an RFC 2833 RTP packet is late or lost.
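As a worked example of the packetization rate parameter, the payload size of one RTP packet follows from the codec bit rate, the frame duration and the number of frames per packet. The Python sketch below assumes a 10 ms frame by default; the `CODEC_BITRATES` keys and the function name `payload_bytes` are illustrative labels, not identifiers from the PIKA API.

```python
# Nominal bit rates in bits per second for the constant-rate codecs listed above.
CODEC_BITRATES = {
    "G.711": 64000,
    "G.726-40": 40000,
    "G.726-32": 32000,
    "G.726-24": 24000,
    "G.726-16": 16000,
}


def payload_bytes(codec: str, frames_per_packet: int, frame_ms: int = 10) -> int:
    """Payload size in bytes of one RTP packet (RTP header not included)."""
    bit_ms = CODEC_BITRATES[codec] * frame_ms * frames_per_packet
    # Dividing by 1000 gives bits, dividing by 8 gives bytes.
    if bit_ms % 8000:
        raise ValueError("packet does not contain a whole number of bytes")
    return bit_ms // 8000
```

For instance, two 10 ms G.711 frames give a 160-byte payload, while the same packetization with G.726 at 32 kbps gives 80 bytes.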

Specifications

Codec        Min. RTP packet size           Max. RTP packet size           Resolution (ms)
Supported    Transmit (ms)   Receive (ms)   Transmit (ms)   Receive (ms)   Transmit (ms)   Receive (ms)
G.711        10              0              200             200            10              1
G.726        10              0              200             200            10              1

Jitter buffer size = 4 sec max (limited by the 16-bit DSP word)

Resource Requirements

Memory requirements

Application Memory (in DSP words):                    4419
Process Memory (per process, in DSP words):           228
Host Allocated Memory (per process, in DSP words):    Transmitter = 41 + (12 + M*80)/2
                                                      Receiver = 806 + (5*N)

Where:
M = CodecFramesPerRTPPacket
N = The size of the Jitter Buffer in milliseconds.

Note: If G711 is not going to be supported, then refer to the Interface Specification documents of the codecs supported to minimize the host allocated memory.
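The host allocated memory formulas above can be evaluated directly. The following Python sketch simply restates them; the function names are hypothetical, M is the number of codec frames per RTP packet and N is the jitter buffer size in milliseconds, as defined above.

```python
def transmitter_host_words(frames_per_packet: int) -> int:
    """Host allocated memory for a transmit process, in DSP words.

    Implements 41 + (12 + M*80)/2; 12 + M*80 is always even,
    so integer division is exact.
    """
    return 41 + (12 + frames_per_packet * 80) // 2


def receiver_host_words(jitter_buffer_ms: int) -> int:
    """Host allocated memory for a receive process, in DSP words.

    Implements 806 + 5*N.
    """
    return 806 + 5 * jitter_buffer_ms
```

With 2 frames per packet the transmitter needs 127 words, and a 200 ms jitter buffer requires 1806 words for the receiver.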

Sample MIPS Requirements

MIPS (10 ms frame size, 2 frames per RTP packet; 30 ms and 1 frame for G723, respectively)

Operating Mode    G711    G726
Transmit          0.24    0.22
Receive           0.63    0.72
Idle

Notes

Note 1: The jitter buffer removes the jitter in packet arrival times caused by the IP network, at the cost of an increase in the overall delay. There is a trade-off between the delay introduced by the jitter buffer and packet loss: a large jitter buffer increases the delay and decreases packet loss, while a small jitter buffer decreases the delay but increases packet loss. The appropriate size of the jitter buffer depends on the condition of the network.

Reference Documents

[1] H.225.0, “Call Signaling Protocols and Media Stream Packetization for Packet-Based Multimedia Communication Systems”, ITU-T, Nov. 2000.

[2] RFC 2833 draft, “RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals”, IETF May 2000.

[3] RFC 1890 draft, “RTP Profile for Audio and Video Conferences with Minimal Control”, IETF: draft-ietf-avt-profile-new-10.txt, Mar 2001. See http://www.ietf.org/rfc.html for most recent versions of this document. A more recent version may also exist in draft form, see http://www.ietf.org/ids.by.wg/avt.html (draft-ietf-avt-profile-new-xx).

[4] RFC 1889 draft, “RTP A Transport Protocol for Real-Time Applications”, IETF: draft-ietf-avt-rtp-new-09.txt, Mar 2001. See http://www.ietf.org/rfc.html for most recent versions of this document. A more recent version may also exist in draft form, see http://www.ietf.org/ids.by.wg/avt.html (draft-ietf-avt-rtp-new-xx).

[5] RFC 2833 draft, “RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals”, IETF May 2000.
The PIKA Plus Advantage