# Decoders

## DecodeStream[[tokenizers.decoders.DecodeStream]]
#### tokenizers.decoders.DecodeStream[[tokenizers.decoders.DecodeStream]]

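Helper class for streaming decoding: token ids are fed in one step at a time, and a string chunk is returned as soon as one can be decoded.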

#### tokenizers.decoders.DecodeStream.step[[tokenizers.decoders.DecodeStream.step]]

Streaming decode step

**Parameters:**

tokenizer ([Tokenizer](/docs/tokenizers/v0.22.2/en/api/tokenizer#tokenizers.Tokenizer)) : The tokenizer to use for decoding

id (`int` or *List[int]*) : The next token id or list of token ids to add to the stream

**Returns:**

`Optional[str]`

The next decoded string chunk, or `None` if not enough tokens have been provided yet.
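The "chunk or `None`" contract can be illustrated with a minimal pure-Python sketch. It is not the library class: `ToyDecodeStream`, its byte-valued `vocab`, and the choice to pass the vocabulary to the constructor (rather than a tokenizer to each step) are assumptions made for illustration. It withholds output while the buffered bytes do not yet form valid UTF-8.

```python
from typing import Dict, List, Optional


class ToyDecodeStream:
    """Hypothetical stand-in for a streaming decoder, not the library class."""

    def __init__(self, vocab: Dict[int, bytes]) -> None:
        self.vocab = vocab
        self.pending = b""  # bytes received but not yet emitted

    def step(self, token_id: int) -> Optional[str]:
        """Add one token id; return new text, or None if more tokens are needed."""
        self.pending += self.vocab[token_id]
        try:
            chunk = self.pending.decode("utf-8")
        except UnicodeDecodeError:
            return None  # buffered bytes are not decodable yet: wait
        self.pending = b""
        return chunk


e_acute = "é".encode("utf-8")  # two bytes, split across two toy tokens
vocab = {0: b"Hi ", 1: e_acute[:1], 2: e_acute[1:]}

stream = ToyDecodeStream(vocab)
chunks: List[Optional[str]] = [stream.step(token_id) for token_id in [0, 1, 2]]
# chunks == ["Hi ", None, "é"]
```

The second step returns `None` because half a character cannot be shown yet; the third step completes it.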

## BPEDecoder[[tokenizers.decoders.BPEDecoder]]

#### tokenizers.decoders.BPEDecoder[[tokenizers.decoders.BPEDecoder]]

BPEDecoder Decoder

**Parameters:**

suffix (`str`, *optional*, defaults to `</w>`) : The suffix that was used to characterize an end-of-word. This suffix will be replaced by whitespace during decoding
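As a rough illustration of the suffix replacement described above, here is a minimal pure-Python sketch. The helper name `bpe_decode` is hypothetical, and dropping the suffix on the last token (instead of leaving a trailing space) is an assumption of this sketch.

```python
from typing import List


def bpe_decode(tokens: List[str], suffix: str = "</w>") -> str:
    """Hypothetical helper: turn end-of-word suffixes back into spaces."""
    last = len(tokens) - 1
    pieces = [
        # Assumption: the final suffix is dropped so no trailing space remains.
        token.replace(suffix, "" if index == last else " ")
        for index, token in enumerate(tokens)
    ]
    return "".join(pieces)


decoded = bpe_decode(["low", "er</w>", "new", "est</w>"])
# decoded == "lower newest"
```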

## ByteFallback[[tokenizers.decoders.ByteFallback]]

#### tokenizers.decoders.ByteFallback[[tokenizers.decoders.ByteFallback]]

ByteFallback Decoder

ByteFallback is a simple trick that converts tokens looking like `<0x61>`
to pure bytes and attempts to decode them into a string. If the bytes
cannot be decoded, you get � instead for each inconvertible byte token.
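The behavior described above can be sketched in pure Python. This is an illustration under stated assumptions, not the library's implementation: the helper name `byte_fallback_decode` is hypothetical, and byte tokens are assumed to have the exact form `<0xHH>` with two hexadecimal digits.

```python
import re
from typing import List

# Assumed token shape: "<0x" + two hex digits + ">".
BYTE_TOKEN = re.compile(r"<0x([0-9A-Fa-f]{2})>")


def byte_fallback_decode(tokens: List[str]) -> str:
    """Hypothetical helper: decode runs of byte tokens as UTF-8."""
    out: List[str] = []
    pending = bytearray()  # consecutive byte tokens waiting to be decoded

    def flush() -> None:
        if not pending:
            return
        try:
            out.append(pending.decode("utf-8"))
        except UnicodeDecodeError:
            # One replacement character per byte token that failed to decode.
            out.append("\ufffd" * len(pending))
        pending.clear()

    for token in tokens:
        match = BYTE_TOKEN.fullmatch(token)
        if match:
            pending.append(int(match.group(1), 16))
        else:
            flush()
            out.append(token)  # ordinary tokens pass through unchanged
    flush()
    return "".join(out)


greeting = byte_fallback_decode(["<0x48>", "<0x69>", "!"])
# greeting == "Hi!"
```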

## ByteLevel[[tokenizers.decoders.ByteLevel]]

#### tokenizers.decoders.ByteLevel[[tokenizers.decoders.ByteLevel]]

ByteLevel Decoder

This decoder is to be used in tandem with the [ByteLevel](/docs/tokenizers/v0.22.2/en/api/pre-tokenizers#tokenizers.pre_tokenizers.ByteLevel)
[PreTokenizer](/docs/tokenizers/v0.22.2/en/api/pre-tokenizers#tokenizers.pre_tokenizers.PreTokenizer).

## CTC[[tokenizers.decoders.CTC]]

#### tokenizers.decoders.CTC[[tokenizers.decoders.CTC]]

CTC Decoder

**Parameters:**

pad_token (`str`, *optional*, defaults to `<pad>`) : The pad token used by CTC to delimit a new token.

word_delimiter_token (`str`, *optional*, defaults to `|`) : The word delimiter token. It will be replaced by a space.

cleanup (`bool`, *optional*, defaults to `True`) : Whether to clean up some tokenization artifacts, mainly spaces before punctuation and some abbreviated English forms.
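To make the roles of `pad_token` and `word_delimiter_token` concrete, here is a minimal pure-Python sketch of CTC-style collapsing. The helper name `ctc_decode` is hypothetical, the `cleanup` step is left out, and the order of operations (merge repeats, drop pads, map delimiters) is the usual CTC convention rather than something stated above.

```python
from itertools import groupby
from typing import List


def ctc_decode(
    tokens: List[str],
    pad_token: str = "<pad>",
    word_delimiter_token: str = "|",
) -> str:
    """Hypothetical helper: collapse a CTC token sequence into text."""
    # 1. Merge runs of identical consecutive tokens into one.
    collapsed = [token for token, _ in groupby(tokens)]
    # 2. Drop pad tokens, which only separate genuine repeats.
    kept = [token for token in collapsed if token != pad_token]
    # 3. Map the word delimiter to a space.
    return "".join(" " if token == word_delimiter_token else token for token in kept)


frames = ["h", "e", "l", "l", "<pad>", "l", "o", "|", "|", "h", "i"]
text = ctc_decode(frames)
# text == "hello hi"
```

The `<pad>` between the two `l` runs is what keeps the double letter in "hello" from being merged away.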

## Fuse[[tokenizers.decoders.Fuse]]

#### tokenizers.decoders.Fuse[[tokenizers.decoders.Fuse]]

Fuse Decoder

Fuse simply fuses every token into a single string.
This is normally the last step of decoding; this decoder exists only for
cases where other decoders need to run *after* the fusion.
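One situation where a step has to run after fusion is a pattern that spans a token boundary. The pure-Python sketch below illustrates this; `fuse` and `replace_step` are hypothetical helpers that mimic token-list transformations and are not the library classes.

```python
from typing import List


def fuse(tokens: List[str]) -> List[str]:
    """Hypothetical helper: join all tokens into a single-element list."""
    return ["".join(tokens)]


def replace_step(tokens: List[str], pattern: str, content: str) -> List[str]:
    """Hypothetical per-token replacement step."""
    return [token.replace(pattern, content) for token in tokens]


tokens = ["new_", "_line"]

# Applied per token, the two-character pattern never matches.
without_fuse = "".join(replace_step(tokens, "__", " "))
# Applied after fusion, the pattern is visible across the old boundary.
with_fuse = "".join(replace_step(fuse(tokens), "__", " "))
# without_fuse == "new__line", with_fuse == "new line"
```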

## Metaspace[[tokenizers.decoders.Metaspace]]

#### tokenizers.decoders.Metaspace[[tokenizers.decoders.Metaspace]]

Metaspace Decoder

**Parameters:**

replacement (`str`, *optional*, defaults to `▁`) : The replacement character. Must be exactly one character. By default we use the *▁* (U+2581) meta symbol (Same as in SentencePiece). 

prepend_scheme (`str`, *optional*, defaults to `"always"`) : Whether to add a space to the first word if there isn't already one. This lets us treat *hello* exactly like *say hello*. Choices: `"always"`, `"never"`, `"first"`. `"first"` means the space is only added on the first token (relevant when special tokens or other pre-tokenizers are used).
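A minimal pure-Python sketch of how decoding might reverse these two settings follows. The helper name `metaspace_decode` is hypothetical, and the rule used here (drop one leading space on the first token unless `prepend_scheme` is `"never"`) is one plausible reading of the parameters above, not a statement of the library's exact logic.

```python
from typing import List


def metaspace_decode(
    tokens: List[str],
    replacement: str = "▁",
    prepend_scheme: str = "always",
) -> str:
    """Hypothetical helper: map the meta symbol back to spaces."""
    pieces = []
    for index, token in enumerate(tokens):
        piece = token.replace(replacement, " ")
        # Assumption: the space that was prepended to the first word is removed.
        if index == 0 and prepend_scheme != "never" and piece.startswith(" "):
            piece = piece[1:]
        pieces.append(piece)
    return "".join(pieces)


default_text = metaspace_decode(["▁Hello", "▁world"])
never_text = metaspace_decode(["▁Hello", "▁world"], prepend_scheme="never")
# default_text == "Hello world", never_text == " Hello world"
```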

## Replace[[tokenizers.decoders.Replace]]

#### tokenizers.decoders.Replace[[tokenizers.decoders.Replace]]

Replace Decoder

This decoder is to be used in tandem with the `tokenizers.pre_tokenizers.Replace`
[PreTokenizer](/docs/tokenizers/v0.22.2/en/api/pre-tokenizers#tokenizers.pre_tokenizers.PreTokenizer).

## Sequence[[tokenizers.decoders.Sequence]]

#### tokenizers.decoders.Sequence[[tokenizers.decoders.Sequence]]

Sequence Decoder

**Parameters:**

decoders (`List[Decoder]`) : The decoders that need to be chained
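Chaining can be pictured as passing the token list through each step in order and joining the result at the end. The sketch below is pure Python with hypothetical names (`Step`, `run_sequence`, and the three example steps); it illustrates the idea and is not the library's `Sequence` class.

```python
from typing import Callable, List

# A step maps a list of tokens to a new list of tokens (assumed shape).
Step = Callable[[List[str]], List[str]]


def run_sequence(steps: List[Step], tokens: List[str]) -> str:
    """Hypothetical helper: apply each step in order, then join."""
    for step in steps:
        tokens = step(tokens)
    return "".join(tokens)


def meta_to_space(tokens: List[str]) -> List[str]:
    return [token.replace("▁", " ") for token in tokens]


def fuse(tokens: List[str]) -> List[str]:
    return ["".join(tokens)]


def strip_outer_spaces(tokens: List[str]) -> List[str]:
    return [token.strip(" ") for token in tokens]


result = run_sequence([meta_to_space, fuse, strip_outer_spaces], ["▁Hello", "▁world"])
# result == "Hello world"
```

Order matters: running `strip_outer_spaces` before `fuse` would remove the space between the two words as well.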

## Strip[[tokenizers.decoders.Strip]]

#### tokenizers.decoders.Strip[[tokenizers.decoders.Strip]]

Strip Decoder

Strips n left characters of each token, or n right characters of each token.
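A minimal pure-Python sketch of per-token stripping is shown below. The helper `strip_token` and its `content`, `left`, and `right` arguments are assumptions for illustration; in particular, this sketch assumes only occurrences of one given character are removed, up to `left` at the start and up to `right` at the end.

```python
from typing import List


def strip_token(token: str, content: str = " ", left: int = 0, right: int = 0) -> str:
    """Hypothetical helper: remove up to `left`/`right` copies of `content`."""
    start = 0
    while start < min(left, len(token)) and token[start] == content:
        start += 1

    end = len(token)
    removed = 0
    while removed < right and end > start and token[end - 1] == content:
        end -= 1
        removed += 1

    return token[start:end]


def strip_tokens(tokens: List[str], content: str = " ", left: int = 0, right: int = 0) -> List[str]:
    """Apply the same stripping rule to every token independently."""
    return [strip_token(token, content, left, right) for token in tokens]


stripped = strip_tokens(["  hi ", "there "], left=1, right=1)
# stripped == [" hi", "there"]
```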

## WordPiece[[tokenizers.decoders.WordPiece]]

#### tokenizers.decoders.WordPiece[[tokenizers.decoders.WordPiece]]

WordPiece Decoder

**Parameters:**

prefix (`str`, *optional*, defaults to `##`) : The prefix to use for subwords that are not a beginning-of-word 

cleanup (`bool`, *optional*, defaults to `True`) : Whether to clean up some tokenization artifacts, mainly spaces before punctuation and some abbreviated English forms.
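The effect of `prefix` and `cleanup` can be sketched in pure Python. The helper name `wordpiece_decode` is hypothetical, and the cleanup rules listed in the code are a small illustrative subset chosen for this example, not the library's actual list.

```python
from typing import List

# Illustrative subset of cleanup rules (assumption, not the library's list).
CLEANUP_RULES = ((" .", "."), (" ,", ","), (" !", "!"), (" ?", "?"), (" n't", "n't"), (" 's", "'s"))


def wordpiece_decode(tokens: List[str], prefix: str = "##", cleanup: bool = True) -> str:
    """Hypothetical helper: glue continuation pieces, space-separate words."""
    pieces = []
    for index, token in enumerate(tokens):
        if index > 0 and token.startswith(prefix):
            pieces.append(token[len(prefix):])  # continuation: attach directly
        elif index > 0:
            pieces.append(" " + token)  # new word: separate with a space
        else:
            pieces.append(token)
    text = "".join(pieces)
    if cleanup:
        for before, after in CLEANUP_RULES:
            text = text.replace(before, after)
    return text


tokens = ["token", "##izer", "##s", "are", "fun", "."]
clean = wordpiece_decode(tokens)
raw = wordpiece_decode(tokens, cleanup=False)
# clean == "tokenizers are fun.", raw == "tokenizers are fun ."
```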

The Rust API Reference is available directly on the [Docs.rs](https://docs.rs/tokenizers/latest/tokenizers/) website.

The Node.js API has not been documented yet.

