Let’s start with a very simple example and build it up.

## Example-1: Symmetric uint8 quantization

Let’s say we wish to map the floating point range [0.0 .. 1000.0] to the quantized range [0 .. 255]. The range [0 .. 255] is the set of values that can fit in an unsigned 8-bit integer.

To perform this transformation, we want to rescale the floating point range so that the following is true:

Floating point 0.0 = Quantized 0

Floating point 1000.0 = Quantized 255

This is called symmetric quantization because the floating point 0.0 is quantized 0.

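Hence, we define a scale, which is equal to

$$\text{scale} = \frac{x_{max} - x_{min}}{q_{max} - q_{min}}$$

Where,

$x_{min}$ and $x_{max}$ are the bounds of the floating point range, and $q_{min}$ and $q_{max}$ are the bounds of the quantized range.

In this case, scale = (1000.0 - 0.0) / (255 - 0) ≈ 3.9216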

To convert from a floating point value to a quantized value, we can simply divide the floating point value by the scale and round the result to an integer. For example, the floating point value 500.0 corresponds to the quantized value 500.0 / 3.9216 ≈ 127.5, which rounds to 128 under round-half-up.
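The arithmetic for this example can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names `symmetric_scale` and `quantize_symmetric` are made up for this sketch, and rounding to the nearest integer followed by clamping is an assumption about how the division result is turned into an integer.

```python
def symmetric_scale(x_max: float, q_max: int) -> float:
    """Scale for mapping the float range [0, x_max] onto [0, q_max]."""
    return x_max / q_max


def quantize_symmetric(x: float, scale: float, q_max: int = 255) -> int:
    """Divide by the scale, round to the nearest integer, clamp to [0, q_max]."""
    q = round(x / scale)  # Python's round() sends exact .5 ties to the even neighbour
    return max(0, min(q_max, q))


scale = symmetric_scale(1000.0, 255)
print(round(scale, 4))                    # 3.9216
print(quantize_symmetric(0.0, scale))     # 0
print(quantize_symmetric(1000.0, scale))  # 255
# 500.0 lands on (or right next to) 127.5, so its result depends on the rounding rule.
print(quantize_symmetric(500.0, scale))
```

The clamp only matters for inputs outside [0.0 .. 1000.0], which would otherwise overflow the uint8 range.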

In this simple example, the 0.0 of the floating point range maps exactly to the 0 in the quantized range. This is called symmetric quantization. Let’s see what happens when this isn’t the case.

## Example-2: Affine uint8 quantization

Let’s say we wish to map the floating point range [-20.0 .. 1000.0] to the quantized range [0 .. 255].

In this case, we have a different scaling factor since our *xmin* is different: scale = (1000.0 - (-20.0)) / (255 - 0) = 1020.0 / 255 = 4.0

Let’s see what the floating point number 0.0 is represented by in the quantized range if we apply the scaling factor to 0.0: 0.0 / 4.0 = 0

Well, this doesn’t quite seem right since, according to the diagram above, we would have expected the floating point value -20.0 to map to the quantized value 0.

This is where the concept of zero-point comes in. **The zero-point acts as a bias for shifting the scaled floating point value and corresponds to the value in the quantized range that represents the floating point value 0.0.** In our case, the zero point is the negative of the scaled floating point representation of -20.0 (that is, -20.0 / 4.0 = -5), which is -(-5) = 5. The zero point is always the negative of the representation of the minimum floating point value since the minimum will always be negative or zero. We’ll find out more about why this is the case in the section that explains example 4.

Whenever we quantize a value, we’ll always add the zero-point to the scaled value to get the actual quantized value in the valid quantization range. If we wish to quantize the value -20.0, we compute it as the scaled value of -20.0 plus the zero-point, which is -5 + 5 = 0. Hence, quantized(-20.0, scale=4, zp=5) = 0.
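The scale-then-shift procedure described above can be sketched as follows. The helper names `affine_params`, `quantize`, and `dequantize` are hypothetical and chosen only for this illustration. Rounding to the nearest integer and clamping to [0 .. 255] are assumptions, since the text does not specify how out-of-range or fractional results are handled.

```python
def affine_params(x_min: float, x_max: float, q_max: int = 255):
    """Scale and zero-point for mapping [x_min, x_max] onto the uint8 range [0, q_max]."""
    scale = (x_max - x_min) / q_max
    zero_point = -round(x_min / scale)  # negative of the scaled minimum
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, q_max: int = 255) -> int:
    """Scale, round, shift by the zero-point, then clamp to [0, q_max]."""
    q = round(x / scale) + zero_point
    return max(0, min(q_max, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate inverse: remove the zero-point shift, then rescale."""
    return (q - zero_point) * scale


scale, zp = affine_params(-20.0, 1000.0)
print(scale, zp)                    # 4.0 5
print(quantize(-20.0, scale, zp))   # 0
print(quantize(0.0, scale, zp))     # 5
print(quantize(1000.0, scale, zp))  # 255
print(dequantize(zp, scale, zp))    # 0.0
```

Note that the quantized value 5 dequantizes to exactly 0.0, which is the property the zero-point is meant to provide.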

## Example-3: Affine int8 quantization

What happens if our quantized range is a signed 8-bit integer instead of an unsigned 8-bit integer? Well, the range is now [-128 .. 127].

In this case, -20.0 in the float range maps to -128 in the quantized range, and 1000.0 in the float range maps to 127 in the quantized range.

We calculate the zero point as if the quantized range were [0 .. 255] and then offset it by -128, so the zero point in the new range is 5 - 128 = -123.

Hence, the zero-point for the new range is -123.
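The offset rule can be checked with a short sketch that takes the lower bound of the quantized range as a parameter. The function names and the `q_min`/`q_max` parameters are assumptions made for this illustration. The formula `q_min - round(x_min / scale)` is simply the uint8 zero-point shifted by `q_min`.

```python
def zero_point(x_min: float, scale: float, q_min: int) -> int:
    """Zero-point for a quantized range starting at q_min.

    With q_min = 0 this is the negative of the scaled minimum;
    for other ranges the same value is offset by q_min.
    """
    return q_min - round(x_min / scale)


def quantize(x: float, scale: float, zp: int, q_min: int, q_max: int) -> int:
    """Scale, round, shift by the zero-point, then clamp to [q_min, q_max]."""
    q = round(x / scale) + zp
    return max(q_min, min(q_max, q))


x_min, x_max = -20.0, 1000.0
scale = (x_max - x_min) / (127 - (-128))        # 1020 / 255 = 4.0

zp_uint8 = zero_point(x_min, scale, q_min=0)    # 5
zp_int8 = zero_point(x_min, scale, q_min=-128)  # 5 - 128 = -123
print(zp_uint8, zp_int8)                        # 5 -123

print(quantize(-20.0, scale, zp_int8, -128, 127))   # -128
print(quantize(1000.0, scale, zp_int8, -128, 127))  # 127
print(quantize(0.0, scale, zp_int8, -128, 127))     # -123
```

The scale is unchanged between the two ranges because both span 255 steps; only the zero-point moves.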

So far, we’ve looked at examples where the floating point range includes the value 0.0. In the next set of examples, we’ll take a look at what happens when the floating point range doesn’t include the value 0.0.

## The importance of 0.0

Why is it important for the floating point value 0.0 to be represented in the floating point range?

When using a padded convolution, we expect the border pixels to be padded using the value 0.0 in the most common case. Hence, it’s important for 0.0 to be represented in the floating point range. Similarly, if the value X is going to be used for padding in your network, you need to make sure that the value X is represented in the floating point range and that quantization is aware of this.
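To make the padding point concrete, here is a small sketch that pads a row of already-quantized values. It assumes the affine scheme from Example-2 (scale = 4.0, zero-point = 5); the helper names are hypothetical. The key observation is that padding with floating point 0.0 means padding with the zero-point in the quantized domain, not with the integer 0.

```python
def pad_quantized(row, pad: int, zero_point: int):
    """Pad a 1-D list of quantized values with the code that represents 0.0."""
    return [zero_point] * pad + list(row) + [zero_point] * pad


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a quantized value back to its floating point approximation."""
    return (q - zero_point) * scale


scale, zp = 4.0, 5      # parameters from Example-2 (assumed here)
row = [0, 130, 255]     # quantized codes for -20.0, 500.0, 1000.0

padded = pad_quantized(row, pad=1, zero_point=zp)
print(padded)                                       # [5, 0, 130, 255, 5]
print([dequantize(q, scale, zp) for q in padded])   # [0.0, -20.0, 500.0, 1000.0, 0.0]
```

Padding with the integer 0 instead would insert -20.0 at the borders once dequantized, which is not what a zero-padded convolution expects.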

## Example-4: The untold story - skewed floating point range

Now, let’s take a look at what happens if 0.0 isn’t part of the floating point range.

In this example, we’re trying to quantize the floating point range [40.0 .. 1000.0] into the quantized range [0 .. 255].

Since we can’t represent the value 0.0 in the floating point range, we need to extend the lower limit of the range to 0.0.

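We can see that some part of the quantized range is wasted. To determine how much, let’s compute the quantized value that the floating point value 40.0 maps to. With the extended range [0.0 .. 1000.0], scale = 1000.0 / 255 ≈ 3.9216, so 40.0 / 3.9216 ≈ 10.2, which rounds to 10.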

Hence, we’re wasting the range [0 .. 9] in the quantized range, which is about 3.92% of the range. This could significantly affect the model’s accuracy post-quantization.
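The wasted portion can be computed directly. The sketch below is an illustration under stated assumptions: the function name `wasted_codes` is made up, the float range is extended to include 0.0 as described above, and results are rounded to the nearest integer.

```python
def wasted_codes(x_min: float, x_max: float, q_min: int = 0, q_max: int = 255):
    """Return (lowest used code, number of unused codes, unused fraction of the range)."""
    # Extend the float range so that it contains 0.0.
    ext_min = min(x_min, 0.0)
    ext_max = max(x_max, 0.0)

    scale = (ext_max - ext_min) / (q_max - q_min)
    zero_point = q_min - round(ext_min / scale)

    # Quantized code that the real minimum maps to; codes below it are never used.
    lowest_used = round(x_min / scale) + zero_point
    unused = lowest_used - q_min
    return lowest_used, unused, unused / (q_max - q_min)


lowest, unused, fraction = wasted_codes(40.0, 1000.0)
print(lowest)                    # 10
print(unused)                    # 10 (the codes 0 through 9)
print(round(100 * fraction, 2))  # 3.92
```

For a range that already starts at or below 0.0, such as [-20.0 .. 1000.0], the same function reports no unused codes at the low end.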

This skewing is necessary if we wish to make sure that the value 0.0 in the floating point range can be represented in the quantized range.

Another reason for including the value 0.0 in the floating point range is that efficiently comparing a quantized value to check if it’s 0.0 in the floating point range is very valuable. Think of operators such as ReLU, which clip all values below 0.0 in the floating point range to 0.0.

It is important for us to be able to **represent the zero-point using the same data type** (signed or unsigned int8) **as the quantized values**. This enables us to perform these comparisons quickly and efficiently.
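As an illustration of why this matters, a ReLU can be applied to quantized values with a single integer comparison against the zero-point, without dequantizing first. The sketch below assumes the affine scheme from Example-2 (scale = 4.0, zero-point = 5) with a positive scale, so that codes below the zero-point correspond to negative floating point values. The function names are hypothetical.

```python
def quantized_relu(q_values, zero_point: int):
    """ReLU in the quantized domain: clip every code below the zero-point up to it."""
    return [max(q, zero_point) for q in q_values]


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a quantized value back to its floating point approximation."""
    return (q - zero_point) * scale


scale, zp = 4.0, 5                # parameters from Example-2 (assumed here)
q_values = [0, 3, 5, 130, 255]    # codes for -20.0, -8.0, 0.0, 500.0, 1000.0

activated = quantized_relu(q_values, zp)
print(activated)                                      # [5, 5, 5, 130, 255]
print([dequantize(q, scale, zp) for q in activated])  # [0.0, 0.0, 0.0, 500.0, 1000.0]
```

Because the zero-point is stored in the same integer type as the values, the comparison stays entirely in integer arithmetic.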

Next, let’s take a look at how activation normalization helps with model quantization. We’ll specifically focus on how the standardization of the activation values allows us to use the entire quantized range effectively.