Binary Floating-Point 8-Bit (bf8) Specification

Overview

The bf8 format is a compact floating-point representation. It aims to balance range, precision, and computational cost, which makes it well suited to applications with limited memory or bandwidth.

Bit Allocation

The bf8 format represents a floating-point number using a total of 8 bits, split into a 1-bit sign, an exponent field, and a fraction field.
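
The width of each field is not given here. Purely for illustration, the sketch below assumes the 1-bit sign / 5-bit exponent / 2-bit fraction split (often called E5M2) used by some existing 8-bit float formats, and shows how the three fields are packed into and extracted from one byte. The names `EXP_BITS`, `FRAC_BITS`, `pack`, and `unpack` are hypothetical.

```python
# Hypothetical helpers; the 5/2 split is an ASSUMPTION, not part of the spec.
EXP_BITS = 5   # assumed exponent width
FRAC_BITS = 2  # assumed fraction width (the sign takes the remaining top bit)


def unpack(byte: int) -> tuple[int, int, int]:
    """Split an 8-bit pattern into its (sign, exponent, fraction) fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a bf8 value is an 8-bit pattern")
    sign = byte >> (EXP_BITS + FRAC_BITS)
    exponent = (byte >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = byte & ((1 << FRAC_BITS) - 1)
    return sign, exponent, fraction


def pack(sign: int, exponent: int, fraction: int) -> int:
    """Inverse of unpack: assemble the three fields into one byte."""
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= exponent < (1 << EXP_BITS):
        raise ValueError("exponent field out of range")
    if not 0 <= fraction < (1 << FRAC_BITS):
        raise ValueError("fraction field out of range")
    return (sign << (EXP_BITS + FRAC_BITS)) | (exponent << FRAC_BITS) | fraction
```

Under these assumptions the byte 0x3D (binary 0 01111 01) unpacks to sign 0, exponent 15, fraction 1, and packing those fields returns 0x3D again.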

Representation

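When the exponent field is not all zeros, a bf8 number N is normalized and its value is: N = (-1)^sign * 1.fraction * 2^(exponent - bias), where 1.fraction is the binary significand formed by an implicit leading 1 followed by the fraction bits.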
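
As a worked illustration, the sketch below evaluates the normalized formula exactly with Python's `fractions.Fraction`, reading 1.fraction as 1 + fraction / 2^(number of fraction bits). It assumes a hypothetical E5M2-style layout (5 exponent bits, 2 fraction bits, bias 15), none of which is fixed in this section, and further assumes that an all-ones exponent is reserved for special values as in IEEE 754, so it rejects that case. `decode_normal` is a hypothetical name.

```python
from fractions import Fraction

# ASSUMED parameters; the specification does not fix them.
EXP_BITS, FRAC_BITS, BIAS = 5, 2, 15
EXP_MAX = (1 << EXP_BITS) - 1


def decode_normal(byte: int) -> Fraction:
    """Evaluate N = (-1)^sign * 1.fraction * 2^(exponent - bias) exactly."""
    sign = byte >> (EXP_BITS + FRAC_BITS)
    exponent = (byte >> FRAC_BITS) & EXP_MAX
    fraction = byte & ((1 << FRAC_BITS) - 1)
    if exponent == 0:
        raise ValueError("all-zero exponent: use the denormalized formula")
    if exponent == EXP_MAX:
        raise ValueError("all-ones exponent: assumed reserved for inf/NaN")
    # 1.fraction in binary: implicit leading 1 plus fraction / 2^FRAC_BITS
    significand = 1 + Fraction(fraction, 1 << FRAC_BITS)
    return (-1) ** sign * significand * Fraction(2) ** (exponent - BIAS)
```

For example, 0x3D has sign 0, exponent 15, and fraction 01, so N = 1.01 (binary) * 2^(15 - 15) = 1.25.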

Special Numbers

Denormalized Numbers

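When the exponent field is all zeros, the number is denormalized: there is no implicit leading 1, and its value is N_denormalized = (-1)^sign * 0.fraction * 2^(1 - bias). Using 1 - bias rather than 0 - bias keeps the denormalized values contiguous with the smallest normalized value, and a zero fraction in this form gives a signed zero (+0 or -0).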
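
A complete decoder branches on the exponent field. The sketch below combines the normalized and denormalized formulas, again under an assumed E5M2-style layout (5 exponent bits, 2 fraction bits, bias 15). Infinities and NaNs are not defined in this section, so the all-ones exponent handling follows the IEEE 754 convention and is an assumption; `decode` is a hypothetical name.

```python
import math

# ASSUMED layout (not fixed by the specification): 1 sign bit, 5 exponent
# bits, 2 fraction bits, bias 15. The all-ones exponent is ASSUMED to encode
# infinity (fraction 0) and NaN (fraction non-zero), as in IEEE 754.
EXP_BITS, FRAC_BITS, BIAS = 5, 2, 15
EXP_MAX = (1 << EXP_BITS) - 1


def decode(byte: int) -> float:
    """Convert an 8-bit bf8 pattern to a Python float (exact for finite values)."""
    sign = -1.0 if byte >> (EXP_BITS + FRAC_BITS) else 1.0
    exponent = (byte >> FRAC_BITS) & EXP_MAX
    fraction = byte & ((1 << FRAC_BITS) - 1)
    frac_value = fraction / (1 << FRAC_BITS)  # value of 0.fraction
    if exponent == 0:
        # Denormalized: (-1)^sign * 0.fraction * 2^(1 - bias); fraction 0 gives +/-0.
        return sign * math.ldexp(frac_value, 1 - BIAS)
    if exponent == EXP_MAX:
        # Assumed IEEE 754-style special values.
        return sign * math.inf if fraction == 0 else math.nan
    # Normalized: (-1)^sign * 1.fraction * 2^(exponent - bias)
    return sign * math.ldexp(1.0 + frac_value, exponent - BIAS)
```

Under these assumptions the smallest positive denormalized value, 0x01, is 0.01 (binary) * 2^(1 - 15) = 2^-16, the largest denormalized value 0x03 sits just below the smallest normalized value 0x04 = 2^-14, and 0x80 decodes to -0.0.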

Rounding

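Rounding should follow the IEEE 754 standard, using round-to-nearest, ties-to-even (sometimes called banker's rounding) as the default: a result exactly halfway between two representable values rounds to the one whose least significant fraction bit is 0.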
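
One way to implement round-to-nearest, ties-to-even is to take the exact value, scale it so that one unit in the last place becomes 1, and round that quotient to the nearest integer with ties going to the even one. The sketch below does this with `fractions.Fraction`, whose built-in `round()` breaks ties toward even. The E5M2-style layout (bias 15), the infinity pattern, and the NaN pattern 0x7E are assumptions, the names `encode` and `encode_exact` are hypothetical, and the IEEE 754 status flags (inexact, overflow, underflow) are not modeled.

```python
import math
from fractions import Fraction

# ASSUMED parameters, not fixed by the specification: 1 sign bit, 5 exponent
# bits, 2 fraction bits, bias 15; all-ones exponent reserved for inf/NaN.
EXP_BITS, FRAC_BITS, BIAS = 5, 2, 15
EXP_MAX = (1 << EXP_BITS) - 1
SIGN_BIT = 1 << (EXP_BITS + FRAC_BITS)
INF_BITS = EXP_MAX << FRAC_BITS   # 0x7C under these assumptions
NAN_BITS = INF_BITS | 0b10        # assumed NaN pattern (0x7E)


def encode_exact(value: Fraction, negative: bool) -> int:
    """Round |value| (an exact rational) to bf8, ties to even."""
    sign = SIGN_BIT if negative else 0
    a = abs(value)
    if a == 0:
        return sign
    # floor(log2(a)), computed exactly from numerator/denominator bit lengths
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    biased = max(e + BIAS, 1)  # values below 2^(1 - bias) use the denormal scale
    # Scale so one ULP becomes 1; round() on a Fraction sends ties to even.
    m = round(a / Fraction(2) ** (biased - BIAS - FRAC_BITS))
    if m >= 1 << (FRAC_BITS + 1):  # rounding carried into the next binade
        m >>= 1
        biased += 1
    exp_field = biased if m >= 1 << FRAC_BITS else 0  # no implicit 1 -> denormal
    if exp_field >= EXP_MAX:  # overflow rounds to infinity (IEEE 754 default)
        return sign | INF_BITS
    return sign | (exp_field << FRAC_BITS) | (m & ((1 << FRAC_BITS) - 1))


def encode(x: float) -> int:
    """Convert a Python float to the nearest bf8 bit pattern."""
    if math.isnan(x):
        return NAN_BITS
    negative = math.copysign(1.0, x) < 0
    if math.isinf(x):
        return (SIGN_BIT if negative else 0) | INF_BITS
    return encode_exact(Fraction(x), negative)  # Fraction(float) is exact
```

For instance, 1.125 lies exactly halfway between 1.0 (0x3C) and 1.25 (0x3D) and rounds to 0x3C, whose fraction bits 00 are even, while the tie 1.375 rounds up to 1.5 (0x3E). The same tie rule sends 61440, halfway between the assumed largest finite value 57344 and 2^16, to infinity.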

Operations

Addition, subtraction, multiplication, and division should follow the IEEE 754 rules, including the handling of overflow, underflow, and the other IEEE 754 exceptions (invalid operation, division by zero, and inexact).
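
IEEE 754 requires each basic operation to behave as if the exact result were computed and then rounded once. The sketch below follows that model literally: it decodes both operands to exact `fractions.Fraction` values, applies the operation, and rounds with ties-to-even. It assumes an E5M2-style layout (bias 15) with IEEE 754-style infinity and an assumed NaN pattern 0x7E, accepts finite operands only, does not raise status flags, and simplifies signed zero so that an exact-zero sum or difference returns +0. All function names are hypothetical.

```python
import functools
import operator
from fractions import Fraction

# ASSUMED layout (not fixed by the spec): 1 sign, 5 exponent, 2 fraction bits,
# bias 15; all-ones exponent reserved for inf/NaN; 0x7E used as the NaN result.
EXP_BITS, FRAC_BITS, BIAS = 5, 2, 15
EXP_MAX = (1 << EXP_BITS) - 1
SIGN = 1 << (EXP_BITS + FRAC_BITS)
INF, NAN = EXP_MAX << FRAC_BITS, (EXP_MAX << FRAC_BITS) | 0b10


def to_fraction(b: int) -> Fraction:
    """Exact value of a finite bf8 pattern (the sign of zero is dropped)."""
    exp, frac = (b >> FRAC_BITS) & EXP_MAX, b & ((1 << FRAC_BITS) - 1)
    if exp == EXP_MAX:
        raise ValueError("inf/NaN operands are outside this sketch")
    significand = Fraction(frac, 1 << FRAC_BITS) + (1 if exp else 0)
    mag = significand * Fraction(2) ** (max(exp, 1) - BIAS)  # denormals use 1 - bias
    return -mag if b & SIGN else mag


def round_to_bf8(v: Fraction, negative: bool) -> int:
    """Round an exact rational to bf8 with round-to-nearest, ties-to-even."""
    sign = SIGN if negative else 0
    a = abs(v)
    if a == 0:
        return sign
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1  # now e == floor(log2(a))
    biased = max(e + BIAS, 1)  # clamp into the denormal scale
    m = round(a / Fraction(2) ** (biased - BIAS - FRAC_BITS))  # ties to even
    if m >= 1 << (FRAC_BITS + 1):  # carry into the next binade
        m >>= 1
        biased += 1
    exp_field = biased if m >= 1 << FRAC_BITS else 0
    if exp_field >= EXP_MAX:  # overflow -> infinity
        return sign | INF
    return sign | (exp_field << FRAC_BITS) | (m & ((1 << FRAC_BITS) - 1))


def binary_op(op, a: int, b: int) -> int:
    """Compute op(a, b) exactly, then round once to bf8."""
    x, y = to_fraction(a), to_fraction(b)
    sign_xor = bool((a ^ b) & SIGN)
    if op is operator.truediv and y == 0:
        # 0/0 is an invalid operation (NaN); x/0 is a division by zero (inf)
        return NAN if x == 0 else (SIGN if sign_xor else 0) | INF
    exact = op(x, y)
    if op in (operator.mul, operator.truediv):
        negative = sign_xor  # the XOR sign rule also covers zero results
    else:
        negative = exact < 0  # simplification: exact-zero sums give +0
    return round_to_bf8(exact, negative)


add = functools.partial(binary_op, operator.add)
sub = functools.partial(binary_op, operator.sub)
mul = functools.partial(binary_op, operator.mul)
div = functools.partial(binary_op, operator.truediv)
```

Under these assumptions `add(0x3C, 0x30)` computes 1.0 + 0.125 = 1.125 exactly, a tie that rounds to 1.0 (0x3C), and `mul(0x7B, 0x40)` overflows to +infinity (0x7C).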

Implementation

This specification should be implemented in a low-level language so that it can interface with hardware instruction sets. The implementation should also support compound operations such as fused multiply-add (FMA), which computes a * b + c with a single rounding at the end.
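
The benefit of fusing is easiest to see with so few fraction bits. The sketch below compares a fused multiply-add, which rounds a * b + c once, against a separate multiply and add, which round twice. It assumes an E5M2-style layout (bias 15), accepts finite operands only, returns +0 for any exact-zero result, and does not model status flags; `fma` and `mul_then_add` are hypothetical names.

```python
from fractions import Fraction

# ASSUMED layout (not fixed by the spec): 1 sign, 5 exponent, 2 fraction bits,
# bias 15, all-ones exponent reserved for inf/NaN. Finite operands only.
EXP_BITS, FRAC_BITS, BIAS = 5, 2, 15
EXP_MAX = (1 << EXP_BITS) - 1
SIGN = 1 << (EXP_BITS + FRAC_BITS)
INF = EXP_MAX << FRAC_BITS


def to_fraction(b: int) -> Fraction:
    """Exact value of a finite bf8 pattern."""
    exp, frac = (b >> FRAC_BITS) & EXP_MAX, b & ((1 << FRAC_BITS) - 1)
    if exp == EXP_MAX:
        raise ValueError("inf/NaN operands are outside this sketch")
    significand = Fraction(frac, 1 << FRAC_BITS) + (1 if exp else 0)
    mag = significand * Fraction(2) ** (max(exp, 1) - BIAS)
    return -mag if b & SIGN else mag


def round_to_bf8(v: Fraction) -> int:
    """Round-to-nearest, ties-to-even; an exact zero is returned as +0."""
    sign = SIGN if v < 0 else 0
    a = abs(v)
    if a == 0:
        return 0
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1  # now e == floor(log2(a))
    biased = max(e + BIAS, 1)
    m = round(a / Fraction(2) ** (biased - BIAS - FRAC_BITS))  # ties to even
    if m >= 1 << (FRAC_BITS + 1):
        m >>= 1
        biased += 1
    exp_field = biased if m >= 1 << FRAC_BITS else 0
    if exp_field >= EXP_MAX:
        return sign | INF
    return sign | (exp_field << FRAC_BITS) | (m & ((1 << FRAC_BITS) - 1))


def fma(a: int, b: int, c: int) -> int:
    """Fused multiply-add: a * b + c computed exactly, then rounded once."""
    return round_to_bf8(to_fraction(a) * to_fraction(b) + to_fraction(c))


def mul_then_add(a: int, b: int, c: int) -> int:
    """Unfused reference: the product is rounded to bf8 before the addition."""
    product = round_to_bf8(to_fraction(a) * to_fraction(b))
    return round_to_bf8(to_fraction(product) + to_fraction(c))
```

With these assumptions, for a = b = 1.25 (0x3D) and c = -1.5 (0xBE), the exact result 1.5625 - 1.5 = 0.0625 survives the single rounding in `fma` (0x2C), whereas rounding the product to 1.5 first makes `mul_then_add` return 0.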

Application

The bf8 format is well suited to memory-constrained applications and architectures, including embedded devices, IoT devices, and machine learning workloads.

Disclaimer

Because it has only 8 bits, the bf8 format offers limited precision and should not be used for applications that require high precision.