emFloat
The floating-point library
emFloat is an IEEE 754 compliant floating-point library designed from the ground up for embedded systems.
Overview
Developed and honed for more than two decades, emFloat is a highly optimized component of emRun, SEGGER's C runtime library, and also a part of SEGGER Embedded Studio.
Designed for plug-and-play, emFloat can replace a default floating-point library, delivering better performance with less code. Very fast and very small, it delivers FPU-like performance in pure software. Where available, it even boosts the performance of an FPU for complex mathematical functions.
It is available stand-alone, in source code form, for developers who wish to increase performance or reduce the code size of their application without replacing the entire runtime library supplied with their toolchain.
emFloat can also be licensed for inclusion in third-party IDEs and toolchains. For example, Microchip includes emFloat in its XC32 V4.0 compiler toolchain.
Benchmarking for both floating-point and runtime libraries can be done quickly and easily using Embedded Studio, which is readily available at no cost for evaluation and non-commercial usage under SEGGER’s Friendly License.
For details on why using a thoughtfully designed runtime library is important, refer to the emRun page.
Key features
- Small code size, high performance
- Plug-and-play: can easily replace the default floating-point library, delivering better performance with less code.
- Flexible licensing, for integration into user applications or toolchains.
- C variant usable on any 8/16/32/64-bit CPU.
- Hand-coded, assembly-optimized variants for RISC-V and Arm
- Fully reentrant
- No heap requirements
Licensing
emFloat is available to end users for integration into specific projects, as well as to toolchain providers that want to deliver a top-of-the-line runtime and/or floating-point library to their users.
Licensing options are available to fit any such needs, usually with a single payment and no royalty obligation.
The library is delivered in source code, with optional rights for redistribution in object code form. All delivered C and assembly language source files are fully commented.
SEGGER software is not covered by an open source or required attribution license, and can be integrated into any commercial or proprietary product, without the obligation to disclose the combined source.
Variants
emFloat is available in a universal variant written in C, and in specific variants for different CPUs. The specific variants include modules written in assembly language, optimized for the CPU architecture, and deliver higher performance than the universal C variant.
Universal C Variant:
The universal version is written in C. The performance is highly optimized and much higher than the performance of comparable, C-coded open source implementations.
Supported CPUs: The universal variant can be used on any platform, including 8-, 16-, 32-, and 64-bit processors.
Arm Variant:
The Arm-optimized variant has its arithmetic functions fully coded in assembly language, conforming to the AEABI. This means it is compatible with any (A)EABI-compliant toolchain, including GCC- and LLVM/Clang-based toolchains as well as Arm's own compiler (including Keil) and IAR, and can replace the default runtime library or parts of it.
Supported CPUs: The Arm variant supports any 32-bit Arm CPU from Arm architecture v4 onward. This includes Cortex-M, Cortex-A, and Cortex-R.
RISC-V Variant:
The RISC-V variant is written in assembly language, providing functions compatible with the EABI. It can easily replace the default runtime library of EABI-compliant toolchains.
Supported CPUs: The RISC-V variant supports RV32I and RV32E with architecture-specific acceleration. It uses the M (multiply/divide) extension for faster multiplication and division, and still provides fast division on cores whose M implementation lacks a divide instruction.
Silicon vendor buyout option
SEGGER offers the option of licensing emFloat for redistribution to your customers under your own terms. Please contact us to complete your offerings with a proven commercial solution.
Implementation and design
emFloat consists of two parts:
- Arithmetic functions, implementing functionality similar to that of an FPU, such as floating-point add, subtract, multiply, divide, comparisons, and conversions
- Mathematical functions, using the most efficient modern algorithms, benefiting systems with or without an FPU
While all mathematical functions are written in C, the arithmetic functions for the Arm and RISC-V variants are hand-coded in assembly language. For other processor architectures, the library has a portable C implementation.
emFloat is optimized both at the design level (efficient algorithms) and at the implementation level (code tuned for each architecture). The source code contains options to tune it for high performance, small code size, or a balance of the two, delivering excellent performance in all cases.
It provides a consistent execution environment: infinities, NaNs (not-a-number), and correctly signed zeros are accepted as inputs and generated as outputs. To be consistent with floating-point units executing in fast mode, the library flushes subnormals to a correctly signed zero. Because subnormals do not typically occur in embedded systems, this optimization enables a significant reduction in code size.
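As an illustration of the flush-to-zero policy, the following C sketch applies it to an IEEE 754 binary32 value on a host with native floats. It is not taken from emFloat, and flush_subnormal is a hypothetical name. A subnormal has an all-zero exponent field and a non-zero fraction; flushing keeps only the sign bit, so the result is +0 or -0 to match the input.

```c
#include <assert.h>
#include <math.h>    /* signbit(), handy when checking results */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");

/* Hypothetical helper (not emFloat source): flush a subnormal binary32
 * value to a zero that keeps the input's sign; other values pass through. */
float flush_subnormal(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the bit pattern safely */
    uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent field */
    uint32_t fraction = bits & 0x7FFFFFu;      /* 23-bit fraction field */
    if (exponent == 0 && fraction != 0) {      /* subnormal: exponent 0, fraction non-zero */
        bits &= 0x80000000u;                   /* keep only the sign bit: +0 or -0 */
        memcpy(&x, &bits, sizeof x);
    }
    return x;
}
```

For example, flush_subnormal(-1.0e-40f) returns -0.0f, so signbit() of the result is still set, while normal values, zeros, infinities, and NaNs are returned unchanged.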
Integration and use
emFloat provides all well-known floating-point-related API functions of C standard libraries, as well as the floating-point operation functions defined by the architecture's EABI, which the compiler calls implicitly.
emFloat can either be integrated into a toolchain to replace the existing standard library implementation, or used side-by-side with it. Side-by-side use enables selective calls to emFloat while retaining the toolchain's standard library, which keeps integration and use simple.
Example: To return the sine of a value x:
- With the integrated use, call sin(x).
- With the side-by-side use, call SEGGER_sin(x).
To multiply two float values A and B without an FPU:
- With integrated use, write A * B; the compiler implicitly calls the EABI helper, for example __aeabi_fmul(A, B) on Arm or __mulsf3(A, B) with GCC on RISC-V.
- With side-by-side use, call SEGGER_fmul(A, B).
Configuration options
emFloat can be configured for small code size, high execution speed, or a balance of both. The choice does not affect accuracy: calculated results are identical in all modes.
In the source distribution, the library can be configured and tuned to favor faster or smaller code with different levels of optimization:
- -2: Favor size at the expense of speed
- -1: Favor size over speed
- 0: Balanced
- +1: Favor speed over size
- +2: Favor speed at the expense of size
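To illustrate how size can be traded for speed without changing results, the sketch below counts leading zeros, a step a software floating-point implementation typically needs when normalizing a significand. Both versions return identical results; only code size and cycle count differ. The EXAMPLE_OPTIMIZE macro and all function names are hypothetical; they are neither emFloat's configuration interface nor its source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical knob on the same -2..+2 scale; levels above 0 pick the
 * fast version. Not emFloat's configuration. */
#ifndef EXAMPLE_OPTIMIZE
#define EXAMPLE_OPTIMIZE 0
#endif

/* Size-oriented: a tiny loop that shifts one bit per iteration. */
int clz32_small(uint32_t x) {
    assert(x != 0);                    /* leading-zero count of 0 is not handled */
    int n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Speed-oriented: binary search in five fixed steps, at the cost of more code. */
int clz32_fast(uint32_t x) {
    assert(x != 0);
    int n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* Same result either way; the knob only selects the code shape. */
int clz32(uint32_t x) {
#if EXAMPLE_OPTIMIZE > 0
    return clz32_fast(x);
#else
    return clz32_small(x);
#endif
}
```

With the default setting, clz32(0x00800000u) uses the loop and returns 8; building with -DEXAMPLE_OPTIMIZE=2 switches to the binary search and returns the same value.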
The sections below show the performance of the high-level explicit functions and of the low-level implicit floating-point functions on different architectures. For more information, refer to the blog post "Floating-point face-off, part 2".
Note: Due to the specific features of each architecture, the performance values should not be compared across architectures. Instead, they are intended for comparison with other floating-point libraries.
Architecture-specific optimization
The assembly language variants of emFloat take advantage of processor-specific features. Each architecture has its own fine-tuned implementation.
In the Arm variant, the floating-point support uses the 32-bit Arm and Thumb-2 instruction sets and, when available, divide and extended multiply instructions, resulting in a smaller and faster implementation. Pure Thumb instruction sets, such as those of Cortex-M0 and Cortex-M23 processors, are fully supported as well.
In the RISC-V variant, software floating point is supported on all RV32I and RV32E architectures. The implementation takes full advantage of processors that have the M extension, including those whose M extension lacks a divide instruction. When the C extension is present, register allocation favors the registers that compressed instructions can encode, achieving smaller code.
The assembly language versions have multiple implementations of arithmetic operations, using architectural details and tailored algorithms to make the best use of the available instruction set. For instance, for single precision division:
- If a division instruction is available, use it to develop the quotient iteratively.
- If no division instruction, but there is a multiplication instruction, use an initial reciprocal approximation refined by Newton-Raphson iterations with a final correction and multiplication.
- If no division and no multiplication instruction, use a non-restoring division algorithm.
Similar optimizations apply to double-precision division.
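The third strategy, non-restoring division, needs only shifts, additions, and subtractions. The sketch below divides 32-bit unsigned integers; a software floating-point divider would apply the same loop to normalized significands. It is a generic textbook formulation in C, not emFloat's assembly code, and div_nonrestoring is a hypothetical name. Unlike restoring division, a negative partial remainder is not corrected immediately: the next step adds the divisor instead of subtracting it, so only one correction is needed at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Non-restoring division using only shifts, adds and subtracts.
 * Returns n / d and stores n % d in *rem. Illustrative sketch only. */
uint32_t div_nonrestoring(uint32_t n, uint32_t d, uint32_t *rem) {
    assert(d != 0);
    int64_t r = 0;                          /* partial remainder, may go negative */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        int64_t bit = (n >> i) & 1u;        /* next dividend bit, MSB first */
        if (r >= 0) {
            r = 2 * r + bit - d;            /* trial subtraction */
        } else {
            r = 2 * r + bit + d;            /* add back instead of restoring first */
        }
        q = (q << 1) | (r >= 0 ? 1u : 0u);  /* quotient bit from the sign of r */
    }
    if (r < 0) {
        r += d;                             /* single final remainder correction */
    }
    *rem = (uint32_t)r;
    return q;
}
```

For instance, dividing 7 by 2 yields the quotient 3 with remainder 1, matching the results of the C / and % operators.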
API functions — Explicit & implicit
Explicit Functions: emFloat implements all standard library functions usually exposed through math.h. These functions are always called explicitly by the user application.
With integrated use, the standard function is called directly (no prefix). With side-by-side use, the emFloat implementation is called instead of the standard library implementation by adding the prefix "SEGGER_". The API is otherwise identical.
Function List:
int SEGGER_isinff (float x); int SEGGER_isinf (double x);
int SEGGER_isnanf (float x); int SEGGER_isnan (double x);
int SEGGER_isfinitef (float x); int SEGGER_isfinite (double x);
int SEGGER_isnormalf (float x); int SEGGER_isnormal (double x);
int SEGGER_signbitf (float x); int SEGGER_signbit (double x);
int SEGGER_classifyf (float x); int SEGGER_classify (double x);
float SEGGER_cosf (float x); double SEGGER_cos (double x);
float SEGGER_sinf (float x); double SEGGER_sin (double x);
float SEGGER_tanf (float x); double SEGGER_tan (double x);
float SEGGER_acosf (float x); double SEGGER_acos (double x);
float SEGGER_asinf (float x); double SEGGER_asin (double x);
float SEGGER_atanf (float x); double SEGGER_atan (double x);
float SEGGER_atan2f (float y, float x); double SEGGER_atan2 (double y, double x);
float SEGGER_frexpf (float x, int *exp); double SEGGER_frexp (double x, int *exp);
float SEGGER_ldexpf (float x, int exp); double SEGGER_ldexp (double x, int exp);
float SEGGER_scalbnf (float x, int exp); double SEGGER_scalbn (double x, int exp);
float SEGGER_logf (float x); double SEGGER_log (double x);
float SEGGER_log10f (float x); double SEGGER_log10 (double x);
float SEGGER_fmodf (float x, float y); double SEGGER_fmod (double x, double y);
float SEGGER_modff (float x, float *iptr); double SEGGER_modf (double x, double *iptr);
float SEGGER_powf (float x, float y); double SEGGER_pow (double x, double y);
float SEGGER_sqrtf (float x); double SEGGER_sqrt (double x);
float SEGGER_cbrtf (float x); double SEGGER_cbrt (double x);
float SEGGER_ceilf (float x); double SEGGER_ceil (double x);
float SEGGER_fabsf (float x); double SEGGER_fabs (double x);
float SEGGER_fminf (float x, float y); double SEGGER_fmin (double x, double y);
float SEGGER_fmaxf (float x, float y); double SEGGER_fmax (double x, double y);
float SEGGER_floorf (float x); double SEGGER_floor (double x);
float SEGGER_hypotf (float x, float y); double SEGGER_hypot (double x, double y);
float SEGGER_coshf (float x); double SEGGER_cosh (double x);
float SEGGER_sinhf (float x); double SEGGER_sinh (double x);
float SEGGER_tanhf (float x); double SEGGER_tanh (double x);
float SEGGER_expf (float x); double SEGGER_exp (double x);
float SEGGER_acoshf (float x); double SEGGER_acosh (double x);
float SEGGER_asinhf (float x); double SEGGER_asinh (double x);
float SEGGER_atanhf (float x); double SEGGER_atanh (double x);
float SEGGER_fmaf (float x, float y, float z); double SEGGER_fma (double x, double y, double z);
float SEGGER_exp2f (float x); double SEGGER_exp2 (double x);
float SEGGER_exp10f (float x); double SEGGER_exp10 (double x);
float SEGGER_expm1f (float x); double SEGGER_expm1 (double x);
float SEGGER_log1pf (float x); double SEGGER_log1p (double x);
float SEGGER_log2f (float x); double SEGGER_log2 (double x);
float SEGGER_logbf (float x); double SEGGER_logb (double x);
Implicit Functions: When there is no hardware support for basic operations, such as multiplication of two floats, the compiler adds calls to helper functions emulating the operation with available resources. These are the implicit functions, defined by the toolchain's and architecture's EABI.
With integrated use of the CPU-specific variants of emFloat, the compiler's implicit calls resolve to the emFloat implementations. With side-by-side use, these functions can be called explicitly in place of the corresponding operator in the code.
Function List:
float SEGGER_addf (float, float); double SEGGER_add (double, double);
float SEGGER_subf (float, float); double SEGGER_sub (double, double);
float SEGGER_mulf (float, float); double SEGGER_mul (double, double);
float SEGGER_divf (float, float); double SEGGER_div (double, double);
int SEGGER_ltf (float, float); int SEGGER_lt (double, double);
int SEGGER_lef (float, float); int SEGGER_le (double, double);
int SEGGER_gtf (float, float); int SEGGER_gt (double, double);
int SEGGER_gef (float, float); int SEGGER_ge (double, double);
int SEGGER_eqf (float, float); int SEGGER_eq (double, double);
int SEGGER_nef (float, float); int SEGGER_ne (double, double);
float SEGGER_float_int (int); double SEGGER_double_int (int);
float SEGGER_float_llong (long long); double SEGGER_double_llong (long long);
float SEGGER_float_uint (unsigned); double SEGGER_double_uint (unsigned);
float SEGGER_float_ullong (unsigned long long); double SEGGER_double_ullong (unsigned long long);
int SEGGER_int_float (float); int SEGGER_int_double (double);
long long SEGGER_llong_float (float); long long SEGGER_llong_double (double);
unsigned SEGGER_uint_float (float); unsigned SEGGER_uint_double (double);
unsigned long long SEGGER_ullong_float (float); unsigned long long SEGGER_ullong_double (double);
float SEGGER_float_double (double); double SEGGER_double_float (float);
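Comparison helpers can work without any floating-point hardware, which is one reason they are cheap. A common technique is to map each binary32 bit pattern to an unsigned key whose integer order matches the floating-point order, after handling NaNs and signed zeros separately. The C sketch below follows the boolean convention of __aeabi_fcmplt (1 if a < b, otherwise 0, including unordered operands); soft_lt and its helpers are hypothetical and are not emFloat code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");

static uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Map a non-NaN bit pattern to a key whose unsigned order matches the
 * floating-point order: negatives are inverted so larger magnitudes sort
 * lower, positives get the top bit set so they sort above all negatives. */
static uint32_t order_key(uint32_t bits) {
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

/* Returns 1 if a < b, 0 otherwise (including when either input is NaN). */
int soft_lt(float a, float b) {
    uint32_t ua = float_bits(a);
    uint32_t ub = float_bits(b);
    /* NaN: exponent all ones with a non-zero fraction. */
    if ((ua & 0x7FFFFFFFu) > 0x7F800000u || (ub & 0x7FFFFFFFu) > 0x7F800000u) {
        return 0;
    }
    /* +0 and -0 compare equal, although their keys differ. */
    if (((ua | ub) & 0x7FFFFFFFu) == 0) {
        return 0;
    }
    return order_key(ua) < order_key(ub);
}
```

Only integer operations are involved, so the same idea carries over directly to a few assembly instructions per comparison.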
Explicit function performance
To verify and benchmark the explicit functions, the IEEE-754 Floating-point Library Benchmark application is available. It measures the performance and precision of the implementation. For each function, representative arguments have been chosen to maximize coverage.
The tables below show the results of the benchmark application running the C implementation on different architectures.
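The tables report precision as a bit error. One common way to quantify such errors is in units in the last place (ULPs) of the result type, measured against a higher-precision reference. The following C sketch shows that calculation for float results and double references; it is an assumption about how such a metric might be computed, not the benchmark application's actual method, and ulp_error is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");

/* Gap between the float nearest to |ref| and the next larger float.
 * Simplification: ignores the narrower gap just below a power of two
 * and the step from FLT_MAX to infinity. */
static double float_ulp_at(double ref) {
    float r = (float)(ref < 0.0 ? -ref : ref);
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits += 1u;                          /* adjacent float with larger magnitude */
    float next;
    memcpy(&next, &bits, sizeof next);
    return (double)next - (double)r;     /* exact: both values fit in a double */
}

/* Error of a float result relative to a double-precision reference, in ULPs. */
double ulp_error(float result, double ref) {
    double diff = (double)result - ref;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / float_ulp_at(ref);
}
```

A correctly rounded result lies within 0.5 ULP of the reference; for example, ulp_error(1.0f, 1.0 + 0x1p-24) evaluates to 0.5.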
Performance on Arm: The benchmark has been done on an Arm Cortex-M4 microcontroller (NXP K66FN2M0), running from RAM. Detailed results and test cases are available on the SEGGER Wiki.
sinf() | Bit Error | Cycles |
---|---|---|
sin(1e-4) | 0.00 | 21 |
sin(1e-3) | 0.00 | 55 |
sin(1e-2) | 0.00 | 55 |
sin(1e-1) | 0.00 | 54 |
sin(1) | 0.00 | 139 |
sin(1.47264147) | 0.00 | 138 |
sin(1.57079089) | 0.00 | 138 |
sinf(3.14158154) | 0.00 | 106 |
sin(39.0735703) | 0.00 | 148 |
sin(355) | 0.00 | 152 |
sin(1048582.75) | 0.00 | 176 |
sin(100000*Pi) | 0.00 | 151 |
sin(1e10) | 0.00 | 187 |
sin(1e38) | 0.00 | 186 |
Total | 0.00 | 1706 |
cosf() | Bit Error | Cycles |
---|---|---|
cos(1e-4) | 0.00 | 3 |
cos(1e-3) | 0.00 | 48 |
cos(1e-2) | 0.00 | 48 |
cos(1e-1) | 0.00 | 48 |
cos(1) | 0.00 | 136 |
cos(1.47264147) | 0.00 | 103 |
cos(1.57079780) | 0.00 | 103 |
cos(6.28319073) | 0.00 | 136 |
cos(355) | 0.00 | 180 |
cos(100000*Pi) | 0.00 | 180 |
cos(1e10) | 0.00 | 183 |
cos(1e38) | 0.00 | 182 |
Total | 0.00 | 1350 |
tanf() | Bit Error | Cycles |
---|---|---|
tan(1e-4) | 0.00 | 25 |
tan(1e-3) | 0.00 | 74 |
tan(1e-2) | 0.00 | 74 |
tan(1e-1) | 0.00 | 73 |
tan(1) | 0.00 | 258 |
tan(6.45840693) | 0.00 | 258 |
tan(355) | 0.00 | 282 |
tan(100000*Pi) | 0.00 | 273 |
tan(1e10) | 0.00 | 304 |
tan(1e38) | 0.00 | 321 |
Total | 0.00 | 1942 |
expf() | Bit Error | Cycles |
---|---|---|
expf(0) | 0.00 | 3 |
expf(1e-5) | 0.00 | 44 |
expf(1e-4) | 0.00 | 44 |
expf(2e-4) | 0.00 | 44 |
expf(4e-4) | 0.00 | 43 |
expf(4.5e-4) | 0.00 | 44 |
expf(1e-3) | 0.00 | 44 |
expf(0.25123) | 0.00 | 81 |
expf(0.55123) | 0.00 | 80 |
expf(8.1) | 0.00 | 81 |
expf(16.1) | 0.00 | 81 |
Total | 0.00 | 589 |
sinhf() | Bit Error | Cycles |
---|---|---|
sinhf(1e-5) | 0.00 | 22 |
sinhf(1e-4) | 0.00 | 23 |
sinhf(2e-4) | 0.00 | 23 |
sinhf(4e-4) | 0.00 | 60 |
sinhf(4.5e-4) | 0.00 | 59 |
sinhf(1e-3) | 0.00 | 60 |
sinhf(0.25123) | 0.00 | 60 |
sinhf(0.55123) | 0.00 | 119 |
sinhf(8.1) | 0.00 | 121 |
sinhf(16.1) | 0.00 | 108 |
Total | 0.00 | 655 |
coshf() | Bit Error | Cycles |
---|---|---|
coshf(1e-5) | 0.00 | 28 |
coshf(1e-4) | 0.00 | 28 |
coshf(2e-4) | 0.00 | 29 |
coshf(4e-4) | 0.00 | 48 |
coshf(4.5e-4) | 0.00 | 48 |
coshf(1e-3) | 0.00 | 47 |
coshf(0.25123) | 0.00 | 48 |
coshf(0.55123) | 0.00 | 111 |
coshf(8.1) | 0.00 | 114 |
coshf(16.1) | 0.00 | 100 |
Total | 0.00 | 601 |
tanhf() | Bit Error | Cycles |
---|---|---|
tanhf(0.25) | 0.00 | 66 |
tanhf(1) | 0.00 | 108 |
tanhf(10) | 0.00 | 18 |
Total | 0.00 | 192 |
logf() | Bit Error | Cycles |
---|---|---|
logf(1e-5) | 0.00 | 158 |
logf(1024) | 0.00 | 100 |
logf(4177.25) | 0.00 | 140 |
Total | 0.00 | 398 |
Performance on RISC-V: The benchmark has been done on a RISC-V RV32IMAC microcontroller (GigaDevice GD32VF103), running from RAM. Detailed results and test cases are available on the SEGGER Wiki.
sinf() | Bit Error | Cycles |
---|---|---|
sin(1e-4) | 0.00 | 8 |
sin(1e-3) | 0.00 | 70 |
sin(1e-2) | 0.00 | 67 |
sin(1e-1) | 0.00 | 67 |
sin(1) | 0.00 | 182 |
sin(1.47264147) | 0.00 | 193 |
sin(1.57079089) | 0.00 | 196 |
sinf(3.14158154) | 0.00 | 153 |
sin(39.0735703) | 0.00 | 193 |
sin(355) | 0.00 | 219 |
sin(1048582.75) | 0.00 | 236 |
sin(100000*Pi) | 0.00 | 214 |
sin(1e10) | 0.00 | 255 |
sin(1e38) | 0.00 | 248 |
Total | 0.00 | 2301 |
cosf() | Bit Error | Cycles |
---|---|---|
cos(1e-4) | 0.00 | 10 |
cos(1e-3) | 0.00 | 50 |
cos(1e-2) | 0.00 | 43 |
cos(1e-1) | 0.00 | 43 |
cos(1) | 0.00 | 186 |
cos(1.47264147) | 0.00 | 158 |
cos(1.57079780) | 0.00 | 161 |
cos(6.28319073) | 0.00 | 190 |
cos(355) | 0.00 | 252 |
cos(100000*Pi) | 0.00 | 251 |
cos(1e10) | 0.00 | 245 |
cos(1e38) | 0.00 | 257 |
Total | 0.00 | 1846 |
tanf() | Bit Error | Cycles |
---|---|---|
tan(1e-4) | 0.00 | 7 |
tan(1e-3) | 0.00 | 92 |
tan(1e-2) | 0.00 | 87 |
tan(1e-1) | 0.00 | 86 |
tan(1) | 0.00 | 403 |
tan(6.45840693) | 0.00 | 397 |
tan(355) | 0.00 | 444 |
tan(100000*Pi) | 0.00 | 430 |
tan(1e10) | 0.00 | 458 |
tan(1e38) | 0.00 | 483 |
Total | 0.00 | 2887 |
expf() | Bit Error | Cycles |
---|---|---|
expf(0) | 0.00 | 10 |
expf(1e-5) | 0.00 | 45 |
expf(1e-4) | 0.00 | 41 |
expf(2e-4) | 0.00 | 38 |
expf(4e-4) | 0.00 | 38 |
expf(4.5e-4) | 0.00 | 38 |
expf(1e-3) | 0.00 | 38 |
expf(0.25123) | 0.00 | 86 |
expf(0.55123) | 0.00 | 89 |
expf(8.1) | 0.00 | 88 |
expf(16.1) | 0.00 | 86 |
Total | 0.00 | 597 |
sinhf() | Bit Error | Cycles |
---|---|---|
sinhf(1e-5) | 0.00 | 14 |
sinhf(1e-4) | 0.00 | 14 |
sinhf(2e-4) | 0.00 | 13 |
sinhf(4e-4) | 0.00 | 67 |
sinhf(4.5e-4) | 0.00 | 59 |
sinhf(1e-3) | 0.00 | 59 |
sinhf(0.25123) | 0.00 | 59 |
sinhf(0.55123) | 0.00 | 137 |
sinhf(8.1) | 0.00 | 133 |
sinhf(16.1) | 0.00 | 112 |
Total | 0.00 | 667 |
coshf() | Bit Error | Cycles |
---|---|---|
coshf(1e-5) | 0.00 | 26 |
coshf(1e-4) | 0.00 | 24 |
coshf(2e-4) | 0.00 | 24 |
coshf(4e-4) | 0.00 | 50 |
coshf(4.5e-4) | 0.00 | 50 |
coshf(1e-3) | 0.00 | 50 |
coshf(0.25123) | 0.00 | 50 |
coshf(0.55123) | 0.00 | 140 |
coshf(8.1) | 0.00 | 139 |
coshf(16.1) | 0.00 | 126 |
Total | 0.00 | 679 |
tanhf() | Bit Error | Cycles |
---|---|---|
tanhf(0.25) | 0.00 | 89 |
tanhf(1) | 0.00 | 145 |
tanhf(10) | 0.00 | 14 |
Total | 0.00 | 248 |
logf() | Bit Error | Cycles |
---|---|---|
logf(1e-5) | 0.00 | 265 |
logf(1024) | 0.00 | 183 |
logf(4177.25) | 0.00 | 240 |
Total | 0.00 | 688 |
Implicit function performance
The following tables show the performance and code size of the Arm and RISC-V EABI floating-point functions.
The performance benchmark runs the speed-optimized configuration of the floating-point library (__SEGGER_RTL_OPTIMIZE set to +2).
The code size has been measured with size optimization (__SEGGER_RTL_OPTIMIZE set to -2). The speed-optimized configuration requires slightly more code.
Performance on Arm: The benchmarks have been done on an Arm Cortex-M4 microcontroller (NXP K66FN2M0), running from RAM, compiled with Embedded Studio (GCC).
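Category | Function | Average Cycles |
---|---|---|
Float, Math | __aeabi_fadd | 31.0 |
Float, Math | __aeabi_fsub | 39.9 |
Float, Math | __aeabi_frsub | 39.9 |
Float, Math | __aeabi_fmul | 26.0 |
Float, Math | __aeabi_fdiv | 53.0 |
Float, Compare | __aeabi_fcmplt | 13.0 |
Float, Compare | __aeabi_fcmple | 13.0 |
Float, Compare | __aeabi_fcmpgt | 13.0 |
Float, Compare | __aeabi_fcmpge | 13.0 |
Float, Compare | __aeabi_fcmpeq | 7.0 |
Double, Math | __aeabi_dadd | 54.5 |
Double, Math | __aeabi_dsub | 71.2 |
Double, Math | __aeabi_drsub | 71.2 |
Double, Math | __aeabi_dmul | 56.4 |
Double, Math | __aeabi_ddiv | 134.0 |
Double, Compare | __aeabi_dcmplt | 14.0 |
Double, Compare | __aeabi_dcmple | 14.0 |
Double, Compare | __aeabi_dcmpgt | 14.0 |
Double, Compare | __aeabi_dcmpge | 14.0 |
Double, Compare | __aeabi_dcmpeq | 14.0 |
Float, Conversion | __aeabi_f2iz | 9.0 |
Float, Conversion | __aeabi_f2uiz | 6.0 |
Float, Conversion | __aeabi_f2lz | 13.5 |
Float, Conversion | __aeabi_f2ulz | 12.0 |
Float, Conversion | __aeabi_i2f | 10.5 |
Float, Conversion | __aeabi_ui2f | 7.5 |
Float, Conversion | __aeabi_l2f | 19.0 |
Float, Conversion | __aeabi_ul2f | 13.8 |
Float, Conversion | __aeabi_f2d | 9.0 |
Double, Conversion | __aeabi_d2iz | 10.0 |
Double, Conversion | __aeabi_d2uiz | 8.0 |
Double, Conversion | __aeabi_d2lz | 16.5 |
Double, Conversion | __aeabi_d2ulz | 13.5 |
Double, Conversion | __aeabi_i2d | 12.0 |
Double, Conversion | __aeabi_ui2d | 8.0 |
Double, Conversion | __aeabi_l2d | 17.9 |
Double, Conversion | __aeabi_ul2d | 12.9 |
Double, Conversion | __aeabi_d2f | 11.0 |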
EABI function performance on RISC-V
The benchmarks have been done on a GD32VD107 (RV32IMAC), running from Flash, compiled with Embedded Studio (GCC), optimized for speed.
Category | Function | Cycles, Min | Cycles, Max | Cycles, Avg |
---|---|---|---|---|
Float, Math | __addsf3 | 45 | 60 | 49.5 |
Float, Math | __subsf3 | 42 | 84 | 62.2 |
Float, Math | __mulsf3 | 37 | 57 | 39.3 |
Float, Math | __divsf3 | 67 | 70 | 67.0 |
Float, Compare | __ltsf2 | 11 | 15 | 11.0 |
Float, Compare | __lesf2 | 10 | 14 | 10.0 |
Float, Compare | __gtsf2 | 10 | 17 | 10.0 |
Float, Compare | __gesf2 | 11 | 14 | 11.0 |
Float, Compare | __eqsf2 | 10 | 13 | 10.0 |
Float, Compare | __nesf2 | 10 | 10 | 10.0 |
Double, Math | __adddf3 | 52 | 89 | 62.8 |
Double, Math | __subdf3 | 60 | 123 | 82.8 |
Double, Math | __muldf3 | 68 | 88 | 75.0 |
Double, Math | __divdf3 | 192 | 204 | 197.2 |
Double, Compare | __ltdf2 | 15 | 20 | 16.0 |
Double, Compare | __ledf2 | 15 | 19 | 16.0 |
Double, Compare | __gtdf2 | 15 | 20 | 16.1 |
Double, Compare | __gedf2 | 15 | 19 | 16.1 |
Double, Compare | __eqdf2 | 14 | 17 | 14.0 |
Double, Compare | __nedf2 | 14 | 14 | 14.0 |
Float, Conversion | __fixsfsi | 14 | 14 | 14.0 |
Float, Conversion | __fixunssfsi | 13 | 13 | 13.0 |
Float, Conversion | __fixsfdi | 20 | 29 | 23.2 |
Float, Conversion | __fixunssfdi | 15 | 23 | 18.9 |
Float, Conversion | __floatsisf | 28 | 47 | 32.6 |
Float, Conversion | __floatunsisf | 28 | 42 | 33.0 |
Float, Conversion | __floatdisf | 39 | 66 | 49.1 |
Float, Conversion | __floatundisf | 35 | 58 | 44.1 |
Float, Conversion | __extendsfdf2 | 14 | 18 | 14.1 |
Double, Conversion | __fixdfsi | 9 | 20 | 16.8 |
Double, Conversion | __fixunsdfsi | 9 | 14 | 13.8 |
Double, Conversion | __fixdfdi | 9 | 34 | 26.9 |
Double, Conversion | __fixunsdfdi | 9 | 25 | 21.5 |
Double, Conversion | __floatsidf | 28 | 47 | 31.6 |
Double, Conversion | __floatunsidf | 19 | 32 | 23.9 |
Double, Conversion | __floatdidf | 30 | 73 | 45.1 |
Double, Conversion | __floatundidf | 27 | 62 | 39.3 |
Double, Conversion | __truncdfsf2 | 25 | 36 | 25.1 |
EABI function code size on RISC-V
For the code size measurements, the floating-point library has been compiled with optimization for size, targeting RV32IMC.
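Category | Function | Code Size [Bytes] |
---|---|---|
Float, Math | __addsf3 | 410 |
Float, Math | __subsf3 | 10 |
Float, Math | __mulsf3 | 178 |
Float, Math | __divsf3 | 184 |
Float, Compare | __ltsf2 | 58 |
Float, Compare | __lesf2 | 54 |
Float, Compare | __gtsf2 | 50 |
Float, Compare | __gesf2 | 62 |
Float, Compare | __eqsf2 | 44 |
Float, Compare | __nesf2 | -- |
Double, Math | __adddf3 | 724 |
Double, Math | __subdf3 | 10 |
Double, Math | __muldf3 | 286 |
Double, Math | __divdf3 | 278 |
Double, Compare | __ltdf2 | 70 |
Double, Compare | __ledf2 | 70 |
Double, Compare | __gtdf2 | 70 |
Double, Compare | __gedf2 | 70 |
Double, Compare | __eqdf2 | 52 |
Double, Compare | __nedf2 | -- |
Float, Conversion | __fixsfsi | 74 |
Float, Conversion | __fixunssfsi | 50 |
Float, Conversion | __fixsfdi | 146 |
Float, Conversion | __fixunssfdi | 98 |
Float, Conversion | __floatsisf | 66 |
Float, Conversion | __floatunsisf | 52 |
Float, Conversion | __floatdisf | 96 |
Float, Conversion | __floatundisf | 70 |
Float, Conversion | __extendsfdf2 | 64 |
Double, Conversion | __fixdfsi | 84 |
Double, Conversion | __fixunsdfsi | 54 |
Double, Conversion | __fixdfdi | 146 |
Double, Conversion | __fixunsdfdi | 96 |
Double, Conversion | __floatsidf | 46 |
Double, Conversion | __floatunsidf | 34 |
Double, Conversion | __floatdidf | 128 |
Double, Conversion | __floatundidf | 106 |
Double, Conversion | __truncdfsf2 | 130 |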
Notes: __subsf3 tail-calls __addsf3, __subdf3 tail-calls __adddf3. __nesf2 is an alias of __eqsf2, __nedf2 is an alias of __eqdf2.
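These notes reflect two standard size savings. In IEEE 754 arithmetic, a - b yields the same result as a + (-b), so a subtraction entry point only needs to flip the sign bit of b and jump to the adder, which is consistent with the 10-byte size of __subsf3 and __subdf3. Likewise, GCC's libgcc documents __eqsf2 and __nesf2 with the same return convention (zero if the operands are ordered and equal, nonzero otherwise), so one routine can serve both names. The C sketch below illustrates the subtraction case; example_addsf3 and example_subsf3 are hypothetical, and the adder simply uses the host's native addition so the sketch is runnable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");

/* Hypothetical stand-in for a software single-precision adder. */
static float example_addsf3(float a, float b) {
    return a + b;
}

/* a - b computed as a + (-b): flip b's sign bit and reuse the adder.
 * In assembly this becomes one XOR followed by a tail call (a jump). */
float example_subsf3(float a, float b) {
    uint32_t bits;
    memcpy(&bits, &b, sizeof bits);
    bits ^= 0x80000000u;               /* negate b; also valid for zeros, infinities, NaNs */
    memcpy(&b, &bits, sizeof b);
    return example_addsf3(a, b);
}
```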
Size comparison
To demonstrate how competitive the floating-point library is in terms of size, a level-playing-field benchmark is available. It simply calls a selection of the explicit floating-point library functions from main().
On Arm, and similarly on RISC-V, a single application and startup code can link against multiple vendor-provided runtime systems because all compilers conform to the architecture's EABI. This simplifies swapping libraries, both in the benchmark and in other projects.
The benchmark project uses exactly the same object modules with each vendor's runtime. The project uses Embedded Studio and a standard project template to build for different architectures.
The Arm applications are built with the SEGGER Compiler and the SEGGER Linker. The RISC-V applications use the GNU tools which are included in Embedded Studio.
#include <math.h>

// volatile inputs and outputs stop the compiler from constant-folding or discarding the calls
volatile float vf;
volatile double vd;

int main(void) {
  float f;
  double d;
  //
  f = vf;
  //
  f = sinf(f);
  f = cosf(f);
  f = tanf(f);
  f = asinf(f);
  f = acosf(f);
  f = atanf(f);
  f = sinhf(f);
  f = coshf(f);
  f = tanhf(f);
  f = asinhf(f);
  f = acoshf(f);
  f = atanhf(f);
  vf = f;
  //
  d = vd;
  d = sin(d);
  d = cos(d);
  d = tan(d);
  d = asin(d);
  d = acos(d);
  d = atan(d);
  d = sinh(d);
  d = cosh(d);
  d = tanh(d);
  d = asinh(d);
  d = acosh(d);
  d = atanh(d);
  vd = d;
  //
  return 0;
}
Size comparison on Arm
emFloat has been tested against:
- IAR Embedded Workbench 8.50
- GNU Arm Embedded 9-2020-q2-update
- Arm Compiler 6.14, standard Arm libraries (flush-to-zero mode)
- Arm Compiler 6.14, MicroLib (non-conforming IEEE implementation)
- TI Code Composer 20.2.1 LTS
The table shows the results for ARMv7-M with full software floating point.
Library | ROM Usage |
---|---|
SEGGER | 10,628 bytes |
IAR | 17,656 bytes |
AC6 MicroLib | 18,668 bytes |
AC6 | 21,514 bytes |
GNU | 33,809 bytes |
CCS | 34,274 bytes |
The benchmark application itself accounts for 306 bytes of Flash overhead.
Size comparison on RISC-V
For RISC-V emFloat has been tested against:
- standard 2019-08-gcc-8.3.0 toolset (maintained by SiFive)
These are the results for RV32IMC.
Library | ROM Usage |
---|---|
SEGGER | 12,644 bytes |
GNU | 47,176 bytes |