Floating Point Multiplication and Division Without Hardware Support

» Skip to the Extras

I began my career as a hardware engineer many years ago, but then spent 12 years writing Java software for large commercial applications; far removed from the underlying hardware on which the applications run. Only after reading all the back issues of Nuts & Volts and MAKE magazines available in my local library, have I had the urge to get back to the basics and write some software that runs directly on top of the hardware. The hardware, in this case, being small microprocessors or microcontrollers. I could tell from reading the various articles the authors were having fun experimenting with the small computers and controlling just about anything they could think of. It had been a long time since I had designed hardware or programmed such small computers. You could say I have come full circle.

The event that finally pushed me over the edge and down this slippery slope was an ad for a microcontroller development kit from Texas Instruments (TI) called the eZ430-F2013 that could be had for $20 plus shipping. At first I didn’t believe it was possible. I thought the ad was a misprint, but it wasn’t. That did it. I had to have one. It turns out this development kit is an incredible deal, perfect for experimentation with what turns out to be the “world’s lowest power” microcontroller family. A $20 development kit does truly bring microcontroller programming and experimentation to the masses. See the sidebar for more information about the eZ430-F2013 development kit.

Then the only question was, “What will I do with it?” It turns out the answer to this question came about as fast as the development kit came in the mail. For years, I have wanted to design a state-of-the-art color organ similar to the ones I built in my younger days, but this time do it to the extent possible in the digital domain (instead of the analog).

For those not familiar with what a color organ is, it is a device that splits music up into numerous frequency bands and modulates colored lights according to the musical content. Typical color organs are three or four channels with one color of light associated with each frequency band. With a color organ, you can see the music, as well as hear it.

Further fueling the desire for a state-of-the-art color organ was the availability of inexpensive super bright LEDs. The color organs I built in the past used incandescent lights which introduced a time lag into the musical response as the light’s filaments had to heat up and cool down. If LEDs were used instead, not only would this time lag be eliminated, but due to the life expectancy of today’s LEDs, you would probably never have to replace one.

So, I had my development kit in hand which included a bunch of documentation, a single target board, an emulator, C compiler, assembler, linker, and debugger and I had my idea of building a digital color organ. Could my grandiose idea be implemented on such a small microcontroller? My initial answer to this question was a definite maybe. Take a look at Tables 1 and 2 and let’s compare the capabilities of the MSP430-F2012 microcontroller to the requirements of my application and see.

Attribute	Description
Clock Speed	Up to 16 MHz with many instructions executing in one clock cycle.
Flash Memory	2K bytes
RAM	128 bytes
ADC	A 10 bit ADC is onboard with built-in voltage references.
PWM	TimerA can be configured for pulse width modulation that can be routed to selected pins of the device.
Timers	TimerA and a watchdog timer are onboard. The watchdog timer can be used as an interval timer if the watchdog function isn’t required by the application.
Power	Claimed to be the world’s lowest power device.
I/O Capabilities	A single eight bit port is available with individual control over direction and function of each pin in the port.
Hardware Multiplication	No

TABLE 1. MSP430-F2012 Capabilities.

Attribute	Description
At least four channels of frequency selective lighting control	This would require at least four digital filters to achieve. The filters would need to be at least third order for adequate separation between channels. Digital filters require floating point arithmetic for their implementation.
Noise gate	A noise gate will be required to eliminate room and/or circuit noise in the absence of musical material.
AGC	An automatic gain control (AGC) is needed to allow the color organ to adjust itself to changes in musical content. I didn’t want any user controls being necessary.
Audio inputs	I wanted a built-in microphone with preamp in addition to a stereo line input for direct connection of the color organ to any sound source. An analog-to-digital (ADC) converter is a requirement of the microcontroller used. The ADC converts the mic or line analog signals into digital samples for processing by the microcontroller.
Output lighting	I wanted to drive at least 10 super bright LEDs per channel for the size display box I had in mind. This equates to about 200 mA output drive capability per channel. The application required PWM capabilities so the brightness of the LEDs in each channel could be controlled independently.
Internal timing control	Multiple hardware timers would be required for control of functions internal to the color organ. This includes display sequencing and PWM generation.

TABLE 2. Application Requirements.

After contrasting the controller’s capabilities with my application’s requirements, it was immediately apparent that if all of the required functionality were going to fit in the Flash memory, the overhead imposed by use of a high level language (HLL) could not be tolerated. This was true both in terms of the space required by a runtime library and/or for the performance hit of an interpreted language like some Basics. If this color organ was going to work, I would need to code in assembly language and do so as efficiently as possible.

The first hurdle I had to face was that digital filters of the IIR (infinite impulse response) variety are typically implemented using difference equations of the form:

y[n] = A * y[n-1] + B * y[n-2]

where y is a three element array of output history and A and B are floating point filter coefficients that are determined from the type and response of the filter being implemented. To use such filters, I needed to be able to multiply floating point numbers on a microcontroller that doesn’t even know how to multiply integers. Floating point arithmetic is something one takes for granted when using a HLL, but as I mentioned this was not an option here.

Horner’s Method to the Rescue

In my investigation of how floating point arithmetic might be done, I stumbled across a TI application note (see Resources) that described Horner’s method for floating point multiplication and division. Horner’s method provides reasonably accurate results while only requiring shift/rotate and add instructions; something the MSP430 controller family has and does very well in a single clock cycle.

The remainder of this article describes this method along with a technique called Canonical Sign Digit or CSD that can be used to optimize the Horner results. Finally, I will present a program I wrote called Horner.java, that generates Horner equations and can even generate the MSP430 code for performing floating point multiplications and divisions. Note: Even though the focus here is on the MSP430 family of microcontrollers, the techniques presented are applicable to any processor.

I won’t delve into the theory behind Horner’s method; instead, I will show you how the theory is applied. (The Resources sidebar has some pointers to arithmetic theory for those interested in reading further.) Suffice it to say, Horner’s method allows multiplication and division of signed and/or unsigned floating point numbers using a rather elegant approach which I will illustrate. Horner’s method is not without its drawbacks, however, which include:

When multiplying and dividing, the multiplier or divisor must be known in advance and cannot be changed dynamically. This is not an issue with most DSP applications because most coefficients are known at design time and don’t change at runtime. (This is true in general, but probably not true of everyone’s applications.)
A division remainder is not available using Horner’s method like it is in most other division techniques. (Why this is so will be shown later.)
The results of multiplication and division are rarely 100% accurate and the errors vary depending upon the types of numbers being processed. (I will illustrate this with examples shortly.)

For my color organ application, none of these drawbacks turned out to be significant.

The process of using Horner’s method can be broken down into the following series of steps. Given a known multiplier or divisor:

If dividing, invert the divisor so that it becomes a multiplier.
Convert the multiplier to its binary representation of the required bit length; usually 10, 12, or 16 bits.
Optionally apply CSD to the binary representation to optimize it. (CSD will be described later.)
Generate the Horner equations for the multiplication from the binary representation.
Generate the computer code which implements the Horner equations.

The example that follows will show how this is done. It is actually kind of fun to generate the equations and the code by hand the first couple of times to aid in really understanding the process. Horner.java came about because I had to generate a lot of multipliers and the fun quickly turned to tedium.

Let’s generate the equations for the multiplier 0.123456. We will do this using 12 bits and no CSD. The 12 bit binary representation of 0.123456 is 0.000111111001. Starting from the rightmost bit which is the Least Significant Bit (LSB) and moving towards the MSB (Most Significant Bit), find the first one bit in the binary representation. This occurs at the 2^-12 bit LSB position. From there, count the number of places to the next one bit which, in this case, is three. The number of places between the one bits is called the distance and a distance of three results in the first Horner equation:

T1 = X * 2^-3 + X

Here, X represents the number the multiplier will be multiplied by. T1 is just a temporary accumulator for the calculation. The final X is added in because there are more one bits remaining in the representation. This process is repeated until all one bits in the multiplier’s representation have been traversed. The complete set of Horner equations for our multiplier is then:

T1 = X * 2^-3 + X
T2 = T1 * 2^-1 + X
T3 = T2 * 2^-1 + X
T4 = T3 * 2^-1 + X
T5 = T4 * 2^-1 + X
T6 = T5 * 2^-1 + X
Result = T6 * 2^-4

The 2^-4 in the final equation is the bit weight of the leftmost one in the binary representation.

Let’s check our results. If we were to set X to 1,000, the result we would expect on a calculator, for example, would be 123.456. Using 1,000 in the Horner equations yields 123 for an error of 0.456, which is less than one least significant bit. Not bad accuracy for many applications.

Division, as mentioned, is accomplished by multiplication using the inverse or reciprocal of the divisor. That is X / Y is the same as X * 1/Y. So, if you were trying to divide 100/10 with Horner, you would actually be multiplying 100 * 0.1. Now you see why the remainder from the division isn’t available; there really is no division.

Separate sets of Horner equations are generated for a multiplier or divisor that has both a decimal and a fractional part. The decimal portion of the number is processed from the MSB to the LSB, just opposite the direction the fractional part of the number is processed. Another difference is that the distances between ones in the decimal portion of the number are positive values which result in positive powers of two in the Horner equations. Consider a multiplier of 654.321.

Decimal representation:
001010001110
Fractional representation: 010100100010

The equations for the decimal portion of the multiplier are:

T1 = X * 2² + X
T2 = T1 * 2⁴ + X
T3 = T2 * 2¹ + X
T4 = T3 * 2¹ + X
Decimal result = T4 * 2¹

Notice the positive powers of two in the above equations.

The equations for the fractional portion of the multiplier are:

T1 = X * 2^-4 + X
T2 = T1 * 2^-3 + X
T3 = T2 * 2^-2 + X
Fraction result = T3 * 2^-2

Total Result = Decimal result + Fraction result

Using a calculator to again check our results when X is equal to 29, the product should be 18975.309. Running 29 through the Horner equations yields 18966 for the decimal result and 9 for the fractional result; their sum being 18975. Here, the error is again less than the least significant bit.

Generating code to implement these equations is easy. A rotate right instruction (“rra” for the MSP430 family) is used for every negative power of two and a rotate left instruction (“rla”) is used for each positive power of two. For every + X operation, an “add” instruction is used. For every – X operation (discussed in the context of CSD), a “sub” instruction is used. Assuming the symbols “x,” “out,” and “acc” are register aliases, the code is as shown in Listing 1.

mov #29,x ; Load x with 29 mov x,acc ; Do the decimal portion of the multiply rla acc rla acc add x,acc ; acc = x * 2^2 + x rla acc rla acc rla acc rla acc add x,acc ; acc = acc * 2^4 + x rla acc add x,acc ; acc = acc * 2^1 + x rla acc add x,acc ; acc = acc * 2^1 + x rla acc ; acc = acc * 2^1 mov acc,out ; Save the decimal result in out mov x,acc ; Reload x for the fractional part
rra acc rra acc rra acc rra acc add x,acc ; acc = acc * 2^-4 + x rra acc rra acc rra acc add x,acc ; acc = acc * 2^-3 + x rra acc rra acc add x,acc ; acc = acc * 2^-2 + x rra acc rra acc ; acc = acc * 2^-2 add acc,out ; out = dec result + frac result

LISTING 1.

This may seem like a lot of instructions, but remember, each register operation executes in one clock cycle on the MSP430 family of controllers. The above multiplication executes in about 32 clock cycles. If the processor is running at 16 MHz, the total execution time is a respectable two microseconds.

Of course, a C programmer would be able to perform the multiplication in a single line of code such as:

double result = 29 * 654.321;

and get exact results. Such is the plight of the assembly language programmer.

Canonical Sign Digit

Canonical Sign Digit or CSD can be used to reduce the number of arithmetic operations (rotates and adds) required to implement the Horner equations. CSD is an optimization resulting in fewer instruction executions required to achieve the same result. CSD is a processing step that is inserted in the process flow where previously described. If a multiplier or divisor consists of decimal and fractional parts, the CSD process is applied across the complete binary representation. CSD processing proceeds from the LSB bit to the MSB and works by grouping adjacent one bits in the binary representation and replacing them with a simpler term.

To do this, it is required that our binary representation consisting of ones and zeros be replaced with a trinary representation consisting of three symbols (0, 1, and -1). The -1 is a special symbol that indicates some number of consecutive ones (a run of ones) have been replaced. After the application of CSD, there won’t be any consecutive ones left in the binary representation of a number. It is necessary to restart CSD processing after each run replacement as a new run of ones may be created by the insertion of the replacement.

CSD processing is performed in a loop as follows:

While there are consecutive ones in the representation Starting at the LSB of the binary representation search leftward for a run of ones. Create and insert a replacement string for the run of ones. WhileEnd

An example will illustrate the point. Say we have a multiplier M. Its binary and CSD representations are shown next:

Binary representation: 0.0011100000 CSD representation: 0.0100-100000

As you can see, the -1 was inserted at the start of the run of ones. Zeros replace the ones in the run and finally a new one bit replaces the zero that broke the run of ones. You can probably surmise by looking at the trinary representation that the CSD conversion will result in one fewer Horner equation with its rotate and add. Therein lies the optimization.

Generating the Horner equations for a CSD optimized multiplier is the same as was described previously, except instead of adding + X to each equation, a – X is added for each -1 in the representation.

Here are the CSD optimized Horner equations for our previous example of 654.321:

Binary representation:
001010001110.010100100010

CSD representation:
0010100100-10.010100100010

Horner equations:
T1 = X * 2² + X
T2 = T1 * 2³ + X
T3 = T2 * 2³ - X
Decimal result = T3 * 2¹

T1 = X * 2^-4 + X
T2 = T1 * 2^-3 + X
T3 = T2 * 2^-2 + X
Fraction result = T3 * 2^-2

Total Result = Decimal result + Fraction result

You can see there is one less equation as a result of the CSD optimization.

Horner.java

As mentioned, Horner.java is a program I wrote to quickly generate Horner equations and executable code for performing floating point multiplication and/or division on the MSP430 family of microcontrollers. Check out the sidebar at the end that shows how to run this program and what program options are available. The source code for Horner.java can be extracted from the included jar file listed in the sidebar. You may like to look at the code to see how signed multiplications are performed.

Conclusions

Horner’s method can be used to provide floating point support on microprocessors or microcontrollers that don’t have support built in. It is ideal for applications like mine, that aren’t written using a high level language. Horner’s method was just what I needed for my color organ application. After developing the digital filters using Horner’s method, I can say that the code for the color organ project can fit in the 2K bytes of Flash memory provided by the F2012 microcontroller (though just barely).

It may seem like a lot of work to use the techniques presented here, but often optimizing performance and/or memory usage of an application usually is work. NV

Running Horner.java

To run Horner.java, you will need to have a Java Development Kit (JDK) installed on your computer. The program was written for use with Java 5 which is available for free from Sun Microsystems at http://java.sun.com/javase/downloads/index_jdk5.jsp. The newer version 6 of Java should also work. An executable version of the Horner.java program as well as the source code are available with this article’s downloads.

The code can be run in a couple of ways. First, the source file Horner.java can be extracted from the jar file, compiled, and then run using a command shell (cmd.exe) with the following procedure:

jar xf Horner.jar This command extracts the source file from the jar javac Horner.java This command compiles the source file java Horner <command line arguments> This is how you run the program

Alternately, the code can be run directly from the jar file without having to extract anything. This is done as follows:

java -jar Horner.jar <command line arguments>

If neither of these methods work successfully, make sure the bin directory of the JDK installation is on the Path in your command shell. Alternately, you can change directories to the bin directory and run the commands from there.

Horner.java has quite a few command line argument possibilities. Help is available by executing the program (using either of the above methods) without specifying any arguments. You should see the following:

No command line arguments specified Horner.java V1.0 copyright 2007 by: Craig A. Lindley Generates Horner equations and MSP430 code for multiplication using only shifts and adds Program usage is as follows: java Horner [nocsd] [csd] [bin] [16bit] [12bit] [10bit]

[equ] [both] [help | ?] multiplier Where: nocsd - use binary not csd numbers in the calculations csd - use csd optimizations in the calculations bin - show binary representation of multiplier 16bit - use 16 bit binary precision 12bit - use 12 bit binary precision 10bit - use 10 bit binary precision code - generate the MSP430 code for the multiplication equ - generate Horner equations for the multiplication both - generate the MSP430 code and the equations for the multiplication help or ? - displays this message multiplier - the floating point number being multiplied. The multiplier must be the final argument. Program defaults: 16bit csd both Please try again

To see the Horner equations and the code generated for a multiplier, specify the multiplier as the final command line arguments as in the following: java -jar Horner.jar 0.12345 which will produce the following results: Multiplier: 0.12345 Bits of precision: 16 Horner equations: T1 = X * 2^-2 - X T2 = T1 * 2^-2 + X T3 = T2 * 2^-2 - X T4 = T3 * 2^-6 + X Fraction result = T4 * 2^-3 MSP430 code: Code conventions: Register “in” has the X value to be multiplied Register “acc” is the accumulator used for temporary values Register “out” holds final result mov in,acc rra acc rra acc sub in,acc rra acc rra acc add in,acc rra acc rra acc sub in,acc rra acc rra acc rra acc rra acc rra acc rra acc add in,acc rra acc rra acc rra acc mov acc,out You can play around with the program using the various command line arguments to see what their effects are.




The TI eZ430-F2013 Development Kit
Yes it’s true. You can get a complete TI MSP430 microcontroller development kit for $20 plus shipping. It’s called the eZ430-F2013 and is in the form of a USB stick (envision a jump drive). Amazingly, both the debugger interface and a target board fit inside the stick. This development system works with all MSP430F20xx devices. Even someone as frugal as myself could not pass up a deal like this. This offer puts microcontroller development within everyone’s reach.
The following information was extracted from the TI website.
Description

The eZ430-F2013 is a complete MSP430 development tool including all the hardware and software to evaluate the MSP430F2013 and develop a complete project in a convenient USB stick form factor. The eZ430-F2013 uses the IAR Embedded Workbench Integrated Development Environment (IDE) to provide full emulation with the option of designing with a stand-alone system or detaching the removable target board to integrate into an existing design. The USB port provides enough power to operate the ultra low power MSP430 so no external power supply is required.
Features

eZ430-F2013 development tool including a USB debugging interface and detachable MSP430F2013 target board
LED indicator
Removable USB stick enclosure
Debugging interface supports development with all MSP430F20xx devices
Integrated IAR Kickstart user interface which includes an assembler, linker, simulator, source-level debugger, and limited C compiler
Full documentation on CD-ROM

What’s Included

CD-ROM including software and documentation
IAR Embedded Workbench (Kickstart Version) IDE
eZ430-F2013 Development Tool

In addition to the development system, a set of three MSP430-F2012 target boards is available for $10. TI’s part number for these boards is eZ430-T2012.  These boards are about the size of a quarter cut in half. The F2012 processor itself is about the size of the fingernail on your little finger. I used one of these target boards for my color organ project.
The included C compiler is limited to the production of 4K of code which means it cannot be used on the largest of the MSP430F20xx devices. TI makes available a free C compiler on its website that can generate up to 8K of code. Since my first MSP430 project was written entirely in assembly language, I didn’t try out this alternative C compiler.



Resources
Java version 5 is available at http://java.sun.com/javase/downloads/index_jdk5.jsp.
A Texas Instruments application note discussing Horner’s method is available at https://www.ti.com/sitesearch/en-us/docs/universalsearch.tsp?langPref=en-US&searchTerm=slaa329a&nr=2#q=slaa329a&numberOfResults=25.
For information on computer arithmetic and algorithms see the book Computer Organization by Carl Hamacher, Zvonko Vranesic and Safawat Zaky, 3rd edition, McGraw-Hill Publications, 1990.



Downloads
 download
Floating Point Multiplication and Division (Java File)