page 없어질 까봐 스크랩 함.
Hi All,
A code or algorithm must be correct. When it is, after some debugging, we can make a compromise between its code size and speed. There is always a compromise, but sometimes it does not matter. When either speed or size is a factor, there are methods to boost performance. There might be a 'Small, Smaller, Smallest' flavor of coding for a given language/hardware combination, but in this kind of thread I am looking for very fast codes. As, we have these days the luxury of more and more code space, but of time, we don't.
Even when divide hardware is available, it is quite common to find that divide instructions take more than twice as long as multiply instructions. When division and multiplication are done by software, dividing is usually slower than multiplying, too. However, we can improve speed by noticing, that many integer division operations found in real programs involves division by a constant. When we know the numbers we are dividing by in advance, we can substitute the divisions with multiplications and with some bit shifts.
When the constant to divide by is a power of two, we can shift the number to be divided to the right by the exponent, and that will complete division
This bit shift is only one operation, four ticks with the Propeller. Can we do something similar when the constant is not a power of two? Yes. The·basic idea is to approximate the ratio (1/constant) by another rational number (numerator/denominator) with a power of two as the denominator. Instead of dividing, we can then multiply by the numerator, and shift by the exponent in the denominator. In case of 10, for example, the (1/10) ratio can be substituted with· (205/2048). The (some_integer/10) division can be calculated with a multiplication (some_integer*205), then the product is shifted right 11 bits to get the result of the division by 10. In short notation
To divide with 100, the rational number (41/4096) works nicely
For 1000 we can realize that it is 8*125. So, if we first shift 'some_integer' to the right by 3, we only need to divide the result by 125. When (1/125) is approximated as (67109/2^23)=(67109/8388608), then the simple formula
works for integers up to almost 400_000 with high accuracy.
To figure out the (numerator/denominator(=2^N)) approximation for an arbitrary (1/constant) factor, first we have to fix the required relative accuracy of the division. When 1% relative accuracy is adequate for the application
- we multiply the constant with 100.
- Then we find the smallest power of two, that is bigger than the product.·For (1/2009), 262144 is that, so the denominator is 2^18.
- Then, we·obtain the numerator by dividing 2^18 by the constant 2009. That gives 130.48. After rounding to the nearest integer, we have··the·rational approximation·of (1/2009) as (130/2^18).·
It goes, in short
Or, doing an evident simplification, it is
For a ten times better relative accuracy of 0.1%, the formula is
After simplification, it is
The previous simplications decrease the numerator and are useful, since the (some_integer*numerator) product has to be smaller than 2^32, to avoid overflow with the 32-bit multiplication.· 2009 is an odd number. For even constants we can shift out the power of two factor from their primes, before the division. So, to divide with 2010 at 0.1% accuracy, one can use
The initial right shift for an even constant increases the range of validity of the algorithm at the cost of an additional operation.I wonder wether all these beat unrolled division?
We have just substituted a division by a multiplication, but elimination of multiply instructions is itself desirable. For example, according to the Pentium instruction timings, the base time for a 32-bit integer add or subtract on that processor (discounting operand fetch and other overhead) is 1 clock cycle, and both can sometimes be paired with other instructions for concurrent execution. In contrast, a 32-bit multiply takes 10 cycles and a divide takes 41 cycles, and neither can be paired. Clearly, these instructions should be avoided wherever possible.
In case of multiplication with a constant we may ask, that
- Can we can do faster integer multiplication with a constant with only· bit shifts and additions? (The answer is probably: yes. And remember, numerators were contants...)
- Can the (1/constant) ratio be calculated even faster with only bit shifts and additions?
I think the answer for the 2nd. question answers the 3rd, too.··
Trespassing the Regime of Fixed-point math (or not?)...
-------------------------------------------------------------
Fixed-point representations, in which the point is implicitly placed between any bits of the binary representation of a number, have been used since the dawn of the computer age. In the PC world fixed-point arithmetic has become a lost art among programmers. Since the introduction of the 486 CPU, a fast floating-point processor is an integrated part of the CPU, and therefore there is rarely a need to work with fixed-point numbers anymore. In embedded applications fixed-point arithmetic has still its places, and there is no excuse to use inefficient or imprecise methods with those limited resources. Moving away from integer arithmetic to fixed-point numbers is one step forward to close the gap between the speed of integer math and the ease of use of floating point arithmetic.
When fixed-point numbers are used, all arithmetic operations use integer arithmetic, and this leads to a considerable performance advantage. This comes from that a binary fixed point number is, for all purposes, an integer number. You simply cannot tell the difference by looking at it, and neither can the machine level math functions. For those math functions, the numbers are just integers. That's the beauty of it: you use the same machine level functions you would for integer math, plus a little bit of housekeeping for the point at multiplications and divisions. (John Von Neumann said once, that a competent programmer never needs floating point hardware because he should always be able to keep the point in his head. Well, his words are just remembered by others, he never wrote them down...)
With fixed-point arithmetic we can choose the precision of the numbers, because we can distribute bit usage between fractional and integer parts of a fixed number in any way we like. So, we can adapt a very fast code to very different accuracy demands.
In fixed-point, the simple integer division is the worst basic operation, because it loses precision the most. To somewhat remedy that, we can use multiplication by the reciprocal of that number (1/number) instead. To whet your appetite for fast and effective fixed point ASM code, I show a way to divide by 10 with faster, more efficiently and·more accurately, than·with the discussed acceleration method.
To make it even faster
If you know other tricks or methods to speed up integer or fixed-point division or multiplication, let us know, too. SPIN or PASM test codes to check, improve or maybe to bust these ideas, are welcome.
Cheers,
istvan
Post Edited (cessnapilot) : 7/29/2009 7:10:30 PM GMT
Comments List
A % B == 0인 게 알려져 있을 때 A / B를 곱셈 한 번(+ B가 짝수일 경우 시프트 한 번)으로 계산하는 방법은 알려져 있습니다(Z/(B)Z에 대한 B의 역수를 구해서 곱하면 됩니다). A가 임의의 숫자라면 A % B를 먼저 구해서 (A - A % B) / B를 구하는 방법을 쓸 수는 있는데, 이 경우에는 B가 작은 경우가 아니면 그다지 간단한 편은 아닙니다. Hackers' Delight 같은 책도 찾아 보세요.
감사합니다. 요즘에 개발로 보직을 변경해서.. 이제껏 좀 깊이 파고 들어가고 있습니다. 다만 스스로 찾아서 하려니 쉽지 않네요. 좋은 정보 감사합니다.