Find first set

{{Short description|Family of related bitwise operations on machine words}} {{Use dmy dates|date=January 2020|cs1-dates=y}} In computer software and hardware, '''find first set''' ('''ffs''') or '''find first one''' is a bit operation that, given an unsigned machine word,<ref group="nb" name="NB1"/> designates the index or position of the least significant bit set to one in the word counting from the least significant bit position. A nearly equivalent operation is '''count trailing zeros''' ('''ctz''') or '''number of trailing zeros''' ('''ntz'''), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is '''log base 2''', so called because it computes the binary logarithm {{math|⌊log{{sub|2}}(x)⌋}}.<ref name="Anderson_1"/> This is closely related to '''count leading zeros''' ('''clz''') or '''number of leading zeros''' ('''nlz'''), which counts the number of zero bits preceding the most significant one bit.<ref group="nb" name="NB2"/> There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1,<ref name="Linux_2012_FFS3"/> herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.

Most modern CPU instruction set architectures provide one or more of these as hardware operators; software emulation is usually provided for any that aren't available, either as compiler intrinsics or in system libraries.

== Examples == Given the following 32-bit word:

: 0000 0000 0000 0000 1000 0000 0000 1000

The count trailing zeros operation would return 3, while the count leading zeros operation returns 16. The count leading zeros operation depends on the word size: if this 32-bit word were truncated to a 16-bit word, count leading zeros would return zero. The find first set operation would return 4, indicating the 4th position from the right. The truncated log base 2 is 15.

Similarly, given the following 32-bit word, the bitwise negation of the above word:

: 1111 1111 1111 1111 0111 1111 1111 0111

The count trailing ones operation would return 3, the count leading ones operation would return 16, and the find first zero operation ffz would return 4.

If the word is zero (no bits set), count leading zeros and count trailing zeros both return the number of bits in the word, while ffs returns zero. Both log base 2 and zero-based implementations of find first set generally return an undefined result for the zero word.

== Hardware support == Many architectures include instructions to rapidly perform find first set and/or related operations, listed below. The most common operation is count leading zeros (clz), likely because all other operations can be implemented efficiently in terms of it (see Properties and relations).

{|class="wikitable" ! Platform !! Mnemonic !! Name !! Operand widths !! Description !! On application to 0 |- | ARM (ARMv5T architecture and later) except Cortex-M0/M0+/M1/M23 || clz<ref name="ARM_2012_CLZ"/> || Count Leading Zeros || 32 || clz || 32 |- | ARM (ARMv8-A architecture) || clz || Count Leading Zeros || 32, 64 || clz || Operand width |- | AVR32 || clz<ref name="Atmel_AVR32"/> || Count Leading Zeros || 32 || clz || 32 |- | rowspan=2 | DEC Alpha | ctlz<ref name="Compaq_2002_Alpha"/> || Count Leading Zeros || 64 || clz || 64 |- | cttz<ref name="Compaq_2002_Alpha"/> || Count Trailing Zeros || 64 || ctz || 64 |- | rowspan=2 | Intel 80386 and later | bsf<ref name="Intel_64-32_DM-2A"/> || Bit Scan Forward || 16, 32, 64 || ctz || Undefined; sets zero flag |- | bsr<ref name="Intel_64-32_DM-2A"/> || Bit Scan Reverse || 16, 32, 64 || Log base 2 || Undefined; sets zero flag |- | x86 supporting BMI1 or ABM || lzcnt<ref name="AMD_2011_AMD64"/> || Count Leading Zeros || 16, 32, 64 || clz || Operand width; sets carry flag |- | x86 supporting BMI1 || tzcnt<ref name="AMD_2013_AMD64"/> || Count Trailing Zeros || 16, 32, 64 || ctz || Operand width; sets carry flag |- | Itanium || clz<ref name="Intel_Itanium_DM-3"/> || Count Leading Zeros || 64 || clz || 64 |- | rowspan=2 | MIPS32/MIPS64 | clz<ref name="MIPS_2011_32"/><ref name="MIPS_2011_64"/> || Count Leading Zeros in Word || 32, 64 || clz || Operand width |- | clo<ref name="MIPS_2011_32"/><ref name="MIPS_2011_64"/> || Count Leading Ones in Word || 32, 64 || clo || Operand width |- | Motorola 68020 and later || bfffo<ref name="Motorola_1992"/> || Find First One in Bit Field || Arbitrary || Log base 2 || Field offset + field width |- | PDP-10 || jffo || Jump if Find First One || 36 || clz || 0; no operation (i.e., jumps on nonzero) |- | POWER/PowerPC/Power ISA || cntlz/cntlzw/cntlzd<ref name="Frey_PowerPC"/> || Count Leading Zeros || 32, 64 || clz || Operand width |- | Power ISA 3.0 and later || cnttzw/cnttzd<ref name="IBM_PowerISA"/> || Count Trailing Zeros || 32, 64 || ctz || Operand width |- | rowspan=2 | RISC-V ("B" Extension) | clz<ref name="Wolf_2019_RISC-V-B"/> || Count Leading Zeros || 32, 64 || clz || Operand width |- | ctz<ref name="Wolf_2019_RISC-V-B"/> || Count Trailing Zeros || 32, 64 || ctz || Operand width |- | SPARC Oracle Architecture 2011 and later || lzcnt (synonym: lzd)<ref name="Oracle_2011_SPARC"/> || Leading Zero Count || 64 || clz || 64 |- | VAX || ffs<ref name="DEC_1987_VAX"/> || Find First Set || 0–32 || ctz || Operand width; sets zero flag |- | rowspan=3 | IBM z/Architecture | flogr<ref name="IBM_Z_C22"/> || Find Leftmost One || 64 || clz || 64 |- | vclz<ref name="IBM_Z_C22"/> || Vector Count Leading Zeroes || 8, 16, 32, 64 || clz || Operand width |- | vctz<ref name="IBM_Z_C22"/> || Vector Count Trailing Zeroes || 8, 16, 32, 64 || ctz || Operand width |}

On some Alpha platforms CTLZ and CTTZ are emulated in software.

==Tool and library support== A number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and/or related operations, which are frequently implemented in terms of the hardware instructions above:

{|class="wikitable" !Tool/library!!Name!!Type!!Input type(s)!!Notes!!On application to 0 |- |C23 standard library<ref>{{cite book |last1=Meneide |first1=JeanHeyd |last2=Wiedijk | first2=Freek |title=Information technology — Programming languages — C, N3220 working draft |date=2024-02-22 |publisher=ISO/IEC |pages=305–308 |url=https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf |access-date=7 August 2025}}</ref><ref>{{cite web |title=Standard library header <stdbit.h> (C23) |url=https://en.cppreference.com/w/c/header/stdbit.html |website=cppreference.com |access-date=7 August 2025}}</ref>||<code>stdc_first_trailing_one[_uc,_us,_ui,_ul,_ull]</code> <code>stdc_trailing_zeros[_uc,_us,_ui,_ul,_ull]</code> <code>stdc_first_leading_one[_uc,_us,_ui,_ul,_ull]</code> <code>stdc_leading_zeros[_uc,_us,_ui,_ul,_ull]</code>||Library function||unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long ||Functions for <code>zero</code> and <code>ones</code> are also provided.||0 (bit index) Operand width (count) |- |C++20 standard library<ref>{{cite book |last1=Smith |first1=Richard |title=N4861 Working Draft, Standard for Programming Language C++ |date=2020-04-01 |publisher=ISO/IEC |pages=1150–1153 |url=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4861.pdf |access-date=25 May 2020}}</ref><ref name="cppreference_header_bit">{{cite web |title=Standard library header <bit> |url=https://en.cppreference.com/w/cpp/header/bit |website=cppreference.com |access-date=25 May 2020}}</ref>||<code>bit_ceil bit_floor</code> <code>bit_width</code> <code>countl_zero countl_one</code> <code>countr_zero countr_one</code>||Library function||unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long|| || |- |POSIX.1 compliant libc 4.3BSD libc OS X 10.3 libc<ref name="Linux_2012_FFS3" /><ref name="Apple_1994_FFS3" />||<code>ffs</code>||Library function||int||Includes glibc. POSIX does not supply the complementary log base 2 / clz.||0 |- |FreeBSD 5.3 libc OS X 10.4 libc<ref name="Apple_1994_FFS3" />||<code>ffsl</code> <code>fls</code> <code>flsl</code>||Library function||int, long||fls("find last set") computes (log base 2) + 1.||0 |- |FreeBSD 7.1 libc<ref name="FreeBSD_2012_FFS3" />||<code>ffsll</code> <code>flsll</code>||Library function||long long|| ||0 |- |GCC 3.4.0<ref name="GCC_2015_Functions" /><ref name="GCC_2015_Changes" /> Clang 5.x<ref name="LLVM_Clang_Extensions" /><ref name="LLVM_Clang_Sources" />||<code>__builtin_ffs[l,ll,imax]</code> <code>__builtin_clz[l,ll,imax]</code> <code>__builtin_ctz[l,ll,imax]</code> ||Built-in functions||unsigned int, unsigned long, unsigned long long, uintmax_t||GCC documentation considers result undefined clz and ctz on 0.||0 (ffs) |- |Visual Studio 2005||<code>_BitScanForward</code><ref name="Microsoft_2008_Intrinsics_1" /> <code>_BitScanReverse</code><ref name="Microsoft_2008_Intrinsics_2" />||Compiler intrinsics||unsigned long, unsigned __int64||Separate return value to indicate zero input||Undefined |- |Visual Studio 2008||<code>__lzcnt</code><ref name="Microsoft_2008_Intrinsics_3" />||Compiler intrinsic||unsigned short, unsigned int, unsigned __int64||Relies on hardware support for the lzcnt instruction introduced in BMI1 or ABM.||Operand width |- |Visual Studio 2012||<code>_arm_clz</code><ref name="Microsoft_2012_Intrinsics_1" />||Compiler intrinsic||unsigned int||Relies on hardware support for the clz instruction introduced in the ARMv5T architecture and later.||? |- |Intel C++ Compiler||<code>_bit_scan_forward</code> <code>_bit_scan_reverse</code><ref name="Intel_Intrinsics_Guide" /><ref name="Intel_2006_Intrinsics" />||Compiler intrinsics||int|| ||Undefined |- |rowspan=2|Nvidia CUDA<ref name="NVIDIA_2010_CUDA" />||<code>__clz</code>||rowspan=2|Functions||rowspan=2|32-bit, 64-bit||rowspan=2|Compiles to fewer instructions on the GeForce 400 series||32 |- |<code>__ffs</code>||0 |- |LLVM||<code>llvm.ctlz.*</code> <code>llvm.cttz.*</code><ref name="LLVM_Intrinsic" />||Intrinsic||8, 16, 32, 64, 256||LLVM assembly language||Operand width, if 2nd argument is 0; undefined otherwise |- |GHC 7.10 (base 4.8), in <code>Data.Bits</code>{{citation needed|date=May 2020}}||<code>countLeadingZeros</code> <code>countTrailingZeros</code>||Library function||<code>FiniteBits b => b</code>||Haskell programming language||Operand width |- |Rust standard library<ref>{{Cite web |title=i32 - Rust |url=https://doc.rust-lang.org/std/primitive.i32.html |access-date=2025-11-18 |website=doc.rust-lang.org}}</ref><ref>{{Cite web |title=u32 - Rust |url=https://doc.rust-lang.org/std/primitive.u32.html |access-date=2025-11-18 |website=doc.rust-lang.org}}</ref><ref>{{Cite web |title=NonZero in std::num - Rust |url=https://doc.rust-lang.org/std/num/struct.NonZero.html |access-date=2025-11-18 |website=doc.rust-lang.org}}</ref>||<code>highest_one</code> <code>leading_zeros</code> <code>lowest_one</code> <code>trailing_zeros</code>||Library method||All primitive integer types and other integer types such as <code>NonZero<T></code>||<code>highest_one</code> and <code>lowest_one</code> are not yet stable.|| |- |Zig<ref>{{Cite web |title=Documentation - The Zig Programming Language |url=https://ziglang.org/documentation/0.15.2/#clz |access-date=2025-11-18 |website=ziglang.org}}</ref><ref>{{Cite web |title=Documentation - The Zig Programming Language |url=https://ziglang.org/documentation/0.15.2/#ctz |access-date=2025-11-18 |website=ziglang.org}}</ref>||<code>@clz</code> <code>@ctz</code>||builtin function||integer or vector|||| |}

== Properties and relations == If bits are labeled starting at 1 (which is the convention used in this article), then count trailing zeros and find first set operations are related by {{math|1=ctz(''x'') = ffs(''x'') − 1}} (except when the input is zero). If bits are labeled starting at {{math|0}}, then count trailing zeros and find first set are exactly equivalent operations. Given {{math|''w''}} bits per word, the {{math|log2}} is easily computed from the {{math|clz}} and vice versa by {{math|1=log2(''x'') = ''w'' − 1 − clz(''x'')}}.

As demonstrated in the example above, the find first zero, count leading ones, and count trailing ones operations can be implemented by negating the input and using find first set, count leading zeros, and count trailing zeros. The reverse is also true.

On platforms with an efficient log2 operation such as M68000, {{math|ctz}} can be computed by:

: {{math|ctz(''x'') {{=}} log2(''x'' & ''−x'')}}

where {{math|&}} denotes bitwise AND and {{math|''−x''}} denotes the two's complement of {{math|''x''}}. The expression {{math|''x'' & ''−x''}} clears all but the least-significant {{math|1}} bit, so that the most- and least-significant {{math|1}} bit are the same.

On platforms with an efficient count leading zeros operation such as ARM and PowerPC, {{math|ffs}} can be computed by:

: {{math|ffs(''x'') {{=}} w − clz(''x'' & ''−x'')}}.

Conversely, on machines without {{math|log2}} or {{math|clz}} operators, {{math|clz}} can be computed using {{math|ctz}}, albeit inefficiently:

: {{math|clz {{=}} ''w'' − ctz(2⌈log{{sub|2}}(''x'')⌉)}} (which depends on {{math|ctz}} returning {{math|w}} for the zero input)

On platforms with an efficient Hamming weight (population count) operation such as SPARC's <code>POPC</code><ref name="SPARC_1992_A41"/><ref name="Warren_2013"/> or Blackfin's <code>ONES</code>,<ref name="AD_2001"/> there is:

: {{math|ctz(''x'') {{=}} popcount((''x'' & ''−x'') − 1)}},<ref name="Dietz"/><ref name="Isenberg"/> or {{math|{{pipe escape|ctz(''x'') {{=}} popcount(~(''x'' | ''−x''))}}}}, : {{math|ffs(''x'') {{=}} popcount(''x'' ^ ~−''x'')}}<ref name="SPARC_1992_A41"/> : {{math|clz {{=}} 32 − popcount(2⌈log{{sub|2}}(''x'')⌉ − 1)}}

where {{math|^}} denotes bitwise exclusive-OR, {{math|{{pipe escape||}}}} denotes bitwise OR and {{math|~}} denotes bitwise negation.

The inverse problem (given {{math|''i''}}, produce an {{math|''x''}} such that {{math|ctz(''x'') {{=}} ''i''}}) can be computed with a left-shift ({{math|1 << ''i''}}).

Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not all-zero (for {{math|ffs}}, {{math|ctz}}, {{math|clz}}) or not all-one (for {{math|ffz}}, {{math|clo}}, {{math|cto}}) is encountered. A tree data structure that recursively uses bitmaps to track which words are nonzero can accelerate this.

==Software emulation== Most CPUs dating from the late 1980s onward have bit operators for ffs or equivalent, but a few modern ones like some of the ARM-Mx series do not. In lieu of hardware operators for ffs, clz and ctz, software can emulate them with shifts, integer arithmetic and bitwise operators. There are several approaches depending on architecture of the CPU and to a lesser extent, the programming language semantics and compiler code generation quality. The approaches may be loosely described as linear search, binary search, search+table lookup, de Bruijn multiplication, floating point conversion/exponent extract, and bit operator (branchless) methods. There are tradeoffs between execution time and storage space as well as portability and efficiency.

Software emulations are usually deterministic. They return a defined result for all input values; in particular, the result for an input of all zero bits is usually 0 for ffs, and the bit length of the operand for the other operations.

If one has a hardware clz or equivalent, ctz can be efficiently computed with bit operations, but the converse is not true: clz is not efficient to compute in the absence of a hardware operator.

===2n=== The function {{math|2⌈log{{sub|2}}(x)⌉}} (round up to the nearest power of two) using shifts and bitwise ORs<ref name="Anderson_2"/> is not efficient to compute as in this 32-bit example and even more inefficient if we have a 64-bit or 128-bit operand:

'''function''' pow2(x): '''if''' x = 0 return ''invalid'' // ''invalid'' is implementation defined (not in [0,63]) x ← x - 1 '''for each''' y '''in''' {1, 2, 4, 8, 16}: x ← x | (x >> y) '''return''' x + 1

===FFS=== Since ffs = ctz + 1 (POSIX) or ffs = ctz (other implementations), the applicable algorithms for ctz may be used, with a possible final step of adding 1 to the result, and returning 0 instead of the operand length for input of all zero bits.

===CTZ=== {{anchor|ctz1}}The canonical algorithm is a loop counting zeros starting at the LSB until a 1-bit is encountered:

'''function''' ctz1 (x) '''if''' x = 0 '''return''' w t ← 1 r ← 0 '''while''' (x & t) = 0 t ← t << 1 r ← r + 1 '''return''' r

This algorithm executes ''O''(''w'') time and operations, and is impractical in practice due to a large number of conditional branches.

An exception is if the inputs are uniformly distributed. In that case, we can rely on the fact that half the return values will be 0, one quarter will be 1, and so on. The average number of loop iterations per function call is 1, and the algorithm executes in ''O''(1) average-case time.

{{anchor|ctz2}}A lookup table can eliminate most branches:

table[1..2n-1] = ctz(i) '''for''' i '''in''' 1..2n-1 '''function''' ctz2 (x) '''if''' x = 0 '''return''' w r ← 0 '''while''' (x & (2n−1)) ≠ 0 x ← x >> n r ← r + n '''return''' r + table[x & (2n−1)]

The parameter ''n'' is fixed (typically 8) and represents a time–space tradeoff. The loop may also be fully unrolled. But as a linear lookup, this approach is still O(n) in the number of bits in the operand.

{{anchor|ctz2a}}If ''n'' = 4 is chosen, the table of 16 2-bit entries can be encoded in a single 32-bit constant using SIMD within a register techniques:

''// binary {{gaps|00|01|00|10|00|01|00|11|00|01|00|10|00|01|00|xx}}'' table ← 0x12131210 '''function''' ctz2a (x) '''if''' x = 0 '''return''' w r ← 0 '''while''' (x & 15) = 0 x ← x >> 4 r ← r + 4 '''return''' r + ((table >> 2*(x & 15)) & 3);

This technique is quite practical in applications such as the binary GCD algorithm, where the ''x'' values are uniformly distributed, so the iterative loop is not needed {{frac|15|16}} of the time, and suffers minimal branch misprediction penalty.

{{anchor|ctz3}}A binary search implementation takes a logarithmic number of operations and branches, as in this 32-bit version:{{r|Warren_2013_5-3|Warren_2013_5-4}}

'''function''' ctz3 (x) '''if''' x = 0 '''return''' 32 n ← 0 '''if''' (x & 0x0000FFFF) = 0: n ← n + 16, x ← x >> 16 '''if''' (x & 0x000000FF) = 0: n ← n + 8, x ← x >> 8 '''if''' (x & 0x0000000F) = 0: n ← n + 4, x ← x >> 4 '''if''' (x & 0x00000003) = 0: n ← n + 2, x ← x >> 2 '''if''' (x & 0x00000001) = 0: n ← n + 1 ''// Equivalently, n ← n + 1 - (x & 1)'' '''return''' n

This algorithm can be assisted by a table as well, replacing the last 2 or 3 if statements with a 16- or 256-entry lookup table using the least significant bits of {{code|x}} as an index.

{{anchor|ctz4}}As mentioned in {{slink||Properties and relations}}, if the hardware has a clz operator, the most efficient approach to computing ctz is thus:

'''function''' ctz4 (x) '''if''' x = 0 '''return''' w ''// Isolates the LSB'' x ← x & −x '''return''' w − 1 − clz(x)

{{anchor|ctz4a}}A similar technique can take advantage of a population count instruction:

'''function''' ctz4a (x) '''if''' x = 0 '''return''' w ''// Makes a mask of the least-significant bits'' x ← x ^ (x − 1) '''return''' popcount(x) − 1

{{anchor|ctz5}}An algorithm for 32-bit ctz uses a de Bruijn sequence to construct a minimal perfect hash function that eliminates all branches.{{r|Leiserson_1998|Busch_2009}} This algorithm assumes that the result of the multiplication is truncated to 32 bit.

'''for''' i '''from''' 0 '''to''' 31: table[ 0x077CB531 <> 27 & 31 ] ← i ''// table [0..31] initialized'' '''function''' ctz5 (x) '''if''' x = 0 '''return''' 32 '''return''' table[((x & −x) * 0x077CB531) >> 27 & 31]

The expression {{code|(x & −x)}} again isolates the least-significant 1 bit. There are then only 32 possible words, which the unsigned multiplication and shift hash to the correct position in the table. This algorithm is branch-free if it does not need to handle the zero input.

The technique can be extended to 64-bit words.{{r|Isenberg2}}

A minor variant uses the expression {{code|(x & (x−1))}} to compute a mask of ''n'' trailing ones, as in the popcount method, and a different minimal perfect hash multiplier.{{r|Isenberg2}} This allows the lookup table to be shared with a count leading zeros implementation.

This also permits a ''folded'' implementation using a half-width multiply.{{r|Isenberg3}} After computing the mask, the bitwise xor of the top and bottom halves is sufficient to uniquely identify the original mask. A suitable multiplier produces a minimal perfect hash which may be decoded by a lookup table: '''for''' i '''from''' 0 '''to''' 31: table[ ((0xffff0000 >> i) * 0x70a7) >> 11 & 31 ] ← 31 - i ''// table [0..31] initialized'' '''function''' ctz5 (x) '''if''' x = 0 '''return''' 32 x ← x ^ (x - 1) x ← x ^ (x >> 16) '''return''' table[(x * 0x70a7) >> 11 & 31] ''// 0x70d3 also works''

This is advantageous if the narrower multiply saves more time than the folding step adds. The technique can also be extended to 64-bit words, using the 32-bit multipliers {{code|0x783a9b23}}, {{code|0x78291d9b}}, {{code|0x782c8d4f}}, or {{code|0x78291acf}}.{{r|Isenberg3}}

===CLZ=== {{anchor|clz1}}The canonical algorithm examines one bit at a time starting from the MSB until a non-zero bit is found, as shown in this example. It executes in O(n) time where n is the bit-length of the operand, and is not a practical algorithm for general use.

'''function''' clz1 (x) '''if''' x = 0 '''return''' w t ← 1 << (w - 1) r ← 0 '''while''' (x & t) = 0 t ← t >> 1 r ← r + 1 '''return''' r

{{anchor|clz2}}An improvement on the previous looping approach examines eight bits at a time then uses a 256 (28) entry lookup table for the first non-zero byte. This approach, however, is still O(n) in execution time.

'''function''' clz2 (x) '''if''' x = 0 '''return''' w t ← 0xff << (w - 8) r ← 0 '''while''' (x & t) = 0 t ← t >> 8 r ← r + 8 '''return''' r + table[x >> (w - 8 - r)]

{{anchor|clz3}}Binary search can reduce execution time to O(log2n):

'''function''' clz3 (x) '''if''' x = 0 '''return''' 32 n ← 0 '''if''' (x & 0xFFFF0000) = 0: n ← n + 16, x ← x << 16 '''if''' (x & 0xFF000000) = 0: n ← n + 8, x ← x << 8 '''if''' (x & 0xF0000000) = 0: n ← n + 4, x ← x << 4 '''if''' (x & 0xC0000000) = 0: n ← n + 2, x ← x << 2 '''if''' (x & 0x80000000) = 0: n ← n + 1 '''return''' n

The fastest portable approaches to simulate clz are a combination of binary search and table lookup: an 8-bit table lookup (28=256 1-byte entries) can replace the bottom 3 branches in binary search. 64-bit operands require an additional branch. A larger width lookup can be used but the maximum practical table size is limited by the size of L1 data cache on modern processors. Saving a branch is more than offset by the latency of an L1 cache miss.

{{anchor|clz4}}An algorithm similar to de Bruijn multiplication for CTZ works for CLZ, but rather than isolating the most-significant bit, it rounds up to the nearest integer of the form 2''n''−1 using shifts and bitwise ORs:<ref name="Anderson_3"/> table[0..31] = {0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31} '''function''' clz4 (x) '''for each''' y '''in''' {1, 2, 4, 8, 16}: x ← x | (x >> y) '''return''' table[((x * 0x07C4ACDD) >> 27) % 32]

{{anchor|clz5}}For processors with deep pipelines, like Prescott and later Intel processors, it may be faster to replace branches by bitwise AND and OR operators (even though many more instructions are required) to avoid pipeline flushes for mispredicted branches (and these types of branches are inherently unpredictable):

'''function''' clz5 (x) r = (x > 0xFFFF) << 4; x >>= r; q = (x > 0xFF ) << 3; x >>= q; r |= q; q = (x > 0xF ) << 2; x >>= q; r |= q; q = (x > 0x3 ) << 1; x >>= q; r |= q; r |= (x >> 1); '''return''' r;

{{anchor|clz6}}On platforms that provide hardware conversion of integers to floating point, the exponent field can be extracted and subtracted from a constant to compute the count of leading zeros. Corrections are needed to account for rounding errors.<ref name="Warren_2013_5-3"/><ref name="Anderson_4"/> Floating point conversion can have substantial latency. This method is highly non-portable and not usually recommended. <syntaxhighlight lang="c"> int x; int r; union { unsigned int u[2]; double d; } t;

t.u[LE] = 0x43300000; // LE is 1 for little-endian t.u[!LE] = x; t.d -= 4503599627370496.0; r = (t.u[LE] >> 20) - 0x3FF; // log2 r++; // CLZ </syntaxhighlight>

==Applications== The count leading zeros (clz) operation can be used to efficiently implement ''normalization'', which encodes an integer as ''m'' × 2''e'', where ''m'' has its most significant bit in a known position (such as the highest position). This can in turn be used to implement Newton–Raphson division, perform integer to floating point conversion in software, and other applications.<ref name="Warren_2013_5-3"/><ref name="Sloss_2004"/>

Count leading zeros (clz) can be used to compute the 32-bit predicate "x = y" (zero if true, one if false) via the identity {{tt|clz(x − y) >> 5}}, where ">>" is unsigned right shift.<ref name="Warren_2013_2-11"/> It can be used to perform more sophisticated bit operations like finding the first string of ''n'' 1 bits.<ref name="Warren_2013_6-2"/> The expression {{tt|1 << (16 − clz(x − 1)/2)}} is an effective initial guess for computing the square root of a 32-bit integer using Newton's method.<ref name="Warren_2013_11-1"/> CLZ can efficiently implement ''null suppression'', a fast data compression technique that encodes an integer as the number of leading zero bytes together with the nonzero bytes.<ref name="Schlegel_2010"/> It can also efficiently generate exponentially distributed integers by taking the clz of uniformly random integers.<ref name="Warren_2013_5-3"/>

The log base 2 can be used to anticipate whether a multiplication will overflow, since {{math|⌈log{{sub|2}}(xy)⌉ ≤ ⌈log{{sub|2}}(x)⌉ + ⌈log{{sub|2}}(y)⌉}}.<ref name="Warren_2013_2-12"/>

Count leading zeros and count trailing zeros can be used together to implement Gosper's loop-detection algorithm,<ref name="Gosper_1972"/> which can find the period of a function of finite range using limited resources.<ref name="Warren_2013_5-4"/>

The binary GCD algorithm spends many cycles removing trailing zeros; this can be replaced by a count trailing zeros (ctz) followed by a shift. A similar loop appears in computations of the hailstone sequence.

A bit array can be used to implement a priority queue. In this context, find first set (ffs) is useful in implementing the "pop" or "pull highest priority element" operation efficiently. The Linux kernel real-time scheduler internally uses <code>sched_find_first_bit()</code> for this purpose.<ref name="Aas_2005"/>

The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem: the disks are numbered from zero, and at move ''k'', disk number ctz(''k'') is moved the minimum possible distance to the right (circling back around to the left as needed). It can also generate a Gray code by taking an arbitrary word and flipping bit ctz(''k'') at step ''k''.<ref name="Warren_2013_5-4"/>

==See also== * {{Annotated link|Bit Manipulation Instruction Sets}} * Trailing zero * Leading zero * Trailing digit * Leading digit * Bit-length

==Notes== {{Reflist|group="nb"|refs= <ref group="nb" name="NB1">Using bit operations on other than an unsigned machine word may yield undefined results.</ref> <ref group="nb" name="NB2">These four operations also have (much less common) negated versions: * '''find first zero''' ('''ffz'''), which identifies the index of the least significant zero bit; * '''count trailing ones''', which counts the number of one bits following the least significant zero bit. * '''count leading ones''', which counts the number of one bits preceding the most significant zero bit; * find the index of the most significant zero bit, which is an inverted version of the binary logarithm.</ref> }}

==References== {{Reflist|refs= <ref name="Anderson_1">Anderson. [http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)].</ref> <ref name="Anderson_2">Anderson. [http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 Round up to the next highest power of 2].</ref> <ref name="Anderson_3">Anderson. [http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup].</ref> <ref name="Anderson_4">Anderson. [http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogIEEE64Float Find the integer log base 2 of an integer with a 64-bit IEEE float].</ref> <ref name="Linux_2012_FFS3">{{cite web |title=FFS(3) |url=https://www.kernel.org/doc/man-pages/online/pages/man3/ffs.3.html |work=Linux Programmer's Manual |publisher=The Linux Kernel Archives |access-date=2012-01-02}}</ref> <ref name="Apple_1994_FFS3">{{cite web |title=FFS(3) |url=https://developer.apple.com/library/mac/#documentation/darwin/reference/manpages/10.3/man3/ffs.3.html |work=Mac OS X Developer Library |publisher=Apple, Inc. |date=1994-04-19 |access-date=2012-01-04}}</ref> <ref name="FreeBSD_2012_FFS3">{{cite web |title=FFS(3) |url=http://www.freebsd.org/cgi/man.cgi?query=ffs&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html |work=FreeBSD Library Functions Manual |publisher=The FreeBSD Project |access-date=2012-01-04}}</ref> <ref name="SPARC_1992_A41">{{cite book |author=SPARC International, Inc. |title=The SPARC architecture manual: version 9 |date=1992 |publisher=Prentice Hall |location=Englewood Cliffs, New Jersey, USA |isbn=978-0-13-825001-0 |pages=[https://www.cs.utexas.edu/~novak/sparcv9.pdf#page=229 205] |edition=Version 9 |chapter=A.41: Population Count. Programming Note |url-access=registration |url=https://www.cs.utexas.edu/~novak/sparcv9.pdf#page=229}}</ref> <ref name="Warren_2013">{{cite book |title=Hacker's Delight |title-link=Hacker's Delight |author-first=Henry S. |author-last=Warren, Jr. |date=2013 |orig-date=2002 |edition=2 |publisher=Addison Wesley - Pearson Education, Inc. |isbn=978-0-321-84268-8 |id=0-321-84268-5}}</ref> <ref name="Warren_2013_2-11">Warren. Chapter 2-11: Comparison Predicates.</ref> <ref name="Warren_2013_2-12">Warren. Chapter 2-12: Overflow Detection.</ref> <ref name="Warren_2013_5-3">Warren. Chapter 5-3: Counting Leading 0's.</ref> <ref name="Warren_2013_5-4">Warren. Chapter 5-4: Counting Trailing 0's.</ref> <ref name="Warren_2013_6-2">Warren. Chapter 6-2: Find First String of 1-Bits of a Given Length.</ref> <ref name="Warren_2013_11-1">Warren. Chapter 11-1: Integer Square Root.</ref> <ref name="AD_2001">{{cite book |title=Blackfin Instruction Set Reference |date=2001 |publisher=Analog Devices |edition=Preliminary |pages=8–24 |id=Part Number 82-000410-14}}</ref> <ref name="Dietz">{{cite web |author-last=Dietz |author-first=Henry Gordon |author-link=Henry Gordon Dietz |title=The Aggregate Magic Algorithms |publisher=University of Kentucky |url=http://aggregate.org/MAGIC/#Trailing%20Zero%20Count |url-status=live |archive-url=https://web.archive.org/web/20191031070028/http://aggregate.org/MAGIC/#Trailing%20Zero%20Count |archive-date=2019-10-31}}</ref> <ref name="Isenberg">{{cite web |author-first=Gerd |author-last=Isenberg |title=BitScan: Index of LS1B by Popcount |work=Chess Programming Wiki (CPW) |date=2019-11-03 |orig-date=2012 |url=https://www.chessprogramming.org/BitScan#Index_of_LS1B_by_Popcount |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20200109231233/https://www.chessprogramming.org/BitScan#Index_of_LS1B_by_Popcount |archive-date=2020-01-09}}</ref> <ref name="Isenberg2">{{cite web |author-first=Gerd |author-last=Isenberg |title=BitScan: De Bruijn Multiplication |work=Chess Programming Wiki (CPW) |date=2019-11-03 |orig-date=2012 |url=https://www.chessprogramming.org/BitScan#De_Bruijn_Multiplication |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20200109231233/https://www.chessprogramming.org/BitScan#De_Bruijn_Multiplication |archive-date=2020-01-09}}</ref> <ref name="Isenberg3">{{cite web |author-first=Gerd |author-last=Isenberg |title=BitScan: Matt Taylor's Folding trick |work=Chess Programming Wiki (CPW) |date=2019-11-03 |orig-date=2012 |url=https://www.chessprogramming.org/BitScan#Matt_Taylor.27s_Folding_trick |access-date=2026-04-17 |url-status=live |archive-url=https://web.archive.org/web/20200109231233/https://www.chessprogramming.org/BitScan#Matt_Taylor.27s_Folding_trick |archive-date=2020-01-09}}</ref> <ref name="Leiserson_1998">{{cite web |author-last1=Leiserson |author-first1=Charles E. |author1-link=Charles E. Leiserson |author-last2=Prokop |author-first2=Harald |author2-link=Harald Prokop |author-last3=Randall |author-first3=Keith H. |title=Using de Bruijn Sequences to Index a 1 in a Computer Word |publisher=MIT Laboratory for Computer Science, Cambridge, MA, USA |date=1998-07-07 |url=http://supertech.csail.mit.edu/papers/debruijn.pdf |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20200109231248/http://supertech.csail.mit.edu/papers/debruijn.pdf |archive-date=2020-01-09}}</ref> <ref name="Busch_2009">{{cite web |author-last1=Busch |author-first1=Philip |title=Computing Trailing Zeros HOWTO |date=2009-03-01 |orig-date=2009-02-21 |url=http://7ooo.mooo.com/text/ComputingTrailingZerosHOWTO.pdf |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20160801030139/http://7ooo.mooo.com/text/ComputingTrailingZerosHOWTO.pdf |archive-date=2016-08-01}}</ref> <ref name="Sloss_2004">{{cite book |author-last1=Sloss |first1=Andrew N. |author-last2=Symes |author-first2=Dominic |author-last3=Wright |author-first3=Chris |author-link3= |title=ARM system developer's guide designing and optimizing system software |date=2004 |publisher=Morgan Kaufmann |location=San Francisco, CA, USA |isbn=978-1-55860-874-0 |pages=212–213 |edition=1}}</ref> <ref name="Schlegel_2010">{{cite book |author-last1=Schlegel |author-first1=Benjamin |author-first2=Rainer |author-last2=Gemulla |author-first3=Wolfgang |author-last3=Lehner |title=Proceedings of the Sixth International Workshop on Data Management on New Hardware |chapter=Fast integer compression using SIMD instructions |author-link3=:de:Wolfgang Lehner (Informatiker) |date=June 2010 |pages=34–40 |doi=10.1145/1869389.1869394 |isbn=978-1-45030189-3 |citeseerx=10.1.1.230.6379|s2cid=7545142 }}</ref> <ref name="Aas_2005">{{cite book |author-last=Aas |author-first=Josh |title=Understanding the Linux 2.6.8.1 CPU Scheduler |date=2005-02-17 |publisher=Silicon Graphics, Inc. (SGI) |page=19 |url=http://www.inf.ed.ac.uk/teaching/courses/os/prac/lcpusched-fullpage-2x1.pdf |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20170519130931/http://www.inf.ed.ac.uk/teaching/courses/os/prac/lcpusched-fullpage-2x1.pdf |archive-date=2017-05-19}}</ref> <ref name="Gosper_1972">{{cite journal |author-last=Gosper |author-first=Bill |author-link=Bill Gosper |editor-first=Henry Givens Jr. |editor-last=Baker |editor-link=Henry Baker (computer scientist) |title=Loop detector |journal=HAKMEM |date=April 1995 |orig-date=1972-02-29 |id=AI Memo 239 Item 132 |publisher=Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT) |location=Cambridge, Massachusetts, USA |edition=retyped & converted |url=http://www.inwap.com/pdp10/hbaker/hakmem/flows.html#item132 |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20191008021326/http://www.inwap.com/pdp10/hbaker/hakmem/flows.html#item132 |archive-date=2019-10-08}}</ref> <ref name="IBM_Z_C22">{{cite book |title=IBM z/Architecture Principles of Operation |date=March 2015 |edition=Eleventh |publisher=IBM |chapter=Chapter 22. Vector Integer Instructions |id=SA22-7832-10 |pages=((7{{hyphen}}219{{ndash}}22{{hyphen}}10)) |url=http://publibfp.dhe.ibm.com/epubs/pdf/dz9zr010.pdf |access-date=2020-01-10 |archive-date=2020-01-09 |archive-url=https://web.archive.org/web/20200109233116/http://publibfp.dhe.ibm.com/epubs/pdf/dz9zr010.pdf |url-status=dead }}</ref> <ref name="Motorola_1992">{{cite book |title=M68000 Family Programmer's Reference Manual (Includes CPU32 Instructions) |date=1992 |publisher=Motorola |id=M68000PRM/AD |edition=revision 1 |pages=((4{{hyphen}}43{{ndash}}4{{hyphen}}45)) |url=https://www.nxp.com/docs/en/reference-manual/M68000PRM.pdf |archive-url=https://web.archive.org/web/20191208205353/https://www.nxp.com/docs/en/reference-manual/M68000PRM.pdf |archive-date=2019-12-08}}</ref> <ref name="Atmel_AVR32">{{cite web |title=AVR32 Architecture Document |publisher=Atmel Corporation |id=32000D–04/201 |date=2011 |edition=CORP072610 |url=http://www.atmel.com/dyn/resources/prod_documents/doc32000.pdf |access-date=2016-10-22 |archive-url=https://web.archive.org/web/20171025180412/http://www.atmel.com/Images/doc32000.pdf |archive-date=2017-10-25}}</ref> <ref name="AMD_2011_AMD64">{{cite book |title=AMD64 Architecture Programmer's Manual Volume 3: General Purpose and System Instructions |id=Publication No. 24594 |volume=3 |date=2011 |publisher=Advanced Micro Devices (AMD) |pages=204–205 |url=http://support.amd.com/us/Processor_TechDocs/APM_V3_24594.pdf}}</ref> <ref name="AMD_2013_AMD64">{{cite web |title=AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions |series=AMD64 Technology |volume=3 |id=Publication No. 24594 |edition=Version 3.28 |date=September 2019 |orig-date=2013 |publisher=Advanced Micro Devices (AMD) |url=http://support.amd.com/TechDocs/24594.pdf |access-date=2014-01-02 |url-status=live |archive-url=https://web.archive.org/web/20190930022109/https://www.amd.com/system/files/TechDocs/24594.pdf |archive-date=2019-09-30}}</ref> <ref name="ARM_2012_CLZ">{{cite web |title=ARM Instruction Reference > ARM general data processing instructions > CLZ |url=http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHJGJED.html |work=ARM Developer Suite Assembler Guide |publisher=ARM |access-date=2012-01-03}}</ref> <ref name="GCC_2015_Functions">{{cite web |title=Other built-in functions provided by GCC |url=https://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Other-Builtins.html |work=Using the GNU Compiler Collection (GCC) |publisher=Free Software Foundation, Inc. |access-date=2015-11-14}}</ref> <ref name="GCC_2015_Changes">{{cite web |title=GCC 3.4.0 ChangeLog |url=https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ChangeLog.9;h=8eed245136d1c2a1d198826fc692260ec077e14b;hb=3b3ea0678785edcb024c8fb6c2a870a1260bd407#l17526 |work=GCC 3.4.0 |publisher=Free Software Foundation, Inc. |access-date=2015-11-14}}</ref> <ref name="LLVM_Clang_Extensions">{{cite web |title=Clang Language Extensions - chapter Builtin Functions |url=http://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions |publisher=The Clang Team |access-date=2017-04-09 |quote=Clang supports a number of builtin library functions with the same syntax as GCC}}</ref> <ref name="LLVM_Clang_Sources">{{cite web |title=Source code of Clang |url=https://github.com/llvm-mirror/clang/blob/7a351322b4f455f6485649124423f347cb559c3b/include/clang/Basic/Builtins.def#L390 |publisher=LLVM Team, University of Illinois at Urbana-Champaign |access-date=2017-04-09}}</ref> <ref name="Microsoft_2008_Intrinsics_1">{{cite web |title=_BitScanForward, _BitScanForward64 |url=http://msdn.microsoft.com/en-us/library/wfd9z0bb(v=VS.90).aspx |work=Visual Studio 2008: Visual C++: Compiler Intrinsics | date=16 November 2012 |publisher=Microsoft |access-date=2018-05-21}}</ref> <ref name="Microsoft_2008_Intrinsics_2">{{cite web |title=_BitScanReverse, _BitScanReverse64 |url=https://msdn.microsoft.com/en-us/library/fbxyd7zd(v=VS.90).aspx |work=Visual Studio 2008: Visual C++: Compiler Intrinsics | date=16 November 2012 |publisher=Microsoft |access-date=2018-05-21}}</ref> <ref name="Microsoft_2008_Intrinsics_3">{{cite web |title=__lzcnt16, __lzcnt, __lzcnt64 |url=http://msdn.microsoft.com/en-us/library/bb384809(v=VS.90).aspx |work=Visual Studio 2008: Visual C++: Compiler Intrinsics |publisher=Microsoft |access-date=2012-01-03}}</ref> <ref name="Microsoft_2012_Intrinsics_1">{{cite web |title=ARM intrinsics |url=https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/hh875058(v=vs.110) |work=Visual Studio 2012: Visual C++: Compiler Intrinsics | date=20 August 2012 |publisher=Microsoft |access-date=2022-05-09}}</ref> <ref name="Intel_Intrinsics_Guide">{{cite web |title=Intel Intrinsics Guide |url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_bit_scan_ |publisher=Intel |access-date=2020-04-03}}</ref> <ref name="Intel_2006_Intrinsics">{{cite book |title=Intel C++ Compiler for Linux Intrinsics Reference |date=2006 |publisher=Intel |pages=21 |url=http://software.intel.com/file/6373}}</ref> <ref name="NVIDIA_2010_CUDA">{{cite book |title=NVIDIA CUDA Programming Guide |date=2010 |publisher=NVIDIA |pages=92 |edition=Version 3.0 |url=http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide.pdf}}</ref> <ref name="LLVM_Intrinsic">{{cite web |title='llvm.ctlz.*' Intrinsic, 'llvm.cttz.*' Intrinsic |url=http://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic |work=LLVM Language Reference Manual |publisher=The LLVM Compiler Infrastructure |access-date=2016-02-23}}</ref> <ref name="Compaq_2002_Alpha">{{cite book |title=Alpha Architecture Reference Manual |date=2002 |publisher=Compaq |pages=((4{{hyphen}}32, 4{{hyphen}}34)) |url=http://download.majix.org/dec/alpha_arch_ref.pdf}}</ref> <ref name="Intel_64-32_DM-2A">{{cite book |title=Intel 64 and IA-32 Architectures Software Developer Manual |publisher=Intel |volume=2A |pages=((3{{hyphen}}92{{ndash}}3{{hyphen}}97)) |url=http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html}} Order number 325383.</ref> <ref name="Intel_Itanium_DM-3">{{cite book |title=Intel Itanium Architecture Software Developer's Manual. Volume 3: Intel Itanium Instruction Set |volume=3 |date=2010 |publisher=Intel |pages=3:38 |url=http://www.intel.com/content/www/us/en/processors/itanium/itanium-architecture-vol-3-manual.html |archive-url=https://web.archive.org/web/20190626224557/https://www.intel.com/content/www/us/en/products/docs/processors/itanium/itanium-architecture-vol-3-manual.html |url-status=live |archive-date=2019-06-26 }}</ref> <ref name="MIPS_2011_32">{{cite book |title=MIPS Architecture For Programmers. Volume II-A: The MIPS32 Instruction Set |date=2011 |publisher=MIPS Technologies |edition=Revision 3.02 |pages=101–102 |url=http://www.mips.com/products/architectures/mips32/ |access-date=2012-01-04 |archive-date=2017-11-07 |archive-url=https://web.archive.org/web/20171107164901/http://www.mips.com/products/architectures/mips32/ |url-status=dead }}</ref> <ref name="MIPS_2011_64">{{cite book |title=MIPS Architecture For Programmers. Volume II-A: The MIPS64 Instruction Set |date=2011 |publisher=MIPS Technologies |edition=Revision 3.02 |pages=105, 107, 122, 123 |url=http://www.mips.com/products/architectures/mips64/ |archive-date=2017-11-07 |access-date=2012-01-04 |archive-url=https://web.archive.org/web/20171107060833/http://www.mips.com/products/architectures/mips64/ |url-status=dead }}</ref> <ref name="Frey_PowerPC">{{cite book |author-last=Frey |author-first=Brad |title=PowerPC Architecture Book |publisher=IBM |chapter=Chapter 3.3.11 Fixed-Point Logical Instructions |pages=70 |edition=Version 2.02 |url=http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html}}</ref> <ref name="IBM_PowerISA">{{cite book |title=Power ISA Version 3.0B |publisher=IBM |chapter=Chapter 3.3.13 Fixed-Point Logical Instructions - Chapter 3.3.13.1 64-bit Fixed-Point Logical Instructions |pages=95, 98 |url=https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0}}</ref> <ref name="Wolf_2019_RISC-V-B">{{cite web |title=RISC-V "B" Bit Manipulation Extension for RISC-V |type=Draft |edition=v0.37 |date=2019-03-22 |author-last=Wolf |author-first=Clifford |work=Github |url=https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-draft.pdf |access-date=2020-01-09 }}</ref> <ref name="Oracle_2011_SPARC">{{cite book |title=Oracle SPARC Architecture 2011 |publisher=Oracle |date=2011 |url=http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/index.html}}</ref> <ref name="DEC_1987_VAX">{{cite book |title=VAX Architecture Reference Manual |date=1987 |publisher=Digital Equipment Corporation (DEC) |pages=70–71 |url=http://www.bitsavers.org/pdf/dec/vax/archSpec/EY-3459E-DP_VAX_Architecture_Reference_Manual_1987.pdf |access-date=2020-01-09 |url-status=live |archive-url=https://web.archive.org/web/20190929103217/http://bitsavers.org/pdf/dec/vax/archSpec/EY-3459E-DP_VAX_Architecture_Reference_Manual_1987.pdf |archive-date=2019-09-29}}</ref> }}

==Further reading== * {{anchor|Warren}}{{Cite book |title=Hacker's Delight |title-link=Hacker's Delight |author-first=Henry S. |author-last=Warren, Jr. |date=2013 |orig-date=2002 |edition=2 |publisher=Addison Wesley - Pearson Education, Inc. |isbn=978-0-321-84268-8 |id=0-321-84268-5}} * {{anchor|Anderson}}{{cite web |author-last=Anderson |author-first=Sean Eron |title=Bit Twiddling Hacks |url=http://graphics.stanford.edu/~seander/bithacks.html |date=2005 |orig-date=1997 |publisher=Stanford University |access-date=2012-01-03 |url-status=live |archive-url=https://web.archive.org/web/20200108172348/http://graphics.stanford.edu/~seander/bithacks.html |archive-date=2020-01-08}} (NB. Lists several efficient public domain C implementations for [http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightLinear count trailing zeros] and [http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious log base 2].)

==External links== * [https://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel Intrinsics Guide] * [https://www.chessprogramming.org/BitScan Chess Programming Wiki: BitScan]: A detailed explanation of a number of implementation methods for ffs (the index of the least significant 1 bit "LS1B") and log base 2 (the index of the most significant 1 bit "MS1B") of 64-bit values.

Category:Binary arithmetic Category:Computer arithmetic