This lets the compiler inline the stage/merge functions where it might
not have otherwise, which results in a slightly smaller overall FFT
function.
This also resolves the following GCC warnings:
warning: no previous declaration for 'merge_rfft_f16' [-Wmissing-declarations]
warning: no previous declaration for 'merge_rfft_f32' [-Wmissing-declarations]
warning: no previous declaration for 'merge_rfft_f64' [-Wmissing-declarations]
warning: no previous declaration for 'stage_rfft_f16' [-Wmissing-declarations]
warning: no previous declaration for 'stage_rfft_f32' [-Wmissing-declarations]
warning: no previous declaration for 'stage_rfft_f64' [-Wmissing-declarations]