Bilateral filters are widely used in computer vision and digital imaging applications such as denoising, video abstraction, demosaicing, and optical-flow estimation. Their smoothing and edge-preserving characteristics suit image and video processing applications well, yet their high computational complexity makes real-time hardware implementation a challenging task. This paper presents an efficient Field Programmable Gate Array (FPGA) based implementation of an edge-preserving fast bilateral filter in a hardware-software co-design environment. The implementation is based on a recent algorithm that preserves boundaries, spikes, and canyons in the presence of noise. Further, the four-stage parallel pipelined architecture greatly improves the speed of operation. Moreover, our separable-kernel implementation of the filtering hardware runs almost five times faster than traditional convolution filtering while using fewer hardware resources. © 2013 IEEE.