Recent advances in AI technologies such as deep learning and deep reinforcement learning have greatly enhanced robots' capabilities in many practical applications. In this talk, I will focus on how computer vision, deep learning, and sensor fusion can be leveraged for better robotic perception and manipulation, enabling robots to better serve industrial applications and healthcare.

I will start with our current work, which exploits a deep learning-based method to accurately estimate object poses in real scenes. Bin-picking robots have been explored for years. With advances in deep learning on point clouds, object detection and pose estimation can now be achieved without explicit feature extraction. However, existing methods can only estimate a bounding box, not the precise 3D position of an object. We propose a new learning-based object detection and pose estimation method for the bin-picking problem. We leverage PointNet and its successors to estimate the 6-DoF pose parameters of objects. Our network first estimates translations and then rotations, and we finally integrate the estimated poses using mean-shift clustering. We train our network on data generated by physics simulations and simulations of a stereo measurement system. We demonstrate that our network achieves accurate pose estimation in real scenes without any transfer learning.

Next, I will present our recent work on robot-enhanced therapy for children with autism spectrum disorder (ASD). Traditionally, ASD diagnosis depends on a doctor's observation of a patient's behavior and development, which is subjective and time-consuming. We present a novel sensing-enhanced therapy system, comprising three cameras and two Kinects, that automatically, accurately, and non-intrusively monitors children with ASD in multiple scenarios. The techniques involved include, but are not limited to, facial expression recognition, human motion analysis, and gaze estimation. Furthermore, we propose a multimodal approach to assess ASD severity in participants undergoing treatment.

Finally, I will present our earlier work on an ultrasound-based human-machine interface (HMI) for dexterous prosthetic control. Surface electromyography (EMG) has been widely investigated for HMIs that decode movement intention to intuitively control intelligent devices, but it does not offer a satisfactory solution for finger motion classification. We build a novel HMI based on B-mode ultrasound images. I will then present experiments on a finger motion task in which the ultrasound-based HMI achieves higher average accuracy (95.88%) than the EMG-based HMI (90.14%) for finger motion recognition. I will also discuss how machine learning and image processing methods can be used to improve the robustness of the proposed HMI in practice.
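
The pose-integration step mentioned above, clustering per-point pose votes with mean shift, can be sketched roughly as follows. This is a minimal illustration, not the network's actual output format: the vote arrays, the bandwidth value, and the quaternion-averaging scheme are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import MeanShift


def integrate_pose_votes(translations, quaternions, bandwidth=0.02):
    """Cluster per-point translation votes with mean shift and average the
    rotation votes inside each cluster to obtain one pose per object.

    translations: (N, 3) predicted object centers in metres (assumed format)
    quaternions:  (N, 4) predicted unit quaternions (assumed format)
    """
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels = ms.fit_predict(translations)

    poses = []
    for k in np.unique(labels):
        votes = quaternions[labels == k]
        # Sign-align the quaternions to the first vote (q and -q encode the
        # same rotation), then take the normalised mean as a simple average.
        votes = votes * np.sign(votes @ votes[0])[:, None]
        q_mean = votes.mean(axis=0)
        q_mean /= np.linalg.norm(q_mean)
        poses.append((ms.cluster_centers_[k], q_mean))
    return poses
```

In this sketch each cluster corresponds to one object instance in the bin: the cluster center gives its translation and the averaged quaternion its rotation.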
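
The multimodal severity-assessment idea can likewise be sketched as a simple late-fusion classifier over per-modality features. The feature layout, dimensions, and the random-forest choice below are illustrative assumptions, not the system described in the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fuse_modalities(face_feats, motion_feats, gaze_feats):
    """Concatenate per-session features from each sensing modality
    (facial expression, body motion, gaze) into one feature vector."""
    return np.concatenate([face_feats, motion_feats, gaze_feats], axis=1)


# X_* : (num_sessions, d_modality) feature matrices; y: severity labels
def train_severity_model(X_face, X_motion, X_gaze, y):
    X = fuse_modalities(X_face, X_motion, X_gaze)
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, y)
    return model
```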
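
As a rough illustration of how an image-based classifier for such an ultrasound HMI might look, the sketch below extracts coarse grid-intensity features from B-mode frames and trains an SVM. The feature design, grid size, and classifier choice are assumptions for illustration, not the pipeline reported in the talk.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def grid_intensity_features(frame, grid=(8, 8)):
    """Mean B-mode intensity over a coarse grid of image regions, a simple
    proxy for the muscle-deformation pattern produced by a finger motion."""
    h, w = frame.shape
    gh, gw = grid
    return np.array([
        frame[i * h // gh:(i + 1) * h // gh,
              j * w // gw:(j + 1) * w // gw].mean()
        for i in range(gh) for j in range(gw)
    ])


# frames: (N, H, W) ultrasound images; labels: (N,) finger-motion classes
def train_finger_classifier(frames, labels):
    X = np.stack([grid_intensity_features(f) for f in frames])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, labels)
    return clf
```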