3D Tele-Immersion (3DTI) environments are emerging as a new medium for human interaction and collaboration in education, sports training, and physical medicine and rehabilitation. Adding a tactile element to a visually centered 3DTI environment can make such applications even more engaging. However, it also introduces challenges in fusing the visual and tactile data streams synchronously. In this paper we describe a 3DTI Tele-Rehabilitation system built with Microsoft Kinect cameras and haptic devices. We describe some of the challenges we face in providing, as well as quantifying, a good quality of experience (QoE) in this system. We propose a set of solutions that (i) improve the user's QoE (by using multi-modal prediction to handle latencies, better synchronization that accounts for the global state of the system, etc.) and (ii) quantify the QoE (by designing a controlled virtual environment and by defining appropriate user QoE metrics for immersive tele-rehabilitation). The experimental results show a marked improvement in system performance, which in turn improves the user experience. This improvement is also confirmed by the results of the user performance study.