Visual 2.5D perception involves understanding the semantics and geometry of a scene through reasoning about object relationships with respect to the viewer in an environment. Unlike general VRD, 2.5VRD is egocentric, using the camera's viewpoint as a common reference for all 2.5D relationships. Unlike depth estimation, it is object-centric and does not focus only on depth. To enable progress on this task, we create a new dataset of 220K human-annotated 2.5D relationships among 512K objects from 11K images. We analyze this dataset and conduct extensive experiments, including benchmarking multiple state-of-the-art VRD models. Our results show that existing models largely rely on semantic cues and simple heuristics to solve 2.5VRD.
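As a rough illustration of the egocentric, object-centric formulation described above, a pairwise depth relationship between two detected objects can be derived from each object's distance to the camera. The label set, schema, and tolerance below are assumptions for the sketch, not the dataset's actual annotation format:

```python
from dataclasses import dataclass

# Hypothetical label set for pairwise depth ordering; the actual
# 2.5VRD annotation schema may differ.
CLOSER, FURTHER, SAME = "closer", "further", "same"

@dataclass
class DetectedObject:
    name: str        # semantic label, e.g. "person"
    box: tuple       # (x1, y1, x2, y2) in image coordinates
    distance: float  # estimated distance to the camera, in meters

def depth_relationship(a: DetectedObject, b: DetectedObject,
                       tol: float = 0.25) -> str:
    """Egocentric depth order of `a` relative to `b`, w.r.t. the camera.

    Objects whose distances differ by less than `tol` (an assumed
    tolerance) are treated as lying at the same depth.
    """
    if abs(a.distance - b.distance) < tol:
        return SAME
    return CLOSER if a.distance < b.distance else FURTHER

person = DetectedObject("person", (10, 20, 60, 120), distance=2.0)
car = DetectedObject("car", (80, 30, 200, 110), distance=5.5)
print(depth_relationship(person, car))  # → closer
```

Because every relationship is measured against the same reference (the camera), any set of per-object distance estimates induces a consistent ordering over all object pairs, which is what makes the task egocentric rather than relative to some arbitrary scene anchor.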

Author(s): Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

Links: PDF - Abstract

Code:

Keywords: vrd - dataset - visual
