Document layout comprises both structural and visual (e.g., font-size) information that is vital but often ignored by machine learning models. We propose a novel layout-aware multimodal hierarchical framework, LAMPreT, to model the blocks and the whole document. We design hierarchical pretraining objectives where the lower-level model is trained similarly to multimodal grounding models, and the higher-level model is trained with our novel layout-aware objectives. We evaluate the proposed model on two layout-aware tasks, text block filling and image suggestion, and show the effectiveness of our proposed hierarchical architecture as well as pretraining techniques.
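The abstract only describes the two-level hierarchy at a high level. Below is a minimal, purely illustrative PyTorch sketch of such a design, with a block-level multimodal encoder feeding a layout-aware document-level encoder; all class names, dimensions, and layout features here are assumptions, not the authors' actual LAMPreT implementation.

```python
import torch
import torch.nn as nn


class BlockEncoder(nn.Module):
    """Lower-level encoder: fuses the (text + image) tokens within a single
    block, roughly in the spirit of multimodal grounding models."""

    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, block_tokens):
        # block_tokens: (num_blocks, tokens_per_block, dim)
        encoded = self.encoder(block_tokens)
        return encoded.mean(dim=1)  # one embedding per block


class DocumentEncoder(nn.Module):
    """Higher-level encoder: models the sequence of block embeddings,
    augmented with layout features such as position and font size
    (hypothetical feature set)."""

    def __init__(self, dim=256, layout_dim=4, n_heads=4, n_layers=2):
        super().__init__()
        self.layout_proj = nn.Linear(layout_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, block_embeddings, layout_features):
        # block_embeddings: (1, num_blocks, dim)
        # layout_features:  (1, num_blocks, layout_dim)
        x = block_embeddings + self.layout_proj(layout_features)
        return self.encoder(x)


if __name__ == "__main__":
    blocks = torch.randn(8, 16, 256)   # 8 blocks, 16 fused multimodal tokens each
    layout = torch.randn(1, 8, 4)      # e.g., x, y, width, font size per block
    block_enc, doc_enc = BlockEncoder(), DocumentEncoder()
    block_vecs = block_enc(blocks).unsqueeze(0)   # (1, 8, 256)
    doc_repr = doc_enc(block_vecs, layout)        # (1, 8, 256)
    print(doc_repr.shape)
```

In this sketch, pretraining heads for the block-level (grounding-style) and document-level (layout-aware) objectives would sit on top of the respective encoders; the block-filling and image-suggestion evaluation tasks would likewise be framed as prediction heads over the document-level representations.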

Author(s) : Te-Lin Wu, Cheng Li, Mingyang Zhang, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky

Links : PDF - Abstract

Code :

Keywords : model - aware - layout - image - block
