Text-IVu Dataset


Text-IVu is a publicly available dataset of 453 images (403 test images and 50 training images) designed to represent the point of view of a human or humanoid robotic user in everyday scenes. The images are annotated to preserve the grammatical structure contained within the text: each word is a member of a given line and paragraph. The images are split across three distinct types: the handheld object set, the indoor scene set and the outdoor scene set.


Handheld Objects


The Handheld Object Set is a range of images in which an object is the principal focus point, and all images are taken from the perspective of a humanoid user. Every image in this category contains a hand or some form of gripper. The objects consist of branded items and unique unbranded items.

Indoor Scenes


The Indoor Scene Set consists of images taken around the home and work space. Unlike in the Handheld Object Set, items containing text might not be the principal focus point. The scenes range from views with a single safety warning message to busy kitchen cupboards.

Outdoor Scenes


The Outdoor Scene Set covers a range of scenes from signposts and street names to bus posters and shop front branding. Text may not be the principal focus point and is unlikely to be fronto-parallel to the camera. Outdoor scenes contain a wider view and a greater depth of view than the other sets.

Ground Truth

The Text-IVu dataset is annotated using a word as the smallest unit, with every word belonging to a line and paragraph. For the purpose of this dataset a line is defined as containing one or more words that share geometric/style similarities and lead on from each other. A paragraph is defined as a collection of lines based on their geometric/style similarities. Under this definition, a paragraph may contain one or more lines.
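The word, line and paragraph hierarchy described above can be pictured as a small nested data structure. The following is only a minimal sketch in Python: the class names, fields and the example sign are illustrative assumptions and are not part of the dataset or its tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    """Smallest annotated unit (hypothetical class, not from the dataset)."""
    text: str


@dataclass
class Line:
    """One or more words that lead on from each other."""
    words: List[Word] = field(default_factory=list)

    def text(self) -> str:
        # Join the words of the line in reading order.
        return " ".join(word.text for word in self.words)


@dataclass
class Paragraph:
    """One or more lines grouped by geometric/style similarity."""
    lines: List[Line] = field(default_factory=list)

    def text(self) -> str:
        # One output row per annotated line.
        return "\n".join(line.text() for line in self.lines)

    def word_count(self) -> int:
        return sum(len(line.words) for line in self.lines)


# Hypothetical example: a two-line warning sign forming one paragraph.
sign = Paragraph(lines=[
    Line(words=[Word("CAUTION")]),
    Line(words=[Word("WET"), Word("FLOOR")]),
])
```

Because every word sits inside exactly one line and every line inside exactly one paragraph, the reading order of the text can be recovered by walking the structure from the top down.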


Image XML


Each image XML file starts with the XML declaration, followed by the Text_Trust tag; the image name is stored as an attribute of this tag. The body of the file is a list of paragraphs.
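Since the exact tag and attribute spellings are not listed here, a cautious first step is to inspect a file rather than assume its schema. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample document is hypothetical, and apart from the `Text_Trust` root every tag and attribute name in it is a placeholder.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample. Only the Text_Trust root name comes from the
# description above; "image", "paragraph", "line", "word" and "content"
# are assumed spellings used purely for illustration.
SAMPLE = """<?xml version="1.0"?>
<Text_Trust image="example.jpg">
  <paragraph content="WET FLOOR">
    <line content="WET FLOOR">
      <word content="WET"/>
      <word content="FLOOR"/>
    </line>
  </paragraph>
</Text_Trust>"""


def inspect_annotation(xml_text):
    """Return the root tag, the root attributes and a count of nested tags."""
    root = ET.fromstring(xml_text)
    # root.iter() also yields the root itself, so skip it when counting.
    counts = Counter(el.tag for el in root.iter() if el is not root)
    return root.tag, dict(root.attrib), counts


root_tag, root_attrs, tag_counts = inspect_annotation(SAMPLE)
```

For a real annotation file, `ET.parse(path).getroot()` could be used in place of `ET.fromstring`, and the returned names would show which spellings the dataset actually uses.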


Paragraph XML

The paragraph tag carries the first line of its text in the content attribute and contains tags for a bounding box and a list of lines.


Line XML

The line tag will have the contents attribute and will contain tags for a bounding box and a list of words.


Word XML

The word tag has the content attribute, a bounding box and a perspective box. The perspective box gives the four points of the quadrilateral required to bring the word fronto-parallel.
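One plausible way to use a perspective box is to estimate the homography that maps its four points onto an upright rectangle. The sketch below does this in plain Python by solving the standard eight-equation linear system. The corner order (top-left, top-right, bottom-right, bottom-left), the example coordinates and the target rectangle size are all assumptions made for illustration; the dataset's own point order should be checked before relying on this.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(H, x, y):
    """Apply H to (x, y) and divide by the projective scale."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical perspective box (assumed order: TL, TR, BR, BL) and a
# 100 x 40 target rectangle chosen arbitrarily for the example.
quad = [(10, 20), (110, 35), (105, 80), (8, 60)]
rect = [(0, 0), (100, 0), (100, 40), (0, 40)]
H = homography(quad, rect)
```

Applying `warp_point(H, x, y)` to pixels inside the quadrilateral would place them inside the rectangle; an image library's perspective-warp routine could do the same for a whole word crop.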


You may download the Text-IVu dataset here.

Related Publications

C. Beck, A. Broun, M. Mirmehdi, T. Pipe and C. Melhuish. "Text Line Aggregation". ICPRAM 2014.


Page last updated 12 May 2016