Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

Read original: arXiv:2408.00298 - Published 8/2/2024 by Ragav Sachdeva, Gyungin Shin, Andrew Zisserman

Overview

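  • This paper introduces a method for automatically generating a dialogue transcript of an entire manga (Japanese comic book) chapter, with each line attributed to a named character.
  • It tackles two sub-problems. The first is detecting the text on each page and deciding which of it is essential. The second is attributing each line of dialogue to its speaker while keeping character names consistent across the chapter.
  • The main motivation is accessibility: making manga usable by readers with visual impairments.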

Plain English Explanation

The research paper describes a system that automatically transcribes the text in manga chapters and links each line to the character who says it. This matters because manga tell their stories largely through images, which makes them hard to follow for some readers, especially those with visual impairments.

By associating the text with the characters who speak it, the researchers aim to make manga more accessible and enjoyable for a wider audience. This could enable features such as text-to-speech that reads the dialogue aloud while highlighting the speaking character, or filtering the transcript to follow specific characters.

Overall, this work represents an important step towards making manga more accessible and engaging for readers of all abilities.

Technical Explanation

The key technical components of the proposed system are:

  1. Text Detection and Classification: The model, Magiv2, detects the text boxes on each page and classifies each one as essential or non-essential, so that only text that matters to the story ends up in the transcript.

  2. Speaker Attribution: Each essential text is attributed to the character who says it. As the title suggests, speech-bubble tails are the key cue. The extended evaluation data annotate tail boxes and record which text each tail belongs to.

  3. Chapter-Level Naming: Instead of labeling speakers page by page, the system names characters consistently across the whole chapter. It draws on a character bank of exemplar images for each named character.
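The attribution and naming steps can be pictured with a small sketch. Everything in it is an assumption for illustration, not Magiv2's architecture. The data classes, the rule that a tail points to the character whose box centre is nearest, the cosine-similarity match against a bank of exemplar embeddings, and the 0.5 threshold are all hypothetical. Text boxes are assumed to arrive already in reading order.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


@dataclass
class TextBox:
    text: str
    box: Box
    tail: Optional[Box]  # detected speech-bubble tail, or None if there is none
    essential: bool      # output of an (assumed) essential/non-essential classifier


@dataclass
class Character:
    box: Box
    embedding: List[float]  # appearance embedding from some (assumed) encoder


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def attribute_speaker(tb: TextBox, characters: List[Character]) -> Optional[int]:
    """Hypothetical rule: the speaker is the character whose box centre is nearest the tail."""
    if tb.tail is None or not characters:
        return None
    tail_c = tb.tail.center()
    return min(range(len(characters)),
               key=lambda i: math.dist(tail_c, characters[i].box.center()))


def name_character(ch: Character, bank: Dict[str, List[List[float]]],
                   threshold: float = 0.5) -> str:
    """Match a character against every exemplar in the bank; fall back to 'Unknown'."""
    best_name, best_sim = "Unknown", threshold
    for name, exemplars in bank.items():
        for ex in exemplars:
            sim = cosine(ch.embedding, ex)
            if sim > best_sim:
                best_name, best_sim = name, sim
    return best_name


def transcribe_page(texts: List[TextBox], characters: List[Character],
                    bank: Dict[str, List[List[float]]]) -> List[str]:
    lines = []
    for tb in texts:
        if not tb.essential:  # non-essential text is left out of the transcript
            continue
        idx = attribute_speaker(tb, characters)
        speaker = "Unknown" if idx is None else name_character(characters[idx], bank)
        lines.append(f"<{speaker}>: {tb.text}")
    return lines


if __name__ == "__main__":
    bank = {"Alice": [[1.0, 0.0]], "Bob": [[0.0, 1.0]]}
    chars = [Character(Box(0, 0, 10, 10), [0.9, 0.1]),
             Character(Box(100, 0, 110, 10), [0.1, 0.9])]
    texts = [TextBox("Hi!", Box(20, 20, 40, 30), Box(12, 12, 14, 14), True),
             TextBox("Hey.", Box(80, 20, 95, 30), Box(98, 8, 99, 9), True),
             TextBox("BOOM", Box(50, 50, 60, 60), None, False)]
    for line in transcribe_page(texts, chars, bank):
        print(line)  # prints "<Alice>: Hi!" then "<Bob>: Hey."
```

Every page is matched against the same bank, so a character keeps the same name across the chapter. That is the consistency property the paper targets. A real system would need a better way to handle characters missing from the bank than a single "Unknown" label.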

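The researchers evaluated their approach on an extended version of the PopManga evaluation dataset. The extension adds annotations for speech-bubble tail boxes, text-to-tail associations, essential vs non-essential labels, and the identity of each character box. They report significantly higher precision in speaker diarisation than prior work.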
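The paper's abstract, reproduced under Related Papers, reports results as precision in speaker diarisation. The sketch below shows one common way such a metric can be computed. It is an assumption, not the paper's definition, which may treat abstentions or character identities differently. Precision counts only the attributions the model commits to, and recall counts all ground-truth pairs.

```python
from typing import Dict, Optional


def speaker_precision(predicted: Dict[str, Optional[str]],
                      truth: Dict[str, str]) -> float:
    """Fraction of committed speaker attributions that are correct.

    `predicted` maps a text id to a speaker id, or None when the model abstains;
    abstentions are excluded from the denominator.
    """
    committed = {t: s for t, s in predicted.items() if s is not None}
    if not committed:
        return 0.0
    correct = sum(1 for t, s in committed.items() if truth.get(t) == s)
    return correct / len(committed)


def speaker_recall(predicted: Dict[str, Optional[str]],
                   truth: Dict[str, str]) -> float:
    """Fraction of ground-truth (text, speaker) pairs that were recovered."""
    if not truth:
        return 0.0
    correct = sum(1 for t, s in truth.items() if predicted.get(t) == s)
    return correct / len(truth)


pred = {"t1": "c1", "t2": "c2", "t3": None}
gold = {"t1": "c1", "t2": "c3", "t3": "c1"}
print(speaker_precision(pred, gold))  # 0.5
```

This split shows a trade-off. A model that abstains on uncertain text can raise its precision while lowering its recall.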

Critical Analysis

The paper provides a thorough and well-designed approach to the problem of manga transcription and character association. However, a few potential limitations and areas for further research are worth noting:

  1. Handling Visual Complexity: Manga panels are often visually complex, with overlapping text, intricate backgrounds, and diverse artistic styles. Text and tail detection may be less reliable in these cases, and improving robustness here deserves further study.

  2. Multilingual Support: Manga originate in Japanese, but the PopManga evaluation pages are English translations. Adapting the system to other languages and writing systems, including the vertical text of original Japanese pages, could be an interesting area for future work.

  3. User Evaluation: While the paper presents technical performance metrics, it would be valuable to also evaluate the system's impact on the user experience, particularly for readers with disabilities or other accessibility needs. Gathering feedback and insights from these target users could help refine the system's design and features.

Conclusion

This research paper presents a novel approach for automatically transcribing entire manga chapters and attributing each line of dialogue to a consistently named character. The system addresses text detection, speaker attribution, and chapter-wide character naming. It therefore has the potential to make manga more accessible to a wider audience, especially readers with visual impairments or other accessibility needs.

The technical advancements demonstrated in this work represent an important step towards more accessible and context-aware digital manga experiences. As the researchers continue to refine and expand their approach, it could have significant implications for the manga industry and the broader field of digital accessibility.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

Ragav Sachdeva, Gyungin Shin, Andrew Zisserman

Enabling engagement of manga by visually impaired individuals presents a significant challenge due to its inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them into essential vs non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring the same characters are named consistently throughout the chapter. To this end, we introduce: (i) Magiv2, a model that is capable of generating high-quality chapter-wide manga transcripts with named characters and significantly higher precision in speaker diarisation over prior works; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity for each character box; and (iii) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, as well as a list of chapters in which they appear. The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi


8/2/2024

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Ragav Sachdeva, Andrew Zisserman

In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments. In this work, we seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged by everyone. Specifically, we tackle the problem of diarisation i.e. generating a transcription of who said what and when, in a fully automatic way. To this end, we make the following contributions: (1) we present a unified model, Magi, that is able to (a) detect panels, text boxes and character boxes, (b) cluster characters by identity (without knowing the number of clusters apriori), and (c) associate dialogues to their speakers; (2) we propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript; (3) we annotate an evaluation benchmark for this task using publicly available [English] manga pages. The code, evaluation datasets and the pre-trained model can be found at: https://github.com/ragavsachdeva/magi.


8/2/2024
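The Magi abstract above mentions clustering characters by identity without knowing the number of clusters a priori. A minimal way to get that property is to threshold a pairwise similarity matrix and take connected components with union-find. This is a hypothetical illustration, not Magi's actual procedure. The similarity scores and the 0.7 threshold are assumed inputs.

```python
from typing import List


def cluster_by_identity(sim: List[List[float]], threshold: float = 0.7) -> List[int]:
    """Group characters whose pairwise similarity >= threshold (connected components).

    The number of clusters is not fixed in advance; it follows from the threshold.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two components

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
print(cluster_by_identity(sim))  # [0, 0, 1]
```

Connected components behave like single-linkage clustering. One spurious high score can chain two different characters together, so a practical system would likely need a more careful grouping rule.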

Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characters' personalities, and inference of character relationships and stories. To deal with the problem of insufficient speaker-to-text annotations, we created a new annotation dataset Manga109Dialog based on Manga109. Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided our dataset into different levels by prediction difficulties to evaluate speaker detection methods more appropriately. Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models. Due to the unique features of comics, we enhance the performance of our proposed model by considering the frame reading order. We conducted experiments using Manga109Dialog and other datasets. Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.


4/23/2024
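The Manga109Dialog abstract notes that accounting for frame reading order improves speaker detection. As a rough illustration, the sketch below shows a naive heuristic for ordering panels in a right-to-left manga layout. It groups panels into rows by their top edge and reads each row from right to left. This is an assumption for illustration, not the method of any paper listed here. The 20-pixel row tolerance is hypothetical, and the heuristic breaks on irregular or overlapping layouts.

```python
from typing import List, Tuple

Panel = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def naive_manga_reading_order(panels: List[Panel],
                              row_tolerance: float = 20.0) -> List[int]:
    """Return panel indices in a naive right-to-left, top-to-bottom order."""
    by_top = sorted(range(len(panels)), key=lambda i: panels[i][1])
    rows: List[List[int]] = []
    for i in by_top:
        # Join the current row if this panel's top edge is close to the row's first panel.
        if rows and abs(panels[i][1] - panels[rows[-1][0]][1]) <= row_tolerance:
            rows[-1].append(i)
        else:
            rows.append([i])
    order: List[int] = []
    for row in rows:
        order.extend(sorted(row, key=lambda i: -panels[i][2]))  # rightmost panel first
    return order


panels = [(0, 0, 100, 100), (110, 0, 210, 100), (0, 120, 210, 220)]
print(naive_manga_reading_order(panels))  # [1, 0, 2]
```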


Toward accessible comics for blind and low vision readers

Christophe Rigaud (L3I), Jean-Christophe Burie (L3I), Samuel Petit (Comix AI)

This work explores how to fine-tune large language models using prompt engineering techniques with contextual information for generating an accurate text description of the full story, ready to be forwarded to off-the-shelf speech synthesis tools. We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content, such as panels, characters, text, reading order and the association of bubbles and characters. Then we infer character identification and generate comic book script with context-aware panel description including character's appearance, posture, mood, dialogues etc. We believe that such enriched content description can be easily used to produce audiobook and eBook with various voices for characters, captions and playing sound effects.


9/11/2024