Yifeihu

Models by this creator

🎲

TB-OCR-preview-0.1

yifeihu

Total Score

115

TB-OCR-preview-0.1 is an end-to-end optical character recognition (OCR) model developed by Yifei Hu that handles text, math LaTeX, and Markdown formats simultaneously. It takes a block of text as input and returns clean Markdown output, with headers marked by ## and math expressions wrapped in \( inline math \) or \[ display math \] delimiters for easy parsing. The model does not require separate line detection or math formula detection.

Model inputs and outputs

Inputs

A block of text containing a mix of regular text, math LaTeX, and Markdown formatting.

Outputs

Clean Markdown output with headers, math expressions, and other formatting properly identified.

Capabilities

TB-OCR-preview-0.1 can accurately extract and format text, math, and Markdown elements from a given block of text. This is particularly useful for digitizing scientific papers, notes, or other documents that mix these elements.

What can I use it for?

TB-OCR-preview-0.1 is well-suited for converting scanned or photographed text, math, and Markdown content into a structured, machine-readable format. This could include automating the digitization of research papers, lecture notes, or other technical documents.

Things to try

Consider combining TB-OCR-preview-0.1 with the TFT-ID-1.0 model, which specializes in text, table, and figure detection for full-page OCR. Splitting each page into smaller blocks with TFT-ID-1.0 and processing them in parallel can be more efficient than running TB-OCR-preview-0.1 on entire pages.
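Because the model's output follows fixed delimiter conventions (## for headers, \( \) and \[ \] for math), it is straightforward to post-process. The following is a minimal sketch of such a parser; the sample string is invented for illustration and is not actual model output:

```python
import re

def parse_tb_ocr_output(markdown: str) -> dict:
    """Split TB-OCR-style Markdown into headers, inline math, and display math."""
    headers = re.findall(r"^##\s*(.+)$", markdown, flags=re.MULTILINE)
    # The model card says inline math is wrapped in \( ... \) and display math in \[ ... \]
    inline_math = re.findall(r"\\\((.+?)\\\)", markdown, flags=re.DOTALL)
    display_math = re.findall(r"\\\[(.+?)\\\]", markdown, flags=re.DOTALL)
    return {"headers": headers, "inline_math": inline_math, "display_math": display_math}

# Invented sample in the format the model card describes
sample = "## Relativity\nThe energy is \\( E = mc^2 \\).\n\\[ \\int_0^1 x\\,dx = \\tfrac{1}{2} \\]"
parsed = parse_tb_ocr_output(sample)
```

A parser like this could route display math into a LaTeX renderer while passing the remaining Markdown to a standard converter.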

Updated 9/18/2024

🧪

TFT-ID-1.0

yifeihu

Total Score

85

The TFT-ID (Table/Figure/Text IDentifier) model is an object detection model fine-tuned to extract tables, figures, and text sections from academic papers. Developed by Yifei Hu, the model is based on the microsoft/Florence-2 checkpoints and was trained on over 36,000 manually annotated bounding boxes from the Hugging Face Daily Papers dataset. The model takes an image of a single paper page as input and returns bounding boxes for all tables, figures, and text sections present, along with the corresponding labels. This makes it a useful tool for academic document processing workflows, as the extracted text sections can be fed directly into downstream OCR systems.

Model Inputs and Outputs

Inputs

Paper page image: an image of a single page from an academic paper.

Outputs

Object detection results: a dictionary containing the bounding boxes and labels for all detected tables, figures, and text sections in the input image, in the format {'': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

Capabilities

The TFT-ID model is highly accurate, achieving a 96.78% success rate on a test set of 373 paper pages. It is particularly effective at identifying tables and figures, reaching a 98.84% success rate on a subset of 258 images. The model's ability to isolate clean text regions makes it a valuable component of academic document processing pipelines. Note, however, that TFT-ID is not an OCR system itself; the extracted text sections still require a separate OCR step.

What Can I Use It For?

The TFT-ID model is well-suited for automating the extraction of tables, figures, and text sections from academic papers. This can be particularly useful for researchers, publishers, and academic institutions looking to streamline their document processing workflows.
Some potential use cases include:

Academic document processing: integrating the TFT-ID model into a document processing pipeline to automatically identify and extract relevant content from academic papers.

Literature review automation: using the model to rapidly locate and extract tables, figures, and key text sections from a large corpus of academic literature, facilitating more efficient literature reviews.

Dataset curation: employing the TFT-ID model to generate structured datasets of tables, figures, and text from academic papers, which can then be used to train other AI models.

Things to Try

One interesting aspect of the TFT-ID model is its robustness to a variety of table and figure formats, including both bordered and borderless elements; testing it on academic papers with diverse layouts and visual styles is a good way to probe this. The model's integration with downstream OCR workflows also invites experimentation: evaluate the quality and accuracy of the extracted text sections, and look for ways to optimize the overall document processing pipeline. Finally, the model's strong performance on table and figure detection suggests it could be a valuable component in more complex academic document understanding systems, such as those focused on automated summarization or knowledge extraction.
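To feed TFT-ID's output into a downstream OCR model such as TB-OCR, the detected text boxes need to be selected and put into reading order. Here is a hedged sketch built only on the output format shown above; the '<OD>' task key, the 'text'/'figure' label strings, and the sample coordinates are assumptions for illustration, not documented values:

```python
def text_regions_in_reading_order(detections: dict, text_label: str = "text") -> list:
    """Given a TFT-ID-style result {'<task>': {'bboxes': [...], 'labels': [...]}},
    return the boxes with the given label sorted top-to-bottom, then left-to-right."""
    # The result dict is keyed by the task prompt; take its single value.
    result = next(iter(detections.values()))
    boxes = [
        box for box, label in zip(result["bboxes"], result["labels"])
        if label == text_label
    ]
    # Sort by top edge (y1), breaking ties by left edge (x1).
    return sorted(boxes, key=lambda b: (b[1], b[0]))

# Invented example detections for illustration
detections = {"<OD>": {
    "bboxes": [[50, 400, 500, 600], [50, 40, 500, 200], [50, 210, 500, 390]],
    "labels": ["text", "text", "figure"],
}}
ordered = text_regions_in_reading_order(detections)
# ordered -> [[50, 40, 500, 200], [50, 400, 500, 600]]
```

Each box in the ordered list could then be cropped from the page image and sent to TB-OCR, with the crops processed in parallel as the "Things to try" note above suggests. Note that a simple top-then-left sort assumes a single-column layout; multi-column pages need column grouping first.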

Updated 9/18/2024