Convert rtf files to plaintext #19

@dputtick

Currently, filecheck.py leaves RTF files untouched and only changes their extension to .txt (in File.text()). An RTF file opened as plaintext is hard to read because formatting control codes are mixed in with the text. Ideally, we should extract the text content from RTF files during processing.

Unfortunately, there aren't any great existing solutions for this other than OpenOffice, which would add a dependency we'd prefer not to have. If we don't need 100% compatibility, writing our own RTF parser is fairly reasonable: here is a library that implements most of the behavior we want, which could be a good starting point. However, that code is Python 2 only, and perhaps a little verbose.
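
To give a sense of how small a "good enough" parser could be, here is a rough Python 3 sketch. It is not taken from the library mentioned above or from filecheck.py; the names (rtf_to_text, SKIPPED_DESTINATIONS, and so on) are made up for illustration. It assumes hex escapes are Windows-1252 bytes and that each \u escape is followed by exactly one fallback character, and it only knows about a handful of control words, so treat it as a starting point rather than a complete solution.

```python
import re

# Destinations whose contents are not document text (deliberately short list).
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# Control words and control symbols that stand for a literal character.
CONTROL_CHARS = {"par": "\n", "line": "\n", "tab": "\t"}
SYMBOLS = {"\\": "\\", "{": "{", "}": "}", "~": "\u00a0"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \{ or \*
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
    r"|(.)"                      # any other character is text
)


def rtf_to_text(rtf: str) -> str:
    """Return the visible text of an RTF string, dropping formatting codes."""
    out = []
    group_stack = []      # value of `skipping` saved at each open brace
    skipping = False      # True while inside a destination we ignore
    fallback_to_skip = 0  # fallback characters still to drop after \uN

    for match in TOKEN.finditer(rtf):
        word, param, hex_byte, symbol, brace, char = match.groups()
        if brace == "{":
            group_stack.append(skipping)
        elif brace == "}":
            if group_stack:
                skipping = group_stack.pop()
        elif symbol == "*":
            skipping = True   # \* marks an ignorable destination
        elif skipping:
            continue
        elif word:
            if word in SKIPPED_DESTINATIONS:
                skipping = True
            elif word in CONTROL_CHARS:
                out.append(CONTROL_CHARS[word])
            elif word == "u" and param:
                # \uN may be negative (signed 16-bit); assume one fallback char follows.
                out.append(chr(int(param) % 0x10000))
                fallback_to_skip = 1
        elif hex_byte or char:
            if fallback_to_skip:
                fallback_to_skip -= 1
            elif hex_byte:
                # Assumption: the document uses the Windows-1252 code page.
                out.append(bytes.fromhex(hex_byte).decode("cp1252", errors="replace"))
            else:
                out.append(char)
        elif symbol in SYMBOLS:
            out.append(SYMBOLS[symbol])
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Hello, \b world\b0 !\par Caf\'e9}"
print(rtf_to_text(sample))
```

Something like this could be called from File.text() before the extension is changed. Anything it doesn't recognize (tables, embedded objects, other code pages) is either dropped or passed through, which is the trade-off for not depending on OpenOffice.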
