Currently, filecheck.py leaves RTF files untouched and only changes their extension to .txt (in File.text()). An RTF file opened as plain text is difficult to read because formatting control codes are mixed in with the text. Ideally, we should extract the text content from RTF files during processing.
Unfortunately, there aren't any great existing solutions for this other than OpenOffice, which would add a dependency we'd prefer not to have. If you don't need 100% compatibility, it's fairly reasonable to write an RTF parser: here is a library that implements most of the behavior we want, which could be a good starting point. However, that code is Python 2 only, and perhaps a little verbose.
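To give a rough idea of how small a "not 100% compatible" parser could be, here is a minimal Python 3 sketch. It is not taken from the linked library or from filecheck.py. The names `rtf_to_text`, `SKIPPED_DESTINATIONS`, `SPECIAL_CHARS`, and `TOKEN` are hypothetical, the list of skipped destinations is partial, the cp1252 default encoding is an assumption, and `\uN` Unicode escapes are not handled.

```python
import re

# Destinations whose contents are metadata, not document text (partial list).
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# Control words that map directly onto plain-text characters.
SPECIAL_CHARS = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-f]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-z'])"           # control symbol, e.g. \{ \} \\ \*
    r"|([{}])"                # group delimiters
    r"|[\r\n]+"               # raw line breaks carry no meaning in RTF
    r"|(.)",                  # anything else is literal text
    re.IGNORECASE | re.DOTALL,
)


def rtf_to_text(rtf, encoding="cp1252"):
    """Return the visible text of an RTF string, dropping formatting codes."""
    out = []
    stack = []         # "ignoring" flag saved for each enclosing group
    ignoring = False   # True while inside a destination we want to skip
    for match in TOKEN.finditer(rtf):
        word, _param, hex_byte, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append(ignoring)
        elif brace == "}":
            if stack:
                ignoring = stack.pop()
        elif symbol == "*" or word in SKIPPED_DESTINATIONS:
            # \* marks an ignorable destination; skip the rest of this group.
            ignoring = True
        elif ignoring:
            continue
        elif word in SPECIAL_CHARS:
            out.append(SPECIAL_CHARS[word])
        elif hex_byte:
            out.append(bytes.fromhex(hex_byte).decode(encoding, errors="replace"))
        elif symbol in ("{", "}", "\\"):
            out.append(symbol)  # escaped literal brace or backslash
        elif char:
            out.append(char)
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Hello, {\b world}!\par Caf\'e9}"
print(rtf_to_text(sample))
```

Something like this could be called from File.text() before the file is written out as .txt. Real-world files would likely need more destinations in the skip list and proper handling of the document's declared code page.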