read_files_and_dirs

ucca.ioutil.read_files_and_dirs(files_and_dirs, sentences=False, paragraphs=False, converters=None, lang='en', attempts=3, delay=5)
Parameters:
  • files_and_dirs – iterable of files and/or directories to look in
  • sentences – whether to split passages into sentences
  • paragraphs – whether to split passages into paragraphs
  • converters – dict of input format converters to use based on the file extension
  • lang – language of the tokenization model to use
  • attempts – number of times to try reading a file before giving up
  • delay – number of seconds to wait before subsequent attempts to read a file
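
The interplay of attempts and delay can be illustrated with a small standalone sketch. This is not ucca's implementation: read_with_retries, its read callback, and the choice to retry only on OSError are all assumptions made for illustration. It shows one plausible reading of the two parameters, where a file is tried up to attempts times and the caller waits delay seconds between tries.

```python
import time


def read_with_retries(read, path, attempts=3, delay=5, sleep=time.sleep):
    """Hypothetical helper (not part of ucca): call read(path), retrying on OSError.

    The sleep argument is injectable so the waiting can be replaced in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read(path)
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            sleep(delay)  # wait before the next attempt
```

With attempts=3 and delay=5, a file that fails twice and then succeeds costs two waits of 5 seconds each; a file that fails three times raises the last error.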
Returns:
  lazy-loaded passages from all files given, plus any files directly under any directory given
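
As a rough sketch of what "files given, plus any files directly under any directory given" and "lazy-loaded" could mean in practice, the generator below expands a mixed list of paths one level deep and yields results only as they are consumed. It is an illustration under stated assumptions, not ucca's code: iter_paths is a hypothetical name, the sorted order is an arbitrary choice, and the real function yields passages rather than paths.

```python
import os


def iter_paths(files_and_dirs):
    """Hypothetical helper (not part of ucca): lazily yield each given file,
    plus the files directly under each given directory (no recursion)."""
    for entry in files_and_dirs:
        if os.path.isdir(entry):
            # Sorting is an assumption made here for deterministic output.
            for name in sorted(os.listdir(entry)):
                child = os.path.join(entry, name)
                if os.path.isfile(child):  # skip subdirectories
                    yield child
        else:
            yield entry
```

Because this is a generator, nothing is listed or read until the caller iterates, which is the behaviour the word "lazy" suggests; files in nested subdirectories are skipped.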