Welcome to overview-upload’s documentation!

Upload files and directories to Overview.

Installation

sudo pip3 install overview-upload

Example Usage

import overview_upload

server = 'https://www.overviewdocs.com'
# Generate an API token by browsing to
# https://www.overviewdocs.com/documentsets/[ID]/api-tokens
# (where `[ID]` is an existing document-set ID).
api_token = 'rakhtasd...'

upload = overview_upload.Upload(server, api_token)

# Want to see what's happening? This will print to stderr:
import logging
upload.logger.setLevel(logging.DEBUG)
upload.logger.addHandler(logging.StreamHandler())

# Delete files that were uploaded earlier but never added
# to a document set
upload.clear_previous_upload()

# Upload all files in this directory, with titles relative to
# this directory
upload.send_directory('.')

# Upload a specific file
import pathlib
upload.send_path_if_conditions_met(
    pathlib.Path('index.md'), # valid Path
    'doc/index.md'            # Overview document title
)

# Upload a binary file-like object (such as io.BytesIO). This
# will read the whole file into memory before uploading it.
#
# In this example, we'll upload Overview's own homepage.
import urllib.request
url = 'https://www.overviewdocs.com'
overview_title = url + '/index.html'
with urllib.request.urlopen(url) as r:
    upload.send_file_if_conditions_met(r, overview_title)

# Finally, now that we've uploaded all these files, let's
# add them to the document set! The API token specifies
# which document set we'll add to.
upload.finish(lang='en', ocr=False, split_by_page=False)

# And then browse to
# https://www.overviewdocs.com/documentsets/[ID] -- there
# will be new documents for the files you uploaded.

API

Utilities for uploading to www.overviewdocs.com via its API

class overview_upload.Upload(server_url, api_token, logger=None)

Start an Upload session.

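Parameters:
  • server_url (str) – base URL of the Overview server (for instance, "https://www.overviewdocs.com").
  • api_token (str) – API token for the document set to upload to.
  • logger (logging.Logger) – logger for progress messages, or None (the default). The session's logger is available as its logger attribute.

clear_previous_upload()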

Remove any previously uploaded files from the server.

If you don’t call this, then when you finish() you may find Overview adds files you uploaded sometime in the past and then forgot about.

finish(lang='en', ocr=True, split_by_page=False)

Add sent files to the document set.

Parameters:
  • lang (str) – ISO language code for Overview’s analysis (default is "en")
  • ocr (bool) – if True (the default), tell Overview to read text from PDF pages that contain only images.
  • split_by_page (bool) – if True, tell Overview to create a document per page of the input file. (This only applies to PDFs and LibreOffice-compatible documents.) If False (the default), tell Overview to create one document per uploaded file.

is_file_already_in_document_set(in_file, sha1=None)

Return True iff the document set contains an identical file.

This works by calculating the SHA1 and asking Overview whether it’s been seen before in our document set. Files sent without a call to finish() will not be included in this check.

Parameters:
  • in_file (io.BytesIO) – bytes to upload to Overview.
  • sha1 (str) – if set, assume the given SHA1 hash instead of computing it by reading the file. If None (the default), then in_file will be read completely.
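
Supplying sha1 up front avoids a second full read of in_file. The sketch below shows one way to compute it, together with the byte count, in a single chunked pass. The helper name sha1_and_size is hypothetical and not part of overview_upload, and the hex-string encoding of the digest is an assumption that this page does not confirm.

```python
import hashlib
import io


def sha1_and_size(in_file, chunk_size=64 * 1024):
    """Return (hex SHA1 digest, byte count) for a seekable binary file object.

    Hypothetical helper, not part of overview_upload. Reads in fixed-size
    chunks so the whole file never sits in memory at once, then rewinds
    so the caller can still read the file from the start.
    """
    digest = hashlib.sha1()
    n_bytes = 0
    for chunk in iter(lambda: in_file.read(chunk_size), b''):
        digest.update(chunk)
        n_bytes += len(chunk)
    in_file.seek(0)
    # Assumption: Overview expects the digest as a hex string.
    return digest.hexdigest(), n_bytes


sha1, n_bytes = sha1_and_size(io.BytesIO(b'hello'))
```

The resulting values could then be passed as the sha1 and n_bytes arguments of the methods documented on this page.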

send_directory(dirname, skip_unhandled_extension=True, skip_duplicate=True, metadata=None)

Upload all files in a directory to the Overview server.

If skip_duplicate == False, then files will be streamed to the server. Otherwise, this method will cache each file in memory during upload.

Parameters:
  • dirname (str) – Directory to upload.
  • skip_unhandled_extension (bool) – if True (the default), do not upload files when Overview doesn’t support their filename extensions (for instance, ".dbf").
  • skip_duplicate (bool) – if True (the default), do not upload a file if api_token points to a document set that already contains a file whose sha1 hash is identical to this file’s. Files that have been sent without a call to finish() will not be included in the check. If False, stream files instead of caching them.
  • metadata (dict) – Metadata to set on every document, or None. The document set should have a metadata schema that corresponds to this document’s metadata (or you can set the schema later).

send_file_if_conditions_met(in_file, filename, n_bytes=None, skip_unhandled_extension=True, skip_duplicate=True, metadata=None, sha1=None)

Upload a file to the Overview server.

If n_bytes is None or (skip_duplicate == True and sha1 is None), then in_file will be cached in memory. Otherwise, it will be streamed to the server, saving memory.

Parameters:
  • in_file (io.BytesIO) – BytesIO containing the document.
  • filename (str) – Filename to set in Overview.
  • n_bytes (int) – Exact file size (or None to auto-calculate). Supply this and sha1 (if applicable) to stream in_file to the server instead of caching it in memory.
  • skip_unhandled_extension (bool) – if True (the default), do not upload this file if Overview doesn’t support its filename extension (for instance, ".dbf").
  • skip_duplicate (bool) – if True (the default), do not upload this file if your api_token points to a document set that already contains a file whose sha1 hash is identical to this file’s. Files that have been sent without a call to finish() will not be included in the check.
  • metadata (dict) – Metadata to set on the document, or None. The document set should have a metadata schema that corresponds to this document’s metadata (or you can set the schema later).
  • sha1 (str) – SHA1 hash to use in the skip_duplicate check, or None to calculate it on the fly. If you set this and n_bytes, this method will stream the file contents instead of caching them in memory.
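
The caching rule stated above can be restated as a small predicate, which may help when deciding which arguments to supply. This is only a sketch that mirrors the documented sentence; will_cache_in_memory is a hypothetical name and is not taken from the library's code.

```python
def will_cache_in_memory(n_bytes=None, skip_duplicate=True, sha1=None):
    """Mirror the documented rule for send_file_if_conditions_met().

    Hypothetical helper. The file is cached when its size is unknown, or
    when a duplicate check is requested but no SHA1 was supplied (the
    hash then has to be computed by reading the file).
    """
    return n_bytes is None or (skip_duplicate and sha1 is None)


# With the defaults nothing is known in advance, so the file is cached.
cached_by_default = will_cache_in_memory()
# Supplying both the size and the hash allows streaming.
cached_with_hints = will_cache_in_memory(n_bytes=1024, sha1='placeholder-digest')
```

Under this rule, turning off skip_duplicate removes the need for sha1, but n_bytes is still required for streaming.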

send_path_if_conditions_met(path, filename, skip_unhandled_extension=True, skip_duplicate=True, metadata=None)

Upload the file at the specified Path to the Overview server.

The file will be streamed: that is, the script does not risk running out of memory.

Parameters:
  • path (pathlib.Path) – absolute or relative pathlib.Path pointing to the document.
  • filename (str) – filename Overview should use.
  • skip_unhandled_extension (bool) – if True (the default), do not upload this file if Overview doesn’t support its filename extension (for instance, ".dbf").
  • skip_duplicate (bool) – if True (the default), do not upload this file if your api_token points to a document set that already contains a file whose sha1 hash is identical to this file’s. Files that have been sent without a call to finish() will not be included in the check.
  • metadata (dict) – Metadata to set on the document, or None. The document set should have a metadata schema that corresponds to this document’s metadata (or you can set the schema later).
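
When titles need a prefix, or different files need different options, a loop over send_path_if_conditions_met() is one alternative to send_directory(). The sketch below is a minimal illustration: send_tree and its title_prefix parameter are hypothetical, and recursing into subdirectories and using forward-slash relative paths as titles are assumptions about what a caller might want, not a description of how send_directory() works internally.

```python
import pathlib


def send_tree(upload, root, title_prefix='', **kwargs):
    """Send every regular file under root, one upload call per file.

    Hypothetical helper. `upload` is expected to be an
    overview_upload.Upload instance. Extra keyword arguments
    (skip_duplicate, metadata, ...) are passed through to
    send_path_if_conditions_met() unchanged.
    """
    root = pathlib.Path(root)
    titles = []
    for path in sorted(root.rglob('*')):
        if not path.is_file():
            continue  # skip directories
        # Title relative to root, with forward slashes on every platform
        title = title_prefix + path.relative_to(root).as_posix()
        upload.send_path_if_conditions_met(path, title, **kwargs)
        titles.append(title)
    return titles
```

A call such as send_tree(upload, '.', title_prefix='doc/') would be followed by upload.finish(...) to add the sent files to the document set.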