Use Sphinx for Documentation
This guide details some basics on using Sphinx to document Agatha. The goal is to produce a human-readable website on ReadTheDocs.org in the easiest way possible.
Writing Function Descriptions Within Code
I’ve configured Sphinx to accept Google-style docstrings and to parse Python 3 type hints. Here’s a full example:
from typing import Tuple


def parse_predicate_name(predicate_name: str) -> Tuple[str, str]:
  """Parses subject and object from predicate name strings.

  Predicate names are formatted strings that follow this convention:
  p:{subj}:{verb}:{obj}. This function extracts the subject and object and
  returns coded-term names in the form: m:{entity}. Will raise an exception if
  the predicate name is improperly formatted.

  Args:
    predicate_name: Predicate name in form p:{subj}:{verb}:{obj}.

  Returns:
    The subject and object formulated as coded-term names.
  """
  # PREDICATE_TYPE and UMLS_TERM_TYPE are constants defined elsewhere in the
  # codebase. Unpacking fails if the name does not have exactly four fields.
  typ, sub, vrb, obj = predicate_name.lower().split(":")
  assert typ == PREDICATE_TYPE
  return f"{UMLS_TERM_TYPE}:{sub}", f"{UMLS_TERM_TYPE}:{obj}"
Let's break that down. To document a function, first you should write a good
function signature. This means that each input and the return value should have
an associated type hint. Here, the function takes a string and returns a tuple
of two strings. Note that to get type hints for many standard Python containers,
such as lists, sets, and tuples, you will need to import the typing module.
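As a small illustration of the point about the typing module, the sketch below annotates a hypothetical helper (count_terms is not part of Agatha) with List, Set, and Tuple. On Python 3.9 and later the built-in list, set, and tuple can be subscripted directly, so the import is only needed on older interpreters.

```python
from typing import List, Set, Tuple


def count_terms(names: List[str], stopwords: Set[str]) -> Tuple[int, int]:
  """Counts how many names are kept and how many are dropped.

  Args:
    names: Candidate term names (hypothetical input, for illustration only).
    stopwords: Names to ignore.

  Returns:
    The number of names kept and the number of names dropped.
  """
  kept = [name for name in names if name not in stopwords]
  return len(kept), len(names) - len(kept)
```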
Assuming you’ve got a good function signature, you can now write a Google-formatted docstring. There are more specific format options than those listed here, but at a minimum you should include:
Single-line summary
Short description
Argument descriptions
Return description
These four elements are demonstrated above. Note that the docstring should be a multi-line (triple-quoted) string appearing directly below the function signature.
Note: at the time of writing, practically none of the functions follow this guide. If you start modifying the code, try to fill in the backlog of missing docstrings.
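One way to chip away at that backlog is to list the functions that have no docstring at all. The helper below is a minimal sketch using only the standard library; functions_missing_docstrings is a hypothetical name and is not part of Agatha. It only detects absent docstrings and does not check that an existing docstring follows the Google format.

```python
from inspect import getmembers, isfunction
from types import ModuleType
from typing import List


def functions_missing_docstrings(module: ModuleType) -> List[str]:
  """Lists functions defined in a module that have no docstring.

  Args:
    module: An already-imported module to inspect.

  Returns:
    Sorted names of functions defined in the module that lack a docstring.
  """
  missing = []
  for name, func in getmembers(module, isfunction):
    # Skip functions that were only imported into this module.
    if func.__module__ != module.__name__:
      continue
    # Treat an empty or whitespace-only docstring as missing.
    if not (func.__doc__ or "").strip():
      missing.append(name)
  return sorted(missing)
```

Call it with any imported module object to get the names that still need documentation.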
Writing Help Pages
Sometimes you will have to write guides that are supplemental to the codebase
itself (for instance, this page). To do so, take a look at the docs
subdirectory in the root of the project. Here, I have set up docs/help, and
each file within this directory will automatically be included in our online
documentation. You can write in either reStructuredText or Markdown. I would
recommend Markdown, only because it is simpler. These files must end in either
.rst or .md, depending on the format.
Compiling the Docs
Note that this section describes how to build the documentation locally. Skip ahead to see how we use ReadTheDocs to automate this process.
Assuming the Agatha module has been installed, including the additional modules
in requirements.txt, you should be good to start compiling. Inside docs there
is a Makefile that is preconfigured to generate the API documentation as well
as any extra help files, like this one. Just type make html while in docs to
get that process started.
First, this command will run sphinx-apidoc on the agatha project in order to
extract all functions and docstrings. This process will create a docs/_api
directory to store all of the intermediate API-generated documentation. Next,
it will run sphinx-build to compile html files from all of the user-supplied
and auto-generated .rst and .md files. The result will be placed in
/docs/build.
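The two steps above can be sketched in Python as follows. This only approximates what the Makefile does: the real Makefile may pass different flags, and the default paths here (../agatha for the package, . for the Sphinx source directory, relative to docs) are assumptions. Only the _api and build locations come from the description above.

```python
import subprocess
from typing import List


def build_commands(
    package_dir: str = "../agatha",  # Assumed location of the package.
    source_dir: str = ".",           # Assumed Sphinx source directory.
    api_dir: str = "_api",
    output_dir: str = "build",
) -> List[List[str]]:
  """Returns the two build commands in the order they run.

  Args:
    package_dir: Path to the Python package to document.
    source_dir: Directory that holds conf.py and the help files.
    api_dir: Where the intermediate API files are written.
    output_dir: Where the compiled html is written.

  Returns:
    One argument list per command, suitable for subprocess.run.
  """
  return [
      # Step 1: extract functions and docstrings into intermediate files.
      ["sphinx-apidoc", "-o", api_dir, package_dir],
      # Step 2: compile html from the hand-written and generated sources.
      ["sphinx-build", "-b", "html", source_dir, output_dir],
  ]


def build_docs() -> None:
  """Runs each command from inside docs, stopping at the first failure."""
  for command in build_commands():
    subprocess.run(command, check=True)
```

In practice make html remains the supported entry point; the sketch is only meant to make the order of the two steps explicit.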
The compilation process may emit many warnings, mostly because the code still contains incorrectly formatted docstrings that predate our adoption of Sphinx and Google-style docstrings. This is okay as long as the compilation process completes.
Using ReadTheDocs
We host our documentation on ReadTheDocs.org. This service is hooked into
our repository and will automatically regenerate our documentation every time we
push a commit to master. Behind the scenes, this service will build our API
documentation and read in all of our .rst and .md files for us. This process
is not instant, but the result should appear online after a few minutes.
Updating Dependencies for Read the Docs
The hardest part about ReadTheDocs is getting the remote server to install all needed dependencies within the memory and time constraints that come with a free third-party service. We solve this problem by combining a lightweight conda environment with heavy use of the mock-import feature of Sphinx autodoc.
Some dependencies, such as protobuf, can only be installed via conda.
Additionally, because creating the conda environment is the first step that
ReadTheDocs performs in each build, we also load our documentation-specific
requirements there. These modules are specified in docs/environment.yaml.
The rest of the dependencies take too long and use too much memory to be installed on ReadTheDocs. At the time of writing we only receive 900 seconds and 500 MB of memory in which to build the entire package. Furthermore, many of our dependencies may have version conflicts that cause unexpected issues that are hard to debug on the remote server. To get around these limitations, we mock all of our external dependencies when ReadTheDocs builds our project.
When the READTHEDOCS environment variable is set to True, we make two
modifications to our documentation creation process. First, setup.py is
configured to drop all requirements, meaning that only the Agatha source code
itself will be installed. In order to load our source without error, we make the
second change in docs/conf.py. Here, we set autodoc_mock_imports to be a list
of all top-level imported modules within Agatha. Unfortunately, some package
names are different from their corresponding module names (pip install
faiss_cpu provides the faiss module, for instance). Therefore, the list of
imported modules has to be duplicated in docs/conf.py.
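A minimal sketch of the two changes, under stated assumptions: the helper names on_read_the_docs and requirements_for_build are hypothetical, and the real setup.py and docs/conf.py may be organized differently. autodoc_mock_imports is the actual Sphinx autodoc setting, and faiss is the only module name taken from this guide.

```python
import os
from typing import List, Mapping


def on_read_the_docs(environ: Mapping[str, str] = os.environ) -> bool:
  """Reports whether the READTHEDOCS environment variable is set to True."""
  return environ.get("READTHEDOCS") == "True"


def requirements_for_build(
    requirements: List[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
  """Chooses the requirements to install (the setup.py side of the change).

  Args:
    requirements: The full list of package requirements.
    environ: Environment to inspect; defaults to the process environment.

  Returns:
    An empty list on ReadTheDocs, otherwise a copy of the full list.
  """
  return [] if on_read_the_docs(environ) else list(requirements)


# The docs/conf.py side of the change: autodoc substitutes mocks for these
# modules at import time. Entries are module names, not package names
# (faiss rather than faiss_cpu). A real list would name every top-level import.
autodoc_mock_imports = ["faiss"] if on_read_the_docs() else []
```

Passing the environment as an argument keeps the sketch easy to check locally, for example requirements_for_build(["faiss_cpu"], {"READTHEDOCS": "True"}) yields an empty list.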
Because we are mocking all of our dependencies, the generated documentation is
lower quality in places, specifically where we use type hints for externally
defined classes. Future work could try to selectively enable some modules for
better documentation on ReadTheDocs. However, one can always build
higher-quality documentation locally by installing the package with all
dependencies and running make html in docs/.