Decoupling objects in Python
I was recently working on a project at work involving generating code from Word documents (how’d you guess from that sentence that I work with medical devices?).
Without getting into too much detail, we have these documents that, for regulatory purposes, act as the single source of truth for our core algorithms, and we need to be able to prove that the algorithms described in the document are implemented exactly as described in the source documents.
Luckily for me, the documents are formatted in such a way that they can easily be parsed and represented abstractly as Python objects. Once the properties of the core components of the algorithms have been deduced, you can then generate code to implement these algorithms in your language of choice.
The wrong way
When I first started this project, I only had one language in mind for code generation: C
. So I wrote something like this:
class ParsedAction:
def __init__(self, *args, **kwargs):
pass
def generate_code(self):
print("string representation of this object and its properties for C")
class AlgorithmParser:
def __init__(self, *args, **kwargs):
pass
def scan_document_gen(self, filename):
# Pretend for a minute this is how you extract lines out of a docx... (see docx module for real world code)
with open(filename, "r") as f:
lines = f.readlines()
for line in lines:
yield self.parse_object(line)
def parse_object(self, line):
# complicated document specific parsing here
return ParsedAction('''stuff you got from the line, used to construct an object''')
def scan_document(self, *args, **kwargs):
return [_ for _ in self.scan_document_gen(*args, **kwargs)]
parser = AlgorithmParser()
parsed_tree = parser.scan_document("specification.docx")
for algorithm in parsed_tree:
algorithm.generate_code()
If you save that snippet into a file named specification.docx
and execute it with python specification.docx
, it will simply generate the string within the ParsedAction.generate_code
method one time for each line in the script. Boom, code generation (of dubious quality, but hey, it’s a blog post)!
Let’s say I wanted to generate a C
header file with this method. Let’s say each ParsedAction
in the document emits header file code when you call its generate_code()
method, but you still want to have nice things like include guards and comments for future developers who are mystified as to why you would program this way.
That might look like this:
import datetime
def generate_header_file(parsed_tree):
print(f"/* --- DO NOT EDIT ---")
print(f" * This file was generated by {__file__}")
print(f" * Generation time: {datetime.datetime.utcnow().isoformat()}")
print(f" */")
include_guard = "GENR8D_HEADER_H_"
print(f"#ifndef {include_guard}")
print(f"#define {include_guard}")
for algorithm in parsed_tree:
algorithm.generate_code()
print(f"#endif \/* {include_guard} *\/")
generate_header_file(parsed_tree)