This software has the following goals:
XML files are validated whenever possible
a checksum is performed on the invoice file
all cryptographic signatures and certificates must be valid
invoice attachments are examined before saving them permanently
the software works offline once the necessary files have been downloaded
See the definition of pipeline:
In this software, a pipeline is a sequence of steps that, given the signed invoice file and its metadata file, makes it possible to obtain:
various integrity and authenticity verifications
the HTML output of the invoice file
the extraction of embedded attachments
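As an illustration of the attachment extraction output, here is a minimal sketch and not this project's actual code. It assumes that attachments are stored in `Allegati` elements containing a `NomeAttachment` (file name) and a base64-encoded `Attachment` (payload); the function name `extract_attachments` is hypothetical. The standard library parser is used only to keep the sketch free of dependencies.

```python
import base64
import xml.etree.ElementTree as ET


def extract_attachments(invoice_xml: str) -> dict:
    """Return a {file name: decoded bytes} mapping of embedded attachments.

    Assumption: each attachment is an <Allegati> element holding a
    <NomeAttachment> and a base64-encoded <Attachment> child.
    """
    root = ET.fromstring(invoice_xml)
    attachments = {}
    for node in root.iter("Allegati"):
        name = node.findtext("NomeAttachment")
        payload = node.findtext("Attachment")
        if name is None or payload is None:
            continue  # skip incomplete entries instead of guessing
        # Drop whitespace and line breaks, then decode strictly so that
        # characters outside the base64 alphabet raise an error.
        compact = "".join(payload.split())
        attachments[name] = base64.b64decode(compact, validate=True)
    return attachments
```

The decoded bytes stay in memory, so they can be examined before anything is written to disk.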
Without going into too much detail (you can read the source code for that), we can divide the pipeline into three main steps:
compare metadata file content with the invoice file
check invoice signature and signer’s certificate
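The comparison between the metadata and the invoice file can be sketched as a checksum check. This is only an illustration: it assumes the metadata provides the expected digest as a hexadecimal string and that SHA-256 is the algorithm in use; the helper names `file_checksum` and `checksum_matches` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large invoices are not loaded at once."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare the file digest with the value taken from the metadata.

    Assumption: expected_hex is a hexadecimal string; letter case and
    surrounding whitespace are ignored.
    """
    actual = file_checksum(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A pipeline would stop at this point if `checksum_matches` returns `False`, since the invoice file is then not the one described by the metadata.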
Because PyOpenSSL lacks support for most PKCS#7 functionality, all OpenSSL operations are performed through a module that calls the openssl binary installed on the system.
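Calling the openssl binary from Python can be sketched as follows. This is not the project's actual code: it assumes Python's standard `subprocess` module, a DER-encoded `.p7m` file, and a PEM bundle of trusted certificates; the function names and parameters are hypothetical. The `smime -verify`, `-inform`, `-in`, `-out` and `-CAfile` options are regular openssl command-line options.

```python
import subprocess


def build_verify_command(signed_path, output_path, ca_file, openssl="openssl"):
    """Build the argument list that verifies a PKCS#7 signed file.

    Assumptions: the signed file is DER encoded and ca_file is a PEM
    bundle with the trusted certificates.
    """
    return [
        openssl, "smime", "-verify",
        "-inform", "DER",
        "-in", str(signed_path),
        "-CAfile", str(ca_file),
        "-out", str(output_path),  # the original (unsigned) content goes here
    ]


def verify_signature(signed_path, output_path, ca_file):
    """Return True when openssl exits with status 0 (verification passed)."""
    command = build_verify_command(signed_path, output_path, ca_file)
    # Passing a list (no shell) avoids quoting problems with file names.
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.returncode == 0
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact openssl invocation.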
There are problems with recent versions of OpenSSL concerning PKCS#7 file decoding:
A possible solution is to use an older system. If you really trust the file, you can disable signature checking with the appropriate option. I strongly discourage the latter solution: if you cannot prove the authenticity of the invoice, it has no legal value.
lxml vs defusedxml
I decided to use lxml because it supports XML stylesheets (XSLT), which defusedxml does not.
At first, using defusedxml seemed the best bet because of the increased security:
It is still possible to use defusedxml by editing the API file and swapping the ET import, like this:
import defusedxml.ElementTree as ET
import lxml.etree as ET
You must also add defusedxml to the requirements and re-install them as described in the contributing section.
Not having access to all schema files is a problem since there is no way to tell if
the metadata file,
the trusted list file,
the XML stylesheet file
are correct and conform to the specifications.
If you find these files please let me know and/or open a pull request.
Downloading of the W3C file
The W3C schema file is a dependency of the invoice schema file, and it needs to be cached because the XML resolver can time out.
Sometimes it takes more than one minute to download this file.
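The caching idea can be sketched as "download once with a generous timeout, then always read the local copy". This is a minimal illustration under assumptions: the function names, the 120 second default and the injectable `fetch` parameter are hypothetical and not taken from the project.

```python
import urllib.request
from pathlib import Path


def download(url, timeout):
    """Fetch a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def cached_fetch(url, cache_path, timeout=120.0, fetch=download):
    """Return the cached file content, downloading it only when missing.

    Assumption: the remote file does not change, so a cached copy never
    needs to be refreshed.
    """
    cache_path = Path(cache_path)
    if cache_path.is_file():
        return cache_path.read_bytes()
    data = fetch(url, timeout)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted write does not
    # leave a truncated file in the cache.
    partial = cache_path.with_name(cache_path.name + ".part")
    partial.write_bytes(data)
    partial.replace(cache_path)
    return data
```

Once the file is in the cache, later runs never touch the network, which is also what makes offline operation possible.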
Fattura PA vs Fattura B2B
Some websites state that the digital signature is required for the Fattura PA, while in other cases it is not.
If you find any official source please let me know and/or open a pull request.
Support for non-signed invoice files has been added because of at least one reported case.
Although this whole system has its merits, it has been put into production with lots of missing pieces (see notes above). Given its importance, I think this is unacceptable. I will not comment further because I might risk legal action…