A subjective view of the various costs involved in wrapping and porting applications under EMBOSS is given in the table below.
Category | Wrapper | Port |
---|---|---|
Development cost | Low to Medium | Medium to High |
Flexibility | Low to Medium | High |
Limitations | Medium | Low |
Maintenance cost | Low to Medium | Medium to High |
Support cost | Low | High |
Perceived risk | Low to Medium | Medium to High |
More or less the same ACD file must be written for a wrapper or a port. The main difficulty is that third party software is typically not as flexible as EMBOSS and requires input files in a specific format, for example sequences in FASTA format. In such cases there's a decision to make: either support all the input formats that EMBOSS supports, or stick within the constraints of the original program.
Fully supported input requires additional code for manipulating temporary files to convert the input data into a format acceptable to third party software. If this isn't done it's necessary to enforce any input constraints at the level of the ACD file (if possible), or otherwise document them and raise an exception if the application receives data in the wrong format. Constraining the permissible formats of sequence input within ACD would go against the whole ethos of EMBOSS, so there's no alternative but to add code for reformatting or to raise an exception.
The big advantage to writing a wrapper is that you don't need to worry about the third party source code itself. All the wrapper code must do (after reformatting the input files and processing the ACD file) is to construct an appropriate command line, invoke the command, then (possibly) reformat the output (using temporary files). Building the correct command line usually involves some comparison of parameter values and therefore some housekeeping code, but that's more or less trivial.
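As a rough illustration of that housekeeping code, the sketch below assembles an argument vector from option values. Everything specific here is an assumption made for the example: the program name `thirdparty`, its `-i`, `-o` and `-g` flags, the default gap penalty, and the `wrap_opts` structure standing in for values that would really be read from the ACD file.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ARGS 8            /* 7 arguments at most, plus the NULL */
#define DEFAULT_GAP_OPEN 10   /* hypothetical default of the wrapped program */

/* Hypothetical option values, as they might be read via ACD. */
struct wrap_opts {
    const char *infile;
    const char *outfile;
    int gap_open;
};

/* Fill argv for the hypothetical "thirdparty" program.
 * gapbuf receives the gap penalty as text and must outlive argv.
 * Returns the number of arguments, excluding the terminating NULL. */
int build_argv(const struct wrap_opts *o, char *gapbuf, size_t gaplen,
               const char *argv[MAX_ARGS])
{
    int n = 0;

    assert(o != NULL && o->infile != NULL && o->outfile != NULL);

    argv[n++] = "thirdparty";
    argv[n++] = "-i";
    argv[n++] = o->infile;
    argv[n++] = "-o";
    argv[n++] = o->outfile;

    /* Pass the option only when it differs from the program's default. */
    if (o->gap_open != DEFAULT_GAP_OPEN) {
        snprintf(gapbuf, gaplen, "%d", o->gap_open);
        argv[n++] = "-g";
        argv[n++] = gapbuf;
    }

    argv[n] = NULL;
    return n;
}
```

Building an argument vector rather than a single command string avoids quoting problems when file names contain spaces or shell metacharacters.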
When porting software it's essential to consider the third party source code carefully. No new files of code are usually needed, but the `main()` function will need rewriting to handle the ACD file processing. Also, any other functions that read input data, and these might live in different files, will need rewriting so that they work with the data read via ACD. Therefore it might be necessary to edit multiple files. The edits themselves most probably will not be too difficult but could present a barrier. Furthermore, knowing where to edit certainly does require knowledge of the third party code, possibly quite deep knowledge. For these reasons the initial development cost for a port tends to be significantly higher than it is for a wrapper.
For the reasons just explained, ported software provides the greatest flexibility in terms of support for input and output formats. In fact support is as complete as it is for any EMBOSS application. In contrast there is potentially less flexibility for wrappers, though this might be mitigated with extra coding involving the use of temporary files.
In addition to greater flexibility, ports may have fewer intrinsic technical limitations. For example, difficulties in getting one program to execute another have been reported under MS Windows. Issues can also arise with inter-process communication. In principle, a call to `system()` or to one of the `exec()` family of functions could be used to invoke the third party application. The two differ in how the program is started: `system()` creates a new process running a shell, which in turn invokes the program, whereas `exec()` replaces the calling process with the program directly and so is normally preceded by `fork()` to create the new process. When using `system()` the shell can get in the way of setting up inter-process communication.
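A minimal sketch of invoking a program without an intervening shell is given below, assuming a POSIX system (it will not build under MS Windows as written). The function name `run_program` is hypothetical, and the sketch does not retry if `waitpid()` is interrupted by a signal.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up on PATH) with the given
 * arguments, without going through a shell.
 * Returns the program's exit status, or -1 if the process could not be
 * created or did not exit normally. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();           /* create the new process */
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the program. */
        execvp(argv[0], argv);
        _exit(127);               /* reached only if execvp() failed */
    }

    /* Parent: wait for the child and report how it finished. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent controls the child directly, any pipes needed for inter-process communication could be set up between the `fork()` and the `execvp()` call. The exit status 127 for a program that cannot be executed follows the convention used by shells.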
To maintain a wrapper you only need to worry about changes to the interface. It's relatively easy to add one or two new options to the ACD file and corresponding wrapper code. Upon major releases, however, many of the options might have changed and it might be simpler to start from scratch, rather than incrementally modify what's already there. Nonetheless it's not difficult because there's no amendment of the third party code. In either case the package documentation must be updated.
With a port, in addition to changes to the interface, any changes to the application code must be integrated. There is no convenient automatic mechanism for doing this, such as a common CVS repository, so one has to either use `diff` on the code to detect changes, or otherwise start afresh by adding the EMBOSS-specific code in the port to the code of the new release. The latter may well be the simpler and safer option. This is easier if all the insertions of EMBOSS-specific code are well documented. Nonetheless it may not be obvious where the changes should be made, requiring code inspection even if the code is well documented. For these reasons porting software is likely to be more error-prone.
The cost of supporting a wrapper is substantially lower than that of supporting a port. This is because you didn't write the third party application, only a wrapper to it. Therefore you can reasonably forward any queries to the original authors, so long as you're certain the errors have not arisen as a result of the wrapper code. In the case of a port you've modified the original source code and so may be reasonably expected to support it.
The final issue of "perceived risk" is subjective, but it boils down to whether the end user will trust your software enough to use it. Inevitably, ported software is treated with some suspicion because someone other than the original author has modified the code, regardless of whether they have in fact fixed bugs or improved it in some other way. Therefore a port could be overlooked if a user is being cautious and is trying to avoid any possibility of discrepancy in results. They might also stick with the original, warts and all, simply because it's what they know and have used in the past. In contrast, the perceived risk is lower with a wrapper, which is understood to merely call the wrapped software.
Whether you should port or wrap software depends on the case in question. Generally, wrapper applications are preferred as they maintain the separation between the original and EMBOSS code and are easier to develop, maintain and support. If, however, you are the author of the original code, or you see EMBOSS as the main access point to the software, it may well be preferable to port the application.