What went wrong with Horizon: learning from the Post Office Trial

The Post Office trial has revealed what is likely the largest miscarriage of justice in UK legal history. Hundreds of individuals who operated Post Office branches (subpostmasters) were convicted on fraud and theft charges on the basis of missing funds identified by the Horizon accounting system. Thousands more subpostmasters were forced to pay the Post Office back for these shortfalls. But the trial concluded that Horizon was “not remotely robust”: the supposed shortfalls might never have existed in the first place and, where they did, they might not have been the fault of the subpostmaster.

This scandal resulted from insufficient information being disclosed in the process of prosecuting subpostmasters, poor oversight of the Post Office (both by its management and by the government) and a failure of the legal system to view evidence generated by Horizon with appropriate scepticism. These matters have been discussed elsewhere, but what’s been talked about less are the technical failures in Horizon and associated systems that might have caused the supposed shortfalls.

I spoke to the Computerphile YouTube channel about what we’ve learned about Horizon and its failures, based on the Post Office trial. What seems to be a simple problem – keeping track of how much money and stock is in a branch – is actually much harder than it appears. Considering the large number of transactions that Horizon performs (millions per day), inevitable hardware and communication failures, and the complex interactions between systems, it should have been obvious that errors would be a common occurrence.

In this video, I explained the basics of double-entry accounting, how this must be implemented on a transaction processing system (one that provides atomicity, consistency, isolation, and durability – ACID), and gave some examples of where Horizon failed. I had to abbreviate and simplify some aspects for the video, so I wrote this blog post to point to the parts of the Post Office trial judgement that describe the situations in which Horizon was found to fail.
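
To make the later examples concrete, here is a minimal sketch of double-entry bookkeeping in Python (the account names and amounts are invented; this is not Horizon’s data model, which has never been published): every movement of value is recorded twice, as a debit in one account and a matching credit in another, so the ledger always sums to zero.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    account: str
    amount: int  # pence: positive for a debit, negative for a credit

def post(ledger: list[Entry], entries: list[Entry]) -> None:
    """Post a balanced set of entries, or reject the whole set."""
    if sum(e.amount for e in entries) != 0:
        raise ValueError("debits and credits must balance")
    ledger.extend(entries)

ledger: list[Entry] = []
# Move £4,000 (400,000 pence) from an individual stock unit into the
# shared main stock unit: one debit and one matching credit.
post(ledger, [
    Entry("main stock unit", 400_000),
    Entry("individual stock unit", -400_000),
])
assert sum(e.amount for e in ledger) == 0  # the books always balance
```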

Failure of atomicity resulting in a duplication of a transfer

At 7:06, I talked about atomicity, which requires that all parts of a transaction occur precisely once. The judgement (paragraph 346) gives an example of Horizon duplicating part of a transaction following a system crash.

Mr Godeseth was taken, very carefully, through a specific use of the transaction correction tool in 2010. In PEAK 0195561, a problem was reported to the SSC on 4 March 2010 where a SPM had tried, on 2 March 2010, to transfer out £4,000 (referred to in the PEAK as 4,000 pds, which means either pounds (plural) or pounds sterling) from an individual stock unit into the shared main stock unit when the system crashed. The SPM was then issued with 2 x £4,000 receipts. These two receipts had the same session number. The PEAK, as one would expect, records various matters in note form and also uses informal shorthand. However, the main thrust is that when the SPM did the cash declaration, although the main stock unit (into which the £4,000 was being transferred) “was fine”, the unit from which the cash was taken “was out by 4000 pounds (a loss of 4000 pds)”. This is very similar to what Mr Latif said had happened to him, although the transfer in July 2015 to which he referred was £2,000. The PEAK related to Horizon Online and was the admitted occasion when the Balancing Transaction tool had been used.
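
For contrast, here is a minimal sketch of how atomicity is usually enforced, using SQLite purely as an illustration (the schema and figures are invented): both legs of the £4,000 transfer are applied inside one transaction, so a failure mid-way leaves the stock units untouched rather than producing a duplicated or half-applied transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_units (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO stock_units VALUES ('individual', 400000), ('main', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between stock units atomically: both updates happen or neither does."""
    with conn:  # opens a transaction; commits on success, rolls back on any error
        conn.execute("UPDATE stock_units SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # anything that fails before the commit rolls the first update back,
        # so the transfer is never half-applied or applied twice
        conn.execute("UPDATE stock_units SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

transfer(conn, "individual", "main", 400000)
print(dict(conn.execute("SELECT name, balance FROM stock_units")))
```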

Failure of consistency resulting in a loss of synchronisation between systems

At 8:03, I talked about consistency requiring that invariants are preserved. One of these invariants is that the branch counter system maintained by Horizon stays synchronised with the back-end accounting system (which keeps track of the money). In the judgement’s technical appendix (paragraph 131), an example was given where the counter system lost synchronisation with the back-end accounting system.

The bug related to the process of moving discrepancies into the local suspense account. The majority of incidents are recorded as occurring between August and October 2010. The bug was documented in a report from Mr Gareth Jenkins dated 29 September 2010 where it was stated:
“This has the following consequences: There will be a receipts and payment mismatch corresponding to the value of discrepancies that were lost. Note that if the user doesn’t check their final balance report carefully they may be unaware of the issue since there is no explicit message when a receipts and payment mismatch is found on the final balance (the user is only prompted when one is just detected during a trial balance)”

This issue is reported as causing discrepancies that disappeared at the counter or terminal when the branches followed certain steps, but which persisted or remained within the back-end branch account. It is therefore something which is contrary to the principle of double entry bookkeeping, and should plainly not have occurred. The issue occurred when a branch cancelled the completion of the trading period and then, within the same session, rolled into a new balance or trading period. Because the discrepancy disappeared at the counter, the SPM would not know that the discrepancy existed.
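
A hedged sketch of the invariant at stake (the table names and flow are invented for illustration, not taken from Horizon): clearing the discrepancy at the counter and booking it to the suspense account must happen in the same transaction, so neither view can show the money gone while the other still carries it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counter_discrepancy (branch TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE backend_suspense   (branch TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO counter_discrepancy VALUES ('branch-1', 12345);
    INSERT INTO backend_suspense    VALUES ('branch-1', 0);
""")

def move_to_suspense(conn, branch):
    """Clear the counter discrepancy and book it to suspense in one transaction."""
    with conn:
        (amount,) = conn.execute(
            "SELECT amount FROM counter_discrepancy WHERE branch = ?",
            (branch,)).fetchone()
        conn.execute("UPDATE backend_suspense SET amount = amount + ? WHERE branch = ?",
                     (amount, branch))
        conn.execute("UPDATE counter_discrepancy SET amount = 0 WHERE branch = ?",
                     (branch,))
        # Invariant: the value never vanishes -- it sits either at the counter
        # or in suspense. Applying only one of the two updates (the effect the
        # report describes) would break double entry.

move_to_suspense(conn, "branch-1")
```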

Failure of isolation resulting in a transaction being absent from a report

Isolation requires that transactions that happen concurrently should not interfere with each other. At 8:59, I talked about a failure of isolation between the process of generating a report and another transaction occurring. The judgement’s technical appendix (paragraph 252) discussed this case.

Some other issues relating to Girobank were identified by the Post Office separately, as issues 3, 4, 5 and 6 under the same heading of entry 8 in the Bug Table. Issue 3 applied to two PEAKs, both from 2000. They were PC0052575 (in which the SPM reported discrepancies of £20 and £628.25) and the issue was diagnosed as arising out of the use of a shared stock unit. The Post Office submissions were “There is a window of time between a user printing and cutting-off a report. If another user was to perform a transaction during that window, that transaction may not show on the report. The issue was already due to be fixed in a future release”. Mr Coyne accepted in cross-examination that these were indications of a discrepancy being identified, but these were not of themselves evidence of a bug in Horizon. His overall evidence on this was contained in his final answer which was “It created a financial discrepancy within the Horizon system which could then ultimately have an impact on branch accounts.” I accept that this issue caused a financial discrepancy. Given the issue was “due to be fixed” in what is described as a “future release” the issue arose from the operation of the system and is therefore, in my judgment, correctly described as a defect.
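
A sketch of the isolation the report needed, again using SQLite only as an illustration (the schema is invented): if the report rows and the cut-off marker are read and written inside one transaction, a transaction performed by another user in that window lands wholly in this report or wholly in the next one, rather than silently dropping out.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE report_cutoff (last_id INTEGER)")
conn.execute("INSERT INTO report_cutoff VALUES (0)")

def run_report(conn):
    """Produce a report and advance the cut-off as one isolated unit."""
    conn.execute("BEGIN IMMEDIATE")  # blocks concurrent writers until COMMIT
    (last_id,) = conn.execute("SELECT last_id FROM report_cutoff").fetchone()
    rows = conn.execute(
        "SELECT id, amount FROM transactions WHERE id > ?", (last_id,)).fetchall()
    new_last = rows[-1][0] if rows else last_id
    conn.execute("UPDATE report_cutoff SET last_id = ?", (new_last,))
    conn.execute("COMMIT")
    # any transaction committed after the cut-off simply waits for the next report
    return rows

conn.execute("INSERT INTO transactions (amount) VALUES (2000)")
print(run_report(conn))
```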

Failure of durability resulting in a transaction disappearing from Horizon

Durability requires that once a transaction is committed, it will not be undone. This is often implemented by having the system detect failures that occur before a transaction has been fully recorded, and recover from the failure by completing the recording of the transaction. The judgement (paragraph 142) discusses a scenario (which I covered at 9:40) where a transaction had been committed on the connected banking system but disappeared from Horizon.
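
Before the extract itself, here is a toy sketch of that recovery idea: a simple append-only log, flushed to disk before the transaction is acknowledged (the file name and record format are invented, and this says nothing about how Horizon actually recorded transactions).

```python
import json, os

LOG = "txn.log"  # hypothetical log file name, for illustration only

def commit(txn: dict) -> None:
    """Append the transaction to the log and force it to disk before acknowledging."""
    with open(LOG, "a") as f:
        f.write(json.dumps(txn) + "\n")
        f.flush()
        os.fsync(f.fileno())  # the record is durable once this returns
    print("acknowledged:", txn)

def recover() -> list[dict]:
    """After a crash, every acknowledged transaction can be rebuilt from the log."""
    if not os.path.exists(LOG):
        return []
    with open(LOG) as f:
        return [json.loads(line) for line in f if line.strip()]

commit({"type": "cash withdrawal", "amount_pence": 15000})
print("recovered state:", recover())
```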

Her evidence related to events of 9 May 2016 when the national outage occurred, the same date as the occasion about which both the Mr Patnys had given evidence. She gave evidence about the impact upon the branch business, the way she was serving customers and how Horizon was being very slow that day, with a sand timer appearing on the screen for a very long time. She served one customer, who was making a cash withdrawal, as she obtained the relevant messages and approvals on the screen. However, after they had left, a receipt printed saying “Recovery failed” and the withdrawal of £150 was not shown. She then later studied the transaction log and this latter transaction did not appear.

Mrs Burke then went to extraordinary lengths. She also proved herself very tenacious, as many people may well have simply given up on the sum of £150. She identified the customer, and she tracked him down. She went to his house and explained what had occurred. He happened still to have the receipt from the transaction at her Post Office. It entirely matches her account. She went with the customer to the customer’s bank, which was the TSB in Goole. She explained with the customer to the bank cashier what had happened, and the cashier printed out the bank statement and showed that the sum had been withdrawn from the customer’s bank account. The customer permitted Mrs Burke to have this.

Failure of log replication

Early versions of Horizon kept track of what happened by maintaining a log of messages which were replicated between systems to ensure a consistent state across all computers in a branch and the Post Office’s back-end systems. I talked about this at 11:33, and how it’s an example of the consensus problem in distributed systems, for which building reliable implementations remains an open research problem. The judgement’s technical appendix (paragraph 263) discussed a case where this replication failed.

In theory, when a counter was replaced, it builds its messagestore by replicating with its neighbours in “recovery mode”. The neighbours it has depends on the office size (which would affect the number of other counters) and node number. For a single counter office, the neighbours are the correspondence server in the datacentre and the mirror disk (the second hard drive in the same counter). For a multi-counter office, the neighbours are the correspondence server and all other nodes at the office, or all the other nodes in the office (known as slaves) depending upon the node number of the counter being replaced.

A replacement counter is supposed to come out of recovery mode when it believes it has successfully replicated all relevant messages from its neighbours. The Post Office submissions state that “In this case, the replacement counter came out of recovery mode early, before it had replicated all messages from its neighbour. The replacement counter started writing messages from the point at which it believed it had replicated all relevant messages from its neighbour. This meant that it used message IDs that had been used for messages that had not been replicated from its neighbour and this prevented the “missing” messages from being replicated later on (because that would have created duplicate message IDs). The missing message was therefore “overwritten” by the replacement counter.”
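
A simplified sketch of that failure mode, with an invented message store rather than Fujitsu’s real one: each message is keyed by a sequence number, and a replacement counter must resume numbering above everything its neighbours hold, otherwise its new messages collide with, and permanently block, the messages it never replicated.

```python
def next_message_id(replicated_ids: set[int], neighbour_max_id: int) -> int:
    """A replacement counter must resume above the neighbour's high-water mark,
    not above whatever subset it happened to replicate before leaving recovery."""
    return max(neighbour_max_id, max(replicated_ids, default=0)) + 1

neighbour_store = {1: "txn A", 2: "txn B", 3: "txn C"}   # messages held elsewhere
replicated = {1, 2}                                       # counter left recovery early

# Buggy behaviour: resume from the locally replicated maximum only.
bad_id = max(replicated) + 1
assert bad_id == 3   # collides with "txn C", which can now never be replicated

# Correct behaviour: take the neighbour's high-water mark into account.
good_id = next_message_id(replicated, max(neighbour_store))
assert good_id == 4  # "txn C" can still be copied across later
print("bad:", bad_id, "good:", good_id)
```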

Failures to ensure the reliability of evidence

At 14:55, I talked about how evidence generated by Horizon was sometimes the sole basis for prosecuting subpostmasters, and so it must be reliable. This means we must be able to establish when an action occurred, what happened, and who did it. We also want assurance that only the minimum number of people are granted the system privileges that allow sensitive changes to be made. The judgement showed that Horizon failed to meet any of these requirements.
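
One standard way to provide that kind of assurance, offered purely as an illustration of the principle rather than a description of Horizon’s audit store, is an append-only log in which each record names who acted, what they did and when, and is chained to its predecessor by a hash so that later alteration is detectable.

```python
import hashlib, json, time

def append_record(log: list[dict], user: str, action: str) -> None:
    """Append a record chained to the previous one; rewriting history breaks the chain."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {"user": user, "action": action, "time": time.time(), "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    log.append(record)

def verify(log: list[dict]) -> bool:
    """Recompute every hash; tampering with any earlier entry is detected."""
    prev = "0" * 64
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev"] != prev or rec["hash"] != hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append_record(log, "support-engineer-1", "inserted balancing transaction")
append_record(log, "branch-user-2", "reversed bill payment")
assert verify(log)
log[0]["user"] = "someone-else"   # attempt to rewrite history
assert not verify(log)
```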

When actions occurred

Judgement paragraph 914:

Mr Coyne had identified issues with using Credence data. There was a one-hour difference in the time stamps used between Fujitsu and Credence, which can hardly have helped sensible investigations when SPMs raised queries, but there is more to this than that. The E&Y review in March 2011 identified various issues with Credence, including weak change controls within the back-end of the systems which allowed Logica developers (the third-party provider) to move their own uncontrolled changes into the production environment, which included both Credence software code and the data within Credence used for what was called “audit evidence” but which has to be differentiated from what I am referring to as audit records in the audit store. There was a lack of further documentation to approve fixes and patches applied to Credence outside of the release process, which meant that linking changes to issue tickets to record the original request for the bug fix was not possible.
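
That one-hour difference is characteristic of one system recording UK local time while the other records UTC. Here is a short, general illustration (not a claim about how either Fujitsu or Credence actually stored timestamps) of why audit timestamps are normally kept timezone-aware:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The same instant, written down by two systems with different conventions.
event_utc = datetime(2012, 10, 4, 9, 42, tzinfo=timezone.utc)
event_london = event_utc.astimezone(ZoneInfo("Europe/London"))

print(event_utc.isoformat())      # 2012-10-04T09:42:00+00:00
print(event_london.isoformat())   # 2012-10-04T10:42:00+01:00 (BST)

# Naive comparison of the wall-clock figures suggests a one-hour discrepancy;
# comparing timezone-aware values shows they refer to the same moment.
assert event_utc == event_london
```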

What had happened

Judgement paragraph 692:

However, when one then considers the subsequent passages of the 4th experts statement, it can be seen how far from this joint agreed (and technically justified) position the Horizon system was. The logging of Privileged User Access (in PUA logs) commenced in October 2009. For the period 2009 to 2015 – obviously a 6 year period – these logs only displayed the fact that a Privileged User had logged on or off, “but not what actions they had taken whilst the Privileged User was logged in”. Therefore the actions they were taking when logged in were being neither recorded nor audited. All that could be seen is they were logged in. Further, it has already been seen that the number of users with the relevant privileges was not, in my judgment, restricted to a minimum. Further, the use of the Transaction Correction Tool cannot be seen in these logs. Yet further, the experts are agreed that at all times, any privileged user access log only shows what tables of BRDB were accessed for a very small minority of accesses.

Who carried out an action

Judgement paragraph 532:

This stance was maintained by the Post Office in the evidence served on its behalf for the Horizon Issues trial, until service of Mr Roll’s 2nd witness statement. To be fair to the Post Office, its origin was the witness statements served by Fujitsu employees, rather than Post Office employees. The position within the Fujitsu witness evidence, prior to its correction by the later statements from Mr Parker, was that what Mr Roll said was possible on Legacy Horizon, and what he had himself done, was simply not possible. Indeed, Dr Worden considered it sufficiently clear that as an IT expert he felt able confidently to assert in his 1st Expert Report that he, Dr Worden, had “established” that Mr Roll’s evidence of fact in this respect was wrong. After service of Mr Roll’s 2nd witness statement, Fujitsu finally came clean and confirmed (via Mr Parker) that what Mr Roll said was correct. Data could be altered by Fujitsu on Horizon as if at the branch; under Legacy Horizon, transactions could be inserted at the counter in the way Mr Roll described. This could be done without the SPM knowing about this. Mr Godeseth also confirmed that it would appear as though the SPM themselves had performed the transaction. This is directly contrary to what the Post Office had been saying publicly for many years.

and also the judgement paragraph 916:

There is also at least one specific occasion, considered in the evidence during the trial, which shows that the Credence data does not show the correct position. This was put to Ms Van Den Bogerd. This was an occurrence at Lepton in October 2012. It led to something that was referred to throughout the trial as the Helen Rose Report or Rose Report, as the author of the report into it was called Helen Rose. I have dealt with it to a degree at [227] above. That report records that: “A transaction took place at Lepton SPSO 191320 on the 04/10/2012 at 10:42 for a British Telecom bill payment for £76.09; this was paid for by a Lloyds TSB cash withdrawal for £80.00 and change give for £3.91. At 10:37 on the same day the British Telecom bill payment was reversed out to cash settlement.
The branch was issued with a Transaction Correction for £76.09, which they duly settled; however the postmaster denied reversing this transaction and involved a Forensic Accountant as he believed his reputation was in doubt.” (“duly settled” means the SPM paid the Post Office that sum). The Credence data showed that the SPM had reversed the transaction. By consulting the audit data, Mr Jenkins discovered that he had not. This was expressly confirmed, both in the Rose Report and also by Ms Van Den Bogerd in her cross-examination.

Minimising access to sensitive systems

Judgement’s technical appendix paragraph 423:

This shows that prior to this change to the script (which cannot have taken place prior to the date it was implemented, for obvious reasons) all the SSC users had the very powerful permission (also sometimes called privileges) of the APPSUP user. The experts were agreed that such users could, in terms, do whatever they wanted in terms of access to the system. That could obviously have an impact on branch accounts, depending upon what was done by any particular user on any particular occasion. SSC users should only have had the far more limited SSC role and Fujitsu itself were aware of this, and can only be the entity responsible for them having the incorrect wider role, as they were all Fujitsu employees. The section of the accompanying judgment of Mr Godeseth in the judgment that accompanies this also refers to evidence from Mr Coyne and Mr Parker on the same subject. Mr Parker challenged Mr Coyne’s figure but had no basis for doing so, as all he had was his impression, and he had specifically failed to do a proper investigation even though I find Fujitsu could have provided far more proper, cogent evidence of the number of occasions. I accept Mr Coyne’s evidence on this, and given both Dr Worden and Mr Godeseth accepted that the powerful APPSUP permission or privileges were more widely available, and less controlled, than they ought to have been (even based on Fujitsu’s own internal controls) then this inevitably has a detrimental effect upon the robustness of Horizon.

Conclusions

The Post Office trial is one of the few cases where an in-depth examination of system failures has been made public, and so it is a valuable one to learn from. Even simple problems like maintaining a stock balance become complex when they are part of a distributed system. Techniques like ACID transactions can reduce the likelihood of errors, but real implementations will sometimes fail. When a system processes a large number of transactions, this small probability of failure can add up to frequent errors. I hope that the presumption that computers operate correctly is revisited, and that the factors revealed by the Post Office trial are taken into account when doing so.

 

Photo by Mick Haupt on Unsplash.

19 thoughts on “What went wrong with Horizon: learning from the Post Office Trial”

  1. This is an outstanding and illuminating description of the failures in the Horizon system. Thank you so much for providing such clarity and insight. So Horizon did really fail the ACID test!

    I would challenge you on one point, however: the suggestion that systems ‘fail’ – although I agree that there should be no presumption that computers operate correctly. Systems only fail in so far as their operational controls and design are not adequate for the intended purpose of the system, i.e. not ‘fit for purpose’.

    That is not to say that things won’t go wrong, but with appropriate controls and competent people in place the adverse consequences can be avoided – a car will run out of petrol sooner or later, but if you check the gauge and periodically top up the car the problem does not arise.

    Further, many companies architect their solutions to ensure principles characterised by acronyms like CIA, CAP and ACID are adhered to – e.g., using blockchain. It’s just more expensive to follow well accepted best practices, and the Post Office was wedded to meeting cost and time objectives over other considerations, as evidenced by statements made in their annual reports.

    Effective governance in the Post Office and on its Board could have surfaced the warning signs, but it would appear that compliance, transparency and oversight were inadequate at the Post Office – and probably still are.

    However, what is perhaps most unforgiveable in this saga – and I think this was a point you made – was that the failures of the Post Office management were adopted by the judiciary in embracing such a shallow, binary understanding of systems and the role of computer and network technology in them.

  2. @Nicholas: That’s a very good point. The level of Horizon’s robustness might well have been sufficient if it was only used for guiding investigations, and prosecutions would have required independent conclusive evidence of illegal activity. In such a scenario, Horizon would not have been a failure, and innocent subpostmasters would not have been prosecuted. The problems occurred because the investigation process assumed that Horizon was infallible (and that subpostmasters were dishonest) but the reality was far from that. Governance and the court system failed to pick up the discrepancy.

  3. Thanks for such a great overview Steven.
    Do you know what sorts of stress testing (Simulated high volume transactions, Simulated intermittent comms failure and so on) were done prior to a release of Horizon going live?
    And was independent acceptance testing performed on the system beyond a superficial checking of the functionality?
    Am I right that developers were occasionally making adjustments to a live production system rather than being restricted to test or staging systems? Sometimes the only way to prevent corners being cut is to take away the scissors. 🙂

  4. Thanks for the comment Declan. I didn’t see any specific discussion on what sort of testing was done on Horizon. I would imagine there was some, but either it was not adequate to detect the bugs in question or the risk of these bugs causing problems was accepted.

    There were certainly changes to live production data to fix problems. I didn’t see any discussion of changes to code that had not already been validated on test or staging systems. Equally, I didn’t see any claims that changes to production systems never happened without validation.

    In general, there was little discussion on testing and change management. The focus of the Horizon trial was on known bugs and their impact. Perhaps later on there would have been more discussion about how common were unknown bugs, and that could have brought in a discussion about how robust testing procedures were. However, the case was settled before we could get to that.

    1. It is well known that no feasible amount of testing could be adequate to detect all bugs in a system, so to say “the testing was inadequate” is, at best, a tautology.

      That highlights the problem with replacing the common law presumption that evidence from computers is trustworthy. Society depends on a lot of electronic evidence. When it is challenged, what other evidence should we require the party relying on the evidence to offer in support? Even the most professionally developed and maintained systems will be able to fail in rare circumstances, so what is to be done?

      It’s clear that there were serious deficiencies in the way Horizon was developed and maintained. In my experience as an expert witness in some major litigations, there are serious deficiencies in the way most systems are developed and maintained, and all developers rely on “inadequate testing” to create the confidence to release the system into service. Few developers are willing to work within the constraints that make it possible to prove software “correct” (or even that it is free of some limited classes of bugs, such as type violations or memory safety errors).

    2. Were all of the Company’s (Fujitsu’s) system software test logs, with results and actions taken before each software release, revealed in court, studied and used in evidence at any trial or HM Gov. investigation?
      For such a “reputable Company,” and having many other HM Gov. contracts for its software products, its IT, R&D and Quality Control policies surely demanded that such software testing be carried out and documented on all new software before commercial release and implementation? . . . Software testing identifies bugs and issues in the development process so they’re fixed prior to product launch. This is done to ensure only workable and robust quality products are released and implemented in all new releases for commercial use. . . . Did any UK Government investigation demand such data and logs?

  5. Statistically speaking, ask why the sudden increase in shortfalls?

    This anomaly alone should have prompted deeper investigation to discover earlier the Horizon system’s incomplete logging of transactions.

  6. I was involved in the early days of the Link cash dispenser system. There were about 20 members of the consortium, mainly Building Societies. Each member was responsible for its own cash dispensers; replenishing with new notes, balancing its own dispensers and so on. Any customer could use any cash dispenser in the Link network. Every month Link HQ produced balance accounts of “us on them” and “them on us” transactions so that the members could cross charge each other for “foreign” transactions.

    In the early days the system suffered from “phantom” transactions. For example, a customer would withdraw cash from a dispenser in Lancashire and the debit would be posted to a customer in Kent!

    Many experts worked on the problem for months. Software and hardware were tested remorselessly.

    Eventually the problem was solved. If the central computer failed there were nearly always several transactions in progress. The debits might have been posted but the credits not. The central computers were known as “non stop” and had dual processors, dual memories and continued after a break of a few milliseconds.

    However the memory was not being cleared correctly and the system debited and credited wrong accounts.

    Thankfully the Link consortium was absolutely honest and transparent and compensated every member who suffered a phantom transaction until the fault was rectified.

  7. Surely basic accounting and audit procedures would have shown vastly increased takings compared to previous years; on this scale it should have set alarm bells ringing.

    1. As a percentage of the total it probably wasn’t sufficient to raise eyebrows. That alone suggests they knew it wasn’t perfect – as they had a “margin of error” deemed acceptable.

  8. Why aren’t the Auditors of the Post Office, who failed to see that the arithmetical system errors did not reflect reality, culpable in this scandal that ruined SPs’ lives? The individual SPs were told they were the ‘only ones with discrepancies’ and a review of the helpline/anomalies log would have shown a significant number of transcripts with SP discrepancies. Perhaps it wasn’t ‘material’ +/- 5% but surely there ought to have been a check on physical transactions compared to those recorded on the computer system. And also how the system dealt with power outages. Or is it just unlucky that the auditors never found a problem transaction?

    1. I think that the reason this didn’t happen was the belief that the evidence the computer was producing was beyond reasonable doubt, so why would anyone be looking for computer error?
      If you don’t work with computers, and you have laws that force the defendant to prove innocence rather than the plaintiff proving guilt, this is what happens.
      There are so many failures on so many levels that if we want to solve it, it should start with stronger laws and oversight.

  9. Just to say thank you to the above contributors who have delved into and commented on this issue with civility and respect, and without rancour. Such is rare. As a non IT bod I am much better informed. From my personal experiences I feel that there is no prospect that a high volume system will address all issues adequately, so all should have an early access “manual override”. In the Post Office case this would have flagged the issues at an early stage.

  10. An excellent exposition of what can go wrong when systems crash resulting in failed transactions.

    Also the perils of access to data without an audit trail.

    I only used to cobble together very small client-server systems using Visual FoxPro, so I could hit the database directly.

    Logging in as a different user (one of the SPMs) is a whole other ball game. Is that what was happening, or was it purely inappropriate user permissions granted to SSC?

  11. I was not aware until the TV drama this week that it was possible for employees of Fujitsu operating the system to alter entries in live systems. This leads me to posit an opposing supposition that shortfalls were technical or operational interference.
    What if there were indeed real shortfalls?
    What if Fujitsu staff were diverting, e.g. stealing funds from these live accounts?
    Shortfalls would indeed show up as cash collected that did not cover stock sold, or payments did not cover receipts.
    Has there been any investigation of potential financial malfeasance by Fujitsu staff accessing the back end in an uncontrolled fashion?

    1. It doesn’t matter what database is used or how it is stored; someone will need admin access to the data. The question is whether this admin access was traceable via logs that are protected with hash codes or blockchains. I remember hearing on the news that the Post Office said it can’t be done, and I was thinking at the time that every IT person will be thinking “foo bar”. Unless it’s recorded in a way that can’t be changed, for example on paper or DVD-ROM, it can be changed.

      I think another cause is management in charge of IT personnel not having an IT background. So they end up believing whatever they are told, or misunderstand at a fundamental level what they are managing. Or they forget to listen to the experts below them in the chain of command.

      These are my initial thoughts and may be wrong, but they had to be said.

    2. Your question is the same as mine. Money doesn’t simply “appear” out of thin air, therefore it cannot simply “disappear” into thin air. This could be a real can of worms.

  12. Given that communication problems seem to have been common, and the communication relied on ISDN connections which can run very poorly over last-mile copper circuits, is there any correlation between the distance of the post office from the local BT exchange and the number of subpostmasters at a given distance who reported problems with Horizon?

