Using Apex Designer to Generate Synthetic Healthcare Data: FHIR Factory

FHIR (Fast Healthcare Interoperability Resources) is the web standard for healthcare interoperability and data exchange. Any modern healthcare information system will have FHIR support. Healthcare applications that work with population-level data need large datasets for testing, and the Synthea dataset contains over 1 million synthetic patient records.

The Synthea dataset is a great starting point, but it can be tricky to work with and lacks certain types of data. We built the FHIR Factory app in Apex Designer to help anyone building a low-code healthcare-related application. It allows a low-code application developer to pull in the Synthea data and generate the required missing data.

FHIR Factory can replay the synthetic events it generates using Kafka, so you can test your app or algorithm on a "live" data stream. If you have questions or would like more information about FHIR Factory, please contact us.


Here's a quick introduction to the FHIR Factory app. The FHIR Factory app is a way to generate and manage synthetic FHIR data for testing applications.

The bundles page that you're seeing here includes imports of all of the bundle files from the Synthea collection. Each bundle is brought in, the different resources are parsed, and they are stored in the design database.
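The import step described above can be pictured with a minimal sketch. It assumes only the standard FHIR Bundle shape (an `entry` list whose items carry a `resource` with a `resourceType`); the function name and the sample bundle are hypothetical and are not taken from the FHIR Factory app.

```python
import json
from collections import defaultdict


def group_resources_by_type(bundle):
    """Return {resourceType: [resource, ...]} for one FHIR Bundle."""
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        # Skip entries that carry no resource body.
        if resource and "resourceType" in resource:
            grouped[resource["resourceType"]].append(resource)
    return dict(grouped)


# Hand-written stand-in for one bundle file (not real Synthea output).
sample_bundle = json.loads("""
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Encounter", "id": "e1"}},
    {"resource": {"resourceType": "Encounter", "id": "e2"}}
  ]
}
""")

grouped = group_resources_by_type(sample_bundle)
for resource_type, resources in sorted(grouped.items()):
    print(resource_type, len(resources))
```

Grouping by `resourceType` first makes it straightforward to store each kind of resource separately, which is roughly what a resource type page would list.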

After that import, we can see on the resource type page that quite a few different resource types and items have been extracted from the bundle files. We can look at the patients, look at the individual patients, and see the encounters related to those patients. The metadata that is exposed is just a part of it: the full set of FHIR JSON is there, viewable and editable. The app also includes logic to convert the V3 FHIR resources into V4 resources. For example, this particular one is actually a medication order in the Synthea dataset, but the app created the equivalent medication request using the V4 format.
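To give a feel for what such a conversion involves, here is a simplified sketch of mapping a MedicationOrder to a MedicationRequest. It is not the app's actual conversion logic: the function name is hypothetical, only a handful of elements are handled, and the renames shown are an assumption that should be checked against the FHIR specification for the versions you use.

```python
def medication_order_to_request(order):
    """Map an older MedicationOrder dict to a MedicationRequest-shaped dict."""
    request = {
        "resourceType": "MedicationRequest",
        "id": order.get("id"),
        "status": order.get("status", "unknown"),
        # intent is required on MedicationRequest; "order" is assumed here.
        "intent": "order",
    }
    # Old element name -> new element name (simplified, not exhaustive).
    renames = {
        "patient": "subject",
        "dateWritten": "authoredOn",
        "prescriber": "requester",
        "encounter": "encounter",
        "medicationCodeableConcept": "medicationCodeableConcept",
    }
    for old_name, new_name in renames.items():
        if old_name in order:
            request[new_name] = order[old_name]
    return request


# Hand-written example order (not real Synthea output).
order = {
    "resourceType": "MedicationOrder",
    "id": "mo1",
    "status": "active",
    "patient": {"reference": "Patient/p1"},
    "dateWritten": "2016-03-01",
    "prescriber": {"reference": "Practitioner/pr1"},
}
request = medication_order_to_request(order)
print(request["resourceType"], request["subject"], request["authoredOn"])
```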

Some resource types are not very well represented, for example medication dispenses and locations. Here we have locations, which are pharmacies that have been generated for each of the patients. You can go to the patient page and generate pharmacies for all. That triggers logic that uses the Google Places API to look for pharmacies close to each patient and then makes sure all of them are added as pharmacy locations for use in generating synthetic data.

Other kinds of cleanup have been applied as well. For example, practitioners in the dataset do not by default have location information with their address. So this extension with the geolocation information was added automatically by geocoding the address and adding the latitude and longitude, so that it would be available for use in the synthetic data.
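Once latitude and longitude are stored, a nearest-location lookup becomes possible. The sketch below uses the haversine great-circle distance as one way to do that; the function names, the `lat`/`lon` keys, and the sample coordinates are all hypothetical, and the app may well compute distance differently.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest(origin, candidates):
    """Return the candidate with the smallest distance to origin."""
    return min(
        candidates,
        key=lambda c: haversine_km(origin["lat"], origin["lon"],
                                   c["lat"], c["lon"]),
    )


# Made-up coordinates for illustration only.
patient = {"name": "Patient p1", "lat": 42.36, "lon": -71.06}
pharmacies = [
    {"name": "Pharmacy A", "lat": 42.37, "lon": -71.05},
    {"name": "Pharmacy B", "lat": 42.50, "lon": -71.20},
]
print(closest(patient, pharmacies)["name"])
```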

Coming to the synthetic data, in this particular example, we wanted to be able to generate some scenarios related to prescriptions and getting the prescriptions filled. And so a pharmacy scenario basically has a set of properties.

  • How many patients are going to get generated
  • What the start date is
  • How we should spread the start date out so that they don't all begin on January 1st
  • How many cycles we should go through
  • How many days per cycle

So we're going to do five cycles of 31 days. We're going to put in a medication, pick the closest practitioner to that patient, and set the supply days to 30 and the refills to zero. Then on the pharmacy side, for the medication dispenses, we use the closest pharmacy and, in this case, have no refills.
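The scenario properties above can be sketched as a small configuration object plus a date generator. The class and field names are hypothetical, and the start date, spread window, and patient count are made-up values for illustration; only the five cycles of 31 days, the 30 supply days, and the zero refills come from the walkthrough.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PharmacyScenario:
    patient_count: int
    start_date: date
    start_spread_days: int  # window over which patient start dates are spread
    cycles: int
    days_per_cycle: int
    supply_days: int
    refills: int


def start_offsets(scenario, seed=0):
    """Pick a per-patient offset so not everyone starts on the same day."""
    rng = random.Random(seed)
    return [rng.randint(0, scenario.start_spread_days)
            for _ in range(scenario.patient_count)]


def cycle_dates(scenario, offset_days=0):
    """Dates on which one patient's prescription is written, one per cycle."""
    first = scenario.start_date + timedelta(days=offset_days)
    return [first + timedelta(days=i * scenario.days_per_cycle)
            for i in range(scenario.cycles)]


scenario = PharmacyScenario(
    patient_count=50,
    start_date=date(2023, 1, 1),   # assumed start date
    start_spread_days=14,          # assumed spread window
    cycles=5,
    days_per_cycle=31,
    supply_days=30,
    refills=0,
)
for d in cycle_dates(scenario):
    print(d.isoformat())
```

With no offset, the five prescription dates fall 31 days apart starting from the scenario start date; each patient's offset simply shifts that whole series.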

So you can see the medication requests that have been generated, each one having the core information from the FHIR JSON. The medication dispenses have also been generated by selecting the closest pharmacy to the particular person.

But there are also scenarios for generating negative situations. For example, instead of using the closest pharmacy, one scenario selects a pharmacy at random. Another uses multiple practitioners, so instead of the closest practitioner, it uses a random one. So for this particular set, we did 50 patients with no refills, 20 with refills, ten with too many pharmacies, ten with too many practitioners, and ten with supply overlap (where the supply days were 30, but they refilled the prescription after 20 days).
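The supply overlap case is easy to check with a little date arithmetic: a 30-day supply refilled after 20 days leaves 10 days of overlap. The sketch below is one way an algorithm under test might detect that; the function name and sample dates are hypothetical.

```python
from datetime import date, timedelta


def overlap_days(dispense_dates, supply_days):
    """For each refill, days of the previous supply still on hand."""
    ordered = sorted(dispense_dates)
    overlaps = []
    for previous, current in zip(ordered, ordered[1:]):
        supply_ends = previous + timedelta(days=supply_days)
        # A refill on or after the end of the supply has no overlap.
        overlaps.append(max(0, (supply_ends - current).days))
    return overlaps


# First refill comes 20 days in, the second exactly when the supply runs out.
fills = [date(2023, 1, 1), date(2023, 1, 21), date(2023, 2, 20)]
print(overlap_days(fills, supply_days=30))  # [10, 0]
```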

Each one of these generated all of the requests and you can change the parameters and just click the button to regenerate these. So the scenarios help us build additional synthetic data. And of course, there could be many other scenarios that we add to this app to generate other kinds of synthetic data.

Once you've generated that synthetic data, you can go to the Event Resources page, which lets you pick a start date and then replay, or emit, these events on a Kafka topic at a certain frequency. In this case, we have about 180 days of events. If we replay them at three days per second, we'll end up with about 60 seconds of emitted events. You just click the button here and it starts the process, emitting those events in compressed real time as you've specified.
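The time compression works out as 180 days divided by 3 days per second, which is 60 seconds. A minimal sketch of that scheduling arithmetic follows; the function name and sample dates are hypothetical, and the Kafka publishing itself is deliberately left out, since a real replay would wait until each offset and then send the event with whatever Kafka client you use.

```python
from datetime import date


def replay_offsets(event_dates, start_date, days_per_second):
    """Seconds after replay start at which each event should be emitted."""
    offsets = []
    for event_date in sorted(event_dates):
        elapsed_days = (event_date - start_date).days
        # Events before the chosen start date are not replayed.
        if elapsed_days >= 0:
            offsets.append(elapsed_days / days_per_second)
    return offsets


start = date(2023, 1, 1)            # assumed replay start date
events = [date(2023, 1, 1), date(2023, 1, 4), date(2023, 6, 30)]
print(replay_offsets(events, start, days_per_second=3))  # [0.0, 1.0, 60.0]
```

The last event falls 180 days after the start date, so at three days per second it is emitted 60 seconds into the replay.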

So FHIR Factory is a great way to generate synthetic FHIR data and use that to test the apps and the algorithms that you're building.

David Knapp
