Structure data recipes

Fig. 2 Structure data phase on Triage’s workflow

The simplest way to get started is to structure your data as a series of events that are linked to your entity of interest (such as a person, organization, or business) and occur at specific points in time. Each row in your dataset represents one event. Every event should include an event identifier and a unique entity identifier that links the event to the entity it pertains to. It should also contain a date field indicating when the event occurred, along with any additional attributes describing the event (e.g., type) or the entity itself (e.g., age, gender, race). A sample row might look like:

| event_id | entity_id | date | event_attribute (type) | entity_attribute (age) | entity_attribute (gender) |
| --- | --- | --- | --- | --- | --- |
| 121 | 19334 | 1/1/2013 | Placement | 12 | Male |

Triage requires a field named entity_id (of type integer) to identify the primary entities of interest in your project. It also requires a date field that specifies when each event occurred, which is essential for correctly building and validating models.
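As a rough illustration of these two requirements, the sketch below checks that a row carries an integer entity_id and converts its date string into a proper date value before loading. It uses only the Python standard library and is not part of Triage. The helper name `normalize_event`, the dictionary keys other than entity_id and date, and the month/day/year reading of `1/1/2013` are all assumptions made for this example.

```python
from datetime import date, datetime

# Hypothetical event rows mirroring the sample row above.
events = [
    {
        "event_id": 121,
        "entity_id": 19334,
        "date": "1/1/2013",
        "type": "Placement",
        "age": 12,
        "gender": "Male",
    },
]


def normalize_event(row, date_format="%m/%d/%Y"):
    """Return a copy of `row` with a validated entity_id and a parsed date.

    Assumes dates are written month/day/year; change `date_format`
    if your source data uses a different convention.
    """
    entity_id = row["entity_id"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(entity_id, bool) or not isinstance(entity_id, int):
        raise TypeError(f"entity_id must be an integer, got {entity_id!r}")

    cleaned = dict(row)
    # strptime raises ValueError if the string does not match the format.
    cleaned["date"] = datetime.strptime(row["date"], date_format).date()
    return cleaned


normalized = [normalize_event(row) for row in events]
```

Running a check like this before loading the rows into the database surfaces string-typed identifiers and unparseable dates early, rather than at modeling time.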

Examples

  1. Healthcare: In typical electronic health record (EHR) systems, patient demographics are stored in a dedicated table where the entity_id corresponds to the patient ID (often the medical record number, or MRN). Additional tables record events such as encounters, diagnoses, or procedures. Each event is a row that includes the entity_id linking it to the patient, a timestamp indicating when it occurred, and other relevant attributes of that event. All of these tables are provided to Triage as input within a PostgreSQL database.

  2. Education: The entity_id will typically be the student identifier, and the events include things like a grade in a class in a given year, a score on a test taken at a given time, graduation, etc.
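To make the education example concrete, the following sketch reshapes two kinds of student records into the one-event-per-row layout described above. Every name and value in it is hypothetical: the record fields (`student_id`, `course`, `recorded_on`, and so on), the `to_events` helper, and the `event_type` column are assumptions for illustration, not a schema that Triage prescribes.

```python
from datetime import date

# Hypothetical source records; all names and values are illustrative only.
grades = [
    {"student_id": 1, "course": "Algebra I", "grade": "B",
     "recorded_on": date(2015, 6, 15)},
]
test_scores = [
    {"student_id": 1, "test": "State math exam", "score": 78,
     "taken_on": date(2015, 4, 2)},
]


def to_events(records, event_type, date_field):
    """Turn source records into event rows keyed by entity_id and date."""
    events = []
    for record in records:
        # Everything except the identifier and the date becomes an attribute.
        attributes = {
            key: value
            for key, value in record.items()
            if key not in ("student_id", date_field)
        }
        events.append({
            "entity_id": record["student_id"],  # student identifier
            "date": record[date_field],         # when the event occurred
            "event_type": event_type,
            **attributes,
        })
    return events


# One combined list, ordered by student and then chronologically.
events = sorted(
    to_events(grades, "grade", "recorded_on")
    + to_events(test_scores, "test_score", "taken_on"),
    key=lambda event: (event["entity_id"], event["date"]),
)
```

In practice, separate tables per event type (as in the healthcare example) work just as well; the point is only that each row carries an entity_id and a date.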