Structure data recipes#
Fig. 2 Structure data phase on Triage’s workflow#
The simplest way to get started is to structure your data as a series of events linked to your entity of interest (such as a person, organization, or business) that occur at specific points in time. Each row in your dataset represents one event. Every event should include an identifier, and a unique entity identifier that links the event to the entity it pertains to. It should also contain a date field, indicating when the event occurred, along with any additional attributes describing the event (e.g., type) or the entity itself (e.g., age, gender, race). A sample row might look like:
event_id |
entity_id |
date |
event_attribute (type) |
entity_attribute (age) |
entity_attribute (gender) |
… |
|---|---|---|---|---|---|---|
121 |
19334 |
1/1/2013 |
Placement |
12 |
Male |
… |
Triage requires a field named entity_id (of type integer) to identify the primary entities of interest in your project. It also requires a date field that specifies when each event occurred, which is essential for correctly building and validating models.
Examples
Healthcare: n typical electronic health record (EHR) systems, patient demographics are stored in a dedicated table where the entity_id corresponds to the patient ID (often the medical record number, or MRN). Additional tables record events such as encounters, diagnoses, or procedures — each represented as a row that includes the
entity_idlinking it to the patient, a timestamp indicating when it occurred, and other relevant attributes about that event. All of these tables are provided to Triage as input within a PostgreSQL database.Education: The
entity_idwill typically be the student identifier and the events include things like a grade in a class in a given year, a test score in a test at a given time, graduation, etc.