Sunday, October 7, 2007

Hypothesis Testing

The current CDAL does not support hypothesis testing, so the task is to add this function to CDAL. Two outside sources contain built-in functions for hypothesis testing: R and Matlab. Matlab is chosen for this task because it is the easier of the two to implement.

Example: An interesting question for researchers is whether the heart rate of patients who are ventilated is higher than that of patients who are not ventilated.
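The ventilation question above can be framed as a comparison of two group means. The following is a minimal Python sketch, not the planned Matlab integration, and it assumes Welch's unequal-variance t-test as the test; the entry does not say which test will be used. The function name welch_t and the heart rate values are made up for illustration only.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's t-test on mean(sample_a) - mean(sample_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances (n - 1 denominator), each divided by its sample size
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up heart rates in beats per minute, NOT real patient data
ventilated = [88, 92, 95, 101, 97, 90]
not_ventilated = [76, 81, 79, 85, 80, 78]

t, df = welch_t(ventilated, not_ventilated)
print(f"t = {t:.3f}, df = {df:.2f}")
```

A large positive t would support the one-sided hypothesis that ventilated patients have the higher mean heart rate. Turning t and df into a p-value needs the t distribution, which is what a statistics package such as Matlab or R would supply.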

Expected time-frame: 1 week (mid Wk 10-11)

Expected finished date: Wk 11 Thursday

Tuesday, October 2, 2007

Review of the Architecture of CDAL

The current architecture of CDAL is as follows:

The architecture of CDAL has been converted to an object-oriented design. The query that the user enters is first checked by the syntax parser (which will include David's SNOMED server for terminology correctness once that is implemented). Once checked, the query is split by the semantic parser, which produces a number of answer objects and condition objects (if any).

Both the answer objects and the condition objects can belong to different categories (we call each category an event).

The categories (with their corresponding numbers of attributes and their definitions) are as follows:

Chart_events (total): 786
- Chart_events (numeric): 734 - All the numerical charted information for patients (E.g. heart rate, peep, cvp, etc.)
- Chart_events (categoric): 52 - All the categorical charted information for patients (E.g. ventilation mode, airway, etc.)

Medication_events: 52 - All the iv-drip-infusion (sedation and inotropes) information for patients (E.g. Propofol, Fentanyl, etc.)

Patient_events: 6 - All the basic demographic information for patients (E.g. medical record number, sex, etc.)

Lab_events: 63 - All the chemical information for patients (E.g. Chloride, Sodium, pH, etc.)

Group_events (total): 74 - All the groups of variables pre-defined by the medical staff. Unlike the other event types, a group_event returns more than a single attribute. For example, sedation will return all the propofol, fentanyl, etc. that the patient has taken.
- Sedation: 8
- Inotropes: 14
- Antibiotics: 46
- Thromboembolic_prophylaxis: 6

Total: 981 attributes

For example, a condition can be a patient_event (age > 30), a chart_event (heart rate > 60), a medication_event (propofol > 1), etc. Note that a chart_event can be either numeric (heart rate > 60) or categoric (ventilation mode = PS). Furthermore, conditions can be connected by logical operators (AND / OR).
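To make the condition objects concrete, here is a minimal Python sketch of how conditions and their AND / OR connections could be represented. The class names Condition and Combined, the field names, and the dictionary layout of a patient record are all assumptions for illustration; they are not taken from the CDAL code.

```python
import operator
from dataclasses import dataclass

# Comparison symbols as they appear in the example conditions
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass
class Condition:
    event_type: str  # e.g. "patient_event", "chart_event", "medication_event"
    attribute: str   # e.g. "age", "heart rate", "ventilation mode"
    op: str          # one of the keys in OPS
    value: object    # number for numeric events, string for categoric ones

    def matches(self, record):
        # record is a hypothetical dict mapping attribute names to values
        if self.attribute not in record:
            return False
        return OPS[self.op](record[self.attribute], self.value)


@dataclass
class Combined:
    connective: str  # "AND" or "OR"
    left: object
    right: object

    def matches(self, record):
        if self.connective == "AND":
            return self.left.matches(record) and self.right.matches(record)
        if self.connective == "OR":
            return self.left.matches(record) or self.right.matches(record)
        raise ValueError(f"unknown connective: {self.connective}")


# age > 30 AND (heart rate > 60 OR ventilation mode = PS)
query = Combined(
    "AND",
    Condition("patient_event", "age", ">", 30),
    Combined(
        "OR",
        Condition("chart_event", "heart rate", ">", 60),
        Condition("chart_event", "ventilation mode", "=", "PS"),
    ),
)
record = {"age": 45, "heart rate": 55, "ventilation mode": "PS"}
print(query.matches(record))
```

Nesting Combined objects gives a small tree, so a query of any depth is evaluated the same way as a single condition.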

Similarly, an answer can be a patient_event (all values of mrn), a chart_event (all values of heart rate), or a medication_event (all values of propofol). One extra feature is the inclusion of group_event, which lets the user retrieve not just one attribute but all the attributes within a pre-defined grouping. For example, all values of sedation will return the whole sedation group, including propofol, fentanyl, morphine, etc. Furthermore, each answer object contains its corresponding reference entity (all values, any value, last value) and statistical entity (mean, sd, max, min, range, mode, etc).
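As a rough illustration of an answer object, the Python sketch below combines group expansion, a reference entity and a statistical entity. All names (Answer, GROUPS, REFERENCE, STATISTIC) are hypothetical. The sedation list holds only the three drugs named above, not all 8 attributes, and "any value" is left out because its exact meaning is not described here.

```python
import statistics
from dataclasses import dataclass
from typing import Optional

# Partial, illustrative grouping: only the three drugs named in the text
GROUPS = {"sedation": ["propofol", "fentanyl", "morphine"]}

# Reference entities select which values are kept
REFERENCE = {
    "all values": lambda values: list(values),
    "last value": lambda values: list(values[-1:]),
}

# Statistical entities reduce the kept values to one number
STATISTIC = {
    "mean": statistics.mean,
    "sd": statistics.stdev,
    "max": max,
    "min": min,
    "range": lambda values: max(values) - min(values),
    "mode": statistics.mode,
}


@dataclass
class Answer:
    attribute: str                   # single attribute or group name
    reference: str                   # key in REFERENCE
    statistic: Optional[str] = None  # key in STATISTIC, or None for raw values

    def attributes(self):
        # A group_event expands to every attribute in its grouping
        return GROUPS.get(self.attribute, [self.attribute])

    def evaluate(self, values_by_attribute):
        out = {}
        for name in self.attributes():
            kept = REFERENCE[self.reference](values_by_attribute.get(name, []))
            if self.statistic is not None and kept:
                out[name] = STATISTIC[self.statistic](kept)
            else:
                out[name] = kept
        return out


charted = {"heart rate": [60, 70, 80], "propofol": [1.0, 2.0]}
print(Answer("heart rate", "all values", "mean").evaluate(charted))
print(Answer("sedation", "last value").evaluate(charted))
```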

The medical groupings (Sedation, Inotropes, Antibiotics and Thromboembolic prophylaxis) are defined by Angela from RPAH and are the same for the auto-population project and the WRIS project.

After the semantic parsing, these condition and answer objects are passed to the SQL generator, which produces the corresponding atomised query tree. Basically, a large complex query is split into many separate simpler queries, and the individual answers are then joined to compute the final results. The performance issue to note is as follows:

pid is the patient identifier, an index used within the database. It is different from the medical record number that the staff use. To enhance performance, queries should be split according to pid for the archival database, and according to gprid (global patient record identifier) for the real-time database.

For the archival database, queries are now about 2-3 times faster, and no query takes more than 1 minute. For the real-time database, the improvement is not significant.
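One way to read "split according to pid" is to issue one small indexed query per patient and concatenate the rows, as in the Python sketch below. This is an assumption about the approach, not the actual SQL generator. The table chart_events, its columns, and the in-memory SQLite database are hypothetical stand-ins for the real schema and DBMS, and the rows are made up.

```python
import sqlite3


def query_per_pid(conn, attribute, pids):
    """Run one small query per pid and join the individual answers."""
    sql = (
        "SELECT pid, value FROM chart_events "
        "WHERE pid = ? AND attribute = ? ORDER BY rowid"
    )
    rows = []
    for pid in pids:
        # Each query touches a single patient, so it can use the pid index
        rows.extend(conn.execute(sql, (pid, attribute)).fetchall())
    return rows


# Hypothetical schema and made-up rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chart_events (pid INTEGER, attribute TEXT, value REAL)")
conn.execute("CREATE INDEX idx_chart_events_pid ON chart_events (pid)")
conn.executemany(
    "INSERT INTO chart_events VALUES (?, ?, ?)",
    [
        (1, "heart rate", 72.0),
        (2, "heart rate", 90.0),
        (1, "heart rate", 75.0),
        (1, "peep", 5.0),
    ],
)

rows = query_per_pid(conn, "heart rate", [1, 2])
print(rows)
```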

After the SQL generator creates the queries, they are passed to the database transceiver, which sends them to be executed by the DBMS software. The results (in an array) are then passed to the response generator, which creates the corresponding result objects. Again, this is an object-oriented approach: each result has an attribute name (heart rate), a type (such as a chart_event, etc.) and its corresponding values, mrn, and chart-time.

These result objects (all stored in a single class called Results) are finally passed back to the interface, where the values are displayed (in David's interface).
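The response generator step described above can be sketched in Python as follows. The field layout of a row, the Result class and the build_results function are assumptions for illustration; only the idea that each result carries an attribute name, a type, and its values with mrn and chart-time comes from the description. The example rows are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    attribute: str   # e.g. "heart rate"
    event_type: str  # e.g. "chart_event"
    values: list = field(default_factory=list)  # (mrn, chart_time, value) tuples


def build_results(rows):
    """Group raw rows into one Result per (attribute, event_type).

    rows is assumed to hold (attribute, event_type, mrn, chart_time, value) tuples.
    """
    results = {}
    for attribute, event_type, mrn, chart_time, value in rows:
        key = (attribute, event_type)
        if key not in results:
            results[key] = Result(attribute, event_type)
        results[key].values.append((mrn, chart_time, value))
    # Dicts preserve insertion order, so results follow first appearance
    return list(results.values())


# Made-up rows, for illustration only
raw_rows = [
    ("heart rate", "chart_event", "A001", "2007-10-02 08:00", 72),
    ("propofol", "medication_event", "A001", "2007-10-02 08:05", 1.5),
    ("heart rate", "chart_event", "A001", "2007-10-02 09:00", 75),
]

for result in build_results(raw_rows):
    print(result.attribute, result.event_type, len(result.values))
```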

That's the overall structure of the current version of CDAL. The prototype has now been completed.

One more thing that may be added (if time permits and if we have ideas) is the retrieval of freetext_event.