Monday, November 5, 2007

Final Progress of CDAL

The final version of CDAL is version 2.6. The code is on the computer at the RPAH, under "C:\Program Files\Apache Group\Apache2\cgi-bin\cdal26\". This folder is over 200MB because it also includes the SNOMED-CT server.

CDAL must be run at the RPAH, since it requires a connection to the CareVue Information System; anywhere else, you will receive connection errors. The URL for CDAL is "127.0.0.1/cgi-bin/cdal26/cdal.py". At the interface, you can enter a question using CDAL's pre-defined syntax.

The web server is a bit trickier to explain. Whenever you change the source code, you must make the same change under the "..\htdoc\cdal26\" folder. In other words, the "htdoc\cdal26\" folder and the "cgi-bin\cdal26\" folder must be identical for any change to take effect.
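A quick way to confirm that the two folders really are identical is to compare them from Python. This is only a sketch using the standard filecmp module; the function name is my own, and you would pass it the two cdal26 folder paths given above.

```python
import filecmp
from pathlib import Path


def folders_differ(dir_a, dir_b):
    """Return a sorted list of relative paths that differ between two folder trees."""
    differences = []

    def walk(cmp, prefix):
        # Files present on one side only, files whose contents differ,
        # and files that could not be compared at all.
        for name in cmp.left_only + cmp.right_only + cmp.diff_files + cmp.funny_files:
            differences.append(str(prefix / name))
        # Recurse into sub-folders that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    # Note: dircmp compares files shallowly (os.stat signature first,
    # contents only when the signatures differ).
    walk(filecmp.dircmp(dir_a, dir_b), Path("."))
    return sorted(differences)
```

An empty list means the two folders match; any path in the list is a file that still needs to be copied across.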

Finally, Matlab must be installed on the machine (it already is) in order to run hypothesis testing. All hypothesis-testing code is written in Matlab. Under "/hypothesis_testing", you must generate an .exe file for every method that you write, so that "ht_interface.py" can call it. To generate an .exe file, use the mcc command inside Matlab.
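The code of "ht_interface.py" is not reproduced in this post, so the following is only a sketch of how a Python script could call one of the compiled .exe files. The helper names, the relative folder location, and the idea that arguments are passed on the command line are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Folder named in the post; its location relative to the script is an assumption.
HT_DIR = Path("hypothesis_testing")


def build_command(method, args):
    """Build the command line for the compiled executable of one method."""
    exe = HT_DIR / f"{method}.exe"  # hypothetical naming: one .exe per method
    return [str(exe)] + [str(a) for a in args]


def run_method(method, args, timeout=60):
    """Run the executable and return whatever it prints to standard output."""
    completed = subprocess.run(
        build_command(method, args),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output as text
        timeout=timeout,      # give up if the executable hangs
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Keeping the command construction separate from the call makes it easy to check the command line without having the executables on hand.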

For future work on CDAL, please refer to my thesis under "Future Work".

For any clarification or problem please contact me. Thanks.

Sunday, October 7, 2007

Hypothesis Testing

The current CDAL does not support hypothesis testing, so the task is to add this function to CDAL. Two external tools provide built-in functions for hypothesis testing: R and Matlab. Matlab was chosen because it is the easier of the two to integrate.

Example: A question of interest to researchers is whether the heart rate of patients who are ventilated is higher than that of patients who are not ventilated.
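To make the example concrete, here is a small Python sketch of one test that could answer such a question: Welch's two-sample t statistic. It is for illustration only. The actual implementation is in Matlab, which test CDAL will use for this question is not decided here, and the heart-rate numbers below are made up.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Sample variance of each group divided by its size.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up heart rates (beats per minute), not real patient data.
ventilated = [92, 88, 101, 95, 90]
not_ventilated = [78, 85, 80, 74, 83]
t_stat, dof = welch_t(ventilated, not_ventilated)
```

A large positive t statistic supports the one-sided hypothesis that the first group has the higher mean; turning it into a p-value requires the t distribution, which Matlab and R provide.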

Expected time-frame: 1 week (mid Wk 10-11)

Expected finished date: Wk 11 Thursday

Tuesday, October 2, 2007

Review of the Architecture of CDAL

The current architecture of CDAL is as follows:

The architecture of CDAL has been converted to an object-oriented design. The query that the user enters is first checked by the syntax parser (which will also use David's SNOMED server to check terminology once that is implemented). Once checked, the query is split by the semantic parser, which produces the answer objects and the condition objects (if any).

Both answer objects and condition objects belong to one of several categories (we call each category an event).

The categories (with their number of attributes and their definitions) are as follows:

Chart_events (total): 786
- Chart_events (numeric): 734 - All the numerical charted information for patients (E.g. heart rate, peep, cvp, etc.)
- Chart_events (categoric): 52 - All the categorical charted information for patients (E.g. ventilation mode, airway, etc.)

Medication_events: 52 - All the iv-drip-infusion (sedation and inotropes) information for patients (E.g. Propofol, Fentanyl, etc.)

Patient_events: 6 - All the basic demographic information for patients (E.g. medical record number, sex, etc.)

Lab_events: 63 - All the chemical information for patients (E.g. Chloride, Sodium, pH, etc.)

Group_events (total): 74 - All the group-of-variables pre-defined by the medical staff. Unlike the other event types, this returns more than a single attribute. For example, sedation will return all the propofol, fentanyl, etc. that the patient has taken.
- Sedation: 8
- Inotropes: 14
- Antibiotics: 46
- Thromboembolic_prophylaxis: 6

Total: 981 attributes
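The counts above can be checked with a few lines of Python. The dictionary layout and key names are only my way of writing the list down, not a structure taken from the CDAL source.

```python
# Attribute counts copied from the list above; the nesting mirrors the sub-categories.
ATTRIBUTE_COUNTS = {
    "chart_events": {"numeric": 734, "categoric": 52},
    "medication_events": 52,
    "patient_events": 6,
    "lab_events": 63,
    "group_events": {
        "sedation": 8,
        "inotropes": 14,
        "antibiotics": 46,
        "thromboembolic_prophylaxis": 6,
    },
}


def category_total(entry):
    """Sum the sub-categories if there are any, otherwise return the count itself."""
    return sum(entry.values()) if isinstance(entry, dict) else entry


totals = {name: category_total(entry) for name, entry in ATTRIBUTE_COUNTS.items()}
grand_total = sum(totals.values())
```

Here the chart events sum to 786, the group events to 74, and the grand total to 981, in agreement with the figures listed.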

For example, a condition can be a patient_event (age > 30), a chart_event (heart rate > 60), a medication_event (propofol > 1), etc. Note that a chart_event can be either numeric (heart rate > 60) or categoric (ventilation mode = PS). Furthermore, conditions can be connected by logical operators (AND / OR).

Similarly, an answer can be a patient_event (all values of mrn), a chart_event (all values of heart rate), or a medication_event (all values of propofol). In addition, an answer can be a group_event, so the user can retrieve not just one attribute but a whole pre-defined group of attributes. For example, "all values of sedation" returns every attribute in the sedation group, including propofol, fentanyl, morphine, etc. Furthermore, each answer object contains its corresponding reference entity (all values, any value, last value) and statistical entity (mean, sd, max, min, range, mode, etc.).
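The condition and answer objects described above could look roughly like this in Python. This is a sketch of the idea only: the class names, field names and the describe method are hypothetical and are not copied from the CDAL source.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Condition:
    event_type: str            # e.g. "patient_event", "chart_event"
    attribute: str             # e.g. "age", "heart rate"
    operator: str              # e.g. ">", "="
    value: Union[float, str]   # numeric or categoric value


@dataclass
class Answer:
    event_type: str
    attribute: str
    reference: str = "all values"    # all values, any value, last value
    statistic: Optional[str] = None  # mean, sd, max, min, ...


@dataclass
class Query:
    answers: List[Answer]
    conditions: List[Condition]
    connectives: List[str]  # "AND" / "OR" between consecutive conditions

    def describe(self):
        """Render the condition part of the query as readable text."""
        parts = [f"{c.attribute} {c.operator} {c.value}" for c in self.conditions]
        text = parts[0] if parts else ""
        for connective, part in zip(self.connectives, parts[1:]):
            text += f" {connective} {part}"
        return text
```

For example, a query with the conditions (age > 30) and (ventilation mode = PS) joined by AND would be described as "age > 30 AND ventilation mode = PS".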

The medical groupings (Sedation, Inotropes, Antibiotics and Thromboembolic prophylaxis) were defined by Angela from RPAH and are the same as those used in the auto-population project and the WRIS project.

After the semantic parsing, these condition and answer objects are passed to the SQL generator, which produces the corresponding atomised query tree. Basically, a large complex query is split into many separate simpler queries, and the individual answers are then joined to compute the final results. The performance issue to note is as follows:

pid = patient identifier, an index used within the database. It is different from the medical record number that the staff use. To enhance performance, queries should be split according to pid for the archival database, and gprid (global patient record identifier) for the real-time database.

For the archival database, splitting makes queries about 2-3 times faster, and no query now takes more than 1 minute. For the real-time database, the improvement is not significant.
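The idea of splitting one large query into one small query per pid can be sketched as follows. This uses an in-memory SQLite database purely so that the example is runnable; the table name, column names and rows are all made up and do not reflect the real CareVue schema or the SQL that CDAL generates.

```python
import sqlite3


def fetch_per_patient(conn, pids, attribute):
    """Run one small parameterised query per pid and join the answers."""
    sql = (
        "SELECT pid, chart_time, value FROM chart_events "
        "WHERE pid = ? AND attribute = ? ORDER BY chart_time"
    )
    rows = []
    for pid in pids:
        rows.extend(conn.execute(sql, (pid, attribute)).fetchall())
    return rows


def demo_connection():
    """Build a tiny in-memory database with made-up rows (not real patient data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chart_events "
        "(pid INTEGER, attribute TEXT, chart_time TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO chart_events VALUES (?, ?, ?, ?)",
        [
            (1, "heart rate", "2007-10-01 08:00", 72.0),
            (1, "peep", "2007-10-01 08:00", 5.0),
            (2, "heart rate", "2007-10-01 08:05", 88.0),
        ],
    )
    return conn
```

Each per-pid query touches only one patient's rows, which is what lets the database use the pid index instead of scanning across all patients.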

After the SQL generator creates the query, it is passed to the database transceiver, which sends the queries to be executed by the DBMS software. The results (in an array) are then passed to the response generator, which creates the corresponding result objects. Again, this is an object-oriented approach, so each result has an attribute name (heart rate), a type (such as chart_event) and its corresponding values, mrn, and chart-time.

These result objects (all stored in a single class called Results) are finally passed back to the interface, where the values are displayed (in David's interface).
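As a rough picture of the result objects, the sketch below groups rows by attribute inside a container class. Only the class name Results comes from the description above; the Result class, the method names and the row layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    attribute: str   # e.g. "heart rate"
    event_type: str  # e.g. "chart_event"
    values: list = field(default_factory=list)  # (mrn, chart_time, value) tuples


class Results:
    """Holds one Result per attribute, in the order the attributes first appear."""

    def __init__(self):
        self._by_attribute = {}

    def add_row(self, attribute, event_type, mrn, chart_time, value):
        # Create the Result for this attribute on first use, then append the row.
        if attribute not in self._by_attribute:
            self._by_attribute[attribute] = Result(attribute, event_type)
        self._by_attribute[attribute].values.append((mrn, chart_time, value))

    def __iter__(self):
        return iter(self._by_attribute.values())
```

The interface can then loop over the container and display one table per attribute.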

That's the overall structure of the current version of CDAL. The prototype has now been completed.

One more thing that may be added (if time permits and if we have ideas) is the retrieval of freetext_event.

Tuesday, September 25, 2007

Plans for coming 5 weeks

1. Updating the built-in dictionary used by CDAL - from the list mentioned above (0.5 week: in semester break).
2. Initial Implementation - My work (database) and David's work (interface) need to be implemented together (0.5 week: in semester break).
3. Explore further areas in CDAL - data mining from free-text fields in database. Note that this is not expected to be implemented into CDAL, but only provide an idea in this area. (0.5 week: week 10)
4. Further Testing - At the moment, the CDAL prototype has only been informally tested. The tests performed so far are non-systematic and non-automated. The final CDAL version must be tested for completeness and soundness, using automated tests and must follow a properly designed test model (1 week: week 10-11).
5. Final Implementation - My work (database) and David's work (SNOMED-CT) need to be implemented together (0.5 week: week 11)
6. Demo - The final version of CDAL needs to be presented to Jon and hospital staff (1 day: week 12).
7. Documentation - The user manual and thesis need to be completed (1 week: week 12).

Work done Up to 25/9

SQL Generation

The CDAL prototype is now completed. This prototype is only connected to the ISM and the GICU real-time databases, and only a limited set of variables across the different tables can be retrieved.
However, all the major categories can now be extracted, namely:

1. Patient event - All the basic demographic information for patients (E.g. medical record number, sex, etc.)
2. Chart event (Numerical) - All the numerical charted information for patients (E.g. heart rate, peep, cvp, etc.)
3. Chart event (Categorical) - All the categorical charted information for patients (E.g. ventilation mode, airway, etc.)
4. Medication event - All the iv-drip-infusion (sedation and inotropes) information for patients (E.g. Propofol, Fentanyl, etc.)
5. Laboratory event - All the chemical information for patients (E.g. Chloride, Sodium, pH, etc.)
6. Group event - All the group-of-variables pre-defined by the medical staff. Unlike the other event types, this returns more than a single attribute. For example, sedation will return all the propofol, fentanyl, etc. that the patient has taken.


Dictionaries

A list of database terms has been mapped to the terminology that doctors use, including all the corresponding synonyms and abbreviations. Please see attached. All terms on this list can now be extracted by the CDAL prototype.
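The mapping works along the lines of the sketch below: every synonym or abbreviation points back to one database term. The entries and the database term names here are invented examples, not lines from the attached list, and the function names are my own.

```python
# Hypothetical entries: database term -> the words a doctor might type.
TERMS = {
    "HEART_RATE": ["heart rate", "hr"],
    "PEEP": ["peep", "positive end expiratory pressure"],
}


def build_lookup(terms):
    """Invert the mapping so that each synonym points to its database term."""
    lookup = {}
    for database_term, synonyms in terms.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = database_term
    return lookup


def to_database_term(user_term, lookup):
    """Normalise case and spacing, then look the term up (None if unknown)."""
    normalised = " ".join(user_term.lower().split())
    return lookup.get(normalised)


LOOKUP = build_lookup(TERMS)
```

Returning None for an unknown term lets the caller decide what to do next, for example asking the user to rephrase.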

Monday, September 17, 2007

Work done Up to 17/9

SQL Generation

The SQLGenerator has been implemented on both the real-time and the archival databases. This prototype now allows the user to make any query involving any attribute that is defined in the underlying dictionaries. The results extracted from both databases are then combined and shown to the user in a text-based interface.

In addition, one more event type called "Group Event" has been defined. Unlike the other event types (such as chart event, patient event, etc.), which return a single attribute each time, the group event type returns an "aggregated result", meaning that a group of pre-defined attributes is returned to the user.

The most common group events are sedation and inotropes, and a typical clinical question that a physician may ask is: "For each patient in the GICU, find the time and dosage of all the sedation that the patient had taken during the last 24 hrs."

The SQLGenerator will then output in the format:
[patient's] [sedation] [dosage] [chart-time]


User Interface
The current user interface should contain the following features:
1. Automatically update the query as the user types and makes selections (with the use of AJAX).
2. Check whether the variable names that the user entered are found by the SNOMED-CT server.
3. Map the terms entered by the user to the underlying database terms.
4. Trace a variable using the SNOMED-CT server in the case that a variable name is not defined.
5. Display the query result in tabular format.

At the moment, feature 1 is completed. Features 2 - 5 are in progress. Feature 2 is currently implemented with a dictionary replacing the SNOMED-CT server.

Monday, August 27, 2007

Work done Up to 28/8

SQL Generation
The SQLGenerator has been extended to include the following categories in the answer and condition part of the query:
1. Patient event - All the basic demographic information for patients (E.g. medical record number, sex, etc.)
2. Chart event (Numerical) - All the numerical charted information for patients (E.g. heart rate, peep, cvp, etc.)
3. Chart event (Categorical) - All the categorical charted information for patients (E.g. ventilation mode, airway, etc.)
4. Medication event (drip) - All the iv-drip-infusion (sedation and inotropes) information for patients (E.g. Propofol, Fentanyl, etc.)
5. Medication event (dose) - All the dosage (antibiotics and thromboembolic prophylaxis) information for patients (E.g. Panadol, etc.)
6. Laboratory event - All the chemical information for patients (E.g. Chloride, Sodium, pH, etc.)
7. Output event - All the output information for patients (E.g. urine, etc.)

At the moment, categories 1,2,3,4 are completed. Categories 5,6,7 are in progress.

User Interface
The user interface has been extended to include the following features:
1. Automatically update the query as the user types and makes selections (with the use of AJAX).
2. Check whether the variable names that the user entered are found in the dictionary.
3. Display the query result in the format selected by the user (E.g. Table, List, etc.)

At the moment, features 1,2 are completed. Feature 3 is in progress. Feature 2 will later be implemented with the SNOMED-CT server to replace the dictionary.