Best Practices in Teaching Residents on Internal Medicine Wards

Background: Faculty development in clinical teaching can be challenging due to time constraints and the limitations of evaluative and educational processes. Purpose: The objective was to determine what residents value most in clinical teachers and whether the Stanford Faculty Development Program’s (SFDP) evaluation tool captures all important items. Methods: We analyzed three years of learner comments according to 52 teacher themes. Forty-nine themes came from a contemporary, comprehensive literature review. Three novel themes were added based upon a faculty survey. Wilcoxon’s rank-sum tests were used to compare each theme between the highest and lowest rated faculty. A linear regression model was used to determine how well the SFDP evaluation categories correlated with a single-item measure of overall teaching effectiveness, and the differences between observed and predicted scores (residuals) were calculated. To identify whether any of the themes contributed to overall teaching effectiveness but were not captured by the SFDP tool, we used Spearman’s coefficients to assess the correlation between the residuals, the seven SFDP categories, and the 52 themes. Results: Eight themes were significantly different between the lowest and highest rated faculty. These themes included: "practices evidence-based medicine," "shows enthusiasm for medicine," "dedicates time to teaching," "demonstrates enthusiasm for teaching," "supervised adequately," "demonstrates knowledge of teaching skills," "stimulates or inspires trainees’ thinking," and "acts as a role model." The regression model explained 93.5% of the variability in the single-item scores between the two groups. Two themes ("is efficient" and "acts as a role model") were significantly correlated with the residuals from the regression model. However, both correlated with at least one SFDP category.
Conclusions: Residents value certain teaching themes more highly than others, and these should be emphasized in faculty development. The SFDP tool excels in capturing good teaching traits.

Kohler R, Fadel W, Wright C, Hui S, Tierney W. MedEdPublish. https://doi.org/10.15694/mep.2016.000009


Introduction
Best practices for teaching in clinical medicine have been the focus of a large body of literature. Sutkin et al. (Sutkin, Wagner, Harris, & Schiffer, 2008) specifically addressed the question of "What makes a good clinical teacher in medicine?" Their systematic review of nearly 5000 publications between 1909 and 2006 found that few objective "outcome" measures had been employed. Of the 68 articles they ultimately chose to analyze, about half were essays or transcriptions of oral presentations by renowned educators. Most of the others were based on surveys of learners or ratings and rankings by learners of their opinions regarding their teachers. The authors identified 480 descriptors of ideal teacher characteristics and classified these into 49 common themes and three main categories (Appendix 1). They found that approximately two-thirds of these themes were what they termed "noncognitive" and, in general, involved relationship skills, emotional states, and personality types. Conversely, "cognitive" themes included perception, memory, judgment, reasoning, and procedural skills. Given the preponderance of noncognitive themes, the authors concluded that faculty development programs and future research should focus on development of the noncognitive attributes of clinical teachers.
In our institution, we have found noncognitive attributes to be difficult foci for faculty development activities. We have widely adopted the Stanford Faculty Development Program's (SFDP's) framework (Appendix 2) for clinical faculty development and gathering learner feedback (Litzelman, Stratos, Marriott, & Skeff, 1998). Although our program's leaders have found the information returned by the SFDP's evaluation tool to be helpful in assessing faculty, our faculty occasionally question the relevance of this evaluation tool. For example, we received criticism of the evaluation question "Used blackboard or other visual aids?" This led us to modify the question by deleting the blackboard portion. Moreover, the SFDP framework seems to focus heavily on the "teacher" category that Sutkin et al. (Sutkin et al., 2008) (Appendix 1) proposed, with less emphasis on the other themes and categories.
Throughout the literature, there have been critics and proponents of multidimensional instruments, and similarly mixed opinions regarding the use of a global rating (Berk, 2013; Bierer & Hull, 2007). The Michigan Global Rating Scale (GRS) was shown to have a positive linear relationship with the seven-dimension SFDP framework, with correlation coefficients ranging from .86 to .98 (Williams, Litzelman, Babbott, Lubitz, & Hofer, 2002). Therefore, we concluded that a global rating score could be utilized to assess whether the SFDP tool was capturing definable themes that correlate with overall teaching effectiveness (as assessed by resident learners).
The purpose of our study was to: 1) expand on the themes described by Sutkin (Appendix 1) based on input from our highly rated faculty, 2) extract such themes from our residents' written comments in their evaluations of our teaching faculty, 3) determine the characteristics that separated our most highly rated teachers from the poorly rated ones, 4) investigate how much the individual components of the SFDP teaching evaluation tool contribute to overall teaching quality as judged by resident learners, and 5) explore themes that were not captured by our faculty evaluation tool through correlation with any of the expanded Sutkin themes.

Methods
Upon approval of the institutional review board, this study was conducted at Wishard Memorial Hospital. Wishard is a 300-bed public teaching hospital affiliated with the Indiana University School of Medicine (IUSM) and located on the Indiana University Medical Center campus in Indianapolis, Indiana. At the time of the study, the general medicine ward service consisted of six teaching teams that each included a senior resident, two interns, a fourth year medical student, three third year medical students, and an attending faculty general internist. The general medicine ward service accepts all general and subspecialty medicine admissions. The average team census was 14 patients, with a maximum of 20. In general, faculty physicians spent 14-17 consecutive days on the service during each teaching rotation to optimize both continuity of their teaching of residents and students and continuity of patient care.

Faculty evaluations and ranking
At the end of the residents' rotations at this hospital, they were encouraged to evaluate each attending physician who was assigned to their team using a validated tool based on the seven-category Stanford Faculty Development Program's (SFDP's) framework (Appendix 2) (Litzelman et al., 1998). Residents completed evaluations electronically. The tool consisted of three components: 1) questions that addressed the seven SFDP categories, 2) a single-item "overall rating," and 3) a comments section. The seven SFDP evaluation categories are: 1) learning climate, 2) control of session, 3) communication of goals, 4) promoting understanding and retention, 5) evaluation of learners, 6) providing feedback, and 7) promoting self-directed learning. Each item within the categories is scored on a Likert rating scale from 1 (worst) to 5 (best), and a mean score is calculated for each category. In addition to rating faculty performance in these seven categories, residents were also asked to provide a summative quantitative overall rating of their attending physicians using the same Likert scale. Finally, residents were also encouraged to provide written comments. During the three-year study period, all attending physicians assigned to Wishard Hospital's teaching general medicine ward services were eligible for this study. For the purposes of this study, all faculty members were ranked using the single-item measure of overall teaching effectiveness.

Expansion of Sutkin's Themes
We started with Sutkin's 49 themes to identify attributes that potentially correlate with the performance of teaching faculty. However, we realized that using an existing coding system can impede the development of new ideas about how to evaluate and improve teachers (Strauss & Corbin, 1990). Therefore, we surveyed the highest scoring faculty in the top quartile to try to uncover concepts that were not captured by Sutkin's themes. We accomplished this by asking open-ended questions about their practices, the responses to which were grouped into general themes (see Appendix 3). Those themes were compared to those of Sutkin. If a theme did not readily fit into a Sutkin theme, then a new theme was designated. Three new themes were identified and added to the 49 Sutkin themes. These themes were "Practices Evidence-Based Medicine," "Is Efficient," and "Dedicates Time to Teaching." All 52 themes were also expanded to allow negative attributes. For example, if the positive attribute was "is efficient," the negative attribute was "is NOT efficient." Notably, Sutkin did not count the multitude of themes that fall under the "other" category amongst the 49 (see Appendix 1). Therefore, to avoid confusion, we report our results similarly. However, for our qualitative analysis of written comments (see below), we separated the "other" category into its components. For example, "bedside teaching," "supervised adequately," and "flexibility" were given separate themes and are included in the "52 expanded themes" (i.e., the 49 themes plus the three themes that were added). Similarly, unique faculty behaviors (see Appendix 3) were added to the expanded Sutkin themes for the purposes of analysis of written comments. A unique behavior (e.g., reviewing radiographs or emphasizing physical exam findings) was retrospectively determined if our analysis of written comments yielded few or no data points in that theme (see Appendix 3).

Qualitative Analyses of Written Comments
A reviewer (CW), non-blinded to the tier of the faculty, utilized constant comparative analysis (Strauss & Corbin, 1990) to code each of the residents' written comments, as described below. Themes were extracted and coded as positive or negative from the residents' comments about their faculty's teaching performance. A second reviewer (RK), blinded to the extracted and coded themes of the first reviewer, reviewed a subsample of the comments to establish the inter-rater reliability of our coding method. The comparative analysis or coding process was carried out by reading each comment and attributing a code to each sentence or word. These codes represented a theme or idea with which each part of the data was associated. Each code was then classified as positive or negative in one of the 52 expanded Sutkin themes. For example, the comment: "Dr. X is an excellent 1) educator, 2) physician, and 3) role model. I truly appreciated his 4) involvement with the students, as well as his 5) personal approach with each individual student" was coded to fit the Sutkin themes. In this comment, 1) codes to "demonstrates knowledge of teaching skills, methods, principles, and their application"; 2) codes to "demonstrates clinical and technical skill/competence clinical reasoning"; 3) codes to "acts as a role-model"; 4) codes to "actively involves students"; and 5) codes to "provides individual attention to students." All of these were coded as positive. Multiple comments from a single resident that fit into a given theme were counted only once per faculty (i.e., if a resident stated that the faculty was efficient in 3 different ways, it was only counted once).
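The once-per-resident counting rule can be illustrated with a minimal Python sketch. This is not the authors' procedure as implemented; the tuple layout, the identifiers (`res1`, `facA`), and the choice to treat positive and negative codes as separate entries are assumptions made for illustration.

```python
from collections import defaultdict


def count_theme_mentions(coded_comments):
    """Count each (faculty, theme, polarity) at most once per resident.

    coded_comments: iterable of (resident_id, faculty_id, theme, polarity)
    tuples, where polarity is "positive" or "negative" (hypothetical layout).
    """
    # A set drops repeated mentions of the same theme by the same resident.
    unique = set(coded_comments)
    counts = defaultdict(int)
    for _resident, faculty, theme, polarity in unique:
        counts[(faculty, theme, polarity)] += 1
    return dict(counts)


# Hypothetical coded data: res1 praises efficiency in three different ways.
codes = [
    ("res1", "facA", "is efficient", "positive"),
    ("res1", "facA", "is efficient", "positive"),
    ("res1", "facA", "is efficient", "positive"),
    ("res2", "facA", "is efficient", "positive"),
    ("res2", "facA", "acts as a role model", "positive"),
]
counts = count_theme_mentions(codes)
print(counts[("facA", "is efficient", "positive")])  # 2: once per resident
```

Deduplicating on the full tuple keeps one enthusiastic resident from inflating a faculty member's count for a theme.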

Statistical Analysis
Faculty gender, experience, and quantity of evaluations were compared using Fisher's exact test and Wilcoxon's rank-sum test. The reliability of the two raters' codes was assessed using a weighted Kappa statistic. All subsequent analyses were performed with the faculty member as the unit of analysis. For each faculty member, the residents' ratings in the SFDP categories were averaged over three years. To quantify the Sutkin themes, we computed the difference between the proportions of the positive and negative comments for each theme for each faculty member. Since each measure for a faculty member was the mean of multiple scores, these measures were treated as continuous variables in all subsequent analyses. We ranked all faculty members using the single-item measure of overall teaching effectiveness. To identify which characteristics separated our most highly rated teachers from the poorest rated teachers, we used Wilcoxon's rank-sum test to compare each of the expanded Sutkin themes between the highest and lowest quartiles of faculty members.
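As a rough illustration of the theme score and the quartile comparison, the following Python sketch computes the difference in proportions and a two-sided Wilcoxon rank-sum test. It is a sketch under stated assumptions, not the SAS analysis used in the study: the denominator (`n_evaluations`) is assumed to be the number of commented evaluations for that faculty member, the p-value uses a normal approximation with tie correction and no continuity correction, and all data values are hypothetical.

```python
import math


def theme_score(n_positive, n_negative, n_evaluations):
    """Proportion of positive mentions minus proportion of negative mentions."""
    if n_evaluations <= 0:
        raise ValueError("n_evaluations must be positive")
    return (n_positive - n_negative) / n_evaluations


def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, z, p), where W is the sum of the ranks of x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    w = sum(midranks(combined)[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p


# Hypothetical theme scores for the lowest and highest quartiles of faculty.
low_quartile = [theme_score(1, 2, 20), theme_score(0, 1, 15), theme_score(2, 2, 18)]
high_quartile = [theme_score(6, 0, 20), theme_score(4, 1, 16), theme_score(5, 0, 22)]
w, z, p = rank_sum_test(low_quartile, high_quartile)
print(w, round(z, 3), round(p, 3))
```

With only a handful of faculty per group an exact test would usually be preferred; the normal approximation is used here only to keep the sketch short.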
In order to determine how well the seven SFDP categories explain overall teaching effectiveness, we fitted a linear regression model using the single-item measure of overall teaching effectiveness as the dependent variable and the seven SFDP category scores as the independent variables. We then computed the residual, or difference of the observed and predicted scores, for each faculty member in order to identify variations that remained unexplained by the SFDP tool in rating the teachers' overall effectiveness. Larger residuals would indicate that factors other than the seven SFDP tool categories were contributing to that faculty member's measure of overall teaching effectiveness. Spearman's correlation coefficients were calculated between the residuals and the expanded Sutkin themes in order to identify whether any of the expanded Sutkin themes contributed to overall teaching effectiveness and were not captured by the SFDP tool. All analyses were performed using SAS version 9.3 (SAS Institute, Cary NC).
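The residual-then-correlate step can be sketched in plain Python as follows. This is an illustrative reconstruction, not the SAS code used in the study: the least-squares fit solves the normal equations directly, the example uses two predictor columns instead of seven for brevity, and every number is hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_residuals(X, y):
    """Least-squares fit with intercept; returns (coefficients, residuals)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]  # observed - predicted


def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")


# Hypothetical per-faculty data: two SFDP category means, overall rating,
# and one theme score (proportion positive minus proportion negative).
sfdp = [[4.1, 3.9], [4.6, 4.4], [3.6, 3.8], [4.9, 4.7], [4.0, 4.3], [4.4, 3.7]]
overall = [4.2, 4.6, 3.7, 4.9, 4.1, 4.3]
theme = [0.1, 0.3, -0.1, 0.4, 0.0, 0.2]
beta, residuals = ols_residuals(sfdp, overall)
print(spearman(residuals, theme))
```

A theme whose scores track the residuals would be a candidate for something residents value that the SFDP categories miss, which is the logic of the analysis described above.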

Results
A total of 47 faculty members rounded at Wishard Hospital on General Medicine Wards between 2006 and 2009.
There was no difference between high- and low-tier faculty with regard to gender, experience, or the quantity of resident evaluations (Table 1). Generally, our faculty received high marks on evaluations. The overall teaching effectiveness score ranged from 3.50 to 4.93 (with the maximum being 5) with a standard deviation of 0.33. There was an 81% completion rate of resident evaluations of faculty. Residents provided written comments for 843 (89%) of their completed evaluations.
With regard to the expansion of the themes described by Sutkin et al. (Sutkin et al., 2008), Appendix 3 shows the results of the faculty survey designed to identify additional teacher themes not distinctly represented in the review article. Notably, consistent with the literature (Ende, 1997), our faculty employed diverse teaching and patient care practices. The emergent attributes included: paying attention to time, seeing patients without the entire team when needed, being flexible, and limiting competing activities. Moreover, many of these faculty chose to comment on directly engaging team members with regard to the timing of rounds, setting expectations from day 1, and mutually developing goals. Ultimately, in mapping these responses to Sutkin's themes, the following codes were added for qualitative analysis as they did not clearly fit into the 49 themes that Sutkin outlined: 1) "practices evidence-based medicine," 2) "is efficient" ("physician characteristics"), and 3) "dedicates time to teaching" ("teacher characteristic"). Importantly, per our definition, "dedicates time to teaching" does not translate to spending more time with the team but to a concerted effort to impart clinical knowledge, whether at the bedside or in the classroom, formally or informally.
In the analysis of the inter-rater reliability of assigning comments to Sutkin's themes, coding of 45% of the residents' comments by both reviewers yielded a Kappa of 0.64 (95% confidence interval of 0.61 to 0.67). This implies that there is substantial agreement between raters (Landis & Koch, 1977). Furthermore, the analysis of resident comments demonstrated 8 characteristics from among the expanded 52 Sutkin themes that are significantly different between the lowest and highest ranked faculty (Table 2). The majority of the overall teaching score was accounted for by the SFDP categories Learning Climate, Communication of Goals, and Evaluation. However, the SFDP categories are so highly correlated with each other that we cannot reliably determine which truly influence the overall score the most. For the most part, any of the expanded Sutkin themes that correlated with the single-item score were also correlated with at least one of the seven SFDP categories.
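For readers unfamiliar with the statistic, a weighted Kappa for two raters can be sketched as below. The weighting scheme used in the study is not stated, so linear weights are an assumption here, as are the three ordered codes (-1 negative, 0 not coded, 1 positive) and the example ratings.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear agreement weights.

    categories must be listed in their natural order; the weight for a pair
    of categories i, j is 1 - |i - j| / (k - 1).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    p_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical codes from two reviewers for one theme across ten comments.
reviewer_1 = [1, 1, 0, 0, -1, 1, 0, 1, 0, -1]
reviewer_2 = [1, 0, 0, 0, -1, 1, 1, 1, 0, 0]
print(round(weighted_kappa(reviewer_1, reviewer_2, [-1, 0, 1]), 2))
```

On the Landis and Koch scale cited above, values from 0.61 to 0.80 are conventionally described as substantial agreement.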
The SFDP tool does an excellent job of capturing good teaching traits. The regression model predicting overall teaching effectiveness from the seven SFDP categories explains 93.5% of the variability in the single-item overall teaching effectiveness scores. We were only able to identify two themes that were significantly correlated with the residuals from the regression model; however, both themes were also correlated with at least one of the seven SFDP categories. These themes may help to explain any discrepancies between the sum of the seven SFDP categories and single-item measure of overall teaching effectiveness. The physician characteristic identified by these analyses was "Is efficient" (p=0.047) and the human characteristic identified was "Acts as a role model" (p=0.041). "Is efficient" was correlated with the SFDP categories "Control of Session" and "Self-Directed Learning" (p<0.0001 and p=0.043, respectively) while "Acts as a role model" was correlated with "Learning Climate" (p=0.039).

Discussion
The faculty survey and analysis of learner comments allowed for a reexamination of our current evaluation process.
Our data confirm the finding (Williams et al., 2002) that there is a strong correlation between a single-item overall global rating score and the seven-dimension SFDP framework. Furthermore, despite the multitude of expanded Sutkin themes, we were unable to identify any themes that were not correlated with the single-item score or at least one of the seven SFDP categories. Importantly, our study suggests which teaching themes may be the most critical to a resident's evaluation of faculty performance.
Among our attending physicians, we found eight themes in three domains ("physician," "teacher," and "human") that separated teachers in the highest rated quartile from teachers in the lowest quartile. Three of these themes were not part of the original Sutkin article and are based on a survey of our high-tier faculty. Moreover, the determination of these discriminatory themes was made possible by an analysis of learner comments in conjunction with the SFDP tool and single-item score. The value of analysis of comments has been similarly demonstrated in the area of patient comments and routinely high patient satisfaction scores (Harris, 1996). Importantly, the themes we have identified may serve as special foci for faculty development activities and possibly help guide interventions aimed at improving clinical teaching.
It is intuitive but, we believe, critically important to point out that "dedicates time to teaching" is a strong discriminator of teaching scores. It is clear that, especially as time constraints on faculty increase due to greater documentation requirements and resident work hour limits, stronger faculty devote specific time to teaching. There is no shortcut to quality teaching. Additionally, our data show that our residents appreciate when their teachers "demonstrate knowledge of teaching skills, methods, principles, and their application." A structured faculty development program is critical to the success of medical school faculty members (Wilkerson & Irby, 1998). We are strong proponents of the Stanford Faculty Development Program, and we believe that this program can address many of the identified themes.
Based on our results (Table 2), we need to ensure that faculty are continually updating their medical knowledge and practicing evidence-based medicine (EBM). With regard to EBM, there is no shortage of data demonstrating that training courses can improve literature searching and critical appraisal (Davis, Crabb, Rogers, Zamora, & Khan, 2008; Harewood & Hendrick, 2010; Thangaratinam et al., 2009) or even change clinical practice (Straus, Ball, Balcombe, Sheldon, & McAlister, 2005). However, a recent systematic review of EBM "teach the teacher" courses concluded that there were no specific assessment tools for evaluating the effectiveness of these courses, making it impossible to ascertain whether they are having the desired effect (Walczak et al., 2010). Therefore, we cannot endorse any specific EBM teaching model but highly recommend that faculty development programs specifically incorporate this topic.
Additional concrete recommendations for faculty, based on our data, should also include admitting to one's own fallibility and mutually exploring educational topics with residents. We believe this stimulates and inspires trainees' thinking. This is highly consistent with Skeff's program (Kelley M. Skeff, Stratos, & Mount, 2007; K. M. Skeff et al., 1997) in that highly rated teachers are not necessarily content experts but rather can admit limitations and model how they address knowledge gaps or get questions answered through self-directed learning. Also, there needs to be an emphasis on the role that enthusiasm for medicine and teaching plays in learners' experiences and evaluation of faculty. The teachers' excitement about what they do affects how the learner engages in the learning process, including not only the learning climate but also the level of interest and involvement (Baum, 2002; K. M. Skeff, 1988). "Adequate" supervision was important to our learners. It has been found that medical residents desire a pleasant learning environment with the ability to function independently and supervision from understanding, patient, communicative, and collaborative physicians (Busari, Weggelaar, Knottnerus, Greidanus, & Scherpbier, 2005). A large, interdisciplinary review of the literature (S. M. Kilminster & Jolly, 2000) concluded that the most important factor for the effectiveness of supervision is the supervision relationship, more so than any particular methods employed. There is a published guide that explores "helpful" and "ineffective" supervisory behaviors that could prove useful for faculty development in this area (S. Kilminster, Cottrell, Grant, & Jolly, 2007).
In our analysis, "Is efficient" and "Acts as a role model" were the two themes that significantly correlated with the residuals from the regression model and that may explain discrepancies between the sum of the seven SFDP categories and the single-item score. However, the meaning of this finding is indeterminate since one score has not been proven to be superior to the other. Moreover, both themes correlated with SFDP categories. Interestingly, "Acts as a role model" was a theme that discriminated our highly rated faculty from those rated lower.
The data showed that there was a correlation between Learning Climate (SFDP) and "Acts as a role model." The literature reflects that an excellent role model typically has greater assigned teaching responsibilities, spends additional time with house staff beyond teaching responsibilities (e.g., morning reports or teaching conferences), has had some formal training in teaching, teaches the psychosocial aspect of medicine, and spends 25 hours or more per week teaching and conducting rounds (Wright, Kern, Kolodner, Howard, & Brancati, 1998). Furthermore, characteristics of good role models and strategies to improve role modeling have been described that can be useful for faculty development (Cruess, Cruess, & Steinert, 2008). These characteristics fall in the domains of clinical competence (e.g., sound clinical reasoning), teaching skills (e.g., provides timely feedback), and personal qualities (e.g., compassionate). Interestingly, "makes time for teaching" is identified in this framework as a desirable attribute of a positive role model.

Limitations
This was a small study at a single teaching hospital to investigate the feasibility of this method to better delineate the characteristics that determine who is perceived as a high-quality teacher by residents. The sample size and the particular nature of our teaching wards may not make the results broadly generalizable, particularly on non-medicine services (Elliot & Hickam, 1993). Additionally, a larger sample size might make it possible to distinguish between the various levels of residents (e.g., PGY-1 vs. PGY-3). Finally, we must always be cognizant of the halo (McGaghie & Frey, 1986) and ceiling effects (K. M. Skeff, 1983) that may affect the validity of evaluation data.

Implications
Our faculty survey and analysis of learner comments allowed the identification of eight themes that may strongly contribute to residents' perceptions of what makes a good teacher. We believe that these themes should be studied further to determine if they need to be emphasized for both faculty evaluation and development. However, each program may need to individualize its approach and content. Just as every learner is different, there is a milieu at each institution that defines the things that are important.

Human Characteristics
Acts as a role model 0.066 0.028 0.026

Figures
Figure 1. Linear regression model using the single-item measure of overall teaching effectiveness as the dependent variable (y-axis) and the seven SFDP category scores as the independent variables which constitute the predicted teaching effectiveness (x-axis). The "residual" is the difference between the observed and predicted scores. Larger residuals indicate factors other than the seven SFDP tool categories are contributing to that faculty member's measure of overall teaching effectiveness.

Appendix 2:
Learner Evaluation (Stanford Category added for clarity)
All questions must be answered before submitting
Circle One: Scale: SD=Strongly Disagree, D=Disagree, N=Neither, A=Agree,
Motivated learners to learn on their own   SD D N A SA
Encouraged learners to do outside reading   SD D N A SA
Comments:

Table 2
Eight characteristics among the 52 analyzed (see Appendix 1) for which the standardized scores were significantly different between the high- and low-rated groups.
three characteristics and 52 themes*
* Table reproduced and modified with permission from Sutkin. Practices Evidence-Based Medicine, Is Efficient, and Dedicates Time to Teaching were added to Sutkin's themes based on a high-tier faculty survey. Sutkin did not count the "other" themes, but these were used in the analysis where appropriate. Numbers are correlated with faculty responses in Appendix 3.