As a lecturer at the Princeton School of Public and International Affairs, I teach econometrics and research methods. I spend a lot of time thinking about the intersection of data, education, and social justice, about analyzing data and using it to drive change, and about how generative AI is reshaping the learning experience.
My students are working toward master’s degrees in public affairs, and many are interested in pursuing careers in international and domestic public policy. The graduate-level econometrics course I teach is required and designed to develop analytical and critical thinking skills in causal research methods. Throughout the course, students are tasked with writing four memos on specified policy issues. Typically, we examine publicly available datasets related to societal concerns, such as determining the optimal criteria for loan forgiveness or evaluating the effectiveness of stop-and-frisk police policies.
I felt that, to understand how to use generative AI effectively and to prepare students to apply these tools in the data-related jobs they will encounter after graduation, I needed to try the tools out for myself. So I set up an experiment: I took one of the assignments I give my students and used generative AI to complete it.
My goals were twofold. I wanted to experience what it feels like to use the tools my students have access to. And since I suspect many students are already using AI for these assignments, I wanted to take a more evidence-based stance on whether I should change the way I grade.
I pride myself on assigning tasks that are both practical and intellectually challenging, and, to be honest, I did not have much faith that an AI tool could consistently perform the statistical analysis and draw the connections needed to recommend appropriate policy based on the results.
Experimenting with code interpreters
For my experiment, I recreated an assignment from last semester in which students were asked to imagine a grant program that funds health care providers to deliver perinatal (before and after birth) services to women, with the goal of promoting infant health and reducing low birth weight. Students were given a publicly available dataset and asked to develop eligibility criteria for the program by building a statistical model that predicts low birth weight. They were required to substantiate their choices with references to existing literature, interpret the results, provide relevant policy recommendations, and state their own position.
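For readers unfamiliar with the statistical side of the assignment, here is a minimal sketch of the kind of model it asks for: a logistic regression predicting low birth weight from maternal characteristics. The file name and column names are hypothetical stand-ins for illustration, not the actual dataset's variables.

    # A minimal sketch of the kind of model the assignment calls for: a logistic
    # regression predicting low birth weight from maternal characteristics.
    # The file name and column names below are hypothetical stand-ins.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("birth_outcomes.csv")  # hypothetical file name

    # Outcome: 1 if the infant was born at low birth weight, 0 otherwise
    y = df["low_bwt"]

    # Candidate predictors, i.e., possible eligibility criteria for the grant program
    X = sm.add_constant(df[["mother_age", "prenatal_visits", "smoker"]])

    # Fit the model and inspect which factors are associated with low birth weight
    model = sm.Logit(y, X).fit()
    print(model.summary())

Coefficients from a model like this are what students (and, in my experiment, the AI) must translate into eligibility criteria and policy recommendations.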
As for tools, I decided to test ChatGPT's new Code Interpreter, a feature that lets users upload data (in almost any format) and run code through conversational prompts. I gave ChatGPT the same guidelines I give my students and uploaded the dataset to Code Interpreter.
First, Code Interpreter broke the assignment down into smaller tasks. Then, after selecting the variables for the statistical model (that is, the criteria for the perinatal program), it asked me whether I wanted to continue with the analysis. (See the task breakdown and variables below.)

After running the statistics and analyzing and interpreting the data, Code Interpreter produced a memo with four policy recommendations. While the recommendations were solid, the tool did not cite prior literature or tie the recommendations directly to the results. It also could not make a statement of position: that part of the assignment depends on students reflecting on their own backgrounds and experiences and considering the biases they might bring, and the tool has nothing comparable to draw on.

Another drawback was that each part of the assignment was presented in a separate chunk, so I found myself returning to the tool many times to ask for clarification on omitted elements or results. It quickly became clear that it was easier to combine the various pieces manually myself.
Without a human touch, the memo would not have received a passing grade: it was too high-level, and it did not include a literature review with appropriate citations. However, once I stitched all the pieces together, the quality of the work could definitely have earned a B.
Although Code Interpreter could not earn a passing grade on its own, it is essential to be aware of the tool's current capabilities. Working from conversational prompts, it expertly performed statistical analysis and offered actionable policy recommendations, displaying something close to the critical thinking I expect from my students. As the field of generative AI continues to advance, it's only a matter of time before these tools consistently deliver "better" results.
How I'm applying the lessons learned
Generative AI tools like the one I experimented with are available to my students, so I assume they are using them for assignments in my course. Given this pressing reality, it is important for educators to adapt their teaching methods and incorporate these tools into the learning process. It is difficult, if not impossible, to distinguish AI-generated content from human-generated content, especially given the current limitations of AI detectors. That's why I'm working to incorporate exploration of generative AI tools into my courses, with an emphasis on critical thinking and problem-solving skills, which I believe will continue to be the keys to success in the workforce.
As I considered ways to incorporate these tools into my curriculum, two paths emerged. I can teach students to use AI to generate initial content and then review and enhance it with human input; this is especially helpful when students hit writer's block, but it can inadvertently stifle creativity. Or I can support students in creating original work first and then leveraging AI to enhance it.
I'm drawn to the second approach, but both recognize that students need to develop new skills: writing, critical thinking, and the computational thinking required to work effectively with computers, all of which are central to the future of education and the workforce.
As an educator, I have a duty not only to make sure learning is happening, but also to stay informed about the latest developments in generative AI: to understand what tools exist, what their benefits and limitations are, and, most importantly, how my students may be using them.
It is also important to recognize that the quality of work students can now produce may call for higher expectations and adjustments to how we grade. The baseline is no longer zero; it's AI. And the upper limit of what humans can accomplish with these new capabilities remains uncharted territory.