Vytautas Jankauskas and Steffen Eckhard, The Politics of Evaluation in International Organisations (Oxford University Press, 2023)

Episode summary

评估已成为衡量国际组织绩效、促进学习及展现问责制的关键工具。联合国系统内，每年有数千名评估师和顾问产出价值数百万美元的上百份评估报告。但评估真能兑现其提供客观证据与实用价值的承诺吗？《国际组织中的评估政治》（牛津大学出版社，2023年）通过剖析国际组织评估体系的内在机制，挑战了将评估视为价值中立活动的传统认知。Vytautas Jankauskas博士与Steffen Eckhard博士揭示了这一表面中立的专业技术工具如何成为全球治理中的权力手段；他们论证并阐释了政治因素如何深植于评估利益相关者的诉求、国际组织评估体系的控制与设计之中，并在较小程度上也体现在评估报告内容里。该研究基于对评估师、成员国代表及国际组织秘书处官员的120次访谈，以及对200余份评估报告的文本分析，覆盖21个联合国系统组织，包括国际劳工组织、国际货币基金组织、联合国开发计划署、联合国妇女署、国际移民组织、联合国难民署、粮农组织、世界卫生组织和联合国教科文组织的详细案例研究。在揭示循证决策（不）有效性的同时，作者提出了如何更好地调和评估政治现象与收集可靠证据以改进联合国运作的需求。应对评估政治的答案并非放弃评估或将其与利益相关者隔离，而是承认周遭政治利益并据此设计评估体系。本次访谈由Miranda Melcher博士主持，其即将出版的新书聚焦战后军事整合，通过安哥拉与莫桑比克内战的定性分析，探讨内战背景下的条约谈判与执行。

Evaluation has become a key tool in assessing the performance of international organisations, in fostering learning, and in demonstrating accountability. Within the United Nations (UN) system, thousands of evaluators and consultants produce hundreds of evaluation reports worth millions of dollars every year. But does evaluation really deliver on its promise of objective evidence and functional use? By unravelling the internal machinery of evaluation systems in international organisations, The Politics of Evaluation in International Organisations (Oxford University Press, 2023) challenges the conventional understanding of evaluation as a value-free activity. Dr. Vytautas Jankauskas and Dr. Steffen Eckhard show how a seemingly neutral technocratic tool can serve as an instrument for power in global governance; they demonstrate and explain how deeply politics are entrenched in the interests of evaluation stakeholders, in the control and design of IO evaluation systems, and to a lesser extent also in the content of evaluation reports. The analysis draws on 120 research interviews with evaluators, member state representatives, and IO secretariat officials as well as on textual analysis of over 200 evaluation reports. The investigation covers 21 UN system organisations, including detailed case studies of the ILO, IMF, UNDP, UN WOMEN, IOM, UNHCR, FAO, WHO, and UNESCO. Shedding light on the (in-)effectiveness of evidence-based policymaking, the authors propose possible ways of better reconciling the observed evaluation politics with the need to gather reliable evidence that is used to improve the functioning of the United Nations. The answer to evaluation politics is not to abandon evaluation or isolate it from the stakeholders but to acknowledge surrounding political interests and design evaluation systems accordingly. This interview was conducted by Dr. Miranda Melcher, whose forthcoming book focuses on post-conflict military integration, understanding treaty negotiation and implementation in civil war contexts, with qualitative analysis of the Angolan and Mozambican civil wars.

Bilingual subtitles

Speaker 0

大家好。

Hello, everybody.

Speaker 0

我是马歇尔·坡。

This is Marshall Poe.

Speaker 0

我是新书网络的创始人和编辑。

I'm the founder and editor of the New Books Network.

Speaker 0

如果你正在收听这个节目，你应该知道新书网络是全球最大的学术播客网络。

And if you're listening to this, you know that the NBN is the largest academic podcast network in the world.

Speaker 0

我们的全球听众数量达到200万人。

We reach a worldwide audience of 2,000,000 people.

Speaker 0

你可能已经有一个播客，或者正考虑开设一个播客。

You may have a podcast or you may be thinking about starting a podcast.

Speaker 0

如你所知，这其中存在挑战。

As you probably know, there are challenges.

Speaker 0

主要分为两类。

Basically, of two kinds.

Speaker 0

第一类是技术问题。

One is technical.

Speaker 0

你需要掌握相关知识才能制作和分发你的播客节目。

There are things you have to know in order to get your podcast produced and distributed.

Speaker 0

第二类，也是最关键的问题，就是你需要获取听众。

And the second is, and this is the biggest problem, you need to get an audience.

Speaker 0

在当今的播客领域，积累听众是最困难的事情。

Building an audience in podcasting is the hardest thing to do today.

Speaker 0

考虑到这一点，我们NBN推出了一项名为NBN Productions的服务。

With this in mind, we at the NBN have started a service called NBN Productions.

Speaker 0

我们的服务包括协助您创建播客、制作播客、分发播客，并为您托管播客。

What we do is help you create a podcast, produce your podcast, distribute your podcast, and we host your podcast.

Speaker 0

最重要的是，我们会将您的播客推送给NBN的听众群体。

Most importantly, what we do is we distribute your podcast to the NBN audience.

Speaker 0

我们已为众多学术播客多次提供这项服务，很乐意也能帮助您。

We've done this many times with many academic podcasts, and we would like to help you.

Speaker 0

如果您有兴趣与我们探讨如何为您的播客提供支持，请联系我们。

If you would be interested in talking to us about how we can help you with your podcast, please contact us.

Speaker 0

只需访问New Books Network首页，您就能看到NBN Productions的链接入口。

Just go to the front page of the New Books Network, and you will see a link to NBN Productions.

Speaker 0

点击填写表格后，我们即可展开对话。

Click that, fill out the form, and we can talk.

Speaker 0

欢迎来到New Books Network。

Welcome to the New Books Network.

Speaker 1

大家好，欢迎收听New Books Network的新一期节目。

Hello, and welcome to another episode on the New Books Network.

Speaker 1

我是主持人之一，Miranda Melcher博士。

I'm one of your hosts, Dr. Miranda Melcher.

Speaker 1

今天我非常高兴，因为我们将讨论一本我认为提出并回答了极其重要问题的著作。

And I'm very pleased today because we get to talk about a book that I think asks and answers an incredibly important question.

Speaker 1

虽然是个非常书呆子气的问题，但往往这类问题才最有价值。

Granted, a very nerdy question, but sometimes those are the best ones.

Speaker 1

这本书已由牛津大学出版社出版，书名直截了当，叫做《国际组织中的评估政治》。

So the book has come out from Oxford University Press and is straightforwardly titled The Politics of Evaluation in International Organizations.

Speaker 1

书中提出的核心问题其实是：等等，评估到底是什么？

And the question it's asking is very much around kind of, hang on, what is evaluation?

Speaker 1

为什么这么多国际组织都在做评估？

Why are so many international organizations doing it?

Speaker 1

花费这么多时间和金钱评估各种项目，实际效果究竟如何？

And what is actually happening with all of this time and money spent evaluating various projects?

Speaker 1

评估有用吗？

Does it do anything?

Speaker 1

如果有用，具体发挥了什么作用？

If so, what is it doing?

Speaker 1

正如我所说，这是个相当书呆子气的问题，但却具有重要现实意义，涉及诸多不同议题。

So as I said, this is a pretty nerdy question, but has some really important implications and relates to a whole bunch of different topics.

Speaker 1

很高兴能邀请到本书两位作者之一Vytautas Jankauskas做客本期播客。

So I'm very pleased to welcome one of the two authors of the book to the podcast, Vytautas Jankauskas.

Speaker 1

另一位合著者是Steffen Eckhard博士。

And the book is also written by Dr. Steffen Eckhard.

Speaker 1

Vytautas，非常感谢你参与我们的播客节目。

Vytautas, thank you so much for being with us on the podcast.

Speaker 2

非常感谢您的邀请。

Many thanks for having me.

Speaker 2

这是我的荣幸。

It's a pleasure.

Speaker 1

能否请你先简单介绍一下自己和你的背景，并说明为什么决定写这本书，以及为何与Steffen合作撰写？

Could you please start us off by introducing yourself and your background a little bit and explain why you decided to write this book and why you and Steffen decided to write it together?

Speaker 2

好的，没问题。

Yes, sure.

Speaker 2

我是Vytautas Jankauskas，来自立陶宛，这个复杂的姓名就是源于那里。

So I'm Vytautas Jankauskas, and I'm actually originally from Lithuania, where the complicated name comes from.

Speaker 2

但过去十年我一直居住在德国。

But I've been living for the past ten years in Germany.

Speaker 2

我在慕尼黑大学先后完成了国际关系学士学位、政治学硕士学位以及政治学博士学位。

That's also where I did my bachelor's in international relations and then later my master's in political science and my PhD in political science at the University of Munich.

Speaker 2

目前我仍是慕尼黑大学（路德维希-马克西米利安大学）的副研究员。

And I'm still an associate research fellow at the University of Munich, the Ludwig Maximilian University of Munich.

Speaker 2

同时我也是腓特烈港齐柏林大学的博士后研究员，那里位于德国南部，毗邻美丽的博登湖，是大家都推荐去的好地方。

And I'm also a postdoctoral research fellow at Zeppelin University in Friedrichshafen, which is a very nice place in the south of Germany, next to the beautiful Lake Constance, a place which everyone recommends visiting.

Speaker 2

有人去度假，也有人在那里工作。

People go there for holidays, others work there.

Speaker 2

我的杰出合著者Steffen Eckhard教授就在该校任职，他专攻公共行政与公共政策领域。

And that is also where my great co-author works, Steffen Eckhard, who is a professor of public administration and public policy.

Speaker 2

我们共同撰写的这本书于去年刚刚出版。

And together, we wrote this book, and it came out last year, so just recently.

Speaker 2

不过整件事要追溯到更早之前。

But, you know, everything began quite a while ago.

Speaker 2

确切地说，大概始于2016年5月左右。

Actually, I would say probably somewhere around May 2016.

Speaker 2

就在那时，Steffen正在布鲁塞尔的北约总部进行访谈。

And that is when Steffen was conducting interviews in Brussels at the headquarters of NATO.

Speaker 2

Steffen当时在布鲁塞尔，他请北约工作人员描述证据在这个军事联盟中的作用，北约正是这样一个联盟，对吧？

So Steffen is in Brussels, and he is asking NATO staff members, among many things, to describe the role of evidence in the military alliance, which NATO is, right?

Speaker 2

你知道，就在那时我们收到了一个颇具争议的回答，这某种程度上激励我们更深入地研究这个话题。

And, you know, this is when we received a very controversial, I would say, answer, which somehow inspired us to look more into this topic.

Speaker 2

回答是这样的：当他请一位北约工作人员描述证据在组织中的作用时，那人说：'在一个基于共识的组织里，对吧？'

And the answer was the following: when he asked one of the staff members at NATO to describe this role of evidence in the organization, the person said: in an organization that is based on consensus, right?

Speaker 2

这就是北约的本质，对吧？

And that is what NATO is about, right?

Speaker 2

不存在客观信息。

There is no objective information.

Speaker 2

如果在联盟中达成一致的唯一方式是承认地球是平的，那我们就会说地球是平的，尽管我们知道并非如此。

If the only way to agree in a summit community is to say the earth is flat, we would say the earth is flat, even though we know it is not.

Speaker 2

对吧？

Right?

Speaker 2

这里的'我们'，那人指的是北约的官僚体系、行政机构。

And by we, the person meant the bureaucracy, the administration of NATO.

Speaker 2

因为最终重要的是共识，所以外界不存在客观信息。

So because consensus is what ultimately counts, there is no objective information out there.

Speaker 2

这就是当时的陈述，对吧？

So that was the statement, right?

Speaker 2

这某种程度上让我们有些震惊。

And somehow it shook us a bit.

Speaker 2

然而与此同时，这份声明与我们先前对官僚影响力及官僚可能持有的政治利益的研究反思高度共鸣，总的来说，这些政治因素围绕着国际政府组织。

However, at the same time, the statement resonated a lot with our previous reflections on the research studying bureaucratic influence and the political interest that bureaucrats may have, and in general, these politics surrounding international governmental organisations.

Speaker 2

于是我们将这句话打印出来，贴在当时慕尼黑办公室的墙上，那时我和Steffen一起。

So we printed the statement out and put it on the walls of our offices back then in Munich, Steffen and I.

Speaker 2

后来Steffen将这个想法发展成一个项目，并获得了德国研究基金会的资助。

And Steffen developed this idea into a project, which was then later funded by the German Research Foundation.

Speaker 2

我就是这样加入他的，我们开始了长达数年的联合国循证决策探索之旅，特别关注评估这一关键工具，试图回答诸如‘联合国如何运作’这类问题。

And so this is where I joined him, and we started this multi-year journey down the rabbit hole of UN evidence-based policymaking, looking specifically at evaluation as a key tool to answer such questions as, you know, how does the UN work?

Speaker 2

国际组织是如何运作的？

How do the international organizations work?

Speaker 2

它们是否发挥了应有的作用？

Do they perform as they should or not?

Speaker 2

总的来说，权力与证据或评估之间存在怎样的张力？毕竟这是基于证据的。

And in general, what is this tension between power and evidence or evaluation, right, as it is evidence based?

Speaker 2

评估能否向权力说出真相？

Can evaluation speak truth to power?

Speaker 2

那么，为何我们认为这很重要？

Now, why did we think it matters?

Speaker 2

首先，在那之前一年，也就是2015年，在Steffen那次著名的北约访谈之前，联合国提出了可持续发展目标（SDGs），这套雄心勃勃的发展目标计划在2030年前实现。

Well, for one, a year earlier, in 2015, so before this famous interview of Steffen's at NATO, the United Nations introduced the Sustainable Development Goals, right, the SDGs, which we all know, a very ambitious set of development targets which should be achieved by 2030.

Speaker 2

许多人称赞其宏大愿景，认为这些目标非常美好。

And many said, well, it's great, they're ambitious, and, you know, these are the nice aims.

Speaker 2

但我们如何真正知道联合国是否在实现这些目标呢？

But how do we actually know whether the UN is achieving these objectives?

Speaker 2

对吧？

Right?

Speaker 2

我们如何了解进展？

How can we know about the progress?

Speaker 2

联合国给我们的答案就是评估，对吧？

And the answer that the UN would give us is evaluation, right?

Speaker 2

当时的联合国秘书长潘基文说过，在各个层面进行全面评估将在实施这一新发展议程中发挥关键作用。

So the UN Secretary-General back then, Ban Ki-moon, said that evaluation everywhere, at every level, will play a key role in implementing this new development agenda.

Speaker 2

后来现任秘书长安东尼奥·古特雷斯表示，我们需要一种评估文化，即独立、客观且完全透明的评估。

And then later on, António Guterres, the current Secretary-General, said that we need a culture of evaluation, independent, you know, objective, with full transparency.

Speaker 2

所以我们觉得这确实非常重要，对吧？

And so we thought, well, it's actually quite important, right?

Speaker 2

如果评估是工具，本应让我们客观了解联合国是否实现了这些可持续发展目标，这些项目和计划是否进展顺利，那我们确实应该知道它被政治化的程度。

If evaluation is the tool which should give us this objective knowledge about whether the UN is achieving these SDGs, whether all of these programs and projects are making good progress or not, we should actually know how politicized it is.

Speaker 2

这些背景理念正是促使Steffen和我撰写这本书的动机。

And so those are sort of the background ideas which motivated Steffen and myself to write the book.

Speaker 1

真是有趣的背景故事。

What an interesting backstory.

Speaker 1

我很喜欢把东西打印出来贴在墙上，然后突然意识到：等等，这里到底发生了什么？这个想法很棒。

I love the idea of kind of printing something out and sticking it on the wall and going, hang on a second, what's actually happening here?

Speaker 1

感谢你为我们提供这些背景信息。

So thank you for giving us that background.

Speaker 1

既然你已经介绍了问题是如何形成的，关于书中整合所有思考后提出的核心问题，还有什么想补充告诉我们的吗？

Is there anything further you'd like to tell us about kind of the questions you sort of consolidated all of that thinking around in asking in the book, given that you've told us a bit about sort of how you developed them?

Speaker 2

是的。

Yes.

Speaker 2

所以，你知道，评估通常被认为是一种非常技术官僚的工具，对吧？

So, you know, evaluation is usually thought to be a very technocratic tool, right?

Speaker 2

某种程度上几乎像一种科学实践，因为根据定义，评估是一种基于研究的系统性评定，对吧？针对特定活动，无论是发展中国家的某个项目、某个计划，还是某个联合国机构的整个项目组合。

Sort of almost a scientific-like exercise, because by definition, evaluation is a systematic, research-based assessment, right, of a specific activity, be it a project in a developing country or a program or the whole portfolio of a specific UN agency.

Speaker 2

因此这是一个非常彻底的实践过程，会回答关于被评估活动（比如某个项目）的效率、有效性、可持续性、影响力以及其他类似方面的各种问题。

So it is a very thorough exercise where different questions are being answered regarding the efficiency of the evaluated activity, let's say a project, its effectiveness, sustainability, impact and other similar things.

Speaker 2

它基于已收集的研究数据和分析，对吧？

It is based on research which was collected, data analysis, right?

Speaker 2

这确实是相当复杂的工作。

It is really quite a sophisticated thing.

Speaker 2

如果你看一份典型的联合国机构出具的评估报告，会发现那是非常冗长的文件，对吧？

And if you would look at a report, a typical evaluation report produced by the United Nations, one of the agencies, it is a very long report, right?

Speaker 2

报告大约有100页，你会读得晕头转向。

It's about 100 pages long, and you will lose yourself reading it.

Speaker 2

需要花时间才能理解内容，因为它太复杂太详细了。

Takes time to understand what's going on there because it is so sophisticated and so detailed.

Speaker 2

评估人员会走访受影响人群，采访那些生活在项目实施的村庄里的居民等等。

The evaluators would go to the affected populations, they would interview, you know, the people living in those villages where, let's say, the project was conducted, etc.

Speaker 2

所以其中包含大量知识。

So there is a lot of knowledge there.

Speaker 2

因此通常的假设是：这是现代公共管理中技术官僚主义的工具。

And so the typical assumption is that it is this technocratic tool of modern public management.

Speaker 2

然而在我们的思考中，正如书中的关键亮点所指出的，评估并非存在于真空中，对吧？

However, in our thinking, and that is one of the key highlights of the book, evaluation does not exist in a vacuum, right?

Speaker 2

它被具有政治利益的利益相关者所包围。

It is surrounded by stakeholders, which have political interests.

Speaker 2

受影响群体有自己的利益诉求，成员国也有自身考量，国际组织的管理层、秘书处则持有与评估相关的特定利益。

Affected populations have their own interests, the member states have their own interests, the management of the international organization, the secretariat have their own evaluation related interests.

Speaker 2

因此我们认为实际上存在两种不同的评估视角：功能性视角和政治性视角。

And so we thought that there are actually two different perspectives on evaluation, the functional one, and then the second is the political one.

Speaker 2

我们并非首个指出评估具有政治属性的人，但希望着重强调这一点。

We are not the first to say that evaluation is political or can be political, but we wanted to really highlight that.

Speaker 2

这正是驱动我们在本书中研究评估政治学的兴趣所在。

And that was, you know, what was driving our interest in the politics of evaluation in this book.

Speaker 2

公共行政学著名学者威尔达夫斯基（Wildavsky）早在70年代就指出，评估可被用作政治斗争中的武器。

Wildavsky, a famous scholar of public administration, said already in the 70s that evaluations can be used as weapons in political wars.

Speaker 2

可见这个讨论在公共行政和政治研究史上由来已久。

So, you know, the discussion goes quite a long time back in the history of public administration, political research.

Speaker 2

但我们感觉在国际层面，特别是涉及联合国和政府间国际组织时，关于评估政治性的讨论相对匮乏。

But we had this feeling that there is not so much about these political or these politics of evaluation at the international level when it comes to the UN, when it comes to the international governmental organizations.

Speaker 2

因此我们将研究聚焦于国际层面，探究政治因素在不同评估阶段的表现程度。

And so we focused on the international level, asking to what extent do we observe politics at different levels or steps of evaluation, right?

Speaker 2

我所说的政治视角证据，在评估体系制度设计中、评估人员的认知中、评估者群体内部、乃至评估报告内容里体现到什么程度？我们能否发现政治偏见？这些证据在评估使用方式中又如何显现？

To what extent evidence for the political perspective, which I described on evaluation, prevails, let's say, in the institutional design of how these evaluation systems are built, in the perceptions of evaluation staff, so in those evaluators, among the evaluators themselves, but also in the content of these evaluation reports, can we find political biases there, and also in the way how evaluations are used.

Speaker 2

那么这些报告究竟是如何被使用的？

So how are the reports used?

Speaker 2

我们能否找到这样的例子：它们实际上并非从功能视角出发，也就是纠偏、问责和学习，而是像我提到的，更多关乎那些利益相关者的政治利益？

Can we actually find examples where it's not really about this functional perspective, meaning course correction and accountability and learning, but more about the political interests of those stakeholders, which I mentioned.

Speaker 1

这引出了许多非常值得探讨的问题，但我想这些问题也提高了讨论的价值，因为它们确实很棒。

This raises so many things that are so worth investigating and asking about, but I think also kind of raises the stakes because those are great questions.

Speaker 1

但它们能被解答吗？

But can they be answered?

Speaker 1

如何解答这些问题？

How can they be answered?

Speaker 1

我的意思是，这确实是我阅读本书开篇部分时产生的疑问。

I mean, that was certainly my question reading this initial section of the book.

Speaker 1

能找到哪些类型的例子？

What sorts of examples can be found?

Speaker 1

可以用哪些方法和数据来解答这个问题？

What sorts of methods and data can be used to answer this?

Speaker 1

或许你能带我们梳理一下研究的这些方面？

So could you maybe talk us through those aspects of the research?

Speaker 2

当然。

Sure.

Speaker 2

正如你正确指出的，这是个相当复杂的问题。

So as you correctly point out, right, it's quite a complex question.

Speaker 2

没错。

Right.

Speaker 2

因此它首先需要大量的概念性工作。

And it, of course, therefore demands, first of all, a lot of conceptual work.

Speaker 2

因此，Steffen和我花了很多时间思考如何在不同层面上概念化评估的政治学。

So Steffen and I spent a lot of time thinking about how we can conceptualize the politics of evaluation at different levels.

Speaker 2

我们可以稍后再深入探讨这个问题。

And we can delve into that later.

Speaker 2

但问题是，我们如何定义评估的政治学？

But the question is, how do we define the politics of evaluation?

Speaker 2

涉及哪些关键行为体等等？

What are the key actors involved, etc?

Speaker 2

因此这方面需要进行大量的概念性工作。

So that is a lot of conceptual work needed to be done there.

Speaker 2

当然，随后的问题是我们如何从经验上观察它？

And then later on, of course, the question of how can we empirically observe it?

Speaker 2

这本书基本上提供了首批关于我们所定义的国际组织评估政治的比较实证证据，采用的是混合方法研究设计。

So this book basically provides some of the first comparative empirical evidence on evaluation politics, as we define it, in international organizations, based on a mixed-methods research design.

Speaker 2

首先，我们说要明确研究的范围和规模，对吧？

So, first of all, we said, well, we need to define the scope and scale of our research, right?

Speaker 2

我们要研究哪些类型的组织？

What kind of organisations do we look at?

Speaker 2

我们主要关注联合国机构，研究了21个国际政府组织。

And we focused mostly on the UN agencies, on 21 international governmental organisations.

Speaker 2

我们想了解它们的评估系统是如何设计的。

And what we did is that we wanted to know how their evaluation systems are designed.

Speaker 2

因此我们查阅了所有评估政策和文件，基本勾勒出这些组织在评估工作时的共性与差异，以及我们能观察到哪些系统性模式。

So we went through all of the evaluation policies and evaluation documents to basically outline both the similarities and differences as to how these organizations evaluate their work, and what kind of systematic patterns we can observe there.

Speaker 2

后来我们提出，还想了解评估人员——也就是这些评估单位——如何看待他们的工作以及这些政治议题？

Later on, we said, well, we also want to know how do the evaluators, so these evaluation units, how do they see their work and these political issues?

Speaker 2

因此我们与样本中的每个评估单位都进行了交谈。

And so we spoke with every evaluation unit from our sample.

Speaker 2

我们进行了超过120次访谈，对象包括评估人员、国际组织的管理人员、成员国代表，我们还走访了这些组织的总部。

So we, you know, conducted over 120 interviews with evaluators, but also with the management people from these international organizations, with member state representatives, and we traveled to the headquarters of these organizations.

Speaker 2

我们深入探究了联合国评估机制的运作，这实际上是一个非常庞大的体系。

So we really went down the rabbit hole of UN evaluation machinery, which is actually a very big mechanism.

Speaker 2

这是个巨大的、蓬勃发展的行业，对吧？

It's a huge, and also a booming, business, right?

Speaker 2

评估这个话题我们可以稍后再讨论。

Evaluation, we can speak about that later.

Speaker 2

不过我们还是先做了案头研究，分析这些评估系统的设计方式。

But nevertheless, so, you know, what we did is that we conducted a desk review of how these evaluation systems are designed.

Speaker 2

随后通过访谈获取定性见解，了解这些评估单位的运作方式以及评估结果的使用情况。

Then we conducted interviews to gain qualitative insights as to how these evaluation units work, and also how the evaluations are used.

Speaker 2

除此之外，我们还希望研究评估报告本身。

But in addition to that, we also wanted to look into the evaluation reports themselves.

Speaker 2

我们在书中及后续研究中都做了这项工作——审阅评估报告。

And so what we did in the book and what we continue to do beyond the book's research, we looked into the evaluation reports.

Speaker 2

就本书而言，我们人工编码了样本中精选国际组织的240份评估报告的执行摘要。

When it comes to the book, we manually coded the executive summaries of 240 evaluation reports from selected international organisations from our sample.

Speaker 2

我们想看看是否能发现这些报告写作中存在的系统性偏见或模式。

And we wanted to see, you know, whether we can find any systematic, well, let's say, biases or patterns in how these reports are written.

Speaker 2

因此，我们是基于16,000条手工编码的句子来完成这项工作的。

And so we did it on the basis of 16,000 hand-coded sentences.

Speaker 2

也就是说，这是实证基础、案头审查、研究访谈以及对评估报告的定性文本分析。

So that is, you know, the empirical basis, the desk review, the research interviews, and the qualitative text analysis of the evaluation reports.

Speaker 1

要回答这些重要而复杂的问题，工作量确实非常庞大。

That is a massive amount of work to answer these important and complicated questions.

Speaker 1

感谢你们总结了你们两人所做的诸多不同工作。

So thank you for summarising the many different things the both of you did.

Speaker 1

我想接着谈谈你在回答中提到的关于评估规模及其范围的问题。

I'd like to pick up on something you mentioned in that answer about kind of the size of evaluation and the scale of it that you're looking into.

Speaker 1

我们能稍微讨论一下这个问题吗？

Could we talk a bit about that?

Speaker 1

我们谈论的规模有多大？

How big a deal are we talking?

Speaker 1

有多少人参与其中？

How many people are involved?

Speaker 1

涉及多少资金？

How much money?

Speaker 1

整个过程需要花费多少时间？

How much time is all of this taking?

Speaker 2

当然。

Sure.

Speaker 2

确实，评估工作在过去的几十年里发展得相当惊人。

Indeed, evaluation has grown quite spectacularly in the last decades.

Speaker 2

当然，评估本身，我的意思是，这并不是新鲜事，对吧？

Of course, evaluation as such, I mean, it's not new, right?

Speaker 2

在我们的生活中，我们每天都在评估一切，你知道，在日常情境中，我们试图评估并从错误中学习。

In our lives, we evaluate everything on a daily basis, you know, in our mundane situations, we sort of try to assess and learn from our mistakes.

Speaker 2

政府也是如此，你知道，这已经是存在相当长时间的事情了。

And so governments as well, you know, that is something which exists already for quite a long time.

Speaker 2

如果我们回顾评估的历史，评估的起源，早在十九世纪，各国政府就已经使用简单的统计报告来评估他们的决策。

If we look at the history of evaluation, at the emergence of evaluation, already by the nineteenth century, national governments were using simple statistical reports to assess their decision making.

Speaker 2

不过，好吧，应该说不是'不过'，而是实际上很幸运，对吧，他们开发了越来越复杂的工具来为决策提供信息。

However, well, let's say not however, but actually fortunately, right, they developed more and more sophisticated tools to do so, to inform their decision making.

Speaker 2

因此到了二十世纪末，评估已成为国家治理程序中不可或缺的一部分。

And so by the end of the twentieth century, evaluation had become really very integral to national state governance procedures.

Speaker 2

现在说到国际层面，国际组织，我们可以说联合国乃至整个国际层面的评估发展大致经历了三个阶段。

Now, when it comes to the international level, right, the international organizations, here we could say that there are probably three main phases of how evaluation in the United Nations, but also in general at the international level, developed.

Speaker 2

第一阶段规模很小，大约从70年代初持续到90年代末，当时世界银行、联合国开发计划署等先驱国际组织首次在机构内部设立了评估部门。

And the first phase, which was very small in scale, right, goes back to the early 70s until approximately late 90s, where first, we say, pioneer international organizations such as the World Bank, UNDP, developed their own evaluation units within the organizations.

Speaker 2

这在当时相当新颖，远不如现在这么普遍。

That was quite new, something which wasn't really so popular as it is right now.

Speaker 2

特别是世界银行，被视为这波评估浪潮和新公共管理议程的关键推动者，对吧？

And the World Bank especially was seen sort of as a key driver behind this wave of evaluation, this new public management agenda, right?

Speaker 2

即便在今天，他们依然非常具有前瞻性和现代性，推动着评估文化的发展。

They are still very, very, well, let's say, ambitious and modern today, driving evaluation culture.

Speaker 2

但这最初只是由少数组织推动的，直到二十一世纪头十年的第二阶段，大多数国际组织才陆续建立评估部门，其他机构加入这些先驱者行列，使评估职能得到了显著巩固。

But it was really just driven by several organizations. Only later, in the second phase, which was the first decade of the twenty-first century, were the majority of evaluation units in the international organisations developed, when other organisations joined these pioneers and helped to consolidate the evaluation function quite significantly.

Speaker 2

第三阶段指的是那些在过去十年间加入的其他组织，它们被称为后来者，目前仍在发展其评估部门。

The third phase is that of the so-called latecomers, organisations which joined the others in the last decade and are still developing their evaluation units.

Speaker 2

但总体而言，如今国际组织中的评估已经蓬勃发展起来了。

But overall today, evaluation in international organizations, you know, has flourished.

Speaker 2

因此有数千名顾问与联合国评估员合作，每年产出数百份评估报告。

So thousands of consultants work with UN evaluators, and they produce hundreds of evaluation reports annually.

Speaker 2

我们试图估算具体数量。

We tried to estimate and calculate how many.

Speaker 2

我们得出一个近似数字：联合国各机构每年产出约750份评估报告。

We have an approximate number of 750 evaluation reports produced by the UN agencies every year.

Speaker 2

现在，如果记得这些报告的篇幅，你大概能粗略理解这些报告包含多少信息量，以及相关成本有多高，对吧？

Now, if you remember how long these reports are, you can probably approximately understand how much information is in these reports, and also what kind of costs are related to that, right?

Speaker 2

我的意思是，由于各组织差异及其预算计算方式不同，精确成本很难计算。

I mean, it's quite difficult to calculate the exact cost because of the different organizations and how they calculate evaluation budgets.

Speaker 2

但我们大致估算，整个联合国系统的评估支出约为4.3亿美元——这相当于教科文组织或国际劳工组织这类机构的全年预算总额，对吧？

But approximately, we assume that overall, when it comes to the whole UN system, we have an estimated evaluation spending of around US$430 million, which is about the entire annual budget of organizations like UNESCO or the ILO, right?

Speaker 2

因此有数百万美元投入评估工作，这很正常，毕竟评估承担着重要角色且潜力巨大。

So millions of dollars are being put into evaluation, and that is fine, right, because of the very important role that evaluation has to play and also the great potential.

Speaker 2

不过我们也要谨慎，或者说应该承认整个评估活动存在的政治性——这也正是我们这本书要探讨的课题。

However, we should be careful, or let's say, should acknowledge also the political side of this whole exercise, which is sort of the task of our book.

Speaker 2

还有个有趣的数据：你知道一份普通评估报告的平均成本是多少吗？

And maybe another interesting fact, you know, how much does an average report cost for an evaluation?

Speaker 2

根据联合国内部监督事务厅数据，每份评估报告的平均成本约为50万美元，这包括23个联合国机构的监测与评估管理费用和资源开销。

So according to the UN's Office of Internal Oversight Services, the average cost of an evaluation report is about half a million US dollars, which includes, you know, the overheads and overhead resources for monitoring and evaluation across 23 UN agencies.

Speaker 2

所以这既多又贵，对吧？

So that's quite a lot and quite expensive, right?

Speaker 2

相当昂贵的操作。

Quite an expensive exercise.

Speaker 2

我建议大家都看看这些评估报告。

And I recommend everyone to look into these evaluation reports.

Speaker 2

我的意思是，这当然是个技术性很强的任务，但也可能相当有趣。

I mean, that is, of course, a very technical task, but it can be quite interesting.

Speaker 2

如果你想了解粮农组织在格鲁吉亚的表现，或者联合国开发计划署在性别政策方面的成效，这类报告能帮助你理解其绩效水平。

If you want to know, let's say, how is FAO doing in Georgia, or how has UNDP been doing when it comes to its gender policies, these are kind of reports which you need to look at to understand the level of performance here.

Speaker 2

考虑到它们如此昂贵，我认为我们该多关注这些严格来说非技术官僚的文件。

Given how expensive they are, I think we should pay a bit more attention to these arguably not technocratic documents.

Speaker 1

感谢您带我们了解这项工作的规模及其发展历程。

Thank you for taking us through the scale of this and also the evolution of kind of how we got to this point.

Speaker 1

因为我觉得，如果之前讨论的问题还不够重要，那么这项工作的规模就充分说明了调查这些报告意义重大——不仅要看最终成果，还要关注其制作过程，毕竟涉及这么多人、这么长的篇幅、这么广的覆盖面。

Because I think if the questions you were discussing, asking earlier in our conversation kind of weren't important enough, that the scale of this really makes the point clear of just how important it is to investigate what these reports are doing, not just the kind of end result, but also the process of creating these reports, given how many people are involved, how long the reports are, how much is covered in them.

Speaker 1

既然我们已了解被提出的问题、你们为分析这些付出的巨大努力、任务的规模及领域范围，能否带我们看看你们发现评估可能政治化的具体表现？并帮我们理解其重要性？

So now that we understand the questions that are being asked, the massive amount of work the two of you put in to analyse all of this, the scale of the task and the kind of area, can you walk us through the ways in which you've discovered in your findings that evaluation can be political and help us understand why it matters?

Speaker 2

好的，当然。

Yes, sure.

Speaker 2

我会尽量简洁系统地说明。

So I'll try to do that, of course, as briefly as possible and in a systematic manner.

Speaker 2

我们首先观察到一个有趣现象：联合国各机构在设计评估体系时存在显著差异。

And the first interesting finding that we observed is that UN agencies differ significantly as to how they design their evaluation systems.

Speaker 2

这些差异随后会对评估的政治性产生影响。

And these differences matter later on for the politics of evaluation.

Speaker 2

根据重要利益相关方的参与程度，大致可分为三类组织如何构建其评估体系。

There are three clusters as to how organisations organise, let's say, their evaluation systems, according to the involvement or with regards to the involvement of important stakeholders.

Speaker 2

因此需要明白，在国际政府组织中，这些评估部门并非完全独立存在，而是依赖于关键利益相关方，对吧？

So you need to understand that in international governmental organizations, these evaluation units are not completely independent; they actually depend on key stakeholders, right?

Speaker 2

这些关键方要么是成员国——即坐在国际组织执行董事会和理事机构中的大使们，要么就是管理层，对吧？

And these are either the member states, the ambassadors sitting on the executive boards and governing bodies of the international organizations, or the management, right?

Speaker 2

秘书处，那些管理人员。

The secretariat, the management people.

Speaker 2

我们注意到在某些组织中，实际上是成员国决定这些评估部门需要评估哪些议程。

And we observe that in some organizations, it is actually the member states which decide upon the agenda of these evaluation units, that is, what has to be evaluated.

Speaker 2

成员国还负责批准评估部门的负责人。

It is the member states which approve the head of evaluation unit.

Speaker 2

评估资源的分配同样由成员国决定。

And it is also the member states which decide upon the resources of evaluations.

Speaker 2

例如世界银行、国际货币基金组织或联合国开发计划署就是这种情况。

So that is the case, for instance, at the World Bank or the IMF or UNDP.

Speaker 2

然而在其他组织中，情况则完全不同。

However, in other organizations, it is completely different.

Speaker 2

是由行政部门、管理层来做出关于评估系统核心资源的决策。

It is the administration, the management, which makes these decisions regarding the key resources of evaluation system.

Speaker 2

而第三类组织则采用混合控制模式。

And then in some others, in the third cluster, there is mixed control.

Speaker 2

那么，举个管理主导或官僚主导的评估系统设计的例子。

So just to give an example of the management dominated or the bureaucracy dominated evaluation system designs.

Speaker 2

国际移民组织(IOM)就是这种情况，联合国难民署(UNHCR)和联合国妇女署(UNWomen)也是如此。

That is the case at the IOM, the International Organization for Migration, as well as at UNHCR or UNWomen.

Speaker 2

然后在我提到的第三类组织中，存在混合控制系统——成员国和IO行政部门都参与批准评估单位、其资源负责人和议程，这类混合控制组织包括粮农组织(FAO)、教科文组织(UNESCO)或联合国环境规划署(UNEP)等机构。

And then in this third cluster, which I mentioned, which has the mixed control systems, where both member states and the IO administration are involved in approving the evaluation unit's resources, head, and agenda, this mixed control cluster is to be found in organizations like FAO, UNESCO, or UNEP, the UN Environment Programme.

Speaker 2

因此我们观察到在谁控制关键评估系统资源方面存在这些差异。

So we observed these differences as to who controls key evaluation system resources.
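The three-cluster distinction described above can be sketched as a small classification rule. This is only an illustrative sketch, not the authors' coding scheme: the `Controller` enum, the `classify_design` function, and the all-or-nothing rule (a cluster counts as dominated by one side only if that side controls agenda, head, and budget alike) are assumptions made for the example.

```python
from enum import Enum


class Controller(Enum):
    """Who decides on a given evaluation-system resource (hypothetical labels)."""
    MEMBER_STATES = "member_states"
    MANAGEMENT = "management"
    BOTH = "both"


def classify_design(agenda: Controller, head: Controller, budget: Controller) -> str:
    """Assign an evaluation system to one of three clusters.

    Assumed rule: if a single stakeholder controls all three resources,
    the system is dominated by that stakeholder; anything else is mixed.
    """
    controllers = {agenda, head, budget}
    if controllers == {Controller.MEMBER_STATES}:
        return "member-state dominated"
    if controllers == {Controller.MANAGEMENT}:
        return "management dominated"
    return "mixed control"


# Hypothetical organisation where member states set the agenda but the
# management decides on the head of the unit and the budget.
print(classify_design(Controller.MEMBER_STATES, Controller.MANAGEMENT, Controller.MANAGEMENT))
```

A real coding exercise would likely need finer gradations, for example distinguishing consultation from formal approval, which this sketch ignores.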

Speaker 2

我们首先觉得这有点令人惊讶，毕竟这是同一个联合国系统，而且存在所谓的联合国评估小组(UNEG)试图协调各联合国组织的评估流程，然而我们却看到如此大的差异。

And we thought, well, first of all, that is a bit surprising, given that it is the same UN system, and also given that there is the so called UN Evaluation Group, which seeks to harmonize evaluation processes across the UN organizations, and yet we have these large differences.

Speaker 2

不过我们也认为，这可能取决于这些评估单位倾向于面向谁，或者换句话说，这些评估单位将谁视为关键用户、关键保护者或赞助者。

However, we also said, well, probably it matters as to who these evaluation units tend to orientate to, or differently put, who these evaluation units see as the key users, key protectors or sponsors.

Speaker 2

事实上，当我们与评估单位工作人员交谈时发现，在那些由成员国决定评估资源关键事项的组织中，评估者往往将他们视为评估的主要用户、遇到问题时的关键赞助者和保护者，他们倾向于求助成员国而非管理层。

And indeed, when we spoke with the evaluation unit staff, we observed that in those organizations where the member states make these key decisions regarding the evaluation resources, the evaluators tend to see them as key users of evaluation, as key sponsors and protectors. In case of issues, they tend to go to the member states and not the management.

Speaker 2

这或许很有道理，对吧？

And that probably makes sense, right?

Speaker 2

这与著名的委托-代理理论完全一致——在多委托方情境下，代理方会倾向于对其实施最大控制权的委托方，对吧？

This is in line with the very famous principal-agent theory, where the agent, in a multiple-principal setting, orients itself toward the principal that has the most power over the agent, right?

Speaker 2

国际组织及其评估系统也存在同样的情况。

So that is the same in international organizations and when it comes to the evaluation systems.

Speaker 2

在那些由国际组织管理层控制评估资源的组织中，

In those organizations where the IO management controls these evaluation resources,

Speaker 2

评估者往往更侧重于管理层。

Evaluators tend to focus much more on the management.

Speaker 2

在混合案例集群中，我们并未看到对某一方利益相关者的明确倾向。

And in a mixed case cluster, we do not see this clear orientation to one or the other stakeholders.

Speaker 2

似乎两者都至关重要。

It is both which seem to matter.

Speaker 2

而这正是基于我们的定性访谈得出的结论。

And that is based on our qualitative interviews.

Speaker 2

因此我们目前的发现是：评估体系在制度设计上存在差异。

So what we have now is that evaluation systems differ in their institutional design.

Speaker 2

评估者实际上会对此作出反应，导致他们认为某些利益相关者比其他方更重要。

Evaluators actually respond to that, so that they perceive some stakeholders as more important than others.

Speaker 2

有趣的是，在后续访谈管理层和成员国时我们还发现，这些差异还会影响谁主要将评估工具——不仅是功能性地，更重要的是政治性地——加以利用，对吧？

And interestingly, what we also found in the next step, when we interviewed the management and the member states, is that these differences also affect who primarily uses evaluation as such, and not just functionally but, more importantly, politically, right?

Speaker 2

我们发现，在那些由成员国（如前所述）掌控资源的组织里，主要将评估用于自身政治利益的也正是这些成员国。

So we found that in those organizations where the member states, let's say, define the resources, as I mentioned, it is also the member states who primarily use evaluation for their own political interests.

Speaker 2

我举个例子说明。

Let me give you an example.

Speaker 2

这些强国往往会选择性采用评估结果，将其作为国际谈判筹码，或是用来限制管理层的官僚影响力。

These powerful states tend to cherry pick evaluation results and then use them as a bargaining chip in international negotiations, or they tend to use these evaluations to contain the management's bureaucratic influence.

Speaker 2

而在国际组织管理层控制评估资源的案例中，我们观察到这些行政机构会战略性地运用评估——例如增强对成员国的官僚影响力，或实施内部管控。

Now, in other cases where the IO management controls the evaluation resources, we observed that these administrations, these managements, also tend to use these evaluations strategically, for instance, to increase their bureaucratic influence vis-à-vis the member states, or let's say, to exercise control internally.

Speaker 2

最后在评估报告环节，根据制度设计的不同，我们也发现了显著差异。

And final step, when it comes to the evaluation reports, we also see significant variation here as well, based on the institutional design.

Speaker 2

这就引出了下个重点：关于评估报告是否包含政治偏见的问题，这是通过对报告文本分析得出的结论。

And that is when I come to the next point, you know, about the evaluation reports and whether they contain political biases, and that is done based on the text analysis of the reports.

Speaker 2

在这里，我们还观察到利益相关方会影响评估报告的内容。

And here, we also observed that stakeholders can affect the content of evaluation reports.

Speaker 2

例如，我们发现评估建议的撰写方式存在系统性差异，对吧？

For instance, we observed that, or we found systematic differences as to how evaluation recommendations are written, right?

Speaker 2

每份报告都有建议部分，这是评估报告中非常重要的组成部分，因为当你阅读报告时，你希望知道基于评估结果下一步应该采取什么行动，对吗？

So every report has recommendations, and that is a very significant part of evaluation report, because when you read a report, right, you want to know, so what should the next steps be based on your evaluation findings, right?

Speaker 2

我们对这些建议进行了编码分析，发现在联合国机构中，当成员国控制评估单位的系统资源时（如我提到的员工任命、预算和议程），评估建议的措辞会更加具体明确。

So we coded these recommendations and we observed that in UN organizations, where member states control evaluation unit system resources, such as, as I mentioned, staff appointments, budget and agenda, evaluation recommendations are more specific in the language, right?

Speaker 2

这些建议在是否或如何实施方面留给解读的空间更小。

They leave less room for interpretation as to whether or how the recommendations should be implemented.

Speaker 2

你可以想象，当我说'考虑关闭国家办事处'和直接建议'关闭国家办事处'时存在巨大差异，对吧？

And you can imagine that there is a big difference whether I say, consider closing the country office, or if I say in a recommendation, close the country office, right?

Speaker 2

这类细微差别是可以观察到的。

So these kind of small differences can be observed.
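The difference between "consider closing the country office" and "close the country office" can be illustrated with a toy rule that looks for hedging cue words. This is a minimal sketch for illustration only; it is not the coding scheme used in the book, and the cue list `HEDGE_CUES` and the function name `recommendation_tone` are assumptions.

```python
import re

# Hypothetical cue words signalling a non-imperative, hedged recommendation.
HEDGE_CUES = ("consider", "could", "would", "might", "may wish")


def recommendation_tone(text: str) -> str:
    """Label a recommendation as 'hedged' or 'imperative' based on cue words."""
    lowered = text.lower()
    for cue in HEDGE_CUES:
        # Word boundaries avoid matching cues inside longer words.
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
            return "hedged"
    return "imperative"


print(recommendation_tone("Consider closing the country office."))  # hedged
print(recommendation_tone("Close the country office."))             # imperative
```

A keyword rule like this misses context, so manual coding of the kind described in the interview would still be needed for any real analysis.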

Speaker 2

此外，这类建议很少会暗示需要增加资源或减少监督，这符合成员国的典型利益——我们假设其政治利益在于优化官僚资源而非扩张官僚机制，并希望保持对国际组织管理的控制。

And also, such recommendations rarely imply the need for more resources or reduced oversight, which is in line with what we would assume to be typical member state political interests: member states usually want to optimize the bureaucratic resources, right, not really expand the bureaucratic mechanism, and they want to maintain control over the IO management.

Speaker 2

而在国际组织管理层控制评估单位资源的机构中，我们发现建议采用更宽泛模糊的措辞，更多使用'考虑'或'可以'等非强制性语气。

By contrast, in organisations where the IO management controls these evaluation unit resources, we found the recommendations to take a much broader, more ambiguous tone, with much more language such as "consider", or "you could", "you would", you know, not such an imperative tone.

Speaker 2

这类建议也更倾向于主张为组织增加资源并减少监督。

These recommendations were also found to be more likely to advocate for increased resources for the organisation and reduced oversight.

Speaker 2

这些就是评估政治学中的一些关键例证。

So that is, you know, sort of some of the key examples when it comes to the politics of evaluation.

Speaker 2

实际上，在后续深入研究中，我们还对报告内容进行了更细致的分析。

And actually, in extended research, we went into the reports in even more detail.

Speaker 2

我还可以举例说明我们发现的其他政治偏见类型。

And I can also give you examples of what other kinds of political biases we found.

Speaker 2

但说到这本书，你知道的，制度设计差异、利益相关方通过这种方式施加影响，对吧？

But when it comes to the book, the key points are, you know, the institutional design differences and the stakeholder influence that works through them, right?

Speaker 2

评估者倾向于更占主导地位、更有权势的利益相关方，然后还会根据谁掌控这些资源来决定评估方式的不同运用。

The orientation of the evaluators to the more dominant, more powerful stakeholder, and then also the different use of evaluation based on who controls these resources.

Speaker 2

最后是我们展示的一些偏见，特别是在建议和报告本身方面。

And finally, some of the biases which we demonstrate, especially when it comes to the recommendations and the reports themselves.

Speaker 1

所以你刚才说的这些，让人对那些并非总是但有时被宣称的'这些评估是独立的'、'由独立评估单位或独立评估人员完成'的说法产生了很大怀疑。

So everything you've just told us kind of puts a lot of skepticism on the claims that are not always made, but sometimes made that these evaluations are independent, that these are done by independent evaluation units or independent evaluators.

Speaker 1

根据你的讲述，当然还有更广泛的研究，这些评估或评估单位中有任何一个是真正独立的吗？

Given what you've told us, and of course, the research more broadly, are any of these evaluations or evaluation units independent?

Speaker 2

嗯，这是个非常重要的问题，因为我们想传达的信息并不是要放弃评估或将其与利益相关方完全隔离。

Well, that is a very significant question, because the message which we want to send is not that we should abandon evaluation or that we should isolate it completely from stakeholders.

Speaker 2

相反，我们认为解决评估政治化的关键在于首先要认识到评估单位周围存在的影响力，对吧？

Instead, we say that the solution to evaluation politics lies first of all in recognizing the influence surrounding evaluation units, right?

Speaker 2

要认识到成员国和管理层的政治利益，然后据此设计评估体系。

Recognizing the political interests of member states, of the management, and then designing evaluation systems accordingly.

Speaker 2

我们还有具体的建议，我们认为这些建议具有可操作性，能减少对评估人员的政治影响。

And we also have specific recommendations, which we think would be plausible to implement, to reduce the political influence upon the evaluators.

Speaker 2

但同时我们也认为，评估单位确实无法完全独立，因为它们是组织的一部分。

But at the same time, we argue that indeed, evaluation units are not, well, they cannot be completely independent because they are part of the organization.

Speaker 2

如果你是组织的一部分，就必然依赖资源、人员审批、议程批准，这很正常。

And if you are part of an organization, you are dependent on resources, on staff approval, on the approval of agenda, and that is normal.

Speaker 2

根本无处可逃，对吧？

There are no ways to escape, right?

Speaker 2

即便你雇佣外部顾问，这些顾问仍有一定动机继续与你合作或争取后续合同。

Even if you, let's say, hire external consultants, these external consultants still have certain level of incentives to continue working with you or to get another contract later on.

Speaker 2

因此他们自身也在某种程度上依赖于你提供的数据质量。

And so they themselves are also, to a certain degree, dependent on the data quality that you provide.

Speaker 2

他们可能也想继续与你合作。

They may also want to continue working with you.

Speaker 2

所以这是个相当复杂的问题，不是吗？

And so it is quite a complicated issue, right?

Speaker 2

你可以建立非常独立的评估流程——在数据采集、技术问题、数据分析、数据解读和数据可视化方面。

You can build evaluation processes, which are very independent when it comes to data collection, you know, technical issues, data analysis, data interpretation, data visualization.

Speaker 2

但当涉及报告撰写，尤其是这些报告后续如何被使用时，政治因素就不可避免。

But when it comes to the report writing, and especially when it comes to how these reports are then later on used, you cannot escape politics.

Speaker 2

我们必须承认这一点。

And we should acknowledge that.

Speaker 2

但具体回答你关于评估部门的问题，我们发现相比其他方案，评估部门可以说是良好的守门人或保障机制。

But to answer your question regarding specifically evaluation units, we actually find evaluation units to be, well, let's say, good gatekeepers or let's say safeguards as compared to other alternatives.

Speaker 2

另一种方案是建立分散的评估部门，即每个国家办事处、每个项目所在地的办公室自行决定雇佣外部顾问开展评估，因为他们更了解当地情况和工作。

So one alternative would be to say, well, we should create decentralized evaluation units, meaning that every country office, every programme or office in a city where the project is being done has to decide upon external consultants and then conduct evaluations, because they know better the landscape, they know better the work there.

Speaker 2

但我们发现这个方案很有争议——在《公共行政评论》期刊的文章中，我们证明由国际组织驻地办事处管理的评估报告，相比联合国中央评估部门管理的报告，系统性更为乐观。

However, we find this alternative very controversial, and that is because in another article, which is published in the journal Public Administration Review, we demonstrate that evaluation reports which are managed by operative IO units, let's say country offices, are systematically more positive compared to evaluations managed by the UN evaluation units, by the centralized evaluation units, right?

Speaker 2

这些差异非常显著。

And these differences are significant.

Speaker 2

对于建立联合国范围内的统一评估机构这一构想，我们持相当怀疑的态度，你知道的，就是那种适用于所有组织的单一评估体系。

And we are also quite sceptical when it comes to the idea of creating a UN wide evaluation unit, you know, one for all of these organizations.

Speaker 2

这是因为联合国各机构之间仍存在显著差异，这一点你也清楚。

And that is because there are still significant differences when it comes to, you know, UN agencies.

Speaker 2

因此，尽管联合国现有的集中化评估机构并未完全脱离政治影响，但相比分散评估或为所有联合国机构设立单一集中评估部门，仍有更优选择。

And so although the UN's, well, these centralized evaluation units are not completely isolated from politics, they are still a better option than decentralized evaluations or just one single centralized evaluation unit for all UN agencies.

Speaker 1

感谢你详细剖析其中的微妙之处，因为这确实不是简单用'是'或'否'就能回答的问题。

Thank you for walking through kind of the nuance of that, because it isn't as simple as sort of a yes or no.

Speaker 1

但理解这里存在的不同选项也很重要。

But it is also worth understanding kind of what the different options are here.

Speaker 1

既然你提到了书中提出的实际影响和建议，我们能否转向研究的这个方面，请你详细说明你认为应该采取哪些措施？

I suppose because you've mentioned the practical implications and recommendations from the book, could we maybe turn to that aspect of the research and walk us through what you think should be done?

Speaker 2

当然可以。

Sure.

Speaker 2

正如你所说，本书的核心观点之一在于：尽管我们着重强调了评估过程中涉及的政治因素，但我们仍然认为评估本身极具价值，对吧？

So, as you said, one of the key messages of the book is to say that, look, although we highlight a lot the politics involved in the evaluation process, we still think that evaluation is very useful, right?

Speaker 2

在书中，我们确实更侧重于政治视角而非功能性视角——就像我们对话开始时我描述的那样。

In the book, we do focus more on politics rather than this functional perspective, which I described in the beginning of our conversation, right?

Speaker 2

可以说，我们并未过多分析硬币的另一面，即评估在多大程度上被用于经验学习或路线修正。

We do not analyze the other side of the coin, so to speak, namely the extent to which evaluation is used for learning or course corrections.

Speaker 2

这种情况其实经常发生。

And that happens a lot.

Speaker 2

相反，在书中我们真正想剖析的是这个被忽视的侧面——政治性的一面。

Instead, you know, in the book, we actually wanted to analyze this neglected side of the coin, which is the political one.

Speaker 2

因此，这个观点不应仅限于政治层面。

And so the idea shouldn't be that it's only politics.

Speaker 2

更准确地说，这个观点是要指出评估中也涉及政治因素。

The idea is rather to say it's also politics involved in evaluation.

Speaker 2

正如我所说，解决方案实际上是认清这些政治因素，并据此设计评估体系。

And so, as I said, the solution is actually to recognize these politics and then design evaluation systems accordingly.

Speaker 2

因此，我们的建议如下。

And so our recommendations are the following.

Speaker 2

首先，我们建议转向资源混合控制的评估体系，即由成员国和行政部门共同批准评估人员或评估单位负责人、其议程和预算。

First of all, we suggest moving towards evaluation systems with mixed control of resources, right, where both member states and the administration are involved in approving evaluation staff or head of evaluation unit, its agenda and budget.

Speaker 2

为什么这样做是合理的？

And why does that make sense?

Speaker 2

我们认为，通过相互协商和共同批准机制，这类评估体系能创造更高的诚信度和更可信的环境。

Well, we argue that by consulting with each other, and by approving the mechanism collaboratively, such systems, such evaluation systems, create more integrity, a more trustworthy environment.

Speaker 2

这还能让双方利益相关者都参与评估使用，从而提升评估在组织中的相关性，而非仅成为某一强势利益方的工具。

It also embeds both stakeholders in the use of evaluation, so it increases the relevance of evaluation in the organization, instead of making it a tool merely for the use of one powerful stakeholder.

Speaker 2

同时也能提高客观性，实现对组织绩效的全面评估。

It also, you know, increases the objectivity and enables comprehensive assessment of the organization's performance.

Speaker 2

这是第一点。

So that is one.

Speaker 2

第二，根据我们对评估报告建议部分的观察，我们认为可能需要制定更好的指导方针来规范评估建议的表述，确保其基于调查结果。

Second, based on what we observed when it comes to recommendations and the evaluation reports, we argue that maybe it's a good idea to work on better guidance when it comes to the formulation of these evaluations, evaluation recommendations, right, to make sure that they are based on the findings.

Speaker 2

由于我们观察到不同组织在建议措辞上存在显著差异，似乎目前对建议的严格程度、应使用的语气等缺乏明确标准，因此制定指南、手册或开展建议撰写培训都能提升客观性。

And since we observed such different language when it comes to recommendations across organizations, it seems that it's not really quite clear how strict they should be, what kind of imperatives should be used, and therefore a guideline, a manual, or just trainings when it comes to crafting recommendations can increase objectivity.

Speaker 2

最后，关于这些构想——实务工作者中有人提议为整个联合国建立一套全系统评估机制——我们认为这种做法将难以兼顾各计划、基金、机构和组织的独特性，同时还要维持利益相关方的参与度。

And finally, when it comes to these ideas, and there are some ideas among the practitioners to create a system-wide evaluation mechanism for the whole UN, we think that such an approach would struggle to consider the unique characteristics of each program, fund, agency, and organization, right, and to maintain stakeholder involvement.

Speaker 2

因此我们认为，联合国各机构应设立独立的评估部门。

And therefore, we think that in the United Nations, the different organisations should have their own standalone evaluation units.

Speaker 2

这些评估部门应从内部监督服务等其它部门中独立出来。

They should move them out of other departments, such as the internal oversight services.

Speaker 2

以欧安组织(OSCE)为例，对吧？

Let's take an example, the OSCE, right?

Speaker 2

还有安全组织。

Also the security organisation.

Speaker 2

它们的评估单位设在内部监督服务或内部监督部门内，对吧？

They have an evaluation unit inside an internal oversight service or internal oversight department, right?

Speaker 2

具体名称我不确定，但基本架构是：存在一个内部监督部门，设有部门主管，对吧？

I'm not sure exactly what it's called, but basically the idea is that you have a department for internal oversight, a head of department, right?

Speaker 2

然后该部门内部又设有一个评估单位，另配评估单位主管。

And then inside you have an evaluation unit with another head of evaluation unit.

Speaker 2

但根据我们对其他组织的观察，通常这类更广泛的内部监督服务运作逻辑与评估逻辑不同——因为评估逻辑更侧重学习改进，而非财务审计和问责，对吧？

However, usually, as we observed in other organisations, the logic upon which these broader internal oversight services work is different from the logic of evaluation, because the logic of evaluation is much more about learning, is less about financial auditing and accountability, right?

Speaker 2

因此两者可能存在冲突，最佳方案是各组织设立独立集中的评估部门，直接向成员国和本组织首长（秘书长或总干事）汇报。

And so there might be conflicts between the two, and it is best that each organization has its standalone independent evaluation unit, which is centralized and which reports directly to both member states and the head of the organization, the secretary general or the director general.

Speaker 1

这些都是非常清晰可行的建议。

So those are some very clear practical recommendations.

Speaker 1

不知您能否谈谈这项研究在理论假设层面应带来或已带来的改变？特别是对我们这些研究国际组织运作机制的人而言。

I wonder if you can also talk about what you think this research should or does perhaps change more on a kind of theoretical assumptions level, especially for those of us researching how international organisations work?

Speaker 2

是的，当然。

Yes, sure.

Speaker 2

所以我认为，我们展示的一点与官僚主义在国际组织中的影响力有关。

So I think, you know, one thing which we demonstrate refers or relates to the bureaucratic influence in international organizations.

Speaker 2

尽管我们现在处于这样一个阶段，我认为大多数学者都同意秘书处——我们称之为国际公共行政机构（IPAs）——本身就是具有独立影响力的行为体。

And we are now at a stage where I think most scholars agree that the secretariats, the international public administrations, or IPAs as we call them, are actors in their own right.

Speaker 2

最初，现实主义者认为在国际政治中只有成员国才发挥作用。

Initially, realists said that it's only member states who play a role, right, when it comes to international politics.

Speaker 2

现在越来越多人探讨的问题不再是国际秘书处是否重要，而是它们如何产生影响。

Now more and more scholars ask not whether international secretariats matter, but how they matter.

Speaker 2

我认为我们的书展示了像评估这样高度技术性的议题如何能被这些官僚机构用作施加影响的工具。

I think our book demonstrates how very technocratic issues such as evaluation can be used as tools for influence by these bureaucracies.

Speaker 2

对吧？

Right?

Speaker 2

我们确实证明，如果他们完全掌控制度设计，就能利用这些工具对成员国施加影响。

We do show that if they control the institutional design completely, they can use these tools to exert their influence vis-à-vis the member states.

Speaker 2

所以这，我认为，是第一点。

So that is, I think, one, right?

Speaker 2

这体现了国际公共行政机构（IPAs）在国际组织中官僚影响力的重要性或亮点。

The importance or the highlight of the bureaucratic influence in international organizations when it comes to the international public administrations, the IPAs.

Speaker 2

第二点可能与委托-代理理论相关。

Second probably relates to the principal agent theory.

Speaker 2

我认为委托-代理理论是分析不同层级结构最著名的核心工具之一。

And I think, you know, principal-agent theory is one of the key tools, one of the most famous tools, right, to analyse different hierarchical settings.

Speaker 2

在这种情况下，我们也有一个层级化的设置。

And in this case, we also have a hierarchical setting.

Speaker 2

我们还有多委托方情境的优秀案例。

We also have very nice examples of a multiple-principal setting.

Speaker 2

我们还观察到代理方如何在多重原则间周旋——代理方作为评估单元，而多重原则则体现为国际组织的管理架构：一方面是官僚体系，另一方面是成员国。

And we do see how the agent has to navigate between multiple principals, the agent being the evaluation unit, and the multiple principals being the management of the international organization, the bureaucracy, on the one hand, and the member states on the other hand.

Speaker 2

再次强调，即使在评估这样看似中立、技术官僚化的功能维度或关系中，这种委托-代理行为依然存在，并带来其特有的紧张关系，这点我们都清楚，对吧？

And again, even in such arguably neutral, technocratic, functional dimensions or relationships like evaluation, these principal agent behaviors exist and they bring their own tensions, which we all know, right?

Speaker 2

潜在的代理懈怠问题，以及代理方面临的困境：我究竟该遵循哪项原则？虽然理论上应该保持独立，但实际上却因身处委托-代理关系中而无法真正独立。

The potential agency slack and also this dilemma for the agent: toward which principal should I orient myself, considering that I actually should be independent, but I cannot be independent because I am in a principal-agent relationship?

Speaker 2

因此我认为，运用委托-代理理论视角来分析国际组织中的这类关系，无论它们看起来多么技术官僚或中立，都有助于我们更好地理解国际组织的运作机制。

And so I think using this principal-agent lens to analyze such relationships in international organizations, regardless of how technocratic or arguably neutral they are, helps us better understand the functioning of international organisations.

Speaker 1

我认为这本书能催生很多重要内容，而且很可能会推动相关领域的发展。

I think there's a lot of things that very much can and probably will develop from the book.

Speaker 1

从你部分回答来看，你和斯蒂芬似乎已经在这个课题上有了更深入的探索。

And it sounds like, from some of your answers, you and Steffen are already going further on this topic.

Speaker 1

那么你愿意谈谈两位在本书基础上正在开展的进一步研究吗？

So would you like to tell us about the further research the two of you are doing beyond what's already in the book?

Speaker 2

当然。

Sure.

Speaker 2

我们认为研读这些评估报告本身非常重要。

You know, we thought that it is very important to look at these evaluation reports themselves.

Speaker 2

首先因为这些报告是现成的公开资料，对吧？

For one, because we have them, so they are publicly available, right?

Speaker 2

这对研究者来说非常便利，能找到如此良好的实证基础。

And that is very convenient for a researcher to find such a good empirical basis.

Speaker 2

越来越多的组织将这些资料公开。

More and more organisations make them publicly available.

Speaker 2

因此我们在数据库中收集了数千份此类报告，这可能是现存为数不多的此类数据集之一。

And so we collected thousands of such reports in our databank, in our dataset, which probably is, well, one of the few such data sets which exist.

Speaker 2

考虑到当前的技术创新，特别是在自然语言处理和语言模型方面，这些技术使我们能够分析海量数据。

And, as I said, consider the technological innovations which we currently have, especially when it comes to natural language processing and language models, right, which allow us to analyse huge amounts of data.

Speaker 2

这正是运用这些新方法、通过评估报告分析政策的最佳时机，因为它们都是书面文本，对吧？

And that is a perfect point in time to employ these new methods and analyze these politics using these evaluation reports, because they are all written text, right?

Speaker 2

我们最近的工作（已发表在《国际组织评论》期刊上）是：首先收集了约1000份报告，来自九个具有不同制度设置的联合国系统组织。

So what we did recently, and it is already published in the journal The Review of International Organizations, is that we collected, initially, as a first step, about 1,000 reports from nine UN system organisations, which have different institutional settings, as described.

Speaker 2

然后我们通过人工标注这些报告中的句子——判断其对评估活动的评价是正面、负面还是中性——来训练语言模型。

And then we trained a language model by manually labeling these reports at the sentence level as to whether the sentence makes a positive or negative or neutral assessment of evaluated activity, right?

Speaker 2

要判断评估报告中的句子是否作出评价，如果是，那么对评估项目是正面、中性还是负面？

Whether a sentence in an evaluation report is making a judgment, and if yes, is it positive or negative regarding the evaluated project, or is it a neutral sentence, right?

Speaker 2

我们手动标注了数千个这样的句子，输入模型训练后，现在能自动计算评估报告中句子层面的正面与负面评价比例。

And so we labeled that manually, thousands of such sentences, then put it into the model and trained the language model, so that now we can automatically calculate the extent or the ratio of positive versus negative assessments at a sentence level in an evaluation report.

Speaker 2

你可以想象，假设某份报告中70%的评价都是正面的。

And so you can imagine that we have a report where, let's say, 70% of all assessments are positive.

Speaker 2

这样我们就得到了一个评分，对吧？

And that gives us a score, right?

Speaker 2

某种绩效评分。

Sort of a performance score.
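The score described here can be sketched as the share of positive sentences among all sentences that make an assessment. This is a minimal sketch under assumptions: the label strings, the function name `positive_share`, and the choice to exclude neutral sentences from the denominator are illustrative and are not taken from the published article.

```python
from collections import Counter
from typing import Iterable, Optional


def positive_share(labels: Iterable[str]) -> Optional[float]:
    """Share of positive sentences among evaluative (positive or negative) ones.

    Neutral sentences are ignored (an assumption of this sketch).
    Returns None if the report contains no evaluative sentence at all.
    """
    counts = Counter(labels)
    evaluative = counts["positive"] + counts["negative"]
    if evaluative == 0:
        return None
    return counts["positive"] / evaluative


# Hypothetical report: 7 positive, 3 negative and 5 neutral sentences.
report_labels = ["positive"] * 7 + ["negative"] * 3 + ["neutral"] * 5
print(positive_share(report_labels))  # 0.7
```

In the workflow described in the interview, the labels would come from the trained language model rather than being written by hand as in this example.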

Speaker 2

如今，在会议和研究研讨会上，许多人问我们：仅仅统计句子数量真的能反映绩效表现吗？

Now, at conferences and research colloquiums, many ask us, well, does this mere counting of sentences actually tell us anything about the performance?

Speaker 2

对吧？

Right?

Speaker 2

这是个非常合理的问题。

And that is a very legitimate question.

Speaker 2

当然，我们需要验证这些绩效评分。

And we needed to validate, of course, these performance scores.

Speaker 2

幸运的是，我们发现世界银行不仅撰写这些评估报告，其人类分析师专家还会为每份报告打分。

And so luckily, what we have is that the World Bank not only writes these evaluation reports, but the human analysts, experts, also give a score to each evaluation report.

Speaker 2

简单来说，他们会用0到6分来评定评估报告中描述活动表现的优劣程度。

So they rate, from zero to six, how good or bad, to put it simply, the overall performance of the evaluated activity described in the evaluation report is.

Speaker 2

因此我们获得了另一种数字绩效评级，可以与基于语言模型的评分进行对比。

And so we had an alternative numeric performance rating, which we could compare to our score based on the language model.

Speaker 2

我们选取了600份世界银行报告，用专门训练的语言模型进行分析，发现人类评分与我们的计算评分在正负面评价比例上存在显著的正相关关系。

And so we took 600 World Bank reports, we analyzed them using our language model, which we trained for this, and then we observed a very nice, strong positive correlation between the World Bank scores given by a human and our calculated scores, that is, the ratio between positive and negative assessments.

Speaker 2

换句话说，人类对评估报告的评分越高，报告中积极评价句子的比例就越大。

Or to put it differently, the better a human scores the evaluation report, the higher is the ratio of positive assessment sentences in these reports.
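The validation step can be illustrated by correlating human ratings with text-based scores. The interview does not say which correlation coefficient was used, so the Pearson coefficient below is an assumption, and all numbers are made up for the example rather than taken from the World Bank data.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: human ratings per report and model-based positive shares.
human_ratings = [2, 3, 4, 5, 6]
model_scores = [0.35, 0.50, 0.55, 0.70, 0.80]
print(round(pearson(human_ratings, model_scores), 3))
```

A coefficient close to 1 on real data would support the claim that the sentence-level ratio tracks the human ratings; a rank-based coefficient might be preferable for an ordinal rating scale.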

Speaker 2

这意味着在数千份没有世界银行这类数字评分的联合国评估中，我们可以使用这个语言模型，根据评估文本计算出经过验证的类似绩效评分，这为多种用途创造了可能。

Meaning that for thousands of UN evaluations which do not have numeric ratings like the World Bank's, we can use this language model and calculate a similar, validated performance score based on the actual text of the evaluations, which opens up possibilities for many, many purposes.

Speaker 2

举个实例：组织自身可以利用这个算法识别异常报告——那些根据我们语言模型显示内容极其负面或正面的报告，然后进行二次核查。

Just to give you one example, the organisations themselves can use this algorithm and identify outlier reports, which are, let's say, very negative or very positive in the content based on our language model, and then they can take another look.
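One simple way to flag such outlier reports is a z-score rule on the per-report scores. This is a sketch under assumptions: the interview does not describe how outliers would be identified, so the z-score approach, the threshold of 2.0, and the report names are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_outliers(scores: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Return the names of reports whose score lies more than `threshold`
    population standard deviations away from the mean score."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all reports have the same score, nothing stands out
    return sorted(name for name, s in scores.items() if abs(s - mu) / sigma > threshold)


# Hypothetical positive-share scores for eight reports.
scores = {
    "report_1": 0.60, "report_2": 0.62, "report_3": 0.58, "report_4": 0.61,
    "report_5": 0.59, "report_6": 0.60, "report_7": 0.61, "report_8": 0.05,
}
print(flag_outliers(scores))  # ['report_8']
```

Flagged reports are only candidates for a second look; an unusually negative score may simply reflect a project that performed badly.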

Speaker 2

对吧？

Right?

Speaker 2

因此，这可以说是一种数据质量工具，用于观察现状，尤其是在他们缺乏这些评分的情况下。

So this is kind of a data quality tool to see what's happening there, especially given that they do not have these scores.

Speaker 2

对他们而言，要真正理解过去二十年里数千份报告中究竟包含什么内容，实在是困难重重。

It's so difficult for them to understand what they actually have in those thousands of reports over, let's say, twenty years.

Speaker 2

目前缺乏全局视角。

There is no bird's eye perspective available.

Speaker 2

现在通过我们的工具、数据集和算法（所有这些都已公开发表在这篇文章中，可在线获取），这成为了可能。

Now with our tool, data set, and algorithm, all of which are published and available online with this article, it becomes possible.

Speaker 2

对吧？

Right?

Speaker 2

这就是我们当前的工作重点，我们也尝试使用它并向学者们展示潜在的应用场景。

So this is, you know, what we are working on, and we also try to use it and show scholars potential applications.

Speaker 2

我们该如何使用这项工具？

How can this be used?

Speaker 2

其中一个应用案例发表在《公共行政评论》上——正如我简要提到的，我们发现中央评估单位撰写的报告相比业务单位（如国家办事处）的报告，系统性更为负面。

One example of how this can be used is in another article, published in Public Administration Review, where we demonstrate, as I briefly mentioned, that reports which are produced by the centralised evaluation units are systematically more negative compared to those reports produced by the operative IO units, such as the country offices.

Speaker 2

本质上，有了这些变量后，你可以通过这些评分进行各种分析，尝试基于我们的语言模型理解哪些因素影响了绩效评级。

So basically, you know, once you have these variables, you can play around with these scores and try to understand what influences the performance ratings based on our language model.

Speaker 2

这有望大幅推动关于国际组织绩效的更广泛研究，而不仅限于评估政治本身的研究。

And this hopefully will drive significantly the research on the performance of international organisations more broadly beyond the mere research on the politics of evaluation itself.

Speaker 1

听起来这是项令人着迷的杰出工作。

That sounds like fascinating and fabulous work.

Speaker 1

感谢你向我们介绍这本书的内容以及后续的研究进展。

So thank you for telling us about kind of the book and what's continuing beyond that.

Speaker 1

我通常在采访结束时询问书籍完成后你们目前在忙什么，但觉得你已经非常全面地回答了这个问题，除非你和Stefan还有其他想重点提及的项目？

I do usually ask at the end of interviews kind of what you're working on now that the book is done, but I think you've really just answered that quite comprehensively, unless there are any other projects either you or Steffen are working on that you'd like to highlight?

Speaker 2

嗯，这正是我们目前专注的方向。

Well, that is exactly what the focus of ours is now on.

Speaker 1

是的，这非常合理。

Yeah, no, that makes a ton of sense.

Speaker 1

这方面确实有很多工作可做。

It's not like there isn't a lot to work on there.

Speaker 1

非常感谢你不仅带我们了解这本书，还介绍了整个研究项目。

So thank you very much for taking us through not just the book really, but the whole research project.

Speaker 1

对想深入了解的听众来说，这本书名为《国际组织中的评估政治学》，由牛津大学出版社出版。

And for anyone looking to get into the details of this, the book is obviously titled The Politics of Evaluation in International Organisations, published by Oxford University Press.

Speaker 1

正如Vitas刚才所说，还有更多研究正在进行。

And as Vytas has just told us, there's a lot more research ongoing.

Speaker 1

我想如果有听众就此事联系你们两位中的任何一位，你们会非常感兴趣。

So I imagine you'd be quite interested if listeners reached out to either or both of you on it.

Speaker 2

当然，非常欢迎。

Sure, definitely.

Speaker 2

请随时联系我们。

Please reach out.

Speaker 2

如果你有任何想法、建议，甚至参与过这类评估的实践经验，我们都很乐意交流并深入探讨更多观点和见解。

And also, if you have any ideas or recommendations or maybe even practical experience working on one of these evaluations, we are very happy to talk and discuss further ideas and insights.

Speaker 1

太棒了。

Fabulous.

Speaker 1

嗯，谁知道呢？

Well, who knows?

Speaker 1

这可能只是对话的开始。

This might just be the start of a conversation.

Speaker 1

那么，维塔斯，非常感谢你参加播客，并告诉我们你和斯特凡的工作。

So, Vytas, thank you so much for coming on to the podcast and telling us all about your and Steffen's work.