Electrical Engineering and Automation (Translated Foreign Literature): Learning in Computer Vision: Some Thoughts


Learning in Computer Vision: Some Thoughts
Maria Petrou
Communications and Signal Processing Group,
Electrical and Electronic Engineering Department,
Imperial College,
London SW7 2AZ, UK
Abstract. It is argued that the ability to generalise is the most important characteristic of learning
and that generalisation may be achieved only if pattern recognition systems learn the rules of
meta-knowledge rather than the labels of objects. A structure, called “tower of knowledge”,
according to which knowledge may be organised, is proposed. A scheme of interpreting scenes
using the tower of knowledge and aspects of utility theory is also proposed. Finally, it is argued
that globally consistent solutions of labellings are neither possible, nor desirable for an artificial
cognitive system.

1 Introduction

The word “learning” has many interpretations among the pattern recognition community in
general, and the computer vision community in particular. It has been used to loosely mean
anything from the identification of the best value of a parameter from training data, to learning
how to recognise visual structures. So, perhaps we should try to distinguish the interpretation of
the word as it appears in the mathematical formulation of problems, from its interpretation as it
appears in cognition related tasks. In the sections that follow, we shall try to understand the
significance of learning in the context of computer vision and identify a possible relationship
between these two interpretations of the word.



2 The Main Characteristic of Learning

Some of the first learning algorithms in relation to cognitive tasks, that have been proposed in the
past, are the various types of neural network. Proponents of neural networks often comment on
the generalisation capabilities of the networks they develop. Generalisation is one of the
characteristics of learning. Indeed, we, humans, teach our children often with fairy tales and
parables, assuming that they have the ability to generalise to real situations. It is preposterous to
expect that we shall have to teach our children about all individual possible situations they may
encounter in life, in order to make sure that we have taught them well. We may safely conclude,
therefore, that the ability to generalise is the most important characteristic of learning.

This implies that classical pattern recognition methods, that use training examples to span the
feature space, are not really learning methods in the cognitive sense of the word. Even neural
network based methods, in order to generalise well, rely heavily on the availability of enough
training samples to populate adequately the feature space. The training patterns are used by the
neural networks to approximate the class boundaries in the feature space with piece-wise linear
segments. When an unknown pattern arrives, it can be associated with the class that has been
identified to populate the part of the feature space where the pattern appears. Some old [3] and
some more recently developed methods [1], that can work with fewer training patterns than
straightforward methods, do so by selecting the patterns that matter most in defining the class
boundaries, rather than by using some higher level generalisation abilities of the classifier [23].
So, neural networks and pattern classification methods are not learning methods in the cognitive
sense of the word.
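To make concrete how such methods keep only the patterns that define the boundary, here is a minimal sketch using scikit-learn's SVC, in the spirit of the support-vector idea of [1]; the two-moons data set and parameter values are illustrative assumptions, not anything from the paper.

```python
# Minimal sketch: a support vector machine keeps only the training
# patterns that matter for the class boundary (the support vectors).
# Data set and parameters are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Typically only a fraction of the 400 patterns survive as support
# vectors; the rest could be discarded without changing the boundary.
print(f"support vectors kept: {len(clf.support_vectors_)} / {len(X)}")
```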
The question that then arises is: is learning, as understood by cognitive scientists and psychologists, algorithmic, or is it something beyond the current grasp of mathematical formalism?
Evidence against learning being algorithmic is the ability of humans to learn even from single
examples. A counter-argument is that humans take a lot of time to learn, and it is possible that
what we witness as super-fast learning is simply a manifestation of the application of some
meta-knowledge, some generic rules that have been slowly and painfully learnt subconsciously. I
would like to clarify that “learning” here does not refer to getting a University degree. This is
indeed a very long process and it takes almost one third of the life span of a person. Learning here
refers to survival skills, to skills needed to operate in everyday life. Given that a human becomes
reproductive roughly between the ages of 12–15, we may assume that nature considers the human
child becoming ready for life at that stage. So, we may say that humans “learn” what they have to
learn, to be ready for life as independent entities, over a period of 12–15 years, which is still a
significant fraction of their life span. Therefore, humans seem to be slow learners after all. They
take a lot of time to work out the rules of meta-knowledge. It is these rules, that have possibly
been learnt in an algorithmic way, that then allow the human to learn in the super-fast, almost
magical, way that we often witness. We may conclude from the above arguments that:
– generalisation is an important characteristic of learning;
– generalisation in algorithmic learning may only be achieved by having enough training
examples to populate all parts of the class space, or at least the parts that form the borders between classes;
– we have true generalisation capabilities, only when what is learnt by training examples are rules
on how to extract the identity of objects and not the classes of objects directly. If such learning
has taken place, totally unknown objects may be interpreted correctly, even in the absence of any
previously seen examples.
This conclusion implies that what we have to teach the computer, in order to construct a
cognitive system, are relations rather than facts. For example, memorising the dictionary of a language does not teach a person the language. The person has to learn the relations between
words in order to master the language. This is in agreement with Winston's pioneering
experiments on teaching the computer to recognise arches. He did not show to the computer all
possible types of arch it may encounter. He showed it examples and counter examples of arches
and taught it to recognise relations between components, such as “supports” or “is supported by”
[26].

3 Knowledge and Meta-knowledge

There are at least 50 theories of learning developed by cognitive scientists [28]. In computer
science, we tend to distinguish two forms of learning: learning by experimentation and learning
by demonstration. Inductive learning is also used by computer scientists, but we shall not deal
with that here, because it implies that the learner has already learnt the rules of logic. So,
inductive learning may be thought of as a way of applying the already learnt rules of
meta-knowledge [22].
Learning by experimentation implies the ability to try, reason about the results and draw
conclusions. In its simplest form, this is what fully automatic programs do. A fully automatic
segmentation algorithm, for example, will work as follows: perform segmentation, assess the
quality of the result, adjust the parameters and try again. The immediate conclusion is that
learning by experimentation requires the presence of a feed-back loop. It also requires the
availability of a criterion that says how well the system has performed each time. It is important
for this criterion to be independent of the knowledge or information used to perform the
segmentation in the first place, otherwise the result will be a self-fulfilling prophecy. In
semi-automatic systems, the criterion is provided by the human. The reported results in the
literature then include a statement of the type: “the best threshold was found by trial and error.”
This method is a perfectly legitimate method of learning. It comes under the umbrella of supervised learning and it corresponds to human learning with the help of a teacher.
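As a minimal sketch of such a feed-back loop, assuming a simple threshold segmenter and an expected-region-count criterion (both invented here for illustration; the paper specifies neither):

```python
# Sketch of learning by experimentation: segment, assess the result with
# a criterion independent of the information used to segment, adjust the
# parameter, try again. Segmenter and criterion are invented assumptions.
import numpy as np
from scipy import ndimage

def segment_with_feedback(image, target_regions=5):
    best_t, best_err = None, np.inf
    for t in np.linspace(0.1, 0.9, 17):          # adjust the parameter...
        _, n_regions = ndimage.label(image > t)  # ...perform segmentation
        err = abs(n_regions - target_regions)    # ...assess independently
        if err < best_err:                       # keep the best setting
            best_t, best_err = t, err
    return best_t

rng = np.random.default_rng(0)
smooth = ndimage.gaussian_filter(rng.random((64, 64)), sigma=4)
print("threshold found by trial and error:", segment_with_feedback(smooth))
```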
Proceeding to fully automated methods is equivalent to assuming that the computer has
somehow passed that stage of learning the meta-knowledge, necessary to construct rules, and now
learning proceeds very fast, making use of these meta-rules. For example, if we know that
agricultural fields tend to have straight borders, we may judge the output of a segmenter of an
image, captured by an airborne sensor, as good or bad, according to whether it produced regions
with straight borders or not. The knowledge that fields have straight boundaries is a
meta-knowledge. The segmenter might have operated using only texture and edge information.
The straightness of the inferred borders may be used as a criterion to drive the system to use its
feed-back loop to work out a better segmentation. The question then is: how did the system acquire this meta-knowledge? As argued earlier, it must be the meta-knowledge that had to be
learnt by the human child (or the computer learner) painfully slowly by seeing lots of examples of
agricultural fields. And although no method has been found yet to transplant this meta-knowledge
to the brain of the human child from the brain of the teacher, computers have an advantage here:
the teacher, i.e. the human, may insert the meta-knowledge into the system while developing the
criterion of self-assessment of the algorithm. From this line of argument, we conclude that:
– meta-knowledge may take the form not only of relations, but also of generic characteristics
that categories of objects have;
– in interactive systems, meta-knowledge is inserted into the computer learner by the human
teacher manually;
– in automatic systems, meta-knowledge is supplied to the computer learner by the human
teacher in the form of a criterion of performance assessment.
Two questions then arise:
– what connects the knowledge with the meta-knowledge?
– how is meta-knowledge learnt in the first place?

4 Learning by Demonstration

To answer the above questions, we get a clue from the second type of learning we mentioned
earlier, namely learning by demonstration. The demonstrator here is the teacher. The next is a
story I heard from my grandmother. Remember that the traditional way of teaching children has
always been through stories and parables. This story offers the clue we are searching for.
‘Once upon a time there was a potter who got an apprentice who wanted to learn the art of pottery.
The potter made his clay pots and put them in the oven. After two hours, he turned the fire off,
and sat down to rest and smoke, as he was an old man. Then he took the pots out of the oven.
They were perfect. The apprentice later decided to do his own pots. He made them out of clay and
put them in the oven. After two hours, he took them out. The pots broke. He repeated the task and
he had the same results. He went back to the potter: “You did not teach me well. Such and such
happened.” “Did you stop to smoke after you switched off the fire?” “No, I am not a smoker.”
“So, you got the pots out of the oven too soon.”’
I am sure the story was related to me in order to teach me to pay attention to the detail. Indeed,
if the apprentice had seen the potter performing the act dozens of times with slight variation each
time, but always with the pause before the pots were taken out of the oven, he might have worked
out that that pose was crucial to the process. On the other hand, the teacher might have been a
better teacher if he had made that information explicit.
So, this story tells us that we learn fast, from very few examples, only when somebody explains
to us why things are done the way they are done. A child asks lots of “why”s and that is how a
child learns. This tells me that we cannot disassociate learning to recognise objects from learning
why each object is the way it is. One may consider the following exchange between a teacher and
a learner:
“What is this?”
“This is a window.”
“Why?”
“Because it lets the light in and allows the people to look out.”



“How?”
“By having an opening at eye level.”
“Does it really?”
This sequence of learning is shown in Fig. 1. This figure proposes that knowledge in our brain
is represented by a series of networks, forming a complex structure that I call the “tower of
knowledge”. The network of nouns is a network of object names, labels, e.g. “window”,
“chimney”, “door”, etc. The network of verbs or actions, is a network of functionalities, e.g. “to
look out”, “to enter”, “to exit”, etc. The network of appearances is a network of basic shapes
necessary for a functionality to be fulfilled, e.g. “it is an opening of human size at floor level”. So,
the flow of knowledge goes like the fragment of conversation given above. The loop closes when
we confirm that the object we are looking at has the right characteristics for its functional purpose
to be fulfilled.
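One possible, and deliberately naive, way to encode the three layers and the question-driven links between them in code; the "window" entries come from the dialogue above, while everything else about this encoding is my own assumption:

```python
# A toy encoding of the "tower of knowledge": a network of nouns, a
# network of functionalities (verbs) and a network of descriptions,
# linked by "why" (noun -> functionality) and "how" (functionality ->
# description) edges. Entries follow the window dialogue above.
why = {"window": ["lets the light in", "allows people to look out"]}
how = {"lets the light in": ["opening at eye level"],
       "allows people to look out": ["opening at eye level"]}

def interrogate(noun):
    """Walk the tower the way the learner's questions do."""
    print(f"What is this? {noun}")
    for functionality in why.get(noun, []):
        print(f"Why? Because it {functionality}.")
        for description in how.get(functionality, []):
            print(f"How? By having an {description}.")

interrogate("window")
```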
The task, therefore, for the artificial vision scientist, is to model these layers of networks and
their inter-connections. We have various tools at our disposal: Markov Random Fields [8],
grammars [19], inference rules [24], Bayesian networks [16], Fuzzy inference [27], etc. I would
exclude from the beginning any deterministic crisp approaches, either because things are
genuinely random in nature (or at least have a significant random component), or because our
models and our knowledge are far too gross and imperfect for creating crisp rules and dogmatic
decisions.

5 Markov Random Fields

Some recent work [17] showed evidence that the network of nouns (better described in
psychophysical terms as network of “ideas”) is topologically a random network, while the
network of relations, made up from pairs of ideas, is topologically scale-free. For example, pairs
like “fork-knife”, “door-window” come up much more frequently in trains of thought than “door”
alone, or “window” alone. This indicates that the connections in these networks are of varied
strength, and actually are not always symmetric. For example, the idea “door” may trigger the
idea “window” more frequently than the idea “window” triggers the idea “door”. This asymmetry
in the interactions is a manifestation that Markov Random Fields (MRFs) are not applicable here
in their usual form in which they are applied in image processing. An example of the interactions
in a neighbourhood of an MRF, defined on a grid, is shown in Fig. 2b. This MRF, and the weights
it gives for neighbouring interactions, cannot be expressed by a Gibbs joint probability density
function. For example, the cell at the centre is influenced by its top left neighbour with weight −1,
but itself, being the bottom right neighbour of the cell at the top left, influences it with weight +1.
This asymmetry leads to instability when one tries to relax such a random field, because local
patterns created are not globally consistent (and therefore not expressible by global Gibbs
distributions) [18]. According to Li [9,10,11], relaxations of such MRFs do not converge, but
oscillate between several possible states. (Optimisations of Gibbs distributions sometimes converge to the right interpretation, but more often than not they hallucinate, i.e. they settle on wrong interpretations.)
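A toy construction of my own (not the model of [5]) showing the instability: two binary sites whose directed influences disagree, relaxed synchronously, never settle:

```python
# Two binary sites with asymmetric interactions: site A is told to copy
# site B (weight +1), while site B is told to differ from site A
# (weight -1). No Gibbs joint distribution satisfies both demands, and
# synchronous relaxation oscillates instead of converging.
a, b = 1, 1
for step in range(8):
    a, b = b, -a            # apply both directed influences at once
    print(f"step {step}: A={a:+d}, B={b:+d}")
# The state cycles with period 4: the locally demanded patterns are not
# globally consistent, exactly the situation depicted in Fig. 2b.
```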
So, one could model the network at each level of the tower of knowledge shown in Fig. 1,
using a non-Gibbsian MRF [5]. The interdependences between layers might also be modelled by such networks, but perhaps it is more appropriate to use Bayesian models, as the inter-layer dependencies are causal or diagnostic, rather than peer-to-peer.
The question that arises then is: “where are we going to get the knowledge to construct these
networks?” Where does the mother that teaches her child get it from? There is no “ground truth”
or universal knowledge the mother transfers to her child: she sees something and talks about it to
the child, then she remembers something else, according to her own network of related ideas that
invoke each other and are prompted by her own sensory input, talks again to the child, and so on.
So, all the mother (the teacher) does is to transfer to the child her own connections between ideas
and concepts. If the mother tells the child “This is a pencil and that is a rubber. The pencil helps
us write and the rubber helps us erase what we wrote.”, the child will make the same connections
as the mother had in her own brain. Pencil-rubber will have a strong mutual recall in the child’s
network of nouns, as well as write-erase in the child’s network of verbs. So, one thing we can do
is to model our own mental connections between ideas and functionalities. Then let the child (the
computer) ask the right questions. For every answer, the strength of the corresponding connection
is increased. We may turn these strengths into probabilities. Then a totally new scene may be
shown to the computer. The child (the computer) must be able to use the connections it has learnt to
interpret this new scene.

In practice, this is done by using manually annotated images. Heesch and Petrou [5] did exactly
this to interpret outdoor scenes of buildings: they used hundreds of annotated images to learn the
Markov dependencies of region configurations, defining the neighbourhood of a region to be the
six regions that fulfil one of the following geometric constraints: it is above, below, to the left, to
the right, it is contained by, or contains the region under consideration. An unknown scene was then labelled using a preliminary labelling performed on the basis of individual measurements
made on each region, and relaxing the MRF defined on the segmented regions, using graph
colourings and drawing labels for each region according to the local conditional probability of
labels, conditioned on the current labels of the neighbours. No global consistency is guaranteed
that way, but no global consistency exists, when the interdependencies between labels are
asymmetric. We may intuitively understand this, as in an outdoor environment the long range
interactions between objects are probably too weak to have a significant effect on the identity of a
region. For example, if this region that belongs to this house here is a door, that region at the other end of the field of view may be a car, a bush, a window, a house, or a tree. The differentiation of such options in terms of probabilities must be very small: the correlation function between labels, referring to objects beyond a certain distance apart, flattens out and provides no useful information. So, no global model is expected to be relevant or useful.

Fig. 1. The tower of knowledge: how knowledge may be organised. The double-headed arrows represent contextual interactions. The thin continuous arrows represent queries. The dashed arrows represent answers, i.e. transfer of information. The level of interest in a cognitive vision task is the level of nouns, where we wish to assign labels to objects. Examples of nodes with contextual connotations in the network of nouns are "door", "window", "balcony". Examples of nodes with contextual connotations in the network of functionality are "lets air in", "lets light in", "allows a person to enter". Examples of nodes with contextual connotations in the network of descriptions are "has a glass pane", "is at eye-level", "has a handle to open it".
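In outline, and with all numbers invented, the relaxation described above amounts to repeatedly redrawing each region's label from a local conditional distribution; this sketch is only my reading of the procedure, not code from [5]:

```python
# Sketch: label segmented regions by drawing each region's label from
# the local conditional probability given its neighbours' current
# labels. Compatibilities may be asymmetric; only local consistency is
# sought. Regions, neighbourhoods and all numbers are invented.
import random
random.seed(0)

labels = ["door", "window"]
compat = {("door", "window"): 2.0, ("window", "door"): 1.2,   # asymmetric
          ("door", "door"): 0.5, ("window", "window"): 1.0}
neighbours = {0: [1], 1: [0, 2], 2: [1]}      # e.g. "to the left of" links
state = {r: random.choice(labels) for r in neighbours}

for _ in range(100):
    r = random.choice(list(neighbours))
    weights = []
    for l in labels:
        w = 1.0                               # flat measurement prior, assumed
        for n in neighbours[r]:
            w *= compat[(l, state[n])]
        weights.append(w)
    state[r] = random.choices(labels, weights=weights)[0]   # draw, not argmax
print(state)
```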

6 Bayesian Inference

Bayesian approaches have been used so far in two ways: either in the form of probabilistic
relaxation (PR) [7] or in the form of Pearl-Bayes networks of inference [16]. Probabilistic
relaxation has its origins in the seminal work on constraint propagation by Waltz [25], who used
crisp constraints and solved once and for all the problem of globally inconsistent labellings that
led to impossible objects [6]. Probabilistic relaxation updates the probabilities of various labels of
individual objects by taking into consideration contextual information [7]. As this contextual
information is in effect peer-to-peer, probabilistic relaxation is not an appropriate tool for
modelling causal relationships. It is rather an alternative tool to MRFs discussed in the previous
section for modelling influences at the same layer. Probabilistic relaxation, just like MRF
relaxation, is not guaranteed to converge to a unique global solution, unless special conditions are
obeyed [21]. We discussed earlier that this is not an issue in reality: labellings of scenes do not
have to be globally consistent, but only locally consistent. This statement seems to be in
contradiction with a previous statement, saying that probabilistic relaxation is the generalisation
of Waltz’s algorithm which solved the problem of inconsistent labellings in the 60s. This
contradiction, however, is only superficial. The problem of inconsistent labellings of the 60s was
referring to the labellings of single solid objects, by labelling their sub-parts [4] and not the
labellings of scenes that contain many different objects, where constraints between objects are far
weaker than constraints within the subparts of the same solid object.
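For two objects and two labels, a single probabilistic-relaxation step in the general style of [7] can be sketched as follows; the compatibility coefficients and initial probabilities are invented:

```python
# One probabilistic-relaxation step: each object's label probabilities
# are reweighted by the contextual support gathered, peer-to-peer, from
# its neighbour's current probabilities, then renormalised. All numbers
# are invented; this follows the general style of [7], not its exact scheme.
import numpy as np

p0 = np.array([0.6, 0.4])        # object 0: P(label A), P(label B)
p1 = np.array([0.5, 0.5])        # object 1
r = np.array([[1.2, 0.8],        # r[l, l2]: compatibility of l at object 0
              [0.7, 1.3]])       # with l2 at object 1

for _ in range(5):
    s0 = r @ p1                  # support for object 0's labels
    s1 = r.T @ p0                # support for object 1's labels
    p0 = p0 * s0 / (p0 * s0).sum()
    p1 = p1 * s1 / (p1 * s1).sum()
print(p0, p1)                    # probabilities sharpen under mutual support
```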
The second form of Bayesian approach is that of Pearl-Bayes networks of inference. Here the
relations may be causal, and so these networks are appropriate for inter-layer inference. Bayesian
approaches depend on conditional probabilities. How to choose these conditional probabilities has



always been a problem for such methods. Conditional probabilities may have to be learnt
painfully slowly from hundreds of examples. Stassopoulou et al. [20] solved the problem of
learning the conditional probabilities, by mapping the class boundaries expressed by such a
network, to a neural network, which was subsequently trained to learn these conditional
probabilities. Alternatively, conditional probabilities may be transferred ready from another
already trained network: the network of the teacher. This transference is equivalent to choosing
them to have some parametric form (e.g. Gaussian) with parameters chosen “arbitrarily”. The
arbitrary choice of form and parameters usually leads to the criticism of the approach being ad-hoc
or unjustified. It is not, if the teacher simply transfers their own hard gained knowledge to the
pupil (the computer). Such an approach leads us to new theories, like for example the so called
“utility theory” [12].
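A sketch of the "teacher transfers the conditional probabilities" idea: instead of estimating P(measurement | label) from hundreds of examples, the teacher writes it down in a parametric form. The Gaussian form, the border-straightness feature and all numbers below are my assumptions:

```python
# The teacher inserts P(measurement | label) in parametric (Gaussian)
# form instead of the learner estimating it from hundreds of examples.
# Feature, classes, means and sigmas are assumed for illustration.
import math

teacher_cpd = {"field": (0.8, 0.1),    # label -> (mean, sigma) of, say,
               "forest": (0.3, 0.2)}   # a border-straightness feature
prior = {"field": 0.5, "forest": 0.5}

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x):
    # Bayes' rule, with the likelihood donated by the teacher.
    scores = {l: gaussian(x, *teacher_cpd[l]) * prior[l] for l in prior}
    z = sum(scores.values())
    return {l: round(s / z, 3) for l, s in scores.items()}

print(posterior(0.75))    # a straight-bordered region comes out as "field"
```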
Utility theory is a decision theory. Assigning labels to objects depicted in an image is a
decision. In the Bayesian framework we make this decision by maximizing the likelihood of a
label given all the information we have. In utility theory, this likelihood has to be ameliorated
with a function called “utility function”, that expresses subjective preferences or possible
consequences of each label we may assign. The utility function multiplied with the Bayesian
probability of each label and summed over all possibilities leads in one pass only to the final label.
So, this approach avoids the iterations used by MRFs and PR. The utility function may be
identified with the innate meta-knowledge somebody has acquired about the world. It is that
knowledge, that might have been learnt algorithmically and from many examples, but which now
is expressed in the form of conditions and prejudices that cannot be fully justified by the
measurements we make. It is the knowledge that tells us to be cautious when we want to buy a car
from a man that postponed the appointment we made several times, that did not produce
immediately the maintenance record of the car we requested, and so on. Such ideas have been
around for some time, without people using the term “utility function”. For example,
psychologists in the mid-nineties were talking about the so called p-maps and m-maps. The
p-maps were meant to be the prior knowledge we have about various possible patterns that we
may encounter in life. A p-map guides us to sample a scene more or less carefully at places where
it matters or it does not matter, respectively, producing the m-map that is specific to the present
situation. One may identify here the p-maps as being the utility functions of today and the
m-maps the Bayesian part of labels conditioned on the measurements we have made.¹

¹ The ideas of p-maps and m-maps first came to my knowledge by Robin Shirley of the Psychology Department of Surrey University, who passed away before he had the chance to make them more concrete and publish them.
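Reduced to code, the one-pass decision reads as below; the labels, Bayesian probabilities and utilities are invented, and the per-label product is a simplification of the full expected-utility sum:

```python
# One-pass utility-theoretic labelling: weight the Bayesian probability
# of each label by a utility encoding meta-knowledge about consequences,
# then pick the best label; no relaxation iterations. Numbers invented.
bayes_p = {"balcony": 0.45, "window": 0.40, "door": 0.15}
utility = {"balcony": 0.9, "window": 1.0, "door": 0.2}   # the "p-map" part

scores = {l: bayes_p[l] * utility[l] for l in bayes_p}
print(max(scores, key=scores.get), scores)
```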
In the computer vision context, utility theory has been used by Marengoni [13] to select the
features and operators that should be utilised to label aerial images. Further, one may interpret the
work of Miller et al. [14] as using a utility function that penalises the unusual transformations that
will have to be adopted to transform what is observed to what the computer thinks it is. The
authors effectively choose labels by maximising the joint likelihood of the probability density
function of the observed transforms and the probability density function of the labels and
observations, assuming that transforms and labels/measurements are independent.


7 Modelling the “Why” and the “How” in Order to Answer the “What”

Let us consider the tower of knowledge presented in Fig. 1. We shall formulate
here the problem of learning to recognise objects in a scene, using this hierarchical



representation of knowledge and utility theory. Let us assume that we use maximum likelihood to assign labels to a scene. In the conventional way of doing so, object o_i will be assigned label l_j with probability p_ij, given by:

p_ij = p(l_j | m_i) p(m_i) = p(m_i | l_j) p(l_j)    (1)

where m_i represents all the measurements we have made on object o_i, and p(m_i) and p(l_j) are the prior probabilities of measurements and labels, respectively. Probabilistic relaxation will update these probabilities according to the contextual information received from neighbouring regions.
We do not follow that route here. Instead, we shall use the information coming from the other
layers of knowledge to moderate this formula. Let us identify the units in the "verbs" level of Fig. 1 by f_k, and the units at the descriptor level of Fig. 1 by d_l. Then we may choose
label l_{j_i} for object o_i as follows:

j_i = argmax_j Σ_k Σ_l u_jk v_kl c_il p_ij    (2)
where u_jk indicates how important it is for an object with label l_j to fulfil functionality f_k; v_kl indicates how important characteristic d_l is for an object to have the possibility to fulfil functionality f_k, and c_il is the confidence we have that descriptor d_l applies to object o_i.
Note that the value of the utility function expresses the evidence we have that region o_i has the necessary characteristics to fulfil its role as object l_j. For example, if the label we consider assigning to object o_i is "balcony", the utility function must express whether this object has dimensions big enough to allow a human to stand on it, whether it is attached to a wall, and whether there is a door leading to it. All these are conditions that will allow an object to play the role of a balcony. A learning scheme must be able to learn the values of u_jk and v_kl either directly from examples (slowly and painfully), or by trusting its teacher, who, having learnt those values himself slowly and painfully over many years of human life experience, directly inserts them into the computer learner. The computer learner then must have a tool box of processors of sensory inputs that will allow it to work out the values of c_il.
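A direct transcription of Eq. (2); the array contents are random placeholders, standing in for u_jk and v_kl supplied by the teacher, c_il computed by the sensory tool box, and p_ij from Eq. (1):

```python
# Direct transcription of Eq. (2): assign to object i the label j that
# maximises sum_k sum_l u[j,k] * v[k,l] * c[i,l] * p[i,j]. The random
# arrays are placeholders for teacher-supplied and measured quantities.
import numpy as np

J, K, L = 3, 2, 4                 # labels, functionalities, descriptors
rng = np.random.default_rng(1)
u = rng.random((J, K))            # u_jk: label j needs functionality k
v = rng.random((K, L))            # v_kl: functionality k needs descriptor l
c = rng.random(L)                 # c_il: descriptor l holds for object i
p = rng.random(J)                 # p_ij: Bayesian label probabilities, Eq. (1)

scores = np.array([(u[j] @ v @ c) * p[j] for j in range(J)])
print("chosen label j_i =", int(scores.argmax()))
```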

Fig. 2. (a) A local neighbourhood at the pixel level with globally consistent Markov parameters: if
this field is relaxed it will lead to horizontal strips of similar labels which will be distinct from the
labels above and below. In image processing it will lead to a texture pattern with strong horizontal
directionality. (b) A local neighbourhood at the pixel level with globally inconsistent Markov
parameters: the top left pixel tells the central pixel to be different from it; the central pixel, seen as
the bottom right neighbour of the top left pixel, tells it to be similar to it.




8 Conclusions

I have argued here that learning is characterised by the ability to generalise, and that this can only
be achieved if what is learnt is not the labels of the objects viewed, but the rules according to
which these labels are assigned. I have also argued that this meta-knowledge may be transferred
to the learner (the computer) directly by the teacher (the human developer), in the form of rules,
or in the simplest way, by the human using the parameters of the algorithms according to their
personal experience and intuition. This puts me at odds with the majority of the community of
reviewers who tend to reject papers on the grounds that the parameters have been chosen ad hoc
with no proper explanation: these are the cases of the teacher transplanting to the learner their
painstakingly acquired knowledge. The alternative is for the learner each time to acquire this
knowledge painfully slowly from thousands of examples.
I have also argued that we do not need globally consistent labellings of scenes. Global
consistency will never allow us to label correctly the scene painted by Magritte of a train storming
out of a fireplace, because trains do not come out of fireplaces! It will never allow the
computer to recognise green horses with 5 legs, but we, humans, do. So, what we need is
fragments of reality and knowledge.
In computer vision, the idea of abandoning globally consistent solutions has now matured. This
is not in isolation from other sciences. Strategy analysts talk about “fragments of learnt actions”,
and even mathematicians have long ago abandoned the idea of a globally self-consistent
mathematical science: Bertrand Russell had to abandon the idea of globally consistent
mathematics based on a small number of axioms, when Gödel's proof was published [15].
Natural systems are not globally consistent: they oscillate between states, and we, humans,
manage to survive through this constantly dynamic, globally inconsistent and ambiguous world.
A robotic system must be able to do the same and perhaps the only way to succeed in doing that is
to be constructed so that it is content with a collection of fragments of understanding.
Acknowledgements. This work was supported by EU grant 027113.
References
1. Cortes, C., Vapnik, V.N.: Support-Vector Networks. Machine Learning Journal 20,
273–297 (1995)
2. Christmas, W.J., Kittler, J., Petrou, M.: Structural matching in Computer Vision using
Probabilistic Relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence 17,
749–764 (1995)
3. Devijver, P.A., Kittler, J.: On the edited nearest neighbour rule. In: Proc. 5th Int. Conf. on
Pattern Recognition, pp. 72–80 (1980)
4. Guzman, A.: Computer Recognition of three-dimensional objects in a visual scene. Tech. Rep.
MAC-TR-59, AI Laboratory, MIT (1968)
5. Heesch, D., Petrou, M.: Non-Gibbsian Markov Random Fields for object recognition. The
British Machine Vision Conference (submitted, 2007)
6. Huffman, D.A.: Impossible Objects as Nonsense Sentences. Machine Intelligence 6, 295–323
(1971)
7. Hummel, R.A., Zucker, S.W.: On the foundations of relaxation labelling processes. IEEE
Transactions PAMI 5, 267–287 (1983)



8. Kindermann, R., Snell, J.L.: Markov Random Fields and their Applications. First book of the AMS soft-cover series in Contemporary Mathematics, American Mathematical Society (1980)
9. Li, Z.: A neural model of contour integration in the primary visual cortex. Neural Computation
10, 903–940 (1998)
10. Li, Z.: Visual segmentation by contextual influences via intra-cortical interactions in the
primary visual cortex. Network: Computation in Neural Systems 10, 187–212 (1999)
11. Li, Z.: Computational design and nonlinear dynamics of a recurrent network model of the
primary visual cortex. Neural Computation 13, 1749–1780 (2001)
12. Lindley, D.V.: Making Decisions. John Wiley, Chichester (1985)
13. Marengoni, M.: Bayesian Networks and Utility Theory for the management of uncertainty and
control of algorithms in vision systems. PhD thesis, University of Massachusetts (2002)
14. Miller, E.G., Matsakis, N.E., Viola, P.A.: Learning from one example through
shared densities on transforms. In: CVPR (2000)
15. Nagel, E., Newman, J.R.: Gödel's Proof. Routledge and Kegan Paul (1959)

16. Pearl, J.: Probabilistic reasoning in intelligent systems: Networks of plausible inference.
Morgan Kaufmann Publishers Inc., San Francisco (1988)
17. Petrou, M., Tabacchi, M., Piroddi, R.: Networks of ideas and concepts. IEEE Transactions on
Man Machine and Cybernetics (submitted, 2007)
18. Petrou, M., Garcia Sevilla, P.: Image Processing, Dealing with Texture. Wiley, Chichester
(2006)
19. Schlesinger, B.D., Hlavac, V.: Ten lectures on Statistical and Structural Pattern Recognition,
ch. 10. Kluwer Academic Publishers, Dordrecht, The Netherlands (2002)
20. Stassopoulou, A., Petrou, M.: Obtaining the correspondence between Bayesian and Neural
Networks. International Journal of Pattern Recognition and Artificial Intelligence 12, 901–920
(1998)
21. Stoddart, A.J., Petrou, M., Kittler, J.: On the foundations of Probabilistic Relaxation with
product support. Journal of Mathematical Imaging and Vision 9, 29–48 (1998)
22. Tenenbaum, J.B., Griffiths, T.L., Kemp, C.: Theory-based Bayesian models of inductive
learning and reasoning. Trends in Cognitive Sciences 10, 309–318 (2006)
23. Tong, S., Koller, D.: Support Vector Machine active learning with applications to text
classification. Journal of Machine Learning Research 2, 45–66 (2001)
24. Walker, T.C., Miller, R.K.: Expert Systems Handbook: An Assessment of Technology and
Applications. Prentice-Hall, Englewood Cliffs (1990)
25. Waltz, D.: Understanding line drawings of scenes with shadows. In: Winston, P.H. (ed.) The Psychology of Computer Vision, pp. 19–91. McGraw-Hill, New York (1975)
26. Winston, P.H.: Learning structural descriptions from examples. The psychology of computer
vision, 157–209 (1975)
27. Zadeh, L.A.: A fuzzy algorithmic approach to the definition of complex or imprecise concepts.
Int. J. Man-Machine Studies 8, 249–291 (1976)
28. http:rning
