计算机视觉中的学习:几点想法
Maria Petrou
Communications and Signal Processing Group,
Electrical and Electronic Engineering
Department,
Imperial College,
London SW7
2AZ, UK
摘要:本文认为,归纳能力是学习最重要的特征,而只有当模式识别系统学习的是元知识的规则而非物体的标签时,才可能实现归纳。本文提出了一种称为"知识之塔"的结构,知识可以据此加以组织;同时提出了一种利用知识之塔和效用理论来解释场景的方案。最后本文认为,对一个人工认知系统而言,全局一致的标注解既不可能,也不可取。
1 引言
"学习"这个词在模式识别界,尤其是计算机视觉界,有很多种解释。它被宽泛地用来指代从"由训练数据确定某个参数的最佳取值"到"学习如何识别视觉结构"之间的任何东西。所以,我们也许应该把这个词出现在问题的数学表述中时的含义,与它出现在认知相关任务中时的含义区分开来。在下面的章节中,我们将尝试在计算机视觉的语境下理解学习的意义,并探究这个词的这两种含义之间可能存在的联系。
2
学习的主要特征
与认知任务相关的最早一批学习算法,是各种类型的神经网络。神经网络的支持者常常称道他们所开发的网络的归纳(泛化)能力。归纳是学习的主要特征之一。事实上,我们人类在教孩子时常用童话和寓言,并假定他们有能力把这些推广到实际情况中去。如果为了确保教好孩子,就必须把生活中可能遇到的每一种具体情况都教给他们,那是荒谬的。因此我们可以有把握地得出结论:归纳能力是学习最重要的特征。
L. Rueda, D. Mery, and J. Kittler (Eds.): CIARP 2007, LNCS 4756, pp. 1–12, 2007.
© Springer-Verlag Berlin Heidelberg 2007
这意味着,利用训练样本去覆盖特征空间的经典模式识别方法,并不是认知意义上真正的学习方法。即使是基于神经网络的方法,为了能够很好地泛化,也在很大程度上依赖于有足够多的训练样本来充分填充特征空间。神经网络利用这些训练模式,以分段线性的方式逼近特征空间中的类别边界。当一个未知模式到来时,它会被归入已被确定占据该模式所在特征空间区域的那个类别。一些较早的方法和一些最近开发的方法之所以能用比直接方法更少的训练样本工作,是通过挑选对确定类别边界作用最大的那些样本来实现的,而不是依靠分类器某种更高层次的归纳能力。所以,神经网络和模式分类方法并不是认知意义上的学习方法。
于是问题来了:认知科学家和心理学家所理解的那种学习是算法式的,还是某种超出目前数学形式化能力的东西?反对"学习是算法式的"的证据是,人类甚至能从单个例子中学习。一个反驳是,人类其实花了大量时间去学习;我们所见到的超快学习,可能只是应用某些元知识的表现,即那些已在潜意识中缓慢而艰难地学会的通用规则。我要澄清一下,这里的"学习"不是指取得大学学位。那确实是一个漫长的过程,大约要占去一个人三分之一的寿命。这里的学习指的是生存技能,即在日常生活中行事所需的技能。由于人类大约在12-15岁之间具备生育能力,我们可以假定,大自然认为人类儿童在那个阶段就已准备好面对生活。所以我们可以说,人类作为独立个体,要花12-15年的时间去"学习"他们必须学习的东西,这仍是他们寿命中相当大的一部分。因此,人类终究是很慢的学习者。他们花大量时间去琢磨元知识的规则。正是这些可能以算法方式学会的规则,使人能够以我们常常见到的那种超快的、近乎神奇的方式学习。我们也许可以从以上论证得出以下结论:
- 归纳是学习的一个重要特征;
- 在算法式学习中,只有拥有足够多的训练样本来填充类别空间的所有部分,或至少填充构成类别之间边界的部分,才能实现归纳;
- 只有当通过训练样本学到的是"如何提取物体身份"的规则,而不是直接学到物体的类别时,我们才拥有真正的归纳能力。如果发生了这样的学习,即使在没有任何先前见过的例子的情况下,完全未知的物体也可能得到正确的解释。
这个结论意味着,为了构造一个认知系统,我们必须教给计算机的是关系而不是事实。例如,背一种语言的词典并不能教会一个人这种语言;这个人必须学习词与词之间的关系才能掌握这门语言。这与 Winston 教计算机识别拱门的开创性实验是一致的:他没有向计算机展示它可能遇到的所有类型的拱门,而是给它看拱门的正例和反例,并教它识别部件之间的关系,比如"支撑"和"被支撑"。
3 知识和元知识
认知科学家提出的学习理论至少有50种。在计算机科学中,我们往往区分两种形式的学习:实验式学习和演示式学习。归纳式学习也被计算机科学家使用,但我们在这里不讨论它,因为它意味着学习者已经掌握了逻辑规则;所以,归纳式学习可以被看作应用已学会的元知识规则的一种方式。
实验式学习意味着有能力去尝试、对结果进行推理并得出结论。最简单的形式就是全自动程序所做的事情。例如,一个全自动分割算法会这样工作:进行分割,评估结果的质量,调整参数,然后再试一次。直接的结论是:实验式学习需要反馈回路。它还要求有一个判断系统每次表现好坏的标准。重要的是,这个标准必须独立于最初用来进行分割的知识或信息,否则结果将只是一个自我实现的预言。在半自动系统中,这个标准由人来提供。文献中报告的结果于是会包含这样的陈述:"最佳阈值是通过反复试错找到的。"这种方法是一种完全合理的学习方式。它属于监督式学习的范畴,对应于在教师帮助下的人类学习。
迈向全自动方法,就相当于假设计算机已经以某种方式度过了学习元知识(即构造规则所必需的知识)的阶段,现在可以利用这些元规则非常快速地学习。例如,如果我们知道农田往往有笔直的边界,我们就可以根据一个分割器是否产生了边界笔直的区域,来评判它对机载传感器拍摄图像的分割结果的好坏。"农田有笔直边界"这一知识就是一种元知识。分割器本身可能只使用了纹理和边缘信息。推断出的边界的笔直程度,可以作为标准来驱动系统利用其反馈回路,得出更好的分割。于是问题变成:系统是如何获得这种元知识的?正如前面所论证的,这必定是人类儿童(或计算机学习者)通过观察大量农田的例子而痛苦且缓慢地学会的那种元知识。虽然目前还没有发现能把这种元知识从教师的大脑直接移植到人类儿童大脑的方法,但计算机在这里有一个优势:教师(即人类)可以在为算法制定自我评估标准时,把元知识插入到系统中。从这一论证思路,我们可以得出结论:
- 元知识不仅可以采取关系的形式,也可以采取物体类别所具有的通用特征的形式;
- 在交互式系统中,元知识由人类教师手动插入到计算机学习者中;
- 在自动化系统中,元知识以性能评估标准的形式,由人类教师提供给计算机学习者。
于是出现两个问题:
- 是什么把知识和元知识联系起来?
- 元知识最初是如何学到的?
4 演示式学习
为了回答上述问题,我们可以从前面提到的第二种学习,即演示式学习中得到线索。这里的演示者就是教师。下面是一个我从祖母那里听来的故事。请记住,教孩子的传统方式一直是故事和寓言。这个故事提供了我们正在寻找的线索。
"从前有一个陶工,他收了一个想学制陶手艺的学徒。陶工做好陶罐,放进窑里。两小时后,他熄了火,坐下来休息、抽烟,因为他是个老人。然后他把陶罐从窑里取出来。罐子完美无缺。学徒后来决定自己做陶罐。他用黏土做好,放进窑里。两小时后,他把它们取出来,罐子全裂了。他重复这项工作,结果还是一样。他回去找陶工说:'你没有好好教我,不然不会发生这样的事。''你熄火之后有没有停下来抽烟?''没有,我不抽烟。''那就是因为你把罐子从窑里取出来得太早了。'"
我确信讲这个故事给我听,是为了教我注意细节。事实上,如果学徒看过陶工几十次操作,每次略有不同,但取罐之前总有那一段停顿,他也许就能悟出那个停顿对整个过程至关重要。另一方面,如果教师把这条信息明确说出来,他也许会是一个更好的教师。
所以,这个故事告诉我们:只有当有人向我们解释事情为什么要那样做时,我们才能从极少的例子中快速学习。孩子总是问许多"为什么",这正是孩子学习的方式。这告诉我,我们不能把"学会识别每个物体"与"学会它为什么是这个样子"割裂开来。我们可以设想教师和学习者之间如下的交流:
"这是什么?"
"这是一个窗户。"
"为什么?"
"因为它让光线进来,并让人能往外看。"
"怎样做到的?"
"通过在视线高度上有一个开口。"
"它真的有吗?"
这个学习序列如图1所示。该图提出,我们大脑中的知识由一系列网络表示,构成一个复杂的结构,我称之为"知识之塔"。名词网络是物体名称(标签)的网络,例如"窗户"、"烟囱"、"门"等。动词(或动作)网络是功能的网络,例如"往外看"、"进入"、"出去"等。表象网络是实现某一功能所需的基本形状的网络,例如"它是一个位于视线高度、人体尺寸的开口"。所以,知识的流动就像上面那段对话一样。当我们确认正在观察的物体具有实现其功能目的所需的恰当特征时,这个回路就闭合了。
因此,人工视觉科学家的任务,就是为这些网络层及其相互连接建立模型。我们手头有各种工具:马尔可夫随机场、文法、推理规则、贝叶斯网络、模糊推理等。我从一开始就会排除任何确定性的清晰规则方法,要么因为事物在本质上确实是随机的(或至少有显著的随机成分),要么因为我们的模型和知识太粗糙、太不完美,不足以制定清晰的规则和武断的决策。
5 马尔可夫随机场
最近的一些工作发现的证据表明,名词网络(用心理物理学的术语更贴切地说是"观念"网络)在拓扑上是一个随机网络,而由成对观念构成的关系网络在拓扑上是无尺度的。例如,"叉子-刀子"、"门-窗户"这样的组合在思维链中出现的频率,远高于单独出现的"门"或"窗户"。这表明这些网络中连接的强度各不相同,而且实际上并不总是对称的。例如,"门"这个观念引发"窗户"这个观念的频率,可能高于"窗户"引发"门"的频率。这种相互作用的不对称性表明,马尔可夫随机场(MRF)在这里不能以它们在图像处理中的惯常形式来应用。图2b给出了一个定义在网格上的MRF邻域相互作用的例子。这个MRF及其赋予邻域相互作用的权重,不能用一个吉布斯(Gibbs)联合概率密度函数来表示。例如,中心单元受其左上邻居以权重-1的影响,而它自己作为左上单元的右下邻居,却以权重+1影响后者。当我们试图对这样的随机场做松弛时,这种不对称会导致不稳定,因为产生的局部模式不是全局一致的(因此不能用全局吉布斯分布来表达)。根据Li的研究,这类MRF的松弛不会收敛,而是在若干可能的状态之间振荡。(吉布斯分布的优化有时会收敛到正确的解释,但更多的时候它们会"产生幻觉",即停在错误的解释上。)
因此,我们可以用一个非吉布斯型的MRF,来为图1所示知识之塔中每一层的网络建模。层与层之间的相互依赖也可以用这样的网络来建模,但也许用贝叶斯模型更合适,因为层间的依赖关系是因果性或诊断性的,而不是对等的。
接下来的问题是:"我们从哪里获得构建这些网络所需的知识?"教孩子的母亲又是从哪里得来的?母亲传授给孩子的并不是什么"基准真值"或普适知识:她看到某样东西,就对孩子讲;然后她又想起别的东西,这取决于她自己那个由相互唤起、并由她自身感官输入所触发的相关观念构成的网络,于是她再对孩子讲,如此继续。所以,母亲(教师)所做的一切,就是把她自己在观念和概念之间建立的连接传授给孩子。如果母亲告诉孩子:"这是铅笔,那是橡皮。铅笔帮助我们书写,橡皮帮助我们擦掉写下的东西。"孩子就会在自己的大脑中建立起与母亲相同的连接。"铅笔-橡皮"将在孩子的名词网络中形成很强的相互唤起,"书写-擦除"在孩子的动词网络中也是如此。所以,我们能做的一件事,就是为我们自己在观念与功能之间的心理连接建立模型,然后让孩子(计算机)提出正确的问题。对于每一个答案,相应连接的强度都会提高。我们可以把这些强度转化为概率。然后,可以向计算机展示一个全新的场景。孩子(计算机)必须能够利用它已学到的这些连接去解释这个新场景。
在实践中,这是通过使用人工标注的图像来实现的。Heesch和Petrou正是这样做来解释建筑物的室外场景的:他们使用数百张标注过的图像来学习区域配置的马尔可夫依赖关系,并把一个区域的邻域定义为满足下列几何约束之一的六个区域:在它上面、在它下面、在它左边、在它右边、包含它、或被它包含。对一个未知场景,先根据在每个区域上所做的单独测量进行初步标注,然后对定义在分割区域上的MRF做松弛:利用图着色,并按照以邻居当前标签为条件的局部条件概率,为每个区域抽取标签。这种方式不能保证全局一致性;但当标签之间的相互依赖是不对称的时候,全局一致性本来就不存在。我们可以直观地理解这一点:在室外环境中,物体之间的长程相互作用可能太弱,不足以对一个区域的身份产生显著影响。例如,如果属于这边这栋房子的这个区域是一扇门,那么位于视野另一端的那个区域可能是一辆汽车、一丛灌木、一扇窗户、一栋房子或一棵树。这些选项在概率上的差别必定非常小:标签之间的相关函数在物体相距超过一定距离后就变得平坦,不再提供有用信息。所以,不能指望任何全局模型是相关的或有用的。
图1. 知识之塔:知识可能的组织方式。双头箭头代表上下文交互。细实线箭头表示查询。虚线箭头代表回答,即信息的传递。认知视觉任务感兴趣的层次是名词层,我们要在这一层把标签分配给物体。名词网络中带有上下文内涵的节点的例子有"门"、"窗户"、"阳台"。功能网络中带有上下文内涵的节点的例子有"让空气进来"、"让光线进来"、"允许人进入"。描述网络中带有上下文内涵的节点的例子有"有一块玻璃板"、"位于视线高度"、"有一个开关把手"。
6 贝叶斯推理
迄今为止,贝叶斯方法以两种方式被使用:要么是概率松弛(PR)的形式,要么是Pearl贝叶斯推理网络的形式。概率松弛起源于Waltz关于约束传播的开创性工作,他使用清晰的约束,一劳永逸地解决了导致不可能物体的全局不一致标注问题。概率松弛通过考虑上下文信息,不断更新单个物体各种标签的概率。由于这种上下文信息实际上是对等的,概率松弛不是为因果关系建模的合适工具;它更像是上一节讨论的MRF的一种替代工具,用于为同一层内的影响建模。概率松弛和MRF松弛一样,除非满足特殊条件,否则不能保证收敛到唯一的全局解。我们前面讨论过,这在现实中并不是问题:场景的标注不必全局一致,只需局部一致。这个说法似乎与前一个说法矛盾,即概率松弛是Waltz算法的推广,而后者在60年代解决了标注不一致的问题。然而这个矛盾只是表面的。60年代的标注不一致问题,指的是通过标注子部件来标注单个固态物体,而不是标注包含许多不同物体的场景;场景中物体之间的约束,远弱于同一个固态物体内部各子部件之间的约束。
贝叶斯方法的第二种形式是Pearl贝叶斯推理网络。这里的关系可以是因果性的,因此这些网络适合用于层间推理。贝叶斯方法依赖于条件概率。
如何为这类方法选择条件概率一直是个问题。条件概率可能要通过数百个例子才能痛苦而缓慢地学到。Stassopoulou等人解决了学习条件概率的问题:他们把这种网络所表达的类别边界映射到一个神经网络上,随后训练该神经网络来学习这些条件概率。另一种做法是,条件概率可以从另一个已经训练好的网络,即教师的网络,现成地转移过来。这种转移相当于把它们选成某种参数形式(例如高斯),而参数是"任意"选定的。形式和参数的任意选择,通常会招致"这种方法是临时拼凑、缺乏根据"的批评。但如果教师只是把自己辛苦获得的知识转移给学生(计算机),情况就不是这样。这种做法把我们引向新的理论,例如所谓的"效用理论"。
效用理论是一种决策理论。给图像中描绘的物体分配标签就是一个决策。在贝叶斯框架内,我们通过在给定全部已有信息的条件下最大化某个标签的似然来做出这个决策。在效用理论中,这个似然要用一个称为"效用函数"的函数来修正,它表达主观偏好,或我们可能分配的每个标签的可能后果。效用函数乘以每个标签的贝叶斯概率,再对所有可能性求和,只需一遍计算就能得出最终标签。所以,这种方法避免了MRF和PR所用的迭代。效用函数可以等同于一个人关于这个世界所获得的内在元知识。这种知识可能是通过算法、从许多例子中学到的,但现在以条件和成见的形式表达出来,无法完全由我们所做的测量来证明。正是这种知识告诉我们:当我们想从一个多次推迟约定、又没有立即拿出我们索要的保养记录的人手里买车时,要格外谨慎。这类想法存在已有一段时间了,只是人们没有使用"效用函数"这个术语。例如,九十年代中期的心理学家谈论所谓的p-map和m-map。p-map指我们关于生活中可能遇到的各种模式的先验知识。p-map引导我们在重要或不重要的地方分别更仔细或更粗略地对场景采样,从而产生针对当前情形的m-map。我们可以把p-map等同于今天的效用函数,而把m-map等同于以测量为条件的标签的贝叶斯部分。
在计算机视觉领域,效用理论已被Marengoni用来选择标注航空图像时应采用的特征和算子。此外,Miller等人的工作也可以解释为使用了一个效用函数,来惩罚那些为了把观察到的东西变换成计算机所认为的东西而不得不采用的异常变换。在假定变换与标签测量相互独立的情况下,作者通过最大化所观察到的变换的概率密度函数与标签及观测的概率密度函数的联合似然,来有效地选择标签。
7 建立"为什么"和"怎样"的模型来回答"是什么"
让我们考虑图1中提出的知识之塔,为"学习在后续场景中识别物体"这一问题建立一个表述,其中使用知识的分层表示和效用理论。假定我们把使某种支持度最大的标签分配给场景中的每个物体。在通常情况下,标签lj被分配给物体oi的概率pij为:

pij = p(lj|mi) = p(mi|lj) p(lj) / p(mi)    (1)

这里mi代表我们在物体oi上所做的全部测量,p(mi)和p(lj)分别是测量和标签的先验概率。概率松弛会根据从邻域收到的上下文信息来更新这些概率;在这里我们不这样做。
相反,我们将用来自知识之塔其他层的信息来调制这个公式。设图1中"动词"(功能)层的单元用fk表示,"描述"层的单元用dl表示。我们把标签lji分配给物体oi,其中:
ji = argmax_j Σk Σl ujk vkl cil pij    (2)
其中ujk表示带标签lj的物体满足功能fk的重要程度(效用),vkl表示描述dl对于一个物体能够满足功能fk的重要性,cil是描述dl适用于物体oi的置信度。
效用函数的值表达了我们手中的证据,即区域oi具备扮演标签lj所指角色所需的特征。例如,如果分配给oi的标签是"阳台",效用函数必须表达出:这个阳台是否大到足以站下一个人,是否贴着墙,是否有一扇门朝向它。所有这些因素加在一起,才允许一个物体扮演阳台的角色。一个学习方案必须能够学到ujk和vkl的值:要么通过大量例子来学(这个过程缓慢而痛苦),要么信任一位已经亲历这一过程、掌握这些值的教师,由他把这些值直接输入系统。系统还必须具备处理输入的机制,以计算出cil的值。
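下面是式(2)打分过程的一个极简示意(Python;其中标签、功能、描述的个数和全部数值都是假设的玩具数据,并非论文中的真实参数):

```python
# 假设的玩具数据(并非论文中的真实参数)
# u[j][k]: 标签 l_j 满足功能 f_k 的重要程度(效用 u_jk)
# v[k][l]: 描述 d_l 对满足功能 f_k 的重要性(v_kl)
# c_i[l] : 描述 d_l 适用于物体 o_i 的置信度(c_il,来自测量)
# p_i[j] : 贝叶斯概率 p(l_j|m_i),见式 (1)
u = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
v = [[0.7, 0.3], [0.4, 0.6]]
c_i = [0.8, 0.3]
p_i = [0.3, 0.4, 0.3]

def score(j):
    # 式 (2) 中标签 j 的打分: (Σ_k Σ_l u_jk v_kl c_il) * p_ij
    return sum(u[j][k] * v[k][l] * c_i[l]
               for k in range(len(v))
               for l in range(len(c_i))) * p_i[j]

# 取使打分最大的标签下标 j_i,一遍计算即可,无需迭代
j_i = max(range(len(p_i)), key=score)
```

正如正文所说,这种做法一遍就得出最终标签,避免了MRF和PR那样的迭代松弛。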
图2. (a) 各方向上相邻单元之间的马尔可夫参数是一致的:如果对这个随机场做松弛,会出现由彼此相似的标签构成的水平条带,上下条带各不相同;在图像处理中,这会产生水平方向的纹理图案。(b) 相邻单元之间的马尔可夫参数不一致:左上单元对中心单元的影响权重,与中心单元(作为左上单元的右下邻居)对它的影响权重不同。
8
总结
我认为,学习是由归纳能力来刻画的,而只有当所学的不是被赋予标签的物体的表面特征,而是支配它们的内在规则时,归纳才可能实现。我还认为,元知识可以由教师传授给学习者,形式可以是一些规则,或者最简单地,由人类教师根据自己的经验为算法选定参数。这使我对论文评审界的一个惯常做法产生异议:许多论文仅仅因为"参数是手工选定的"就被否决,而参数的手工选定,正是教师把自己辛苦学到的知识传授给学生的例子。另一种选择,是让学习者从大量示例中缓慢地自己琢磨出要学的东西。
我同时认为,我们不需要全局一致的场景标注。如果坚持全局一致的标注,我们将永远无法解释马格利特(Magritte)画中那列从壁炉里冲出来的火车,因为火车不该出现在生火的地方;计算机也将永远无法识别一匹五条腿的马,而我们人可以。因此,我们需要的是现实和知识的碎片。
从计算机视觉的角度看,抛弃全局一致方案的想法现在已经成熟。这并不与其他学科相悖。战略分析家们谈论"学习行动的结构",甚至数学家们很早以前就已抛弃了"数学是全局一致的科学"的想法:当哥德尔(Gödel)的证明发表时,罗素(Russell)不得不放弃从一组公理出发构建全局一致数学体系的理念。自然系统也不是全局一致的:它们在若干状态之间摇摆,而我们必须应对这个动态的、全局不一致的、含糊不清的世界才能生存。一个机器人系统也必须能做到这一点,而做到这一点的办法,或许就是让它自己构建与其有限理解力相符的现实和知识的碎片。
Learning in Computer Vision: Some Thoughts
Maria Petrou
Communications and Signal
Processing Group,
Electrical and Electronic
Engineering Department,
Imperial College,
London SW7 2AZ, UK
Abstract. It is argued
that the ability to generalise is the most
important characteristic of learning
and that
generalisation may be achieved only if pattern
recognition systems learn the rules of
meta-
knowledge rather than the labels of objects. A
structure, called “tower of knowledge”,
according to which knowledge may be organised,
is proposed. A scheme of interpreting scenes
using the tower of knowledge and aspects of
utility theory is also proposed. Finally, it is
argued
that globally consistent solutions of
labellings are neither possible, nor desirable for
an artificial
cognitive system.
1
Introduction
The word “learning” has many
interpretations among the pattern recognition
community in
general, and the computer vision
community in particular. It has been used to
loosely mean
anything between the
identification of the best value of a parameter
from training data, to learning
how to
recognise visual structures. So, perhaps we should
try to distinguish the interpretation of
the
word as it appears in the mathematical formulation
of problems, from its interpretation as it
appears in cognition related tasks. In the
sections that follow, we shall try to understand
the
significance of learning in the context of
computer vision and identify a possible
relationship
between these two interpretations
of the word.
2 The Main
Characteristic of Learning
Some of the
first learning algorithms in relation to cognitive
tasks, that have been proposed in the
past,
are the various types of neural network.
Proponents of neural networks often comment on
the generalisation capabilities of the
networks they develop. Generalisation is one of
the
characteristics of learning. Indeed, we,
humans, teach our children often with fairy tales
and
parables, assuming that they have the
ability to generalise to real situations. It is
preposterous to
expect that we shall have to
teach our children about all individual possible
situations they may
encounter in life, in
order to make sure that we have taught them well.
We may safely conclude,
therefore, that the
ability to generalise is the most important
characteristic of learning.
This implies that
classical pattern recognition methods, that use
training examples to span the
feature space,
are not really learning methods in the cognitive
sense of the word. Even neural
network based
methods, in order to generalise well, rely heavily
on the availability of enough
training samples
to populate adequately the feature space. The
training patterns are used by the
neural
networks to approximate the class boundaries in
the feature space with piece-wise linear
segments. When an unknown pattern arrives, it
can be associated with the class that has been
identified to populate the part of the feature
space where the pattern appears. Some old [3] and
some more recently developed methods [1], that
can work with fewer training patterns than
straightforward methods, do so by selecting
the patterns that matter most in defining the
class
boundaries, rather than by using some
higher level generalisation abilities of the
classifier [23].
So, neural networks and
pattern classification methods are not learning
methods in the cognitive
sense of the word.
The question then that arises is: is learning,
as understood by cognitive scientists and
psychologists, algorithmic, or is it something
beyond the current grasp of mathematical formalism?
Evidence against learning being algorithmic is
the ability of humans to learn even from single
examples. A counter-argument is that humans
take a lot of time to learn, and it is possible
that
what we witness as super-fast learning is
simply a manifestation of the application of some
meta-knowledge, some generic rules that have
been slowly and painfully learnt subconsciously. I
would like to clarify that “learning” here
does not refer to getting a University degree.
This is
indeed a very long process and it
takes almost one third of the life span of a
person. Learning here
refers to survival
skills, to skills needed to operate in everyday
life. Given that a human becomes
reproductive
roughly between the ages of 12–15, we may assume
that nature considers the human
child becoming
ready for life at that stage. So, we may say that
humans “learn” what they have to
learn, to be
ready for life as independent entities, over a
period of 12–15 years, which is still a
significant fraction of their life span.
Therefore, humans seem to be slow learners after
all. They
take a lot of time to work out the
rules of meta-knowledge. It is these rules, that
have possibly
been learnt in an algorithmic
way, that allow then the human to learn in the
super-fast, almost
magical, way that we often
witness. We may conclude from the above arguments
that:
– generalisation is an important
characteristic of learning;
– generalisation
in algorithmic learning may only be achieved by
having enough training
examples to populate
all parts of the class space, or at least the
parts that form the borders
between classes;
– we have true
generalisation capabilities, only when what is
learnt by training examples are rules
on how
to extract the identity of objects and not the
classes of objects directly. If such learning
has taken place, totally unknown objects may
be interpreted correctly, even in the absence of
any
previously seen examples.
This
conclusion implies that what we have to teach the
computer, in order to construct a
cognitive
system, are relations rather than facts. For
example, memorising the dictionary of a
language, does not teach a person the
language. The person has to learn the relations
between
words in order to master the language.
This is in agreement with Winston’s pioneering
experiments on teaching the computer to
recognise arches. He did not show to the computer
all
possible types of arch it may encounter.
He showed it examples and counter examples of
arches
and taught it to recognise relations
between components, such as “supports” or “is
supported by”
[26].
3 Knowledge and
Meta-knowledge
There are at least 50
theories of learning developed by cognitive
scientists [28]. In computer
science, we tend
to distinguish two forms of learning: learning by
experimentation and learning
by demonstration.
Inductive learning is also used by computer
scientists, but we shall not deal
with that
here, because it implies that the learner has
already learnt the rules of logic. So,
inductive learning may be thought of as a way
of applying the already learnt rules of
meta-
knowledge [22].
Learning by experimentation
implies the ability to try, reason about the
results and draw
conclusions. In its simplest
form, this is what fully automatic programs do. A
fully automatic
segmentation algorithm, for
example, will work as follows: perform
segmentation, assess the
quality of the
result, adjust the parameters and try again. The
immediate conclusion is that
learning by
experimentation requires the presence of a feed-
back loop. It also requires the
availability
of a criterion that says how well the system has
performed each time. It is important
for this
criterion to be independent of the knowledge or
information used to perform the
segmentation
in the first place, otherwise the result will be a
self-fulfilling prophecy. In
semi-automatic
systems, the criterion is provided by the human.
The reported results in the
literature then
include a statement of the type: “the best
threshold was found by trial and error.”
This
method is a perfectly legitimate method of
learning. It comes under the umbrella of
supervised
learning and it corresponds to
human learning with the help of a teacher.
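As a minimal sketch, the segment–assess–adjust loop described above might look as follows. All names here (`segment`, `quality`, the toy data) are hypothetical placeholders standing in for a real segmenter and an independent assessment criterion, not an API from the text:

```python
# Minimal sketch of learning by experimentation: a feedback loop that
# tunes one segmentation parameter against an independent quality
# criterion. `segment` and `quality` are hypothetical placeholders.

def segment(image, threshold):
    # Placeholder "segmentation" controlled by a single parameter.
    return [pixel > threshold for pixel in image]

def quality(regions):
    # Placeholder criterion, independent of the information used to
    # segment (the text's example is straightness of region borders);
    # this toy version simply prefers a balanced foreground/background.
    return -abs(sum(regions) - len(regions) / 2)

def learn_threshold(image, candidates):
    best, best_score = None, float("-inf")
    for t in candidates:                      # try ...
        score = quality(segment(image, t))    # ... assess the result ...
        if score > best_score:                # ... keep the better setting
            best, best_score = t, score
    return best                               # ... and try again with it

image = [0.1, 0.4, 0.6, 0.9]
t = learn_threshold(image, [0.2, 0.5, 0.8])
```

The essential ingredients from the text are visible here: the feed-back loop over trials, and an assessment criterion separate from the information the segmenter itself used.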
Proceeding to fully automated methods is
equivalent to assuming that the computer has
somehow passed that stage of learning the
meta-knowledge, necessary to construct rules, and
now
learning proceeds very fast, making use of
these meta-rules. For example, if we know that
agricultural fields tend to have straight
borders, we may judge the output of a segmenter of
an
image, captured by an airborne sensor, as
good or bad, according to whether it produced
regions
with straight borders or not. The
knowledge that fields have straight boundaries is
a
meta-knowledge. The segmenter might have
operated using only texture and edge information.
The straightness of the inferred borders may
be used as a criterion to drive the system to use
its
feed-back loop to work out a better
segmentation. The question then is: how did the
system
acquire this meta-
knowledge? As argued earlier, it must be the meta-
knowledge that had to be
learnt by the human
child (or the computer learner) painfully slowly
by seeing lots of examples of
agricultural
fields. And although no method has been found yet
to transplant this meta-knowledge
to the brain
of the human child from the brain of the teacher,
computers have an advantage here:
the teacher,
i.e. the human, may insert the meta-knowledge into
the system while developing the
criterion of
self assessment of the algorithm. From this line
of argument, we conclude that:
– meta-
knowledge may take the form not only of relations,
but also of generic characteristics
that
categories of objects have;
– in interactive
systems, meta-knowledge is inserted into the
computer learner by the human
teacher
manually;
– in automatic systems, meta-
knowledge is supplied to the computer learner by
the human
teacher in the form of a criterion
of performance assessment.
Two questions then
arise:
– what connects the knowledge with the
meta-knowledge?
– how is meta-knowledge learnt
in the first place?
4 Learning by
Demonstration
To answer the above
questions, we get a clue from the second type of
learning we mentioned
earlier, namely learning
by demonstration. The demonstrator here is the
teacher. The next is a
story I heard from my
grandmother. Remember that the traditional way of
teaching children has
always been through
stories and parables. This story offers the clue
we are searching for.
‘Once upon a time there
was a potter who got an apprentice who wanted to
learn the art of pottery.
The potter made his
clay pots and put them in the oven. After two
hours, he turned the fire off,
and sat down to
rest and smoke, as he was an old man. Then he took
the pots out of the oven.
They were perfect.
The apprentice later decided to do his own pots.
He made them out of clay and
put them in the
oven. After two hours, he took them out. The pots
broke. He repeated the task and
he had the
same results. He went back to the potter: “You did
not teach me well. Such and such
happened.”
“Did you stop to smoke after you switched off the
fire?” “No, I am not a smoker.”
“So, you got
the pots out of the oven too soon.”’
I am
sure the story was related to me in order to teach
me to pay attention to the detail. Indeed,
if
the apprentice had seen the potter performing the
act dozens of times with slight variation each
time, but always with the pause before the
pots were taken out of the oven, he might have
worked
out that that pause was crucial to the
process. On the other hand, the teacher might have
been a
better teacher if he had made that
information explicit.
So, this story tells
us that we learn fast, from very few examples,
only when somebody explains
to us why things
are done the way they are done. A child asks lots
of “why”s and that is how a
child learns. This
tells me that we cannot disassociate learning to
recognise objects from learning
why each
object is the way it is. One may consider the
following exchange between a teacher and
a
learner:
“What is this?”
“This is a
window.”
“Why?”
“Because it lets the light
in and allows the people to look out.”
“How?”
“By having an opening at eye
level.”
“Does it really?”
This sequence
of learning is shown in Fig. 1. This figure
proposes that knowledge in our brain
is
represented by a series of networks, forming a
complex structure that I call the “tower of
knowledge”. The network of nouns is a network
of object names, labels, e.g. “window”,
“chimney”, “door”, etc. The network of verbs
or actions, is a network of functionalities, e.g.
“to
look out”, “to enter”, “to exit”, etc. The
network of appearances is a network of basic
shapes
necessary for a functionality to be
fulfilled, e.g. “it is an opening of human size at
floor level”. So,
the flow of knowledge goes
like the fragment of conversation given above. The
loop closes when
we confirm that the object we
are looking at has the right characteristics for
its functional purpose
to be fulfilled.
The task, therefore, for the artificial vision
scientist, is to model these layers of networks
and
their inter-connections. We have various
tools at our disposal: Markov Random Fields [8],
grammars [19], inference rules [24], Bayesian
networks [16], Fuzzy inference [27], etc. I would
exclude from the beginning any deterministic
crisp approaches, either because things are
genuinely random in nature (or at least have a
significant random component), or because our
models and our knowledge is far too gross and
imperfect for creating crisp rules and dogmatic
decisions.
5 Markov Random Fields
Some recent work [17] showed evidence
that the network of nouns (better described in
psychophysical terms as network of “ideas”) is
topologically a random network, while the
network of relations, made up from pairs of
ideas, is topologically scale-free. For example,
pairs
like “fork-knife”, “door-window” come up
much more frequently in trains of thought than
“door”
alone, or “window” alone. This
indicates that the connections in these networks
are of varied
strength, and actually are not
always symmetric. For example, the idea “door” may
trigger the
idea “window” more frequently than
the idea “window” triggers the idea “door”. This
asymmetry
in the interactions is a
manifestation that Markov Random Fields (MRFs) are
not applicable here
in their usual form in
which they are applied in image processing. An
example of the interactions
in a neighbourhood
of an MRF, defined on a grid, is shown in Fig. 2b.
This MRF, and the weights
it gives for
neighbouring interactions, cannot be expressed by
a Gibbs joint probability density
function.
For example, the cell at the centre is influenced
by its top left neighbour with weight −1,
but
itself, being the bottom right neighbour of the
cell at the top left, influences it with weight
+1.
This asymmetry leads to instability when
one tries to relax such a random field, because
local
patterns created are not globally
consistent (and therefore not expressible by
global Gibbs
distributions) [18]. According to
Li [9,10,11], relaxations of such MRFs do not
converge, but
oscillate between several
possible states. (Optimisations of Gibbs
distributions either converge to
the right
interpretation, but more often than not, they
hallucinate, i.e. they settle on wrong
interpretations.)
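The asymmetry described above can be checked with a toy weight rule. The −1/+1 values echo the example in the text, but the rule itself is an assumed sketch, not the paper's exact field; the point is only that influence of cell a on cell b need not equal influence of b on a, which is exactly what a Gibbs joint density would require:

```python
# Toy sketch of asymmetric neighbour weights on a grid (assumed rule,
# echoing the -1/+1 example in the text). For a Gibbs joint density,
# the influence of cell a on b must equal that of b on a; here it doesn't.

def weight(src, dst):
    """Influence weight of cell `src` on cell `dst` (cells are
    (row, col) tuples). Assumed rule: the top-left neighbour pushes
    with -1, the bottom-right neighbour pushes back with +1."""
    dr, dc = dst[0] - src[0], dst[1] - src[1]
    if (dr, dc) == (1, 1):    # src is the top-left neighbour of dst
        return -1
    if (dr, dc) == (-1, -1):  # src is the bottom-right neighbour of dst
        return +1
    return 0

a, b = (0, 0), (1, 1)                       # b is bottom-right of a
asymmetric = weight(a, b) != weight(b, a)   # -1 vs +1: not symmetric
```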
So, one could model
the network at each level of the tower of
knowledge shown in Fig. 1,
using a non-
Gibbsian MRF [5]. The interdependences between
layers might also be modelled by
such
networks, but perhaps it is more appropriate to
use Bayesian models, as the inter-layer
dependencies are causal or
diagnostic, rather than peer-to-peer.
The
question that arises then is: “where are we going
to get the knowledge to construct these
networks?” Where does the mother that teaches
her child get it from? There is no “ground truth”
or universal knowledge the mother transfers to
her child: she sees something and talks about it
to
the child, then she remembers something
else, according to her own network of related
ideas that
invoke each other and are prompted
by her own sensory input, talks again to the
child, and so on.
So, all the mother (the
teacher) does is to transfer to the child her own
connections between ideas
and concepts. If the
mother tells the child “This is a pencil and that
is a rubber. The pencil helps
us write and the
rubber helps us erase what we wrote.”, the child
will make the same connections
as the mother
had in her own brain. Pencil-rubber will have a
strong mutual recall in the child’s
network of
nouns, as well as write-erase in the child’s
network of verbs. So, one thing we can do
is
to model our own mental connections between ideas
and functionalities. Then let the child (the
computer) ask the right questions. For every
answer, the strength of the corresponding
connection
is increased. We may turn these
strengths into probabilities. Then a totally new
scene may be
shown to the computer. The
child (the computer) must be able to use the connections
it has learnt to
interpret this new scene.
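A minimal sketch of turning reinforced connections into probabilities, as described above. The nouns and verbs are assumed toy examples, and the counting scheme is one simple way to realise "every answer increases the strength of a connection", not a method prescribed by the text:

```python
# Sketch: turning question/answer reinforcement between the noun and
# verb layers of the "tower of knowledge" into connection probabilities.
# All nouns/verbs below are assumed toy examples.
from collections import defaultdict

strength = defaultdict(int)   # (noun, verb) -> reinforcement count

def answer(noun, verb):
    # Each answered "why"/"how" strengthens one connection.
    strength[(noun, verb)] += 1

answer("window", "lets light in")
answer("window", "look out")
answer("window", "lets light in")
answer("door", "enter")

def p_verb_given_noun(noun):
    # Normalise the strengths into conditional probabilities.
    total = sum(c for (n, _), c in strength.items() if n == noun)
    return {v: c / total for (n, v), c in strength.items() if n == noun}
```

After these four answers, "window" recalls "lets light in" with probability 2/3 and "look out" with probability 1/3, which is the kind of varied-strength, asymmetric connection the earlier MRF section argued for.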
In practice, this is done by using
manually annotated images. Heesch and Petrou [5]
did exactly
this to interpret outdoor scenes
of buildings: they used hundreds of annotated
images to learn the
Markov dependencies of
region configurations, defining the neighbourhood
of a region to be the
six regions that fulfil
one of the following geometric constraints: it is
above, below, to the left, to
the right, it is
contained by, or contains the region under
consideration. An unknown scene was
then labelled using a preliminary labelling
performed on the basis of individual measurements
made on each region, and relaxing the MRF
defined on the segmented regions, using graph
colourings and drawing labels for each region
according to the local conditional probability of
labels, conditioned on the current labels of
the neighbours. No global consistency is
guaranteed
that way, but no global consistency
exists, when the interdependencies between labels
are
asymmetric. We may intuitively understand
this, as in an outdoor environment the long range
interactions between objects are probably too
weak to have a significant effect on the identity
of a
region. For example, if this region that belongs to this house here is a door, that region that is at
the other end of the field of view may be a car, a bush, a window, a house, or a tree. The
differentiation of such options in terms of probabilities must be very small: the correlation
function between labels, referring to objects beyond a certain distance apart, flattens out and
provides no useful information. So, no global model is expected to be relevant or useful.

Fig. 1. The tower of knowledge: how knowledge may be organised. The double-headed arrows
represent contextual interactions. The thin continuous arrows represent queries. The dashed
arrows represent answers, i.e. transfer of information. The level of interest in a cognitive vision
task is the level of nouns, where we wish to assign labels to objects. Examples of nodes with
contextual connotations in the network of nouns are “door”, “window”, “balcony”. Examples of
nodes with contextual connotations in the network of functionality are “lets air in”, “lets light
in”, “allows a person to enter”. Examples of nodes with contextual connotations in the network
of descriptions are “has a glass pane”, “is at eye-level”, “has a handle to open it”.
6
Bayesian Inference
Bayesian approaches
have been used so far in two ways: either in the
form of probabilistic
relaxation (PR) [7] or
in the form of Pearl-Bayes networks of inference
[16]. Probabilistic
relaxation has its origins
in the seminal work on constraint propagation
by Waltz [25], who used
crisp constraints and
solved once and for all the problem of globally
inconsistent labellings that
led to impossible
objects [6]. Probabilistic relaxation updates the
probabilities of various labels of
individual
objects by taking into consideration contextual
information [7]. As this contextual
information is in effect peer-to-peer,
probabilistic relaxation is not an appropriate
tool for
modelling causal relationships. It is
rather an alternative tool to MRFs discussed in
the previous
section for modelling influences
at the same layer. Probabilistic relaxation, just
like MRF
relaxation, is not guaranteed to
converge to a unique global solution, unless
special conditions are
obeyed [21]. We
discussed earlier that this is not an issue in
reality: labellings of scenes do not
have to
be globally consistent, but only locally
consistent. This statement seems to be in
contradiction with a previous statement,
saying that probabilistic relaxation is the
generalisation
of Waltz’s algorithm which
solved the problem of inconsistent labellings in
the 60s. This
contradiction, however, is only
superficial. The problem of inconsistent
labellings of the 60s was
referring to the
labellings of single solid objects, by labelling
their sub-parts [4] and not the
labellings of
scenes that contain many different objects, where
constraints between objects are far
weaker
than constraints within the subparts of the same
solid object.
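A generic support-based relaxation step of the kind described above can be sketched as follows. This is an illustration only: the compatibility coefficients are assumed toy values, not those of any cited method, and real schemes differ in how support is combined:

```python
# Sketch of one probabilistic-relaxation step for labelled objects.
# p[i][l] is the current probability of label l on object i;
# compat[i][j][l][l2] is the (possibly asymmetric) compatibility of
# label l on i with label l2 on neighbour j. Toy values, assumed.

def relax_step(p, compat, neighbours):
    new_p = []
    for i, pi in enumerate(p):
        updated = []
        for l, pil in enumerate(pi):
            # Contextual support for label l on object i from neighbours.
            support = 1.0
            for j in neighbours[i]:
                support *= sum(compat[i][j][l][l2] * p[j][l2]
                               for l2 in range(len(p[j])))
            updated.append(pil * support)
        z = sum(updated)                       # renormalise
        new_p.append([u / z for u in updated])
    return new_p

p = [[0.7, 0.3], [0.6, 0.4]]
compat = {0: {1: [[0.9, 0.1], [0.1, 0.9]]},
          1: {0: [[0.9, 0.1], [0.1, 0.9]]}}
neighbours = {0: [1], 1: [0]}
q = relax_step(p, compat, neighbours)
```

With these compatible neighbours the leading label of each object is reinforced; iterating such steps is exactly the kind of relaxation that, as the text notes, need not converge to a unique global solution.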
The second form of Bayesian
approach is that of Pearl-Bayes networks of
inference. Here the
relations may be causal,
and so these networks are appropriate for inter-
layer inference. Bayesian
approaches depend on
conditional probabilities. How to choose these
conditional probabilities has
always been a problem for such methods.
Conditional probabilities may have to be learnt
painfully slowly from hundreds of examples.
Stassopoulou et al. [20] solved the problem of
learning the conditional probabilities, by
mapping the class boundaries expressed by such a
network, to a neural network, which was
subsequently trained to learn these conditional
probabilities. Alternatively, conditional
probabilities may be transferred ready from
another
already trained network: the network
of the teacher. This transference is equivalent to
choosing
them to have some parametric form
(e.g. Gaussian) with parameters chosen
“arbitrarily”. The
arbitrary choice of form and
parameters usually leads to the criticism of the
approach being ad-hoc
or unjustified. It is
not, if the teacher simply transfers their own
hard gained knowledge to the
pupil (the
computer). Such an approach leads us to new
theories, like for example the so called
“utility theory” [12].
Utility theory
is a decision theory. Assigning labels to objects
depicted in an image is a
decision. In the
Bayesian framework we make this decision by
maximizing the likelihood of a
label given all
the information we have. In utility theory, this
likelihood has to be ameliorated
with a
function called “utility function”, that expresses
subjective preferences or possible
consequences of each label we may assign. The
utility function multiplied with the Bayesian
probability of each label and summed over all
possibilities leads in one pass only to the final
label.
So, this approach avoids the iterations
used by MRFs and PR. The utility function may be
identified with the innate meta-knowledge
somebody has acquired about the world. It is that
knowledge, that might have been learnt
algorithmically and from many examples, but which
now
is expressed in the form of conditions and
prejudices that cannot be fully justified by the
measurements we make. It is the knowledge that
tells us to be cautious when we want to buy a car
from a man that postponed the appointment we
made several times, that did not produce
immediately the maintenance record of the car
we requested, and so on. Such ideas have been
around for some time, without people using the
term “utility function”. For example,
psychologists in the mid-nineties were talking
about the so called p-maps and m-maps. The
p-maps were meant to be the prior knowledge we
have about various possible patterns that we
may encounter in life. A p-map guides us to
sample a scene more or less carefully at places
where
it matters or it does not matter,
respectively, producing the m-map that is specific
to the present
situation. One may identify
here the p-maps as being the utility functions of
today and the
m-maps the Bayesian part of
labels conditioned on the measurements we have
made.1
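The one-pass decision rule described above can be made concrete with a small sketch: the utility of each possible label assignment is multiplied by the Bayesian posterior of each candidate truth and summed, and the label with the highest expected utility wins, with no relaxation iterations. All labels and numbers here are hypothetical, chosen only to show that the utility-weighted choice can differ from the raw maximum-posterior choice.

```python
# Posteriors p(l | measurements) for one image region (illustrative).
posterior = {"door": 0.5, "window": 0.3, "balcony": 0.2}

# utility[assigned][true]: subjective preference for assigning a label
# when another label is the truth (illustrative values; this is the
# "utility function" encoding the teacher's prejudices).
utility = {
    "door":    {"door": 1.0, "window": 0.2, "balcony": 0.1},
    "window":  {"door": 0.2, "window": 1.0, "balcony": 0.4},
    "balcony": {"door": 0.6, "window": 0.7, "balcony": 1.0},
}

def expected_utility(assigned):
    """Utility times posterior, summed over all possible true labels."""
    return sum(utility[assigned][true] * p for true, p in posterior.items())

best = max(posterior, key=expected_utility)
```

With these numbers the maximum-posterior label is "door", yet the expected-utility decision is "balcony": the single pass over the utility-weighted sum replaces the iterations used by MRFs and probabilistic relaxation.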
In the computer vision context,
utility theory has been used by Marengoni [13] to
select the
features and operators that should
be utilised to label aerial images. Further, one
may interpret the
work of Miller et al. [14]
as using a utility function that penalises the
unusual transformations that
will have to be
adopted to transform what is observed to what the
computer thinks it is. The
authors effectively
choose labels by maximising the joint likelihood
of the probability density
function of the
observed transforms and the probability density
function of the labels and
observations,
assuming that transforms and labels/measurements
are independent.
7 Modelling the
“Why” and the “How” in Order to Answer the “What”
Let us consider the tower of knowledge
presented in Fig. 1. We shall formulate
here
the problem of learning to recognise objects in a
scene, using this hierarchical
representation of knowledge and utility theory.
Let us assume that we use maximum
likelihood to assign labels to a scene. In the
conventional
way of doing so, object oi will
be assigned label lj with probability pij, given
by:
pij = p(lj|mi) = p(mi|lj)p(lj)/p(mi)
(1)
where mi represents all the measurements
we have made on object oi, and p(mi) and p(lj) are
the
prior probabilities of measurements and
labels, respectively. Probabilistic relaxation
will update
these probabilities according to
the contextual information received from
neighbouring regions.
We do not follow that
route here. Instead, we shall use the information
coming from the other
layers of knowledge to
moderate this formula. Let us identify the units
in the “verbs”
level of Fig. 1 by fk, and the units at
the descriptor level of Fig. 1 by dl. Then we may
choose label lji for object oi as follows:

ji = argmaxj Σk Σl ujk vkl cil pij    (2)

where ujk indicates how important it is for an
object with label lj to fulfil functionality fk;
vkl indicates how important characteristic dl is
for an object to have the possibility to fulfil
functionality fk; and cil is the confidence we
have that descriptor dl applies to object oi.

1 The ideas of p-maps and m-maps first came to my
knowledge through Robin Shirley of the Psychology
Department of Surrey University, who passed away
before he had the chance to make them more
concrete and publish them.
Note that the value of
the utility function expresses the evidence we
have that region oi has the
necessary
characteristics to fulfil its role as object lj.
For example, if the label we consider of
assigning to object oi is “balcony”, the
utility function must express whether this object
has
dimensions big enough to allow a human to
stand on it, whether it is attached on a wall, and
whether there is a door leading to it. All
these are conditions that will allow an object to
play the
role of a balcony. A learning scheme
must be able to learn the values of ujk and vkl
either directly
from examples (slowly and
painfully), or by trusting its teacher, who having
learnt those values
himself, slowly and
painfully over many years of human life
experiences, directly inserts them to
the
computer learner. The computer learner then must
have a tool box of processors of sensory
inputs that will allow it to work out the
values of cil.
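Equation (2) above can be sketched directly in code. The labels, functionalities, descriptors and all weights below are illustrative assumptions in the spirit of the balcony example, not values from the paper; in practice ujk and vkl would come from the teacher and cil from the learner's sensory tool box.

```python
# A sketch of Eq. (2): choose the label j maximising
#   sum over k, l of u[j][k] * v[k][l] * c[l] * p[j]
# for one object o_i. All names and numbers are hypothetical.

labels = ["balcony", "window"]                                 # l_j
functionalities = ["support-person", "let-light-in"]           # f_k
descriptors = ["large-enough", "attached-to-wall", "transparent"]  # d_l

# u[j][k]: how important functionality f_k is for label l_j
u = [[1.0, 0.1],   # balcony: mainly there to support a person
     [0.0, 1.0]]   # window: mainly there to let light in
# v[k][l]: how important descriptor d_l is for functionality f_k
v = [[0.9, 0.8, 0.0],   # supporting a person needs size and attachment
     [0.1, 0.0, 1.0]]   # letting light in needs transparency
# c[l] (c_il): confidence each descriptor applies to this object o_i
c = [0.9, 0.8, 0.1]
# p[j] (p_ij): Bayesian probability of each label from measurements alone
p = [0.4, 0.6]

def score(j):
    """Utility-moderated evidence for assigning label j to object o_i."""
    return sum(u[j][k] * v[k][l] * c[l] * p[j]
               for k in range(len(functionalities))
               for l in range(len(descriptors)))

best = labels[max(range(len(labels)), key=score)]
```

With these numbers the measurements alone favour "window" (p = 0.6), but the object's capacity to fulfil the role of a balcony dominates the sum, and "balcony" is chosen: the utility terms moderate, rather than merely rescale, the Bayesian probabilities.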
Fig. 2. (a) A
local neighbourhood at the pixel level with
globally consistent Markov parameters: if
this
field is relaxed it will lead to horizontal strips
of similar labels which will be distinct from the
labels above and below. In image processing it
will lead to a texture pattern with strong
horizontal
directionality. (b) A local
neighbourhood at the pixel level with globally
inconsistent Markov
parameters: the top left
pixel tells the central pixel to be different from
it; the central pixel, seen as
the bottom
right neighbour of the top left pixel, tells it to
be similar to it.
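The consistency condition illustrated in Fig. 2 can be sketched as a simple check: the Markov parameter a pixel assigns to its neighbour at displacement (dr, dc) must agree with the parameter that neighbour assigns back at (-dr, -dc), otherwise the field contradicts itself, as in Fig. 2(b). The +1/-1 encoding ("be similar"/"be different") and the example fields are illustrative assumptions.

```python
# Pairwise Markov parameters as a map: displacement (dr, dc) -> sign,
# where +1 means "be similar to that neighbour", -1 "be different".

def is_globally_consistent(params):
    """A field is consistent only if every reciprocal pair of
    displacements carries the same interaction sign."""
    return all(params.get((-dr, -dc)) == s for (dr, dc), s in params.items())

# A Fig. 2(a)-like field: horizontal neighbours attract, vertical repel,
# every reciprocal pair agrees -> relaxation yields horizontal strips.
consistent = {(0, 1): +1, (0, -1): +1, (1, 0): -1, (-1, 0): -1}

# A Fig. 2(b)-like field: the top-left neighbour says "be different",
# but the reciprocal bottom-right direction says "be similar".
inconsistent = {(-1, -1): -1, (1, 1): +1, (0, 1): +1, (0, -1): +1}
```

Such a check only detects the contradiction; the point of the text is that a cognitive system may have to live with such locally useful, globally inconsistent fragments rather than eliminate them.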
8
Conclusions
I have argued here that
learning is characterised by the ability to
generalise, and that this can only
be achieved
if what is learnt is not the labels of the objects
viewed, but the rules according to
which these
labels are assigned. I have also argued that this
meta-knowledge may be transferred
to the
learner (the computer) directly by the teacher
(the human developer), in the form of rules,
or in the simplest way, by the human using the
parameters of the algorithms according to their
personal experience and intuition. This puts
me at odds with the majority of the community of
reviewers who tend to reject papers on the
grounds that the parameters have been chosen ad
hoc
with no proper explanation: these are the
cases of the teacher transplanting to the learner
their
painstakingly acquired knowledge. The
alternative is for the learner each time to
acquire this
knowledge painfully slowly from
thousands of examples.
I have also argued
that we do not need globally consistent labellings
of scenes. Global
consistency will never allow
us to label correctly the scene painted by
Magritte of a train storming
out of a fire
place, because trains do not come out from fire
places! It will never allow the
computer to
recognise green horses with 5 legs, but we,
humans, do. So, what we need is
fragments of
reality and knowledge.
In computer vision,
the idea of abandoning globally consistent
solutions has now matured. This
is not in
isolation from other sciences. Strategy analysts
talk about “fragments of learnt actions”,
and
even mathematicians have long ago abandoned the
idea of a globally self-consistent
mathematical science: Bertrand Russell had to
abandon the idea of globally consistent
mathematics based on a small number of axioms,
when Gödel’s proof was published [15].
Natural systems are not globally consistent:
they oscillate between states, and we, humans,
manage to survive through this constantly
dynamic, globally inconsistent and ambiguous
world.
A robotic system must be able to do the
same and perhaps the only way to succeed in doing
that is
to be constructed so that it is
content with a collection of fragments of
understanding.
Acknowledgements. This work was
supported by EU grant 027113.
References
1. Cortes, C., Vapnik, V.N.: Support-Vector
Networks. Machine Learning Journal 20,
273–297
(1995)
2. Christmas, W.J., Kittler, J.,
Petrou, M.: Structural matching in Computer Vision
using
Probabilistic Relaxation. IEEE
Transactions on Pattern Analysis and Machine
Intelligence 17,
749–764 (1995)
3.
Devijver, P.A., Kittler, J.: On the edited nearest
neighbour rule. In: Proc. 5th Int. Conf. on
Pattern Recognition, pp. 72–80 (1980)
4.
Guzman, A.: Computer Recognition of three-
dimensional objects in a visual scene. Tech. Rep.
MAC-TR-59, AI Laboratory, MIT (1968)
5.
Heesch, D., Petrou, M.: Non-Gibbsian Markov Random
Fields for object recognition. The
British
Machine Vision Conference (submitted, 2007)
6.
Huffman, D.A.: Impossible Objects as Nonsense
Sentences. Machine Intelligence 6, 295–323
(1971)
7. Hummel, R.A., Zucker, S.W.: On
the foundations of relaxation labelling processes.
IEEE
Transactions PAMI 5, 267–287 (1983)
8. Kindermann, R., Snell, J.L.:
Markov Random Fields and their Applications. First
book of the
AMS soft-cover series in
Contemporary Mathematics, American Mathematical
Society (1980)
9. Li, Z.: A
neural model of contour integration in the primary
visual cortex. Neural Computation
10, 903–940
(1998)
10. Li, Z.: Visual segmentation by
contextual influences via intra-cortical
interactions in the
primary visual cortex.
Network: Computation in Neural Systems 10, 187–212
11. Li, Z.: Computational design and nonlinear
dynamics of a recurrent network model of the
primary visual cortex. Neural Computation 13,
1749–1780 (2001)
12. Lindley, D.V.: Making
Decisions. John Wiley, Chichester (1985)
13.
Marengoni, M.: Bayesian Networks and Utility
Theory for the management of uncertainty and
control of algorithms in vision systems. PhD
thesis, University of Massachusetts (2002)
14.
Miller, E.G., Matsakis, N.E., Viola, P.A.:
Learning from one example through
shared
densities on transforms. In: CVPR (2000)
15.
Nagel, E., Newman, J.R.: Gödel’s Proof. Routledge
and Kegan Paul (1959)
16. Pearl,
J.: Probabilistic reasoning in intelligent
systems: Networks of plausible inference.
Morgan Kaufmann Publishers Inc., San Francisco
(1988)
17. Petrou, M., Tabacchi, M., Piroddi,
R.: Networks of ideas and concepts. IEEE
Transactions of
Man Machine and Cybernetics
(submitted, 2007)
18. Petrou, M., Garcia
Sevilla, P.: Image Processing, Dealing with
Texture. Wiley, Chichester
(2006)
19.
Schlesinger, B.D., Hlavac, V.: Ten lectures on
Statistical and Structural Pattern Recognition,
ch. 10. Kluwer Academic Publishers, Dordrecht,
The Netherlands (2002)
20. Stassopoulou, A.,
Petrou, M.: Obtaining the correspondence between
Bayesian and Neural
Networks. International
Journal of Pattern Recognition and Artificial
Intelligence 12, 901–920
(1998)
21.
Stoddart, A.J., Petrou, M., Kittler, J.: On the
foundations of Probabilistic Relaxation with
product support. Journal of Mathematical
Imaging and Vision 9, 29–48 (1998)
22.
Tenenbaum, J.B., Griffiths, T.L., Kemp, C.:
Theory-based Bayesian models of inductive
learning and reasoning. Trends in Cognitive
Sciences 10, 309–318 (2006)
23. Tong, S.,
Koller, D.: Support Vector Machine active learning
with applications to text
classification.
Journal of Machine Learning Research 2, 45–66
(2001)
24. Walker, T.C., Miller, R.K.: Expert
Systems Handbook: An Assessment of Technology and
Applications. Prentice-Hall, Englewood Cliffs
(1990)
25. Waltz, D.: Understanding line
drawings of scenes with shadows. In: Winston, P.
(ed.) The Psychology of Computer Vision, pp.
19–91. McGraw-Hill, New York (1975)
26. Winston, P.H.:
Learning structural descriptions from examples.
The psychology of computer
vision, 157–209
(1975)
27. Zadeh, L.H.: A fuzzy algorithmic
approach to the definition of complex or
imprecise concepts.
Int. J. Man-Machine Studies
8, 249–291 (1976)