建议使用以下浏览器,以获得最佳体验。 ie 9.0 以上版本 chrome 31 谷歌浏览器 firefox 30 火狐浏览器
温馨提示

抱歉,您需设置社区昵称后才能参与社区互动!

前往修改
我再想想
选择版块
主题:324帖子:963

【酷哥说库】

opengauss内核分析(三):sql解析

2022/6/9 5313

在传统数据库中sql引擎一般指对用户输入的sql语句进行解析、优化的软件模块

sql的解析过程主要分为:

词法分析:将用户输入的sql语句拆解成单词(token)序列,并识别出关键字、标识、常量等。

• 语法分析:分析器对词法分析器解析出来的单词(token)序列在语法上是否满足sql语法规则。

• 语义分析:语义分析是sql解析过程的一个逻辑阶段,主要任务是在语法正确的基础上进行上下文有关性质的审查,在sql解析过程中该阶段完成表名、操作符、类型等元素的合法性判断,同时检测语义上的二义性。

opengauss在pg_parse_query中调用raw_parser函数对用户输入的sql命令进行词法分析和语法分析,生成语法树添加到链表parsetree_list中。完成语法分析后,对于parsetree_list中的每一颗语法树parsetree,会调用parse_**yze函数进行语义分析,根据sql命令的不同,执行对应的入口函数,最终生成查询树

词法分析

opengauss使用flex工具进行词法分析。flex工具通过对已经定义好的词法文件进行编译,生成词法分析的代码。词法文件是scan.l,它根据sql语言标准对sql语言中的关键字、标识符、操作符、常量、终结符进行了定义和识别。在kwlist.h中定义了大量的关键字,按照字母的顺序排列,方便在查找关键字时通过二分法进行查找。 在scan.l中处理“标识符”时,会到关键字列表中进行匹配,如果一个标识符匹配到关键字,则认为是关键字,否则才是标识符,即关键字优先. 以“select a, b from item”为例说明词法分析结果

名称

词性

内容

说明

关键字

keyword

select,from

如select/from/where等,对大小写不敏感

标识符

ident

a,b,item

用户自己定义的名字、常量名、变量名和过程名,若无括号修饰则对大小写不敏感

语法分析

opengauss中定义了bison工具能够识别的语法文件gram.y,根据sql语言的不同定义了一系列表达statement的结构体(这些结构体通常以stmt作为命名后缀),用来保存语法分析结果。以select查询为例,它对应的statement结构体如下。

typedef struct selectstmt
{
	nodetag		type;
	list	   *distinctclause; /* null, list of distinct on exprs, or
								 * lcons(nil,nil) for all (select distinct) */
	intoclause *intoclause;		/* target for select into */
	list	   *targetlist;		/* the target list (of restarget) */
	list	   *fromclause;		/* the from clause */
	node	   *whereclause;	/* where qualification */
	list	   *groupclause;	/* group by clauses */
	node	   *havingclause;	/* having conditional-expression */
	list	   *windowclause;	/* window window_name as (...), ... */
	withclause *withclause;		/* with clause */
	list	   *valueslists;	/* untransformed list of expression lists */
	list	   *sortclause;		/* sort clause (a list of sortby's) */
	node	   *limitoffset;	/* # of result tuples to skip */
	node	   *limitcount;		/* # of result tuples to return */
    ……
} selectstmt;

这个结构体可以看作一个多叉树,每个叶子节点都表达了select查询语句中的一个语法结构,对应到gram.y中,它会有一个selectstmt。代码如下:

从simple_select语法分析结构可以看出,一条简单的查询语句由以下子句组成:去除行重复的distinctclause、目标属性targetlist、select into子句intoclause、from子句fromclause、where子句whereclause、group by子句groupclause、having子句havingclause、窗口子句windowclause和plan_hint子句。在成功匹配simple_select语法结构后,将会创建一个statement结构体,将各个子句进行相应的赋值。对simple_select而言,目标属性、from子句、where子句是最重要的组成部分。selectstmt与其他结构体的关系如下

下面以“select a, b from item”为例说明简单select语句的解析过程,函数exec_simple_query调用pg_parse_query执行解析,解析树中只有一个元素

(gdb) p *parsetree_list
$47 = {type = t_list, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}

list中的节点类型为t_selectstmt

(gdb) p *(node *)(parsetree_list->head.data->ptr_value)
$45 = {type = t_selectstmt}

查看selectstmt结构体,targetlist 和fromclause非空

(gdb) set $stmt = (selectstmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = t_selectstmt, distinctclause = 0x0, intoclause = 0x0, targetlist = 0x7f5ffa43d588, fromclause = 0x7f5ff986c888, startwithclause = 0x0, whereclause = 0x0, groupclause = 0x0,
  havingclause = 0x0, windowclause = 0x0, withclause = 0x0, valueslists = 0x0, sortclause = 0x0, limitoffset = 0x0, limitcount = 0x0, lockingclause = 0x0, hintstate = 0x0, op = setop_none, all = false,
  larg = 0x0, rarg = 0x0, hasplus = false}

查看selectstmt的targetlist,有两个restarget

(gdb) p *($stmt->targetlist)
$55 = {type = t_list, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(node *)($stmt->targetlist->head.data->ptr_value)
$57 = {type = t_restarget}
(gdb) set $restarget1=(restarget *)($stmt->targetlist->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = t_restarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = t_columnref}
(gdb) p *(columnref *)$restarget1->val
$64 = {type = t_columnref, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((columnref *)$restarget1->val)->fields
$66 = {type = t_list, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(node *)(((columnref *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = t_string}
(gdb) p *(value *)(((columnref *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = t_string, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}
(gdb) set $restarget2=(restarget *)($stmt->targetlist->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = t_restarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = t_columnref}
(gdb) p *(columnref *)$restarget2->val
$91 = {type = t_columnref, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((columnref *)$restarget2->val)->fields
$92 = {type = t_list, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(node *)(((columnref *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = t_string}
(gdb) p *(value *)(((columnref *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = t_string, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}

查看selectstmt的fromclause,有一个rangevar

(gdb) p *$stmt->fromclause
$102 = {type = t_list, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(rangevar*)($stmt->fromclause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = t_rangevar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhopt = inh_default, relpersistence = 112 'p', alias = 0x0,
  location = 17, ispartition = false, issubpartition = false, partitionkeyvalueslist = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignoid = 0, withverexpr = false}

综合以上分析可以得到语法树结构

语义分析

在完成词法分析和语法分析后,parse_ana lyze函数会根据语法树的类型,调用transformselectstmt将parsetree改写为查询树

(gdb) p *result
$3 = {type = t_query, commandtype = cmd_select, querysource = qsrc_original, queryid = 0, cansettag = false, utilitystmt = 0x0, resultrelation = 0, hasaggs = false, haswindowfuncs = false,
  hassublinks = false, hasdistincton = false, hasrecursive = false, hasmodifyingcte = false, hasforupdate = false, hasrowsecurity = false, hassynonyms = false, ctelist = 0x0, rtable = 0x7f5ff5eb8c88,
  jointree = 0x7f5ff5eb9310, targetlist = 0x7f5ff5eb9110,…}
(gdb) p *result->targetlist
$13 = {type = t_list, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(node *)(result->targetlist->head.data->ptr_value)
$8 = {type = t_targetentry}
(gdb) p *(targetentry*)(result->targetlist->head.data->ptr_value)
$9 = {xpr = {type = t_targetentry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(targetentry*)(result->targetlist->tail.data->ptr_value)
$10 = {xpr = {type = t_targetentry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = t_list, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb)  p *(node *)(result->rtable->head.data->ptr_value)
$15 = {type = t_rangetblentry}
(gdb) p *(rangetblentry*)(result->rtable->head.data->ptr_value)
$16 = {type = t_rangetblentry, rtekind = rte_relation, relname = 0x7f5ff636efb0 "item", partattrnum = 0x0, relid = 24576, partitionoid = 0, iscontainpartition = false, subpartitionoid = 0……}

得到的查询树结构如下:


完成词法、语法和语义分析后,sql解析过程完成,sql引擎开始执行查询优化,在下一期中再具体分析。


回复3

0 0
2022/6/9 22:24

有道理

0 0
2022/6/10 14:21

有道理

0 0
2022/6/17 00:04

在传统数据库中sql引擎一般指对用户输入的sql语句进行解析、优化的软件模块

上划加载中
直达楼层
全部回复
正序浏览
标签
您还可以添加5个标签
  • 没有搜索到和“关键字”相关的标签
  • 云产品
  • 星辰平台的解决方案
  • 技术领域
  • 通用技术
  • 平台功能
取消

opengauss内核分析(三):sql解析-星辰平台

您已采纳当前回复为最佳回复

发帖: 162粉丝: 5

级别 : 版主,版块专家

[酷哥说库] opengauss内核分析(三):sql解析

在传统数据库中sql引擎一般指对用户输入的sql语句进行解析、优化的软件模块

sql的解析过程主要分为:

词法分析:将用户输入的sql语句拆解成单词(token)序列,并识别出关键字、标识、常量等。

• 语法分析:分析器对词法分析器解析出来的单词(token)序列在语法上是否满足sql语法规则。

• 语义分析:语义分析是sql解析过程的一个逻辑阶段,主要任务是在语法正确的基础上进行上下文有关性质的审查,在sql解析过程中该阶段完成表名、操作符、类型等元素的合法性判断,同时检测语义上的二义性。

opengauss在pg_parse_query中调用raw_parser函数对用户输入的sql命令进行词法分析和语法分析,生成语法树添加到链表parsetree_list中。完成语法分析后,对于parsetree_list中的每一颗语法树parsetree,会调用parse_**yze函数进行语义分析,根据sql命令的不同,执行对应的入口函数,最终生成查询树

词法分析

opengauss使用flex工具进行词法分析。flex工具通过对已经定义好的词法文件进行编译,生成词法分析的代码。词法文件是scan.l,它根据sql语言标准对sql语言中的关键字、标识符、操作符、常量、终结符进行了定义和识别。在kwlist.h中定义了大量的关键字,按照字母的顺序排列,方便在查找关键字时通过二分法进行查找。 在scan.l中处理“标识符”时,会到关键字列表中进行匹配,如果一个标识符匹配到关键字,则认为是关键字,否则才是标识符,即关键字优先. 以“select a, b from item”为例说明词法分析结果

名称

词性

内容

说明

关键字

keyword

select,from

如select/from/where等,对大小写不敏感

标识符

ident

a,b,item

用户自己定义的名字、常量名、变量名和过程名,若无括号修饰则对大小写不敏感

语法分析

opengauss中定义了bison工具能够识别的语法文件gram.y,根据sql语言的不同定义了一系列表达statement的结构体(这些结构体通常以stmt作为命名后缀),用来保存语法分析结果。以select查询为例,它对应的statement结构体如下。

typedef struct selectstmt
{
	nodetag		type;
	list	   *distinctclause; /* null, list of distinct on exprs, or
								 * lcons(nil,nil) for all (select distinct) */
	intoclause *intoclause;		/* target for select into */
	list	   *targetlist;		/* the target list (of restarget) */
	list	   *fromclause;		/* the from clause */
	node	   *whereclause;	/* where qualification */
	list	   *groupclause;	/* group by clauses */
	node	   *havingclause;	/* having conditional-expression */
	list	   *windowclause;	/* window window_name as (...), ... */
	withclause *withclause;		/* with clause */
	list	   *valueslists;	/* untransformed list of expression lists */
	list	   *sortclause;		/* sort clause (a list of sortby's) */
	node	   *limitoffset;	/* # of result tuples to skip */
	node	   *limitcount;		/* # of result tuples to return */
    ……
} selectstmt;

这个结构体可以看作一个多叉树,每个叶子节点都表达了select查询语句中的一个语法结构,对应到gram.y中,它会有一个selectstmt。代码如下:

从simple_select语法分析结构可以看出,一条简单的查询语句由以下子句组成:去除行重复的distinctclause、目标属性targetlist、select into子句intoclause、from子句fromclause、where子句whereclause、group by子句groupclause、having子句havingclause、窗口子句windowclause和plan_hint子句。在成功匹配simple_select语法结构后,将会创建一个statement结构体,将各个子句进行相应的赋值。对simple_select而言,目标属性、from子句、where子句是最重要的组成部分。selectstmt与其他结构体的关系如下

下面以“select a, b from item”为例说明简单select语句的解析过程,函数exec_simple_query调用pg_parse_query执行解析,解析树中只有一个元素

(gdb) p *parsetree_list
$47 = {type = t_list, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}

list中的节点类型为t_selectstmt

(gdb) p *(node *)(parsetree_list->head.data->ptr_value)
$45 = {type = t_selectstmt}

查看selectstmt结构体,targetlist 和fromclause非空

(gdb) set $stmt = (selectstmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = t_selectstmt, distinctclause = 0x0, intoclause = 0x0, targetlist = 0x7f5ffa43d588, fromclause = 0x7f5ff986c888, startwithclause = 0x0, whereclause = 0x0, groupclause = 0x0,
  havingclause = 0x0, windowclause = 0x0, withclause = 0x0, valueslists = 0x0, sortclause = 0x0, limitoffset = 0x0, limitcount = 0x0, lockingclause = 0x0, hintstate = 0x0, op = setop_none, all = false,
  larg = 0x0, rarg = 0x0, hasplus = false}

查看selectstmt的targetlist,有两个restarget

(gdb) p *($stmt->targetlist)
$55 = {type = t_list, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(node *)($stmt->targetlist->head.data->ptr_value)
$57 = {type = t_restarget}
(gdb) set $restarget1=(restarget *)($stmt->targetlist->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = t_restarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = t_columnref}
(gdb) p *(columnref *)$restarget1->val
$64 = {type = t_columnref, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((columnref *)$restarget1->val)->fields
$66 = {type = t_list, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(node *)(((columnref *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = t_string}
(gdb) p *(value *)(((columnref *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = t_string, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}
(gdb) set $restarget2=(restarget *)($stmt->targetlist->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = t_restarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = t_columnref}
(gdb) p *(columnref *)$restarget2->val
$91 = {type = t_columnref, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((columnref *)$restarget2->val)->fields
$92 = {type = t_list, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(node *)(((columnref *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = t_string}
(gdb) p *(value *)(((columnref *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = t_string, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}

查看selectstmt的fromclause,有一个rangevar

(gdb) p *$stmt->fromclause
$102 = {type = t_list, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(rangevar*)($stmt->fromclause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = t_rangevar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhopt = inh_default, relpersistence = 112 'p', alias = 0x0,
  location = 17, ispartition = false, issubpartition = false, partitionkeyvalueslist = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignoid = 0, withverexpr = false}

综合以上分析可以得到语法树结构

语义分析

在完成词法分析和语法分析后,parse_ana lyze函数会根据语法树的类型,调用transformselectstmt将parsetree改写为查询树

(gdb) p *result
$3 = {type = t_query, commandtype = cmd_select, querysource = qsrc_original, queryid = 0, cansettag = false, utilitystmt = 0x0, resultrelation = 0, hasaggs = false, haswindowfuncs = false,
  hassublinks = false, hasdistincton = false, hasrecursive = false, hasmodifyingcte = false, hasforupdate = false, hasrowsecurity = false, hassynonyms = false, ctelist = 0x0, rtable = 0x7f5ff5eb8c88,
  jointree = 0x7f5ff5eb9310, targetlist = 0x7f5ff5eb9110,…}
(gdb) p *result->targetlist
$13 = {type = t_list, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(node *)(result->targetlist->head.data->ptr_value)
$8 = {type = t_targetentry}
(gdb) p *(targetentry*)(result->targetlist->head.data->ptr_value)
$9 = {xpr = {type = t_targetentry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(targetentry*)(result->targetlist->tail.data->ptr_value)
$10 = {xpr = {type = t_targetentry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = t_list, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb)  p *(node *)(result->rtable->head.data->ptr_value)
$15 = {type = t_rangetblentry}
(gdb) p *(rangetblentry*)(result->rtable->head.data->ptr_value)
$16 = {type = t_rangetblentry, rtekind = rte_relation, relname = 0x7f5ff636efb0 "item", partattrnum = 0x0, relid = 24576, partitionoid = 0, iscontainpartition = false, subpartitionoid = 0……}

得到的查询树结构如下:


完成词法、语法和语义分析后,sql解析过程完成,sql引擎开始执行查询优化,在下一期中再具体分析。


分享文章到朋友圈

分享文章到微博
您已采纳当前回复为最佳回复

发帖: 1粉丝: 1

发表于2022年06月09日 22:24:56

有道理

您已采纳当前回复为最佳回复

发帖: 0粉丝: 0

发表于2022年06月10日 14:21:30

有道理

您已采纳当前回复为最佳回复

乌龟哥哥

发帖: 85粉丝: 24

发表于2022年06月17日 00:04:05

在传统数据库中sql引擎一般指对用户输入的sql语句进行解析、优化的软件模块

您需要登录后才可以回帖 | 立即注册

您对问题的回复是否满意?
满意度
非常满意 满意 一般 不满意
我要反馈
0/200