{"id":232261,"date":"2020-12-28T10:32:56","date_gmt":"2020-12-28T02:32:56","guid":{"rendered":"http:\/\/4563.org\/?p=232261"},"modified":"2020-12-28T10:32:56","modified_gmt":"2020-12-28T02:32:56","slug":"spark-connector-reader-%e5%8e%9f%e7%90%86%e4%b8%8e%e5%ae%9e%e8%b7%b5","status":"publish","type":"post","link":"http:\/\/4563.org\/?p=232261","title":{"rendered":"Spark Connector Reader: Principles and Practice"},"content":{"rendered":"<div>\n<div>\n<div>\n<h1>Spark Connector Reader: Principles and Practice<\/h1>\n<p> <\/p>\n<div>\n<div> <span>Author: NebulaGraph<\/span> <\/div>\n<div> <\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<div isfirst=\"1\"> <\/p>\n<p><img decoding=\"async\" loading=\"lazy\" referrerpolicy=\"no-referrer\" rel=\"noreferrer\" src=\"http:\/\/4563.org\/wp-content\/uploads\/2020\/12\/20201229_5feba6549fcb4.png\" alt=\"Spark Connector Reader: Principles and Practice\" \/><\/p>\n<p>This article explains how to use Spark Connector to read data from Nebula Graph.<\/p>\n<h2>Introduction to Spark Connector<\/h2>\n<p>Spark Connector is a Spark data connector that lets Spark read from and write to external data systems. It has two parts, the Reader and the Writer. This article focuses on the Spark Connector Reader; the Writer will be covered in detail in the next article.<\/p>\n<h2>How Spark Connector Reader Works<\/h2>\n<p>Spark Connector Reader plugs Nebula Graph into Spark as an extended data source: it reads data from Nebula Graph into a DataFrame, on which you can then run further operations such as map and reduce.<\/p>\n<p>Spark 
SQL lets users define their own data sources, so Spark can be extended with external data sources. Data read through Spark SQL comes back as a DataFrame, a distributed dataset organized into named columns. Spark SQL also provides many APIs for computing on and transforming DataFrames, and the same DataFrame interface works across many kinds of data sources.<\/p>\n<p>The interfaces Spark uses to call external data sources live in the <code>org.apache.spark.sql.sources<\/code> package. Let's first look at the interfaces Spark SQL provides for extending data sources.<\/p>\n<h3>Basic Interfaces<\/h3>\n<ul>\n<li>BaseRelation: represents a collection of tuples with a known schema. Every subclass of BaseRelation must produce its schema as a StructType. In other words, BaseRelation defines how data read from the data source is stored in a Spark SQL DataFrame.<\/li>\n<li>RelationProvider: takes a list of parameters and returns a new BaseRelation built from them.<\/li>\n<li>DataSourceRegister: registers a short name for the data source, so users can refer to it by its custom shortName instead of its fully qualified class name.<\/li>\n<\/ul>\n<h3>Providers<\/h3>\n<ul>\n<li>RelationProvider: creates a custom relation from the specified data source. <code>createRelation()<\/code> 
creates a new relation from the given parameters.<\/li>\n<li>SchemaRelationProvider: creates a new relation from the given parameters and a given schema.<\/li>\n<\/ul>\n<h3>RDD<\/h3>\n<ul>\n<li>RDD[InternalRow]: data scanned from the data source must be assembled into an RDD[Row].<\/li>\n<\/ul>\n<p>To implement a custom external data source for Spark, you implement the relevant methods above for that data source.<\/p>\n<p>In Nebula Graph's Spark Connector, we implemented Nebula Graph as an external data source for Spark SQL, so data can be read with <code>sparkSession.read<\/code>. The class diagram of this implementation is shown below:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" referrerpolicy=\"no-referrer\" rel=\"noreferrer\" src=\"http:\/\/4563.org\/wp-content\/uploads\/2020\/12\/20201229_5feba65997e64.png\" alt=\"Spark Connector Reader: Principles and Practice\" \/><\/p>\n<ol>\n<li>Define the data source NebulaRelationProvider, which extends RelationProvider to build a custom relation and DataSourceRegister to register the external data source.<\/li>\n<li>Define NebulaRelation, which defines the schema of Nebula Graph data and how that data is converted. Its <code>getSchema()<\/code> method connects to the Nebula Graph Meta service to fetch the schema of the configured return fields.<\/li>\n<li>Define NebulaRDD to read Nebula Graph data. 
<code>compute()<\/code> defines how Nebula Graph data is read. It scans the Nebula Graph data and converts each Nebula Graph Row it reads into a Spark InternalRow; each InternalRow forms one row of the RDD and represents one row of Nebula Graph data. Finally, all Nebula Graph data is read partition by partition and assembled into the resulting DataFrame.<\/li>\n<\/ol>\n<h2>Spark Connector Reader in Practice<\/h2>\n<p>The Reader in Spark Connector provides an interface for reading data programmatically. Each read loads the data of one vertex or edge type and returns it as a DataFrame.<\/p>\n<p>To get started, clone the Spark Connector code from GitHub and build it:<\/p>\n<pre><code>git clone -b v1.0 git@github.com:vesoft-inc\/nebula-java.git\ncd nebula-java\/tools\/nebula-spark\nmvn clean compile package install -Dgpg.skip -Dmaven.javadoc.skip=true\n<\/code><\/pre>\n<p>The <code>install<\/code> goal copies the built package into your local Maven repository.<\/p>\n<p>Usage example:<\/p>\n<ol>\n<li>Add the <code>nebula-spark<\/code> dependency to the pom file of your Maven project:<\/li>\n<\/ol>\n<pre><code>&lt;dependency&gt;\n  &lt;groupId&gt;com.vesoft&lt;\/groupId&gt;\n  &lt;artifactId&gt;nebula-spark&lt;\/artifactId&gt;\n  &lt;version&gt;1.1.0&lt;\/version&gt;\n&lt;\/dependency&gt;\n<\/code><\/pre>\n<ol start=\"2\">\n<li>Read Nebula Graph data in your Spark program:<\/li>\n<\/ol>\n<pre><code>import org.apache.spark.sql.{Dataset, Row}\n\/\/ Also import the nebula-spark implicit extension that adds .nebula(...) to spark.read;\n\/\/ spark is an existing SparkSession.\n\n\/\/ Read Nebula Graph 
vertex data\nval vertexDataset: Dataset[Row] =\n  spark.read\n    .nebula(\"127.0.0.1:45500\", \"spaceName\", \"100\")\n    .loadVerticesToDF(\"tag\", \"field1,field2\")\nvertexDataset.show()\n\n\/\/ Read Nebula Graph edge data\nval edgeDataset: Dataset[Row] =\n  spark.read\n    .nebula(\"127.0.0.1:45500\", \"spaceName\", \"100\")\n    .loadEdgesToDF(\"edge\", \"*\")\nedgeDataset.show()\n<\/code><\/pre>\n<p>Configuration:<\/p>\n<ul>\n<li>nebula(address: String, space: String, partitionNum: String)<\/li>\n<\/ul>\n<pre><code>address: one or more addresses, separated by commas, e.g. \"ip1:45500,ip2:45500\"\nspace: the Nebula Graph graph space to read from\npartitionNum: the number of partitions Spark uses when reading from Nebula Graph. Where possible, set it to the partitionNum specified when the space was created in Nebula Graph, so that each Spark partition reads exactly one Nebula Graph part.\n<\/code><\/pre>\n<ul>\n<li>loadVerticesToDF(tag: String, fields: String)<\/li>\n<\/ul>\n<pre><code>tag: the vertex Tag in Nebula Graph\nfields: the fields of that Tag to read, separated by commas. Only the listed fields are read; * reads all fields.\n<\/code><\/pre>\n<ul>\n<li>loadEdgesToDF(edge: String, fields: String)<\/li>\n<\/ul>\n<pre><code>edge: the Edge type in Nebula Graph\nfields: the fields of that Edge to read, separated by commas. Only the listed fields are read; * reads all fields.\n<\/code><\/pre>\n<h2>Other<\/h2>\n<p>Spark Connector Reader GitHub 
source code: https:\/\/github.com\/vesoft-inc\/nebula-java\/tree\/master\/tools\/nebula-spark<\/p>\n<p>Special thanks to \u534a\u4e91\u79d1\u6280 for contributing the Java version of Spark Connector.<\/p>\n<p>Like this article? Give our GitHub repo a star to show your support!<\/p><\/div>\n<div> <\/div>\n<\/p><\/div>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Spark Connector R&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[],"_links":{"self":[{"href":"http:\/\/4563.org\/index.php?rest_route=\/wp\/v2\/posts\/232261"}],"collection":[{"href":"http:\/\/4563.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/4563.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/4563.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/4563.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=232261"}],"version-history":[{"count":0,"href":"http:\/\/4563.org\/index.php?rest_route=\/wp\/v2\/posts\/232261\/revisions"}],"wp:attachment":[{"href":"http:\/\/4563.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=232261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/4563.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=232261"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/4563.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=232261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}