Unable to retrieve a nested geo_shape #1248

BriceLegrand opened this issue Feb 11, 2019 · 9 comments

@BriceLegrand

BriceLegrand commented Feb 11, 2019

What kind of issue is this?

  • [x] Bug report

Issue description

Description
While using PySpark, I can't retrieve a nested geo_shape field. Non-nested geo_shape fields are fine.
I can provide a geo_shape if needed.
Thanks for your help.

Steps to reproduce

Code:

ES_CONF = {
    'es.nodes': "ES_HOST",
    'es.net.http.auth.user': "ES_USER",
    'es.net.http.auth.pass': "ES_PASS",
    'es.read.field.include': 'parent.nestedGeoshape',
}
spark.read.format("org.elasticsearch.spark.sql").options(**ES_CONF).load("es-index").show()

Stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 166, in load
    return self._df(self._jreader.load(path))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o811.load.
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unknown GeoShape [null]
	at org.elasticsearch.hadoop.serialization.dto.mapping.MappingUtils.doParseGeoShapeInfo(MappingUtils.java:210)
	at org.elasticsearch.hadoop.serialization.dto.mapping.MappingUtils.parseGeoInfo(MappingUtils.java:175)
	at org.elasticsearch.hadoop.rest.RestRepository.sampleGeoFields(RestRepository.java:303)
	at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAndGeoFields(SchemaUtils.scala:109)
	at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:91)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:220)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:220)
	at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:224)
	at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:224)
	at scala.Option.getOrElse(Option.scala:121)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:224)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:431)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Version Info

OS          :
JVM         :  1.8.0_181
Hadoop/Spark:  2.3.1
ES-Hadoop   :  6.5.4
ES          :  6.5.4

@kevinfuture

How can this be solved?

@motherhubbard

Seeing this with ES version 7.4.2 and elasticsearch-hadoop-7.13.1.jar

@masseyke
Member

I can reproduce this in the latest code. It looks like the problem is that RestClient.sampleForFields() doesn't do a nested query if the field is nested. And if it did, I don't think that RestRepository.sampleGeoFields() is able to handle a nested field either.
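To make the distinction concrete, here is a rough sketch of a plain `exists` query next to the same check wrapped in a `nested` query, written as Python dicts in Elasticsearch query DSL. It assumes `parent` is mapped with type `nested`, which the thread does not confirm, and it is not the request body that `RestClient.sampleForFields()` actually sends; the helper names are made up for illustration.

```python
import json


def exists_query(field):
    """Plain existence check; it does not look inside `nested` objects."""
    return {"size": 1, "query": {"exists": {"field": field}}}


def nested_exists_query(path, field):
    """Same check wrapped in a `nested` query scoped to `path`."""
    return {
        "size": 1,
        "query": {
            "nested": {
                "path": path,
                "query": {"exists": {"field": field}},
            }
        },
    }


# Field names taken from the report; `parent` being `nested` is an assumption.
plain = exists_query("parent.nestedGeoshape")
nested = nested_exists_query("parent", "parent.nestedGeoshape")

print(json.dumps(plain, indent=2))
print(json.dumps(nested, indent=2))
```

If the plain form returns no hit for a field that lives under a `nested` mapping, the sampler has no document to inspect, which would be consistent with the `Unknown GeoShape [null]` error in the report.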

@motherhubbard

I was hoping I could give it a schema, but I couldn't get that to work either tbh.

@masseyke
Member

Yeah, a schema won't help, unfortunately. We need to update the code. I haven't found a workaround yet.

@motherhubbard

motherhubbard commented Jul 14, 2023

OK, thanks for your efforts, it's appreciated.
The way I'm working around it for now is to include only the fields I need with es.read.field.include and es.read.field.as.array.include, and then hoping to upsert back just the fields I change.

@akk602

akk602 commented Sep 29, 2023

> OK, thanks for your efforts, it's appreciated.
> The way I'm working around it for now is to include only the fields I need with es.read.field.include and es.read.field.as.array.include, and then hoping to upsert back just the fields I change.

Did this workaround work for you?

@motherhubbard

It did.

@akk602

akk602 commented Sep 29, 2023

Can you share an example? I am having issues with this too and can't get it to work properly. Thanks!
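
No example was posted in the thread. As a minimal sketch of what the workaround described above might look like, the helper below builds the read options so that only explicitly listed fields are loaded and the nested geo_shape is left out. The helper name and the field names `id`, `title` and `tags` are hypothetical; only `es.nodes`, `es.read.field.include` and `es.read.field.as.array.include` are settings named in this thread, and whether omitting the geo_shape avoids the error on a given version is an assumption based on the comment that it worked.

```python
def build_read_options(nodes, include_fields, array_fields=()):
    """Build es-hadoop read options that load only the listed fields."""
    options = {
        "es.nodes": nodes,
        # Comma-separated list of the fields to read.
        "es.read.field.include": ",".join(include_fields),
    }
    if array_fields:
        # Fields holding multiple values are declared here so that they are
        # mapped as arrays in the resulting DataFrame schema.
        options["es.read.field.as.array.include"] = ",".join(array_fields)
    return options


# Hypothetical field names; the nested geo_shape is deliberately not listed.
ES_CONF = build_read_options(
    nodes="ES_HOST",
    include_fields=["id", "title", "tags"],
    array_fields=["tags"],
)

# With a live SparkSession named `spark`, the read would mirror the report:
# df = (spark.read.format("org.elasticsearch.spark.sql")
#       .options(**ES_CONF)
#       .load("es-index"))
```

Authentication options such as `es.net.http.auth.user` and `es.net.http.auth.pass` from the original report can be merged into the returned dict in the same way.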
