[$40] Include "id" inside document "body" during indexing #20

Closed
maxceem opened this issue Feb 3, 2021 · 12 comments

@maxceem
Contributor

maxceem commented Feb 3, 2021

I've noticed that when indexing documents, the current implementation always omits "id" from the document source; see the code at https://github.com/topcoder-platform/taas-es-processor/blob/dev/src/services/JobProcessorService.js#L54

await esClient.createExtra({
    index: config.get('esConfig.ES_INDEX_JOB'),
    id: job.id,
    transactionId,
    body: _.omit(job, 'id'), // <---------------- omit "id" here
    refresh: constants.esRefreshOption
})

I haven't seen this approach in other Topcoder ES processors, and I don't know of any benefit to it.

@imcaizheng as I understand it, you implemented the initial code for this processor in the challenge https://www.topcoder.com/challenges/a7bc3928-5260-436e-af87-53adfa4c248f.
Could you please let me know whether this was done on purpose, or whether we can update the processor code to keep the "id" inside the body?

There is no particular need to have "id" in the body either, though omitting it feels a bit inconsistent: we define "id" in the mapping but don't populate it, so it is empty if we look at the data in ES:

[image]

@maxceem maxceem added the enhancement New feature or request label Feb 3, 2021
@imcaizheng
Contributor

@maxceem The code exists in the original taas-apis before it was moved to the ES processor. I also agree that the id field should be populated like other fields so that it looks consistent.

@maxceem maxceem changed the title Include "id" inside document "body" during indexing [$40] Include "id" inside document "body" during indexing Feb 3, 2021
@maxceem
Contributor Author

maxceem commented Feb 3, 2021

@imcaizheng ok, let's do this. This is ready for pickup.

  1. For all documents, keep id inside the body when indexing or updating; don't omit it.
  2. We've also recently added commands to reindex documents in TaaS API, and they follow the same pattern when indexing documents: https://github.com/topcoder-platform/taas-apis/blob/dev/src/common/helper.js#L279. So we have to include id there as well. Note that BULK indexing in TaaS API already appears to include id.
  3. The fix itself looks trivial, but could you please verify that after this change data is indexed and returned successfully? Do the same for reindexing data by id using the new commands npm run index:jobs <id>, npm run index:job-candidates <id> and npm run index:resource-bookings <id>.
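
A minimal sketch of the change described in the list above, assuming the indexing request is built by a small helper. `buildCreateRequest` is a hypothetical name used only for illustration, not a function in taas-es-processor, and the `transactionId` and `refresh` parameters from the original call are left out for brevity.

```javascript
// Hypothetical helper (not part of taas-es-processor): builds the parameters
// for indexing one document. The id is used as the ES "_id" and is ALSO kept
// inside the body, so "_source.id" ends up populated.
function buildCreateRequest (index, doc) {
  return {
    index,
    id: doc.id,
    body: { ...doc } // previously: _.omit(doc, 'id')
  }
}

// Usage with a trimmed job payload
const request = buildCreateRequest('job', {
  id: 'e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4',
  title: 'Dummy title - at most 64 characters',
  status: 'sourcing'
})
console.log(request.body.id === request.id) // prints true
```

The body is a shallow copy, so the caller's object is not mutated if the request is modified later.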

@maxceem
Contributor Author

maxceem commented Feb 3, 2021

Challenge https://www.topcoder.com/challenges/647d40c1-9315-45a5-ba1f-94edd58a16c3 has been created for this ticket.

This is an automated message for maxceem via Topcoder X

@imcaizheng imcaizheng self-assigned this Feb 4, 2021
@maxceem
Contributor Author

maxceem commented Feb 4, 2021

Challenge https://www.topcoder.com/challenges/647d40c1-9315-45a5-ba1f-94edd58a16c3 has been assigned to aaron2017.

This is an automated message for maxceem via Topcoder X

@imcaizheng
Contributor

PR created #21 and topcoder-platform/taas-apis#131

@imcaizheng
Contributor

Verify whether taas-es-processor works properly after changes

  1. Keep TaaS API and all dependent services (except taas-es-processor) running locally, then start taas-es-processor.

  2. Fire the create job with booking manager request in Postman. After a response is successfully returned, check inside ES:

    > curl http://localhost:9200/job/_search
    (output below is pretty formatted)
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "job",
            "_type": "_doc",
            "_id": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
            "_score": 1.0,
            "_source": {
              "projectId": 111,
              "externalId": "1212",
              "description": "Dummy Description",
              "startDate": "2020-09-27T04:17:23.131Z",
              "endDate": "2020-09-27T04:17:23.131Z",
              "numPositions": 1,
              "resourceType": "Dummy Resource Type",
              "rateType": "hourly",
              "workload": "full-time",
              "skills": [
                "23e00d92-207a-4b5b-b3c9-4c5662644941",
                "7d076384-ccf6-4e43-a45d-1b24b1e624aa",
                "cbac57a3-7180-4316-8769-73af64893158",
                "a2b4bc11-c641-4a19-9eb7-33980378f82e"
              ],
              "title": "Dummy title - at most 64 characters",
              "id": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
              "createdAt": "2021-02-04T06:30:53.488Z",
              "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
              "status": "sourcing"
            }
          }
        ]
      }
    }

    id is included at hits.hits[0]._source.id.

  3. Fire the put job with booking manager request in Postman. After a response is successfully returned, check inside ES:

    > curl http://localhost:9200/job/_search
    (output below is pretty formatted)
    {
      "took": 0,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "job",
            "_type": "_doc",
            "_id": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
            "_score": 1.0,
            "_source": {
              "projectId": 111,
              "externalId": "1212",
              "description": "Dummy Description",
              "startDate": "2020-09-27T04:17:23.131Z",
              "endDate": "2020-09-27T04:17:23.131Z",
              "numPositions": 13,
              "resourceType": "Dummy Resource Type",
              "rateType": "hourly",
              "workload": "fractional",
              "skills": [
                "cbac57a3-7180-4316-8769-73af64893158",
                "a2b4bc11-c641-4a19-9eb7-33980378f82e"
              ],
              "title": "Dummy title - at most 64 characters",
              "id": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
              "createdAt": "2021-02-04T06:30:53.488Z",
              "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
              "status": "sourcing",
              "updatedBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
              "updatedAt": "2021-02-04T06:31:51.558Z"
            }
          }
        ]
      }
    }

    id inside the _source property remains unchanged.

  4. The verification steps shown above are the same for the JobCandidate and ResourceBooking services:

> curl http://localhost:9200/job_candidate/_search # after `create job candidate with booking manager`
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "job_candidate",
        "_type": "_doc",
        "_id": "00ec7211-bfa8-4e9d-a2cf-70e3a376689e",
        "_score": 1.0,
        "_source": {
          "jobId": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
          "userId": "a55fe1bc-1754-45fa-9adc-cf3d6d7c377a",
          "externalId": "300234321",
          "resume": "http://example.com",
          "id": "00ec7211-bfa8-4e9d-a2cf-70e3a376689e",
          "createdAt": "2021-02-04T06:56:47.727Z",
          "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
          "status": "open"
        }
      }
    ]
  }
}


> curl http://localhost:9200/job_candidate/_search # after `put job candidate with booking manager`
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "job_candidate",
        "_type": "_doc",
        "_id": "00ec7211-bfa8-4e9d-a2cf-70e3a376689e",
        "_score": 1.0,
        "_source": {
          "jobId": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
          "userId": "a55fe1bc-1754-45fa-9adc-cf3d6d7c377a",
          "externalId": "300234321",
          "resume": "http://example.com",
          "id": "00ec7211-bfa8-4e9d-a2cf-70e3a376689e",
          "createdAt": "2021-02-04T06:56:47.727Z",
          "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
          "status": "selected",
          "updatedBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
          "updatedAt": "2021-02-04T07:00:20.797Z"
        }
      }
    ]
  }
}


> curl http://localhost:9200/resource_booking/_search # after `create resource booking with booking manager`
{
  "_shards": {
    "failed": 0,
    "skipped": 0,
    "successful": 1,
    "total": 1
  },
  "hits": {
    "hits": [
      {
        "_id": "566fc543-d70e-4edc-967b-c276031cd069",
        "_index": "resource_booking",
        "_score": 1.0,
        "_source": {
          "createdAt": "2021-02-04T07:04:42.256Z",
          "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
          "customerRate": 13,
          "endDate": "2020-09-27T04:17:23.131Z",
          "id": "566fc543-d70e-4edc-967b-c276031cd069",
          "jobId": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
          "memberRate": 13.23,
          "projectId": 111,
          "rateType": "hourly",
          "startDate": "2020-09-27T04:17:23.131Z",
          "status": "assigned",
          "userId": "a55fe1bc-1754-45fa-9adc-cf3d6d7c377a"
        },
        "_type": "_doc"
      }
    ],
    "max_score": 1.0,
    "total": {
      "relation": "eq",
      "value": 1
    }
  },
  "timed_out": false,
  "took": 1
}


> curl http://localhost:9200/resource_booking/_search # after `put resource booking with booking manager`
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "resource_booking",
        "_type": "_doc",
        "_id": "566fc543-d70e-4edc-967b-c276031cd069",
        "_score": 1.0,
        "_source": {
          "projectId": 111,
          "userId": "a55fe1bc-1754-45fa-9adc-cf3d6d7c377a",
          "jobId": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
          "startDate": "2020-09-27T04:17:23.131Z",
          "endDate": "2020-09-27T04:17:23.131Z",
          "memberRate": 13.23,
          "customerRate": 13,
          "status": "assigned",
          "rateType": "hourly",
          "id": "566fc543-d70e-4edc-967b-c276031cd069",
          "createdAt": "2021-02-04T07:04:42.256Z",
          "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
          "updatedBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
          "updatedAt": "2021-02-04T07:06:29.186Z"
        }
      }
    ]
  }
}
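
The manual checks above could also be scripted. The following is only a sketch, assuming the raw JSON returned by the `_search` endpoint (as in the curl output) has been parsed into an object; `findHitsWithBadId` is a hypothetical helper name, not something in either repository.

```javascript
// Hypothetical helper: returns the hits whose "_source.id" is missing
// or does not match the document "_id".
function findHitsWithBadId (searchResponse) {
  const hits = (searchResponse.hits && searchResponse.hits.hits) || []
  return hits.filter(hit => !hit._source || hit._source.id !== hit._id)
}

// Trimmed sample shaped like the job_candidate output above
const sample = {
  hits: {
    hits: [
      {
        _index: 'job_candidate',
        _id: '00ec7211-bfa8-4e9d-a2cf-70e3a376689e',
        _source: {
          id: '00ec7211-bfa8-4e9d-a2cf-70e3a376689e',
          status: 'open'
        }
      }
    ]
  }
}

console.log(findHitsWithBadId(sample).length) // prints 0
```

An empty result means every returned document carries its id inside `_source`.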

Verify whether the index scripts of taas-api include id inside the body of a document

  1. Continuing from the previous section, I ran the following commands to index data from the DB into ES:

    npm run delete-index
    npm run index:jobs <job_id_previous_created>
    npm run index:job-candidates <job_candidate_id_previous_created>
    npm run index:resource-bookings <resource_booking_id_previous_created>

  2. Check inside ES: for each of the job, job_candidate and resource_booking indices, id is populated along with the other properties.

  3. Check with Postman:

    # Fire request `get job with booking manager` in Postman:
    {
        "id": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
        "projectId": 111,
        "externalId": "1212",
        "description": "Dummy Description",
        "title": "Dummy title - at most 64 characters",
        "startDate": "2020-09-27T04:17:23.131Z",
        "endDate": "2020-09-27T04:17:23.131Z",
        "numPositions": 13,
        "resourceType": "Dummy Resource Type",
        "rateType": "hourly",
        "workload": "fractional",
        "skills": [
            "cbac57a3-7180-4316-8769-73af64893158",
            "a2b4bc11-c641-4a19-9eb7-33980378f82e"
        ],
        "status": "in-review",
        "createdAt": "2021-02-04T06:30:53.488Z",
        "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
        "updatedAt": "2021-02-04T06:56:49.082Z",
        "updatedBy": "00000000-0000-0000-0000-000000000000"
    }
    
    
    # Fire request `get job candidate with booking manager` in Postman:
    {
        "id": "00ec7211-bfa8-4e9d-a2cf-70e3a376689e",
        "jobId": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
        "userId": "a55fe1bc-1754-45fa-9adc-cf3d6d7c377a",
        "status": "selected",
        "externalId": "300234321",
        "resume": "http://example.com",
        "createdAt": "2021-02-04T06:56:47.727Z",
        "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
        "updatedAt": "2021-02-04T07:00:20.797Z",
        "updatedBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c"
    }
    
    
    # Fire request `get resource booking with booking manager` in Postman:
    {
        "id": "566fc543-d70e-4edc-967b-c276031cd069",
        "projectId": 111,
        "userId": "a55fe1bc-1754-45fa-9adc-cf3d6d7c377a",
        "jobId": "e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4",
        "status": "assigned",
        "startDate": "2020-09-27T04:17:23.131Z",
        "endDate": "2020-09-27T04:17:23.131Z",
        "memberRate": 13.23,
        "customerRate": 13,
        "rateType": "hourly",
        "createdAt": "2021-02-04T07:04:42.256Z",
        "createdBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c",
        "updatedAt": "2021-02-04T07:06:29.186Z",
        "updatedBy": "57646ff9-1cd3-4d3c-88ba-eb09a395366c"
    }
    

@imcaizheng
Contributor

@maxceem There are a few other issues I just found:

  1. The bulk index script (npm run index:all) copies the deletedAt field from the DB into ES; it should be excluded.
  2. When I ran get job with booking manager in Postman, I didn't see any candidates populated, although there was a candidate associated with the job. I don't know whether this is an issue. I will look into it later.
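
For the first point, here is a rough sketch of what excluding the field could look like. I have not checked how the actual index:all script builds its request, so `buildBulkBody` and the record shape are assumptions for illustration; only the alternating action/document layout is the standard ES bulk format.

```javascript
// Hypothetical helper: builds a body for the ES bulk API, i.e. an action line
// followed by the document for each record. "deletedAt" is dropped, while
// "id" stays inside the document.
function buildBulkBody (index, records) {
  const body = []
  for (const record of records) {
    // plain-JS equivalent of _.omit(record, 'deletedAt')
    const { deletedAt, ...doc } = record
    body.push({ index: { _index: index, _id: doc.id } })
    body.push(doc)
  }
  return body
}

// Usage with one trimmed record
const bulkBody = buildBulkBody('job', [
  { id: 'e68e42fe-aff6-4c8d-9ff8-7578a26e6ad4', status: 'sourcing', deletedAt: null }
])
console.log(bulkBody.length) // prints 2
```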

@maxceem
Contributor Author

maxceem commented Feb 4, 2021

@imcaizheng Good catch regarding the index script. I did more debugging there and found some issues with soft-deleting; we will handle them separately.

Regarding the 2nd one, I couldn't reproduce it.

The current issue is done, so I'm closing it.

@maxceem maxceem closed this as completed Feb 4, 2021
@maxceem
Contributor Author

maxceem commented Feb 4, 2021

This ticket was not processed for payment. If you would like to process it for payment, please reopen it, add the tcx_FixAccepted label, and then close it again

This is an automated message for maxceem via Topcoder X

@maxceem
Contributor Author

maxceem commented Feb 4, 2021

Challenge https://www.topcoder.com/challenges/765d5478-361f-49e5-8027-6c2c975ed330 has been created for this ticket.

This is an automated message for maxceem via Topcoder X

@maxceem maxceem closed this as completed Feb 4, 2021
@maxceem
Contributor Author

maxceem commented Feb 4, 2021

Payment task has been updated: https://www.topcoder.com/challenges/765d5478-361f-49e5-8027-6c2c975ed330
Payments Complete
Winner: aaron2017
Copilot: maxceem
Challenge 765d5478-361f-49e5-8027-6c2c975ed330 has been paid and closed.

This is an automated message for maxceem via Topcoder X

@imcaizheng
Contributor

@maxceem I cannot reproduce the second issue now.
