Download Larger Datasets

Downloading a large number of records cannot be done with a simple Elasticsearch search, because Elasticsearch's paging limit is 10,000 records. Instead, you need to use Elasticsearch's scroll mechanism. An example of such a curl request would be (note: for normal queries you would not add ?scroll=1s to the end of the URL):

curl --location 'https://api.scicrunch.io/elastic/v1/<<YOUR TARGET INDEX>>/_search?scroll=1s' \
--header 'Content-Type: application/json' \
--header 'apikey: <<YOUR API KEY>>' \
--data '{ <<YOUR QUERY HERE>> }'

Note: The above example would download an entire index. To download specific content you would add a query component to the initial scroll request.
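For example, a sketch of an initial scroll request restricted to records matching a term; the field name and search term here are placeholders to replace with your own, and the "match" query is just one of the Elasticsearch query types you could use:

curl --location 'https://api.scicrunch.io/elastic/v1/<<YOUR TARGET INDEX>>/_search?scroll=1s' \
--header 'Content-Type: application/json' \
--header 'apikey: <<YOUR API KEY>>' \
--data '{ "query": { "match": { "<<FIELD NAME>>": "<<SEARCH TERM>>" } } }'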

This would yield results like:

{
    "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoAgAAAAAF-0fRFkhYNzFBb1loU1dLbVpicWpmSTl3b1EAAAAABftH0hZIWDcxQW9ZaFNXS21aYnFqZkk5d29R",
    "took": 19,
    "timed_out": false,
    "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 },
    "hits": { ...

You would then use the _scroll_id for the next request:

curl --location 'https://api.scicrunch.io/elastic/v1/_search/scroll' \
--header 'Content-Type: application/json' \
--header 'apikey: <<YOUR API KEY>>' \
--data '{
    "scroll" : "1s",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAgAAAAAF-0fRFkhYNzFBb1loU1dLbVpicWpmSTl3b1EAAAAABftH0hZIWDcxQW9ZaFNXS21aYnFqZkk5d29R"
}'

Then take the _scroll_id from those results and iterate until a request returns no more hits.
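Putting the two requests together, a minimal shell sketch of the full download loop might look like the following. It assumes curl and jq are installed; the match_all query, the page size of 1000, the 1m keep-alive (longer than the 1s shown above, to leave time to process each page), and the records.ndjson output file are illustrative choices, not values required by the API.

#!/usr/bin/env bash
# Minimal sketch: download an entire index page by page via the scroll API.
# <<YOUR TARGET INDEX>> and <<YOUR API KEY>> are placeholders to replace.

BASE='https://api.scicrunch.io/elastic/v1'
INDEX='<<YOUR TARGET INDEX>>'
APIKEY='<<YOUR API KEY>>'

# Initial search opens the scroll context. A 1m keep-alive (rather than the
# 1s shown above) leaves time to process each page before the context expires.
RESPONSE=$(curl --silent --location "$BASE/$INDEX/_search?scroll=1m" \
  --header 'Content-Type: application/json' \
  --header "apikey: $APIKEY" \
  --data '{ "size": 1000, "query": { "match_all": {} } }')

while true; do
  HITS=$(echo "$RESPONSE" | jq '.hits.hits | length')
  if [ "$HITS" -eq 0 ]; then
    break                                      # no more records: done
  fi

  # Append this page of hits (one JSON record per line) to the output file.
  echo "$RESPONSE" | jq -c '.hits.hits[]' >> records.ndjson

  # Fetch the next page using the scroll_id returned by the previous call.
  SCROLL_ID=$(echo "$RESPONSE" | jq -r '._scroll_id')
  RESPONSE=$(curl --silent --location "$BASE/_search/scroll" \
    --header 'Content-Type: application/json' \
    --header "apikey: $APIKEY" \
    --data "{ \"scroll\": \"1m\", \"scroll_id\": \"$SCROLL_ID\" }")
done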
