Load an S3 file into a DB without downloading it locally (pandas)

Reports in the comments were mixed. The approach works great for most readers, but one had boto installed and importing fine, along with pandas, and still got an error on the latest pandas release. Another asked whether there is a way to convert the object to StringIO; using IgorK's example, that would mean fetching the object with s3.get_object and wrapping its body in an in-memory buffer. The answer was later updated for newer pandas releases. One reader, at a loss for what to do until seeing this answer, loved that a difficult problem could be solved with about 12 characters.
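
A minimal sketch of the direct-read approach, assuming a recent pandas with the s3fs package installed (the bucket and key below are placeholders, and AWS credentials are expected to be configured already):

```python
import pandas as pd

# pandas hands s3:// URLs to s3fs under the hood, so the file is streamed
# from S3 rather than downloaded to disk first. Bucket and key are placeholders.
df = pd.read_csv("s3://my-bucket/path/to/file.csv")
print(df.head())
```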

Another answer dealt with the error NotImplementedError: Support for generic buffers has not been implemented, which some readers hit when the raw response body from S3 is passed to pandas directly.
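
A sketch of the buffer-based workaround under the same assumptions (boto3 configured, placeholder bucket and key): read the object's bytes into memory and give pandas a real in-memory buffer instead of the streaming response.

```python
import io
import boto3
import pandas as pd

s3 = boto3.client("s3")  # uses the default AWS credential chain
obj = s3.get_object(Bucket="my-bucket", Key="path/to/file.csv")  # placeholders

# obj["Body"] is a streaming response object; wrapping its bytes in BytesIO
# (or, equivalently, decoding them and using io.StringIO) gives pandas a
# seekable buffer it knows how to parse.
df = pd.read_csv(io.BytesIO(obj["Body"].read()))
```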


While working with the University of Toronto Data Science Team on Kaggle competitions, there was only so much we could do on our local computers, so we chose AWS for its ubiquity and familiarity. To prepare the data pipeline, I downloaded the data from Kaggle onto an EC2 instance, unzipped it, and stored it on S3. Storing the unzipped data saves you from having to unzip it every time you want to use it, which takes a considerable amount of time.

However, it also increases the size of the data substantially and, as a result, incurs higher storage costs. With the data on AWS, the question became: how do we programmatically access the S3 data and incorporate it into our workflow? The following details how to do so in Python. First, you must set up your security credentials.
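
As a rough sketch (the keys and region below are placeholders, not values from the original post), boto3 can pick up credentials from the shared credentials file or environment variables, or you can pass them to a Session explicitly:

```python
import boto3

# Option 1: rely on boto3's default credential chain, e.g. the shared
# credentials file at ~/.aws/credentials:
#
#   [default]
#   aws_access_key_id = YOUR_ACCESS_KEY_ID
#   aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
#
# Option 2: pass credentials to a Session explicitly (placeholders shown;
# avoid hardcoding real keys in source code). The region is an assumption.
session = boto3.Session(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="us-east-1",
)
s3 = session.client("s3")
```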

See the boto3 Quickstart for more detail. There are two main tools you can use to access S3 from boto3: clients and resources.
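
A sketch of both, using placeholder bucket and key names: the client is a thin wrapper over the S3 REST API, while the resource exposes a more object-oriented interface; either can feed bytes straight into pandas without touching local disk.

```python
import io
import boto3
import pandas as pd

BUCKET = "my-kaggle-bucket"   # placeholder bucket name
KEY = "train/train.csv"       # placeholder object key

# Low-level client: mirrors the S3 REST API call by call.
client = boto3.client("s3")
obj = client.get_object(Bucket=BUCKET, Key=KEY)
df_from_client = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Higher-level resource: object-oriented view of buckets and objects.
s3 = boto3.resource("s3")
body = s3.Object(BUCKET, KEY).get()["Body"].read()
df_from_resource = pd.read_csv(io.BytesIO(body))
```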


