Shopify’s Leaky Bucket Algorithm Blows
April 26th, 2018
The leaky bucket, for those of you who don’t know, is a metaphor for rate limiting: requests pour into a bucket of fixed capacity, and the bucket leaks (drains) at a steady rate.
For example, the bucket holds 40 litres, and it leaks at the rate of 2 litres per unit of time.
That maps directly onto the limit Shopify imposes: an initial burst of 40 API calls, then 2 per second, and it is what restricts their programmers.
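To make the metaphor concrete, here is a minimal sketch of a leaky bucket limiter in Python with Shopify’s numbers plugged in (capacity 40, leak rate 2 per second). This is my own toy illustration of the general algorithm, not Shopify’s actual implementation; all the names are mine.

```python
import time

class LeakyBucket:
    """Toy leaky-bucket limiter: capacity 40 calls, draining 2 per second.
    Illustrative sketch only; not Shopify's actual code."""

    def __init__(self, capacity=40, leak_rate=2.0):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0                 # how full the bucket is right now
        self.last = time.monotonic()

    def _drain(self):
        # Leak out whatever has drained since the last check.
        now = time.monotonic()
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now

    def allow(self):
        """True if one more call fits in the bucket, else False (HTTP 429 territory)."""
        self._drain()
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

With a fresh bucket, the first 40 calls in a tight loop go through and the 41st is rejected until enough time has passed for the bucket to leak.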
But Jeff, what should the limit be?
API call limits should be infinite, even if by paid means.
Who does it differently?
WordPress… and pretty much everybody else.
So it can be done?
How could WordPress achieve such a feat?
They never put one in.
What do you mean?
Leaky bucket algorithms don’t just fall out of the sky and embed themselves as software limitations at specific points in the code. Somebody had to have put that code there. WordPress didn’t do that.
Why would anybody put such a limitation in place?
No fucking clue.
It must have been necessary, right?
Nothing hardware can’t fix.
But Shopify would be massive and need to scale up, wouldn’t it?
Yes, they would need a scalable model to succeed.
So, what good does a 40 request, 2/sec limit do?
None. No good at all. It’s a terrible idea. I mean, if you can’t scale your hardware, if that is a genuine constraint you have, it makes sense to stretch every byte of bandwidth. But if you can scale, you should scale. Getting backing from the likes of Google or Amazon’s cloud services is not that hard.
How much bandwidth would they need?
Well, put it this way: they already need far more bandwidth for all the products that people are viewing and buying off their websites. Against the thousands of shoppers doing that at all times, one developer running one batch update is negligible.
Yeah, but how much more?
Let’s compare to somebody like Netflix. Way less. Nothing is more bandwidth-intensive than streaming video: 30 frames of video per second, plus sound, continuously. Shopify’s constant usage would be nothing close to that; API calls are small messages of a few KB.
Can you compare the two? One broadcasts and the other system [Shopify] needs to receive.
It’s still bandwidth. You gotta shop around for good bandwidth, but I don’t see that as being an issue, not unless you pay ridiculously more for upstream.
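To put rough numbers on that comparison (all figures here are my own assumptions, not measurements):

```python
# Back-of-envelope bandwidth comparison. Assumed figures:
stream_mbps = 5.0      # a typical HD video stream, per viewer
msg_bytes = 50_000     # ~50 KB product-update payload
calls_per_sec = 2      # Shopify's sustained leak rate

# Megabits per second the API traffic would actually consume.
api_mbps = msg_bytes * 8 * calls_per_sec / 1_000_000
# 0.8 Mbps -- well under a single video stream per viewer.
```

Even sending payloads continuously at the leak rate, the API traffic is a fraction of what one streaming viewer costs Netflix.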
Ok, so they are limiting requests to ridiculously small amounts, and they are doing it for no reason?
Unless I’m wrong.
So what is the solution?
Take it out.
Take out the leaky bucket?
They probably can’t just take it out.
Maybe stuff will break. Ideally not. The person who put it in in the first place would be the best person to ask. The system may fail somewhere else; it may have been the leaky bucket that prevented something from failing in the first place. If that is the case, more things need to be fixed. If more than two systems would break because of their dependence on the leaky bucket, then… wow. That would be pretty sad.
Well, firstly, look at what this restriction does. To update a product I send product data: about 50 KB of data plus a 1 MB image. I can send that to Shopify 40 times, then 2 per second. It takes 20 seconds for my bucket to drain back to 0, so I can run for 40 seconds, break for 20, and start again. That works out to about 118 calls per minute, so updating 5000 products takes 42.37 minutes.
It doesn’t have to be 5000 products; it could be 2500 products twice, or 1000 products 5 times. Updating products should be as close to real-time as possible.
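The arithmetic above can be sketched as a back-of-envelope model: the first 40 calls go through immediately, and the rest drip out at the leak rate. The function name and the figures are mine; this is an idealised drip, which is why it lands just under the ~42 minutes of the stop-start pattern described above.

```python
def time_to_update(n_products, burst=40, rate=2.0):
    """Seconds to push n_products through the leaky bucket:
    the first `burst` calls go immediately, the rest drip at `rate`/sec.
    Back-of-envelope model of the limit described above."""
    if n_products <= burst:
        return 0.0
    return (n_products - burst) / rate

minutes = time_to_update(5000) / 60
# ~41 minutes even with an optimal continuous drip; the
# burst-then-wait pattern in practice lands closer to 42.
```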
Who wants 5000 products on their site?
Well, in the last year, PrimeWalls and DOS Canada, among others. If you combine an Etilize database with distributors like Ingram, TechData, Synnex, SuppliesNetwork, etc., you have excellent, well-organised content for millions of products. Still think it’s a good idea to keep stores super small? Each one of these products could potentially be sold and make you money. Well, actually, they do. But no thanks to you, Shopify. Instead of putting products in your system, I had to use my own system. So the pages you see on DOS are hosted on our servers, transferred back to you, and integrated with your theme and application/liquid systems. The products don’t actually exist on Shopify until somebody adds one to the cart. That is how I deal with your leaky bucket.
So this is something fixable Shopify should do?
Totally. Figure out what/if anything the Leaky Bucket breaks when it is disabled. Then throw hardware at that solution.
Give people better ways to scale programmatically. Give us a batch API, something that doesn’t make us queue up thousands of products.
How do you know you are not wrong?
I might be wrong. But the literature on the internet has led me down this path:
- Google “shopify api bulk update”.
- Click on the first result and see Alex Richter, Developer Experience at Shopify, say it’s not possible.
- Then go to his recommended reading and learn how you can build your own leaky bucket algorithm to mirror Shopify’s most effectively.
Their solution is to tell every developer that uses their API that their options are:
- Build a queue that passes information to them at the rate of 40 calls, then 2 per second, but do it in parallel (which I did).
- Manually upload a CSV of products.
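The first option, a paced queue, can be sketched like this in Python. It’s my own illustration, not Shopify’s recommended code: a pool of workers drains a job queue in parallel while a shared timestamp keeps the whole pool at the 2-calls-per-second leak rate (the initial 40-call burst allowance is ignored for simplicity). `send` stands in for whatever function pushes one product update to the API.

```python
import queue
import threading
import time

def run_throttled_queue(jobs, send, rate=2.0, workers=4):
    """Drain `jobs` through `send` in parallel without exceeding
    `rate` calls/sec. A shared lock + timestamp paces all workers."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    lock = threading.Lock()
    next_slot = [time.monotonic()]   # earliest time the next call may fire

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return
            with lock:               # reserve the next send slot
                now = time.monotonic()
                wait = max(0.0, next_slot[0] - now)
                next_slot[0] = max(now, next_slot[0]) + 1.0 / rate
            time.sleep(wait)         # hold until our slot arrives
            send(job)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

At `rate=2.0` this paces any number of workers to one call every half second, which is exactly the drip the bucket allows once the burst is spent.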
From a developer’s standpoint, I would be happy just being able to upload a CSV programmatically. The constraint of only being able to update a limited number of customers, products, pages, collections, whatever, makes working with Shopify feel like working on dial-up. A bulk-import API call would also work. But really, I think just removing the limitation would be the best idea. I have considered that the bucket size and leak rate are doubled for Shopify Plus stores. But doubling a penny only makes it 2 cents.
I would love to work with Shopify to resolve this issue. I would be a happy beta tester. When I process this information outside of Shopify, on Linux-based systems, I can process about 250,000 products/min. Not image cropping/scaling, but stuff like price adjustments, inventory levels, etc. What I could transfer to Shopify would be more limited by bandwidth. But Shopify should be built to handle what I can throw at it in terms of product updates. After all, prices, customer lists, etc. don’t need to be updated that often. A base package of $30/month that you collect for hosting a website would purchase about 6 CPUs of dedicated power per month. Can you really not spare us 1 CPU for 5 minutes out of the roughly 259,200 CPU-minutes that money could buy?
I think the best thing would be to have a common JSON database, like Firebase, that developers could gain store-level access to. We essentially have this level of access already, being able to write products to Shopify. We just need programmatic access to it without any restrictions.
jsturgis, April 26th, 2018

ABOUT THE AUTHOR:
Please excuse my lack of attention to spell checking and grammar. I was probably busy thinking about better variable names. I am a Model 3 owner and lover and TSLA long only ever looking to increase position as the exponential growth in TSLA revenue will inevitably dominate the failing FUD. Sending a dumb tweet is easy. Building a Gigafactory are not. (See what I mean about the grammar?) Place your bets accordingly.