Переходьте в офлайн за допомогою програми Player FM !
418: Mental Models For Reduce Functions
Manage episode 405955137 series 1401614
Joël talks about his difficulties optimizing queries in ActiveRecord, especially with complex scopes and unions, resulting in slow queries. He emphasizes the importance of optimizing subqueries in unions to boost performance despite challenges such as query duplication and difficulty reusing scopes. Stephanie discusses upgrading a client's app to Rails 7, highlighting the importance of patience, detailed attention, and the benefits of collaborative work with a fellow developer.
The conversation shifts to Ruby's reduce method (inject), exploring its complexity and various mental models to understand it. Joël and Stephanie discuss when it's preferable to use reduce over other methods like each, map, or loops and the importance of understanding the underlying operation you wish to apply to two elements before scaling up with reduce. The episode also touches on monoids and how they relate to reduce, suggesting that a deep understanding of functional programming concepts can help simplify reduce expressions.
- Rails 7 EXPLAIN ANALYZE
- Blocks, symbol to proc, and symbols arguments for reduce
- Ruby tally
- Performance considerations for reduce in JavaScript
- Persistant data structures
- Avoid passing a block to map and reduce
- Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire
- monoids
- iteration anti-patterns
- Joël’s talk on “constructor replacement”
Transcript:
STEPHANIE: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Stephanie Minn.
JOËL: And I'm Joël Quenneville. And together, we're here to share a bit of what we've learned along the way.
STEPHANIE: So, Joël, what's new in your world?
JOËL: I've been doing a bunch of fiddling with query optimization this week, and I've sort of run across an interesting...but maybe it's more of an interesting realization because it's interesting in the sort of annoying way. And that is that, using ActiveRecord scopes with certain more complex query pieces, particularly unions, can lead to queries that are really slow, and you have to rewrite them differently in a way that's not reusable in order to make them fast.
In particular, if you have sort of two other scopes that involve joins and then you combine them using a union, you're unioning two sort of joins. Later on, you want to change some other scope that does some wares or something like that. That can end up being really expensive, particularly if some of the underlying tables being joined are huge. Because your database, in my case, Postgres, will pull a lot of this data into the giant sort of in-memory table as it's, like, building all these things together and to filter them out. And it doesn't have the ability to optimize the way it would on a more traditional relation.
A solution to this is to make sure that the sort of subqueries that are getting unioned are optimized individually. And that can mean moving conditions that are outside the union inside. So, if I'm chaining, I don't know, where active is true on the outer query; on the union itself, I might need to move that inside each of the subqueries. So, now, in the two or three subqueries that I'm unioning, each of them needs to have a 'where active true' chained on it.
STEPHANIE: Interesting. I have heard this about using ActiveRecord scopes before, that if the scopes are quite complex, chaining them might not lead to the most performant query. That is interesting. By optimizing the subqueries, did you kind of change the meaning of them? Was that something that ended up happening?
JOËL: So, the annoying thing is that I have a scope that has the union in it, and it does some things sort of on its own. And it's used in some places. There are also other places that will try to take that scope that has the union on it, chain some other scopes that do other joins and some more filters, and that is horribly inefficient. So, I need to sort of rewrite the sort of subqueries that get union to include all these new conditions that only happen in this one use case and not in the, like, three or four others that rely on that union.
So, now I end up with some, like, awkward query duplication in different call sites that I'm not super comfortable about, but, unfortunately, I've not found a good way to make this sort of nicely reusable. Because when you want to chain sort of more things onto the union, you need to shove them in, and there's no clean way of doing that.
STEPHANIE: Yeah. I think another way I've seen this resolved is just writing it in SQL if it's really complex and it becoming just a bespoke query. We're no longer trying to use the scope that could be reusable.
JOËL: Right. Right. In this case, I guess, I'm, like, halfway in between in that I'm using the ActiveRecord DSL, but I am not reusing scopes and things. So, I sort of have the, I don't know, naive union implementation that can be fine in all of the simpler use cases that are using it. And then the query that tries to combine the union with some other fancy stuff it just gets its own separate implementation different than the others that it has optimized. So, there are sort of two separate paths, two separate implementations. I did not drop down to writing raw SQL because I could use the ActiveRecord DSL. So, that's what I've been working with.
What's new in your world this week?
STEPHANIE: So, a couple of weeks ago, I think, I mentioned that I was working on a Rails 7 upgrade, and we have gotten it out the door. So, now the client application I'm working on is on Rails 7, which is exciting for the team. But in an effort to make the upgrade as incremental as possible, we did, like, back out of a few of the new application config changes that would have led us down a path of more work. And now we're kind of following up a little bit to try to turn some of those configs on to enable them.
And it was very exciting to kind of, like, officially be on Rails 7. But I do feel like we tried to go for, like, the minimal amount of work possible in that initial big change. And now we're having to kind of backfill a little bit on some of the work that was a little bit more like, oh, I'm not really sure, like, how big this will end up being.
And it's been really interesting work, I think, because it requires, like, two different mindsets. Like, one of them is being really patient and focused on tedious work. Like, okay, what happens when we enable this config option? Like, what changes? What errors do we see? And then having to turn it back off and then go in and fix them.
But then another, I think, like, headspace that we have to be in is making decisions about what to do when we come to a crossroads around, like, okay, now that we are starting to see all the changes that are coming about from enabling this config, is this even what we want to do? And it can be really hard to switch between those two modes of thinking.
JOËL: Yeah. How do you try to balance between the two?
STEPHANIE: So, I luckily have been pairing with another dev, and I've actually found that to be really effective because he has, I guess, just, like, a little bit more of that patience to do the more tedious, mundane [laughs] aspects of, like, driving the code changes. And I have been riding along.
But then I can sense, like, once he gets to the point of like, "Oh, I'm not sure if we should keep going down this road," I can step in a little bit more and be like, "Okay, like, you know, I've seen us do this, like, five times now, and maybe we don't want to do that." Or maybe being like, "Okay, we don't have a really clear answer, but, like, who can we talk to to find out a little bit more or get their input?"
And that's been working really well for me because I've not had a lot of energy to do more of that, like, more manual or tedious labor [chuckles] that comes with working on that low level of stuff. So yeah, I've just been pleasantly surprised by how well we are aligning our superpowers.
JOËL: To use some classic business speech, how does it feel to be in the future on Rails 7?
STEPHANIE: Well, we're not quite up, you know, up to modern days yet, but it does feel like we're getting close. And, like, I think now we're starting to entertain the idea of, like, hmm, like, could we be even on main? I don't think it's really going to happen, but it feels a little bit more possible. And, in general, like, the team thinks that that could be, like, really exciting. Or it's easier, I think, once you're a little bit more on top of it. Like, the worst is when you get quite behind, and you end up just feeling like you're constantly playing catch up. It just feels a little bit more manageable now, which is good.
JOËL: I learned this week a fun fact about Rails 7.1, in particular, which is that the analyze method on ActiveRecord queries, which allowed you to sort of get SQL EXPLAIN statements, now has the ability to pass in a couple of extra parameters. So, there are symbols, and you can pass in things like analyze or verbose, which allows you to get sort of more data out of your EXPLAIN query, which can be quite nice when you're debugging for performance.
So, if you're in the future and you're on Rails 7.1 and you want sort of the in-depth query plans, you don't need to copy the SQL into a Postgres console to get access to the sort of fully developed EXPLAIN plan. You can now do it by passing arguments to EXPLAIN, which I'm very happy for.
STEPHANIE: That's really nice.
JOËL: So, we've mentioned before that we have a developers' channel on Slack here at thoughtbot, and there's all sorts of fun conversations that happen there. And there was one recently that really got me interested, where people were talking about Ruby's reduce method, also known as inject. And it's one of those methods that's kind of complicated, or it can be really confusing.
And there was a whole thread where people were talking about different mental models that they had around the reduce method and how they sort of understand the way it works. And I'd be curious to sort of dig into each other's mental models of that today. To kick us off, like, how comfortable do you feel with Ruby's reduce method? And do you have any mental models to kind of hold it in your head?
STEPHANIE: Yeah, I think reduce is so hard to wrap your head around, or it might be one of the most difficult, I guess, like, functions a new developer encounters, you know, in trying to understand the tools available to them. I always have to look up the order of the arguments [laughs] for reduce.
JOËL: Every time.
STEPHANIE: Yep. But I feel like I finally have a more intuitive sense of when to use it. And my mental model for it is collapsing a collection into one value, and, actually, that's why I prefer calling it reduce rather than the inject alias because reduce kind of signals to me this idea of going from many things to one canonical thing, I suppose.
JOËL: Yeah, that's a very common use case for reducing, and I guess the name itself, reducing, kind of has almost that connotation. You're taking many things, and you're going to reduce that down to a single thing.
STEPHANIE: What was really interesting to me about that conversation was that some people kind of had the opposite mental model where it made a bit more sense for them to think about injecting and, specifically, like, the idea of the accumulator being injected with values, I suppose. And I kind of realized that, in some ways, they're kind of antonyms [chuckles] a little bit because if you're focused on the accumulator, you're kind of thinking about something getting bigger. And that kind of blew my mind a little bit when I realized that, in some ways, they can be considered opposites.
JOËL: That's really fascinating. It is really interesting, I think, the way that we can take the name of a method and then almost, like, tell ourselves a story about what it does that then becomes our way of remembering how this method works. And the story we tell for the same method name, or in this case, maybe there's a few different method names that are aliases, can be different from person to person.
I know I tend to think of inject less in terms of injecting things into the accumulator and more in terms of injecting some kind of operator between every item in the collection. So, if we have an array of numbers and we're injecting plus, in my mind, I'm like, oh yeah, in between each of the numbers in the collection, just inject a little plus sign, and then do the math. We're summing all the items in the collection.
STEPHANIE: Does that still hold up when the operator becomes a little more complex than just, you know, like, a mathematical operator, like, say, a function?
JOËL: Well, when you start passing a block and doing custom logic, no, that mental model kind of falls apart. In order for it to work, it also has to be something that you can visualize as some form of infix operator, something that goes between two values rather than, like, a method name, which is typically in prefix position.
I do want to get at this idea, though: the difference between sort of the block version versus passing. There are ways where you can just do a symbol, and that will call a method on each of the items. Because I have a bit of a hot take when it comes to writing reduce blocks or inject blocks that are more accessible, easier to understand.
And that is, generally, that you shouldn't, or more specifically, you should not have a big block body. In general, you should be either using the symbol version or just calling a method within the block, and it's a one-liner. Which means that if you have some complex behavior, you need to find a way to move that out of this sort of collection operation and into instance methods on the objects being iterated.
STEPHANIE: Hmm, interesting. By one-liner do you mean passing the name of the method as a proc or actually, like, having your block that then calls the method? Because I can see it becoming even simpler if you have already extracted a method.
JOËL: Yeah, if you can do symbol to proc, that's amazing, or even if you can use just the straight-up symbol way of invoking reduce or inject. That typically means you have to start thinking about the types of objects that you are working with and what methods can be moved onto them. And sometimes, if you're working with hashes or something like that that don't have domain methods for what you want, that gets really awkward. And so, then maybe that becomes maybe a hint that you've got some primitive obsession happening and that this hash that sort of wants a domain object or some kind of domain method probably should be extracted to its own object.
STEPHANIE: I'll do you with another kind of spicy take. I think, in that case, maybe you don't want a reduce at all. If you're starting to find that...well, okay, I think it maybe could depend because there could be some very, like, domain-specific logic. But I have seen reduce end up being used to transform the structure of the initial collection when either a different higher-order function can be used or, I don't know, maybe you're just better off writing it with a regular loop [laughs]. It could be clearer that way.
JOËL: Well, that's really interesting because...so, you mentioned the idea that we could use a different higher-order function, and, you know, higher-order function is that fancy term, just a method that accepts another method as an argument. In Ruby, that just means your method accepts a block. Reduce can be used to implement pretty much the entirety of enumerable. Under the hood, enumerable is built in terms of each. You could implement it in terms of reduce.
So, sometimes it's easy to re-implement one of the enumerable methods yourself, accidentally, using reduce. So, you've written this, like, complex reduce block, and then somebody in review comes and looks at it and is like, "Hey, you realize that's just map. You've just recreated map. What if we used map here?"
STEPHANIE: Yeah. Another one I've seen a lot in JavaScript land where there are, you know, fewer utility functions is what we now have in Ruby, tally. I feel like that was a common one I would see a lot when you're trying to count instances of something, and I've seen it done with reduce. I've seen it done with a for each. And, you know, I'm sure there are libraries that actually provide a tally-like function for you in JS. But I guess that actually makes me feel even more strongly about this idea that reduce is best used for collapsing something as opposed to just, like, transforming a data structure into something else.
JOËL: There's an interesting other mental model for reduce that I think is hiding under what we're talking about here, and that is the idea that it is a sort of mid-level abstraction for dealing with collections, as opposed to something like map or select or some of those other enumerable helpers because those can all be implemented in terms of reduce. And so, in many cases, you don't need to write the reduce because the library maintainer has already used reduce or something equivalent to build these higher-level helpers for you.
STEPHANIE: Yeah, it's kind of in that weird point between, like, very powerful [chuckles] so that people can start to do some funky things with it, but also sometimes just necessary because it can feel a little bit more concise that way.
JOËL: I've done a fair amount of functional programming in languages like Elm. And there, if you're building a custom data structure, the sort of lowest-level way you have of looping is doing a recursion, and recursions are messy. And so, what you can do instead as a library developer is say, "You know what, I don't want to be writing recursions for all of these." I don't know; maybe I'm building a tree library. I don't want to write a recursion for every different function that goes over trees if I want to map or filter or whatever. I'm going to write reduce using recursion, and then everything else can be written in terms of reduce.
And then, if people want to do custom things, they don't need to recurse over my tree. They can use this reduce function, which allows them to do most of the traversals they want on the tree without needing to touch manual recursion. So, there's almost, like, a low-level, mid-level, high-level in the library design, where, like, lowest level is recursion. Ideally, nobody touches that. Mid-level, you've got reducing that's built out on top of recursion. And then, on top of that, you've got all sorts of other helpers, like mapping, like filtering, things like that.
STEPHANIE: Hmm. I'm wondering, do you know of any performance considerations when it comes to using reduce built off a recursion?
JOËL: So, one of the things that can be really nice is that writing a recursion yourself is dangerous. It's so easy to, like, accidentally introduce Stack Overflow. You could also write a really inefficient one. So, ideally, what you do is that you write a reduce that is safe and that is fast. And then, everybody else can just use that to not have to worry about the sort of mechanics of traversing the collection. And then, just use this. It already has all of the safety and speed features built in. You do have to be careful, though, because reduce, by nature, traverses the entire collection. And if you want to break out early of something expensive, then reduce might not be the tool for you.
STEPHANIE: I was also reading a little bit about how, in JavaScript, a lot of developers like to stick to that idea of a pure function and try to basically copy the entire accumulator for every iteration and creating a new object for that. And that has led to some memory issues as well. As opposed to just mutating the accumulator, having, especially when you, you know, are going through a collection, like, really large, making that copy every single time and creating, yeah [chuckles], just a lot of issues that way. So, that's kind of what prompted that question.
JOËL: Yeah, that can vary a lot by language and by data structure. In more functional languages that try to not mutate, they often have this idea of what they call persistent data structures, where you can sort of create copies that have small modifications that don't force you to copy the whole object under the hood. They're just, like, pointers. So, like, hey, we, like, are the same as this other object, but with this extra element added, or something like that. So, if you're growing an array or something like that, you don't end up with 10,000 copies of the array with, like, a new element every time.
STEPHANIE: Yeah, that is interesting. And I feel like trying to adopt different paradigms for different tools, you know, is not always as straightforward as some wish it were [laughs].
JOËL: I do want to give a shout-out to an academic paper that is...it is infamously dense. The title of it is Functional Programming with Bananas, Lenses, and Barbed Wire.
STEPHANIE: It doesn't sound dense; it sounds fun. Well, I don't about barbed wire.
JOËL: It sounds fun, right?
STEPHANIE: Yeah, but certainly quirky [laughs].
JOËL: It is incredibly dense. And they've, like, created this custom math notation and all this stuff. But the idea that they pioneered there is really cool, this idea that kind of like I was talking about sort of building libraries in different levels. Their idea is that recursion is generally something that's unsafe and that library and language designers should take care of all of the recursion and instead provide some of these sort of mid-level helper methods to do things. Reducing is one of them, but their proposal is that it's not the only one. There's a whole sort of family of similar methods that are there that would be useful in different use cases.
So, reduce allows you to sort of traverse the whole thing. It does not allow you to break out early. It does not allow you to keep sort of track of a sort of extra context element if you want to, like, be traversing a collection but have a sort of look forward, look back, something like that. So, there are other variations that could handle those. There are variations that are the opposite of reduce, where you're, like, inflating, starting from a few parameters and building a collection out of them.
So, this whole concept is called recursion schemes, and you can get, like, really deep into some theory there. You'll hear fancy words like catamorphisms and anamorphisms. There's a whole world to explore in that area. But at its core, it's this idea that you can sort of slice up things into this sort of low-level recursion, mid-level helpers, and then, like, kind of userland helpers built on top of that.
STEPHANIE: Wow. That is very intense; it sounds like [chuckles]. I'm happy not to ever have to write a recursion ever again, probably [laughs]. Have you ever, as just a web developer in your day-to-day programming, found a really good use case for dropping down to that level? Or are you kind of convinced that, like, you won't really ever need to?
JOËL: I think it depends on the paradigm of the language you're working in. In Ruby, I've very rarely needed to write a recursion. In something like Elm, I've had to do that, eh, not infrequently. Again, it depends, like, if I'm doing more library-esque code versus more application code. If I'm writing application code and I'm using an existing, let's say, tree library, then I typically don't need to write a recursion because they've already written traversals for me. If I'm making my own and I have made my own tree libraries, then yes, I'm writing recursions myself and building those traversals so that other people don't have to.
STEPHANIE: Yeah, that makes sense. I'd much rather someone who has read that paper [laughs] write some traversal methods for me.
JOËL: And, you know, for those who are curious about it, we will put a link to this paper in the description.
So, we've talked about a sort of very academic mental model way of thinking about reducing. I want to shift gears and talk about one that I have found is incredibly practical, and that is the idea that reduce is a way to scale an operation that works on two objects to an operation that works on sort of an unlimited number of objects.
To make it more concrete, take something like addition. I can add two numbers. The plus operator allows me to take one number, add another, get a sum. But what if I want to not just add two numbers? I want to add an arbitrary number of numbers together. Reduce allows me to take that plus operator and then just scale it up to as many numbers as I want. I can just plug that into, you know, I have an array of numbers, and I just call dot reduce plus operator, and, boom, it can now scale to as many numbers as I want, and I can sum the whole thing.
STEPHANIE: That dovetails quite nicely with your take earlier about how you shouldn't pass a block to reduce. You should extract that into a method. Don't you think?
JOËL: I think it does, yes. And then maybe it's, like, sort of two sides of a coin because I think what this leads to is an approach that I really like for reducing because sometimes, you know, here, I'm starting with addition. I'm like, oh, I have addition. Now, I want to scale it up. How do I do that? I can use reduce. Oftentimes, I'm faced with sort of the opposite problem. I'm like, oh, I need to add all these numbers together. How do I do that? I'm like, probably with a reduce. But then I start writing the block, and, like, I get way too into my head about the accumulator and what's going to happen.
So, my strategy for writing reduce expressions is to, instead of trying to figure out how to, like, do the whole thing together, first ask myself, how do I want to combine any two elements that are in the array? So, I've got an array of numbers, and I want to sum them all. What is the thing I need to do to combine just two of those? Forget the array. Figure that out.
And then, once I have that figured out, maybe it's an existing method like plus. Maybe it's a method I need to define on it if it's a custom object. Maybe it's a method that I write somewhere. Then, once I have that, I can say, okay, I can do it for two items. Now, I'm going to scale it up to work for the whole array, and I can plug it into reduce. And, at that point, the work is already basically done, so I don't end up with a really complex block. I don't end up, like, almost ending in, like, a recursive infinite loop in my head because I do that.
STEPHANIE: [laughs].
JOËL: So, that approach of saying, start by figuring out what is the operation you want to do to combine two elements, and then use reduce as a way to scale that to your whole array is a way that I've used to keep things simple in my mind.
STEPHANIE: Yeah, I like that a lot as a supplement to the model I shared earlier because, for me, when I think about reducing as, like, collapsing into a value, you kind of are just like, well, okay, I start with the collection, and then somehow I get to my single value. But the challenge is figuring out how that happens [laughs], like, the magic that happens in between that.
And I think another alias that we haven't mentioned yet for reduce that is used in a lot of other languages is fold. And I actually like that one a lot, and I think it relates to your mental model. Because when I think about folding, I'm picturing folding up a paper like an accordion. And you have to figure out, like, what is the first fold that I can make? And just repeating that over and over to get to your little stack of accordion paper [laughs]. And if you can figure out just that first step, then you pretty much, like, have the recipe for getting from your initial input to, like, your desired output.
JOËL: Yeah. I think fold is interesting in that some languages will make a distinction between fold and reduce. They will have both. And typically, fold will require you to pass an initial value, like a starting accumulator, to start it off. Whereas reduce will sort of assume that your array can use the first element of the array as the first accumulator.
STEPHANIE: Oh, I just came up with another visual metaphor for this, which is, like, folding butter into croissant pastry when the butter is your initial value [laughs].
JOËL: And then the crust is, I guess, the elements in the array.
STEPHANIE: Yeah. Yeah. And then you get a croissant out of it [laughs]. Don't ask me how it gets to a perfectly baked, flaky, beautiful croissant, but somehow that happens [laughs].
JOËL: So, there's an interesting sort of subtlety here that I think happens because there are sort of two slightly different ways that you can interact with a reduce. Sometimes, your accumulator is of the same type as the elements in your array. So, you're summing an array of numbers, and your accumulator is the sum, but each of the elements in the array are also numbers. So, it's numbers all the way through. And sometimes, your accumulator has a different type than the items in the array. So, maybe you have an array of words, and you want to get the sum of all of the characters and all the words. And so, now your accumulator is a number, but each of the items in the array are strings.
STEPHANIE: Yeah, that's an interesting distinction because I think that's where you start to see the complex blocks being passed and reduced.
JOËL: The complex blocks, definitely; I think they tend to show up when your accumulator has a different type than the individual items. So, maybe that's, like, a slightly more complicated use case. Oftentimes, too, the accumulator ends up being some, like, more complex, like, hash or something that maybe would really benefit from being a custom object.
STEPHANIE: I've never done that before, but I can see why that would be really useful. Do you have an example of when you used a custom object as the accumulator?
JOËL: So, I've done it for situations where I'm working with objects that are doing tally-like operations, but I'm not doing just a generic tally. There's some domain-specific stuff happening. So, it's some sort of aggregate counter on multiple dimensions that you can use, and that can get really ugly. And you can either do it with a reduce or you can have some sort of, like, initial version of the hash outside and do an each and mutate the hash and stuff like that. All of these tend to be a little bit ugly. So, in those situations, I've often created some sort of custom object that has some instance methods that allow to sort of easily add new elements to it.
STEPHANIE: That's really interesting because now I'm starting to think, what if the elements in the collection were also a custom object? [chuckles] And then things could, I feel like, could be really powerful [laughs].
JOËL: There's often a lot of value, right? Because if the items in the collection are also a custom object, you can then have methods on them. And then, again, the sort of complexity of the reduce can sort of, like, fade away because it doesn't own any of the logic. All it does is saying, hey, there's a thing you can do to combine two items. Let's scale it up to work on a collection of items. And now you've sort of, like, really simplified what logic is actually owned inside the reduce.
I do want to shout out for those listeners who are theory nerds and want to dig into this. When you have a reduce, and you've got an operation where all the values are of the same type, including the accumulator, typically, what you've got here is some form of monoid. It may be a semigroup. So, if you want to dig into some theory, those are the words to Google and to go a deep dive on.
The main thing about monoids, in particular, is that monoids are any objects that have both a sort of a base case, a sort of empty version of themselves, and they have some sort of combining method that allows you to combine two values of that type. If your object has these things and follows...there's a few rules that have to be true. You have a monoid. And they can then be sort of guaranteed to be folded nicely because you can plug in their base case as your initial accumulator. And you can plug in their combining method as just the value of the block, and everything else just falls into place.
A classic here is addition for numbers. So, if you want to add two numbers, your combining operator is a plus. And your sort of empty value is a zero. So, you would say, reduce initial value is zero, array of numbers. And your block is just plus, and it won't sum all of the numbers. You could do something similar with strings, where you can combine strings together with plus, and, you know, your empty string is your base case. So, now you're doing sort of string concatenation over arbitrary number of strings.
Turns out there's a lot of operations that fall into that, and you can even define some of those on your custom object. So, you're like, oh, I've got a custom object. Maybe I want some way of, like, combining two of them together. You might be heading in the direction of doing something that is monoidal, and if so, that's a really good hint to know that it can sort of, like, just drop into place with a fold or a reduce and that that is a tool that you have available to you.
STEPHANIE: Yeah, well, I think my eyes, like, widened a little bit when you first dropped the term monoid [laughs]. I do want to spend the last bit of our time talking about when not to use reduce, and, you know, we did talk a lot about recursion. But when do you think a regular old loop will just be enough?
JOËL: So, you're suggesting when would you want to use something like an each rather than a reduce?
STEPHANIE: Yeah. In my mind, you know, you did offer, like, a lot of ways to make reduce simpler, a lot of strategies to end up with some really nice-looking syntax [chuckles], I think. But, oftentimes, I think it can be equally as clear storing your accumulator outside of the iteration and that, like, is enough for me to understand. And reduce takes a little bit of extra overhead to figure out what I'm looking at. Do you have any thoughts about when you would prefer to do that? Or do you think that you would usually reach for something else?
JOËL: Personally, I generally don't like the pattern of using each to iterate over a collection and then mutate some external accumulator. That, to me, is a bit of a code smell. It's a sign that each is not quite powerful enough to do the thing that I want to do and that I'm probably needing some sort of more specialized form of iteration. Sometimes, that's reduce. Oftentimes, because each can suffer from the same problem you mentioned from reduce, where it's like, oh, you're doing this thing where you mutate an external accumulator. Turns out what you're really doing is just map. So, use map or use select or, you know, some of the other built-in iterators from the enumerable library.
There's a blog post on the thoughtbot blog that I continually link to people. And when I see the pattern of, like, mutating an external variable with each, yeah, I tend to see that as a bit of a code smell. I don't know that I would never do it, but whenever I see that, it's a sign to me to, like, pause and be like, wait a minute, is there a better way to do this?
STEPHANIE: Yeah, that's fair. I like the idea that, like, if there's already a method available to you that is more specific to go with that. But I also think that sometimes I'd rather, like, come across that pattern of mutating a variable outside of the iteration over, like, someone trying to do something clever with the reduce.
JOËL: Yeah, I guess reduce, especially if it's got, like, a giant block and you've got then, like, things in there that break or call next to skip iterations and things like that, that gets really mind-bending really quickly. I think a case where I might consider using an each over a reduce, and that's maybe generally when I tend to use each, is when I'm doing side effects. If I'm using a reduce, it's because I care about the accumulated value at the end. If I'm using each, it's typically because I am trying to do some amount of side effects.
STEPHANIE: Yeah, that's a really good call out. I had that written down in my notes, and I'm glad you brought it up because I've seen them get conflated a little bit, and perhaps maybe that's the source of the pain that I'm talking about. But I really like that heuristic of reduce as, you know, you're caring about the output, as opposed to what's going on inside. Like, you don't want any unexpected behavior.
JOËL: And I think that applies to something like map as well. My sort of heuristic is, if I'm doing side effects, I want each. If I want transformed values that are sort of one-to-one in the collection, I want map. If I want a single sort of aggregate value, then I want reduce.
STEPHANIE: I think that's the cool thing about mixing paradigms sometimes, where all the strategies you talked about in terms of, you know, using custom, like, objects for your accumulator, or the elements in your collection, like, that's something that we get because, you know, we're using an object-oriented language like Ruby. But then, like, you also are kind of bringing the functional programming lens to, like, when you would use reduce in the first place. And yeah, I am just really excited now [chuckles] to start looking for some places I can use reduce after this conversation and see what comes out of it.
JOËL: I think I went on a bit of an interesting journey where, as a newer programmer, reduce was just, like, really intense. And I struggled to understand it. And I was like, ban it from code. I don't want to ever see it. And then, I got into functional programming. I was like, I'm going to do reduce everywhere. And, honestly, it was kind of messy.
And then I, like, went really deep on a lot of functional theory, and I think understood some things that then I was able to take back to my code and actually write reduce expressions that are much simpler so that now my heuristic is like, I love reduce; I want to use it, but I want as little as possible in the reduce itself. And because I understand some of these other concepts, I have the ability to know what things can be extracted in a way that will feel very natural, in a way that myself from five years ago would have just been like, oh, I don't know. I've got this, you know, 30-line reduce expression that I know is complicated, but I don't know how to improve.
And so, a little bit of the underlying theory, I don't think it's necessary to understand these simplified reduces, but as an author who's writing them, I think it helps me write reduces that are simpler. So, that's been my journey using reduce.
STEPHANIE: Yeah. Well, thanks for sharing. And I'm really excited. I hope our listeners have learned some new things about reduce and can look at it from a different light.
JOËL: There are so many different perspectives. And I think we keep discovering new mental models as we talk to different people. It's like, oh, this particular perspective. And there's one that we didn't really dig into but that I think makes more sense in a functional world that's around sort of deconstructing a structure and then rebuilding it with different components. The shorthand name of this mental model, which is a fairly common one, is constructor replacement. For anyone who's interested in digging into that, we'll link it in the show notes.
I gave a talk at an Elm meetup where I sort of dug into some of that theory, which is really interesting and kind of mind-blowing. Not as relevant, I think, for Rubyists, but if you're in a language that particularly allows you to build custom structures out of recursive types or what are sometimes called algebraic data types, or tagged unions, or discriminated unions, this thing goes by a bajillion names, that is a really interesting other mental model to look at.
And, again, I don't think the list that we've covered today is exhaustive. You know, I would love it for any of our listeners; if you have your own mental models for how to think about folding, injecting, reducing, send them in: hosts@bikeshed.fm. We'd love to hear them.
STEPHANIE: And on that note, shall we wrap up?
JOËL: Let's wrap up.
STEPHANIE: Show notes for this episode can be found at bikeshed.fm.
JOËL: This show has been produced and edited by Mandy Moore.
STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.
JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter.
STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email.
JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week.
ALL: Byeeeeeeee!!!!!!!
AD:
Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us.
More info on our website at: tbot.io/referral. Or you can email us at referrals@thoughtbot.com with any questions.
448 епізодів
Manage episode 405955137 series 1401614
Joël talks about his difficulties optimizing queries in ActiveRecord, especially with complex scopes and unions, resulting in slow queries. He emphasizes the importance of optimizing subqueries in unions to boost performance despite challenges such as query duplication and difficulty reusing scopes. Stephanie discusses upgrading a client's app to Rails 7, highlighting the importance of patience, detailed attention, and the benefits of collaborative work with a fellow developer.
The conversation shifts to Ruby's reduce method (inject), exploring its complexity and various mental models to understand it. Joël and Stephanie discuss when it's preferable to use reduce over other methods like each, map, or loops and the importance of understanding the underlying operation you wish to apply to two elements before scaling up with reduce. The episode also touches on monoids and how they relate to reduce, suggesting that a deep understanding of functional programming concepts can help simplify reduce expressions.
- Rails 7 EXPLAIN ANALYZE
- Blocks, symbol to proc, and symbols arguments for reduce
- Ruby tally
- Performance considerations for reduce in JavaScript
- Persistant data structures
- Avoid passing a block to map and reduce
- Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire
- monoids
- iteration anti-patterns
- Joël’s talk on “constructor replacement”
Transcript:
STEPHANIE: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Stephanie Minn.
JOËL: And I'm Joël Quenneville. And together, we're here to share a bit of what we've learned along the way.
STEPHANIE: So, Joël, what's new in your world?
JOËL: I've been doing a bunch of fiddling with query optimization this week, and I've sort of run across an interesting...but maybe it's more of an interesting realization because it's interesting in the sort of annoying way. And that is that, using ActiveRecord scopes with certain more complex query pieces, particularly unions, can lead to queries that are really slow, and you have to rewrite them differently in a way that's not reusable in order to make them fast.
In particular, if you have sort of two other scopes that involve joins and then you combine them using a union, you're unioning two sort of joins. Later on, you want to change some other scope that does some wares or something like that. That can end up being really expensive, particularly if some of the underlying tables being joined are huge. Because your database, in my case, Postgres, will pull a lot of this data into the giant sort of in-memory table as it's, like, building all these things together and to filter them out. And it doesn't have the ability to optimize the way it would on a more traditional relation.
A solution to this is to make sure that the sort of subqueries that are getting unioned are optimized individually. And that can mean moving conditions that are outside the union inside. So, if I'm chaining, I don't know, where active is true on the outer query; on the union itself, I might need to move that inside each of the subqueries. So, now, in the two or three subqueries that I'm unioning, each of them needs to have a 'where active true' chained on it.
STEPHANIE: Interesting. I have heard this about using ActiveRecord scopes before, that if the scopes are quite complex, chaining them might not lead to the most performant query. That is interesting. By optimizing the subqueries, did you kind of change the meaning of them? Was that something that ended up happening?
JOËL: So, the annoying thing is that I have a scope that has the union in it, and it does some things sort of on its own. And it's used in some places. There are also other places that will try to take that scope that has the union on it, chain some other scopes that do other joins and some more filters, and that is horribly inefficient. So, I need to sort of rewrite the sort of subqueries that get union to include all these new conditions that only happen in this one use case and not in the, like, three or four others that rely on that union.
So, now I end up with some, like, awkward query duplication in different call sites that I'm not super comfortable about, but, unfortunately, I've not found a good way to make this sort of nicely reusable. Because when you want to chain sort of more things onto the union, you need to shove them in, and there's no clean way of doing that.
STEPHANIE: Yeah. I think another way I've seen this resolved is just writing it in SQL if it's really complex and it becoming just a bespoke query. We're no longer trying to use the scope that could be reusable.
JOËL: Right. Right. In this case, I guess, I'm, like, halfway in between in that I'm using the ActiveRecord DSL, but I am not reusing scopes and things. So, I sort of have the, I don't know, naive union implementation that can be fine in all of the simpler use cases that are using it. And then the query that tries to combine the union with some other fancy stuff it just gets its own separate implementation different than the others that it has optimized. So, there are sort of two separate paths, two separate implementations. I did not drop down to writing raw SQL because I could use the ActiveRecord DSL. So, that's what I've been working with.
What's new in your world this week?
STEPHANIE: So, a couple of weeks ago, I think, I mentioned that I was working on a Rails 7 upgrade, and we have gotten it out the door. So, now the client application I'm working on is on Rails 7, which is exciting for the team. But in an effort to make the upgrade as incremental as possible, we did, like, back out of a few of the new application config changes that would have led us down a path of more work. And now we're kind of following up a little bit to try to turn some of those configs on to enable them.
And it was very exciting to kind of, like, officially be on Rails 7. But I do feel like we tried to go for, like, the minimal amount of work possible in that initial big change. And now we're having to kind of backfill a little bit on some of the work that was a little bit more like, oh, I'm not really sure, like, how big this will end up being.
And it's been really interesting work, I think, because it requires, like, two different mindsets. Like, one of them is being really patient and focused on tedious work. Like, okay, what happens when we enable this config option? Like, what changes? What errors do we see? And then having to turn it back off and then go in and fix them.
But then another, I think, like, headspace that we have to be in is making decisions about what to do when we come to a crossroads around, like, okay, now that we are starting to see all the changes that are coming about from enabling this config, is this even what we want to do? And it can be really hard to switch between those two modes of thinking.
JOËL: Yeah. How do you try to balance between the two?
STEPHANIE: So, I luckily have been pairing with another dev, and I've actually found that to be really effective because he has, I guess, just, like, a little bit more of that patience to do the more tedious, mundane [laughs] aspects of, like, driving the code changes. And I have been riding along.
But then I can sense, like, once he gets to the point of like, "Oh, I'm not sure if we should keep going down this road," I can step in a little bit more and be like, "Okay, like, you know, I've seen us do this, like, five times now, and maybe we don't want to do that." Or maybe being like, "Okay, we don't have a really clear answer, but, like, who can we talk to to find out a little bit more or get their input?"
And that's been working really well for me because I've not had a lot of energy to do more of that, like, more manual or tedious labor [chuckles] that comes with working on that low level of stuff. So yeah, I've just been pleasantly surprised by how well we are aligning our superpowers.
JOËL: To use some classic business speech, how does it feel to be in the future on Rails 7?
STEPHANIE: Well, we're not quite up, you know, up to modern days yet, but it does feel like we're getting close. And, like, I think now we're starting to entertain the idea of, like, hmm, like, could we be even on main? I don't think it's really going to happen, but it feels a little bit more possible. And, in general, like, the team thinks that that could be, like, really exciting. Or it's easier, I think, once you're a little bit more on top of it. Like, the worst is when you get quite behind, and you end up just feeling like you're constantly playing catch up. It just feels a little bit more manageable now, which is good.
JOËL: I learned this week a fun fact about Rails 7.1, in particular, which is that the analyze method on ActiveRecord queries, which allowed you to sort of get SQL EXPLAIN statements, now has the ability to pass in a couple of extra parameters. So, there are symbols, and you can pass in things like analyze or verbose, which allows you to get sort of more data out of your EXPLAIN query, which can be quite nice when you're debugging for performance.
So, if you're in the future and you're on Rails 7.1 and you want sort of the in-depth query plans, you don't need to copy the SQL into a Postgres console to get access to the sort of fully developed EXPLAIN plan. You can now do it by passing arguments to EXPLAIN, which I'm very happy for.
STEPHANIE: That's really nice.
JOËL: So, we've mentioned before that we have a developers' channel on Slack here at thoughtbot, and there's all sorts of fun conversations that happen there. And there was one recently that really got me interested, where people were talking about Ruby's reduce method, also known as inject. And it's one of those methods that's kind of complicated, or it can be really confusing.
And there was a whole thread where people were talking about different mental models that they had around the reduce method and how they sort of understand the way it works. And I'd be curious to sort of dig into each other's mental models of that today. To kick us off, like, how comfortable do you feel with Ruby's reduce method? And do you have any mental models to kind of hold it in your head?
STEPHANIE: Yeah, I think reduce is so hard to wrap your head around, or it might be one of the most difficult, I guess, like, functions a new developer encounters, you know, in trying to understand the tools available to them. I always have to look up the order of the arguments [laughs] for reduce.
JOËL: Every time.
STEPHANIE: Yep. But I feel like I finally have a more intuitive sense of when to use it. And my mental model for it is collapsing a collection into one value, and, actually, that's why I prefer calling it reduce rather than the inject alias because reduce kind of signals to me this idea of going from many things to one canonical thing, I suppose.
JOËL: Yeah, that's a very common use case for reducing, and I guess the name itself, reducing, kind of has almost that connotation. You're taking many things, and you're going to reduce that down to a single thing.
STEPHANIE: What was really interesting to me about that conversation was that some people kind of had the opposite mental model where it made a bit more sense for them to think about injecting and, specifically, like, the idea of the accumulator being injected with values, I suppose. And I kind of realized that, in some ways, they're kind of antonyms [chuckles] a little bit because if you're focused on the accumulator, you're kind of thinking about something getting bigger. And that kind of blew my mind a little bit when I realized that, in some ways, they can be considered opposites.
JOËL: That's really fascinating. It is really interesting, I think, the way that we can take the name of a method and then almost, like, tell ourselves a story about what it does that then becomes our way of remembering how this method works. And the story we tell for the same method name, or in this case, maybe there's a few different method names that are aliases, can be different from person to person.
I know I tend to think of inject less in terms of injecting things into the accumulator and more in terms of injecting some kind of operator between every item in the collection. So, if we have an array of numbers and we're injecting plus, in my mind, I'm like, oh yeah, in between each of the numbers in the collection, just inject a little plus sign, and then do the math. We're summing all the items in the collection.
STEPHANIE: Does that still hold up when the operator becomes a little more complex than just, you know, like, a mathematical operator, like, say, a function?
JOËL: Well, when you start passing a block and doing custom logic, no, that mental model kind of falls apart. In order for it to work, it also has to be something that you can visualize as some form of infix operator, something that goes between two values rather than, like, a method name, which is typically in prefix position.
I do want to get at this idea, though: the difference between sort of the block version versus passing. There are ways where you can just do a symbol, and that will call a method on each of the items. Because I have a bit of a hot take when it comes to writing reduce blocks or inject blocks that are more accessible, easier to understand.
And that is, generally, that you shouldn't, or more specifically, you should not have a big block body. In general, you should be either using the symbol version or just calling a method within the block, and it's a one-liner. Which means that if you have some complex behavior, you need to find a way to move that out of this sort of collection operation and into instance methods on the objects being iterated.
STEPHANIE: Hmm, interesting. By one-liner do you mean passing the name of the method as a proc or actually, like, having your block that then calls the method? Because I can see it becoming even simpler if you have already extracted a method.
JOËL: Yeah, if you can do symbol to proc, that's amazing, or even if you can use just the straight-up symbol way of invoking reduce or inject. That typically means you have to start thinking about the types of objects that you are working with and what methods can be moved onto them. And sometimes, if you're working with hashes or something like that that don't have domain methods for what you want, that gets really awkward. And so, then maybe that becomes maybe a hint that you've got some primitive obsession happening and that this hash that sort of wants a domain object or some kind of domain method probably should be extracted to its own object.
STEPHANIE: I'll do you with another kind of spicy take. I think, in that case, maybe you don't want a reduce at all. If you're starting to find that...well, okay, I think it maybe could depend because there could be some very, like, domain-specific logic. But I have seen reduce end up being used to transform the structure of the initial collection when either a different higher-order function can be used or, I don't know, maybe you're just better off writing it with a regular loop [laughs]. It could be clearer that way.
JOËL: Well, that's really interesting because...so, you mentioned the idea that we could use a different higher-order function, and, you know, higher-order function is that fancy term, just a method that accepts another method as an argument. In Ruby, that just means your method accepts a block. Reduce can be used to implement pretty much the entirety of enumerable. Under the hood, enumerable is built in terms of each. You could implement it in terms of reduce.
So, sometimes it's easy to re-implement one of the enumerable methods yourself, accidentally, using reduce. So, you've written this, like, complex reduce block, and then somebody in review comes and looks at it and is like, "Hey, you realize that's just map. You've just recreated map. What if we used map here?"
STEPHANIE: Yeah. Another one I've seen a lot in JavaScript land where there are, you know, fewer utility functions is what we now have in Ruby, tally. I feel like that was a common one I would see a lot when you're trying to count instances of something, and I've seen it done with reduce. I've seen it done with a for each. And, you know, I'm sure there are libraries that actually provide a tally-like function for you in JS. But I guess that actually makes me feel even more strongly about this idea that reduce is best used for collapsing something as opposed to just, like, transforming a data structure into something else.
JOËL: There's an interesting other mental model for reduce that I think is hiding under what we're talking about here, and that is the idea that it is a sort of mid-level abstraction for dealing with collections, as opposed to something like map or select or some of those other enumerable helpers because those can all be implemented in terms of reduce. And so, in many cases, you don't need to write the reduce because the library maintainer has already used reduce or something equivalent to build these higher-level helpers for you.
STEPHANIE: Yeah, it's kind of in that weird point between, like, very powerful [chuckles] so that people can start to do some funky things with it, but also sometimes just necessary because it can feel a little bit more concise that way.
JOËL: I've done a fair amount of functional programming in languages like Elm. And there, if you're building a custom data structure, the sort of lowest-level way you have of looping is doing a recursion, and recursions are messy. And so, what you can do instead as a library developer is say, "You know what, I don't want to be writing recursions for all of these." I don't know; maybe I'm building a tree library. I don't want to write a recursion for every different function that goes over trees if I want to map or filter or whatever. I'm going to write reduce using recursion, and then everything else can be written in terms of reduce.
And then, if people want to do custom things, they don't need to recurse over my tree. They can use this reduce function, which allows them to do most of the traversals they want on the tree without needing to touch manual recursion. So, there's almost, like, a low-level, mid-level, high-level in the library design, where, like, lowest level is recursion. Ideally, nobody touches that. Mid-level, you've got reducing that's built out on top of recursion. And then, on top of that, you've got all sorts of other helpers, like mapping, like filtering, things like that.
STEPHANIE: Hmm. I'm wondering, do you know of any performance considerations when it comes to using reduce built off a recursion?
JOËL: So, one of the things that can be really nice is that writing a recursion yourself is dangerous. It's so easy to, like, accidentally introduce Stack Overflow. You could also write a really inefficient one. So, ideally, what you do is that you write a reduce that is safe and that is fast. And then, everybody else can just use that to not have to worry about the sort of mechanics of traversing the collection. And then, just use this. It already has all of the safety and speed features built in. You do have to be careful, though, because reduce, by nature, traverses the entire collection. And if you want to break out early of something expensive, then reduce might not be the tool for you.
STEPHANIE: I was also reading a little bit about how, in JavaScript, a lot of developers like to stick to that idea of a pure function and try to basically copy the entire accumulator for every iteration and creating a new object for that. And that has led to some memory issues as well. As opposed to just mutating the accumulator, having, especially when you, you know, are going through a collection, like, really large, making that copy every single time and creating, yeah [chuckles], just a lot of issues that way. So, that's kind of what prompted that question.
JOËL: Yeah, that can vary a lot by language and by data structure. In more functional languages that try to not mutate, they often have this idea of what they call persistent data structures, where you can sort of create copies that have small modifications that don't force you to copy the whole object under the hood. They're just, like, pointers. So, like, hey, we, like, are the same as this other object, but with this extra element added, or something like that. So, if you're growing an array or something like that, you don't end up with 10,000 copies of the array with, like, a new element every time.
STEPHANIE: Yeah, that is interesting. And I feel like trying to adopt different paradigms for different tools, you know, is not always as straightforward as some wish it were [laughs].
JOËL: I do want to give a shout-out to an academic paper that is...it is infamously dense. The title of it is Functional Programming with Bananas, Lenses, and Barbed Wire.
STEPHANIE: It doesn't sound dense; it sounds fun. Well, I don't about barbed wire.
JOËL: It sounds fun, right?
STEPHANIE: Yeah, but certainly quirky [laughs].
JOËL: It is incredibly dense. And they've, like, created this custom math notation and all this stuff. But the idea that they pioneered there is really cool, this idea that kind of like I was talking about sort of building libraries in different levels. Their idea is that recursion is generally something that's unsafe and that library and language designers should take care of all of the recursion and instead provide some of these sort of mid-level helper methods to do things. Reducing is one of them, but their proposal is that it's not the only one. There's a whole sort of family of similar methods that are there that would be useful in different use cases.
So, reduce allows you to sort of traverse the whole thing. It does not allow you to break out early. It does not allow you to keep sort of track of a sort of extra context element if you want to, like, be traversing a collection but have a sort of look forward, look back, something like that. So, there are other variations that could handle those. There are variations that are the opposite of reduce, where you're, like, inflating, starting from a few parameters and building a collection out of them.
So, this whole concept is called recursion schemes, and you can get, like, really deep into some theory there. You'll hear fancy words like catamorphisms and anamorphisms. There's a whole world to explore in that area. But at its core, it's this idea that you can sort of slice up things into this sort of low-level recursion, mid-level helpers, and then, like, kind of userland helpers built on top of that.
STEPHANIE: Wow. That is very intense; it sounds like [chuckles]. I'm happy not to ever have to write a recursion ever again, probably [laughs]. Have you ever, as just a web developer in your day-to-day programming, found a really good use case for dropping down to that level? Or are you kind of convinced that, like, you won't really ever need to?
JOËL: I think it depends on the paradigm of the language you're working in. In Ruby, I've very rarely needed to write a recursion. In something like Elm, I've had to do that, eh, not infrequently. Again, it depends, like, if I'm doing more library-esque code versus more application code. If I'm writing application code and I'm using an existing, let's say, tree library, then I typically don't need to write a recursion because they've already written traversals for me. If I'm making my own and I have made my own tree libraries, then yes, I'm writing recursions myself and building those traversals so that other people don't have to.
STEPHANIE: Yeah, that makes sense. I'd much rather someone who has read that paper [laughs] write some traversal methods for me.
JOËL: And, you know, for those who are curious about it, we will put a link to this paper in the description.
So, we've talked about a sort of very academic mental model way of thinking about reducing. I want to shift gears and talk about one that I have found is incredibly practical, and that is the idea that reduce is a way to scale an operation that works on two objects to an operation that works on sort of an unlimited number of objects.
To make it more concrete, take something like addition. I can add two numbers. The plus operator allows me to take one number, add another, get a sum. But what if I want to not just add two numbers? I want to add an arbitrary number of numbers together. Reduce allows me to take that plus operator and then just scale it up to as many numbers as I want. I can just plug that into, you know, I have an array of numbers, and I just call dot reduce plus operator, and, boom, it can now scale to as many numbers as I want, and I can sum the whole thing.
STEPHANIE: That dovetails quite nicely with your take earlier about how you shouldn't pass a block to reduce. You should extract that into a method. Don't you think?
JOËL: I think it does, yes. And then maybe it's, like, sort of two sides of a coin because I think what this leads to is an approach that I really like for reducing because sometimes, you know, here, I'm starting with addition. I'm like, oh, I have addition. Now, I want to scale it up. How do I do that? I can use reduce. Oftentimes, I'm faced with sort of the opposite problem. I'm like, oh, I need to add all these numbers together. How do I do that? I'm like, probably with a reduce. But then I start writing the block, and, like, I get way too into my head about the accumulator and what's going to happen.
So, my strategy for writing reduce expressions is to, instead of trying to figure out how to, like, do the whole thing together, first ask myself, how do I want to combine any two elements that are in the array? So, I've got an array of numbers, and I want to sum them all. What is the thing I need to do to combine just two of those? Forget the array. Figure that out.
And then, once I have that figured out, maybe it's an existing method like plus. Maybe it's a method I need to define on it if it's a custom object. Maybe it's a method that I write somewhere. Then, once I have that, I can say, okay, I can do it for two items. Now, I'm going to scale it up to work for the whole array, and I can plug it into reduce. And, at that point, the work is already basically done, so I don't end up with a really complex block. I don't end up, like, almost ending in, like, a recursive infinite loop in my head because I do that.
STEPHANIE: [laughs].
JOËL: So, that approach of saying, start by figuring out what is the operation you want to do to combine two elements, and then use reduce as a way to scale that to your whole array is a way that I've used to keep things simple in my mind.
STEPHANIE: Yeah, I like that a lot as a supplement to the model I shared earlier because, for me, when I think about reducing as, like, collapsing into a value, you kind of are just like, well, okay, I start with the collection, and then somehow I get to my single value. But the challenge is figuring out how that happens [laughs], like, the magic that happens in between that.
And I think another alias that we haven't mentioned yet for reduce that is used in a lot of other languages is fold. And I actually like that one a lot, and I think it relates to your mental model. Because when I think about folding, I'm picturing folding up a paper like an accordion. And you have to figure out, like, what is the first fold that I can make? And just repeating that over and over to get to your little stack of accordion paper [laughs]. And if you can figure out just that first step, then you pretty much, like, have the recipe for getting from your initial input to, like, your desired output.
JOËL: Yeah. I think fold is interesting in that some languages will make a distinction between fold and reduce. They will have both. And typically, fold will require you to pass an initial value, like a starting accumulator, to start it off. Whereas reduce will sort of assume that your array can use the first element of the array as the first accumulator.
STEPHANIE: Oh, I just came up with another visual metaphor for this, which is, like, folding butter into croissant pastry when the butter is your initial value [laughs].
JOËL: And then the crust is, I guess, the elements in the array.
STEPHANIE: Yeah. Yeah. And then you get a croissant out of it [laughs]. Don't ask me how it gets to a perfectly baked, flaky, beautiful croissant, but somehow that happens [laughs].
JOËL: So, there's an interesting sort of subtlety here that I think happens because there are sort of two slightly different ways that you can interact with a reduce. Sometimes, your accumulator is of the same type as the elements in your array. So, you're summing an array of numbers, and your accumulator is the sum, but each of the elements in the array are also numbers. So, it's numbers all the way through. And sometimes, your accumulator has a different type than the items in the array. So, maybe you have an array of words, and you want to get the sum of all of the characters and all the words. And so, now your accumulator is a number, but each of the items in the array are strings.
STEPHANIE: Yeah, that's an interesting distinction because I think that's where you start to see the complex blocks being passed and reduced.
JOËL: The complex blocks, definitely; I think they tend to show up when your accumulator has a different type than the individual items. So, maybe that's, like, a slightly more complicated use case. Oftentimes, too, the accumulator ends up being some, like, more complex, like, hash or something that maybe would really benefit from being a custom object.
STEPHANIE: I've never done that before, but I can see why that would be really useful. Do you have an example of when you used a custom object as the accumulator?
JOËL: So, I've done it for situations where I'm working with objects that are doing tally-like operations, but I'm not doing just a generic tally. There's some domain-specific stuff happening. So, it's some sort of aggregate counter on multiple dimensions that you can use, and that can get really ugly. And you can either do it with a reduce or you can have some sort of, like, initial version of the hash outside and do an each and mutate the hash and stuff like that. All of these tend to be a little bit ugly. So, in those situations, I've often created some sort of custom object that has some instance methods that allow to sort of easily add new elements to it.
STEPHANIE: That's really interesting because now I'm starting to think, what if the elements in the collection were also a custom object? [chuckles] And then things could, I feel like, could be really powerful [laughs].
JOËL: There's often a lot of value, right? Because if the items in the collection are also a custom object, you can then have methods on them. And then, again, the sort of complexity of the reduce can sort of, like, fade away because it doesn't own any of the logic. All it does is saying, hey, there's a thing you can do to combine two items. Let's scale it up to work on a collection of items. And now you've sort of, like, really simplified what logic is actually owned inside the reduce.
I do want to shout out for those listeners who are theory nerds and want to dig into this. When you have a reduce, and you've got an operation where all the values are of the same type, including the accumulator, typically, what you've got here is some form of monoid. It may be a semigroup. So, if you want to dig into some theory, those are the words to Google and to go a deep dive on.
The main thing about monoids, in particular, is that monoids are any objects that have both a sort of a base case, a sort of empty version of themselves, and they have some sort of combining method that allows you to combine two values of that type. If your object has these things and follows...there's a few rules that have to be true. You have a monoid. And they can then be sort of guaranteed to be folded nicely because you can plug in their base case as your initial accumulator. And you can plug in their combining method as just the value of the block, and everything else just falls into place.
A classic here is addition for numbers. So, if you want to add two numbers, your combining operator is a plus. And your sort of empty value is a zero. So, you would say, reduce initial value is zero, array of numbers. And your block is just plus, and it won't sum all of the numbers. You could do something similar with strings, where you can combine strings together with plus, and, you know, your empty string is your base case. So, now you're doing sort of string concatenation over arbitrary number of strings.
Turns out there's a lot of operations that fall into that, and you can even define some of those on your custom object. So, you're like, oh, I've got a custom object. Maybe I want some way of, like, combining two of them together. You might be heading in the direction of doing something that is monoidal, and if so, that's a really good hint to know that it can sort of, like, just drop into place with a fold or a reduce and that that is a tool that you have available to you.
STEPHANIE: Yeah, well, I think my eyes, like, widened a little bit when you first dropped the term monoid [laughs]. I do want to spend the last bit of our time talking about when not to use reduce, and, you know, we did talk a lot about recursion. But when do you think a regular old loop will just be enough?
JOËL: So, you're suggesting when would you want to use something like an each rather than a reduce?
STEPHANIE: Yeah. In my mind, you know, you did offer, like, a lot of ways to make reduce simpler, a lot of strategies to end up with some really nice-looking syntax [chuckles], I think. But, oftentimes, I think it can be equally as clear storing your accumulator outside of the iteration and that, like, is enough for me to understand. And reduce takes a little bit of extra overhead to figure out what I'm looking at. Do you have any thoughts about when you would prefer to do that? Or do you think that you would usually reach for something else?
JOËL: Personally, I generally don't like the pattern of using each to iterate over a collection and then mutate some external accumulator. That, to me, is a bit of a code smell. It's a sign that each is not quite powerful enough to do the thing that I want to do and that I'm probably needing some sort of more specialized form of iteration. Sometimes, that's reduce. Oftentimes, because each can suffer from the same problem you mentioned from reduce, where it's like, oh, you're doing this thing where you mutate an external accumulator. Turns out what you're really doing is just map. So, use map or use select or, you know, some of the other built-in iterators from the enumerable library.
There's a blog post on the thoughtbot blog that I continually link to people. And when I see the pattern of, like, mutating an external variable with each, yeah, I tend to see that as a bit of a code smell. I don't know that I would never do it, but whenever I see that, it's a sign to me to, like, pause and be like, wait a minute, is there a better way to do this?
STEPHANIE: Yeah, that's fair. I like the idea that, like, if there's already a method available to you that is more specific to go with that. But I also think that sometimes I'd rather, like, come across that pattern of mutating a variable outside of the iteration over, like, someone trying to do something clever with the reduce.
JOËL: Yeah, I guess reduce, especially if it's got, like, a giant block and you've got then, like, things in there that break or call next to skip iterations and things like that, that gets really mind-bending really quickly. I think a case where I might consider using an each over a reduce, and that's maybe generally when I tend to use each, is when I'm doing side effects. If I'm using a reduce, it's because I care about the accumulated value at the end. If I'm using each, it's typically because I am trying to do some amount of side effects.
STEPHANIE: Yeah, that's a really good call out. I had that written down in my notes, and I'm glad you brought it up because I've seen them get conflated a little bit, and perhaps maybe that's the source of the pain that I'm talking about. But I really like that heuristic of reduce as, you know, you're caring about the output, as opposed to what's going on inside. Like, you don't want any unexpected behavior.
JOËL: And I think that applies to something like map as well. My sort of heuristic is, if I'm doing side effects, I want each. If I want transformed values that are sort of one-to-one in the collection, I want map. If I want a single sort of aggregate value, then I want reduce.
STEPHANIE: I think that's the cool thing about mixing paradigms sometimes, where all the strategies you talked about in terms of, you know, using custom, like, objects for your accumulator, or the elements in your collection, like, that's something that we get because, you know, we're using an object-oriented language like Ruby. But then, like, you also are kind of bringing the functional programming lens to, like, when you would use reduce in the first place. And yeah, I am just really excited now [chuckles] to start looking for some places I can use reduce after this conversation and see what comes out of it.
JOËL: I think I went on a bit of an interesting journey where, as a newer programmer, reduce was just, like, really intense. And I struggled to understand it. And I was like, ban it from code. I don't want to ever see it. And then, I got into functional programming. I was like, I'm going to do reduce everywhere. And, honestly, it was kind of messy.
And then I, like, went really deep on a lot of functional theory, and I think understood some things that then I was able to take back to my code and actually write reduce expressions that are much simpler so that now my heuristic is like, I love reduce; I want to use it, but I want as little as possible in the reduce itself. And because I understand some of these other concepts, I have the ability to know what things can be extracted in a way that will feel very natural, in a way that myself from five years ago would have just been like, oh, I don't know. I've got this, you know, 30-line reduce expression that I know is complicated, but I don't know how to improve.
And so, a little bit of the underlying theory, I don't think it's necessary to understand these simplified reduces, but as an author who's writing them, I think it helps me write reduces that are simpler. So, that's been my journey using reduce.
STEPHANIE: Yeah. Well, thanks for sharing. And I'm really excited. I hope our listeners have learned some new things about reduce and can look at it from a different light.
JOËL: There are so many different perspectives. And I think we keep discovering new mental models as we talk to different people. It's like, oh, this particular perspective. And there's one that we didn't really dig into but that I think makes more sense in a functional world that's around sort of deconstructing a structure and then rebuilding it with different components. The shorthand name of this mental model, which is a fairly common one, is constructor replacement. For anyone who's interested in digging into that, we'll link it in the show notes.
I gave a talk at an Elm meetup where I sort of dug into some of that theory, which is really interesting and kind of mind-blowing. Not as relevant, I think, for Rubyists, but if you're in a language that particularly allows you to build custom structures out of recursive types or what are sometimes called algebraic data types, or tagged unions, or discriminated unions, this thing goes by a bajillion names, that is a really interesting other mental model to look at.
And, again, I don't think the list that we've covered today is exhaustive. You know, I would love it for any of our listeners; if you have your own mental models for how to think about folding, injecting, reducing, send them in: hosts@bikeshed.fm. We'd love to hear them.
STEPHANIE: And on that note, shall we wrap up?
JOËL: Let's wrap up.
STEPHANIE: Show notes for this episode can be found at bikeshed.fm.
JOËL: This show has been produced and edited by Mandy Moore.
STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.
JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter.
STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email.
JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week.
ALL: Byeeeeeeee!!!!!!!
AD:
Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us.
More info on our website at: tbot.io/referral. Or you can email us at referrals@thoughtbot.com with any questions.
448 епізодів
ทุกตอน
×Ласкаво просимо до Player FM!
Player FM сканує Інтернет для отримання високоякісних подкастів, щоб ви могли насолоджуватися ними зараз. Це найкращий додаток для подкастів, який працює на Android, iPhone і веб-сторінці. Реєстрація для синхронізації підписок між пристроями.