The Stack Overflow Podcast

Building a PDF larger than the known universe

Episode Summary

On this home team episode: Massachusetts makes a welcome shift toward skills-based hiring, AI-generated content robs us of our appetite for mac and cheese, and large-scale crypto mining operations account for more than 2% of the US’s electricity generation. Plus: A PDF quite a bit bigger than Germany.

Episode Notes

Is it possible to make a PDF bigger than Germany? Here’s one larger than the known universe. As its creator says, “it’s mostly empty space, but so is the universe.”

Massachusetts is leading the way in the skills-based hiring revolution by eliminating degree requirements for state jobs.

Did you miss these deeply uncanny AI-generated food images, from the conjoined chickens to the macaroni and cheese rendered in shapes formerly unknown to geometry? Never fear; you can still see some here.

You may have forgotten about crypto (or at least tried), but more than 2% of the United States’s electricity generation goes to large-scale crypto mining.

Stack Overflow user Jeff Allen earned a Great Question badge for Create a Vector of All Days Between Two Dates, which has helped 85,000 R users.
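
The question itself is about R. As a rough analogue in Python (not the R answer from the thread), here is a minimal sketch of the same idea: build a list holding every calendar day between two dates, inclusive. The function name `days_between` is a hypothetical helper chosen for illustration.

```python
from datetime import date, timedelta
from typing import List


def days_between(start: date, end: date) -> List[date]:
    """Return every calendar day from start to end, inclusive."""
    if end < start:
        # Nothing to enumerate when the range is reversed.
        return []
    span = (end - start).days
    return [start + timedelta(days=offset) for offset in range(span + 1)]


# Example: a range that crosses a leap day.
for day in days_between(date(2024, 2, 27), date(2024, 3, 2)):
    print(day.isoformat())
```

Returning an empty list for a reversed range is a design choice made for this sketch; raising an error would be an equally reasonable option.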

Episode Transcription

[intro music plays]

Ben Popper Don’t start building your AI app from scratch. Save time and effort by visiting intel.com/edgeai. Get open source code snippets and sample apps for a head start on development so you can reach your seamless deployment faster. Go to intel.com/edgeai.

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my colleague, Ryan Donovan, who edits our blog and our newsletter. Ryan, how are you doing today?

Ryan Donovan Hey, Ben. Pretty good, pretty good.

BP Good.

RD So Cassidy gave us this link for the newsletter and I think it's too good not to talk about. Somebody was testing the limits of what a PDF can do, how big of a PDF you can get to, and I think there was a claim somewhere that you could make a PDF as big as Germany. And he's going through and I think the Acrobat software limits you to a point, and he went in and coded it by hand and he ran into a limit there, but then he found that the media box was unlimited on macOS. Basically he could create a PDF that was larger than the known universe, and if you want to see it, you can download it from this site.

BP Well, how can I download it if it's infinitely big?

RD It's not infinitely big. It's 37 trillion lightyears square.

BP Oh, I see. Okay, good.

RD The known universe.

BP What is up with Adobe Acrobat? They limit you to 15 million x 15 million inches? Garbage. Do you have to pay for a bigger version?

RD There's probably reasonable limits there– 15 million x 15 million inches.
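
For a sense of scale, the 15 million inch figure can be turned into kilometers with a few lines of Python. This is a back-of-the-envelope sketch, not code from the original write-up. The default PDF unit of 1/72 inch is standard, but the two limit constants below (14,400 units per side and a maximum `UserUnit` of 75,000) are commonly cited Acrobat figures that are treated here as assumptions; they happen to multiply out to the number quoted in the conversation.

```python
POINTS_PER_INCH = 72      # default PDF user-space unit is 1/72 inch
MAX_PAGE_UNITS = 14_400   # assumption: commonly cited Acrobat page-size limit, in units
MAX_USER_UNIT = 75_000    # assumption: commonly cited upper bound for UserUnit
INCH_IN_METRES = 0.0254   # exact, by international definition

# Largest page side under the assumed limits, in inches.
max_side_inches = MAX_PAGE_UNITS * MAX_USER_UNIT / POINTS_PER_INCH

# Convert to kilometers and square it for the page area.
side_km = max_side_inches * INCH_IN_METRES / 1000
area_km2 = side_km ** 2

print(f"{max_side_inches:,.0f} inches per side")
print(f"about {side_km:.0f} km per side, about {area_km2:,.0f} square km")
```

Under those assumptions the limit works out to a square roughly 381 km on a side.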

BP Well, we want to have a PDF that other species can see from space. That's how we solve Fermi's paradox. If they can't see the PDF from space, how are they going to know we exist?

RD This is the digital version of the Borges story, The Map is Not the Territory, where somebody is trying to perfectly map a territory and it becomes as big as the territory. So if you want a perfect map of the known universe, it has to be as big as the known universe.

BP I gotcha. “Who uses the user unit property in PDF?” This was asked four years ago and modified yesterday. So now there's a Stack Overflow question here if you want to know how to get to the largest possible PDF, we can help you out on that.

RD That's right. They get to their answer through Stack Overflow. So a little bit of a self link, but not mad.

BP Hey, what are we going to do? 37 trillion lightyears square. I don't really think of distance in terms of lightyears so I'm going to have trouble with that.

RD That is a measure of distance though.

BP I know. It’s hard for me to think about it that way.

RD I think a meter is also a measure of light over time.

BP Interesting.

RD It's light going like 1.some meters.

BP Do you know in Star Wars where they talk about parsecs? Have you ever heard that?

RD I've heard that.

BP I always thought that was made up, but apparently it's not. It's 3.26 lightyears.

RD I think I've heard a criticism where it's using the wrong– I forget what the quote is, but it makes the Kessel Run in 3.6 parsecs, and that's not a unit of time. It's a unit of distance.

BP Yeah, exactly. That famous line is incorrect. So we never measure time in terms of light. A lightyear is the distance that light would move in one year at its going speed.

RD Yep. That's right.

BP Okay. 6 trillion miles. A mere 6 trillion miles.
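
The numbers in this exchange can be checked with a short calculation. The sketch below assumes the usual astronomical convention of a Julian year (365.25 days) for the light-year and the standard definition of the parsec in terms of the astronomical unit; the variable names are illustrative only.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458             # exact; the meter is defined via this value
SECONDS_PER_JULIAN_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds
METRES_PER_MILE = 1609.344                       # exact, international mile
ASTRONOMICAL_UNIT_M = 149_597_870_700            # exact, by definition

# A light-year is the distance light covers in one Julian year.
light_year_m = SPEED_OF_LIGHT_M_PER_S * SECONDS_PER_JULIAN_YEAR
light_year_miles = light_year_m / METRES_PER_MILE

# A parsec is the distance at which 1 AU subtends one arcsecond:
# 648,000 / pi astronomical units.
parsec_m = ASTRONOMICAL_UNIT_M * 648_000 / math.pi
parsec_ly = parsec_m / light_year_m

print(f"1 light-year is about {light_year_miles / 1e12:.2f} trillion miles")
print(f"1 parsec is about {parsec_ly:.2f} light-years")
```

This gives roughly 5.88 trillion miles per light-year, which rounds to the 6 trillion quoted here, and about 3.26 light-years per parsec.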

RD So if you got some time on your hands.

BP All right. I have one here for us, which I think is interesting and is kind of a trend. “Massachusetts governor leans into skills-based revolution by axing degree requirements for state jobs. The private sector is up next.” This is no surprise to anyone who's worked in the world of software for a long time. There are many sorts of certificates and boot camps that you can go to as an on-ramp to learn software development or be certified as a cloud administrator, security administrator. And then you can get an entry level job in IT and some of those jobs actually pay pretty well. But it's kind of cool to see the world responding to that by saying that a four year college degree maybe is not what we need when it comes to the IT department of our water and sanitation. What we need is somebody who has X skills. And one would hope this levels the playing field for people, because a college degree often can be quite expensive or time prohibitive. If you have to support yourself or support your family, despite your intelligence, a college education isn't always easily accessible.

RD I've always said there's parts of programming and IT that are essentially blue collar jobs. You're going through and you're building something, you just have to figure out how to build something. You don't need all the fancy math. And it's not even guaranteed that you'll get that fancy math in a college program.

BP Yeah. “Massachusetts is in the middle of a transition to a skills-based economy and the demand for talent is at an all-time high. It has the highest percentage of working adults with a four-year degree. We can be proud of that, but the other half of the workforce makes an immense contribution and should be welcomed in the public sector,” is what the governor had to say. And we live in a weird time where, from the tech sector side, we see a lot of layoffs. It is a bit unsettling. It's the first time in 10-12 years where being a software developer doesn't always mean you're going to have multiple options on the table to keep your current job or go to a different one. But the unemployment rate in the United States –and we're speaking to a global audience here, but just from our perspective sitting here in the US– is extremely low and the workforce participation rate for working age adults is extremely high. If you want to have a job, you can. Maybe not the job you want, maybe not the salary you used to have, but there's a demographic shift in the United States where baby boomers were this huge generation and they're retiring out, retiring a lot faster post-Covid. And so in some ways, saying that you don't need a college degree is a great thing. In some ways, maybe they're responding just to the fact that they don't have enough people. They have spots they have to fill.

RD And I think there's been a lot of talk about trying to get college degrees to be more skill-based, closer to trade schools in some ways, and I think that's a response to all these jobs requiring college degrees when they didn't really need one. What they need is the skills. So you can just take a skill-up program, a bootcamp or something, and get the skills. And the college degree I think initially was meant to be that it makes you a sort of well-rounded citizen.

BP Right. All right, we have to go there– a large food delivery business has set about quietly deleting its unsettling AI generated food pics. This is not the first instance of rogue AI. There was a car company that had a chatbot which was agreeing to sell cars to people for a dollar. But this one, you just wonder what they were thinking. There's a conjoined chicken here, there is macaroni and cheese in a shape formerly unknown to humans and geometrists. How does this stuff get past quality control? It's hard to say.

RD That's amazing. I get wanting to skip the queue and just have these pictures there, but you want pictures of real food when you're ordering food.

BP Well, these are recipes, so it's like what you expect to get. No, wait, that's not right. Oh, no, this is even worse. Maybe some of the recipes were also AI generated. That's not safe.

RD You’ve got to wonder if anybody was checking these for quality. You need a human in the loop here. You need somebody who is making sure you're not putting out bot crap, as they say. But there is a lot of food advertising that uses fake images, and in your ooey-gooey chocolate chip, they'll put Elmer's glue or something. They'll put glue in there and it won't be the real food, it'll just look more appetizing because they've made it with stuff you get at Home Depot.

BP Right. What you see in a food commercial when the cheese is stretching out to four feet and stuff is not real. The hot dog and the hamburger are always ridiculously plump. It's food porn as they say. But I guess this issue now goes beyond just one. So let's just say that this is not a single company, but a number that have tried this. Is this a cost-cutting measure where it's like, “Hey, we want to stop paying to have stock photography of this.” The medium whole pie pizza is now a picture of a delicious bakery pie dessert, and there's a brand of ranch dressing that does not exist. And it just throws you for a loop wondering who would feel safe ordering this stuff.

RD A lot of it could just be auto-generated, honestly. They're looking at what the searches are for recipes and they're like, “Well, let's generate a recipe for that,” and they're just trying to grab SEO, which, I've been there at jobs.

BP Cheddar and cream cheese sauce for mac and cheese. So they were just like, “These searches go together.” Somebody is like, “Can I use cream cheese for mac and cheese?” That gets searched 10,000 times a day and the AI is happy to tell you that it's got an idea also for how to plate it.

RD The chefs of the future.

BP Go forth with caution, user discretion is advised. Please be safe out there if you're looking at online recipes. Dear Lord. There was something really interesting that I heard today, which I will share. It was about the transformer model, which has been one of the most impactful sort of architectural updates to how we approach AI, driving a lot of the recent Gen AI revolution. That paper was published in 2017/2018 by Google and has a bajillion citations as well. So this person pointed out that if you go back a few years earlier to 2014, there were actually publications that used the same attention mechanism to improve AI image recognition that would look at an image and then write a caption. And so the thing that made transformers great was that they would be able to think beyond just the last token. They would go back and look at the context of the whole sentence and from there they can evolve. Now they can think in large chunks of content, paragraphs, sentences, books, essays. So this is also true for this image generator and the person was saying why didn't that paper blow up and why wasn't that the springboard? And they brought up something interesting which I had never heard of before called ‘the hardware lottery effect,’ the idea being that a ton of research gets published every year and a lot of it has some kind of algorithmic improvement. And I've always thought that what made the transformer important was that it made an algorithmic improvement, it decided on a different way of doing things. And what this person was saying was, actually, it wasn't that the idea was so novel. It was that this research came out of Google and they have TPUs and other folks have GPUs, and in large language models, the transformer lets you parallelize the computation and take advantage of the parallel architecture and the hardware, therefore you can do it much bigger, much faster, and much cheaper.
And in the world of image recognition, this was already true and so it was irrelevant. It wouldn't let you make some kind of big gain in industry, and so therefore the research wasn't widely adopted, and I thought that was really interesting to think about.
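
To make the "look at the whole sentence" and "parallelize" points concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplified illustration, not the code from the transformer paper or the earlier captioning work: the learned query, key, and value projections and the multiple heads are left out, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Weighted average of all values, weighted by query-key similarity."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Every position attends over the whole sequence.

    Each output depends only on the inputs, never on another position's
    output, so the iterations of this loop are independent of one another.
    """
    return [attend(token, tokens, tokens) for token in tokens]


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(sequence):
    print([round(x, 3) for x in row])
```

Because no iteration of the loop in `self_attention` waits on another, the work can be batched into large matrix multiplications, which is the kind of computation that GPUs and TPUs handle well.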

RD When we're talking about AI, a lot of times we do lean on talking about the transformer model there and we don't talk a lot about the gains in hardware, that people were using GPUs all over the place and now we've got this specialized hardware. We were talking to someone yesterday, I was talking about how the GPU is just a furnace putting stuff in and churning it out and it's this big energy-hungry monster. And what you just need is the parallelization which is pretty computationally cheap in itself.

BP We'll have to dig a little deeper into this paper and see if there's some interesting examples in here, but what they were saying is that very productive research can fall by the wayside if it doesn't align with commercial opportunity, and that the advent of domain-specialized hardware makes it increasingly costly to stray off the beaten path. You should definitely be thinking about this because you don't want to abandon research that could provide huge gains just because it's not going to let you max out the style of chip. On the other hand, what are businesses going to do? Some of them I'm sure have pure research labs and the government has it or whatever, but for things to become widely adopted, they have to work at scale, and the more things that are adopted at scale, the less they cost and then it's kind of a virtuous cycle there.

RD You need the simple version that can work with everybody. The transformer model came out in what– 2017/2018? It really didn't blow up until ChatGPT in 2022, because now there was a great front end on it. People could use it anytime.

BP Right. All right, so the hardware lottery is one way you can win. The other is social media influencers. If your AI research is shared by somebody who has a large social media presence, it is very likely to go on to get two to three times more citations in the future, meaning other scholars are building on your research. So being an online influencer, that's how we chart our path of scientific progress here.

RD That's right. It's all popularity contests all the way down.

BP I guess so, I guess so. I'm sure that's always been true. Scientists who were convincing and charismatic got people to follow them more than others. They didn't have social media to do it.

RD I read about some research that said that the artists that make it big, they aren't the most creative or the best or the most interesting– they have the most international friends.

BP Is that right?

RD Read it on the internet, must be true.

BP It must be true, all right. Over 2% of the US’s electricity generation now goes to Bitcoin. I don't understand this. For me, Bitcoin has been over. I guess that means that I'm probably missing another opportunity to create generational wealth. It's probably the third time I've done this. Maybe I should just stop trying to be smart about this and just always buy a little bit of Bitcoin every week for the rest of my life just so I don't miss it again. Fool me thrice, shame on me.

RD I think it has felt over as a gold rush, but I remember reading a few months ago that Bitcoin hit a new high.

BP Did it? Wow.

RD Yeah, I think so.

BP I'm so out of touch with this now, I guess because it's completely fallen out of mainstream press.

RD Well, it's culled all the also-rans. It's now just the big players that are still there hanging on trying to make this into something that is usable across whatever the use case is.

BP This is fascinating. Since Bitcoin mining is the antithesis of an essential activity, several mining operations have signed up for demand response programs where they agree to take their operations offline if electricity demand is likely to exceed generating capacity in return for compensation by the grid operator. It has been widely reported that one facility in Texas at an aluminum smelter site earned $30 million by shutting down during the heatwave of 2023. They're taking us all hostage here, Ryan. I don't like it. They're mining their digital gold and then they're charging us not to mine it? Unacceptable.

RD Oh, okay. So I'm looking at the Bitcoin charts here. It hit a peak in about late 2021– 60,000.

BP Yeah, I remember that.

RD And then it crashed for a bit and then it got back up to 40,000 about a month ago.

BP Okay. I got into Bitcoin once or twice at 7,000, at 12,000, at 30 or 40,000. I don't know if I held on to my 7 and my 12. I remember thinking, “Whatever.” I bought like a hundred dollars worth of it, so it was never going to whatever, but I put some stakes in the ground I can feel proud of. 7 and 12, those are my numbers.

RD I remember a friend of mine was talking about it when it was like 15 bucks and I was like, “Ugh, why would you buy that?”

BP That's a lot. That's a lot for some bytes.

RD Yeah.

[music plays]

BP All right, everybody. It is that time of the show. Let's shout out somebody who came on Stack Overflow and helped by contributing a little bit of knowledge or a little bit of curiosity. A Great Question Badge was awarded to Jeff Allen yesterday. “How to create a vector of all the days between two dates.” 85,000 people have checked this out as part of the R language collective. So if you want to know how to create a vector of all days between two dates, Jeff has the question for you and has earned a Great Question Badge. So congrats to you, Jeff. And if you don't know, collectives are a little subsection of Stack Overflow where you can hang out and talk about something like R language if that's what you're into, and then it has a discussion board so you don't just have to ask questions, you can also just chit-chat and have discussions or post things that are interesting like an interview with somebody. All right, y'all. That's all we got for you today. As always, I am Ben Popper, Director of Content here at Stack Overflow. You can find me on Twitter @BenPopper. Email us, podcast@stackoverflow.com. We've gotten a couple recently that we've since booked for episodes, so hit us up there. And if you liked the show today, leave us a rating and a review.

RD I remain Ryan Donovan. I edit the blog here at Stack Overflow, located at stackoverflow.blog. And if you want to contact me on X, my handle is @RThorDonovan.

BP Great. All right, everybody. Thanks for listening, and we will talk to you soon.

[outro music plays]