GitHub Is Having Issues

(githubstatus.com)

151 points | by Simpliplant 2 hours ago

31 comments

cpfohl 2 hours ago
I swear this is my fault. I can go weeks without doing infra work. Github does fine, I don't see any hiccups, status page is all green.
But the day comes that I need to tweak a deploy flow, or update our testing infra and about halfway through the task I take the whole thing down. It's gotten to the point where when there's an outage I'm the first person people ask what I'm doing...and it's pretty dang consistent....
- aezart 16 minutes ago
  Sounds like my Dad, who used to have an uncanny ability to get stuck in elevators. Even got stuck in one with his claustrophobia therapist.
- LollipopYakuza 1 hour ago
  Plot twist: cpfohl works at Github and actually messes with the infra.
  - sidewndr46 24 minutes ago
    Second plot twist: cpfohl actually works at Microsoft on Copilot
- wolfi1 1 hour ago
  do you know the Pauli-Effect? https://en.wikipedia.org/wiki/Pauli_effect
  - cperciva 15 minutes ago
    Related: In FreeBSD we used to talk often about the Wemm Field. Peter Wemm was one of the early FreeBSD developers and responsible for most of the early project server cluster, and hardware had a phenomenal habit of breaking in his vicinity. One notable story I heard involved transporting servers between data centers and hitting a Christmas tree in the middle of a highway... in March.
  - macintux 1 hour ago
    At my old job we’d call that Daily bogons (my last name). Didn’t know I was in such illustrious company.
  - cpfohl 48 minutes ago
    Brilliant. I love it
- hmokiguess 1 hour ago
  You should be promoted to SRE - Schrodinger Reliability Engineer
- trigvi 32 minutes ago
  Simple solution: do infra work every few months instead of every few weeks.
- RGamma 52 minutes ago
  Surely this would earn you loads of internet street cred.
- Imustaskforhelp 1 hour ago
  Just let us know in advance when you want to do infra work from now on, alright?
  - cpfohl 48 minutes ago
    I’ll try. Lemme know if you need a day off too…
    - Imustaskforhelp 39 minutes ago
      I know a guy who knows a guy who might need a day off haha
      And they are gonna give a pizza party if I get them a day off. I am gonna share a slice with ya too.
      Doing a github worldwide outage by magical quantum entanglement for a slice of pizza? I think I would take that deal! xD.
shykes 1 hour ago
In moments like this, it's useful to have a "break glass" mode in your CI tooling: a way to run a production CI pipeline from scratch, when your production CI infrastructure is down. Otherwise, if your CI downtime coincides with other production downtime, you might find yourself with a "bricked" platform. I've seen it happen and it is not fun.
It can be a pain to set up a break-glass, especially if you have a lot of legacy CI cruft to deal with. But it pays off in spades during outages.
I'm biased because we (dagger.io) provide tooling that makes this break-glass setup easier, by decoupling the CI logic from CI infrastructure. But it doesn't matter what tools you use: just make sure you can run a bootstrap CI pipeline from your local machine. You'll thank me later.
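A minimal sketch of what a local bootstrap entry point might look like, in Python. This is not Dagger's API; the step names and the `./build.sh`, `./test.sh`, and `./deploy.sh` scripts are hypothetical placeholders for whatever your real pipeline runs.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# A step is a human-readable name plus the command line to execute.
Step = Tuple[str, Sequence[str]]

# Hypothetical steps: substitute the scripts your CI runner normally invokes.
BOOTSTRAP_STEPS: List[Step] = [
    ("build", ["./build.sh"]),
    ("test", ["./test.sh"]),
    ("deploy", ["./deploy.sh", "--env", "production"]),
]


def run_pipeline(steps: List[Step]) -> List[Tuple[str, int]]:
    """Run each step in order and stop at the first non-zero exit code.

    Returns the (name, exit_code) pairs for the steps that actually ran.
    """
    results: List[Tuple[str, int]] = []
    for name, argv in steps:
        print(f"==> {name}: {' '.join(argv)}")
        code = subprocess.run(list(argv)).returncode
        results.append((name, code))
        if code != 0:
            print(f"step {name!r} failed with exit code {code}", file=sys.stderr)
            break
    return results


def succeeded(results: List[Tuple[str, int]], steps: List[Step]) -> bool:
    """True only if every step ran and every step exited with 0."""
    return len(results) == len(steps) and all(code == 0 for _, code in results)
```

From a laptop during an outage, `run_pipeline(BOOTSTRAP_STEPS)` would run the same scripts the CI runner normally invokes, provided those scripts do not depend on runner-only plugins or secrets.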
- nadirollo 30 minutes ago
  This is a must when your systems deal with critical workloads. At Fastly, we process a good chunk of the internet's traffic and can't afford to be "down" while waiting for the CI system to recover in the event of a production outage.
  We built a CI platform using dagger.io on top of GH Actions, and the "break glass" pattern was not an afterthought; it was a requirement (and one of the main reasons we chose dagger as the underlying foundation of the platform in the first place)
- alex_suzuki 1 hour ago
  100%. We used to design the pipeline in a way that is easily reproducible locally, e.g. doesn’t rely on plugins of the CI runtime. Think build.sh shell script, normally invoked by CI runner but just as easy to run locally.
- tomwphillips 1 hour ago
  A while back I think I heard you on a podcast describing these pain points. Experienced them myself; sounded like a compelling solution. I remember Dagger docs being all about AI a year or two ago, and frankly it put me off, but that seems to have gone again. Is your focus back to CI?
  - shykes 59 minutes ago
    Yes, we are re-focused on CI. We heard loud and clear that we should pick a lane: either a runtime for AI agents, or deterministic CI. We picked CI.
    Ironically, this makes Dagger even more relevant in the age of coding agents: the bottleneck increasingly is not the ability to generate code, but to reliably test it end-to-end. So the more we all rely on coding agents to produce code, the more we will need a deterministic testing layer we can trust. That's what Dagger aspires to be.
    For reference, a few other HN threads where we discussed this:
    - https://news.ycombinator.com/item?id=46734553
    - https://news.ycombinator.com/item?id=46268265
    - tomwphillips 37 minutes ago
      That's good - I'll reconsider Dagger.
      Yes, I agree on your assessment. AI means a higher rate of code changes, so you need more robust and fast CI.
duggan 11 minutes ago
A directory over SSH can be your git server. If your CI isn't too complex, a post-receive hook looping into Docker can be enough. I wrote about self-hosting git and builds a few weeks ago[1].
There are heavier solutions, but even setting something like this up as a backstop might be useful. If your blog is being hammered by ChatGPT traffic, spare a thought for Github. I can only imagine their traffic has ballooned phenomenally.
1: https://duggan.ie/posts/self-hosting-git-and-builds-without-...
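As a rough sketch of the post-receive idea (not taken from the linked post): a hook in a bare repo receives one `old new ref` line per updated ref on stdin, so a small script can export the pushed commit and hand it to `docker build`. The branch name, the image tag, and the assumption of a Dockerfile at the repo root are all hypothetical.

```python
#!/usr/bin/env python3
import io
import subprocess
import tarfile
import tempfile
from typing import Iterable, List, Tuple

BUILD_BRANCH = "refs/heads/main"  # assumption: only pushes to this ref trigger a build
IMAGE_TAG = "myproject-ci"        # hypothetical image name


def parse_updates(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse the 'old-sha new-sha refname' lines git feeds to post-receive."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append((parts[0], parts[1], parts[2]))
    return updates


def shas_to_build(updates: List[Tuple[str, str, str]], branch: str = BUILD_BRANCH) -> List[str]:
    """Keep new commits pushed to the build branch; an all-zero sha is a deletion."""
    return [new for _old, new, ref in updates if ref == branch and set(new) != {"0"}]


def build_commit(sha: str) -> int:
    """Export the commit into a temp dir and run docker build on it."""
    with tempfile.TemporaryDirectory() as workdir:
        archive = subprocess.run(
            ["git", "archive", "--format=tar", sha],
            check=True, capture_output=True,
        ).stdout
        with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
            tar.extractall(workdir)
        # Assumes a Dockerfile at the repository root.
        return subprocess.run(
            ["docker", "build", "-t", f"{IMAGE_TAG}:{sha[:12]}", workdir]
        ).returncode


def handle_push(lines: Iterable[str]) -> int:
    """Build every qualifying commit; return the first failing exit code, else 0."""
    for sha in shas_to_build(parse_updates(lines)):
        code = build_commit(sha)
        if code != 0:
            return code
    return 0
```

Saved as `hooks/post-receive` in the bare repo and made executable, the file would end with `import sys; sys.exit(handle_push(sys.stdin))`. Anything the build prints is relayed back to the person pushing.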
duckkg5 2 hours ago
I would so very much love to see GitHub switch gears from building stuff like Copilot etc and focus on availability
- adithyareddy 12 minutes ago
  The #1 priority at GitHub for this year is migrating from their own data center to Azure; any other work that gets in the way of this is being deprioritized: https://thenewstack.io/github-will-prioritize-migrating-to-a...
- coffeebeqn 1 hour ago
  This is an absurd state they are at! Weekly outages in 2025 and 2026. Going from developer-beloved and very solid to Microslop happened faster than I expected.
- hrmtst93837 45 minutes ago
  I think GitHub shipping Copilot while suffering availability issues is a rational choice because they get more measurable business upside from a flashy AI product than from another uptime graph. In my experience the only things that force engineering orgs to prioritize uptime are public SLOs with enforced error budgets that can halt rollouts, plus solid observability like Prometheus and OpenTelemetry tracing, canary rollouts behind feature flags, multi-region active-active deployments, and regular chaos experiments to surface regressions. If you want them to change, push for public SLOs or pay for an enterprise SLA, otherwise accept that meaningful uptime improvements cost money and will slow down the flashy stuff.
- rschiavone 1 hour ago
  Unless a major out(r)age forces a change of leadership, expect more slop down our throats.
overshard 2 hours ago
I've taken to hosting everything critical like this myself on a single system with Docker Compose with regular off-premises backups and a restore process that I know works because I test it every 6 months. I can swap from local hosting to a VPS in 30 mins if I need to. It seems like the majority of large services like GitHub have had increasingly annoying downtime while I try to get work done. If you know what you're doing it's a false premise that you'll just have more issues with self-hosting. If you don't know what you are doing it's becoming an increasingly good time to learn. I've had 4 years of continuous uptime on my services at this point. I still push to third parties like GitHub as yet another backup and see the occasional 500 and my workflow keeps chugging along. I've gotten old and grumpy and would rather just do it myself.
nlawalker 1 hour ago
The appearance of a thread here is so consistent that HN needs a black-bar style indicator for GH outages that points to it.
- Imustaskforhelp 1 hour ago
  At this point I am thinking of creating a 0 days until github outage website similar to how we had the running joke of 0 days until JS framework dropped.
  - joecool1029 59 minutes ago
    Too slow: https://github-incidents.pages.dev/
    - Night_Thastus 10 minutes ago
      That site could use a little more. Maybe a count of how many in the current month and year, tallies for each year, maybe even trends. Could be nice. :)
    - Imustaskforhelp 47 minutes ago
      Too late to create a 0 days since github outage, Too early to create a crypto rugpull about this whole situation.
      Born just in time to talk about this situation on hackernews xD (/jk)
      > Too slow: https://github-incidents.pages.dev/
      I am not even mad that I am slow honestly, this is really funny lol.
esafak 0 minutes ago
I spent hours trying to figure out what was wrong, &^$% Github.
terminalbraid 1 hour ago
I would prefer we have posts when github is not having issues to cut down on noise.
zthrowaway 1 hour ago
Microslop ruins everything it touches.
akoumjian 2 hours ago
Is this related to Cloudflare?
I'm getting cf-mitigated: challenge on openai API requests.
https://www.cloudflarestatus.com/ https://status.openai.com/
joecool1029 2 hours ago
codeberg might be a little slower on git cli, but at least it's not becoming a weekly 'URL returned error: 500' situation...
- popcornricecake 1 hour ago
  These days it feels like people have simply forgotten that you could also just have a bare repository on a VPS and use it over ssh.
  - yoyohello13 1 hour ago
    Most developers don’t even know git and GitHub are different things…
- mynameisvlad 2 hours ago
  I mean, this isn't a 'URL returned error: 500' situation for anything that Codeberg provides considering this is an issue with Copilot and Actions.
  - joecool1029 2 hours ago
    Except actually it was, that was what my git client was reporting trying to run a pull.
    - mynameisvlad 1 hour ago
      I'm going to trust the constant stream of updates from the company itself which shows exactly what went down and came back up rather than a random anecdote.
      - workethics 1 hour ago
        I only found this post because I decided to check HN after getting HTTP 500 errors pulling some repos.
- Imustaskforhelp 1 hour ago
  I used to use codeberg 2 years ago. I may have been ahead of my time.
- ocdtrekkie 2 hours ago
  I rarely successfully get Codeberg URLs to load. Which is sad because I actually would very much like to recommend it but I find it unreliable as a source.
  That being said, GitHub is Microsoft now, known for that Microsoft 360 uptime.
  - Imustaskforhelp 1 hour ago
    I have never had this issue. IIRC Codeberg has a Matrix community, they are a non-profit, and they would absolutely love to hear your feedback. I hope that you can find their Matrix community and join it and talk with them.
    Actually here you go, I have pasted the matrix link to their community, hope it helps https://matrix.to/#/#codeberg-space:matrix.org
  - cyberax 1 hour ago
    > Microsoft 360 uptime
    I mean... It's right in the name! It's up for 360 days a year.
- IshKebab 2 hours ago
  I mean... you understand the scale difference right?
pothamk 1 hour ago
What’s interesting about outages like this is how many things depend on GitHub now beyond just git hosting. CI pipelines, package registries, release automation, deployment triggers, webhooks — a lot of infrastructure quietly assumes GitHub is always available. When GitHub degrades, the blast radius is surprisingly large because it breaks entire build and release chains, not just repo browsing.
- littlestymaar 1 hour ago
  > a lot of infrastructure quietly assumes GitHub is always available
  Which is really baffling when talking about a service that has at least weekly hiccups even when it's not a complete outage.
  There are almost 20 outages listed on HN over the past two months: https://news.ycombinator.com/from?site=githubstatus.com so much for “always available”.
  - pothamk 1 hour ago
    Part of it is probably historical momentum. GitHub started as “just git hosting,” so a lot of tooling gradually grew around it over the years — Actions, package registries, webhooks, release automation, etc. Once teams start wiring all those pieces together, replacing or decoupling them becomes surprisingly hard, even if everyone knows it’s a single point of failure.
dkhenry 52 minutes ago
I really wish Graphite had just gone down the path of better Git hosting and reviewing, instead of trying to charge me $40 a month for an AI reviewer. It would be nice to have a real first class alternative to Github
joshrw 2 hours ago
Happening very often lately
- risyachka 1 hour ago
  and we all know why
  - rezonant 1 hour ago
    Because they're moving it to Azure and doing it far too quickly, not taking care to avoid availability issues
    - Zanfa 1 hour ago
      It wasn't the migration to Azure that completely borked their PR UI.
    - risyachka 31 minutes ago
      yeah, ai slop rush
      everyone builds off vibes and moves fast! like no, if you are a mature company you don't need to move fast, in fact you need to move slow
      the only thing that can kill e.g. github is if they move fast and break things like they have been doing recently
nor0x 1 hour ago
> This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
does anyone know where these "detailed root cause analysis" reports are shared? is there maybe an archive?
garciasn 2 hours ago
How reliable is githubstatus.com? I know that status pages are generally not updated until Leadership and/or PR has a chance to approve the changes; is that the case here?
Our health check checks against githubstatus.com to verify 'why' there may be a GHA failure and reports it, e.g.
Cannot run: repo clone failed — GitHub is reporting issues (Partial System Outage: 'Incident with Copilot and Actions'). No cached manifests available.
But, if it's not updated, we get more generic responses. Are there better ways that you all employ (other than to not use GHA, you silly haters :-))?
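One possible approach, sketched in Python: pair the status page with a direct probe, so a lagging status page still yields a specific message. This assumes githubstatus.com exposes the standard Statuspage `/api/v2/summary.json` endpoint with `components` and `incidents` arrays; verify that before relying on it. The function names and message wording are made up for illustration.

```python
import json
import subprocess
import urllib.request
from typing import List

# Assumed Statuspage-style endpoint; check that it matches the live API.
SUMMARY_URL = "https://www.githubstatus.com/api/v2/summary.json"


def fetch_summary(url: str = SUMMARY_URL, timeout: float = 5.0) -> dict:
    """Download and decode the status summary JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def probe_git(repo_url: str, timeout: float = 15.0) -> bool:
    """Direct check: can we list refs on a repo we depend on right now?"""
    try:
        result = subprocess.run(
            ["git", "ls-remote", repo_url, "HEAD"],
            capture_output=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def degraded_components(summary: dict) -> List[str]:
    """Names of components whose status is anything other than 'operational'."""
    return [c["name"] for c in summary.get("components", [])
            if c.get("status") != "operational"]


def diagnose(summary: dict, probe_ok: bool) -> str:
    """Combine what the status page claims with what the probe observed."""
    reported = [i["name"] for i in summary.get("incidents", [])] or degraded_components(summary)
    if not probe_ok and reported:
        return "GitHub unreachable; status page reports: " + ", ".join(reported)
    if not probe_ok:
        return "GitHub unreachable, but the status page shows no incident yet"
    if reported:
        return "Git reachable; status page reports: " + ", ".join(reported)
    return "GitHub looks healthy; the failure is probably on our side"
```

The second branch is the interesting one for this thread: the probe fails while the status page is still green, which is what several people here are describing for git operations.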
- duckkg5 2 hours ago
  Right now the page says Copilot and Actions are affected but I can't even push anything to a repo from the CLI.
  - alemanek 1 hour ago
    Yep getting 500 errors intermittently on fetch and checkout operations in my CI pretty consistently at the moment. Like 1 in 2 attempts
  - jjice 1 hour ago
    Agreed. I believe that's marked under "Git Operations" and it's all green. Just began being able to push again a minute ago.
littlestymaar 1 hour ago
In many companies I worked for, there were a bunch of infrastructure astronauts who made everything very complicated in the name of zero downtime and sold it to management as “downtime would kill our credibility and our business”, and then you have billion dollar companies everyone relies on (GitHub, Cloudflare) who have repeated downtime yet it doesn't seem to affect their business in any way.
- wiether 24 minutes ago
  It's a multitude of factors but basically they can act like that because they are dominant on the market.
  The classic "nobody ever gets fired for buying IBM".
  If you pick something else, and there's an issue, people will complain about your choice being wrong, should have gone with the biggest player.
  Even if you provide metrics showing your solution's downtime being 1% of the big player's.
  Something like Cloudflare is so big and ubiquitous, that, when there's a downtime, even your grandma is aware of it because they talk about it in the news. So nobody will put the blame on the person choosing Cloudflare.
  Even if people decide to go back (I had a few customers asking us to migrate to other solutions or to build some kind of failover after the last Cloudflare incidents), it costs so much to find the solutions that can replace it with the same service level and to do the migration, that, in the end, they prefer to eat the cost of the downtimes.
  Meanwhile, if you're a regular player in a very competitive market, yes, every downtime will result in lost income, customers leaving... which can hurt quite a lot when you don't have hundreds of thousands of customers.
- bonesss 1 hour ago
  Businesses are incommensurate.
  GitHub is a distributed version control storage hub with additional add-on features. If peeps can’t work around a git server/hub being down and don’t know to have independent reproducible builds or integrations and aren’t using project software wildly better than GitHub’s, there are issues. And for how much money? A few hundred per dev per year? Forget total revenue, the billions, the entire thing is a pile of ‘suck it up, buttercup’ with ToS to match.
  In contrast, I’ve been working for a private company selling patient-touching healthcare solutions and we all would have committed seppuku with outages like this. Yeah, zero downtime or as close to it as possible even if it means fixing MS bugs before they do. Fines, deaths, and public embarrassment were potential results of downtime.
  All investments become smart or dumb depending on context. If management agrees that downtime would be lethal my prejudice would be to believe them since they know the contracts and sales perspective. If ‘they crashed that one time’ stops all sales, the 0% revenue makes being 30% faster than those astronauts irrelevant.
- Krutonium 1 hour ago
  To be fair - it SUPER does. Being down frequently makes your competition look better.
  Of course, once you have the momentum it doesn't matter nearly as much, at least for a while. If it happens too much though, people will start looking for alternatives.
  The key thing to remember is that momentum is hard to redirect, but with enough force (reasons), it can be.
- baggy_trough 1 hour ago
  The reality is that consumers don't really care about downtime unless it's truly frequent.
  - littlestymaar 1 hour ago
    Exactly.
    And the frequency they can tolerate is surprisingly high given that we're talking about the 20th or so outage of 2026 for github. (See: https://news.ycombinator.com/from?site=githubstatus.com)
granzymes 1 hour ago
I have a bug bash in an hour and fixes that need to go in beforehand. So of course GitHub is down.
banga 1 hour ago
Only on days with a "y"...
yoyohello13 1 hour ago
How many 9s is GitHub at now? 2?
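For anyone who wants to do the arithmetic: the number of nines is just the negative base-10 log of the unavailable fraction. A small Python sketch, using 98.8% over 90 days (the Actions figure quoted in this thread) purely as an example input:

```python
import math


def nines(uptime_percent: float) -> float:
    """Number of nines: 99% -> 2, 99.9% -> 3, and so on (fractional in between)."""
    unavailable = 1.0 - uptime_percent / 100.0
    if unavailable <= 0.0:
        return math.inf  # 100% uptime has no finite nines count
    return -math.log10(unavailable)


def downtime_hours(uptime_percent: float, window_days: float) -> float:
    """Hours of downtime implied by an uptime percentage over a window."""
    return window_days * 24.0 * (1.0 - uptime_percent / 100.0)


if __name__ == "__main__":
    print(f"nines at 98.8%: {nines(98.8):.2f}")
    print(f"downtime over 90 days: {downtime_hours(98.8, 90):.1f} hours")
```

So 98.8% is a bit under two nines, and over a 90-day window it works out to roughly 26 hours of degraded service.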
- jsheard 1 hour ago
  If you count every service together, it's deep into one nine.
  https://mrshu.github.io/github-statuses/
  Most individual services have two nines... but not all of them.
- kibwen 1 hour ago
  Github proudly boasts an industry-leading seven 9s of uptime. 49.999999%
  - nurettin 55 minutes ago
    They could support up to 10 0.00000000009999999999
- z3ugma 39 minutes ago
  "78 incidents in last 90 days" per https://mrshu.github.io/github-statuses/
  that's....gobsmacking...I knew it was memeably bad but I had no idea it was going so badly
- modeless 1 hour ago
  90 day non-degraded uptime of Github Actions is 98.8% if the official numbers can be believed
- amarant 1 hour ago
  Due to an off-by-one error, they are now targeting "five eights". Why else would they migrate to Azure?
- jandrese 1 hour ago
  They're going to have to start advertising nine fives of reliability.
- whateveracct 1 hour ago
  they were down to a low 1 nine recently
- Imustaskforhelp 1 hour ago
  Github service has a better work life balance than many engineers here...
  Octocat (The OG github mascot) has a family that it goes to the park with anytime he wants.
  Luckily his boss Microslop, is busy with destroying windows of his house and banning people from its discord server.
cyberax 1 hour ago
You know that it's bad when the status page doesn't have the availability stats anymore.
m_w_ 1 hour ago
Seems like the xkcd [1] for internet infrastructure that was posted earlier [2] should have github somewhere on it, even if just for how often it breaks. Maybe it falls under "whatever microsoft is doing"
[1]: https://www.reddit.com/r/ProgrammerHumor/comments/1p204nx/ac...
[2]: https://news.ycombinator.com/item?id=47230704
Imustaskforhelp 1 hour ago
Lowendtalk providers who take $7 per year deals can provide more reliability than Github at this moment and I am not kidding.
If anyone is using Github professionally and pays for github actions or any github product, respectfully, why?
You can switch to a VPS provider and self-host Gitea/Forgejo in less time than you might think and pay a fraction of a fraction of what you pay now.
The point becomes more moot because github is used by developers and devs are so so much more likely to be able to spin up a VPS and run Forgejo and use a terminal. I don't quite understand the point.
There are ways to run GitHub Actions workflows in Forgejo as well, IIRC, even locally hosted, which uses https://github.com/nektos/act under the hood.
People, the time where you spent hundreds of thousands of dollars and expected basic service and no service outage issues is over.
What you are gonna get is service outage issues and lock-ins. Also, your open source project is getting trained on by the parent company of the said git provider.
PS: But if you do end up using Gitea/Forgejo, please donate to Codeberg/Forgejo/Gitea (Gitea is a company tho whereas Codeberg is non-profit). I think that donating $1k to Codeberg would be infinitely better than paying $10k or $100k to Github.
rvz 1 hour ago
So Tay.ai and Zoe are still wrecking GitHub infrastructure.
Should have self hosted.
fredgrott 54 minutes ago
has anyone at MS tried unplugging azure and plugging azure back in yet?
- esafak 0 minutes ago
  It's Microsoft. You're supposed to Ctrl+Alt+Del.
fsflover 1 hour ago
Dupe: https://news.ycombinator.com/item?id=47237018
netcraft 1 hour ago
the day ends in y, water is wet. I really hate that github doesn't have any real competition. Yes, I know about gitlab, but it isn't real competition.
Imustaskforhelp 1 hour ago
Are we serious?
khaledh 2 hours ago
GitHub has been shit lately. What the fuck is going on?
- jsheard 2 hours ago
  Top-down mandates to use AI as much as possible, and to rip up their infrastructure and move everything to Azure.
  https://www.windowscentral.com/microsoft/using-ai-is-no-long...
  https://thenewstack.io/github-will-prioritize-migrating-to-a...
  - politelemon 1 hour ago
    This is very worrying if their mandate doesn't include quality control.
    - xeonmc 1 hour ago
      Maybe they mandated to use AI for quality control?
  - khaledh 1 hour ago
    I figured that it would be something like that. But it's been so frequent that I expect the leadership to act decisively towards a long-term reliability plan. Unfortunately they have a near monopoly in this space, so I guess there's not enough incentive to fix the situation.
    - gobalini 1 hour ago
      How frequent? I think the obsession with uptime is annoying. If GitHub is down, if there’s something so critical, then you need some more control of the system. Otherwise take a couple hours and get a coffee or an early lunch.
      - khaledh 1 hour ago
        Frequent enough to interrupt the flow of an entire organization, wasting thousands of hours. Take a look:
        https://mrshu.github.io/github-statuses
- drcongo 1 hour ago
  Does anything running on Azure have an acceptable uptime?