Lessons From CERN

In mid-May, I was very lucky to be able to see a lot of CERN while accompanying my daughter, who'd been invited to work there for a week. They were so accommodating of her that they even found me a hot desk to work from and arranged for me to speak with many of the senior team about crisis and risk management, software development, computer systems and cyber security. Over the last week, I have collected my thoughts following the conversations and am writing this to share some lessons learned from my time at this unique organisation. To make things easier to read, I have broken down the conversations I had into different areas.

COLLABORATION

Throughout the entire week, I was blown away by how open CERN was and how actively they seek opportunities to put people in a room together, trusting that something good will come from it even if they don't know going in what that might be. There is a spirit of collaboration that I have never seen before. I have found that in the public sector everyone is protective of their piece of the pie, fiercely defensive for fear of budget cuts or, worse yet, promotion to deal with a completely different area of work. The private sector is of course protective of its market share and intellectual property, with even the most open of firms nowhere near the level of collaborative working that CERN seems to foster in every department.

Connie Potter, of the ATLAS Collaboration and the CERN Communications group, is a master of this collaborative approach. She seems to maintain a directory in her head of all the people at CERN and what they do, and it was Connie who arranged for me to speak with the other people at CERN when she learned what I did for a living. Connie's eagerness to set up these meetings surprised me, but what I found perplexing was others accepting immediately, taking time from their very busy schedules at very short notice for a meeting without a specific purpose. I would come to learn that this is the ethos at CERN and, as it turned out, an ethos that would indeed produce a net positive for those involved.

Collaboration is boosted further by everyone being on campus. CERN is the size of a town, and nobody seems the slightest bit bothered by popping downstairs or to another building to chat with a colleague. Smart people having near-unlimited access to other smart people really seems to yield results. I do appreciate that there's a reason why they're all on-site: the extremely rare, advanced and expensive equipment is not something they could take home with them. But coming from a world where so many work remotely, especially post-COVID, the difference really was clear.

In our offices, we try to foster a spirit of collaboration but are always conscious of wasting time. Following my time at CERN, I will be asking my team to try to factor in some intangible benefits when calculating their time and not to be afraid to take a meeting without concrete planned outcomes.

COMPUTING

I was very lucky to speak with several of CERN's computing leads, including those responsible for software development and computer projects. In both cases I was amazed at the scale of everything they deal with. Every problem I thought we had, they have at a scale several orders of magnitude greater, and they have engineered their way around it. I thought that the dataset on one of our projects growing by more than 10TB per week was a serious problem that needed urgent engineering. CERN generate over 100TB per second and employ some thoroughly sensible ways of reducing what's stored while keeping everything useful, and I'm very grateful to have learned about these. Similarly, any concerns I had about memory usage and efficiency while training our machine learning models seemed trivial when I learned about the size of the datasets used to train the very efficient models at CERN, models which Google are helping them to create, not with compute but with funding for people.
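To put those two figures side by side, here is a rough back-of-envelope calculation in Python. It takes the 10TB per week and 100TB per second quoted above at face value and assumes both are constant rates, which is certainly a simplification.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds in a week

# Figures quoted in the text, treated as constant rates (an assumption).
our_tb_per_week = 10
cern_tb_per_second = 100

# Convert CERN's per-second rate to a per-week rate so the units match.
cern_tb_per_week = cern_tb_per_second * SECONDS_PER_WEEK
ratio = cern_tb_per_week / our_tb_per_week
orders_of_magnitude = math.log10(ratio)

print(f"CERN raw rate: {cern_tb_per_week:,} TB per week")
print(f"Ratio: {ratio:,.0f}x, about {orders_of_magnitude:.1f} orders of magnitude")
```

The comparison is between data generated, not data stored, which is why reducing what gets kept matters so much at that scale.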

CERN's backups and general file storage are an immense logistical task, which they solve by distributing data across more than 130 sites and by having written their own document management system (which seems extremely good from what I could gather), which they have published on GitHub free of charge for all. They even explained how they had tried to get companies to help market their document management system so that people could make use of it, but that so far nobody has been interested in marketing free software.

These specific examples aside, I think the biggest takeaway from my conversations with CERN's computing teams was how far ahead they think. Typically my developers manage a few timelines at once: a customer deadline, if what they are working on was specifically requested by a customer for a certain date; a short-term timeline of the next 2 patches (2-3 months); and a long-term timeline of goals we have for the product (over the next 6-12 months), for example to gradually move all user alerts to a new format, or to convert logging to a new format. CERN, however, are thinking ahead to problems they will have very far into the future. One example is that when their detectors and beam monitoring systems come back up from the next upgrade in 2030, they will generate far more data than can currently be written. They physically will not be able to source enough hard disks to write the data, and so are busy researching completely different data recording techniques. Thinking about that problem and investing in solutions 5 years in advance is already impressive, but when you consider that the shutdown and upgrades are planned and happening without there being a solution to this data writing issue yet, you get an insight into the faith they have in their own engineering abilities.

I have taken a lot away from these general computing conversations, which I know will influence some of our practices for the better going forward. I will be encouraging our teams to give more thought to some of our longer-term challenges and to plan time early to get ahead of them. We are big enough now that we should dedicate some thought to issues and trends that will affect us more than 12 months out. Some of these will no doubt include challenges with the amount of data that we are collecting, analysing, storing and backing up, but luckily I now have some ideas for how we can ease these pressures. The area in which I've felt most inspired following these conversations, though, is our machine learning model training, and I have agreed to feed back some of our ideas and results in this area in case they are in any way useful.

CYBER SECURITY

Cyber security at CERN is naturally taken very seriously, but the challenges they face are also unique. I was lucky enough to meet with two very knowledgeable members of their cyber security team for an interesting chat exploring some of these challenges and, of course, talking about what we do.

CERN is such an open organisation, encouraging collaboration with thousands of organisations around the world, which requires them to publish unbelievable quantities of data. I'm sure most people are aware that Tim Berners-Lee created the World Wide Web, and with it HyperText Markup Language (HTML), while at CERN, initially as a convenient way of sharing and browsing the vast quantities of information produced there. However, I assume most don't appreciate the vast array of networking equipment and applications used to publish this ocean of data, or further still the authentication and authorisation services required to control granular role-based access to it all.

I am sure that with a physics hat on, having thousands of researchers from hundreds of universities able to access your data 24/7 is a fantastic idea, but with my cyber security hat on I had heart flutters just thinking about it. When you add in the amazing resources that are permanently published to the public and the fact that most of their software is posted on GitHub, the daunting nature of the task faced by their cyber security team becomes ever clearer.

I am of course an advocate for visibility being a key component of cyber security: you can only defend what you know about and can see (monitor). But consider the size of the monitoring and auditing jobs to be done at CERN! Their attack surface is enormous; they own more public IP addresses (over 350,000 IPv4 and over 1.5x10^29 IPv6) than mid-sized Internet Service Providers (ISPs) and publish more web content (over 1.5 million pages across more than 10,000 active websites) than most media companies. Monitoring all of these live assets for suspicious activity, being aware when new vulnerabilities affect any part of the technology stack and managing encryption standards across all of the services is a mammoth task. CERN have an expert team tending to this, but with an estate so large, automation is the only way to maintain visibility. We spent a lot of time discussing this, and we also offered CERN the use of our systems to help.
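For a sense of what the larger of those two address counts (1.5x10^29, which can only be an IPv6 figure) means in practice, here is a small sketch using Python's standard ipaddress module. The prefix length of 31 is my own inference from the number, and the block used is built from the reserved documentation range purely as a stand-in; neither is a statement about CERN's actual allocation.

```python
import ipaddress

quoted_ipv6_count = 1.5e29  # figure from the text

# A /31 starting at the reserved documentation prefix, used only as a stand-in.
example_block = ipaddress.ip_network("2001:db8::/31")

# An IPv6 address is 128 bits, so a /31 leaves 97 bits: 2 ** 97 addresses.
addresses = example_block.num_addresses

print(f"A /31 IPv6 block holds {addresses:.3e} addresses")
print(f"Exceeds the quoted figure: {addresses > quoted_ipv6_count}")
```

Nobody monitors 10^29 individual addresses, of course; the point is simply that with IPv6 the address count says little about how many live assets actually need watching.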

The content that CERN publish ranges from static websites to streaming technical data and everything in between. Even the dashboards used in the control centres for each of the accelerators and detectors, which are also shown on flatscreens dotted around the entire campus, are published via a public website. For anyone who is interested in physics, check out https://op-webtools.web.cern.ch/vistar/ which allows you to switch between multiple dashboards that show the activity of CERN in real time, including the activity of the different accelerators. I am not sure how long this will stay interesting, though, given the imminent shutdown for upgrades.

The session with CERN's cyber security team was extremely interesting and I think useful for both sides. CERN's team took away some suggestions that I believe they'll be implementing to lighten the load, and I gratefully received some suggestions for new tools we can create for our toolkit, as well as an offer for continuous feedback on our systems for managing international supply chains. I've also come away with a much clearer picture of the challenges managed by truly huge organisations.

RISK/CRISIS MANAGEMENT

I hope I'm not offending anyone by saying that neither Risk Management nor Crisis Planning is particularly renowned for being an exhilarating topic, but I have to say that I really enjoyed my conversation at CERN with a gentleman responsible for both. CERN's attitude to risk management and crisis planning is very similar to that of Alex Honnold, the renowned climber famous for climbing the "Freerider" route up El Capitan (rated 5.13a) without any ropes or safety equipment. I do not mean they are similar in the sense that they take insane risks (as many would view Alex Honnold's achievements), but rather that both believe that by thinking deeply about all possible outcomes and preparing for every one of them, risk is mitigated.

I have to say that I agree with this approach and think we could all learn from it. It means taking the time for honest reflection on what could go wrong with your business and listing out every scenario, with a grading based on your immediate reaction to its severity. It then means taking more time to explore in detail what would happen in each scenario and how your organisation would respond. Granted, these exercises are not likely to be as heart-racing for you as they are for the teams at CERN considering whether they could accidentally create a black hole, or calculating the blast radius if the antimatter they just created accidentally collides with matter. But we should all be doing this, even if it is a little more mundane for the rest of us.
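As a rough illustration of that two-pass exercise, here is a minimal sketch of a risk register in Python. Everything in it is hypothetical: the scenarios, the 1 to 5 scales and the names are made up, and scoring by severity multiplied by likelihood is a common convention I have assumed rather than anything CERN described to me.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    severity: int       # first pass: gut-feel grade, 1 (minor) to 5 (catastrophic)
    likelihood: int     # assumed extra field: 1 (rare) to 5 (expected)
    response: str = ""  # second pass: how the organisation would respond

    @property
    def score(self) -> int:
        # Simple severity x likelihood score (an assumed convention).
        return self.severity * self.likelihood


def prioritise(scenarios):
    """Return scenarios ordered from highest to lowest score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


def missing_response(scenarios):
    """Return scenarios that have not yet had a response worked out."""
    return [s for s in scenarios if not s.response.strip()]


# Hypothetical examples for a small business.
register = [
    Scenario("Office unusable after flood", severity=3, likelihood=1),
    Scenario("Key supplier goes out of business", severity=4, likelihood=2,
             response="Switch to the secondary supplier"),
    Scenario("Ransomware on file server", severity=5, likelihood=2),
]

for s in prioritise(register):
    status = "planned" if s.response else "NO RESPONSE YET"
    print(f"{s.score:>2}  {s.name}: {status}")
```

The useful part is less the scoring than the list of scenarios still missing a response, since those are the ones the detailed second pass needs to cover.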

Worst-case scenario planning is nothing new in cyber security. Indeed, we regularly carry out penetration tests and vulnerability scans to identify things we might not have thought of, and play out practical tests known as red-teaming to explore what would actually happen, and how a team would react, in any given scenario. I would say, though, that I came away from the conversation with CERN wondering whether we apply the same level of diligence to physical and commercial risks as we do to technical risks. This has inspired some planned sessions with my team where we will try to consider non-technical issues and, where applicable, role-play them as we would red-team technical problems.

SUMMARY

This will be the final time I use the words "amazed" or "amazing" in this post, I promise, and I saved this one until the end because it really did catch me off guard. While having very in-depth and technical conversations with the people above about storage, big data analytics, machine learning, cyber security and risk management, I found out that most of them originally trained as physicists, not in IT. I guess I should have expected this given where I was, but I didn't, and it really made me think about the way CERN utilises the smart people they attract and how everyone there is so focused on the core goal of the organisation because they truly understand it.

Having somebody with a doctorate in theoretical particle physics handling the organisation's major IT projects, or building software, might seem like quite the pivot. But when you consider that the organisation revolves entirely around particle physics, and that its IT systems and software applications exist in order to record and analyse physics data, having somebody who deeply understands the core goals of the project really does make sense. I'm not sure what direct lesson I can take from this into my world, but I did find it fascinating so thought I would share.

Considering that the conversations I had at CERN were the result of chance and in some cases set up the same day, I am eternally grateful that they happened. While I am sure my daughter learned much more in her week working her dream job at CERN, I feel like I learned a lot too, and I will be implementing the ideas inspired by these conversations over the coming weeks. Be forewarned: I may even do a follow-up post if they work! I am extremely grateful to everyone at CERN for the generous donation of their time, and I look forward to continuing to work with the organisation.