So, you all know it: who of you knows disaster recovery? Oh, I wasn't expecting that, I was expecting a lot more. So when things go wrong, there is the option of having your work prepared beforehand, and therefore you have a disaster recovery plan. There are recovery time objectives, there are recovery point objectives. But basically you should have a backup idea for when things go wrong; it's called disaster recovery. But disaster recovery in your data center is a little bit ordinary. We have today, let's say, disaster recovery

one year later... and in this case, how many? Ten years later. And in this case, disaster recovery hits rocket science. PistonMiner! Thank you. All right, thanks for coming. So I'm going to try and tell you a little story about trying to save a dead satellite from becoming space junk with a few little firmware hacking tricks. So, very briefly, who am I? I'm a computer science student from TU Berlin, and I play CTF sometimes, where I mostly do reversing challenges, and I also make some mods for

video games. But the important thing is: I am not an aerospace engineer. So a logical follow-up question might be: where do I get a satellite from? Well, it turns out my university actually has quite a few of these things; they've launched 30. And the interesting thing is the way these things are funded: they get funded to be built and launched, and then they get operated for like a year. But the thing is, a lot of them actually outlast that one-year lifetime. And so what do you do with them after

that one-year lifetime? And so the university came up with a class, basically, which teaches students how to operate spacecraft, and once you pass the class you can then use the old spacecraft, which are not funded anymore, to keep doing more science. And so I took this class in 2022 and worked on a satellite called BEESAT-9. But actually, today we're going to talk about a much older satellite than that, called BEESAT-1. This is a 1U CubeSat, so it's 10 x 10 x 10 cm and weighing

about a kilogram, a sort of standardized form factor, basically. And at the time it was a pretty new form factor: this was like the 33rd one of those in space. It actually was the first CubeSat of the university too, with a new team, everything designed from scratch, on a pretty low budget, so apparently a pretty adventurous project, I'm told; I was in primary school. And nowadays there's like 4,000 of these things in space. It had a one-year primary mission and it successfully completed that. And the thing that makes it

interesting is, it's actually in such a high orbit that it's going to take more than 20 years for it to come back to Earth. So we have 20 years to use this thing. One problem: in 2011, so after the primary mission, it started returning invalid data, so nothing useful. And so they did some diagnosis on this and determined: okay, maybe this is a radiation-related failure, because in space there's more radiation, so sometimes parts fail. So what they did is they switched computers: they had two computers on board for

redundancy, for precisely this reason, and they switched to the other one and it worked again. Perfect. Then it happened again in 2013, and they were out of computers. Unfortunate. So at some point I knew about this and randomly met the project lead at a party at the university, and we discussed this a bit, along with some theories on the precise symptoms, and we eventually kind of agreed that it didn't really seem like a hardware failure; it seemed more like a software failure. And so maybe we actually have a chance to bring this thing

back to life. So here's the plan: we're going to figure out how a satellite works, very briefly; then we're going to figure out what the problem is; then we're going to fix the problem; and then we have a new satellite. So this is a satellite. We don't have time to go through all this in detail. The thing I want to point out is you have lots of duplicated components, and these are organized in these different subsystems, these different boxes. And the subsystems on this satellite are connected together by two CAN buses for redundancy, and

CAN is a sort of automotive bus protocol, and it sort of found its way into low-cost applications where you need reliability and stuff. But we're mostly going to be talking about these two: the communication system and the onboard computer, because that's the thing that does the telemetry. So the communication system consists of two identical strings, basically, for redundancy, and each string consists of an antenna, pretty self-explanatory I think; a transceiver, that's basically a radio, it can transmit and receive; and then you have a thing called a terminal node controller, and this is like

the microcontroller that connects to the CAN bus and bridges the connection to the radio. This encodes and modulates the telemetry, that's the stuff you send down from the spacecraft to the ground, and receives and then demodulates and decodes telecommands, that's the commands you send up to the spacecraft. And what this gets you is half-duplex communication, and half duplex in this case means you can only do one at a time: you can't both send a command up and get telemetry down at the same time. So you have to sort of

multiplex between these two, and we'll talk about how that's done in a second. The next thing is the onboard computer, and this is really the star of the show. So this does everything, right? It runs all the algorithms, it collects the sensor data, it does telemetry, it processes commands and controls everything. This is based on an ARM7TDMI core running at about 60 MHz. In terms of compute power, I think that's about equivalent to somewhere between a Game Boy Advance and a Nintendo DS, so it's not a huge amount

but you don't need that much, right? You have 2 megabytes of SRAM, so again, not a huge amount. Then you have two flash chips: you have 16 megabytes for software, so that's firmware and configuration, and then you have a telemetry flash chip, and that's used to store telemetry data. So this looks like this: all the different components get integrated onto these 10 x 10 cm PCBs, and you can actually see the symmetry between both sides, right? So this is your onboard computer, and you have one on the

left and then one on the right, and they look identical. And they then get stacked up, and once this loads, there you go, they get stacked up like this to make a cube, and then you put a bunch of solar panels on it and you have yourself a new satellite. You then put that on a rocket and put it in space. So where is it in space? It's located in a 700 km orbit around the Earth, so it's 700 km above the Earth's surface, and it's moving at about 7.5 km/s, and so that means

it completes one full revolution around the Earth about every 100 minutes or so. And so it's doing this, you can see, this is a graphic looking from the North Pole down, and it's doing this mostly north to south, and then the Earth, with Berlin marked, where the ground station of TU Berlin is, rotates underneath that. And then you have two places in the rotation of the Earth where the ground station and the plane in which the satellite orbits are aligned, right? One's in the morning, and one's coming up just now in the night, and

when you have that alignment and the satellite passes overhead, then you have line of sight to your spacecraft, and that's when you can actually talk to the spacecraft. So there's a pass just now. And these are typically 10 to 15 minutes long, and you get three to four of these in the morning and three to four of these in the evening, right? And that's the only time you can talk to your spacecraft, at least in this kind of orbit. So what do you do in these passes? You try and command your spacecraft, right?
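The orbit and pass numbers above can be checked with a bit of arithmetic. This is only a rough sketch: it assumes a circular orbit and a mean Earth radius of 6371 km (that value is not from the talk), and it takes the quoted altitude, speed, and pass figures at face value.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed value, not from the talk)
ALTITUDE_KM = 700.0        # orbit altitude quoted in the talk
SPEED_KM_S = 7.5           # orbital speed quoted in the talk


def orbital_period_minutes(altitude_km: float, speed_km_s: float) -> float:
    """Period of a circular orbit: circumference divided by speed."""
    circumference_km = 2.0 * math.pi * (EARTH_RADIUS_KM + altitude_km)
    return circumference_km / speed_km_s / 60.0


def daily_contact_minutes(passes_per_day: int, minutes_per_pass: float) -> float:
    """Total time per day with line of sight to the ground station."""
    return passes_per_day * minutes_per_pass


period = orbital_period_minutes(ALTITUDE_KM, SPEED_KM_S)
best_case = daily_contact_minutes(passes_per_day=8, minutes_per_pass=15.0)
worst_case = daily_contact_minutes(passes_per_day=6, minutes_per_pass=10.0)

print(f"orbital period: about {period:.0f} minutes")
print(f"contact per day: {worst_case:.0f} to {best_case:.0f} minutes "
      f"({best_case / (24 * 60):.1%} of the day at best)")
```

Even in the best case, the ground station can reach the satellite for well under a tenth of the day, which is why everything that follows is so constrained by pass time.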

So by default the satellite is not transmitting telemetry; it's just listening, and that's mainly for power reasons. And so what it's doing is waiting for commands, and once it receives a command it switches into active mode, and what it does then is it's going to transmit these bursts of telemetry data about every 3 seconds or so. And the reason it's doing this in this burst pattern is it needs to leave gaps for you to be able to send more commands up to it, right? And so in those gaps between those bursts of telemetry data

you can send more commands. And then, if 60 seconds go by without it receiving any command, it's going to assume: okay, I'm out of range of the ground station, I'm going to stop transmitting and go back into standby mode. So the question you might ask is: if we only get 15 minutes eight times a day, what do we do with the remaining 22 hours and however many minutes? So there's a concept called offline telemetry, and this is actually kind of common on spacecraft, but every spacecraft does it differently, I

think. And so what you do is: every 90 seconds the satellite takes its current telemetry data and writes it to flash, and then each of these telemetry transmissions that we talked about, these frames, is composed of one set of the telemetry from just now and then three sets of that recorded telemetry from earlier. And so during each pass you sort of play back what happened during the day, to get a complete picture of what's happening to your spacecraft at any one time. All right, that completes the little crash course. Now let's try and

figure out what's going on. So, this anomaly: we're going to focus on the second anomaly, and this happened in March 2013. And so they contacted the spacecraft, and instead of getting data down in these telemetry frames they were getting empty frames, and we'll look at what empty means in a second. But the curious thing is, it didn't stop transmitting after that 60-second timeout we just discussed; it just kept going for some reason. Okay, let's see what empty means exactly. So telemetry is a standardized thing; there's a committee for this called the Consultative Committee

for Space Data Systems, and they standardize how telemetry can look, and BEESAT sort of follows the standard. Everything green and below in the slide follows the standard; the whole four-burst thing is kind of a custom thing. But basically you have these frames, or master frames, and then each master frame contains four so-called transfer frames, and this is like the transport layer in a network, and then underneath that you have source packets, and that's kind of like a UDP packet on the layer below there. And so the first of these transfer frames is

dedicated to the live telemetry from just now, and the last three, those are dedicated to the recorded telemetry from earlier. Now, I'm not going to bore you with all the details of CCSDS, because it's a really long and boring standard, but I'm going to just go through the highlights. So this is one transfer frame on the left, and what you have is a magic number in the beginning, that's that ASM or attached synchronization marker. Then you have a virtual channel identifier or VCID, and this basically tells you the context

of the telemetry, right? Because you can have the same sort of telemetry downlinked live, or it could be recorded telemetry from hours ago, so that distinguishes between those. You have a master channel frame counter or MCFC, and this is just a little counter that ticks up every time a frame is sent, and this is useful because you can check that counter, and if there's a gap in that counter you know you missed a frame. Then you have an APID, and this really determines the type and the format of telemetry

sent, so it could be normal housekeeping telemetry, or a flash dump, or, I don't know, image data, right? And then you have the packet data field, and that's the actual user data, and that's formatted according to that APID. So this is what a hex dump looks like for a full frame; you have the different components highlighted again. I just want to illustrate: this is what it normally looks like. And so this was what it looked like after that anomaly. So you have a whole lot more zeros, but you do still

have some stuff left, right? So I'll do you the favor of parsing it for you. That ASM, that magic number, that's still correct; that little counter thing, that's still correct. But apart from that, the online transfer frames with the live telemetry, those are completely zero, and the offline transfer frames that are supposed to be giving historic telemetry, they have a little marker set that says: I tried to get offline telemetry, but there was nothing stored. Interesting. So this is all the symptoms we get. And so now from this we have to figure out what

the problem is. And so we're going to go very quickly through a flowchart of how the onboard computer generates this telemetry. So there's two tasks: the first one generates the telemetry, the second one transmits it. This is the assembly task; this assembles the individual transfer frames, which later get packaged into master frames. So there's a little timer there at the top called assembly time, and this is that 90-second timer that we talked about, that generates a frame every 90 seconds to save it to flash, and it can also go to

a lower rate when you actually downlink live telemetry. This assembly time thing is a configurable parameter stored in flash somewhere, so it's a setting, but it's set to 90 seconds. And so when these 90 seconds elapse, it generates a new transfer frame from the collected telemetry, puts it in flash for later downlinking, and also puts it in SRAM somewhere for live downlinking. Then we have the transmission task, and this is the thing that's actually responsible for sending the telemetry down. So we first have a three-second timer on the very left there

and this is to enforce those gaps between frames in which we can send telecommands. So it's waiting until 3 seconds have passed since the last frame. Then it's going to check that 60-second timeout we talked about, and that again is a configurable parameter, called HDC timeout, but it's set to 60 seconds. And then we're going to construct the master frame: we're going to take the online transfer frame from SRAM, from just now, and we're going to read three transfer frames from flash to play back the telemetry, and then

we're going to attach that magic number and the counter and send it all to the COM. Okay, so now what we can do is highlight and color all these different things by whether they're working or not, based on what we've observed. So for example, that 3-second timer, that was normal even with the empty telemetry. We also know that the ASM and MCFC, those were still written, so those are normal too. But on the other hand, that 60-second timeout, that wasn't working anymore, and we also have no evidence that any

telemetry was being written to flash, because the marker said there's no flash data. So now the challenge is: what's the commonality between all these things? And if you think about this for a second, you end up realizing every single thing that's broken is controlled by these configuration parameters. So if the HDC timeout parameter was really big, it would look like it's never timing out, but actually it's just timing out after a really long time. And if the assembly time parameter was really big, well, it would look like it's never generating telemetry, but actually it's just

doing that every 50 days or something. So where do these parameters come from? They're stored in external flash along with the software, so it's a shared flash chip. There's three software images in the first part of the flash, and then in the back of the flash there's a bunch of pages, one page per subsystem of the onboard computer, for parameters. And you have different software images because you want to protect against radiation-induced bit flips, so that if one image gets corrupted you can use a different one. So that's why there's multiple software

images. So okay, what could be corrupting these parameters in flash? I literally just did Ctrl+F for flash.write, and we're going to go through everything that writes to flash, because it's actually not a huge list, since this is supposed to be mostly stable. So first thing: you can write it by command. There's a command that allows you to change a given setting and adjust it like that, but we didn't send any commands at the time that this anomaly occurred, so we can rule that out. The second thing that writes

to flash is something called launch and early orbit phase, and this has to do with all the deploying and stuff that happens just after the satellite leaves the rocket, because it's reconfiguring things there. However, that was done in 2009, so we can rule that out too. And the last thing that actually writes to flash during normal operations is the boot counter of the onboard computer. So every time this thing reboots, it's incrementing a little counter that says: okay, I've rebooted this many times. And that happens on every boot, and that's the only thing

that should really be running. So let's have a look at that. A very interesting thing you notice first of all: the boot counter is stored in the telemetry system parameter section. They had to put it somewhere, so I guess they decided to put it there. So what happens on every boot is: they read the parameter data from flash, that entire page; they modify it to increment that counter; they then erase the flash page entirely; and then rewrite it. Now you might be asking: why do they do that in

this convoluted way and don't just directly rewrite it? And that has to do with how flash works. In flash memory, or at least this type of flash memory, you cannot program an individual zero bit back to a one. If you want to change an individual zero back to a one, what you have to do is erase the entire page to bring it to an all-ones state (all ones is the erased state) and then flip all the bits you want back to zeros. And so they have to do this little dance if

they want to turn any zeros back into ones there. But the problem is there is a window of time between when that page is erased and when that page is reprogrammed where the entire page is erased, and if at this unlucky moment you have a reset of the spacecraft due to some reason, like power loss or whatever, then you're going to end up with that page entirely erased and all ones. And because these parameters are unsigned integers, all ones is the maximum value, so the spacecraft is only going to generate telemetry every 50 days, and unfortunately that's a

bit too slow for us to see. So we have a theory of what could be wrong, so now it's time to test the theory. And we can do that, because there is a telecommand that temporarily lets us adjust this assembly time variable, and so we can try changing it back and see if that helps. We can't change any of the other parameters in that section back, unfortunately, but we can change that one, and that should be good enough to get us some telemetry at least. So this is what the telemetry page

looks like when you're operating the spacecraft, and it's this classical cliche thing where you have a black background and green text. And this is how it would usually look when you're operating the spacecraft, because you just get all these zeros, so there's no valid packets in there. And now we use this magical little command, and wouldn't you know it, we actually get some telemetry back for the first time in over 10 years. And so it turns out this thing is fully functional after all, right? All the telemetry says it's in exactly the

same state it was in 10 years ago, in 2013. I also managed to find a telecommand that lets us directly dump flash contents down. It was never used before, don't know why (foreshadowing). But it turns out we can dump this down, and that confirms to us that the section is indeed erased. The thing is, we get to see telemetry now, but we can't operate the spacecraft like this, because a bunch of features are still broken: that parameter page contains a lot more than just that one parameter, and so a bunch of

voltages are wrong, because the configuration is wrong. So we need to permanently fix this somehow. But before we do that, let's stop messing around on the real spacecraft and try messing around with one on the ground, because if we mess up something on the real spacecraft, we don't have a replacement in orbit, unfortunately. So we need some development setup. The way this is usually done is you build multiple units of your satellite, and then you have one that you send to space, and that's called the flight model, and then you keep one on

the ground to do all your testing with, so you test stuff on the ground before you send it to space, and that's called an engineering model. The problem is, because this thing was broken for 10 years and they needed the parts for something else, they dismantled that development model, so it no longer existed. So what we did is, because this is a satellite series (I worked on BEESAT-9 in the class, for example), we took one of the later satellite models and then swapped components around to make a Frankenstein BEESAT-1, sort

of, and that ends up looking like this. So it's a sort of barebones version of the satellite without the outside skin. You have a little additional board in the middle there with indicators to tell you what's on and what's off, and then you have a bunch of stuff hanging off of it: you have a JTAG adapter hanging off it, and that's the transceiver in the top right; I couldn't manage to screw that down, so we just sort of have it laying there. So that gives you JTAG and everything, and

so that sorts the hardware side out. Now the software side: the problem is that no one who worked on the spacecraft still worked at the university. But fortunately I did have that contact to that one project lead, and using that we managed to piece together the mostly complete set of binaries and source code. The problem is, we had binaries, we had source code, but we didn't have symbol maps mapping the binaries to the source code. So unfortunately I did have to go through and manually

disassemble the firmware images and add all the symbols everywhere. It's not super hard, right, because you have the source code, so it's not really reverse engineering, it's just annoying. So we have that set up; now let's try and actually fix the issue. The first thing you might say is: okay, can we just use a command to rewrite that area of flash? Because I mentioned earlier there is a command to do this, and that command is called command write dword, and here's what it does. The problem is this only works on that parameter region

and the first thing it does is read a 16-bit size field for that subsystem. Now unfortunately that size field is erased, so it's going to read as all ones, or 64 kilobytes. The maximum size of a parameter page is 8 kilobytes. This will not go well. The next thing it does is read that data into an SRAM buffer sized for 8 kilobytes, so it will massively overrun that buffer. But let's assume that works out fine because everything behind it in memory is unimportant (it's not, but let's assume). Then it will erase the

flash page, to try and do that rewrite in a second. It will modify the buffer in SRAM according to the command, and then it will try to rewrite that 64 KB buffer into 8 KB of flash, and this is where they finally have a bounds check, which prevents this from happening. And so what you end up with is: once a page has been fully erased, you can never again write to it using the existing onboard software. It's sort of burnt, is the terminology we ended up using. So we need to somehow enhance the existing onboard software.
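The failure chain described so far (erase-then-rewrite on every boot, a reset in the wrong moment, and a write command that refuses an erased page) can be reproduced in a small toy model. This is only a sketch: the field offsets, the class and function names, and the idea that the assembly time is a 32-bit millisecond value are all assumptions made for illustration, not details confirmed by the talk.

```python
PAGE_SIZE = 8 * 1024          # maximum parameter page size from the talk
ERASED = 0xFF                 # erased flash reads as all ones


class FlashPage:
    """Toy flash page: programming can only clear bits, erasing sets them all."""

    def __init__(self) -> None:
        self.data = bytearray([ERASED] * PAGE_SIZE)

    def erase(self) -> None:
        self.data[:] = bytes([ERASED] * PAGE_SIZE)

    def program(self, offset: int, payload: bytes) -> None:
        for i, value in enumerate(payload):
            self.data[offset + i] &= value   # 1 -> 0 only, never 0 -> 1


class ResetDuringUpdate(Exception):
    """Stands in for a power loss or other reset of the spacecraft."""


def increment_boot_counter(page: FlashPage, reset_after_erase: bool) -> None:
    """Read, modify, erase, rewrite: the sequence described for every boot."""
    image = bytearray(page.data)                        # read the whole page
    # Hypothetical layout: size field at bytes 0..1, boot counter at 2..5.
    counter = int.from_bytes(image[2:6], "little") + 1
    image[2:6] = counter.to_bytes(4, "little")
    page.erase()
    if reset_after_erase:
        raise ResetDuringUpdate()                       # page stays all ones
    page.program(0, bytes(image))


def write_dword(page: FlashPage, offset: int, value: int) -> bool:
    """Paraphrase of the write command; returns False if the rewrite is refused."""
    size = int.from_bytes(page.data[0:2], "little")     # 16-bit size field
    image = bytearray(page.data[:min(size, PAGE_SIZE)]) # the real code overruns here
    page.erase()
    image[offset:offset + 4] = value.to_bytes(4, "little")
    if size > PAGE_SIZE:                                # bounds check comes last
        return False
    page.program(0, bytes(image))
    return True


healthy = bytearray([ERASED] * 16)
healthy[0:2] = (16).to_bytes(2, "little")        # size field
healthy[2:6] = (41).to_bytes(4, "little")        # boot counter
healthy[6:10] = (90_000).to_bytes(4, "little")   # assembly time, assumed milliseconds

page = FlashPage()
page.program(0, bytes(healthy))
try:
    increment_boot_counter(page, reset_after_erase=True)
except ResetDuringUpdate:
    pass

assembly_time_ms = int.from_bytes(page.data[6:10], "little")
days = round(assembly_time_ms / 86_400_000, 1)
print(hex(assembly_time_ms), "ms is about", days, "days")
accepted = write_dword(page, 6, 90_000)
print("write accepted:", accepted)
```

Under the millisecond assumption, an all-ones 32-bit timer works out to roughly 49.7 days, which happens to land close to the "every 50 days" figure. The last line shows the "burnt" state: the size field reads 0xFFFF, the bounds check fails after the erase, and the page stays erased.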

We need software upload capability. The problem is this spacecraft does not support software upload, so we're going to have to get a little bit creative. So what we're going to do is basically a little weird software upload to patch in the software upload functionality, so we can do a proper software upload later. And the cool thing is, because parameters and software are stored in the same flash chip, getting software upload also lets us directly

rewrite that parameter section, so we can both fix the bug and directly fix the parameter page. So the first step to doing a software upload: we have to somehow get code into memory. Now the first option might be: can we do it into SRAM somehow? This would be appealing because SRAM is very safe: it's gone after the next reset, so if you mess something up, it's a non-permanent change. The problem turns out to be we don't have enough time to do that. We get 15 minutes per pass, right, and the spacecraft is unfortunately unstable

enough that after one pass it might just reset and wipe all of that clean, and so we couldn't reasonably assemble a payload in SRAM. The other thing is there's no actual feature to write to SRAM, so we'd have to do some weird thing where we use parameters of telecommands that we queue up somewhere, or abuse some other feature. In short, we could maybe do it, but it's not desirable. And so the only other way to put new code into memory is the flash, and the only thing that we

can use to write into flash is this command write dword command, and that can only write to parameter space. But that turns out to be okay, because this is all one big flash chip and the permissions are RX everywhere, so you can execute from anywhere. So we can put new code as parameter data in that parameter section. That does come with some caveats, though. First of all, we have this wiping risk again, right? If one of these inopportune resets occurs and wipes the page, well, now we've wiped all the parameters

in that page, right? And there's actually quite important configuration data in there. We got a little bit lucky that the telemetry section was the one that was wiped, because if the COM section had been erased, it wouldn't be so cool. Fortunately, there is one single unused parameter section that we can use for this, and so this gets us 8 kilobytes of space where we can put new code in. Actually, we only get 4 kilobytes of space, because they have a bug in their code which means that the top bit is masked off, so only the

lower half is accessible. Well, actually we only get 2 kilobytes of space, because of another, unrelated bug. So this does not fit a full 300 KB software image, it turns out. So we're going to have to somehow leave the existing onboard software running and sort of enhance it. So how are we going to do that? Well, the first thing we need is to gain code execution at all, to get our code to run. That turns out to be the easy part

because there's this telecommand, command jump to image, which is intended to switch between those three different software images in the early part of flash, but you can totally use it to go anywhere else. And this is intended to be a permanent transfer of control, but actually it is implemented using a C function pointer call, and so you can totally just return from this and leave the original onboard software running. It reacts a little bit weird, but it turns out to be fine, and so we can do that.
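Why a plain function pointer call makes this work can be seen in a tiny model. This is a hedged sketch, not the real firmware: the address, the `flash_code` map, and all function names are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

log: List[str] = []

# Hypothetical address map: flash offsets to the code that lives there.
flash_code: Dict[int, Callable[[], None]] = {}


def payload() -> None:
    """Our code: do some setup, then simply return to the caller."""
    log.append("payload ran")


def command_jump_to_image(address: int) -> None:
    # Modeled as an ordinary call through a pointer, so a callee that
    # returns drops straight back into the command handler.
    flash_code[address]()


def command_loop(addresses: List[int]) -> None:
    for address in addresses:
        command_jump_to_image(address)
        log.append("original software still running")


PAYLOAD_ADDRESS = 0xF00000            # hypothetical parameter-section address
flash_code[PAYLOAD_ADDRESS] = payload
command_loop([PAYLOAD_ADDRESS])
print(log)
```

A real software image would never return from that call; a small payload that does return leaves the caller's loop intact.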

This gets us code execution and leaves the original software running. But now the question is: how can we persist ourselves, to keep running after that? Because we have to change the onboard software somehow, right? But we can't change the one in flash, because we're running out of it, right, so that would be a bank conflict: we can't both read from it and write to it. But even if we somehow jumped through SRAM or something, it would be very risky, because again we have to do this erase-rewrite thing, and if we

erase a page of the onboard software and then try to rewrite it, and we have a reset, we might just end up bricking the onboard software, which would be not so cool. So the question now becomes: how can we modify the onboard software without modifying flash? And we can use a trick for this: something called C++ virtual function tables, a mouthful, but basically C++ is the language this is written in, and that supports inheritance. And so what you can do is you can have a base class and

you can have a derived class, and you can override methods. But when you call a function on an object of the base class type, the compiler doesn't know what type that object actually has, right? It's polymorphic, that's the whole idea. And so the question is: how can the compiler know which implementation to call if it's a polymorphic type? And so what the compiler does is, in every object that has a virtual function or derives from something that does, you have a hidden field at the very start (it's always at offset zero; not quite, but it's usually at offset zero)

and this basically points to a table in flash that says: for my object, for my type, these are the right functions to call. Now the good thing is that pointer is in SRAM, right, because it's a part of the object's variables. And C++ inheritance is actually used quite heavily in the onboard software, and specifically what they do is, for each part of the onboard software, they have these command interface structures: a command interface for power commands, a command interface for attitude control commands, whatever. But these are global variables.
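The hidden table pointer can be modeled in a few lines. This is a toy in Python rather than the C++ the onboard software is written in, and every name in it (`CommandInterface`, `g_loader`, `NEW_COMMAND`, the handler functions) is hypothetical. The point it illustrates is that the table itself can stay read-only while the pointer inside the object is ordinary writable data.

```python
from typing import Callable, Dict

VTable = Dict[str, Callable[..., str]]


class CommandInterface:
    """Toy object whose first field is a pointer to its function table."""

    def __init__(self, vtable: VTable) -> None:
        self.vptr = vtable          # stored with the object, in writable memory

    def execute(self, code: int) -> str:
        # What a virtual call boils down to: fetch table, fetch slot, call.
        return self.vptr["execute"](self, code)


def loader_execute(obj: CommandInterface, code: int) -> str:
    return f"loader handled command {code}"


LOADER_VTABLE: VTable = {"execute": loader_execute}   # stands in for the table in flash
g_loader = CommandInterface(LOADER_VTABLE)            # stands in for a global object

NEW_COMMAND = 0x42   # hypothetical command code


def patched_execute(obj: CommandInterface, code: int) -> str:
    if code == NEW_COMMAND:
        return "new handler ran"
    return loader_execute(obj, code)     # forward everything else unchanged


before = g_loader.execute(1)
g_loader.vptr = {"execute": patched_execute}   # only the pointer changes
after_old = g_loader.execute(1)
after_new = g_loader.execute(NEW_COMMAND)
print(before, "|", after_old, "|", after_new)
```

Existing commands behave exactly as before the swap, and only the new command code reaches new code, without a single byte of the original table or handler being modified.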

And so they are located at fixed addresses in SRAM, and we can find those addresses by disassembling the firmware images. So what we can do is hijack that vtable pointer ourselves to point at our own implementation, and hijack control flow at that point specifically, but we can reuse all of the code for command parsing from the existing onboard software without having to do it ourselves. So let's have a look at what that looks like. This is some example code, it's paraphrased, but this is what the onboard computer does

when it gets a command. So it's going to be like: okay, is this the loader subsystem, or is this the PCU subsystem? And based on that it's going to call the execute function of some global object. Now let's say it's a loader command. Because the global loader command handler is one of these polymorphic types, right, it's going to look up in memory, in SRAM, that virtual function table pointer. That pointer will then point to a virtual function table in flash, and that then has the actual pointer to the

implementation for that type, and so then the compiled code can go: okay, I will fetch the pointer and then call that. Okay, so now what we can do is intercept the execution here. We can overwrite this vtable pointer in SRAM to point instead at the parameter flash where we put our shiny new little payload that we designed. And so we can give our own replacement vtable that's compatible with the existing vtable and point it at our own handler function. And so what we can do here is check if

the command code is one of the new telecommands that we want to add, and if it is, we can handle it, and if it's not, we can just directly forward to the loader command handler code and bypass the vtable entirely. And so for all the existing commands we can use the existing onboard software, so we don't have to reimplement any code from the existing onboard software; we can implement just our new code, and this helps really a lot in keeping within

this 2 kilobyte size boundary. So that all works and it's great. The problem you run into is one of bandwidth. So let's do some math: a software image is 300 KB big, and unfortunately you're limited to four bytes of payload per telecommand; that's a limitation in the COM system we can't change. And there's also another limitation in the COM system that says we can only do one command per second, because it's intended that you do one command per one of these gaps. We get 15 minutes per pass of the

satellite and we get six passes a day. Very optimistically, we end up at at least two weeks for a software update, which is quite a lot, and realistically it's going to be like five times as long, probably, because it doesn't go that way, and we might want to do more than one software update too. So the question is, can we do any better? And the answer is we can, because in fact the designers of BEESAT-1 originally wanted to do software updates, so they added support for longer telecommands. They started that but then abandoned it.
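The "at least two weeks" figure can be reproduced with a few lines of arithmetic. This is only a sketch of the best case using the numbers quoted in the talk; treating a kilobyte as 1024 bytes and the function name `upload_days` are assumptions made for illustration.

```python
import math

# Figures quoted in the talk; 1 kB = 1024 bytes is an assumption.
IMAGE_BYTES = 300 * 1024
PAYLOAD_PER_COMMAND = 4      # bytes of payload per telecommand
COMMANDS_PER_SECOND = 1      # comm system limit
PASS_SECONDS = 15 * 60       # usable time per pass
PASSES_PER_DAY = 6


def upload_days(image_bytes):
    """Best-case upload time: no lost commands, every pass fully used."""
    commands = math.ceil(image_bytes / PAYLOAD_PER_COMMAND)
    seconds = commands / COMMANDS_PER_SECOND
    passes = math.ceil(seconds / PASS_SECONDS)
    return commands, passes, passes / PASSES_PER_DAY


commands, passes, days = upload_days(IMAGE_BYTES)
print(commands, passes, round(days, 1))  # 76800 86 14.3
```

The result is an idealized lower bound. It ignores lost commands, retransmissions and passes that cannot be used, which is why the realistic estimate in the talk is several times longer.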

And so the comm system supports it, perfect. The OBC supports it too. Unfortunately they don't support the same version of the protocol, so there's an incompatibility and they don't accept each other. So we have to fix that. This is the problem, and we're going to start at the bottom this time. Every one of these rectangles is a message sent on the CAN bus with eight bytes of payload. The way these transactions on the CAN bus work is: if you want to send more than 8 bytes, you have multiple messages. You have a start message

in the bottom left there that contains a pseudo-random control number, then you have a bunch of data messages, and then you have a stop message. This stop message contains that control number again, and that's compared to make sure that you didn't have two transactions clashing on top of each other. You also have a checksum over all the data that was sent in that transaction. The problem is the comm system adds an additional data message, and so what will happen is when the OBC receives the

messages from the comm system, it will reinterpret that last, N+1, data message from the comm system as the stop message and will try to use it as the control number and checksum. Of course that's not going to work, because it's going to misinterpret the numbers. So what do we do about that? What we can do is we can actually hijack the interrupt handler of the onboard software. There's an interrupt handler which gets called when a CAN

message is received, and we can overwrite that with our own custom code. That's not a lot of code to reproduce, so we can simply copy the existing code and add our own. What we can do is listen for this N+1 data message, and if it arrives we simply store it to the right place in SRAM and then discard it. That way the onboard software never receives that spurious message and misinterprets it as the stop message. Then we're going to wait for that stop message and we're going

to use that data we saved away to modify the checksum so that it looks like it's correct, because that checksum of course includes that N+1 data message, so we have to take that out of there to make it work. And then we forward it on to the main onboard software. Okay, perfect, we have a plan. Test it on the laboratory model: it works great. Time to do it in space. So we came up with a little plan. We have to somehow get this 2-kilobyte image into space.
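The interrupt-handler fix described above (swallow the extra N+1 data message, then correct the checksum in the stop message) can be modelled in a few lines. This is a Python model, not the flight code: the message tuples, the class name `CanReceiveHook`, the data count carried in the start message and the simple 16-bit additive checksum are all assumptions made for illustration, since the talk does not give the real frame layout or checksum algorithm.

```python
def checksum(payloads):
    """Hypothetical checksum: 16-bit sum over all payload bytes."""
    return sum(sum(p) for p in payloads) & 0xFFFF


class CanReceiveHook:
    """Sits in front of the original receive handler (modelled by `forward`)."""

    def __init__(self, forward):
        self.forward = forward
        self.expected = 0    # data messages announced by the start message (assumed)
        self.seen = 0
        self.extra = None    # payload of the spurious N+1 data message

    def on_message(self, msg):
        kind = msg[0]
        if kind == "start":            # ("start", control_number, data_count)
            self.expected, self.seen, self.extra = msg[2], 0, None
            self.forward(msg)
        elif kind == "data":           # ("data", payload_bytes)
            self.seen += 1
            if self.seen > self.expected:
                self.extra = msg[1]    # stash it and do not forward it
            else:
                self.forward(msg)
        elif kind == "stop":           # ("stop", control_number, checksum)
            control, csum = msg[1], msg[2]
            if self.extra is not None:
                # Remove the extra message's contribution from the checksum.
                csum = (csum - sum(self.extra)) & 0xFFFF
            self.forward(("stop", control, csum))


# Usage: the comm system sends two real data messages plus one extra.
received = []
hook = CanReceiveHook(received.append)
data = [b"\x01\x02\x03\x04", b"\x05\x06"]
extra = b"\xaa\xbb"
transaction = [("start", 0x42, len(data))]
transaction += [("data", d) for d in data + [extra]]
transaction += [("stop", 0x42, checksum(data + [extra]))]
for message in transaction:
    hook.on_message(message)
```

After the loop, `received` holds the start message, the two real data messages, and a stop message whose checksum covers only those two, which is what the unmodified receiver expects. With a non-additive checksum such as a CRC the correction step would look different.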

And the risk we're going to deal with is this reset risk: if there's an accidental reset, we might burn that one unused parameter page we have, and then we're out of unused pages. So we came up with a sort of three-step plan to do this. We first have a little safety image, and that's just four instructions, and we put that not in the unused section but in the least important section. So if that breaks during those four bytes of upload, the spacecraft is still going to be fine. It's going to lose some functionality

but it's going to be fine. All that safety image does is call a flash erase to reset the unused page and un-burn it, so we can write to it again. This gives us insurance that even if we burn a page, we can recover from that. The next thing we have is a loader image, and this is just some convenience features, because it actually takes quite a while to upload the 2 kilobytes in just four-byte chunks, and so

we included some convenience features. We fixed the telemetry system temporarily so we get some telemetry back, and we also brought back the 60-second timeout, so we don't have to worry about it just sending for 70 minutes or something. And then finally we're going to upload the main software image, and that contains all the actual features we talked about just now. This ends up being about 600 commands, which takes a while, but okay, let's get started. So we started with the safety image, uploaded that, that

all went great. We didn't have to use it yet, so that's great. We then installed that loader image, and that also worked, and we got some telemetry back, and the temporary fixes seem to work. Perfect. Now, while uploading that main image I noticed something interesting. This is the main switch page, which contains all the different switches on the spacecraft and tells you what's on and what's off, and I would like to draw your attention to the camera line there. So I was doing this, and then at some point I sent

a flash dump command, that unused command we talked about, to see how many of the commands I had sent up actually got received and which ones weren't received properly, so I know where there are gaps. And it turns out after I did that, the camera turned on, which is kind of interesting because I did not command the camera to turn on. So I was a little bit worried at this point, I'll be honest, but I eventually found the bug. This is the excerpt of the code. Does anyone see the

bug? So, they forgot a break statement. Whenever you do a flash dump command, you also command the satellite to take a picture. We did that quite a lot, actually. So, oopsie. Now, the good thing is I talked with the project lead and he told me the camera software is actually incomplete, it has never worked, they've never taken any image using this camera, which is kind of sad, right? And so all it's going to do is power off by default

after 60 seconds and in that time consume 60 milliamps, which is fine, we can deal with it. And the other thing is there was really no alternative: we need that command to dump the flash down. So we're just going to continue onwards and upwards. So, ready, we upload everything. It took nine passes to upload that entire image, and then we sent the magic telecommand to rewrite that parameter section to give it the proper values. And for the first time in, well, 10 years, we actually got the

full set of telemetry down, including all the voltages and everything. So now we've restored the spacecraft back to the factory state it was in before the failure happened in 2013. So now it's time for my favorite chapter. First of all, what do you have to do now? You have to recommission the spacecraft, so we have to go through every single system and check everything out. I could bore you with that, and it's kind of boring; the spoiler is everything is working fine. But

what I really want to focus on, because this is actually what sparked my interest in the satellite, is the camera, because a satellite that doesn't take pictures, you know. So the camera software is incomplete, we learned, but no one could actually tell me what exactly is missing from this camera software. The problem is, unfortunately, I broke the laboratory model camera. What happened is, if you remember, we tried to assemble a new development setup from Frankenstein parts. I used the right PCB, but a strapping resistor on that PCB was in the

wrong place, and so instead of connecting the camera to the onboard computer, I connected the camera to a power switch. The camera was not very happy about that. I think I've not tried swapping it back yet. It very briefly drew 80 milliamps and now it's drawing 25 milliamps; I think that's a bad sign. It's also not sold anymore, except on, like, the Turkish equivalent of eBay Kleinanzeigen, which I cannot get to. If you happen to have one you don't need, hit me up.

Fortunately, we found out earlier that the flight model camera is drawing 60 milliamps, as it should, so that seems functional. So we're going to do some on-orbit debugging. We can use the flash dump command to actually download from SRAM instead, because it turns out it's just a memcpy, which again will try and take a picture in itself, but we can try to take a picture while we're taking a picture, and it sort of works out. So we can do that, and so I

did that, and this is one of the transactions from the camera to the OBC in the trace. If you look at the datasheet for the camera, this says: I have taken an image of size 9.5 kilobytes and I'm ready to downlink it to you. Interesting, I thought this thing was not supposed to work. What happens if I press the download button? It does work. Now, I can't tell you if at some point in the last 10 years it started working or if it's always been working, I don't really know. But so, this is, to explain

the picture: in the top left, that's actually Earth, but the auto exposure has moved really far down, because it turns out getting proper exposure out of auto exposure on something that's 50% black and 50% pure bright is very difficult. And that in the bottom right, that's actually the sun blinding the thing. But with a little bit of practice we can actually get the attitude control working too and actually point it at the Earth and take some cool pictures. It's not the greatest quality of camera, let's be honest, it's 640 by 480.

You will also notice the bottom right isn't quite right; that's because someone wrote less-than instead of less-or-equal. So we get some nice pictures. This is Poland. That little star structure, I'm not sure if you can see it, but there's sort of a star in there; if you look at Google Maps, that's actually Warsaw. And then we can get other pictures. This is Hungary, and in the top right you can see something that's a little more jagged than a cloud would be, and that's

actually the [unclear] from above, and then down there somewhere is Budapest. Now this is where we come to my biggest problem with this talk. I tried so hard to get a picture of Hamburg, but for the last month it has been non-stop cloudy over Europe, and so the best I can do is this. North is actually to the left in this image; in the bottom left is Hamburg, I promise. And finally we can get a nice little image of the sun rising, and I'm actually very happy to present the first research

result from BEESAT-1 in the past 10 years: the Earth is in fact round. So this is where we're at today. Everything that's green we tested and it's working; everything that's blue we haven't tested. The only thing that's slightly not working, and it's still working, is one sensor that's drawing like 20 milliamps more than it should. So this thing is basically factory new, which is great. I'm actually a little bit ahead of time, so I might do a

bonus, but I want to close on a little note about reliability. This is a plot from a paper, and it plots CubeSat reliability over time. The x-axis is time, 2 years here, and the y-axis is reliability. The first thing you notice is this plot does not start off at 1, it starts at 0.8. So 20% of these things are what they call dead on arrival, they never work to begin with. And then after 2 years you drop to 60% reliability. And we've been going for 15

years now, and this thing is working like brand new, basically. We've made some fun of the developers a little bit for some of the code, because it's funny, but I really want to thank the entire BEESAT-1 team for making this awesome spacecraft that still works after 15 years. And I would like to thank some people from the university: my thesis adviser, who allowed me to do this as my bachelor's thesis, which was cool, and some other people. And with that I'm actually done, and I would like to ask if you have any

questions. Amazing, thank you very much, that was amazing. No one can tell me that software engineering isn't creative. That was really cool. So, gentlemen, creatures, ladies, people of the Earth, oh, and space, if you have questions, please line up, there are microphones, and we also have a signal angel who is reading your questions from the internet. And while you're getting ready, I also have a question, which you've probably already been asked: you have successfully captured a vehicle in space, would you

consider yourself a space pirate? You make a very good point. I technically had permission, which I think means technically not piracy, but technically also the person I asked didn't own the spacecraft. True, true. So perhaps by proxy. I'm going to remember that, I'll figure it out. Yeah, thank you very much. And I see we already have people at the microphones. I would start with microphone number one here, question please. Yes, thank you, great talk. You showed us that you have basically fixed computer number two. Have you considered computer number

one as well? An excellent question, I have a slide for that. So we fixed computer number one because that's the one that was active after the failure in 2013. Eventually I actually found the logs from 2011, because they were migrated to a different system so I didn't have them. It turns out it's actually quite interesting: all of the symptoms are reversed. Instead of sending indefinitely, the satellite sends one frame and then stops immediately, and instead of not containing data, the frames do contain data, but garbage. And so I have a theory

on this, and I eventually figured it out in the datasheet of the flash chip. There's something called pre-programming: you don't want to erase the same one bits back to one over and over again, as far as I understand it, so what you do is you program everything to zeros before you erase it to ones. And so what I think happened is this reset, in this case on OBC number zero, interrupted that pre-programming step, and so instead of all the parameters being really high they're all zero, and so that

completely reverses the symptoms. So it's a theory. We're probably not going to switch the onboard computer, because there's a risk that in the last 10 years it broke. I would really like to, but there's nothing to be gained from it, so it's difficult to justify. But at some point OBC number one might fail, and if that happens we will have to. But I think it's the same malfunction, even though it looks different. Thank you. The interwebs has a question. Yeah, there were multiple questions,

let's start with: how do you go about figuring this out? Is there only the source code, or do you have other documentation to look into? So, it's sparse, right, because nowadays they do this better, but back then it was relatively early days and there wasn't a lot of archival of stuff. So we actually had to do this thing of emailing people to get the software images, and it was actually kind of difficult, because for some software images you don't know if it's the right one

unless you get the guarantee. The reason I actually came to this is, because I was working on BEESAT-9, I had the source code for that thing, and I figured out some Sunday evening at 1 a.m. that if you go far enough back in the git history you get to BEESAT-1. And so I was bored one Sunday night and was like, okay, let's see if we can figure it out, and it turns out yes, we can. But the documentation is kind of sparse. A

lot of the actual work I did, this whole thing took a while, right, but a lot of that was just running around and getting the right documentation from people. I have a pretty complete set now, fortunately; for some components I still don't have it, but for the components I did need for this, I had it. I had a lot of help from the people at my university, because they do still operate satellites. So for example we had to fix the

ground station, because the ground station was not meant to operate with such an old satellite anymore, they had upgraded since, so we had to actually backport all of that stuff to get the ground station working, and stuff like that. So there was a lot of human labor of going around and talking to the people involved, but everyone was super helpful and gave their time for us, because this is, I think, the oldest satellite that TU Berlin still controls themselves, so it's kind of like a

mascot, I suppose, so everyone wanted to help, which was really cool. Thank you. Yes, and then there is another question on microphone number three, please. Thank you very much for the talk. Since you've started bug fixing now, will you continue with the camera system too? Yeah, so the question is what do you do with it now, right? There was some consideration of taking the code from the modern BEESAT satellites and backporting everything. The problem is some hardware was changed, so it's kind of

difficult. I think I do want to do some more software updates for this; it's like a test bed at this point, right? One thing I really would like to do is fix the camera bug. I want to change the exposure on that. I would really like to get video if I can, because that would be cool, but it might be a little bit difficult. But the primary mission at this point is just to keep this thing

alive as long as possible, because it's going to decay in 2047, is the projection. If it's still going in 2047 I will be happy. But yeah. Amazing, thank you. The internet has another question. Yes: how often does the satellite reboot, and is this normal behavior? Yeah, that's an interesting question, because of that reset. So it resets quite frequently, actually, but that's because there's a bug on it that means in safe mode it resets every 15 minutes or so. But actually it would still be a pretty big coincidence to

have it reset at exactly the right point in time, in that step between the flash erase and rewrite, wouldn't it? So, oops, this is some of the code that does that. They erase the page, and that actually takes a little while, so they have this timeout function that waits until the erase is done. Now, sometimes on the flash chip the erase fails, and so they have a timeout there to retry it in that case. But if you look at this code for a second you

will see that this is kind of backwards. What they meant to write was: while we're not done and the time that's elapsed is less than 300 milliseconds, we're going to keep waiting. What they actually wrote is: while we're not done or we've been waiting for 300 milliseconds or longer, we're going to keep waiting, indefinitely. And so what happens is this gets into an infinite loop in case of a single timeout, and then there's a watchdog timer which doesn't get triggered, and that actually hard-resets the satellite by itself. So there are actually bugs which can lead

to a reset precisely in this moment, and that's what makes this more likely. Thank you. We have time for another question, and up there is another one at the microphone. Thank you very much for this amazing talk. My question: why wasn't the parameter page, or the whole parameter section of the flash, stored multiple times like the firmware? Because it's common practice for such things, even in NES save states. So when I first talked about this with the project lead, they had a design that said these parameter pages are guarded by checksums,

and so what they would do is, in case of a checksum failure, they would simply reload reasonable defaults, and that would sort of protect against that in a similar way to storing it multiple times. The problem is this is the only line that mentions these checksums. They're never actually checked or written anywhere. So this is the kind of thing where they probably planned for this but then ran out of time to do it, because they had a pretty hectic launch schedule, as I understand it.
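The planned but never implemented safeguard, a parameter page guarded by a checksum that falls back to defaults on a mismatch, could look roughly like the following sketch. Everything specific here is an assumption for illustration: the parameter names, the two-field page layout, and the use of CRC-32 from Python's `zlib` as the checksum. The talk does not say which checksum or layout the design called for.

```python
import struct
import zlib

# Hypothetical defaults and layout; the real parameter set is not given in the talk.
DEFAULTS = {"tx_timeout_s": 60, "beacon_period_s": 30}
_LAYOUT = struct.Struct("<HH")   # two 16-bit parameters (assumed)
_CRC = struct.Struct("<I")       # CRC-32 stored right after the parameters


def pack_page(params):
    """Serialize the parameters and append their checksum."""
    body = _LAYOUT.pack(params["tx_timeout_s"], params["beacon_period_s"])
    return body + _CRC.pack(zlib.crc32(body))


def load_page(page):
    """Return the stored parameters, or the defaults if the checksum fails."""
    body = page[:_LAYOUT.size]
    stored = page[_LAYOUT.size:_LAYOUT.size + _CRC.size]
    if len(stored) != _CRC.size or _CRC.unpack(stored)[0] != zlib.crc32(body):
        return dict(DEFAULTS)    # corrupted or truncated page
    timeout, period = _LAYOUT.unpack(body)
    return {"tx_timeout_s": timeout, "beacon_period_s": period}


good = pack_page({"tx_timeout_s": 120, "beacon_period_s": 10})
corrupted = bytes([good[0] ^ 0x01]) + good[1:]   # single flipped bit
```

Here `load_page(good)` returns the stored values, while `load_page(corrupted)` returns the defaults. A real design would also need to make sure that a freshly erased page cannot pass the check by accident, which depends on the checksum and layout chosen.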

They tried to do this but then ran out of time. Thank you very much. I think the internet has an additional question. Yeah, there is a question: how do you get involved in a project like this when you're not studying at TU, do you have an idea about that? And they're adding: if you're an allowed pirate, you're a space privateer. Oh, I dig space privateer, that's good. Okay, how do you get involved with this? Yeah, it's kind of tricky. TU Berlin has this thing,

right, it's a class they did. I imagine other universities have the same problem with having old spacecraft they can operate, because I think they're funded similarly. I don't want to say the wrong one, but there's another university which has, I think, Flying Laptop, which is another... Stuttgart, thank you, I was looking at you. So they have a similar student thing like that. So maybe check if your university has some old satellites they don't need; they might, you know. Beyond that, yeah, I am privileged, right,

I guess, that my university had this. I got lucky because I got to take this class. But check out your local university, because they might have something similar, or if they don't, maybe you can ask them about their old satellites that aren't funded anymore, and maybe you can start something. Would be cool. True, thank you. Additional questions from the internet? I see at least microphone number four lining up. Is there a possibility to get the signal from the satellite for ourselves? Yeah, so BEESAT-1 is on the amateur radio

band, it's 435.95 MHz, and if you're in Europe you can totally just receive on that band and decode it yourself. And because it's amateur radio, all the documentation on how the telemetry is encoded and stuff is actually public; it has to be public, because amateur radio. So you can totally just receive it yourself. It's not always transmitting over Europe, it's only transmitting when we're commanding it, but if there's a morning pass and you try a few times, you can get lucky and

receive it, or just text me and I'll activate it for you. Because it's an amateur radio satellite, there's also an amateur radio service on there: it has a digipeater, which means there's a format in which you can send messages up to it yourself and it will relay them back down. The cool thing about that is, if you're in Berlin and link it up and it links back down, you can see it everywhere the satellite has a footprint. So it's like a relay mechanism that you can actually

send to. Thank you very much. There is another question up there again. Hi, hello, thanks for your awesome work, looks like a cool project. Do you happen to know what kind of real-time OS is running on BEESAT-1? I think one or two of my old professors might have been involved in that project before they moved to Würzburg. Are you in contact with Professor Montenegro, maybe? So, I think I know what real-time operating system you're talking about: RODOS, right? Yes, it is indeed running on RODOS. Yeah, okay. Unfortunately this one, no, I shouldn't say

that. I'm told it got much better after this. Oh no, actually I'm wrong, sorry. Is it running on BOSS or something? Everything after BEESAT-2 runs on RODOS; BEESAT-1 runs on something else. BEESAT-1 runs on something called TinyBOSS, and that's a custom thing, but I think that's also from Würzburg. To be honest, I'm not sure. But I'm actually not in contact with them, because all of this was before my time, right? I'm just doing archaeology on these things instead of developing them. If you want me to,

I can establish some form of contact with Professor Montenegro. I don't know them, but text me on Fedi afterwards and we can talk, maybe. Yeah, no problem, thanks for your work. Yeah, of course, thank you. I imagine these complicated emails asking, ah, could you tell me something about this ancient technology? It's quite helpful, actually. It looks like microphone number one has a question. Yes, about the data retention of the flash chip: was there a number provided by the manufacturer? So I did the math on this. The thing is that

I think there's the retention, how many cycles and how long it keeps the data, right? So I did the math on whether we actually need to worry about that within the lifetime of the satellite. It's rated for, I think, one million cycles and 10 years of retention, but the way I understand it, and I'm not an expert on this, the retention is specified for the maximum temperature, and that's 70°, and typically in space we're cycling between 20 and minus 10°, and so

I found a formula that says we should have like a thousand years of flash retention time, so I think we'll be fine on that, fortunately. Thank you. At microphone number three is our next question. Yeah, thank you for the awesome talk, it was really good. My question is: is there some security built into the telecommands to prevent people from messing up the satellite now that you have fixed it? Telemetry, everything that goes down, is unencrypted; it has to be unencrypted. There's an exemption in the amateur radio standard for encrypting of uplink

telecommands. I unfortunately can't talk about exactly what they do for BEESAT-1, but I can say in general: CCSDS standardizes both telemetry and telecommands, and there's actually a standard for encrypting both directions. It's called Space Data Link Security, and this is basically like transport layer security. It supports AES-GCM, counter mode, and HMAC, I think. But I can't tell you exactly what they do on BEESAT-1, unfortunately. Thank you. Additional questions from the internet? Yeah: would it help to fly along the path of the satellite to have some

more time to upload your code? I mean, let's see, how fast is the airplane moving compared to 7.5 km per second? One thing which does help is putting your ground station in a different place. TU Berlin also has one at the North Pole, no, not quite the North Pole, but in Svalbard, and near the South Pole. And because of this north-south rotation, the further north your ground station is, the more time you have to uplink and downlink. But because we also had to make some

ground station changes to make it compatible again, we actually didn't use that ground station for this. Ultimately, yeah, more bandwidth, more better. For what we did now, we didn't do a full software update yet, we just did this fix, and that was possible in like 10 passes or so, so that was fine. For the long term, if you have a better ground station site, that helps. Yeah. You said something about BEESAT-1; well, if there's a BEESAT-1 there should be a BEESAT-2. What about using

these as proxies? So the interesting thing is there's BEESAT-1 and there's 2 to 13, so this is actually the groundwork for a very long line of satellites. But the sad thing is, BEESAT-1 is in a 700-kilometer orbit, and the orbit height directly correlates with how long it stays in space, and so actually there's only one other BEESAT left. All of the other ones re-entered the atmosphere and burned up last year. So BEESAT-1, the first BEESAT, will actually become the last BEESAT too. So unfortunately we

can't use them for testing anymore, but it was helpful, because you can compare some things, since the hardware is similar in some cases. For example, because we don't have a laboratory model for the sun sensors, for some of the components, anymore, we can check the historical data of the different BEESATs and check if the data is in line with what we would expect, and stuff. And of course that class trained me to operate BEESAT-9, which is still similar in construction to BEESAT-1, so I

had a head start in learning how to operate it. So it helped a little bit, but unfortunately they're gone now; we're very sad about that. Thank you. I can imagine that this doesn't happen too often, that a satellite is getting lost in space, but in case it does, do you have anything in mind? Is there a satellite where you wish that you could play around with it a bit? The thing is, people don't usually tell you about their failing spacecraft. There's another one from TU Berlin I want to look at after that, which is

not failing, but there's another thing ongoing there. If you have one that's broken and you think it might be fixable, do please contact me, I would love to. Unfortunately they're pretty hard to come by, it turns out. Yeah. Oh, you don't say. Yeah, I don't know, that's still a five-digit number, but they're doing more now, so maybe we'll get more, I don't know. Very interesting, thank you. The internet has an additional question; we still have a little time for the questions from the internet. There

is a question: why would you send a satellite into orbit with a non-working payload? It's a good question. I mentioned that BEESAT-1 is the 33rd CubeSat in space, and the actual payload is not the camera. This was a technology demonstration mission to demonstrate miniaturized components, that you can actually get all these things in such a small form factor. This is actually the second CubeSat at all that flew with three reaction wheels, and so the actual payload is

the reaction wheels, and this is a system to point the satellite in the right direction. So the camera is more of a dummy payload, which is just there to demonstrate that you can have a payload on it, but the actual payload is the reaction wheels, and those actually did work fine, and they spent a lot of time making sure that they do work. It was a good question. Thank you. We're nearly at the end of our time slot. I would like to ask you if you would have time afterwards

for a quick Q&A, or if you would be available for other people to contact you with questions. Absolutely. Then if you have additional questions, please proceed, and please thank, for the interesting talk and the creativity behind it, PistonMiner. Thank you very much. Thank you. Big applause. [Applause] [Music]

38C3 - Hacking yourself a satellite - recovering BEESAT-1