Implement syscall user memory validation #30

Closed
gk1623 wants to merge 0 commits from user-memory into master
gk1623 commented 2024-11-08 01:32:49 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

In addition to the checks previously implemented, we also needed to check the memory is accessible using either a page check, or a page fault check.

I opted for the page fault check simply because (as per spec) it has better performance. This stems from relying on the CPU MMU to interrupt rather than checking via the page table.

Changes:

  • validate with size sizeof (unsigned) instead of 1 when getting syscall_number.
  • use uint32_t explicitly when retrieving arguments (instead of uintptr_t), since spec mentions 32-bit words.
  • ensure syscall args loop doesn't go over our args buffer (of length 3).
  • implement get_user and put_user checks into validate_user_pointer, but make the write check optional.
  • make validate_user_pointer NOT call thread_exit () itself, because once we implement other syscalls, according to the spec we will endup with some mallocs and locks which will need to be freed before thread_exit () is called, but after the pointer is found to be invalid using validate_user_pointer.

TODO:

  • How do we handle memory ranges (buffers, strings, ...)? Literally just a for loop checking every pointer within, or what?
In addition to the checks previously implemented, we also needed to check the memory is accessible using either a page check, or a page fault check. I opted for the page fault check simply because (as per spec) it has better performance. This stems from relying on the CPU MMU to interrupt rather than checking via the page table. Changes: - validate with size `sizeof (unsigned)` instead of `1` when getting `syscall_number`. - use `uint32_t` explicitly when retrieving arguments (instead of `uintptr_t`), since spec mentions 32-bit words. - ensure syscall args loop doesn't go over our args buffer (of length 3). - implement `get_user` and `put_user` checks into `validate_user_pointer`, but make the write check optional. - make `validate_user_pointer` NOT call `thread_exit ()` itself, because once we implement other syscalls, according to the spec we will endup with some mallocs and locks which will need to be freed before `thread_exit ()` is called, but after the pointer is found to be invalid using `validate_user_pointer`. TODO: - [x] How do we handle memory ranges (buffers, strings, ...)? Literally just a for loop checking every pointer within, or what?
gk1623 commented 2024-11-08 01:41:33 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

requested review from @td1223

requested review from @td1223
sb3923 commented 2024-11-08 01:52:52 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

I see this is related to system calls. I will have a look tomorrow and reply accordingly.

I see this is related to system calls. I will have a look tomorrow and reply accordingly.
gk1623 commented 2024-11-08 09:12:35 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

changed the description

changed the description
gk1623 commented 2024-11-08 09:35:00 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

changed the description

changed the description
gk1623 commented 2024-11-08 12:00:03 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

changed the description

changed the description
gk1623 commented 2024-11-08 12:03:08 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

changed the description

changed the description
gk1623 commented 2024-11-08 13:00:05 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

I realise now this implementation doesn't handle ranges very well (e.g. buffers, strings). It feels like we have to do a for-loop for those, but that just feels wrong... what do you think?

I realise now this implementation doesn't handle ranges very well (e.g. buffers, strings). It feels like we have to do a for-loop for those, but that just feels wrong... what do you think?
gk1623 commented 2024-11-08 13:06:15 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

changed the description

changed the description
sb3923 commented 2024-11-08 13:25:55 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

Hi Gleb,

Thanks for the comments! Really appreciated.

Yes, we forgot to check that the user address does map to a kernel address. This will be added in the checks in the next commits.

I am not sure about the page fault check to be honest. Not only does it complicate reading the code but it feels (at least to me) like very bad practice. I will however give it a thought before the final PR of system-calls.

As for the changes:

  1. While it does not matter which one we use, I agree with you that it needs to be changed. I however suggest we use uintptr_t to match the rest of the code.
  2. The spec does mention that PintOS is always going to run in a 32-bit architecture, but so the two are equivalent really. I opted for uintptr_t for safety: uintptr_t was designed to always be able to carry a ptr without loss of information and also unsigned/signed ints. I chatted with Mark about this, and he said he's fine with both.
  3. Based on the current values it shouldn't. But I think it will be better if I declare a variable holding the value 3 as it does feel like a magic number. But adding another conditional feels a bit unnecessary.
  4. I did have a look at validate_user_ptr and you're right, it does require some work. I will thoroughly think of a better implementation and commit it this weekend. If they deem necessary, I will add said functions.
  5. My thinking was that we will implement things the simplest way possible. Changing functions only when necessary. At this point, (or FWIW, ever), I don't think it is necessary. We can simply check the pointers before acquiring the locks. But again, If we find while implementing the file system system calls that we may run into such situation or when we need extra flags, the function will be edited accordingly.

As for your last question, the current way would be (assume validate_user_ptr does its job correctly) is to just call validate_user_ptr on the size required of the buffer / string.

Again, many thanks for your comments. I will fix / consider all these points in the final PR of system calls.

LMK what you think.

Hi Gleb, Thanks for the comments! Really appreciated. Yes, we forgot to check that the user address does map to a kernel address. This will be added in the checks in the next commits. I am not sure about the page fault check to be honest. Not only does it complicate reading the code but it feels (at least to me) like very bad practice. I will however give it a thought before the final PR of system-calls. As for the changes: 1. While it does not matter which one we use, I agree with you that it needs to be changed. I however suggest we use uintptr_t to match the rest of the code. 2. The spec does mention that PintOS is always going to run in a 32-bit architecture, but so the two are equivalent really. I opted for uintptr_t for safety: uintptr_t was designed to always be able to carry a ptr without loss of information and also unsigned/signed ints. I chatted with Mark about this, and he said he's fine with both. 3. Based on the current values it shouldn't. But I think it will be better if I declare a variable holding the value 3 as it does feel like a magic number. But adding another conditional feels a bit unnecessary. 4. I did have a look at validate_user_ptr and you're right, it does require some work. I will thoroughly think of a better implementation and commit it this weekend. If they deem necessary, I will add said functions. 5. My thinking was that we will implement things the simplest way possible. Changing functions only when necessary. At this point, (or FWIW, ever), I don't think it is necessary. We can simply check the pointers before acquiring the locks. But again, If we find while implementing the file system system calls that we may run into such situation or when we need extra flags, the function will be edited accordingly. As for your last question, the current way would be (assume validate_user_ptr does its job correctly) is to just call validate_user_ptr on the size required of the buffer / string. Again, many thanks for your comments. I will fix / consider all these points in the final PR of system calls. LMK what you think.
gk1623 commented 2024-11-08 18:24:08 +00:00 (Migrated from gitlab.doc.ic.ac.uk)
  1. And 2. Alright, will implement.
  2. This is an arbitrary user call, we should handle everything, even if the user decides to pass 100 arguments for some reason.
  3. According to spec it’s either checking the a page is allocated at that address, or (they say more difficult but better) is this implementation with get_user and put_user.
  4. Alr sounds good.
  5. Yeah the call is correct, but the function itself is incorrect. It doesn’t check that ptr is allocated at all, let alone all the way up to ptr+size.
1. And 2. Alright, will implement. 3. This is an arbitrary user call, we should handle everything, even if the user decides to pass 100 arguments for some reason. 4. According to spec it’s either checking the a page is allocated at that address, or (they say more difficult but better) is this implementation with get_user and put_user. 5. Alr sounds good. 6. Yeah the call is correct, but the function itself is incorrect. It doesn’t check that ptr is allocated at all, let alone all the way up to ptr+size.
sb3923 commented 2024-11-08 19:07:55 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

Sorry for being unclear. I meant that we did indeed forget the check that the user address is mapped (page_get_page != NULL). We have just added this into the branch.

As it stands, system-calls passes all systems calls tests!

For 1&2, please don't worry about it I will implement it myself later on but thanks!

But yes, validate_user_ptr will have to be changed definitely. I will write another implementation into the file this weekend along with the system call PR.

For (2), I am not sure what do you mean to be honest. If the user passes 100 arguments that should be fine. We will only read how many 32-bits uints we need based on the number. This cannot lead to any problems because 1. syscall number is guaranteed to be correct by this point 2. the memory must be valid and have at least these arguments (assuming validate_user_ptr works). So the case when more arguments are handled is implicitly handled by just ignoring these arguments (which is easily testable by adding any number of args to any of the tests and verifying they do pass which they do). Did you mean the thread should be killed in that case? It feels very weird to me if I am being honest. The stack may have other things for subsequent system calls.

(5) Yes as I said I will have a look at the function later and reimplement. I will make the reviewer of the PR for the fix.

Thx.

Sorry for being unclear. I meant that we did indeed forget the check that the user address is mapped (page_get_page != NULL). We have just added this into the branch. As it stands, system-calls passes all systems calls tests! For 1&2, please don't worry about it I will implement it myself later on but thanks! But yes, validate_user_ptr will have to be changed definitely. I will write another implementation into the file this weekend along with the system call PR. For (2), I am not sure what do you mean to be honest. If the user passes 100 arguments that should be fine. We will only read how many 32-bits uints we need based on the number. This cannot lead to any problems because 1. syscall number is guaranteed to be correct by this point 2. the memory must be valid and have at least these arguments (assuming validate_user_ptr works). So the case when more arguments are handled is implicitly handled by just ignoring these arguments (which is easily testable by adding any number of args to any of the tests and verifying they do pass which they do). Did you mean the thread should be killed in that case? It feels very weird to me if I am being honest. The stack may have other things for subsequent system calls. (5) Yes as I said I will have a look at the function later and reimplement. I will make the reviewer of the PR for the fix. Thx.
gk1623 commented 2024-11-09 01:19:16 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

Ah okay I’ll keep this MR on hold then, we can discuss whether there’s any point using the page fault method later.

About the arguments - yes the memory is valid, BUT we are copying it to the args[3] array, which is obviously limited to 3 arguments. So with 4+ arguments args[i] eventually overflows the array into whatever.

I’m not saying we should fail these cases, but simply that we should ONLY save up to 3 arguments to the args[3] array (as in my implementation).

Ah okay I’ll keep this MR on hold then, we can discuss whether there’s any point using the page fault method later. About the arguments - yes the memory is valid, BUT we are copying it to the args[3] array, which is obviously limited to 3 arguments. So with 4+ arguments args[i] eventually overflows the array into whatever. I’m not saying we should fail these cases, but simply that we should ONLY save up to 3 arguments to the args[3] array (as in my implementation).
gk1623 commented 2024-11-12 15:00:30 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

added 54 commits

  • c771c8f2...7d9900c6 - 51 commits from branch master
  • 83e044cf - Let kernel handle its own page faults
  • 44f6a851 - Add get_user and put_user provided by spec.
  • 9a6abab9 - Check access to user memory using page fault method (via get_user and put_user).

Compare with previous version

added 54 commits <ul><li>c771c8f2...7d9900c6 - 51 commits from branch <code>master</code></li><li>83e044cf - Let kernel handle its own page faults</li><li>44f6a851 - Add get_user and put_user provided by spec.</li><li>9a6abab9 - Check access to user memory using page fault method (via get_user and put_user).</li></ul> [Compare with previous version](/lab2425_autumn/pintos_22/-/merge_requests/30/diffs?diff_id=138696&start_sha=c771c8f2504ca56eb638b4ec7d29d7812ff6d8da)
gk1623 commented 2024-11-12 15:34:53 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

added 1 commit

  • cf4bf90c - Implement user pointer checking for C strings

Compare with previous version

added 1 commit <ul><li>cf4bf90c - Implement user pointer checking for C strings</li></ul> [Compare with previous version](/lab2425_autumn/pintos_22/-/merge_requests/30/diffs?diff_id=138713&start_sha=9a6abab95ed569c98cb7e0cf87b0428ee023de71)
gk1623 commented 2024-11-12 15:35:55 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

requested review from @sb3923 and removed review request for @td1223

requested review from @sb3923 and removed review request for @td1223
gk1623 commented 2024-11-12 15:36:47 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

resolved all threads

resolved all threads
gk1623 commented 2024-11-12 15:37:03 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

marked the checklist item How do we handle memory ranges (buffers, strings, ...)? Literally just a for loop checking every pointer within, or what? as completed

marked the checklist item **How do we handle memory ranges (buffers, strings, ...)? Literally just a for loop checking every pointer within, or what?** as completed
gk1623 commented 2024-11-12 15:48:28 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

added 1 commit

  • 59e7a64f - Only check user pages rather than all bytes in-between, for known-size pointers

Compare with previous version

added 1 commit <ul><li>59e7a64f - Only check user pages rather than all bytes in-between, for known-size pointers</li></ul> [Compare with previous version](/lab2425_autumn/pintos_22/-/merge_requests/30/diffs?diff_id=138719&start_sha=cf4bf90cbb06eaa90961bdeab41f83a99666a90a)
gk1623 commented 2024-11-13 22:12:07 +00:00 (Migrated from gitlab.doc.ic.ac.uk)

This isn't needed anymore in favour of the first method, which is a bit cleaner exception-handling wise.

This isn't needed anymore in favour of the first method, which is a bit cleaner exception-handling wise.
gk1623 (Migrated from gitlab.doc.ic.ac.uk) closed this pull request 2024-11-13 22:12:11 +00:00

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Imperial-MEng/pintos_22#30
No description provided.