Comment by 🍃 skyjake

Re: "Scriptonite Lagrange Workarounds"

In: u/bluesman

Note that there are some differences in what is shown in the UI vs. what gets sent to the server. There is a setting for toggling the display of decoded URIs. This is enabled for UX reasons to make it more difficult to obfuscate the path.

Checking the RFC, I can see that [, ], and " should always be in encoded form in the path (Lagrange at least seems to currently mishandle "), but I'm not immediately seeing a reason why I should not send &, =, and + as decoded in the request, inside path segments. Those are 'sub-delims' in the RFC and those are allowed in the path without encoding. The serverside behavior should be equivalent regardless of encoding.

(To give some context, when Lagrange resolves link URIs found on pages it also converts them to a canonical form where only the reserved characters remain encoded. This ensures that URI comparisons will work correctly regardless of encoded characters being present or not. When making a request, non-ASCII characters and all reserved characters are encoded.)
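As a rough illustration of the character classes discussed above, here is a minimal Python sketch built from the RFC 3986 grammar (`pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`). The helper name `may_appear_raw_in_segment` is made up for this example and is not taken from Lagrange or Scriptonite.

```python
import string

# Character classes from RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
GEN_DELIMS = set(":/?#[]@")

# Section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | {":", "@"}


def may_appear_raw_in_segment(ch):
    """Return True if ch may appear unencoded inside a path segment."""
    return ch in PCHAR_LITERALS


if __name__ == "__main__":
    for ch in '&=+;[]"':
        if may_appear_raw_in_segment(ch):
            verdict = "allowed unencoded"
        else:
            verdict = "must be percent-encoded"
        print(repr(ch), verdict)
```

Under this grammar the sub-delims (&, =, +, ;) are legal as literals in a path segment, while [, ], and " are not. Being legal as a literal does not by itself mean the encoded and decoded forms are interchangeable.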

🍃 skyjake [sysop]

Aug 05 · 2 months ago

🦎 bluesman [OP] · Aug 05 at 19:50:

I need to test but it's entirely possible that &, =, and + can be left decoded now that semicolons work. Lagrange is decoding [ and ] before sending. In Zoe, I had an auto prompt line with something like "Enter [Y]es or [N]o" in the URL. This is what the server gets from Alhena and Lagrange.

ALHENA
gemini://pinkytoehold/scriptonite/gemini%3A%2F%2Fpinkytoehold%2F3DTTT.gmi;c:Play%20in%20color%3F%20Enter%20%5BY%5Des%20or%20%5BN%5Do.;pl:Play%20as%20X%20or%20O%3F%20X%20goes%20first.%20Enter%20%5BX%5D%20or%20%5BO%5D.;df:Enter%20difficulty%20level%20(1%20-%206)

LAGRANGE
gemini://pinkytoehold/scriptonite/gemini%3A%2F%2Fpinkytoehold%2F3DTTT.gmi;c:Play%20in%20color%3F%20Enter%20[Y]es%20or%20[N]o.;pl:Play%20as%20X%20or%20O%3F%20X%20goes%20first.%20Enter%20[X]%20or%20[O].;df:Enter%20difficulty%20level%20(1%20-%206)

Creating a URI in Java from the second url throws a URISyntaxException so it won't work unless I sanitize the input (which is okay but not strictly correct).

I'll do more testing on &, =, and +. A quick test suggests decoding those is fine but I'll let you know. If the spec says it's okay but it still doesn't work, then it's my bug.

🦎 bluesman [OP] · Aug 05 at 20:27:

The + is still an issue. If running a script that prompts for code and the user enters "2+2" in the type 10 dialog, Lagrange correctly sends 2%2B2 in the query. When the server then includes that in a redirect url, Lagrange converts it to 2+2. The server gets that redirect and since it assumes everything is already encoded, 2+2 becomes 2%202 or "2 2".

I can live with keeping the base64 encoding scheme so this works in Lagrange (or maybe I can sanitize for + specifically). I'm not sure though that the RFC suggests I'm doing something wrong. You say those characters are allowed in the path without encoding but how are they meant to be handled when they are encoded? If they MUST be decoded then my scheme is fundamentally broken to begin with and base64 encoding is mandatory.
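The round trip described in this comment can be reproduced with Python's standard `urllib.parse`. This is only a sketch: it assumes the server decodes with form-style rules in which a literal "+" means a space, which matches the "2 2" result reported above but is not confirmed as Scriptonite's actual decoding logic.

```python
from urllib.parse import quote, unquote, unquote_plus

user_input = "2+2"

# What the client sends in the query after the input prompt.
sent = quote(user_input, safe="")            # "2%2B2"

# Client-side normalization that decodes the sub-delim inside a redirect URL.
normalized = unquote(sent)                   # "2+2"

# Assumption: the server treats a literal "+" as an encoded space.
seen_by_server = unquote_plus(normalized)    # "2 2"
re_encoded = quote(seen_by_server, safe="")  # "2%202"

# If the percent-encoded form is passed through untouched, the value survives.
preserved = unquote_plus(sent)               # "2+2"

if __name__ == "__main__":
    print(sent, normalized, repr(seen_by_server), re_encoded, preserved)
```

The value is only lost in the step where "%2B" is rewritten to "+" before the server sees it. Both endpoints behave consistently on their own.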

🦎 bluesman [OP] · Aug 06 at 00:56:

My reading of RFC 3986 section 2.4 is that url data (in path parameters) should not be decoded or re-encoded.

https://datatracker.ietf.org/doc/html/rfc3986#section-2.4
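The section 2.4 concern, that data must be split on its delimiters before percent-decoding, can be shown with a small sketch. The segment below is a hypothetical example in the style of the `name:value` parameters quoted earlier in the thread, and both function names are invented for illustration.

```python
from urllib.parse import unquote


def split_then_decode(segment):
    """Split on ';' first, then percent-decode each part (the safe order)."""
    return [unquote(part) for part in segment.split(";")]


def decode_then_split(segment):
    """Percent-decode first, then split (encoded ';' is mistaken for a delimiter)."""
    return unquote(segment).split(";")


# Hypothetical segment: the value of "c" contains an encoded semicolon.
segment = "scriptonite;c:Red%3B%20green;pl:X"

if __name__ == "__main__":
    print(split_then_decode(segment))  # three parts, value kept intact
    print(decode_then_split(segment))  # four parts, value broken in two
```

Decoding first turns the `%3B` inside the value into a real delimiter, so the parameter list can no longer be recovered. The same applies when an intermediary decodes the segment before the server parses it.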

🍃 skyjake [...] · Aug 06 at 04:17:

I was also taking a closer look at the RFC, and this stands out to me (from section 2.4):

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters.

The key word being "scheme-specific". The Gemini URI scheme does not specify semantics for parameters in the path component. In other words, the path component does not further divide into path and parameter subcomponents in Gemini; it's just a path. Therefore, the client can decode the sub-delim characters if it wants.

I will ensure that [, ], and " always remain encoded in the path component, though, to adhere to the RFC.

🦎 bluesman [OP] · Aug 06 at 06:09:

Path parameters may not be mentioned in the Gemini spec but I think it's reasonable to assume they would be supported as defined by the RFC. The Gemini spec specifically excludes fragments and userinfo but makes no mention of path/matrix parameters.

I find your choice perplexing given the fact that support basically requires a client to do nothing. I think it's an odd decision that may limit future development (and not just my derided project).

I'll continue to base64 the Scriptonite segment on redirect so it's "just a path".

🍃 skyjake [...] · Aug 06 at 10:47:

Gemini is intended to be non-extensible, so I'm wary of accidentally enabling behaviors that are not in line with the specification. I am therefore inclined to make it more difficult to make use of obscure features like path subcomponents.

However, I noticed this in the RFC section 2.2:

URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent
[...]
characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.

This pretty explicitly says not to decode or encode the sub-delims for normalization purposes, so I will adhere to that in the future, which should solve the underlying issue.
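A normalization pass that follows this reading of the RFC could look roughly like the sketch below: it decodes only percent-encoded unreserved characters and uppercases the hex digits of everything else (RFC 3986 sections 6.2.2.1 and 6.2.2.2). It is an illustration in Python, not the code from the Lagrange commit, and `normalize_pct` is a made-up name.

```python
import re
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(text):
    """Decode only unreserved octets; keep reserved ones encoded as-is."""

    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch  # e.g. %7E becomes ~
        return "%" + match.group(1).upper()  # e.g. %2b becomes %2B

    return PCT_TRIPLET.sub(fix, text)


if __name__ == "__main__":
    for sample in ["2%2b2", "2+2", "%7Euser", "Enter%20%5BY%5Des"]:
        print(sample, "->", normalize_pct(sample))
```

With this rule, "2%2B2" and "2+2" stay distinct after normalization, which is what the server side relies on.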

🍃 skyjake [...] · Aug 06 at 13:16:

@bluesman Are you able to compile Lagrange locally? Would be interesting to know if these changes are sufficient to fix the remaining issues (dev branch):

https://github.com/skyjake/lagrange/commit/7284f5ee591f781eb911f96e87630c09a8ec64d3

🦎 bluesman [OP] · Aug 06 at 13:52:

Looking at GitHub, it appears my best bet would be firing up Ubuntu in VirtualBox or using the Pi 5. (My macOS laptop is ridiculously constrained when it comes to storage.) I can certainly give it a shot when I have some time.

If we could figure out another way to share the binary (mac, windows or linux), I could probably get you an answer right away.

🍃 skyjake [...] · Aug 06 at 14:45:

Here is a Linux x86_64 AppImage for testing:

https://etc.skyjake.fi/lagrange/Lagrange-1.18.7_testing-x86_64.AppImage

🦎 bluesman [OP] · Aug 06 at 15:36:

I had to install fuse but then it ran fine on Windows Subsystem for Linux - much quicker than firing up Ubuntu in VirtualBox.

Every issue I was having with the auto-prompt system seems to be fixed. I was a little worried when I saw my "2 + 2" example become "2 %2B 2" in the address bar but it works fine and the copied link is percent-encoded.

Thanks for looking at this and apologies for any consternation.

Original Post

🦎 bluesman

Scriptonite Lagrange Workarounds: I put in workarounds for running Scriptonite in Lagrange. The issue is that Lagrange doesn't preserve certain percent-encoded characters in URLs (whether on a page or in a redirect). The fix is to sanitize Scriptonite links coming in and base64 on redirect. The one thing I can't work around is semicolons. If you want to use them in a pre-populated variable or auto-prompt, the Scriptonite segment must be base64 encoded in advance. That should be rare but there...

💬 13 comments · Aug 04 · 2 months ago

