2 July 2019
Back in 2016, I wrote Writing React apps using Bridge.NET - The Dan Way (Part Three), in which I talked about trying to tighten up the representation of values in the type system. One of my pet peeves was how "no value" is represented in reference types and, in particular, with strings.
As a reminder, I was having a rant about how I hate the uncertainty of wondering "should I expect to get null passed in here / returned from here" and I decided to draw a hard line and say that no, in my code I would never expect a reference type to have a null value - instead I would always use the Optional<T> struct that I included in my NuGet package ProductiveRage.Immutable. This allows me to make it clear when a method may return a null value (because its return type would be something like Optional<PersonDetails>) and it would allow me to make it clear when a method will and won't accept null arguments (it will if the parameter type is Optional<T> and it won't if it's not).
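To make that concrete, here's the sort of interface that this convention leads to (IPersonStore and PersonDetails are made-up names for the sake of the example - the only real type here is Optional<T> from ProductiveRage.Immutable):
public interface IPersonStore
{
    // The Optional<..> return type makes it explicit that there may be no match for the id
    Optional<PersonDetails> TryGetPerson(int id);

    // A plain PersonDetails parameter (not wrapped in Optional<..>) means that a missing /
    // null value is never expected here
    void Save(PersonDetails person);
}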
Strings, however, have another "no value" state - when they are blank. If I want to have a method argument whose type indicates "this argument must be a string that is not null AND that is not blank" then a plain string can't communicate that. To address that, my blog post introduced another type: the NonBlankTrimmedString -
public class NonBlankTrimmedString
{
    public NonBlankTrimmedString(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("Null, blank or whitespace-only value specified");
        Value = value.Trim();
    }

    /// <summary>
    /// This will never be null, blank or have any leading or trailing whitespace
    /// </summary>
    public string Value { get; }

    /// <summary>
    /// It's convenient to be able to pass a NonBlankTrimmedString instance as any argument
    /// that requires a string
    /// </summary>
    public static implicit operator string(NonBlankTrimmedString value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        return value.Value;
    }
}
This would allow me to have a method that clearly indicates that it needs a string with a real value - eg.
void DoSomething(NonBlankTrimmedString value);
.. and it could be combined with Optional<T> to define a method whose type signature indicates that it will take a string with a real value OR it will accept a "no value" - eg.
void DoSomething(Optional<NonBlankTrimmedString> value);
This method will not accept a blank string because that's just another state that is not necessary; either you give me a real (non-blank) value or you don't. There is no half-way house of "non-null but still with no value".
As another example, I might want to write a Bridge.React component whose Props type can optionally take an additional class name to render as part of the component - in which case, I might write the class a bit like this:
public sealed class Props
{
    public Props(
        /* .. other property values, */
        Optional<NonBlankTrimmedString> className = new Optional<NonBlankTrimmedString>())
    {
        // .. other properties set here
        ClassName = className;
    }

    // .. other public properties exposed here
    public Optional<NonBlankTrimmedString> ClassName { get; }
}
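Construction would then look something like this (I'm assuming that Optional<T> supports implicit conversion from T, as the ProductiveRage.Immutable version does - and "featured-item" is just an arbitrary example class name):
// A component that should get an extra class name..
var withClassName = new Props(className: new NonBlankTrimmedString("featured-item"));

// .. and one that shouldn't (ClassName will be a "missing" Optional value)
var withoutClassName = new Props();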
This is all fine and dandy and, pretty much, it just works. If I want to expand this richer type system so that it's used in API requests / responses as well then I can have Optional<T> and NonBlankTrimmedString types defined in the .NET code that runs on the server as well as in my Bridge project. And if I want to avoid code duplication then I can define the types in a Shared Project that is referenced by both the Bridge project and the server API project.
One downside to this approach, though, is that JSON payloads from API calls are going to be larger if I wrap all of my strings in NonBlankTrimmedString instances. And there will be more work for Bridge's version of Newtonsoft Json.NET to do because it has to parse more data and it has to deserialise more instances of types; for every string, instead of just deserialising a value into a string, it needs to deserialise that string value and then create an instance of a NonBlankTrimmedString to wrap it. If you have any API calls that return 100s or 1000s of strings then this can become a non-negligible cost.
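As a rough illustration of the sort of difference that I mean (the exact property names and casing will depend upon the serialiser configuration), a wrapped string would, by default, be serialised as a nested object -
{ "Name": { "Value": "Dan" } }
.. instead of the leaner form that we would prefer:
{ "Name": "Dan" }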
The full .NET version of Newtonsoft Json.NET has some flexibility with how types are serialised to/from JSON. For example, if I wanted to tell the serialiser that NonBlankTrimmedString instances should appear in the JSON as plain strings then I could do so using a JsonConverter (there is sample code on the Newtonsoft website that demonstrates how to do it for the Version type and the principle would be exactly the same for NonBlankTrimmedString - see Custom JsonConverter<T>).
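To sketch out roughly what that would look like on the server (this follows the JsonConverter<T> pattern from the Newtonsoft documentation and uses the 2016 version of NonBlankTrimmedString shown above - treat it as an outline rather than battle-tested code):
public sealed class NonBlankTrimmedStringConverter : JsonConverter<NonBlankTrimmedString>
{
    public override void WriteJson(JsonWriter writer, NonBlankTrimmedString value, JsonSerializer serializer)
    {
        // Write the wrapped value out as a plain JSON string
        if (value == null)
        {
            writer.WriteNull();
            return;
        }
        writer.WriteValue(value.Value);
    }

    public override NonBlankTrimmedString ReadJson(JsonReader reader, Type objectType, NonBlankTrimmedString existingValue, bool hasExistingValue, JsonSerializer serializer)
    {
        // Read a plain JSON string back in and wrap it up (a null token stays null)
        var value = (string)reader.Value;
        return (value == null) ? null : new NonBlankTrimmedString(value);
    }
}
This would then be registered via the Converters list on the JsonSerializerSettings that the server uses for its API responses.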
The Bridge version of the library has no support for custom JsonConverters, though, so we may appear to be a bit stuck.. if it weren't for the fact that Bridge has some low-level tricks that we can use to our advantage.
In order to allow C# code to be written that interacts with JavaScript libraries, Bridge has a few escape hatches for the type system that we can use in a careful manner. For example, I could rewrite the Bridge version of NonBlankTrimmedString to look like this:
public class NonBlankTrimmedString
{
    protected NonBlankTrimmedString() { }

    /// <summary>
    /// This will never be null, blank or have any leading or trailing whitespace
    /// </summary>
    public extern string Value { [Template("{this}")] get; }

    /// <summary>
    /// Create a NonBlankTrimmedString instance by explicitly casting a string
    /// </summary>
    public static explicit operator NonBlankTrimmedString(string value)
    {
        if (value == null)
            return null;
        value = value.Trim();
        if (value == "")
            throw new ArgumentException("Can not cast from a blank or whitespace-only string");
        return Script.Write<NonBlankTrimmedString>("value");
    }

    /// <summary>
    /// It's convenient to be able to pass a NonBlankTrimmedString instance as any argument
    /// that requires a string
    /// </summary>
    [Template("{value}")]
    public extern static implicit operator string(NonBlankTrimmedString value);
}
This changes things up a bit. Now there is no public constructor and the only way to get a NonBlankTrimmedString instance from a plain string is to explicitly cast to it - eg.
var x = (NonBlankTrimmedString)"hi!";
If the source string is blank or whitespace-only then attempting to cast it to a NonBlankTrimmedString will result in an exception being thrown.
What's interesting about this class is that it exists only to provide type information to the C# compiler - there will never be an instance of NonBlankTrimmedString alive at runtime in JavaScript. The reason for this is that the explicit cast performs some validation but then, at runtime, returns the string instance directly back; it doesn't wrap it in an instance of a NonBlankTrimmedString class. Similarly, when the "Value" property is requested in C# code, this is translated into JS as a direct reference to "this" (which we know is a plain string). This is sounding complicated as I write this, so let me try to make it clear with an example!
The following C# code:
// Start with a plain string
var source = "Hi!";
// Create a NonBlankTrimmed by explicitly casting the string
var x = (NonBlankTrimmedString)source;
// Write the value of the NonBlankTrimmedString to the console
Console.WriteLine(x.Value);
.. is translated into this JS:
// Start with a plain string
var source = "Hi!";
// Create a NonBlankTrimmed by explicitly casting the string
var x = Demo.NonBlankTrimmedString.op_Explicit(source);
// Write the value of the NonBlankTrimmedString to the console
System.Console.WriteLine(x);
The reference "x" in the JS runtime is actually just a string (and so the C# "x.Value" is translated into simply "x") and the explicit operator (the method call "Demo.NonBlankTrimmedString.op_Explicit") performs some validation but then (if the validation passes) returns the string right back but claims (for the benefit of the C# compiler and type system) that it is now a NonBlankTrimmedString.
This has a couple of benefits. Plain string values that appear in JSON can now be deserialised into NonBlankTrimmedString instances by Bridge (while the Bridge version of Json.NET doesn't support type converters, it does support deserialising types via implicit or explicit operators - so, here, it would see a string in the JSON, see that the target type was NonBlankTrimmedString and use NonBlankTrimmedString's explicit operator to instantiate the target type), so the JSON returned from the server can be cleaner. And it means that the JS runtime doesn't have to actually create instances of NonBlankTrimmedString to wrap those strings up in, which makes the life of the garbage collector easier (again, this may be important if you have API responses that need to return 1000s of NonBlankTrimmedStrings).
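For example (the response class and property names here are invented, and I'm assuming the Bridge.Newtonsoft.Json package's JsonConvert for the deserialisation call), the JSON only ever contains plain strings but the C# type system sees NonBlankTrimmedString values:
public sealed class ExampleResponse
{
    public NonBlankTrimmedString Title { get; set; }
}

// The JSON contains a plain string..
var response = JsonConvert.DeserializeObject<ExampleResponse>("{ \"Title\": \"Hello!\" }");

// .. but the deserialiser has used the explicit operator and so Title is (as far as the
// compiler is concerned) a NonBlankTrimmedString - at runtime, it's still just a string
Console.WriteLine(response.Title.Value); // "Hello!"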
This is an interesting concept that I'm referring to as a "type alias" - a type that exists only for the compiler and that doesn't affect the runtime. The phrase "type alias" exists in TypeScript and in F# (and in other languages, I'm sure) but I think that it means something slightly different there.. which may mean that I've chosen a confusing name for this C# / Bridge.NET concept! In TypeScript and F#, I don't believe that they allow the level of compiler validation that I'm talking about - certainly in TypeScript, type aliases are more of a convenience that allow you to say something like:
type Vector = number[];
type Vectors = Vector[];
.. so that you can then write a method signature that looks like this:
function process(data: Vectors) {
    // ..
}
.. instead of:
function process(data: number[][]) {
    // ..
}
.. but the two are identical. TypeScript "type aliases" make things more flexible, not more constrained. To make that clearer, if you wrote:
type CustomerID = number;
function process(id: CustomerID) {
    // ..
}
.. then you could still call:
process(1); // Passing a plain number into a method whose signature specifies type CustomerID
In other words, the TypeScript alias means "anywhere that you see CustomerID, you can pass in a 'number'". This is the opposite of what I want - I want to be able to have methods that specify that they require a NonBlankTrimmedString and not just any old string.
I go into this in a little more detail in the section "Type aliases in other languages" at the end of this blog post. My point here was that maybe "type alias" is not the best phrase to use and maybe I'll revisit this in the future.
For now, though, let's get back to the NonBlankTrimmedString definition that I've proposed because it has some downsides as well. As the type only exists at compile time and not at runtime, if I try to query the type of a NonBlankTrimmedString instance at runtime then it will report that it is a "System.String" - this is to be expected, since part of the benefit of this approach is that no additional instances are required other than the plain string itself - but if you wanted to do some crazy reflection for some reason then it might catch you off guard.
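A quick illustration of the sort of surprise that I mean (precisely how Bridge reports the type may vary between versions, so take the comment with a pinch of salt):
var x = (NonBlankTrimmedString)"hi!";

// At runtime, x is just a plain JS string and so querying its type reports the string
// type rather than NonBlankTrimmedString
Console.WriteLine(x.GetType() == typeof(string)); // True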
Another downside is that if I want to create specialised versions of NonBlankTrimmedString then I have to duplicate some code. For example, I might want to strongly type my entity IDs and define them as classes derived from NonBlankTrimmedString. With the version of NonBlankTrimmedString from my 2016 blog post, it would be as simple as this:
// If NonBlankTrimmedString is a regular class then creating derived types is as easy as this
public class OrderID : NonBlankTrimmedString
{
    public OrderID(string value) : base(value) { }
}
.. but with this "type alias" approach, it becomes more verbose -
// The explicit operator needs to be reimplemented for each derived type with the
// type alias approach shown earlier :(
public class ClassName : NonBlankTrimmedString
{
    protected ClassName() { }

    public static explicit operator ClassName(string value)
    {
        if (value == null)
            return null;
        value = value.Trim();
        if (value == "")
            throw new ArgumentException("Can not cast from a blank or whitespace-only string");
        return Script.Write<ClassName>("value");
    }
}
However, we could make this a little simpler by changing the NonBlankTrimmedString type definition to this:
public class NonBlankTrimmedString
{
    protected NonBlankTrimmedString() { }

    /// <summary>
    /// This will never be null, blank or have any leading or trailing whitespace
    /// </summary>
    public extern string Value { [Template("{this}")] get; }

    /// <summary>
    /// Create a NonBlankTrimmedString instance by explicitly casting a string
    /// </summary>
    public static explicit operator NonBlankTrimmedString(string value)
        => Wrap<NonBlankTrimmedString>(value);

    /// <summary>
    /// It's convenient to be able to pass a NonBlankTrimmedString instance as any argument
    /// that requires a string
    /// </summary>
    [Template("{value}")]
    public extern static implicit operator string(NonBlankTrimmedString value);

    protected static T Wrap<T>(string value) where T : NonBlankTrimmedString
    {
        if (value == null)
            return null;
        value = value.Trim();
        if (value == "")
            throw new ArgumentException("Can not cast from a blank or whitespace-only string");
        return Script.Write<T>("value");
    }
}
.. and then derived types would look like this:
public class OrderID : NonBlankTrimmedString
{
    protected OrderID() { }

    public static explicit operator OrderID(string value) => Wrap<OrderID>(value);
}
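Usage then looks like this - and, importantly, each derived type is distinct as far as the compiler is concerned (ProcessOrder and CustomerID are hypothetical here, included only to illustrate the point):
var orderId = (OrderID)" 123-456 "; // Validated and trimmed; at runtime it's just the string "123-456"
ProcessOrder(orderId);              // Fine, presuming a method with signature "void ProcessOrder(OrderID id)"

// Neither of these would compile, which is exactly what we want:
//   ProcessOrder("123-456");              // a plain string is not an OrderID
//   ProcessOrder((CustomerID)"123-456");  // nor is a different ID type, such as a CustomerID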
Another use case where this sort of approach seemed interesting was when I was writing some client-side code that received data in the form of arrays and then did some clever calculations and drew some pretty graphs. The API response data was 10s of 1000s of arrays, where each array contained 100 floating point numbers. The calculation logic passed those arrays through a bunch of methods to come up with the results but I got myself into a bit of a muddle because there were one or two places that had to manipulate a subset of the data and I was never quite sure whether the data should be altered in place or whether local copies of those parts of the data should be taken and then changed. To make the code easier to follow, I wanted those methods to take local copies to make their changes, rather than mutating the data in place and risking messing up calculations performed on it later in the pipeline.
What I really wanted was for those methods to have type signatures that would either take an immutable data type or a readonly data type. Immutable is the ideal because it means that not only can the receiving methods not change the data but nothing can change the data. Having readonly types on the method signatures means that the methods can't change the data but it's still technically possible for the caller to change the data. To try to illustrate this, I'll use the ReadOnlyCollection<T> type from .NET in an example:
public static void Main()
{
    var items = new List<int> { 0, 1, 2, 3 };
    var readOnlyItems = items.AsReadOnly();
    DoSomething(
        readOnlyItems,
        halfwayPointCallback: () => items.RemoveAt(0)
    );
}

static void DoSomething(ReadOnlyCollection<int> readOnlyItems, Action halfwayPointCallback)
{
    Console.WriteLine("Number of readonlyItems: " + readOnlyItems.Count);
    halfwayPointCallback();
    Console.WriteLine("Number of readonlyItems: " + readOnlyItems.Count);
}
Here, the "Main" method declares a mutable list and then it create a readonly wrapper around it. The readonly wrapper is passed into the "DoSomething" method and this means "DoSomething" can not directly alter that list. However, it's still possible for the "Main" method to change the underlying list while "DoSomething" is running.
In practice, this is not something that I find commonly happens. As such, while I would prefer immutable structures at all times (because then "Main" couldn't change the contents of the list while "DoSomething" is working on it), being able to wrap the data in a readonly structure is still a significant improvement.
So, some of the more obvious options available to me were:
I went with the last one because I was all excited about experimenting with "Bridge.NET type aliases" and I wanted to see how well they could work! (In reality, the fifth option was also a good one and some of the others would also be perfectly fine for smaller data sets.. to be honest, there is a chance that they wouldn't have made too much difference even with the data that I was looking at but, again, sometimes you need to make the opportunity to experiment! :)
public sealed class ReadOnlyArray<T> : IEnumerable<T>
{
    [Template("{data}")]
    public extern ReadOnlyArray(T[] data);

    [External] // Required due to https://github.com/bridgedotnet/Bridge/issues/4015
    public extern T this[int index] { [Template("{this}[{index}]")] get; }

    public extern int Length { [Template("length")] get; }

    [External]
    public extern IEnumerator<T> GetEnumerator();

    [External]
    extern IEnumerator IEnumerable.GetEnumerator();

    [Template("{value}")]
    public extern static implicit operator ReadOnlyArray<T>(T[] value);
}
The structure of this class is similar in some ways to that of the NonBlankTrimmedString. Unlike that class, though, no validation is required - I only want to provide access to an array in a limited manner and so it's fine to expose a public constructor (as opposed to NonBlankTrimmedString, where it's important to check that the value is neither null nor blank nor whitespace-only, and the [Template] attribute on the constructor doesn't easily allow for any validation).
Even though the constructor may be used on this class, there is still an operator to change an array into a ReadOnlyArray so that the deserialisation process is able to read an array of items into a ReadOnlyArray instance. I've chosen to use an implicit operator (rather than an explicit operator) here because there is no validation to perform - the NonBlankTrimmedString has an explicit operator because that does perform some validation and so it's a casting action that could fail, which I want to be explicit in code.
As with the NonBlankTrimmedString, this type will exist only at compile time and the compiled JavaScript will always be operating directly against the original array. As far as the JS code is aware, there is no wrapper class involved at all. The following C# -
var values = new[] { 1, 2, 3 };
Console.WriteLine(values.Length);
var readOnlyValuesCtor = new ReadOnlyArray<int>(values);
Console.WriteLine(readOnlyValuesCtor.Length);
ReadOnlyArray<int> readOnlyValuesCast = values;
Console.WriteLine(readOnlyValuesCast.Length);
.. is translated into this JS:
var values = System.Array.init([1, 2, 3], System.Int32);
System.Console.WriteLine(values.length);
var readOnlyValuesCtor = values;
System.Console.WriteLine(readOnlyValuesCtor.length);
var readOnlyValuesCast = values;
System.Console.WriteLine(readOnlyValuesCast.length);
Whether the ReadOnlyArray<int> is created by calling its constructor or by an implicit cast in the C# code, the JS is unaware of any change required of the reference and continues to operate on the original array. This is the "free" part of this approach - there is no runtime cost in terms of type conversions or additional references.
The other members of the class need a little more explanation, though. The indexer should be implemented just like the "Length" property, by having an extern property that has a getter with a [Template] attribute on it. However, there is a bug in the Bridge compiler that necessitates an additional [External] attribute being added to the property. Not the end of the world and I'm sure that the Bridge Team will fix it in a future version of the compiler.
The "GetEnumerator" methods require a tiny bit more explanation. In order for the class to implement IEnumerable<T>, these methods must be present. But we don't actually have to implement them ourselves. Whenever Bridge encounters a "foreach" in the source C# code, it translates it into JS that calls "GetEnumerator" and then steps through each value. For example, this C# code:
foreach (var value in readOnlyValuesCtor)
    Console.WriteLine(value);
.. becomes this JS:
$t = Bridge.getEnumerator(readOnlyValuesCtor);
try {
    while ($t.moveNext()) {
        var value = $t.Current;
        System.Console.WriteLine(value);
    }
} finally {
    if (Bridge.is($t, System.IDisposable)) {
        $t.System$IDisposable$Dispose();
    }
}
Because Bridge needs to support enumerating over arrays, the function "Bridge.getEnumerator" knows what to do if it is given an array reference. And since a ReadOnlyArray is an array reference at runtime, we don't have to do anything special - we don't have to provide a GetEnumerator implementation.
And there we go! As I explained above, I originally encountered this problem when passing an array into a complicated calculation process but this type could also be used for deserialising JSON into a richer type model, just like the NonBlankTrimmedString earlier - again, without any overhead in doing so (no instances of wrapper types will be present at runtime and there will be no additional references for the garbage collector to track).
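For example (the class and property names below are invented purely for illustration), an API response type could expose its numeric series like this and the deserialiser would use the implicit operator to go from the plain JSON arrays to ReadOnlyArray<double> values, with no wrapper instances created at runtime:
public sealed class ChartSeriesResponse
{
    // In the JSON this is just an array of numbers; in the C# type system it's a
    // ReadOnlyArray<double> that the calculation code can't mutate
    public ReadOnlyArray<double> Values { get; set; }
}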
I was racking my brains about whether it would be possible to do something similar with C# running in a .NET environment and I couldn't think of anything. People sometimes think "structs!" when trying to concoct ways to avoid adding references that the garbage collector needs to track but structs are only immune to this if they don't contain any object references within their fields and properties (and there are other edge cases besides this but they're not important right now).
At the end of the day, this "type alias" concept might be a bit of a niche technique and it might even be a case of me playing around, more than it being something that you might use in production.. but I thought that it was interesting nonetheless. And it has made me wish, again, that C# had support for something like this - I've written code before that defines all manner of strongly typed IDs (strings) and Keys (integers) to avoid passing the wrong type of value into the wrong place but it's always felt cumbersome (it's felt worth the effort but that didn't stop me wishing that it was less effort).
I've linked above to an article Using strongly-typed entity IDs to avoid primitive obsession, which is excellent and eloquently expresses some of my thoughts, but I thought that I'd add a summary in here as well (which also gives me an opportunity to go into more detail about the options in TypeScript and F#).
I'll start with an anecdote to set the scene. In a company that I used to work at, we had systems that would retrieve and render different data for different language options. Sometimes data would vary only by language ("English", "French", etc..) but sometimes it would be more specific and vary by language culture (eg. "English - United Kingdom", "English - United States", etc..). An older version of the system would pass around int values for the language or language culture keys. So there might be a method such as:
private string GetTranslatedName(int languageKey)
A problem that occurred over and over again is that language keys and language culture keys would get mixed up in the code base - in other words, it was quite common for someone to accidentally pass a language key into a method where a language culture key was expected (this situation was not helped by the fact that much of the developer testing was done in English and the language key and language culture key values in many of the databases were both 1 for English / English UK). Something that I was very keen to get into a new version of the system was to introduce "strongly typed keys" so that this sort of accident could no longer occur. The method's signature would be changed to something like this:
private string GetTranslatedName(LanguageKey languageKey)
.. and we would not describe language or language culture keys as ints in the code base. They would always be either an instance of LanguageKey or LanguageCultureKey - this way, if you attempted to pass a key of the wrong type into a method then you would get a compile error.
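With that in place, mixing the keys up becomes impossible to do by accident (LanguageCultureKey here would be a sibling struct defined in the same way as the LanguageKey shown below):
var languageKey = new LanguageKey(1);
var name = GetTranslatedName(languageKey); // Fine - the types match

// Neither of these will compile, which is precisely the protection that we want:
//   GetTranslatedName(1);                          // an int is not a LanguageKey
//   GetTranslatedName(new LanguageCultureKey(1));  // and neither is a LanguageCultureKey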
The downside is that each key type had to be defined as its own struct, with the following (quite verbose) structure:
public struct LanguageKey : IEquatable<LanguageKey>
{
    public LanguageKey(int value) => Value = value;

    public int Value { get; }

    public bool Equals(LanguageKey other) => Value.Equals(other.Value);
    public override bool Equals(object obj) => (obj is LanguageKey key) && (key.Value == Value);
    public override int GetHashCode() => Value;

    public static bool operator ==(LanguageKey x, LanguageKey y) => x.Value == y.Value;
    public static bool operator !=(LanguageKey x, LanguageKey y) => !(x == y);

    public static explicit operator LanguageKey(int value) => new LanguageKey(value);
}
Really, though, that is the only downside. As the strongly typed keys are structs without any reference properties or fields, there is no additional work for the garbage collector and there is no memory overhead vs tracking a simple int. But it does still feel a little arduous to have to have these definitions in the code base, particularly when the equivalent F# code looks like this:
[<Struct>] type LanguageKey = LanguageKey of int
It's worth noting that this is not actually referred to as a "type alias" in F#; this is a "single case union type". There is a concept called a "type alias" in F# that looks like this:
type LanguageKey = int
.. but that code simply says "allow me to use the word 'LanguageKey' anywhere in place of int" - eg. if I have the LanguageKey type alias specified as a method argument type in an F# method, like this:
let getTranslatedName (language: LanguageKey) =
    // (Real work to retrieve translated name would go here but we'll
    // just return the string "Whatever" for the sake of this example)
    "Whatever"
.. then the compiler would allow me to pass an int into that method -
// A LanguageKey type alias lets me pass any old int into the method - rubbish!
let name = getTranslatedName 123
.. and that's exactly what I wanted to avoid!
On the other hand, if the type LanguageKey was a "single case union type" then the code above would not compile:
// error FS0001: This expression was expected to have type 'LanguageKey' but here has type 'int'
let name = getTranslatedName 123
// This DOES compile because the types match
let key = LanguageKey 123
let name = getTranslatedName key
.. and that's exactly what I did want!
(TypeScript's type aliases are like F#'s type aliases - they are more of a convenience and do not add the sort of type checking that I want)
Things get a bit more awkward if we want to deal with reference types, such as strings, because we could create a C# class similar to LanguageKey (or we could create an F# single case union type) but that would introduce a new instance of a type that must be tracked by the garbage collector - every strongly typed ID involves two references: the underlying string value and the strongly typed wrapper. Much of the time, that's no problem - I've had the odd issue with the .NET GC in the past but, on the whole, it's an amazing and reliable tool - but, because I have had these problems before, I'm more aware of the trade-off when I introduce wrappers like this.
I'm convinced that using strongly typed IDs is the right thing to do in 99% of cases because it improves code quality and can eradicate a class of real-world mistake. But the concept became even more interesting to me when it appeared possible to introduce a form of type alias into Bridge.NET code that enables those compile time checks but with zero runtime cost. Granted, the type erasure that occurs means that runtime type checking is not possible (the Bridge code can not differentiate between a string, a NonBlankTrimmedString and a type that is derived from NonBlankTrimmedString) but the main driver was to improve compile time checking and so that wasn't a problem for me. Maybe it would be a problem in other scenarios, in which case these Bridge.NET "type aliases" might not be appropriate.