Can you improve the system prompt so that the pointers are contextual to what the model can actually pinpoint through its vision capabilities? For example, if the user says they are looking for swimming pools, the pointer marks should land on the pools the model has observed in the image. Also allow the model to zoom into the image in the background when it needs more resolution.
Example: to locate solar panels in a satellite view, the system should look around, zoom in, and reason about what it sees to identify them within the area of interest.
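
One way to picture the background zoom step is a tiling pass: the image is split into overlapping crops, each crop is shown to the model at full resolution, and any point the model reports inside a crop is mapped back to the full image so the pointer lands in the right place. The sketch below is only an illustration under assumptions. The names `Tile`, `iter_tiles`, and `to_image_point` are hypothetical, and it assumes the model returns points as fractions (0 to 1) of the crop it was shown.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Tile:
    """A crop window, in pixels of the full image (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def iter_tiles(image_w: int, image_h: int, tile: int, overlap: int) -> Iterator[Tile]:
    """Yield overlapping crop windows that together cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap
    # Stop once the previous tile already reaches the image edge.
    for top in range(0, max(image_h - overlap, 1), step):
        for left in range(0, max(image_w - overlap, 1), step):
            # Edge tiles are clipped so they never extend past the image.
            yield Tile(left, top, min(tile, image_w - left), min(tile, image_h - top))


def to_image_point(tile: Tile, x_norm: float, y_norm: float) -> Tuple[int, int]:
    """Map a point given as fractions of a tile to full-image pixel coordinates.

    Assumption: the model reports points normalized to the crop it was shown.
    """
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("normalized coordinates must be in [0, 1]")
    return (tile.left + round(x_norm * tile.width),
            tile.top + round(y_norm * tile.height))


if __name__ == "__main__":
    tiles = list(iter_tiles(1000, 600, tile=512, overlap=64))
    # Suppose the model saw the second tile and pointed at its centre.
    print(tiles[1], to_image_point(tiles[1], 0.5, 0.5))
```

The overlap is there so that an object sitting on a tile boundary (a pool or a panel array cut in half) still appears whole in at least one crop. The actual cropping could be done with any imaging library, for instance Pillow's `Image.crop`, and the prompt would then ask the model to report only objects it can actually see in the crop it was given.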